Class JaroWinklerMeasure
- java.lang.Object
  - org.aksw.limes.core.measures.measure.AMeasure
    - org.aksw.limes.core.measures.measure.string.StringMeasure
      - org.aksw.limes.core.measures.measure.string.JaroWinklerMeasure
- All Implemented Interfaces:
IMeasure, IStringMeasure, ITrieFilterableStringMeasure
public class JaroWinklerMeasure extends StringMeasure implements ITrieFilterableStringMeasure
This class implements the Jaro-Winkler algorithm, which was designed as a string subsequence alignment method for matching names in the US Census. It is therefore optimized for relatively short strings consisting of Latin letters only. It provides all the features of the original C implementation by William E. Winkler, although the features that make it specific to name matching may be disabled. To overcome the O(n*m) complexity for non-matching cases, a filter is added: given a threshold, it can identify pairs whose Jaro-Winkler proximity is certain to be less than or equal to that threshold.
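For orientation, the textbook Jaro-Winkler computation can be sketched as follows. This is a minimal standalone illustration, not the code of this class: the class name `JaroWinklerSketch`, the boost threshold of 0.7, the prefix limit of 4 and the scaling factor of 0.1 are assumptions taken from the commonly cited definition, and the optional uppercase, long-string and character-similarity features of Winkler's C implementation are left out.

```java
final class JaroWinklerSketch {

    // Assumed conventional constants; the actual value of winklerBoostThreshold may differ.
    static final double BOOST_THRESHOLD = 0.7;
    static final int MAX_PREFIX = 4;
    static final double PREFIX_SCALE = 0.1;

    // Plain Jaro similarity: count matches inside a sliding window, then count transpositions.
    static double jaro(String a, String b) {
        int la = a.length(), lb = b.length();
        if (la == 0 && lb == 0) return 1.0;
        if (la == 0 || lb == 0) return 0.0;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] usedA = new boolean[la];
        boolean[] usedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!usedB[j] && a.charAt(i) == b.charAt(j)) {
                    usedA[i] = true;
                    usedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both strings over their matched characters; each mismatch is half a transposition.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < la; i++) {
            if (!usedA[i]) continue;
            while (!usedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        double t = halfTranspositions / 2.0;
        return (m / la + m / lb + (m - t) / m) / 3.0;
    }

    // Winkler's modification: boost the Jaro score for a shared prefix of up to MAX_PREFIX characters.
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        if (j <= BOOST_THRESHOLD) return j;
        int limit = Math.min(MAX_PREFIX, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * PREFIX_SCALE * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

The nested matching loop is where the O(n*m) cost mentioned above comes from, which is why discarding hopeless pairs before running it pays off.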
- Author:
- Kevin Dreßler
-
-
Field Summary
Fields
- static double winklerBoostThreshold
-
Constructor Summary
Constructors
- JaroWinklerMeasure()
- JaroWinklerMeasure(boolean uppercaseOn, boolean longStringsOn, boolean characterSimilarityOn)
-
Method Summary
Methods
- double characterFrequencyUpperBound(int l1, int l2, int m)
- int characterMatchLowerBound(int l1, int l2, double threshold)
- JaroWinklerMeasure clone()
  Clone method for parallel execution.
- boolean computableViaOverlap()
  Returns true if this similarity function can be computed just via getSimilarity(overlap, lengthA, lengthB).
- int getAlpha(int xTokensNumber, int yTokensNumber, double threshold)
  Threshold for the positional filtering.
- char[] getArrayRepresentation(String s)
- int getMidLength(int tokensNumber, double threshold)
  Threshold for the length of the tokens to be indexed.
- String getName()
  Returns name of a measure.
- LinkedList<org.apache.commons.lang3.tuple.ImmutableTriple<Integer,Integer,Integer>> getPartitionBounds(int maxSize, double threshold)
- int getPrefixLength(int tokensNumber, double threshold)
  Length of prefix to consider when mapping the input string with other strings.
- double getRuntimeApproximation(double mappingSize)
  Returns the runtime approximation of a measure.
- double getSimilarity(int overlap, int lengthA, int lengthB)
  Returns the similarity of two strings given their length and the overlap.
- double getSimilarity(Object object1, Object object2)
  Returns the similarity between two objects.
- double getSimilarity(Instance instance1, Instance instance2, String property1, String property2)
  Returns the similarity between two instances, given their corresponding properties.
- double getSizeFilteringThreshold(int tokensNumber, double threshold)
- String getType()
  Returns type of a measure.
- int lengthLowerBound(int l1, double threshold)
- int lengthUpperBound(int l1, double threshold)
- double proximity(char[] yin, char[] yang)
  Calculates the proximity of two input strings if the proximity is assured to be over the given threshold.
- double proximity(String yi, String ya)
  Calculates the proximity of two input strings if the proximity is assured to be over the given threshold.
-
-
-
Method Detail
-
clone
public JaroWinklerMeasure clone()
Clone method for parallel execution
-
proximity
public double proximity(char[] yin, char[] yang)
Calculates the proximity of two input strings if the proximity is assured to be over the given threshold.
- Parameters:
yin - string to align on
yang - string to align on
- Returns:
- similarity score (proximity)
-
proximity
public double proximity(String yi, String ya)
Calculates the proximity of two input strings if the proximity is assured to be over the given threshold.
- Specified by:
proximity in interface ITrieFilterableStringMeasure
- Parameters:
yi - string to be aligned
ya - string to align on
- Returns:
- similarity score (proximity)
-
getArrayRepresentation
public char[] getArrayRepresentation(String s)
-
characterFrequencyUpperBound
public double characterFrequencyUpperBound(int l1, int l2, int m)
- Specified by:
characterFrequencyUpperBound in interface ITrieFilterableStringMeasure
-
characterMatchLowerBound
public int characterMatchLowerBound(int l1, int l2, double threshold)
- Specified by:
characterMatchLowerBound in interface ITrieFilterableStringMeasure
-
lengthUpperBound
public int lengthUpperBound(int l1, double threshold)
- Specified by:
lengthUpperBound in interface ITrieFilterableStringMeasure
-
lengthLowerBound
public int lengthLowerBound(int l1, double threshold)
- Specified by:
lengthLowerBound in interface ITrieFilterableStringMeasure
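The bound methods above are not described on this page. As a rough illustration of how length-based and character-frequency-based filtering can work for Jaro-Winkler in general, the sketch below derives upper bounds from the textbook formula. It is not the implementation behind `lengthUpperBound`, `lengthLowerBound`, `characterMatchLowerBound` or `characterFrequencyUpperBound`; the class `JaroWinklerBoundsSketch` and all its method names are hypothetical, and the prefix limit of 4 and scaling factor of 0.1 are assumed conventional values. The long-string adjustment of Winkler's C implementation is ignored.

```java
import java.util.HashMap;
import java.util.Map;

final class JaroWinklerBoundsSketch {

    // Assumed conventional Winkler constants.
    static final int MAX_PREFIX = 4;
    static final double PREFIX_SCALE = 0.1;

    // Best score reachable with m matching characters: assume no transpositions
    // and the longest prefix that m matches allow.
    static double upperBoundFromMatches(int l1, int l2, int m) {
        if (l1 == 0 && l2 == 0) return 1.0;
        if (m == 0) return 0.0;
        double jaro = ((double) m / l1 + (double) m / l2 + 1.0) / 3.0;
        int prefix = Math.min(MAX_PREFIX, m);
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }

    // With only the lengths known, at most min(l1, l2) characters can match.
    static double upperBoundFromLengths(int l1, int l2) {
        return upperBoundFromMatches(l1, l2, Math.min(l1, l2));
    }

    // A pair can be discarded without alignment if even the best case stays below the threshold.
    static boolean canSkipByLength(int l1, int l2, double threshold) {
        return upperBoundFromLengths(l1, l2) < threshold;
    }

    // The number of matches cannot exceed the number of characters the two strings
    // share when counted with multiplicity.
    static int maxMatchesByFrequency(String a, String b) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) counts.merge(c, 1, Integer::sum);
        int shared = 0;
        for (char c : b.toCharArray()) {
            Integer left = counts.get(c);
            if (left != null && left > 0) {
                counts.put(c, left - 1);
                shared++;
            }
        }
        return shared;
    }
}
```

Under these assumptions, a string of length 3 compared with one of length 12 cannot score above roughly 0.825, so with a threshold of 0.9 the pair could be dropped before any alignment is attempted.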
-
getPartitionBounds
public LinkedList<org.apache.commons.lang3.tuple.ImmutableTriple<Integer,Integer,Integer>> getPartitionBounds(int maxSize, double threshold)
- Specified by:
getPartitionBounds in interface ITrieFilterableStringMeasure
-
getPrefixLength
public int getPrefixLength(int tokensNumber, double threshold)
Description copied from interface: IStringMeasure
Length of prefix to consider when mapping the input string with other strings.
- Specified by:
getPrefixLength in interface IStringMeasure
- Parameters:
tokensNumber - Size of the input string
threshold - Similarity threshold
- Returns:
- Prefix length
-
getMidLength
public int getMidLength(int tokensNumber, double threshold)
Description copied from interface: IStringMeasure
Threshold for the length of the tokens to be indexed.
- Specified by:
getMidLength in interface IStringMeasure
- Parameters:
tokensNumber - Number of tokens of current input
threshold - Similarity threshold
- Returns:
- Length of tokens to be indexed
-
getSizeFilteringThreshold
public double getSizeFilteringThreshold(int tokensNumber, double threshold)
- Specified by:
getSizeFilteringThreshold in interface IStringMeasure
-
getAlpha
public int getAlpha(int xTokensNumber, int yTokensNumber, double threshold)
Description copied from interface: IStringMeasure
Threshold for the positional filtering.
- Specified by:
getAlpha in interface IStringMeasure
- Parameters:
xTokensNumber - Size of the first input string
yTokensNumber - Size of the second input string
threshold - Similarity threshold
- Returns:
- Threshold for positional filtering
-
getSimilarity
public double getSimilarity(int overlap, int lengthA, int lengthB)
Description copied from interface: IStringMeasure
Returns the similarity of two strings given their length and the overlap. Useful when these values are already known, so that they do not have to be computed anew.
- Specified by:
getSimilarity in interface IStringMeasure
- Parameters:
overlap - Overlap of strings A and B
lengthA - Length of A
lengthB - Length of B
- Returns:
- Similarity of A and B
-
computableViaOverlap
public boolean computableViaOverlap()
Description copied from interface: IStringMeasure
Returns true if this similarity function can be computed just via getSimilarity(overlap, lengthA, lengthB).
- Specified by:
computableViaOverlap in interface IStringMeasure
- Returns:
- True if it's possible, else false
-
getSimilarity
public double getSimilarity(Object object1, Object object2)
Description copied from interface: IMeasure
Returns the similarity between two objects.
- Specified by:
getSimilarity in interface IMeasure
- Parameters:
object1 - the source object
object2 - the target object
- Returns:
- The similarity of the objects
-
getType
public String getType()
Description copied from interface: IMeasure
Returns type of a measure.
-
getSimilarity
public double getSimilarity(Instance instance1, Instance instance2, String property1, String property2)
Description copied from interface: IMeasure
Returns the similarity between two instances, given their corresponding properties.
- Specified by:
getSimilarity in interface IMeasure
- Parameters:
instance1 - the source instance
instance2 - the target instance
property1 - the source property
property2 - the target property
- Returns:
- The similarity of the instances
-
getName
public String getName()
Description copied from interface: IMeasure
Returns name of a measure.
-
getRuntimeApproximation
public double getRuntimeApproximation(double mappingSize)
Description copied from interface: IMeasure
Returns the runtime approximation of a measure.
- Specified by:
getRuntimeApproximation in interface IMeasure
- Parameters:
mappingSize - the mapping size returned by the measure
- Returns:
- The runtime of the measure
-
-