public class JaroWinklerMeasure extends StringMeasure implements ITrieFilterableStringMeasure
To avoid the O(n*m) cost of computing the measure for non-matching pairs, a filter is added. Given a threshold, it identifies pairs whose Jaro-Winkler proximity is certain to be less than or equal to that threshold.
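
The filtering idea described above can be illustrated with a length-only bound. The following is a minimal sketch, not the code of this class: the class name `JaroWinklerLengthFilterSketch`, its method names, and the Winkler constants (prefix limit 4, scaling factor 0.1) are assumptions chosen for illustration. It relies only on the fact that the number of matching characters cannot exceed the length of the shorter string.

```java
// Hypothetical sketch: a length-based upper bound for Jaro-Winkler.
// None of these names or constants are taken from JaroWinklerMeasure.
final class JaroWinklerLengthFilterSketch {
    // Assumed classic Winkler parameters.
    static final int MAX_PREFIX = 4;
    static final double PREFIX_SCALE = 0.1;

    /** Upper bound on the Jaro similarity when at most m characters match. */
    static double jaroUpperBound(int l1, int l2, int m) {
        if (m == 0) {
            // Two empty strings are commonly treated as identical.
            return (l1 == 0 && l2 == 0) ? 1.0 : 0.0;
        }
        // Best case: m matches and no transpositions.
        return ((double) m / l1 + (double) m / l2 + 1.0) / 3.0;
    }

    /** Upper bound on Jaro-Winkler that uses only the two string lengths. */
    static double jaroWinklerUpperBound(int l1, int l2) {
        int m = Math.min(l1, l2); // cannot match more than the shorter string
        double jaro = jaroUpperBound(l1, l2, m);
        int prefix = Math.min(MAX_PREFIX, m); // best-case common prefix
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }

    /** True if the pair cannot exceed the threshold, so the O(n*m) step can be skipped. */
    static boolean canSkip(int l1, int l2, double threshold) {
        return jaroWinklerUpperBound(l1, l2) <= threshold;
    }
}
```

Under these assumptions, strings of lengths 3 and 12 have a bound of 0.825, so with a threshold of 0.9 the pair can be discarded without comparing any characters.
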
| Modifier and Type | Field and Description |
|---|---|
| static double | winklerBoostThreshold |

| Constructor and Description |
|---|
| JaroWinklerMeasure() |
| JaroWinklerMeasure(boolean uppercaseOn, boolean longStringsOn, boolean characterSimilarityOn) |

| Modifier and Type | Method and Description |
|---|---|
| double | characterFrequencyUpperBound(int l1, int l2, int m) |
| int | characterMatchLowerBound(int l1, int l2, double threshold) |
| JaroWinklerMeasure | clone() - Clone method for parallel execution. |
| boolean | computableViaOverlap() - Returns true if this similarity function can be computed just via the getSimilarity(overlap, lengthA, lengthB) method. |
| int | getAlpha(int xTokensNumber, int yTokensNumber, double threshold) - Threshold for the positional filtering. |
| char[] | getArrayRepresentation(String s) |
| int | getMidLength(int tokensNumber, double threshold) - Threshold for the length of the tokens to be indexed. |
| String | getName() - Returns name of a measure. |
| LinkedList<org.apache.commons.lang3.tuple.ImmutableTriple<Integer,Integer,Integer>> | getPartitionBounds(int maxSize, double threshold) |
| int | getPrefixLength(int tokensNumber, double threshold) - Length of prefix to consider when mapping the input string with other strings. |
| double | getRuntimeApproximation(double mappingSize) - Returns the runtime approximation of a measure. |
| double | getSimilarity(Instance instance1, Instance instance2, String property1, String property2) - Returns the similarity between two instances, given their corresponding properties. |
| double | getSimilarity(int overlap, int lengthA, int lengthB) - Returns the similarity of two strings given their length and the overlap. |
| double | getSimilarity(Object object1, Object object2) - Returns the similarity between two objects. |
| double | getSizeFilteringThreshold(int tokensNumber, double threshold) |
| String | getType() - Returns type of a measure. |
| int | lengthLowerBound(int l1, double threshold) |
| int | lengthUpperBound(int l1, double threshold) |
| double | proximity(char[] yin, char[] yang) - Calculate the proximity of two input strings if proximity is assured to be over the given threshold. |
| double | proximity(String yi, String ya) - Calculate the proximity of two input strings if proximity is assured to be over the given threshold. |

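
For orientation, the quantity that the `proximity` methods listed above relate to can be computed as in the following sketch. It is a plain textbook Jaro-Winkler and makes no claim to match this class: the class name `JaroWinklerSketch`, the boost threshold of 0.7, the prefix limit of 4, and the scaling factor of 0.1 are all assumptions, and the constructor flags (`uppercaseOn`, `longStringsOn`, `characterSimilarityOn`) are not modeled.

```java
// Hypothetical sketch of textbook Jaro-Winkler; not the code of JaroWinklerMeasure.
final class JaroWinklerSketch {
    // Assumed classic parameters; the real winklerBoostThreshold value is not given in this page.
    static final double BOOST_THRESHOLD = 0.7;
    static final int MAX_PREFIX = 4;
    static final double PREFIX_SCALE = 0.1;

    /** Jaro similarity; this nested scan is the O(n*m) step. */
    static double jaro(char[] a, char[] b) {
        if (a.length == 0 && b.length == 0) return 1.0;
        if (a.length == 0 || b.length == 0) return 0.0;
        // Characters match only if they lie within this distance of each other.
        int window = Math.max(0, Math.max(a.length, b.length) / 2 - 1);
        boolean[] aMatched = new boolean[a.length];
        boolean[] bMatched = new boolean[b.length];
        int matches = 0;
        for (int i = 0; i < a.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a[i] == b[j]) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both matched sequences in order and count positions that differ.
        int mismatched = 0;
        int j = 0;
        for (int i = 0; i < a.length; i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[j]) j++;
            if (a[i] != b[j]) mismatched++;
            j++;
        }
        double m = matches;
        double t = mismatched / 2; // transpositions (integer division, as in the classic definition)
        return (m / a.length + m / b.length + (m - t) / m) / 3.0;
    }

    /** Jaro similarity plus the Winkler common-prefix boost. */
    static double jaroWinkler(String s1, String s2) {
        char[] a = s1.toCharArray();
        char[] b = s2.toCharArray();
        double sim = jaro(a, b);
        if (sim <= BOOST_THRESHOLD) return sim; // boost only sufficiently similar pairs
        int limit = Math.min(MAX_PREFIX, Math.min(a.length, b.length));
        int prefix = 0;
        while (prefix < limit && a[prefix] == b[prefix]) prefix++;
        return sim + prefix * PREFIX_SCALE * (1.0 - sim);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

With these assumed constants, `jaroWinkler("MARTHA", "MARHTA")` comes out at roughly 0.961: six matches, one transposition, and a common prefix of length three.
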
public JaroWinklerMeasure()

public JaroWinklerMeasure(boolean uppercaseOn, boolean longStringsOn, boolean characterSimilarityOn)

public JaroWinklerMeasure clone()

public double proximity(char[] yin, char[] yang)
Parameters:
yin - string to align on
yang - string to align on

public double proximity(String yi, String ya)
Specified by: proximity in interface ITrieFilterableStringMeasure
Parameters:
yi - string to be aligned
ya - string to align on

public char[] getArrayRepresentation(String s)

public double characterFrequencyUpperBound(int l1, int l2, int m)
Specified by: characterFrequencyUpperBound in interface ITrieFilterableStringMeasure

public int characterMatchLowerBound(int l1, int l2, double threshold)
Specified by: characterMatchLowerBound in interface ITrieFilterableStringMeasure

public int lengthUpperBound(int l1, double threshold)
Specified by: lengthUpperBound in interface ITrieFilterableStringMeasure

public int lengthLowerBound(int l1, double threshold)
Specified by: lengthLowerBound in interface ITrieFilterableStringMeasure

public LinkedList<org.apache.commons.lang3.tuple.ImmutableTriple<Integer,Integer,Integer>> getPartitionBounds(int maxSize, double threshold)
Specified by: getPartitionBounds in interface ITrieFilterableStringMeasure

public int getPrefixLength(int tokensNumber, double threshold)
Description copied from interface: IStringMeasure
Specified by: getPrefixLength in interface IStringMeasure
Parameters:
tokensNumber - Size of input string in
threshold - Similarity threshold

public int getMidLength(int tokensNumber, double threshold)
Description copied from interface: IStringMeasure
Specified by: getMidLength in interface IStringMeasure
Parameters:
tokensNumber - Number of tokens of current input
threshold - Similarity threshold

public double getSizeFilteringThreshold(int tokensNumber, double threshold)
Specified by: getSizeFilteringThreshold in interface IStringMeasure

public int getAlpha(int xTokensNumber, int yTokensNumber, double threshold)
Description copied from interface: IStringMeasure
Specified by: getAlpha in interface IStringMeasure
Parameters:
xTokensNumber - Size of the first input string
yTokensNumber - Size of the second input string
threshold - Similarity threshold

public double getSimilarity(int overlap, int lengthA, int lengthB)
Description copied from interface: IStringMeasure
Specified by: getSimilarity in interface IStringMeasure
Parameters:
overlap - Overlap of strings A and B
lengthA - Length of A
lengthB - Length of B

public boolean computableViaOverlap()
Description copied from interface: IStringMeasure
Specified by: computableViaOverlap in interface IStringMeasure

public double getSimilarity(Object object1, Object object2)
Description copied from interface: IMeasure
Specified by: getSimilarity in interface IMeasure

public String getType()
Description copied from interface: IMeasure

public double getSimilarity(Instance instance1, Instance instance2, String property1, String property2)
Description copied from interface: IMeasure
Specified by: getSimilarity in interface IMeasure

public String getName()
Description copied from interface: IMeasure

public double getRuntimeApproximation(double mappingSize)
Description copied from interface: IMeasure
Specified by: getRuntimeApproximation in interface IMeasure

Copyright © 2018. All rights reserved.