Class EuclideanBlockingModule

  • All Implemented Interfaces:
    IBlockingModule

    public class EuclideanBlockingModule
    extends Object
    implements IBlockingModule
    Author:
    Axel-C. Ngonga Ngomo (ngonga@informatik.uni-leipzig.de)
    • Constructor Detail

      • EuclideanBlockingModule

        public EuclideanBlockingModule(String props,
                                       String measureName,
                                       double threshold)
        Initializes the generator. The basic idea is as follows: first, a random instance, origin, is picked. It serves as the center relative to which the block ids are computed. Each measure can return the blocking threshold that is equivalent to the similarity threshold given by the user. For Euclidean metrics, the two values are the same; for metrics that squeeze space, they may differ. Note that the generation assumes that the size of props.split("|") equals the number of dimensions.
        Parameters:
        props - List of properties that make up each dimension
        measureName - Name of the measure to be used to compute the similarity of instances
        threshold - General similarity threshold for the metric. It is transformed into a distance threshold: a similarity sim = a corresponds to a distance d = (1 - a)/a. The space tiling is carried out on distances, not similarities; it is still ensured that all points within the similarity range are found.
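
        The threshold conversion and the tiling idea described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the class BlockingSketch and the methods toDistanceThreshold and blockId are hypothetical names. The sketch also assumes that the tiling uses cells whose width equals the distance threshold, which this page does not state. The formula d = (1 - a)/a is the inverse of the mapping sim = 1/(1 + d).

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of EuclideanBlockingModule.
final class BlockingSketch {

    // Converts a similarity threshold a in (0, 1] into the distance
    // threshold d = (1 - a) / a given in the parameter description.
    static double toDistanceThreshold(double similarity) {
        if (similarity <= 0.0 || similarity > 1.0) {
            throw new IllegalArgumentException("similarity must be in (0, 1]");
        }
        return (1.0 - similarity) / similarity;
    }

    // Assumed tiling scheme: one integer index per dimension, computed
    // relative to the origin instance, with cell width equal to the
    // distance threshold.
    static int[] blockId(double[] point, double[] origin, double width) {
        if (point.length != origin.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        if (width <= 0.0) {
            throw new IllegalArgumentException("cell width must be positive");
        }
        int[] id = new int[point.length];
        for (int i = 0; i < point.length; i++) {
            id[i] = (int) Math.floor((point[i] - origin[i]) / width);
        }
        return id;
    }

    public static void main(String[] args) {
        double d = toDistanceThreshold(0.5);            // (1 - 0.5) / 0.5 = 1.0
        double[] origin = {0.0, 0.0};
        double[] point = {2.5, -0.5};
        int[] id = blockId(point, origin, d);
        System.out.println(d + " " + Arrays.toString(id)); // prints "1.0 [2, -1]"
    }
}
```

        Under this assumed scheme, two points whose coordinates differ by at most d in every dimension fall into the same or directly adjacent cells, so only neighboring blocks would need to be compared. A similarity threshold of exactly 1 yields d = 0, which the sketch rejects as a cell width.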