Class BlockIdGenerator


  • public class BlockIdGenerator
    extends Object
    Version:
    Jul 16, 2016
    Author:
    Axel-C. Ngonga Ngomo (ngonga@informatik.uni-leipzig.de), Mohamed Sherif (sherif@informatik.uni-leipzig.de)
    • Constructor Detail

      • BlockIdGenerator

        public BlockIdGenerator(String props,
                                String measureName,
                                double threshold)
        Initializes the generator. The basic idea is the following: first, pick a random instance as the origin, i.e., the center relative to which the block IDs are computed. Each measure can return the blocking threshold that is equivalent to the similarity threshold given by the user. For Euclidean metrics, the two values are the same; for metrics that squeeze the space, they may differ. Note that the generation assumes that the size of props.split("|") equals the number of dimensions.
        Parameters:
        props - List of properties that make up each dimension
        measureName - Name of the measure to be used to compute the similarity of instances
        threshold - General similarity threshold for blocking
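        The dimension count mentioned above depends on how props is split. In Java, String.split interprets its argument as a regular expression, and an unescaped "|" is the alternation operator rather than a literal pipe. The following is a minimal sketch, assuming props is a pipe-separated list of property names; the class PropsSplitSketch, the helper countDimensions, and the property names are hypothetical and not part of this API.

```java
import java.util.regex.Pattern;

class PropsSplitSketch {
    // Hypothetical helper: counts the dimensions encoded in a pipe-separated property string.
    static int countDimensions(String props) {
        // Pattern.quote turns "|" into a literal separator instead of regex alternation.
        return props.split(Pattern.quote("|")).length;
    }

    public static void main(String[] args) {
        String props = "latitude|longitude"; // illustrative property names (assumption)
        System.out.println("dimensions = " + countDimensions(props));
    }
}
```

        Escaping the separator as "\\|" would work equally well; Pattern.quote is used here only because it makes the intent explicit.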
    • Method Detail

      • getBlocksToCompare

        public static ArrayList<ArrayList<Integer>> getBlocksToCompare(ArrayList<Integer> blockId)
        Computes the IDs of all the blocks surrounding a given block for comparison. This will be extremely useful for parallelization, as we can apply blocking to T and S and then exploit locality.
        Parameters:
        blockId - ID of the block whose surrounding blocks are to be computed
        Returns:
        The IDs of all the blocks surrounding the given block, for comparison
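        One way to enumerate surrounding blocks is to vary every coordinate of the block ID by -1, 0 and +1, which yields 3^dimensions candidate blocks. The following is a minimal sketch under that assumption, not the library's actual implementation; the class NeighbourBlocksSketch and the method blocksToCompare are hypothetical, and the sketch assumes that the given block itself is part of the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NeighbourBlocksSketch {
    // Hypothetical stand-in: returns every block whose coordinates differ
    // from blockId by at most 1 in each dimension (the block itself included).
    static List<List<Integer>> blocksToCompare(List<Integer> blockId) {
        List<List<Integer>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start from a single empty prefix
        for (int coordinate : blockId) {
            List<List<Integer>> extended = new ArrayList<>();
            for (List<Integer> prefix : result) {
                for (int offset = -1; offset <= 1; offset++) {
                    // Extend each prefix by one coordinate per offset.
                    List<Integer> next = new ArrayList<>(prefix);
                    next.add(coordinate + offset);
                    extended.add(next);
                }
            }
            result = extended;
        }
        return result;
    }

    public static void main(String[] args) {
        // A two-dimensional block has 3^2 = 9 blocks to compare.
        System.out.println(blocksToCompare(Arrays.asList(0, 0)).size());
    }
}
```

        Because each block's neighbourhood depends only on its own ID, the neighbourhoods of different blocks can be processed independently, which is what makes this layout attractive for parallel execution.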
      • getBlockId

        public ArrayList<Integer> getBlockId(Instance a)
        Computes the block ID for a given instance a. The idea behind the blocking is to tile the target space into blocks of size threshold^dimensions. Each instance s from the source space is then compared with the block in which s lies and with the blocks lying directly around it.
        Parameters:
        a - The instance whose blockId is to be computed
        Returns:
        The ID for the block of a
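        The tiling described above can be illustrated by dividing each coordinate's offset from the origin by the blocking threshold and rounding down. The following is a minimal sketch, not the library's actual implementation: it assumes the instance has already been mapped to numeric coordinates, and the class BlockIdSketch, the method blockId, and its parameters point, origin and blockingThreshold are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class BlockIdSketch {
    // Hypothetical stand-in for getBlockId: plain coordinates replace the Instance,
    // and origin plays the role of the randomly chosen center.
    static List<Integer> blockId(double[] point, double[] origin, double blockingThreshold) {
        List<Integer> id = new ArrayList<>();
        for (int i = 0; i < point.length; i++) {
            // Math.floor keeps the tiling uniform on both sides of the origin;
            // a plain (int) cast would merge the two blocks adjacent to zero.
            id.add((int) Math.floor((point[i] - origin[i]) / blockingThreshold));
        }
        return id;
    }

    public static void main(String[] args) {
        double[] origin = {0.0, 0.0};
        double[] point = {1.2, -0.3};
        System.out.println(blockId(point, origin, 0.5));
    }
}
```

        With blocks whose side length equals the blocking threshold, two points that lie within the threshold of each other along every dimension end up in the same block or in directly adjacent blocks, which is why only the surrounding blocks need to be compared.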