Class WindowSupportingLuceneCorpusAdapter

java.lang.Object
  org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
    org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter
All Implemented Interfaces:
BooleanDocumentSupportingAdapter, CorpusAdapter, WindowSupportingAdapter

public class WindowSupportingLuceneCorpusAdapter extends LuceneCorpusAdapter implements WindowSupportingAdapter
  • Field Details

    • histogram

      protected int[][] histogram
    • docLengthFieldName

      protected String docLengthFieldName
    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • HISTOGRAM_FILE_SUFFIX

      public static final String HISTOGRAM_FILE_SUFFIX
      See Also:
      Constant Field Values
  • Constructor Details

    • WindowSupportingLuceneCorpusAdapter

      protected WindowSupportingLuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String textFieldName, String docLengthFieldName, int[][] histogram)
  • Method Details

    • create

      public static WindowSupportingLuceneCorpusAdapter create(String indexPath, String textFieldName, String docLengthFieldName) throws org.apache.lucene.index.CorruptIndexException, IOException
      Throws:
      org.apache.lucene.index.CorruptIndexException
      IOException
    • getDocumentSizeHistogram

      public int[][] getDocumentSizeHistogram()
      Description copied from interface: WindowSupportingAdapter
      Returns the histogram of the document sizes of the corpus.
      Specified by:
      getDocumentSizeHistogram in interface WindowSupportingAdapter
      Returns:
      the histogram of the document sizes
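
      This page does not spell out the layout of the returned array. The sketch below shows how a caller might consume it, assuming each row is a pair {documentLength, numberOfDocuments}; that layout is an assumption and should be checked against the implementation. HistogramSketch and countWindows are hypothetical names, and the sample data is made up for illustration.

```java
final class HistogramSketch {

    /**
     * Counts the sliding windows of the given size over all documents.
     * ASSUMPTION: each histogram row is {documentLength, numberOfDocuments}.
     * Simplification: documents shorter than the window are ignored.
     */
    static long countWindows(int[][] histogram, int windowSize) {
        long windows = 0;
        for (int[] row : histogram) {
            int docLength = row[0];
            int docCount = row[1];
            if (docLength >= windowSize) {
                // A document of length n contains n - windowSize + 1 full windows.
                windows += (long) docCount * (docLength - windowSize + 1);
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        // Made-up data: 3 documents of length 5 and 2 documents of length 10.
        int[][] histogram = { { 5, 3 }, { 10, 2 } };
        System.out.println(countWindows(histogram, 4));
    }
}
```

      In real use the array would come from getDocumentSizeHistogram() rather than a literal. How documents shorter than the window should be counted is a design decision this sketch leaves open.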
    • requestWordPositionsInDocuments

      public com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> requestWordPositionsInDocuments(String[] words, com.carrotsearch.hppc.IntIntOpenHashMap docLengths)
      Description copied from interface: WindowSupportingAdapter
      Returns the positions of the given words inside the corpus.
      Specified by:
      requestWordPositionsInDocuments in interface WindowSupportingAdapter
      Parameters:
      words - the words for which the positions inside the documents should be determined
      docLengths - an empty int-to-int map that this method fills with the document lengths and counts
      Returns:
      the positions of the given words inside the corpus
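
      The return type suggests a map from document id to one position list per requested word, in the order of the words array. That reading is an assumption based on the signature alone. Under that assumption, the sketch below shows how the position lists of two words in a single document could be tested for co-occurrence within a window. It uses plain java.util lists as stand-ins for the HPPC types; WindowCooccurrenceSketch and cooccurInWindow are hypothetical names, and the positions are made up.

```java
import java.util.Arrays;
import java.util.List;

final class WindowCooccurrenceSketch {

    /**
     * Returns true if some occurrence of the first word and some occurrence
     * of the second word fit into one window of windowSize tokens.
     * ASSUMPTION: both position lists are sorted in ascending order.
     */
    static boolean cooccurInWindow(List<Integer> first, List<Integer> second, int windowSize) {
        int i = 0;
        int j = 0;
        while (i < first.size() && j < second.size()) {
            int a = first.get(i);
            int b = second.get(j);
            // Two positions share a window if they are fewer than windowSize apart.
            if (Math.abs(a - b) < windowSize) {
                return true;
            }
            // Advance the pointer at the smaller position to shrink the gap.
            if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up positions of two words inside one document.
        List<Integer> first = Arrays.asList(2, 40);
        List<Integer> second = Arrays.asList(10, 43);
        System.out.println(cooccurInWindow(first, second, 5));
        System.out.println(cooccurInWindow(first, second, 3));
    }
}
```

      With the real adapter, the two lists would be taken from the IntArrayList[] stored for one document id.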
    • requestDocumentsWithWord

      protected void requestDocumentsWithWord(String word, com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> positionsInDocs, com.carrotsearch.hppc.IntIntOpenHashMap docLengths, int wordId, int numberOfWords)