Class LuceneCorpusAdapter

java.lang.Object
org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
All Implemented Interfaces:
BooleanDocumentSupportingAdapter, CorpusAdapter
Direct Known Subclasses:
WindowSupportingLuceneCorpusAdapter

public class LuceneCorpusAdapter extends Object implements BooleanDocumentSupportingAdapter
This class uses a given Lucene index as a corpus.
Author:
m.roeder
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected org.apache.lucene.index.AtomicReaderContext[]
    contexts
     
    protected org.apache.lucene.index.DirectoryReader
    dirReader
     
    protected String
    fieldName
     
    private static org.slf4j.Logger
    LOGGER
     
    protected org.apache.lucene.index.AtomicReader[]
    reader
     
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    LuceneCorpusAdapter​(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String fieldName)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    Closes the Lucene index.
    static LuceneCorpusAdapter
    create(String indexPath, String fieldName)
    Creates a corpus adapter which uses the Lucene index at the given path and searches the field with the given name.
    void
    getDocumentsWithWord​(String word, com.carrotsearch.hppc.IntArrayList documents)
    Determines the documents containing the given word.
    void
    getDocumentsWithWordAsSet​(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
    Determines the documents containing the given word.
    void
    getDocumentsWithWords​(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,​com.carrotsearch.hppc.IntArrayList> wordDocMapping)
    Determines the documents containing the words used as key in the given map.
    void
    getDocumentsWithWordsAsSet​(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,​com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
    Determines the documents containing the words used as key in the given map.
    int
    getNumberOfDocuments()
    Returns the number of documents the corpus contains.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • fieldName

      protected String fieldName
    • dirReader

      protected org.apache.lucene.index.DirectoryReader dirReader
    • reader

      protected org.apache.lucene.index.AtomicReader[] reader
    • contexts

      protected org.apache.lucene.index.AtomicReaderContext[] contexts
  • Constructor Details

    • LuceneCorpusAdapter

      protected LuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String fieldName)
  • Method Details

    • create

      public static LuceneCorpusAdapter create(String indexPath, String fieldName) throws org.apache.lucene.index.CorruptIndexException, IOException
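      Creates a corpus adapter which uses the Lucene index at the given path and searches the field with the given name.
      Parameters:
      indexPath - the path of the Lucene index
      fieldName - the name of the index field in which words are searched
      Returns:
      a corpus adapter that uses the Lucene index at the given path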
      Throws:
      org.apache.lucene.index.CorruptIndexException
      IOException
    • getDocumentsWithWordAsSet

      public void getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
      Description copied from interface: BooleanDocumentSupportingAdapter
      Determines the documents containing the given word. The ids of the found documents are inserted into the given set.
      Specified by:
      getDocumentsWithWordAsSet in interface BooleanDocumentSupportingAdapter
      Parameters:
      word - the word which should be searched
      documents - the set in which the document ids will be stored
    • close

      public void close()
      Closes the Lucene index.
      Specified by:
      close in interface CorpusAdapter
    • getNumberOfDocuments

      public int getNumberOfDocuments()
      Description copied from interface: BooleanDocumentSupportingAdapter
      Returns the number of documents the corpus contains.
      Specified by:
      getNumberOfDocuments in interface BooleanDocumentSupportingAdapter
      Returns:
      the number of documents
    • getDocumentsWithWordsAsSet

      public void getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,​com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
      Description copied from interface: BooleanDocumentSupportingAdapter
      Determines the documents containing the words used as key in the given map. The resulting sets contain the ids of the documents and are inserted into the map.
      Specified by:
      getDocumentsWithWordsAsSet in interface BooleanDocumentSupportingAdapter
      Parameters:
      wordDocMapping - a mapping of words to documents in which the results are stored
    • getDocumentsWithWords

      public void getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,​com.carrotsearch.hppc.IntArrayList> wordDocMapping)
      Description copied from interface: BooleanDocumentSupportingAdapter
      Determines the documents containing the words used as key in the given map. The resulting int arrays contain the ids of the documents and are inserted into the map.
      Specified by:
      getDocumentsWithWords in interface BooleanDocumentSupportingAdapter
      Parameters:
      wordDocMapping - a mapping of words to documents in which the results are stored
    • getDocumentsWithWord

      public void getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
      Description copied from interface: BooleanDocumentSupportingAdapter
      Determines the documents containing the given word. The ids of the found documents are appended to the given list.
      Specified by:
      getDocumentsWithWord in interface BooleanDocumentSupportingAdapter
      Parameters:
      word - the word which should be searched
      documents - the list to which the document ids will be added
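
The lookup methods above return void and report their results through the collection passed in by the caller. The following is a minimal sketch of that calling convention only. It is not Palmetto's implementation: OutParameterSketch is a hypothetical in-memory stand-in, java.util collections replace the HPPC types (IntArrayList, ObjectObjectOpenHashMap), and the choice to overwrite whatever value a key already has in the map is an assumption, since the description above only says that the results are inserted into the map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in; NOT the Lucene-backed adapter. */
public class OutParameterSketch {

    // Toy corpus: the position in the outer list serves as the document id.
    private final List<List<String>> documents;

    public OutParameterSketch(List<List<String>> documents) {
        this.documents = documents;
    }

    /** Mirrors getNumberOfDocuments(). */
    public int getNumberOfDocuments() {
        return documents.size();
    }

    /** Mirrors getDocumentsWithWord: matching ids are appended to the caller's list. */
    public void getDocumentsWithWord(String word, List<Integer> result) {
        for (int id = 0; id < documents.size(); id++) {
            if (documents.get(id).contains(word)) {
                result.add(id);
            }
        }
    }

    /** Mirrors getDocumentsWithWords: the keys are the queries, the values receive the results. */
    public void getDocumentsWithWords(Map<String, List<Integer>> wordDocMapping) {
        // Iterate over a copy of the keys because the map is updated in the loop.
        for (String word : new ArrayList<>(wordDocMapping.keySet())) {
            List<Integer> ids = new ArrayList<>();
            getDocumentsWithWord(word, ids);
            // Assumption: an existing value for the key is overwritten.
            wordDocMapping.put(word, ids);
        }
    }

    public static void main(String[] args) {
        OutParameterSketch corpus = new OutParameterSketch(Arrays.asList(
                Arrays.asList("lucene", "index"),
                Arrays.asList("topic", "coherence"),
                Arrays.asList("lucene", "corpus")));

        // Single word: the caller owns the list and reads it after the call.
        List<Integer> hits = new ArrayList<>();
        corpus.getDocumentsWithWord("lucene", hits);
        System.out.println(hits); // prints [0, 2]

        // Several words: register the words as keys, then read the values.
        Map<String, List<Integer>> mapping = new HashMap<>();
        mapping.put("lucene", null);
        mapping.put("topic", null);
        corpus.getDocumentsWithWords(mapping);
        System.out.println(mapping.get("topic")); // prints [1]
    }
}
```

Because the single-word variant appends, a list that is reused across calls accumulates ids from earlier queries. In this sketch the caller passes a fresh list per query to avoid that.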