Package org.aksw.palmetto.corpus.lucene
Class LuceneCorpusAdapter
java.lang.Object
org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
- All Implemented Interfaces:
BooleanDocumentSupportingAdapter, CorpusAdapter
- Direct Known Subclasses:
WindowSupportingLuceneCorpusAdapter
This class uses a given Lucene index as a corpus.
- Author:
- m.roeder
-
Field Summary
Fields -
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | LuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String fieldName) |
-
Method Summary
Modifier and Type | Method | Description
void | close() | Closes the Lucene index.
static LuceneCorpusAdapter | create(String indexPath, String fieldName) | Creates a corpus adapter which uses the Lucene index with the given path and searches on the field with the given field name.
void | getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents) | Determines the documents containing the given word.
void | getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents) | Determines the documents containing the given word.
void | getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping) | Determines the documents containing the words used as key in the given map.
void | getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping) | Determines the documents containing the words used as key in the given map.
int | getNumberOfDocuments() | Returns the number of documents the corpus contains.
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER
-
fieldName
-
dirReader
protected org.apache.lucene.index.DirectoryReader dirReader
-
reader
protected org.apache.lucene.index.AtomicReader[] reader
-
contexts
protected org.apache.lucene.index.AtomicReaderContext[] contexts
-
-
Constructor Details
-
LuceneCorpusAdapter
protected LuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String fieldName)
-
-
Method Details
-
create
public static LuceneCorpusAdapter create(String indexPath, String fieldName) throws org.apache.lucene.index.CorruptIndexException, IOException
Creates a corpus adapter which uses the Lucene index with the given path and searches on the field with the given field name.
- Parameters:
indexPath - the path of the Lucene index
fieldName - the name of the field to search on
- Returns:
the created corpus adapter
- Throws:
org.apache.lucene.index.CorruptIndexException
IOException
-
getDocumentsWithWordAsSet
public void getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
Description copied from interface: BooleanDocumentSupportingAdapter
Determines the documents containing the given word. The ids of the found documents are inserted into the given set.
- Specified by:
getDocumentsWithWordAsSet in interface BooleanDocumentSupportingAdapter
- Parameters:
word - the word to search for
documents - the set in which the document ids will be stored
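The contract described here is an output-parameter pattern: the caller supplies the set, and the method fills it instead of returning a value. The following is a minimal sketch of that pattern only. `InMemoryWordLookup` and `addDocument` are hypothetical stand-ins, not part of Palmetto or Lucene, and `java.util.Set` is used in place of the HPPC `IntOpenHashSet` so that the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in; it is NOT the Palmetto adapter.
class InMemoryWordLookup {
    // word -> ids of the documents containing it
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Registers every whitespace-separated token of the text under the given id.
    void addDocument(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Mirrors the documented contract: ids go into the caller's set, nothing is returned.
    void getDocumentsWithWordAsSet(String word, Set<Integer> documents) {
        Set<Integer> hits = index.get(word);
        if (hits != null) {
            documents.addAll(hits);
        }
    }

    public static void main(String[] args) {
        InMemoryWordLookup corpus = new InMemoryWordLookup();
        corpus.addDocument(0, "lucene index as corpus");
        corpus.addDocument(1, "topic coherence");
        corpus.addDocument(2, "corpus adapter");

        Set<Integer> documents = new TreeSet<>();
        corpus.getDocumentsWithWordAsSet("corpus", documents);
        System.out.println(documents); // prints [0, 2]
    }
}
```

With the real adapter, the equivalent call would pass an `IntOpenHashSet`; how an unknown word is handled there is not stated in this documentation.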
-
close
public void close()
Closes the Lucene index.
- Specified by:
close in interface CorpusAdapter
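Because close() releases the underlying index, callers will usually want it to run even when a query fails. The sketch below shows one way to guarantee that with try/finally. It assumes nothing about whether the real adapter implements `AutoCloseable` (this page does not say), and `Corpus`, `FakeCorpus`, and `countAndClose` are hypothetical names used only for illustration.

```java
// Sketch of the close-in-finally pattern with a hypothetical stand-in adapter.
class CloseInFinallySketch {
    // Hypothetical minimal view of an adapter: one query method plus close().
    interface Corpus {
        int getNumberOfDocuments();
        void close();
    }

    // Stand-in that records whether close() was called.
    static final class FakeCorpus implements Corpus {
        boolean closed = false;

        public int getNumberOfDocuments() {
            if (closed) {
                throw new IllegalStateException("corpus already closed");
            }
            return 3;
        }

        public void close() {
            closed = true;
        }
    }

    // Runs one query and closes the corpus even if the query throws.
    static int countAndClose(Corpus corpus) {
        try {
            return corpus.getNumberOfDocuments();
        } finally {
            corpus.close();
        }
    }

    public static void main(String[] args) {
        FakeCorpus corpus = new FakeCorpus();
        System.out.println(countAndClose(corpus)); // prints 3
        System.out.println(corpus.closed);         // prints true
    }
}
```

In real code the stand-in would be replaced by the object returned from `LuceneCorpusAdapter.create(indexPath, fieldName)`.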
-
getNumberOfDocuments
public int getNumberOfDocuments()
Description copied from interface: BooleanDocumentSupportingAdapter
Returns the number of documents the corpus contains.
- Specified by:
getNumberOfDocuments in interface BooleanDocumentSupportingAdapter
- Returns:
- the number of documents
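One plausible use of this count is as the denominator when turning a set of document ids into a relative frequency. The sketch below shows only that arithmetic; `DocumentFractionSketch` and `fraction` are hypothetical names, and returning 0.0 for an empty corpus is an assumption made here, not documented behaviour.

```java
// Sketch: fraction of the corpus that contains a word.
class DocumentFractionSketch {
    // |documents with word| / |corpus|; an empty corpus yields 0.0 (assumption).
    static double fraction(int documentsWithWord, int numberOfDocuments) {
        if (numberOfDocuments <= 0) {
            return 0.0; // avoid division by zero
        }
        return (double) documentsWithWord / numberOfDocuments;
    }

    public static void main(String[] args) {
        // 5 stands for the size of a filled id set, 20 for getNumberOfDocuments().
        System.out.println(fraction(5, 20)); // prints 0.25
    }
}
```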
-
getDocumentsWithWordsAsSet
public void getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
Description copied from interface: BooleanDocumentSupportingAdapter
Determines the documents containing the words used as key in the given map. The resulting sets contain the ids of the documents and are inserted into the map.
- Specified by:
getDocumentsWithWordsAsSet in interface BooleanDocumentSupportingAdapter
- Parameters:
wordDocMapping - a mapping of words to documents in which the results are stored
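In the batch variant the map plays two roles: its keys are the query words, and its values receive the results. The sketch below imitates that contract with `java.util` collections rather than HPPC, then intersects two result sets to count documents containing both words. `BatchLookupSketch`, `INDEX`, and `coOccurrenceCount` are hypothetical, and mapping an unknown word to an empty set is an assumption rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the "keys in, values out" batch lookup with a toy index.
class BatchLookupSketch {
    // Hypothetical toy index: word -> ids of the documents containing it.
    static final Map<String, Set<Integer>> INDEX = new HashMap<>();
    static {
        INDEX.put("lucene", new HashSet<>(Arrays.asList(0, 1, 4)));
        INDEX.put("corpus", new HashSet<>(Arrays.asList(1, 2, 4)));
    }

    // For every key of the map, stores a fresh set of matching document ids as its value.
    static void getDocumentsWithWordsAsSet(Map<String, Set<Integer>> wordDocMapping) {
        for (Map.Entry<String, Set<Integer>> entry : wordDocMapping.entrySet()) {
            Set<Integer> hits = INDEX.getOrDefault(entry.getKey(), Collections.<Integer>emptySet());
            entry.setValue(new HashSet<>(hits));
        }
    }

    // Counts documents that contain both words by intersecting their id sets.
    static int coOccurrenceCount(Set<Integer> first, Set<Integer> second) {
        Set<Integer> both = new HashSet<>(first);
        both.retainAll(second);
        return both.size();
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> mapping = new HashMap<>();
        mapping.put("lucene", null); // values are filled in by the lookup
        mapping.put("corpus", null);
        getDocumentsWithWordsAsSet(mapping);
        System.out.println(coOccurrenceCount(mapping.get("lucene"), mapping.get("corpus"))); // prints 2
    }
}
```

Collecting all words into one map before the call lets the caller issue a single lookup for a whole word set instead of one call per word.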
-
getDocumentsWithWords
public void getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping)
Description copied from interface: BooleanDocumentSupportingAdapter
Determines the documents containing the words used as key in the given map. The resulting int arrays contain the ids of the documents and are inserted into the map.
- Specified by:
getDocumentsWithWords in interface BooleanDocumentSupportingAdapter
- Parameters:
wordDocMapping - a mapping of words to documents in which the results are stored
-
getDocumentsWithWord
public void getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
Description copied from interface: BooleanDocumentSupportingAdapter
Determines the documents containing the given word. The ids of the found documents are appended to the given list.
- Specified by:
getDocumentsWithWord in interface BooleanDocumentSupportingAdapter
- Parameters:
word - the word to search for
documents - the list to which the document ids will be added
-