Package org.aksw.palmetto.corpus.lucene
Class WindowSupportingLuceneCorpusAdapter
java.lang.Object
  org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
    org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter
- All Implemented Interfaces:
BooleanDocumentSupportingAdapter, CorpusAdapter, WindowSupportingAdapter
- Direct Known Subclasses:
CachingWindowSupportingLuceneCorpusAdapter
public class WindowSupportingLuceneCorpusAdapter extends LuceneCorpusAdapter implements WindowSupportingAdapter
-
Field Summary
Fields
Modifier and Type, Field, Description
protected String docLengthFieldName
protected int[][] histogram
static String HISTOGRAM_FILE_SUFFIX
private static org.slf4j.Logger LOGGER
-
Fields inherited from class org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
contexts, dirReader, fieldName, reader
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected WindowSupportingLuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String textFieldName, String docLengthFieldName, int[][] histogram)
-
Method Summary
Modifier and Type, Method, Description
protected void addDocLength(com.carrotsearch.hppc.IntIntOpenHashMap docLengths, int globalDocId, int localDocId, org.apache.lucene.index.AtomicReader reader)
static WindowSupportingLuceneCorpusAdapter create(String indexPath, String textFieldName, String docLengthFieldName)
protected void gatherWordPositions(org.apache.lucene.index.DocsAndPositionsEnum docPosEnum, com.carrotsearch.hppc.IntArrayList positions)
int[][] getDocumentSizeHistogram()
  Returns the histogram of the document sizes of the corpus.
protected void requestDocumentsWithWord(String word, com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> positionsInDocs, com.carrotsearch.hppc.IntIntOpenHashMap docLengths, int wordId, int numberOfWords)
com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> requestWordPositionsInDocuments(String[] words, com.carrotsearch.hppc.IntIntOpenHashMap docLengths)
  Returns the positions of the given words inside the corpus.
-
Methods inherited from class org.aksw.palmetto.corpus.lucene.LuceneCorpusAdapter
close, create, getDocumentsWithWord, getDocumentsWithWordAsSet, getDocumentsWithWords, getDocumentsWithWordsAsSet, getNumberOfDocuments
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.aksw.palmetto.corpus.CorpusAdapter
close
-
-
Field Detail
-
histogram
protected int[][] histogram
-
docLengthFieldName
protected String docLengthFieldName
-
LOGGER
private static final org.slf4j.Logger LOGGER
-
HISTOGRAM_FILE_SUFFIX
public static final String HISTOGRAM_FILE_SUFFIX
- See Also:
- Constant Field Values
-
Constructor Detail
-
WindowSupportingLuceneCorpusAdapter
protected WindowSupportingLuceneCorpusAdapter(org.apache.lucene.index.DirectoryReader dirReader, org.apache.lucene.index.AtomicReader[] reader, org.apache.lucene.index.AtomicReaderContext[] contexts, String textFieldName, String docLengthFieldName, int[][] histogram)
-
Method Detail
-
create
public static WindowSupportingLuceneCorpusAdapter create(String indexPath, String textFieldName, String docLengthFieldName) throws org.apache.lucene.index.CorruptIndexException, IOException
- Throws:
org.apache.lucene.index.CorruptIndexException
IOException
-
getDocumentSizeHistogram
public int[][] getDocumentSizeHistogram()
Description copied from interface: WindowSupportingAdapter
Returns the histogram of the document sizes of the corpus.
- Specified by:
getDocumentSizeHistogram in interface WindowSupportingAdapter
- Returns:
- the histogram of the document sizes
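One use of a document size histogram is to estimate how many sliding windows of a given size the corpus contains. The sketch below is not part of Palmetto. It assumes that each row of the `int[][]` holds a `{documentLength, numberOfDocuments}` pair, which this page does not confirm, and that a document shorter than the window counts as one window. `HistogramWindowCount` and `countWindows` are hypothetical names.

```java
// Hypothetical helper, not part of Palmetto.
final class HistogramWindowCount {

    // Assumption: each histogram row is {documentLength, numberOfDocuments}.
    static long countWindows(int[][] histogram, int windowSize) {
        long windows = 0;
        for (int[] row : histogram) {
            int docLength = row[0];
            int docCount = row[1];
            // A document of length n contains n - windowSize + 1 windows;
            // a document shorter than the window is counted as one window.
            int windowsPerDoc = Math.max(1, docLength - windowSize + 1);
            windows += (long) windowsPerDoc * docCount;
        }
        return windows;
    }

    public static void main(String[] args) {
        // Two documents of length 3 and one document of length 10.
        int[][] histogram = { {3, 2}, {10, 1} };
        System.out.println(countWindows(histogram, 5)); // prints 8
    }
}
```

With a window size of 5, each of the two short documents contributes one window and the document of length 10 contributes six, giving 8 in total.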
-
requestWordPositionsInDocuments
public com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> requestWordPositionsInDocuments(String[] words, com.carrotsearch.hppc.IntIntOpenHashMap docLengths)
Description copied from interface: WindowSupportingAdapter
Returns the positions of the given words inside the corpus.
- Specified by:
requestWordPositionsInDocuments in interface WindowSupportingAdapter
- Parameters:
words - the words for which the positions inside the documents should be determined
docLengths - empty int int map in which the document lengths and counts are inserted
- Returns:
- the positions of the given words inside the corpus
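The returned structure maps a document to one position list per requested word. The sketch below shows how a caller might use that shape to count documents in which two words fall into the same window. It is not Palmetto code. A `java.util.Map<Integer, int[][]>` stands in for the HPPC `IntObjectOpenHashMap<IntArrayList[]>`, and it assumes that the array is indexed by the word's position in `words` and that a word missing from a document has a `null` entry. `WindowCooccurrence` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Palmetto.
final class WindowCooccurrence {

    // True if some occurrence of word A and some occurrence of word B
    // fit into a single window of the given size.
    static boolean shareWindow(int[] positionsA, int[] positionsB, int windowSize) {
        if (positionsA == null || positionsB == null) {
            return false; // assumption: null marks a word absent from the document
        }
        for (int a : positionsA) {
            for (int b : positionsB) {
                // A window covers windowSize consecutive positions.
                if (Math.abs(a - b) < windowSize) {
                    return true;
                }
            }
        }
        return false;
    }

    // Counts the documents in which word 0 and word 1 share at least one window.
    static int countDocsWithSharedWindow(Map<Integer, int[][]> positionsInDocs, int windowSize) {
        int count = 0;
        for (int[][] perWord : positionsInDocs.values()) {
            if (shareWindow(perWord[0], perWord[1], windowSize)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Integer, int[][]> positionsInDocs = new HashMap<>();
        positionsInDocs.put(0, new int[][] { {1, 20}, {4} });  // positions 1 and 4 are 3 apart
        positionsInDocs.put(1, new int[][] { {2}, {30} });     // 28 apart, too far
        positionsInDocs.put(2, new int[][] { {5}, null });     // second word absent
        System.out.println(countDocsWithSharedWindow(positionsInDocs, 5)); // prints 1
    }
}
```

The nested loop keeps the sketch short. Because position lists are typically sorted, a two-pointer merge would avoid the quadratic comparison for frequent words.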
-
requestDocumentsWithWord
protected void requestDocumentsWithWord(String word, com.carrotsearch.hppc.IntObjectOpenHashMap<com.carrotsearch.hppc.IntArrayList[]> positionsInDocs, com.carrotsearch.hppc.IntIntOpenHashMap docLengths, int wordId, int numberOfWords)
-
gatherWordPositions
protected void gatherWordPositions(org.apache.lucene.index.DocsAndPositionsEnum docPosEnum, com.carrotsearch.hppc.IntArrayList positions) throws IOException
- Throws:
IOException
-
addDocLength
protected void addDocLength(com.carrotsearch.hppc.IntIntOpenHashMap docLengths, int globalDocId, int localDocId, org.apache.lucene.index.AtomicReader reader) throws IOException
- Throws:
IOException
-