Package org.aksw.palmetto.corpus
Interface BooleanDocumentSupportingAdapter
All Superinterfaces:
CorpusAdapter
All Known Implementing Classes:
CachingWindowSupportingLuceneCorpusAdapter, LuceneCorpusAdapter, WindowSupportingLuceneCorpusAdapter
public interface BooleanDocumentSupportingAdapter extends CorpusAdapter
This is an interface for an adapter that makes boolean document word counts available. Note that this interface is also used for the boolean paragraph and boolean sentence probability estimation methods, since the difference between these methods lies in the preprocessing of the corpus.
Author:
m.roeder
Method Summary
Modifier and Type, Method, Description

void getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
    Determines the documents containing the given word.
void getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
    Determines the documents containing the given word.
void getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping)
    Determines the documents containing the words used as keys in the given map.
void getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
    Determines the documents containing the words used as keys in the given map.
int getNumberOfDocuments()
    Returns the number of documents the corpus contains.
Methods inherited from interface org.aksw.palmetto.corpus.CorpusAdapter
close
Method Detail
-
getDocumentsWithWordsAsSet
void getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
Determines the documents containing the words used as keys in the given map. The resulting sets contain the ids of the documents and are inserted into the map.
Parameters:
wordDocMapping - a mapping of words to documents in which the results are stored
-
getDocumentsWithWordAsSet
void getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
Determines the documents containing the given word. The ids of the found documents are inserted into the given set.
Parameters:
word - the word to search for
documents - the set in which the document ids will be stored
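The calling convention described above (the caller supplies the result container and the method fills it) can be illustrated with a minimal sketch. It does not use Palmetto or HPPC: `ToySetAdapter` and `addDocument` are hypothetical names, and `java.util.Set<Integer>` stands in for `com.carrotsearch.hppc.IntOpenHashSet` so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for an adapter; not part of Palmetto.
class ToySetAdapter {
    // Maps each word to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Registers a document under every word it contains.
    void addDocument(int docId, String... words) {
        for (String word : words) {
            index.computeIfAbsent(word, k -> new HashSet<>()).add(docId);
        }
    }

    // Mirrors the shape of getDocumentsWithWordAsSet: nothing is returned,
    // the ids are inserted into the set supplied by the caller.
    void getDocumentsWithWordAsSet(String word, Set<Integer> documents) {
        Set<Integer> ids = index.get(word);
        if (ids != null) {
            documents.addAll(ids);
        }
    }

    // Boolean document count of a word: the number of documents containing it.
    int documentFrequency(String word) {
        Set<Integer> documents = new HashSet<>();
        getDocumentsWithWordAsSet(word, documents);
        return documents.size();
    }
}
```

Because the result is a set, a word that occurs several times in one document still contributes that document only once, which is what makes the count "boolean".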
-
getDocumentsWithWords
void getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping)
Determines the documents containing the words used as keys in the given map. The resulting int arrays contain the ids of the documents and are inserted into the map.
Parameters:
wordDocMapping - a mapping of words to documents in which the results are stored
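The map-based variant can be read as a batch lookup: the caller puts the words of interest into the map as keys, and the method inserts a result for each key. The sketch below shows that pattern under stated assumptions. `ToyBatchLookup` and its `CORPUS` array are hypothetical, `java.util.Map<String, List<Integer>>` stands in for the HPPC `ObjectObjectOpenHashMap<String, IntArrayList>`, and whether a real implementation replaces an existing value or fills it is not stated in this documentation; the sketch replaces it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch lookup over a toy corpus; not part of Palmetto.
class ToyBatchLookup {
    // Each inner array holds the words of one document; the array index is the document id.
    static final String[][] CORPUS = {
        {"cat", "dog"},
        {"cat"},
        {"dog", "bird"}
    };

    // Mirrors the shape of getDocumentsWithWords: the keys of the map are the
    // query words, and a list of matching document ids is inserted for each key.
    static void getDocumentsWithWords(Map<String, List<Integer>> wordDocMapping) {
        for (Map.Entry<String, List<Integer>> entry : wordDocMapping.entrySet()) {
            List<Integer> ids = new ArrayList<>();
            for (int docId = 0; docId < CORPUS.length; docId++) {
                for (String word : CORPUS[docId]) {
                    if (word.equals(entry.getKey())) {
                        ids.add(docId);
                        break; // count each document at most once per word
                    }
                }
            }
            // Replacing the value of an existing key is not a structural change,
            // so it is safe while iterating over the entry set.
            entry.setValue(ids);
        }
    }
}
```

A caller would prepare the map with `wordDocMapping.put("cat", null)` for each word, invoke the method once, and then read the id lists back from the same map.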
-
getDocumentsWithWord
void getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
Determines the documents containing the given word. The ids of the found documents are appended to the given list.
Parameters:
word - the word to search for
documents - the list to which the document ids will be added
-
getNumberOfDocuments
int getNumberOfDocuments()
Returns the number of documents the corpus contains.
Returns:
the number of documents
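As a rough illustration of how the document sets and the document count fit together, the sketch below estimates boolean document probabilities as the fraction of documents containing a word, or containing both of two words. This is an assumption about how a caller might combine the methods, not code from Palmetto; `BooleanDocumentProbability` and its methods are hypothetical, and `java.util.Set<Integer>` stands in for the HPPC set type.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of Palmetto.
class BooleanDocumentProbability {

    // Estimates P(w) as (documents containing w) / (documents in the corpus).
    static double probability(Set<Integer> docsWithWord, int numberOfDocuments) {
        if (numberOfDocuments <= 0) {
            return 0.0; // an empty corpus supports no estimate
        }
        return (double) docsWithWord.size() / numberOfDocuments;
    }

    // Estimates P(w1, w2) from the documents that contain both words.
    static double jointProbability(Set<Integer> docsWithFirst,
                                   Set<Integer> docsWithSecond,
                                   int numberOfDocuments) {
        Set<Integer> both = new HashSet<>(docsWithFirst); // copy, so the inputs stay untouched
        both.retainAll(docsWithSecond);                   // set intersection
        return probability(both, numberOfDocuments);
    }
}
```

In this sketch the two sets would come from calls such as `getDocumentsWithWordAsSet`, and the denominator from `getNumberOfDocuments()`.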
-
-