Interface BooleanDocumentSupportingAdapter

All Superinterfaces:
CorpusAdapter
All Known Implementing Classes:
LuceneCorpusAdapter, WindowSupportingLuceneCorpusAdapter

public interface BooleanDocumentSupportingAdapter extends CorpusAdapter
This is an interface for an adapter that makes boolean document word counts available. Note that this interface is also used for the boolean paragraph and boolean sentence probability estimation methods, since the difference between these methods lies in the preprocessing of the corpus.
Author:
m.roeder
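
The following is a minimal sketch of how boolean document counts could feed a probability estimate. It assumes that the probability of a word is the fraction of documents containing it at least once, and that a joint probability is the fraction of documents containing both words. The class and method names are hypothetical and not part of Palmetto, and plain java.util sets stand in for the HPPC containers used by this interface.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Palmetto.
final class BooleanDocumentProbabilitySketch {

    // Assumed estimate of P(w): share of documents that contain the word.
    static double marginal(Set<Integer> docsWithWord, int numberOfDocuments) {
        return (double) docsWithWord.size() / numberOfDocuments;
    }

    // Assumed estimate of P(w1, w2): share of documents that contain both words.
    static double joint(Set<Integer> docsWithFirst, Set<Integer> docsWithSecond,
            int numberOfDocuments) {
        Set<Integer> both = new HashSet<>(docsWithFirst);
        both.retainAll(docsWithSecond); // keep only ids present in both sets
        return (double) both.size() / numberOfDocuments;
    }

    public static void main(String[] args) {
        Set<Integer> docsWithCat = Set.of(0, 1, 3);
        Set<Integer> docsWithDog = Set.of(1, 2, 3);
        int numberOfDocuments = 4;
        System.out.println(marginal(docsWithCat, numberOfDocuments));
        System.out.println(joint(docsWithCat, docsWithDog, numberOfDocuments));
    }
}
```

With an adapter at hand, the document id sets would come from getDocumentsWithWordAsSet and the denominator from getNumberOfDocuments.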
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
    Determines the documents containing the given word.
    void
    getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
    Determines the documents containing the given word.
    void
    getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping)
    Determines the documents containing the words used as key in the given map.
    void
    getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
    Determines the documents containing the words used as key in the given map.
    int
    getNumberOfDocuments()
    Returns the number of documents the corpus contains.

    Methods inherited from interface org.aksw.palmetto.corpus.CorpusAdapter

    close
  • Method Details

    • getDocumentsWithWordsAsSet

      void getDocumentsWithWordsAsSet(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntOpenHashSet> wordDocMapping)
      Determines the documents containing the words used as key in the given map. The resulting sets contain the ids of the documents and are inserted into the map.
      Parameters:
      wordDocMapping - a mapping of words to documents in which the results are stored
    • getDocumentsWithWordAsSet

      void getDocumentsWithWordAsSet(String word, com.carrotsearch.hppc.IntOpenHashSet documents)
      Determines the documents containing the given word. The ids of the found documents are inserted into the given set.
      Parameters:
      word - the word to search for
      documents - the set in which the document ids will be stored
    • getDocumentsWithWords

      void getDocumentsWithWords(com.carrotsearch.hppc.ObjectObjectOpenHashMap<String,com.carrotsearch.hppc.IntArrayList> wordDocMapping)
      Determines the documents containing the words used as key in the given map. The resulting lists contain the ids of the documents and are inserted into the map.
      Parameters:
      wordDocMapping - a mapping of words to documents in which the results are stored
    • getDocumentsWithWord

      void getDocumentsWithWord(String word, com.carrotsearch.hppc.IntArrayList documents)
      Determines the documents containing the given word. The ids of the found documents are appended to the given list.
      Parameters:
      word - the word to search for
      documents - the list to which the document ids will be added
    • getNumberOfDocuments

      int getNumberOfDocuments()
      Returns the number of documents the corpus contains.
      Returns:
      the number of documents
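
All lookup methods above return void and write their results into a container supplied by the caller. The sketch below illustrates that calling convention with a toy in-memory class. Everything in it is an assumption made for illustration: the class name is hypothetical, it does not implement this interface, and java.util collections replace the HPPC types so that the example runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not part of Palmetto and not using HPPC types.
final class InMemoryAdapterSketch {

    // Each inner list holds the words of one document; the list index is the document id.
    private final List<List<String>> documents;

    InMemoryAdapterSketch(List<List<String>> documents) {
        this.documents = documents;
    }

    // Mirrors getDocumentsWithWord: matching ids are appended to the caller's list.
    void getDocumentsWithWord(String word, List<Integer> result) {
        for (int id = 0; id < documents.size(); id++) {
            if (documents.get(id).contains(word)) {
                result.add(id);
            }
        }
    }

    // Mirrors getDocumentsWithWords: the keys name the words, the values are filled in.
    void getDocumentsWithWords(Map<String, List<Integer>> wordDocMapping) {
        // Iterate over a copy of the keys so that put() cannot disturb the iteration.
        for (String word : new ArrayList<>(wordDocMapping.keySet())) {
            List<Integer> ids = new ArrayList<>();
            getDocumentsWithWord(word, ids);
            wordDocMapping.put(word, ids);
        }
    }

    // Mirrors getNumberOfDocuments.
    int getNumberOfDocuments() {
        return documents.size();
    }

    public static void main(String[] args) {
        InMemoryAdapterSketch adapter = new InMemoryAdapterSketch(List.of(
                List.of("cat", "dog"), List.of("dog"), List.of("cat")));
        Map<String, List<Integer>> mapping = new HashMap<>();
        mapping.put("cat", null); // only the keys matter before the call
        mapping.put("dog", null);
        adapter.getDocumentsWithWords(mapping);
        System.out.println(mapping.get("cat") + " " + mapping.get("dog")
                + " of " + adapter.getNumberOfDocuments());
    }
}
```

Passing the result container in lets a caller reuse one list or set across many lookups instead of allocating a new one per word.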