Class MalletLdaWrapper

java.lang.Object
org.dice_research.topicmodeling.algorithm.mallet.MalletLdaWrapper
All Implemented Interfaces:
Serializable, org.dice_research.topicmodeling.algorithms.ModelingAlgorithm, org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier, org.dice_research.topicmodeling.algorithms.VocabularyContaining, org.dice_research.topicmodeling.preprocessing.PreprocessorFactory

public class MalletLdaWrapper extends Object implements org.dice_research.topicmodeling.algorithms.ModelingAlgorithm, org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
See Also:
Serialized Form
  • Constructor Details

    • MalletLdaWrapper

      public MalletLdaWrapper(int numberOfTopics)
    • MalletLdaWrapper

      public MalletLdaWrapper(int numberOfTopics, long seed)
    • MalletLdaWrapper

      public MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta)
    • MalletLdaWrapper

      public MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta, long seed)
    • MalletLdaWrapper

      public MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta)
    • MalletLdaWrapper

      public MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta, long seed)
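The `alphaSum` and `beta` parameters presumably correspond to the hyperparameters of MALLET's `ParallelTopicModel`, where `alphaSum` is the total Dirichlet concentration over all topics and each topic starts with `alphaSum / numberOfTopics`. The sketch below shows only that arithmetic. It assumes the wrapper passes these values through unchanged, which this page does not confirm; `AlphaSumSketch` and `perTopicAlpha` are hypothetical names and not part of the library.

```java
import java.util.Arrays;

public class AlphaSumSketch {

    // Hypothetical helper: spreads alphaSum evenly over all topics,
    // which yields a symmetric Dirichlet prior over topics.
    static double[] perTopicAlpha(int numberOfTopics, double alphaSum) {
        if (numberOfTopics <= 0) {
            throw new IllegalArgumentException("numberOfTopics must be positive");
        }
        double[] alpha = new double[numberOfTopics];
        Arrays.fill(alpha, alphaSum / numberOfTopics);
        return alpha;
    }

    public static void main(String[] args) {
        // Example values only; they are not defaults documented for this class.
        double[] alpha = perTopicAlpha(20, 50.0);
        System.out.println("topics=" + alpha.length + ", alpha per topic=" + alpha[0]);
    }
}
```

Under this assumption, keeping `alphaSum` fixed while raising `numberOfTopics` lowers the prior weight of each individual topic.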
  • Method Details

    • createPreprocessor

      public org.dice_research.topicmodeling.preprocessing.Preprocessor createPreprocessor(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier supplier, org.dice_research.topicmodeling.lang.Language lang)
      Specified by:
      createPreprocessor in interface org.dice_research.topicmodeling.preprocessing.PreprocessorFactory
    • initialize

      public void initialize(org.dice_research.topicmodeling.utils.corpus.Corpus corpus)
      Specified by:
      initialize in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
    • getTokenizedTermsAsText

      protected String[] getTokenizedTermsAsText(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus)
    • directInitialization

      private void directInitialization(org.dice_research.topicmodeling.utils.corpus.Corpus corpus, org.dice_research.topicmodeling.utils.vocabulary.Vocabulary vocabulary)
    • createInstanceFromDocument

      protected cc.mallet.types.Instance createInstanceFromDocument(org.dice_research.topicmodeling.utils.doc.Document document, cc.mallet.types.Alphabet alphabet)
    • createMultipleSpellingVocabulary

      protected void createMultipleSpellingVocabulary(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus)
    • performNextStep

      public void performNextStep()
      Specified by:
      performNextStep in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
    • getModel

      public org.dice_research.topicmodeling.algorithms.Model getModel()
      Specified by:
      getModel in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
    • getWordTopicAssignmentForDocument

      public int[] getWordTopicAssignmentForDocument(int documentId)
      Specified by:
      getWordTopicAssignmentForDocument in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getWordsOfDocument

      public int[] getWordsOfDocument(int documentId)
      Specified by:
      getWordsOfDocument in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getDocumentAsFeatureSequence

      public cc.mallet.types.FeatureSequence getDocumentAsFeatureSequence(int documentId)
    • getNumberOfTopics

      public int getNumberOfTopics()
      Specified by:
      getNumberOfTopics in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getNumberOfDocuments

      public int getNumberOfDocuments()
      Specified by:
      getNumberOfDocuments in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getNumberOfWords

      public int getNumberOfWords()
      Specified by:
      getNumberOfWords in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getVocabulary

      public org.dice_research.topicmodeling.utils.vocabulary.Vocabulary getVocabulary()
      Specified by:
      getVocabulary in interface org.dice_research.topicmodeling.algorithms.VocabularyContaining
    • getSeed

      public long getSeed()
      Specified by:
      getSeed in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • getWordCounts

      public org.dice_research.topicmodeling.algorithms.WordCounter getWordCounts()
      Specified by:
      getWordCounts in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
    • setMalletRegexToken

      public void setMalletRegexToken(String malletRegexToken)
    • setOptimizeInterval

      public void setOptimizeInterval(int interval)
      Sets the optimization interval. Set to 0 to turn optimization off.
      Parameters:
      interval - the optimization interval; 0 turns optimization off
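
To illustrate the convention that an interval of 0 turns optimization off, here is a minimal stand-in that does not use the library at all. `IntervalSketch` is a hypothetical class: its `performNextStep` only counts iterations and records when an optimization pass would run. It assumes the interval counts sampling iterations, as in MALLET's `ParallelTopicModel`; whether `MalletLdaWrapper` applies the same rule, or an additional burn-in period, is not stated on this page.

```java
public class IntervalSketch {

    private int optimizeInterval = 0;   // 0 means optimization is off
    private int iteration = 0;          // number of steps performed so far
    private int optimizationRuns = 0;   // how often an optimization pass was triggered

    public void setOptimizeInterval(int interval) {
        this.optimizeInterval = interval;
    }

    // Stand-in for one sampling step; the real sampling work is omitted.
    public void performNextStep() {
        iteration++;
        // Assumed rule: optimize on every interval-th step, never when interval is 0.
        if (optimizeInterval != 0 && iteration % optimizeInterval == 0) {
            optimizationRuns++;
        }
    }

    public int getOptimizationRuns() {
        return optimizationRuns;
    }

    public static void main(String[] args) {
        IntervalSketch off = new IntervalSketch();
        IntervalSketch everyTen = new IntervalSketch();
        everyTen.setOptimizeInterval(10);
        for (int i = 0; i < 100; i++) {
            off.performNextStep();
            everyTen.performNextStep();
        }
        System.out.println("interval 0:  " + off.getOptimizationRuns() + " optimization passes");
        System.out.println("interval 10: " + everyTen.getOptimizationRuns() + " optimization passes");
    }
}
```

Because the check runs inside each step in this sketch, the caller stays in control of how many steps are executed, which matches the step-wise style of `performNextStep()`.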