Class MalletLdaWrapper
java.lang.Object
org.dice_research.topicmodeling.algorithm.mallet.MalletLdaWrapper
- All Implemented Interfaces:
Serializable, org.dice_research.topicmodeling.algorithms.ModelingAlgorithm, org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier, org.dice_research.topicmodeling.algorithms.VocabularyContaining, org.dice_research.topicmodeling.preprocessing.PreprocessorFactory
public class MalletLdaWrapper
extends Object
implements org.dice_research.topicmodeling.algorithms.ModelingAlgorithm, org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
- See Also:
- Serialized Form
-
Nested Class Summary
Nested Classes -
Field Summary
Fields (Modifier and Type, Field)
protected MalletAlphabetWrapper alphabet
private static org.slf4j.Logger LOGGER
private static String MALLET_REGEX_TOKEN
private String malletRegexToken
protected long seed
private static long serialVersionUID
protected MalletLdaWrapper.MalletLDATopicModeler topicModel
protected org.dice_research.topicmodeling.algorithms.WordCounter wordCounter
-
Constructor Summary
Constructors
MalletLdaWrapper(int numberOfTopics)
MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta)
MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta, long seed)
MalletLdaWrapper(int numberOfTopics, long seed)
MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta)
MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta, long seed)
-
Method Summary
Methods (Modifier and Type, Method, Description)
protected cc.mallet.types.Instance createInstanceFromDocument(org.dice_research.topicmodeling.utils.doc.Document document, cc.mallet.types.Alphabet alphabet)
protected void createMultipleSpellingVocabulary(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus)
org.dice_research.topicmodeling.preprocessing.Preprocessor createPreprocessor(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier supplier, org.dice_research.topicmodeling.lang.Language lang)
private void directInitialization(org.dice_research.topicmodeling.utils.corpus.Corpus corpus, org.dice_research.topicmodeling.utils.vocabulary.Vocabulary vocabulary)
cc.mallet.types.FeatureSequence getDocumentAsFeatureSequence(int documentId)
org.dice_research.topicmodeling.algorithms.Model getModel()
int getNumberOfDocuments()
int getNumberOfTopics()
int getNumberOfWords()
long getSeed()
protected String[] getTokenizedTermsAsText(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus)
org.dice_research.topicmodeling.utils.vocabulary.Vocabulary getVocabulary()
org.dice_research.topicmodeling.algorithms.WordCounter getWordCounts()
int[] getWordsOfDocument(int documentId)
int[] getWordTopicAssignmentForDocument(int documentId)
void initialize(org.dice_research.topicmodeling.utils.corpus.Corpus corpus)
void performNextStep()
void setMalletRegexToken(String malletRegexToken)
void setOptimizeInterval(int interval): Set to 0 to turn optimization off.
-
Field Details
-
serialVersionUID
private static final long serialVersionUID
- See Also:
- Constant Field Values
-
LOGGER
private static final org.slf4j.Logger LOGGER -
MALLET_REGEX_TOKEN
private static final String MALLET_REGEX_TOKEN
- See Also:
- Constant Field Values
-
topicModel
-
alphabet
-
seed
protected long seed -
wordCounter
protected transient org.dice_research.topicmodeling.algorithms.WordCounter wordCounter -
malletRegexToken
-
-
Constructor Details
-
MalletLdaWrapper
public MalletLdaWrapper(int numberOfTopics) -
MalletLdaWrapper
public MalletLdaWrapper(int numberOfTopics, long seed) -
MalletLdaWrapper
public MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta) -
MalletLdaWrapper
public MalletLdaWrapper(int numberOfTopics, double alphaSum, double beta, long seed) -
MalletLdaWrapper
public MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta) -
MalletLdaWrapper
public MalletLdaWrapper(cc.mallet.types.LabelAlphabet topicAlphabet, double alphaSum, double beta, long seed)
-
-
Method Details
-
createPreprocessor
public org.dice_research.topicmodeling.preprocessing.Preprocessor createPreprocessor(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier supplier, org.dice_research.topicmodeling.lang.Language lang)
- Specified by:
createPreprocessor in interface org.dice_research.topicmodeling.preprocessing.PreprocessorFactory
-
initialize
public void initialize(org.dice_research.topicmodeling.utils.corpus.Corpus corpus)
- Specified by:
initialize in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
-
getTokenizedTermsAsText
protected String[] getTokenizedTermsAsText(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus) -
directInitialization
private void directInitialization(org.dice_research.topicmodeling.utils.corpus.Corpus corpus, org.dice_research.topicmodeling.utils.vocabulary.Vocabulary vocabulary) -
createInstanceFromDocument
protected cc.mallet.types.Instance createInstanceFromDocument(org.dice_research.topicmodeling.utils.doc.Document document, cc.mallet.types.Alphabet alphabet) -
createMultipleSpellingVocabulary
protected void createMultipleSpellingVocabulary(org.dice_research.topicmodeling.utils.corpus.DocumentListCorpus<?> corpus) -
performNextStep
public void performNextStep()
- Specified by:
performNextStep in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
-
getModel
public org.dice_research.topicmodeling.algorithms.Model getModel()
- Specified by:
getModel in interface org.dice_research.topicmodeling.algorithms.ModelingAlgorithm
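The three ModelingAlgorithm methods above (initialize, performNextStep, getModel) suggest a simple driver loop. The following is a minimal sketch of that pattern, not the library's own code. To stay self-contained it uses a hypothetical stand-in interface, ModelingAlgorithmSketch, with generic type parameters in place of the real Corpus and Model types, plus a toy CountingAlgorithm that only counts steps. It assumes initialize must be called once before the first performNextStep; this page does not state what a single step does or how many steps a caller should run.

```java
// Hypothetical stand-in mirroring the three ModelingAlgorithm methods
// documented on this page. The real interface lives in
// org.dice_research.topicmodeling.algorithms and uses Corpus and Model.
interface ModelingAlgorithmSketch<C, M> {
    void initialize(C corpus);
    void performNextStep();
    M getModel();
}

final class TrainingLoopSketch {
    private TrainingLoopSketch() {}

    // Initializes once, performs a fixed number of steps, then returns the model.
    // Assumption: calling initialize before the first step is required.
    static <C, M> M train(ModelingAlgorithmSketch<C, M> algorithm, C corpus, int steps) {
        if (steps < 0) {
            throw new IllegalArgumentException("steps must be >= 0");
        }
        algorithm.initialize(corpus);
        for (int i = 0; i < steps; i++) {
            algorithm.performNextStep();
        }
        return algorithm.getModel();
    }
}

// Toy implementation used only to exercise the loop: its "model" is the
// number of steps performed since the last initialize call.
final class CountingAlgorithm implements ModelingAlgorithmSketch<String, Integer> {
    private boolean initialized;
    private int stepsDone;

    @Override
    public void initialize(String corpus) {
        initialized = true;
        stepsDone = 0;
    }

    @Override
    public void performNextStep() {
        if (!initialized) {
            throw new IllegalStateException("initialize must be called first");
        }
        stepsDone++;
    }

    @Override
    public Integer getModel() {
        return stepsDone;
    }
}
```

With the real class, the same loop would take a MalletLdaWrapper and a Corpus; the step count is left to the caller.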
-
getWordTopicAssignmentForDocument
public int[] getWordTopicAssignmentForDocument(int documentId)
- Specified by:
getWordTopicAssignmentForDocument in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
getWordsOfDocument
public int[] getWordsOfDocument(int documentId)
- Specified by:
getWordsOfDocument in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
getDocumentAsFeatureSequence
public cc.mallet.types.FeatureSequence getDocumentAsFeatureSequence(int documentId) -
getNumberOfTopics
public int getNumberOfTopics()
- Specified by:
getNumberOfTopics in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
getNumberOfDocuments
public int getNumberOfDocuments()
- Specified by:
getNumberOfDocuments in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
getNumberOfWords
public int getNumberOfWords()
- Specified by:
getNumberOfWords in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
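The state accessors above can, in principle, be combined to rebuild a topic-by-word count matrix. The sketch below illustrates one way to do that under several assumptions that this page does not confirm: document IDs run from 0 to getNumberOfDocuments() - 1, getNumberOfWords() is the number of distinct word IDs, and the arrays returned by getWordsOfDocument and getWordTopicAssignmentForDocument are parallel (position i in both refers to the same token). TopicStateSketch is a hypothetical stand-in for the documented accessors, and FixedState is a toy implementation backed by plain arrays.

```java
// Hypothetical stand-in for the per-document state accessors documented above.
interface TopicStateSketch {
    int getNumberOfTopics();
    int getNumberOfDocuments();
    int getNumberOfWords();
    int[] getWordsOfDocument(int documentId);
    int[] getWordTopicAssignmentForDocument(int documentId);
}

final class TopicWordCounts {
    private TopicWordCounts() {}

    // Returns counts[topic][wordId]: how often each word ID is assigned to each topic.
    // Assumption: the two per-document arrays are parallel and equally long.
    static int[][] count(TopicStateSketch state) {
        int[][] counts = new int[state.getNumberOfTopics()][state.getNumberOfWords()];
        for (int d = 0; d < state.getNumberOfDocuments(); d++) {
            int[] words = state.getWordsOfDocument(d);
            int[] topics = state.getWordTopicAssignmentForDocument(d);
            if (words.length != topics.length) {
                throw new IllegalStateException("arrays differ in length for document " + d);
            }
            for (int i = 0; i < words.length; i++) {
                counts[topics[i]][words[i]]++;
            }
        }
        return counts;
    }
}

// Toy state backed by fixed arrays, used only to exercise the counting code.
final class FixedState implements TopicStateSketch {
    private final int numberOfTopics;
    private final int numberOfWords;
    private final int[][] words;
    private final int[][] topics;

    FixedState(int numberOfTopics, int numberOfWords, int[][] words, int[][] topics) {
        this.numberOfTopics = numberOfTopics;
        this.numberOfWords = numberOfWords;
        this.words = words;
        this.topics = topics;
    }

    @Override public int getNumberOfTopics() { return numberOfTopics; }
    @Override public int getNumberOfDocuments() { return words.length; }
    @Override public int getNumberOfWords() { return numberOfWords; }
    @Override public int[] getWordsOfDocument(int documentId) { return words[documentId]; }
    @Override public int[] getWordTopicAssignmentForDocument(int documentId) { return topics[documentId]; }
}
```

If the real arrays turn out not to be parallel, the length check fails fast instead of producing silently wrong counts.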
-
getVocabulary
public org.dice_research.topicmodeling.utils.vocabulary.Vocabulary getVocabulary()
- Specified by:
getVocabulary in interface org.dice_research.topicmodeling.algorithms.VocabularyContaining
-
getSeed
public long getSeed()
- Specified by:
getSeed in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
getWordCounts
public org.dice_research.topicmodeling.algorithms.WordCounter getWordCounts()
- Specified by:
getWordCounts in interface org.dice_research.topicmodeling.algorithms.ProbTopicModelingAlgorithmStateSupplier
-
setMalletRegexToken
public void setMalletRegexToken(String malletRegexToken) -
setOptimizeInterval
public void setOptimizeInterval(int interval)
Set to 0 to turn optimization off.
- Parameters:
interval -
-