public class NewsDeMarkupRemovingSupplierDecorator extends AbstractDocumentSupplierDecorator
| Modifier and Type | Field and Description |
|---|---|
| private dk.brics.automaton.RunAutomaton | charAutomaton |
| private dk.brics.automaton.RunAutomaton | tagAutomaton |
| private dk.brics.automaton.RunAutomaton | tooltipAutomaton |

Fields inherited from class AbstractDocumentSupplierDecorator: documentSource

| Constructor and Description |
|---|
| NewsDeMarkupRemovingSupplierDecorator(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier documentSource) |

| Modifier and Type | Method and Description |
|---|---|
| String | cleanText(String text) |
| private void | handleHtmlEncodedChar(StringBuilder cleanText, String text, int pos, int length) |
| private void | handleHtmlTag(StringBuilder cleanText, String text, int pos, int length) |
| private void | handleToolTip(StringBuilder cleanText, StringBuilder toolTips, String text, int pos, int length) |
| org.dice_research.topicmodeling.utils.doc.Document | prepareDocument(org.dice_research.topicmodeling.utils.doc.Document document) |

Methods inherited from class AbstractDocumentSupplierDecorator: apply, getDecoratedDocumentSupplier, getNextDocument, setDecoratedDocumentSupplier, setDocumentStartId

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

private dk.brics.automaton.RunAutomaton tagAutomaton
private dk.brics.automaton.RunAutomaton charAutomaton
private dk.brics.automaton.RunAutomaton tooltipAutomaton
public NewsDeMarkupRemovingSupplierDecorator(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier documentSource)
public org.dice_research.topicmodeling.utils.doc.Document prepareDocument(org.dice_research.topicmodeling.utils.doc.Document document)
prepareDocument in class AbstractDocumentSupplierDecorator
private void handleHtmlTag(StringBuilder cleanText, String text, int pos, int length)
private void handleHtmlEncodedChar(StringBuilder cleanText, String text, int pos, int length)
private void handleToolTip(StringBuilder cleanText, StringBuilder toolTips, String text, int pos, int length)
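
The signatures above suggest that cleanText scans the input once and passes each recognized span (an HTML tag, an HTML-encoded character, or a tooltip) to a handler that appends to a StringBuilder. This page does not show the implementation, so the following is only a minimal sketch of that general idea under stated assumptions. The class name MarkupRemovalSketch, the entity table, and the choice to replace each tag with a single space are all hypothetical. It uses java.util.regex instead of the dk.brics.automaton.RunAutomaton fields listed above, and it does not attempt tooltip handling.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the library's implementation.
public class MarkupRemovalSketch {

    // Group 1 matches an HTML tag, group 2 the name of a character entity.
    private static final Pattern MARKUP =
            Pattern.compile("(<[^>]*>)|&([A-Za-z]+);");

    // Small, assumed entity table; a real one would be much larger.
    private static final Map<String, String> ENTITIES = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"",
            "nbsp", " ",
            "auml", "\u00e4",
            "ouml", "\u00f6",
            "uuml", "\u00fc",
            "szlig", "\u00df");

    public static String cleanText(String text) {
        StringBuilder cleanText = new StringBuilder();
        Matcher matcher = MARKUP.matcher(text);
        int pos = 0;
        while (matcher.find()) {
            // Copy the plain text between the previous match and this one.
            cleanText.append(text, pos, matcher.start());
            if (matcher.group(1) != null) {
                // A tag: emit one space so neighbouring words stay separated.
                cleanText.append(' ');
            } else {
                // An entity: decode it if known, otherwise keep it verbatim.
                String decoded = ENTITIES.get(matcher.group(2));
                cleanText.append(decoded != null ? decoded : matcher.group());
            }
            pos = matcher.end();
        }
        // Copy whatever follows the last match.
        cleanText.append(text, pos, text.length());
        return cleanText.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanText("<p>K&ouml;ln &amp; Bonn</p>"));
    }
}
```

In the real class, prepareDocument presumably applies cleanText to the text of each document delivered by the decorated DocumentSupplier; that wiring depends on the library's Document API and is left out of the sketch.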
Copyright © 2015–2020. All rights reserved.