public class HtmlCharsetExtractingSupplierDecorator extends AbstractDocumentSupplierDecorator

| Modifier and Type | Class and Description |
|---|---|
| static class | HtmlCharsetExtractingSupplierDecorator.StringWithCharset |

| Modifier and Type | Field and Description |
|---|---|
| private static Charset | DEFAULT_CHARSET |
| private static org.slf4j.Logger | LOGGER |

Fields inherited from class AbstractDocumentSupplierDecorator: documentSource

| Constructor and Description |
|---|
| HtmlCharsetExtractingSupplierDecorator(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier documentSource) |

| Modifier and Type | Method and Description |
|---|---|
| private HtmlCharsetExtractingSupplierDecorator.StringWithCharset | checkEncoding(HtmlCharsetExtractingSupplierDecorator.StringWithCharset text) |
| private Charset | extractCharset(String html) |
| private String | extractCharsetFromMetaTag(String html, int metaStart, int metaEnd) |
| private String | extractLowercasedHead(String html) |
| org.dice_research.topicmodeling.utils.doc.Document | prepareDocument(org.dice_research.topicmodeling.utils.doc.Document document) |

Methods inherited from class AbstractDocumentSupplierDecorator: apply, getDecoratedDocumentSupplier, getNextDocument, setDecoratedDocumentSupplier, setDocumentStartId

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

private static final org.slf4j.Logger LOGGER
private static final Charset DEFAULT_CHARSET
public HtmlCharsetExtractingSupplierDecorator(org.dice_research.topicmodeling.preprocessing.docsupplier.DocumentSupplier documentSource)
public org.dice_research.topicmodeling.utils.doc.Document prepareDocument(org.dice_research.topicmodeling.utils.doc.Document document)
prepareDocument in class AbstractDocumentSupplierDecorator

private HtmlCharsetExtractingSupplierDecorator.StringWithCharset checkEncoding(HtmlCharsetExtractingSupplierDecorator.StringWithCharset text)
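
The private method names listed above (extractLowercasedHead, extractCharsetFromMetaTag, extractCharset) suggest that the charset is read from a meta tag in the document head. This page does not document their behavior, so the following is only a minimal sketch of that general technique using the Java standard library. The class name MetaCharsetSketch, the regular expression, and the UTF-8 fallback are assumptions made for illustration; the actual value of DEFAULT_CHARSET and the library's real parsing logic may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; NOT the library's actual implementation.
final class MetaCharsetSketch {

    // Assumed fallback; the real DEFAULT_CHARSET value is not documented on this page.
    static final Charset FALLBACK = StandardCharsets.UTF_8;

    // Matches charset=NAME inside a lowercased <meta ...> tag, with an optional quote.
    private static final Pattern CHARSET_ATTR =
            Pattern.compile("charset\\s*=\\s*[\"']?\\s*([a-z0-9_.:-]+)");

    /** Returns the lowercased input up to and including </head>, or all of it if no </head> exists. */
    static String lowercasedHead(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        int end = lower.indexOf("</head>");
        return end < 0 ? lower : lower.substring(0, end + "</head>".length());
    }

    /** Scans the meta tags in the head and returns the first declared charset the JVM supports. */
    static Charset extractCharset(String html) {
        String head = lowercasedHead(html);
        int metaStart = head.indexOf("<meta");
        while (metaStart >= 0) {
            int metaEnd = head.indexOf('>', metaStart);
            if (metaEnd < 0) {
                break; // unterminated tag: stop scanning and fall back
            }
            Matcher m = CHARSET_ATTR.matcher(head.substring(metaStart, metaEnd));
            if (m.find()) {
                String name = m.group(1);
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // syntactically illegal charset name: ignore it and keep scanning
                }
            }
            metaStart = head.indexOf("<meta", metaEnd);
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\"></head><body></body></html>";
        System.out.println(extractCharset(html));
    }
}
```

Lowercasing only the head keeps the search case-insensitive without scanning the full body, and unsupported or malformed charset names fall through to the fallback instead of raising an exception.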
Copyright © 2015–2020. All rights reserved.