java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractLanguageResource
gate.DocumentFormat
gate.corpora.TextualDocumentFormat
@CreoleResource(name="GATE Textual Document Format",
isPrivate=true,
autoinstances=)
public class TextualDocumentFormat
extends DocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index residing here when they are constructed. Static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
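The self-registration idea in the description above can be illustrated with a small sketch that does not depend on GATE. All names in it (FormatRegistrySketch, Format, register, lookup, PlainTextFormat) are hypothetical and chosen for illustration only; they are not the DocumentFormat API, which keeps several maps of its own (see the inherited fields listed below).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of a static MIME-type index; this is not GATE code.
final class FormatRegistrySketch {

    /** Stand-in for a format handler base class (assumed, not gate.DocumentFormat). */
    abstract static class Format {
        // One shared index from lower-cased MIME type to the handler instance.
        private static final Map<String, Format> INDEX = new HashMap<>();

        /** Subclasses call this while being constructed to add themselves to the index. */
        protected final void register(String mimeType) {
            INDEX.put(mimeType.toLowerCase(Locale.ROOT), this);
        }

        /** Returns the handler registered for the MIME type, or null if there is none. */
        static Format lookup(String mimeType) {
            return INDEX.get(mimeType.toLowerCase(Locale.ROOT));
        }

        abstract String describe();
    }

    /** Example handler that registers itself for text/plain. */
    static final class PlainTextFormat extends Format {
        PlainTextFormat() {
            register("text/plain");
        }

        @Override
        String describe() {
            return "plain text handler";
        }
    }

    public static void main(String[] args) {
        new PlainTextFormat();                     // construction registers the handler
        Format f = Format.lookup("TEXT/PLAIN");    // lookup ignores case in this sketch
        System.out.println(f == null ? "no handler" : f.describe());
    }
}
```

The point of the pattern is that callers only need the static lookup; they never have to know which handler classes exist.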
| Field Summary | |
|---|---|
| private static boolean | DEBUG: Debug flag |
| Fields inherited from class gate.DocumentFormat |
|---|
| element2StringMap, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, suffixes2mimeTypeMap |
| Fields inherited from class gate.creole.AbstractLanguageResource |
|---|
| dataStore, lrPersistentId |
| Fields inherited from class gate.creole.AbstractResource |
|---|
| name |
| Constructor Summary | |
|---|---|
| TextualDocumentFormat() | Default construction |
| Method Summary | |
|---|---|
| void | annotateParagraphs(Document aDoc, int startOffset, int endOffset, String annotSetName): This method annotates paragraphs in a GATE document. |
| DataStore | getDataStore(): Get the data store that this LR lives in. |
| protected static boolean | hasContentButNoValidUrl(Document doc): Tests whether the GATE document has a valid URL or valid content. |
| Resource | init(): Initialise this resource, and return it. |
| private void | removeExtraNewLine(Document doc): Delete the '\r' in each CRLF or LFCR combination in the document content. |
| protected void | setNewLineProperty(Document doc): Check the new line sequence and set the document property. |
| void | unpackMarkup(Document doc): Unpack the markup in the document. |
| void | unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) |
| Methods inherited from class gate.creole.AbstractLanguageResource |
|---|
| cleanup, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from class gate.creole.AbstractResource |
|---|
| checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface gate.LanguageResource |
|---|
| getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from interface gate.Resource |
|---|
| cleanup, getParameterValue, setParameterValue, setParameterValues |
| Methods inherited from interface gate.util.NameBearer |
|---|
| getName, setName |
| Field Detail |
|---|
private static final boolean DEBUG
| Constructor Detail |
|---|
public TextualDocumentFormat()
| Method Detail |
|---|
public Resource init()
throws ResourceInstantiationException
Specified by: init in interface Resource
Overrides: init in class AbstractResource
Throws: ResourceInstantiationException
public void unpackMarkup(Document doc)
throws DocumentFormatException
Specified by: unpackMarkup in class DocumentFormat
Throws: DocumentFormatException
public void unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
throws DocumentFormatException
Specified by: unpackMarkup in class DocumentFormat
Throws: DocumentFormatException
protected static boolean hasContentButNoValidUrl(Document doc)
throws DocumentFormatException
Parameters: doc -
Throws: DocumentFormatException
protected void setNewLineProperty(Document doc)
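As a rough illustration of what "checking the new line sequence" can involve, the sketch below classifies a text by the first line break it contains. It works on a plain String rather than a GATE Document, and the class name, method name and returned labels are assumptions made for this example; the page does not say which property the real method sets or what values it stores.

```java
// Hypothetical helper, not GATE code: reports which line-break sequence a text uses,
// judged by the first line break found.
final class NewLineDetectionSketch {

    /** Returns "CRLF", "LFCR", "CR" or "LF", or null when the text has no line break. */
    static String detectNewLine(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c != '\r' && c != '\n') {
                continue;
            }
            // Look one character ahead to tell a two-character pair from a lone character.
            char next = (i + 1 < content.length()) ? content.charAt(i + 1) : '\0';
            if (c == '\r') {
                return next == '\n' ? "CRLF" : "CR";
            }
            return next == '\r' ? "LFCR" : "LF";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(detectNewLine("one\r\ntwo"));  // text with Windows-style breaks
        System.out.println(detectNewLine("one\ntwo"));    // text with Unix-style breaks
    }
}
```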
private void removeExtraNewLine(Document doc)
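The method summary describes removeExtraNewLine as deleting the '\r' in CRLF or LFCR combinations. A minimal sketch of that kind of clean-up on a plain String could look as follows; the class and the String-based signature are assumptions for illustration, since the real method is private and operates on a GATE Document.

```java
// Hypothetical String-based analogue, not GATE code.
final class ExtraNewLineSketch {

    /** Drops every '\r' that is directly adjacent to a '\n'; a lone '\r' is kept. */
    static String removeExtraNewLine(String content) {
        StringBuilder out = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            boolean afterLf = i > 0 && content.charAt(i - 1) == '\n';
            boolean beforeLf = i + 1 < content.length() && content.charAt(i + 1) == '\n';
            if (c == '\r' && (afterLf || beforeLf)) {
                continue; // this '\r' belongs to a CRLF or LFCR pair, so skip it
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = removeExtraNewLine("one\r\ntwo\n\rthree");
        System.out.println(cleaned.length()); // shorter than the 17-character input
    }
}
```

Because characters are removed, offsets in the cleaned text no longer match offsets in the original text, which is worth keeping in mind whenever positions are recorded before the clean-up.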
public void annotateParagraphs(Document aDoc,
int startOffset,
int endOffset,
String annotSetName)
throws DocumentFormatException
Parameters:
aDoc - the GATE document on which paragraph detection is performed. If it is null, or its content is null, the method simply returns without doing anything.
startOffset - the index in the document content at which paragraph detection starts.
endOffset - the offset at which detection ends.
annotSetName - the name of the annotation set in which the paragraph annotations are created. The annotation type created is "paragraph".
Throws: DocumentFormatException
public DataStore getDataStore()
Description copied from class: AbstractLanguageResource
Get the data store that this LR lives in.
Specified by: getDataStore in interface LanguageResource
Overrides: getDataStore in class AbstractLanguageResource
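To make the offset parameters of annotateParagraphs more concrete, here is a sketch of offset-based paragraph detection on a plain String. It is not the GATE implementation: the rule that a blank line (two consecutive '\n' characters) separates paragraphs, the class name and the int[] span representation are all assumptions made for this example, and no annotations are created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not GATE code: finds paragraph spans inside an offset range.
final class ParagraphSpanSketch {

    /** Returns {start, end} pairs (end exclusive) for paragraphs in [startOffset, endOffset). */
    static List<int[]> paragraphSpans(String content, int startOffset, int endOffset) {
        List<int[]> spans = new ArrayList<>();
        if (content == null) {
            return spans; // nothing to do, like the documented behaviour for null content
        }
        int end = Math.min(endOffset, content.length());
        int i = Math.max(startOffset, 0);
        while (i < end) {
            // Skip the line breaks that precede the next paragraph.
            while (i < end && content.charAt(i) == '\n') {
                i++;
            }
            if (i >= end) {
                break;
            }
            int paraStart = i;
            int paraEnd = i;
            while (i < end) {
                if (content.charAt(i) == '\n') {
                    // Assumed rule: a blank line ends the paragraph.
                    if (i + 1 < end && content.charAt(i + 1) == '\n') {
                        break;
                    }
                } else {
                    paraEnd = i + 1; // just past the last non-newline character seen
                }
                i++;
            }
            spans.add(new int[] {paraStart, paraEnd});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "First line\nstill first\n\nSecond";
        for (int[] span : paragraphSpans(text, 0, text.length())) {
            System.out.println(span[0] + ".." + span[1] + ": "
                    + text.substring(span[0], span[1]).replace('\n', ' '));
        }
    }
}
```

In a GATE setting, each span would presumably become one annotation of type "paragraph" in the set named by annotSetName, as the parameter description states.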