gate.corpora
Class XmlDocumentFormat
java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractLanguageResource
gate.DocumentFormat
gate.corpora.TextualDocumentFormat
gate.corpora.XmlDocumentFormat
- All Implemented Interfaces:
- LanguageResource, Resource, FeatureBearer, NameBearer, Serializable
@CreoleResource(name="GATE XML Document Format",
isPrivate=true,
autoinstances=)
public class XmlDocumentFormat extends TextualDocumentFormat
The format of Documents. Subclasses of DocumentFormat know about
particular MIME types and how to unpack the information in any markup
or formatting they contain into GATE annotations. Each MIME type has
its own subclass of DocumentFormat, e.g. XmlDocumentFormat,
RtfDocumentFormat, MpegDocumentFormat. These classes register
themselves with a static index in DocumentFormat when they are
constructed. The static getDocumentFormat methods can then be used to
get the appropriate format class for a particular document.
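The self-registration pattern described above can be sketched in plain Java. This is a hypothetical, simplified stand-in and not GATE's API: `SimpleDocumentFormat`, `SimpleXmlFormat`, `SimpleRtfFormat` and `lookup` are illustrative names, and the index is assumed to be keyed by a MIME type string. Each subclass instance adds itself to a static map in the base-class constructor, and a static method retrieves the format for a given MIME type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for a DocumentFormat-style base class.
// Assumption: the static index is keyed by a MIME type string such as "text/xml".
abstract class SimpleDocumentFormat {
    private static final Map<String, SimpleDocumentFormat> INDEX = new HashMap<>();

    private final String mimeType;

    protected SimpleDocumentFormat(String mimeType) {
        this.mimeType = mimeType;
        // Each subclass instance registers itself with the static index when constructed.
        INDEX.put(mimeType, this);
    }

    String getMimeType() {
        return mimeType;
    }

    // Static lookup, similar in spirit to the getDocumentFormat methods.
    static Optional<SimpleDocumentFormat> lookup(String mimeType) {
        return Optional.ofNullable(INDEX.get(mimeType));
    }
}

// Hypothetical subclasses, one per MIME type.
class SimpleXmlFormat extends SimpleDocumentFormat {
    SimpleXmlFormat() {
        super("text/xml");
    }
}

class SimpleRtfFormat extends SimpleDocumentFormat {
    SimpleRtfFormat() {
        super("text/rtf");
    }
}
```

Registering from the constructor mirrors the description above; a more defensive design might register in a separate initialisation step, since publishing `this` before construction finishes can expose a partially built object.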
- See Also:
- Serialized Form
| Methods inherited from class gate.DocumentFormat |
addStatusListener, areEqual, decideBetweenThreeMimeTypes, decideBetweenTwoMimeTypes, fireStatusChanged, getDocumentFormat, getDocumentFormat, getDocumentFormat, getElement2StringMap, getFeatures, getMarkupElementsMap, getMimeType, getMimeTypeForString, getShouldCollectRepositioning, getSupportedFileSuffixes, guessTypeUsingMagicNumbers, removeStatusListener, runMagicNumbers, setElement2StringMap, setFeatures, setMarkupElementsMap, setMimeType, setShouldCollectRepositioning, unpackMarkup |
| Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
XmlDocumentFormat
public XmlDocumentFormat()
- Default constructor.
supportsRepositioning
public Boolean supportsRepositioning()
- Repositioning information can be collected during XML parsing, so this format supports repositioning.
- Overrides:
supportsRepositioning in class DocumentFormat
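In this sketch, repositioning information is taken to mean a mapping from offsets in the extracted text back to offsets in the original markup. That reading is an assumption about the concept, not a description of GATE's RepositioningInfo class. `MarkupStripper` and `Chunk` are hypothetical names, and the naive tag scanning below is not a real XML parser: it ignores comments, CDATA sections and entity references such as `&amp;`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: strip <...> tags from markup and record, for every run
// of surviving text, where it sits in the output and where it came from.
// Assumption: comments, CDATA sections and entity references are not handled.
final class MarkupStripper {
    // One contiguous run of text copied from the original markup.
    static final class Chunk {
        final int extractedStart;
        final int originalStart;
        final int length;

        Chunk(int extractedStart, int originalStart, int length) {
            this.extractedStart = extractedStart;
            this.originalStart = originalStart;
            this.length = length;
        }
    }

    final StringBuilder text = new StringBuilder();
    final List<Chunk> chunks = new ArrayList<>();

    MarkupStripper(String markup) {
        int i = 0;
        while (i < markup.length()) {
            if (markup.charAt(i) == '<') {
                int close = markup.indexOf('>', i);
                i = (close < 0) ? markup.length() : close + 1; // skip the whole tag
            } else {
                int next = markup.indexOf('<', i);
                int end = (next < 0) ? markup.length() : next;
                chunks.add(new Chunk(text.length(), i, end - i));
                text.append(markup, i, end);
                i = end;
            }
        }
    }

    // Map an offset in the extracted text back to the original markup,
    // or return -1 if no chunk covers it.
    int toOriginal(int extractedOffset) {
        for (Chunk c : chunks) {
            if (extractedOffset >= c.extractedStart
                    && extractedOffset < c.extractedStart + c.length) {
                return c.originalStart + (extractedOffset - c.extractedStart);
            }
        }
        return -1;
    }
}
```

For `<p>Hello <b>world</b></p>` the extracted text is `Hello world`, and offset 6 (the `w`) maps back to offset 12 in the markup.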
unpackMarkup
public void unpackMarkup(Document doc)
throws DocumentFormatException
- Old-style variant of unpackMarkup that does not collect RepositioningInfo.
- Overrides:
unpackMarkup in class TextualDocumentFormat
- Throws:
DocumentFormatException
unpackMarkup
public void unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
throws DocumentFormatException
- Unpack the markup in the document. This converts markup from the
native format (e.g. XML) into annotations in GATE format. Uses the
markupElementsMap to determine which elements to convert, and what
annotation type names to use. If the document was created from a
String, it is recommended to set the doc's sourceUrl to null.
If the document has a valid URL, the parser will try to parse the
XML document pointed to by the URL. If the URL is not valid, or is
null, the doc's content will be parsed instead. If the doc's content
is not valid XML, the parser may fail.
- Overrides:
unpackMarkup in class TextualDocumentFormat
- Parameters:
doc - The GATE document you want to parse. If
doc.getSourceUrl() returns null,
the content of doc will be parsed. Using a URL is
recommended because the parser will report errors correctly
if the XML document is not well formed.
- Throws:
DocumentFormatException
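The URL-versus-content rule and the element-to-annotation mapping described above can be sketched with the JDK's built-in SAX parser. This is a simplified illustration, not GATE's implementation: `XmlUnpacker` and `SimpleAnnotation` are hypothetical names, the map plays the role of markupElementsMap, unmapped elements keep their own names as annotation types (an assumption), and only a null URL triggers the fallback to the content string.

```java
import java.io.StringReader;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a GATE annotation: a type plus [start, end)
// offsets into the extracted plain text.
final class SimpleAnnotation {
    final String type;
    final int start;
    final int end;

    SimpleAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + start + "," + end + ")";
    }
}

// Hypothetical SAX handler: collects element text and turns each element
// into an annotation spanning the text it encloses.
final class XmlUnpacker extends DefaultHandler {
    private final Map<String, String> elementMap; // element name -> annotation type
    private final Deque<Integer> openStarts = new ArrayDeque<>();
    final StringBuilder text = new StringBuilder();
    final List<SimpleAnnotation> annotations = new ArrayList<>();

    private XmlUnpacker(Map<String, String> elementMap) {
        this.elementMap = elementMap;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openStarts.push(text.length()); // remember where this element's text begins
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        int start = openStarts.pop();
        // Assumption: unmapped elements keep their own name as the type.
        String type = elementMap.getOrDefault(qName, qName);
        annotations.add(new SimpleAnnotation(type, start, text.length()));
    }

    // Prefer the source URL when there is one; otherwise parse the content.
    static XmlUnpacker unpack(URL sourceUrl, String content, Map<String, String> elementMap) {
        XmlUnpacker handler = new XmlUnpacker(elementMap);
        InputSource source = (sourceUrl != null)
                ? new InputSource(sourceUrl.toExternalForm())
                : new InputSource(new StringReader(content));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            // Malformed XML (or an unreadable URL) ends up here.
            throw new IllegalArgumentException("Could not unpack XML markup", e);
        }
        return handler;
    }
}
```

Parsing `<doc><title>Hi</title><p>There</p></doc>` with the map `p -> Paragraph` yields the text `HiThere` and the annotations `title[0,2)`, `Paragraph[2,7)` and `doc[0,7)`.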
isGateXmlFormat
protected static boolean isGateXmlFormat(String content)
- Determine whether the given document content string represents a
GATE custom format XML document.
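A minimal sketch of such a check, assuming (not confirmed by this page) that GATE's custom XML format can be recognised by a root element named `GateDocument`. `GateXmlSniffer` and `looksLikeGateXml` are hypothetical names. The regular expression is only a heuristic: it skips the XML declaration and `<!...>` constructs because their second character is not a letter, so a start tag inside a comment could still fool it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GateXmlSniffer {
    // First start tag whose name begins with a letter or underscore; this
    // skips "<?xml ...?>" and "<!DOCTYPE ...>" / "<!-- ... -->" openers.
    private static final Pattern FIRST_ELEMENT = Pattern.compile("<([A-Za-z_][\\w.\\-:]*)");

    // Hypothetical heuristic: treat content as GATE XML when the first
    // element is named GateDocument. Assumption, not GATE's actual check.
    static boolean looksLikeGateXml(String content) {
        if (content == null) {
            return false;
        }
        Matcher m = FIRST_ELEMENT.matcher(content);
        return m.find() && m.group(1).equals("GateDocument");
    }
}
```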
init
public Resource init()
throws ResourceInstantiationException
- Initialise this resource, and return it.
- Specified by:
init in interface Resource
- Overrides:
init in class TextualDocumentFormat
- Throws:
ResourceInstantiationException