java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractLanguageResource
gate.DocumentFormat
gate.corpora.TextualDocumentFormat
gate.corpora.XmlDocumentFormat
@CreoleResource(name="GATE XML Document Format",
isPrivate=true,
autoinstances=)
public class XmlDocumentFormat
extends TextualDocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index held by DocumentFormat when they are constructed. The static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
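The registration scheme described above can be illustrated with a minimal, self-contained sketch. It is not GATE's implementation: the names FormatSketch, XmlFormatSketch, HANDLERS and getFormat are hypothetical, and registering in an init() step rather than in the constructor is an assumption made here to avoid publishing a partly built object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative stand-in for a format class; this is not GATE's DocumentFormat. */
abstract class FormatSketch {
  /** Static index from MIME type to the handler registered for it (hypothetical). */
  private static final Map<String, FormatSketch> HANDLERS = new ConcurrentHashMap<>();

  private final String mimeType;

  protected FormatSketch(String mimeType) {
    this.mimeType = mimeType;
  }

  /** Registers this handler under its MIME type and returns it. */
  FormatSketch init() {
    HANDLERS.put(mimeType, this);
    return this;
  }

  /** Static lookup; returns null when no handler is registered for the MIME type. */
  static FormatSketch getFormat(String mimeType) {
    return HANDLERS.get(mimeType);
  }
}

/** Hypothetical subclass that knows about a single MIME type. */
final class XmlFormatSketch extends FormatSketch {
  XmlFormatSketch() {
    super("text/xml");
  }
}
```

A caller would create and initialise a handler once, then look it up by MIME type through the static method, for example `FormatSketch.getFormat("text/xml")`.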
| Field Summary | |
|---|---|
| private static boolean | DEBUG: Debug flag |
| private static javax.xml.stream.XMLInputFactory | staxFactory: InputFactory for the StAX parser used for GATE format XML. |
| Fields inherited from class gate.DocumentFormat |
|---|
| element2StringMap, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, suffixes2mimeTypeMap |
| Fields inherited from class gate.creole.AbstractLanguageResource |
|---|
| dataStore, lrPersistentId |
| Fields inherited from class gate.creole.AbstractResource |
|---|
| name |
| Constructor Summary | |
|---|---|
| XmlDocumentFormat() | Default construction |
| Method Summary | |
|---|---|
| private static javax.xml.stream.XMLInputFactory | getInputFactory(): Returns the StAX input factory, creating one if it is currently null. |
| Resource | init(): Initialise this resource, and return it. |
| protected static boolean | isGateXmlFormat(String content): Determine whether the given document content string represents a GATE custom format XML document. |
| Boolean | supportsRepositioning(): We could collect repositioning information during XML parsing. |
| private void | unpackGateFormatMarkup(Document doc, StatusListener statusListener): Unpacks markup in the GATE-specific standoff XML markup format. |
| private void | unpackGeneralXmlMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo, StatusListener statusListener): Unpack markup from any XML format. |
| void | unpackMarkup(Document doc): Old-style unpackMarkup (without collecting RepositioningInfo). |
| void | unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo): Unpack the markup in the document. |
| Methods inherited from class gate.corpora.TextualDocumentFormat |
|---|
| annotateParagraphs, getDataStore, hasContentButNoValidUrl, setNewLineProperty |
| Methods inherited from class gate.creole.AbstractLanguageResource |
|---|
| cleanup, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from class gate.creole.AbstractResource |
|---|
| checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface gate.LanguageResource |
|---|
| getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from interface gate.Resource |
|---|
| cleanup, getParameterValue, setParameterValue, setParameterValues |
| Methods inherited from interface gate.util.NameBearer |
|---|
| getName, setName |
| Field Detail |
|---|
private static final boolean DEBUG
private static javax.xml.stream.XMLInputFactory staxFactory
| Constructor Detail |
|---|
public XmlDocumentFormat()
| Method Detail |
|---|
public Boolean supportsRepositioning()
Overrides: supportsRepositioning in class DocumentFormat
public void unpackMarkup(Document doc)
throws DocumentFormatException
Overrides: unpackMarkup in class TextualDocumentFormat
Throws: DocumentFormatException
public void unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
throws DocumentFormatException
Overrides: unpackMarkup in class TextualDocumentFormat
Parameters: doc - The GATE document you want to parse. If doc.getSourceUrl() returns null, the content of doc is parsed instead. Using a URL is recommended because the parser will then report errors correctly if the XML document is not well formed.
Throws: DocumentFormatException
private void unpackGateFormatMarkup(Document doc,
StatusListener statusListener)
throws DocumentFormatException
Parameters: doc - the document to process
statusListener - optional status listener to receive status messages
Throws: DocumentFormatException - if a fatal error occurs during parsing
private static javax.xml.stream.XMLInputFactory getInputFactory()
throws javax.xml.stream.XMLStreamException
Returns: staxFactory
Throws: javax.xml.stream.XMLStreamException
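The method summary describes getInputFactory() as returning the StAX input factory and creating one if it is currently null. A minimal sketch of that lazy-creation pattern, using only the standard javax.xml.stream API, might look as follows. The class name LazyStaxFactorySketch, the helper countStartElements, the synchronisation and the coalescing property are assumptions of this sketch, not details taken from XmlDocumentFormat.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical holder class; it only mirrors the documented behaviour. */
final class LazyStaxFactorySketch {
  /** Shared factory, created on first use. */
  private static XMLInputFactory staxFactory;

  /** Returns the shared factory, creating it if it is currently null. */
  static synchronized XMLInputFactory getInputFactory() {
    if (staxFactory == null) {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      // Assumed configuration: merge adjacent character events into one.
      factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
      staxFactory = factory;
    }
    return staxFactory;
  }

  /** Usage example: counts start tags in a well-formed XML string. */
  static int countStartElements(String xml) {
    XMLStreamReader reader = null;
    try {
      reader = getInputFactory().createXMLStreamReader(new StringReader(xml));
      int count = 0;
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          count++;
        }
      }
      return count;
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("Input is not well-formed XML", e);
    } finally {
      closeQuietly(reader);
    }
  }

  /** Closes the reader, ignoring failures during close. */
  private static void closeQuietly(XMLStreamReader reader) {
    if (reader != null) {
      try {
        reader.close();
      } catch (XMLStreamException ignored) {
        // Nothing useful can be done if closing fails.
      }
    }
  }
}
```

Creating the factory once and reusing it avoids repeating the provider lookup that XMLInputFactory.newInstance() performs on every call.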
private void unpackGeneralXmlMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo,
StatusListener statusListener)
throws DocumentFormatException
Parameters: doc - the document to process
Throws: DocumentFormatException
protected static boolean isGateXmlFormat(String content)
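One plausible way to decide whether a content string is in the GATE custom XML format is to inspect the name of its root element. The sketch below is not the implementation of isGateXmlFormat: the class and method names are hypothetical, and the root element name GateDocument is an assumption of this example that should be checked against the GATE documentation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper; a heuristic sketch, not GATE's isGateXmlFormat. */
final class GateXmlCheckSketch {
  /** Assumed name of the root element of a GATE-format document. */
  private static final String ASSUMED_ROOT = "GateDocument";

  private static final XMLInputFactory FACTORY = createFactory();

  private static XMLInputFactory createFactory() {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Only the root element name is needed, so DTD support is switched off.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
    return factory;
  }

  /** Returns true if the first start tag in the content is the assumed root. */
  static boolean looksLikeGateXml(String content) {
    if (content == null) {
      return false;
    }
    XMLStreamReader reader = null;
    try {
      reader = FACTORY.createXMLStreamReader(new StringReader(content));
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          return ASSUMED_ROOT.equals(reader.getLocalName());
        }
      }
      return false;
    } catch (XMLStreamException e) {
      // Content that cannot be parsed is treated as "not GATE XML".
      return false;
    } finally {
      if (reader != null) {
        try {
          reader.close();
        } catch (XMLStreamException ignored) {
          // Ignore failures while closing.
        }
      }
    }
  }
}
```

Stopping at the first start tag keeps the check cheap, since the rest of the document is never read.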
public Resource init()
throws ResourceInstantiationException
Specified by: init in interface Resource
Overrides: init in class TextualDocumentFormat
Throws: ResourceInstantiationException