Package gate.corpora

Interface Summary
EventAwareCorpus  
EventAwareDocument  
EventAwareLanguageResource  
 

Class Summary
CorpusImpl Corpora are sets of Documents.
DatabaseCorpusImpl  
DatabaseDocumentImpl  
DocumentContentImpl Represents the commonalities between all sorts of document contents.
DocumentData  
DocumentImpl Represents the commonalities between all sorts of documents.
DocumentStaxUtils This class provides support for reading and writing GATE XML format using StAX (the Streaming API for XML).
DocumentStaxUtils.AnnotationObject An inner class modeling the information contained by an annotation.
DocumentStaxUtils.ArrayCharSequence Thin wrapper class to use a char[] as a CharSequence.
DocumentXmlUtils This class contains useful static methods for working with the GATE XML format.
EmailDocumentFormat The format of Documents.
HtmlDocumentFormat The format of Documents.
MimeType A very basic implementation for a MIME Type.
NekoHtmlDocumentFormat DocumentFormat that uses Andy Clark's NekoHTML parser to parse HTML documents.
RepositioningInfo RepositioningInfo keeps information about the correspondence of positions between the original and extracted document content.
SerialCorpusImpl  
SgmlDocumentFormat The format of Documents.
TestCorpus Tests for the Corpus classes.
TestDocument Tests for the Document classes.
TestDocumentStaxUtils  
TestSerialCorpus Tests for the SerialCorpus classes.
TextualDocumentFormat The format of Documents.
TikaFormat  
XmlDocumentFormat The format of Documents.
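
Example (illustrative only)
The summary for DocumentStaxUtils.ArrayCharSequence describes a thin wrapper that exposes a char[] as a CharSequence. The sketch below shows one way such a wrapper could look, using only the standard library. It is not GATE's implementation: the class name CharArraySequence, its constructors, and its bounds checks are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch (not GATE code): a read-only CharSequence view
 * over a slice of a char[]. The array is not copied, so later changes
 * to the array are visible through the view.
 */
final class CharArraySequence implements CharSequence {
    private final char[] chars;
    private final int offset;
    private final int length;

    /** Views the whole array. */
    CharArraySequence(char[] chars) {
        this(chars, 0, chars.length);
    }

    /** Views 'length' characters of the array starting at 'offset'. */
    CharArraySequence(char[] chars, int offset, int length) {
        if (offset < 0 || length < 0 || offset > chars.length - length) {
            throw new IndexOutOfBoundsException(
                "offset " + offset + ", length " + length);
        }
        this.chars = chars;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chars[offset + index];
    }

    /** Returns a narrower view over the same array, without copying. */
    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(
                "start " + start + ", end " + end);
        }
        return new CharArraySequence(chars, offset + start, end - start);
    }

    /** Copies the viewed slice into a new String. */
    @Override
    public String toString() {
        return new String(chars, offset, length);
    }
}
```

A view like this can be handed to any API that accepts a CharSequence (for example java.util.regex.Pattern.matcher) without first copying the buffer into a String.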
 

Enum Summary
DocType Enum for different types of documents.
 

Exception Summary
SynchronisationException