gate.corpora
Class TikaFormat

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractLanguageResource
              extended by gate.DocumentFormat
                  extended by gate.corpora.TikaFormat
All Implemented Interfaces:
LanguageResource, Resource, FeatureBearer, NameBearer, Serializable

@CreoleResource(name="Apache Tika Document Format",
                isPrivate=true,
                autoinstances=)
public class TikaFormat
extends DocumentFormat

See Also:
Serialized Form

Field Summary
 
Fields inherited from class gate.DocumentFormat
element2StringMap, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, suffixes2mimeTypeMap
 
Fields inherited from class gate.creole.AbstractLanguageResource
dataStore, lrPersistentId
 
Fields inherited from class gate.creole.AbstractResource
name
 
Constructor Summary
TikaFormat()
           
 
Method Summary
 Resource init()
          Initialise this resource, and return it.
 Boolean supportsRepositioning()
          Returns true if this document format can collect repositioning information during the unpack phase.
 void unpackMarkup(Document doc)
          Unpack the markup in the document.
 void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo)
           
 
Methods inherited from class gate.DocumentFormat
addStatusListener, areEqual, decideBetweenThreeMimeTypes, decideBetweenTwoMimeTypes, fireStatusChanged, getDocumentFormat, getDocumentFormat, getDocumentFormat, getElement2StringMap, getFeatures, getMarkupElementsMap, getMimeType, getMimeTypeForString, getShouldCollectRepositioning, getSupportedFileSuffixes, guessTypeUsingMagicNumbers, removeStatusListener, runMagicNumbers, setElement2StringMap, setFeatures, setMarkupElementsMap, setMimeType, setShouldCollectRepositioning, unpackMarkup
 
Methods inherited from class gate.creole.AbstractLanguageResource
cleanup, getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.LanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 

Constructor Detail

TikaFormat

public TikaFormat()

Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Description copied from class: AbstractResource
Initialise this resource, and return it.

Specified by:
init in interface Resource
Overrides:
init in class AbstractResource
Throws:
ResourceInstantiationException

supportsRepositioning

public Boolean supportsRepositioning()
Description copied from class: DocumentFormat
Returns true if this document format can collect repositioning information during the unpack phase.
A subclass of DocumentFormat that can collect repositioning information should override this method.

Overrides:
supportsRepositioning in class DocumentFormat
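
The override pattern described above can be sketched without GATE on the classpath. The types below (BaseFormat, RepositioningFormat) and the helper shouldCollect are hypothetical stand-ins, not GATE classes. Only the method name supportsRepositioning() and its Boolean return type are taken from the signature on this page, and the false default in the base class is an assumption.

```java
// Hypothetical stand-ins for illustration only; these are NOT the GATE classes.
class RepositioningSketch {

    /** Stand-in for a base document format. The false default is an assumption. */
    static class BaseFormat {
        public Boolean supportsRepositioning() {
            return Boolean.FALSE;
        }
    }

    /** Stand-in for a child format that can collect repositioning information. */
    static class RepositioningFormat extends BaseFormat {
        @Override
        public Boolean supportsRepositioning() {
            return Boolean.TRUE;
        }
    }

    /** Caller-side check: request repositioning data only if the format supports it. */
    static boolean shouldCollect(BaseFormat format, boolean userWantsRepositioning) {
        return userWantsRepositioning && format.supportsRepositioning();
    }

    public static void main(String[] args) {
        System.out.println(shouldCollect(new BaseFormat(), true));
        System.out.println(shouldCollect(new RepositioningFormat(), true));
    }
}
```

A caller would typically test the flag before passing RepositioningInfo objects to the three-argument unpackMarkup; whether GATE itself performs the check this way is not stated on this page.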

unpackMarkup

public void unpackMarkup(Document doc)
                  throws DocumentFormatException
Description copied from class: DocumentFormat
Unpack the markup in the document. This converts markup from the native format (e.g. XML, RTF) into annotations in GATE format. It uses the markupElementsMap to determine which elements to convert and which annotation type names to use.

Specified by:
unpackMarkup in class DocumentFormat
Throws:
DocumentFormatException
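
The role of markupElementsMap in the description above can be illustrated with a minimal, self-contained sketch. It does not call the GATE API: Element, Annotation and convert are hypothetical names, and the rule that unmapped elements are skipped is an assumption drawn from the phrase "which elements to convert", not confirmed behaviour of TikaFormat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration; this does not use the GATE API.
class MarkupMapSketch {

    /** A markup element found in the native format, with character offsets. */
    static final class Element {
        final String name;
        final int start;
        final int end;

        Element(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** A simplified annotation: a type name plus the span it covers. */
    static final class Annotation {
        final String type;
        final int start;
        final int end;

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /**
     * Converts only the elements whose names appear as keys in the map,
     * using the mapped value as the annotation type name.
     * Skipping unmapped elements is an assumption of this sketch.
     */
    static List<Annotation> convert(List<Element> elements,
                                    Map<String, String> markupElementsMap) {
        List<Annotation> result = new ArrayList<>();
        for (Element e : elements) {
            String type = markupElementsMap.get(e.name);
            if (type != null) {
                result.add(new Annotation(type, e.start, e.end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("p", "Paragraph");
        map.put("h1", "Heading");

        List<Element> elements = Arrays.asList(
                new Element("h1", 0, 5),
                new Element("span", 6, 9),   // not in the map, so skipped here
                new Element("p", 6, 20));

        System.out.println(convert(elements, map));
    }
}
```

In this sketch the map acts both as a filter (its keys select elements) and as a renaming table (its values supply annotation type names), which mirrors the two roles the description assigns to markupElementsMap.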

unpackMarkup

public void unpackMarkup(Document doc,
                         RepositioningInfo repInfo,
                         RepositioningInfo ampCodingInfo)
                  throws DocumentFormatException
Specified by:
unpackMarkup in class DocumentFormat
Throws:
DocumentFormatException