gate.creole.tokeniser
Class DefaultTokeniser

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractProcessingResource
              extended by gate.creole.AbstractLanguageAnalyser
                  extended by gate.creole.tokeniser.DefaultTokeniser
All Implemented Interfaces:
ANNIEConstants, Executable, LanguageAnalyser, ProcessingResource, Resource, Benchmarkable, FeatureBearer, NameBearer, Serializable

public class DefaultTokeniser
extends AbstractLanguageAnalyser
implements Benchmarkable

A composed tokeniser containing a SimpleTokeniser and a Transducer. The simple tokeniser tokenises the document and the transducer processes its output.
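
The two-stage composition described above can be sketched without any GATE dependency. Everything in the block below (SketchToken, ComposedTokeniserSketch, the ASCII-only pattern and the apostrophe-merging rule) is a hypothetical stand-in chosen for illustration. It is not the rule set or grammar that SimpleTokeniser or Transducer actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a token annotation; not a GATE type.
final class SketchToken {
    final String text;
    final String kind; // "word", "number" or "punctuation"

    SketchToken(String text, String kind) {
        this.text = text;
        this.kind = kind;
    }

    @Override
    public String toString() {
        return kind + ":" + text;
    }
}

// Hypothetical two-stage tokeniser: stage one splits, stage two post-processes.
final class ComposedTokeniserSketch {
    // ASCII-only assumption: a run of letters, a run of digits,
    // or any other single non-whitespace character.
    private static final Pattern PIECE = Pattern.compile("[A-Za-z]+|[0-9]+|\\S");

    // Stage one: plays the role of the simple tokeniser.
    static List<SketchToken> simpleStage(String text) {
        List<SketchToken> out = new ArrayList<>();
        Matcher m = PIECE.matcher(text);
        while (m.find()) {
            String piece = m.group();
            String kind = piece.matches("[A-Za-z]+") ? "word"
                    : piece.matches("[0-9]+") ? "number"
                    : "punctuation";
            out.add(new SketchToken(piece, kind));
        }
        return out;
    }

    // Stage two: plays the role of the transducer by rewriting stage-one output.
    // Illustrative rule only: word ' word is merged into a single word token.
    static List<SketchToken> postStage(List<SketchToken> in) {
        List<SketchToken> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            boolean mergeable = i + 2 < in.size()
                    && in.get(i).kind.equals("word")
                    && in.get(i + 1).text.equals("'")
                    && in.get(i + 2).kind.equals("word");
            if (mergeable) {
                out.add(new SketchToken(in.get(i).text + "'" + in.get(i + 2).text, "word"));
                i += 3;
            } else {
                out.add(in.get(i));
                i++;
            }
        }
        return out;
    }

    // The composed resource runs stage one, then hands its output to stage two.
    static List<SketchToken> run(String text) {
        return postStage(simpleStage(text));
    }

    public static void main(String[] args) {
        // prints [word:I, word:don't, word:know, number:42]
        System.out.println(run("I don't know 42"));
    }
}
```

Keeping the two stages separate means the post-processing rules can change without touching the basic splitting step.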

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource
AbstractProcessingResource.InternalStatusListener, AbstractProcessingResource.IntervalProgressListener
 
Field Summary
private  String annotationSetName
           
private  String benchmarkId
           
private static boolean DEBUG
           
static String DEF_TOK_ANNOT_SET_PARAMETER_NAME
           
static String DEF_TOK_DOCUMENT_PARAMETER_NAME
           
static String DEF_TOK_ENCODING_PARAMETER_NAME
           
static String DEF_TOK_GRAMRULES_URL_PARAMETER_NAME
           
static String DEF_TOK_TOKRULES_URL_PARAMETER_NAME
           
private  String encoding
           
protected  SimpleTokeniser tokeniser
          the simple tokeniser used for tokenisation
private  URL tokeniserRulesURL
           
protected  Transducer transducer
          the transducer used for post-processing
private  URL transducerGrammarURL
           
 
Fields inherited from class gate.creole.AbstractLanguageAnalyser
corpus, document
 
Fields inherited from class gate.creole.AbstractProcessingResource
interrupted
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.creole.ANNIEConstants
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DATE_POSTED_ANNOTATION_TYPE, DEFAULT_FILE, DOCUMENT_COREF_FEATURE_NAME, JOB_ID_ANNOTATION_TYPE, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_INSTANCE_FEATURE_NAME, LOOKUP_LANGUAGE_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PLUGIN_DIR, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME
 
Constructor Summary
DefaultTokeniser()
           
 
Method Summary
 void cleanup()
          Should clear all internal data of the resource.
 void execute()
          Run the resource.
 String getAnnotationSetName()
           
 String getBenchmarkId()
          Returns the benchmark ID of this resource.
 String getEncoding()
           
 URL getTokeniserRulesURL()
           
 URL getTransducerGrammarURL()
           
 Resource init()
          Initialise this resource, and return it.
 void interrupt()
          Notifies the PRs wrapped by this tokeniser (the simple tokeniser and the transducer) that they should stop their execution as soon as possible.
 void setAnnotationSetName(String annotationSetName)
           
 void setBenchmarkId(String benchmarkId)
          This method sets the benchmarkID for this resource.
 void setEncoding(String encoding)
           
 void setTokeniserRulesURL(URL tokeniserRulesURL)
           
 void setTransducerGrammarURL(URL transducerGrammarURL)
           
 
Methods inherited from class gate.creole.AbstractLanguageAnalyser
getCorpus, getDocument, setCorpus, setDocument
 
Methods inherited from class gate.creole.AbstractProcessingResource
addProgressListener, addStatusListener, fireProcessFinished, fireProgressChanged, fireStatusChanged, isInterrupted, reInit, removeProgressListener, removeStatusListener
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.ProcessingResource
reInit
 
Methods inherited from interface gate.Resource
getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 
Methods inherited from interface gate.Executable
isInterrupted
 

Field Detail

DEF_TOK_DOCUMENT_PARAMETER_NAME

public static final String DEF_TOK_DOCUMENT_PARAMETER_NAME
See Also:
Constant Field Values

DEF_TOK_ANNOT_SET_PARAMETER_NAME

public static final String DEF_TOK_ANNOT_SET_PARAMETER_NAME
See Also:
Constant Field Values

DEF_TOK_TOKRULES_URL_PARAMETER_NAME

public static final String DEF_TOK_TOKRULES_URL_PARAMETER_NAME
See Also:
Constant Field Values

DEF_TOK_GRAMRULES_URL_PARAMETER_NAME

public static final String DEF_TOK_GRAMRULES_URL_PARAMETER_NAME
See Also:
Constant Field Values

DEF_TOK_ENCODING_PARAMETER_NAME

public static final String DEF_TOK_ENCODING_PARAMETER_NAME
See Also:
Constant Field Values

DEBUG

private static final boolean DEBUG
See Also:
Constant Field Values

tokeniser

protected SimpleTokeniser tokeniser
the simple tokeniser used for tokenisation


transducer

protected Transducer transducer
the transducer used for post-processing


tokeniserRulesURL

private URL tokeniserRulesURL

encoding

private String encoding

transducerGrammarURL

private URL transducerGrammarURL

annotationSetName

private String annotationSetName

benchmarkId

private String benchmarkId

Constructor Detail

DefaultTokeniser

public DefaultTokeniser()

Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Initialise this resource, and return it.

Specified by:
init in interface Resource
Overrides:
init in class AbstractProcessingResource
Throws:
ResourceInstantiationException

cleanup

public void cleanup()
Description copied from class: AbstractProcessingResource
Should clear all internal data of the resource. Does nothing now.

Specified by:
cleanup in interface Resource
Overrides:
cleanup in class AbstractProcessingResource

execute

public void execute()
             throws ExecutionException
Description copied from class: AbstractProcessingResource
Run the resource. Subclasses are expected to override this method, so the default implementation signals an exception.

Specified by:
execute in interface Executable
Overrides:
execute in class AbstractProcessingResource
Throws:
ExecutionException

interrupt

public void interrupt()
Notifies the PRs wrapped by this tokeniser (the simple tokeniser and the transducer) that they should stop their execution as soon as possible.

Specified by:
interrupt in interface Executable
Overrides:
interrupt in class AbstractProcessingResource
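
Because this class wraps two components, one plausible way to honour the interrupt contract is to set a local flag and forward the request to each wrapped component. The sketch below uses hypothetical stand-in types (InterruptibleSketch, StageSketch, ComposedInterruptSketch). It is not the GATE source and only illustrates the forwarding pattern.

```java
// Hypothetical minimal contract; stands in for the interrupt-related
// part of an executable resource.
interface InterruptibleSketch {
    void interrupt();
    boolean isInterrupted();
}

// Hypothetical wrapped stage that records whether it was asked to stop.
class StageSketch implements InterruptibleSketch {
    // volatile so that a flag set from another thread is visible to the worker
    private volatile boolean interrupted;

    @Override
    public void interrupt() {
        interrupted = true;
    }

    @Override
    public boolean isInterrupted() {
        return interrupted;
    }
}

// Hypothetical composed resource: interrupting it also interrupts both stages.
final class ComposedInterruptSketch implements InterruptibleSketch {
    final StageSketch first = new StageSketch();
    final StageSketch second = new StageSketch();
    private volatile boolean interrupted;

    @Override
    public void interrupt() {
        interrupted = true;   // stop this resource
        first.interrupt();    // forward to the first wrapped stage
        second.interrupt();   // forward to the second wrapped stage
    }

    @Override
    public boolean isInterrupted() {
        return interrupted;
    }

    public static void main(String[] args) {
        ComposedInterruptSketch composed = new ComposedInterruptSketch();
        composed.interrupt();
        System.out.println(composed.first.isInterrupted() && composed.second.isInterrupted());
    }
}
```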

setTokeniserRulesURL

public void setTokeniserRulesURL(URL tokeniserRulesURL)

getTokeniserRulesURL

public URL getTokeniserRulesURL()

setEncoding

public void setEncoding(String encoding)

getEncoding

public String getEncoding()
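
The pairing of a rules URL with an encoding parameter suggests that the rules are read as text from the URL using the named character set. A minimal sketch of that reading step, using only the Java standard library, is shown below. RulesLoaderSketch and readLines are hypothetical names, and the sketch makes no claim about how the tokeniser parses its rules.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a text resource from a URL with an explicit encoding.
final class RulesLoaderSketch {
    static List<String> readLines(URL rulesUrl, String encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        // Charset.forName throws an unchecked exception for an unknown encoding name,
        // so a misconfigured encoding fails early instead of silently garbling the text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(rulesUrl.openStream(), Charset.forName(encoding)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Passing the encoding explicitly avoids depending on the platform default character set, which can differ between machines.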

setTransducerGrammarURL

public void setTransducerGrammarURL(URL transducerGrammarURL)

getTransducerGrammarURL

public URL getTransducerGrammarURL()

setAnnotationSetName

public void setAnnotationSetName(String annotationSetName)

getAnnotationSetName

public String getAnnotationSetName()

setBenchmarkId

public void setBenchmarkId(String benchmarkId)
Description copied from interface: Benchmarkable
This method sets the benchmarkID for this resource. The resource must use this as the prefix for any sub-events it logs.

Specified by:
setBenchmarkId in interface Benchmarkable
Parameters:
benchmarkId - the benchmark ID. It must not contain spaces, because the space character is already used as a separator in the log; Benchmark.createBenchmarkId(String, String) can be used to build one.
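
The two constraints above (no spaces in the ID, and the ID used as a prefix for sub-events) can be illustrated with a small stand-alone helper. BenchmarkIdSketch is hypothetical, and both the underscore replacement and the "." separator are assumptions made for this sketch. It is not the implementation of Benchmark.createBenchmarkId(String, String).

```java
// Hypothetical helper for building space-free, prefix-based benchmark IDs.
final class BenchmarkIdSketch {
    // Assumption: whitespace runs are replaced with a single underscore.
    static String sanitise(String part) {
        return part.trim().replaceAll("\\s+", "_");
    }

    // Assumption: parent and child IDs are joined with a dot.
    static String join(String parentId, String name) {
        String child = sanitise(name);
        if (parentId == null || parentId.trim().isEmpty()) {
            return child;
        }
        return sanitise(parentId) + "." + child;
    }

    public static void main(String[] args) {
        String resourceId = join("doc 1", "default tokeniser");
        // A sub-event ID reuses the resource's benchmark ID as its prefix.
        String subEventId = join(resourceId, "simple stage");
        System.out.println(resourceId);  // prints doc_1.default_tokeniser
        System.out.println(subEventId);  // prints doc_1.default_tokeniser.simple_stage
    }
}
```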

getBenchmarkId

public String getBenchmarkId()
Description copied from interface: Benchmarkable
Returns the benchmark ID of this resource.

Specified by:
getBenchmarkId in interface Benchmarkable
Returns: