gate.creole.annic.apache.lucene.analysis.standard
Class StandardTokenizer
java.lang.Object
gate.creole.annic.apache.lucene.analysis.TokenStream
gate.creole.annic.apache.lucene.analysis.Tokenizer
gate.creole.annic.apache.lucene.analysis.standard.StandardTokenizer
- All Implemented Interfaces:
- StandardTokenizerConstants
public class StandardTokenizer
- extends Tokenizer
- implements StandardTokenizerConstants
A grammar-based tokenizer constructed with JavaCC.
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does
not suit your application, consider copying this source code directory
into your project and maintaining your own grammar-based tokenizer.
Fields inherited from class gate.creole.annic.apache.lucene.analysis.Tokenizer
- input
Fields inherited from interface gate.creole.annic.apache.lucene.analysis.standard.StandardTokenizerConstants
- ACRONYM, ALPHA, ALPHANUM, APOSTROPHE, CJK, COMPANY, DEFAULT, DIGIT, EMAIL, EOF, HAS_DIGIT, HOST, LETTER, NOISE, NUM, P, tokenImage
Methods inherited from class gate.creole.annic.apache.lucene.analysis.Tokenizer
- close
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
token_source
public StandardTokenizerTokenManager token_source
token
public Token token
jj_nt
public Token jj_nt
jj_ntk
private int jj_ntk
jj_gen
private int jj_gen
jj_la1
private final int[] jj_la1
jj_la1_0
private static int[] jj_la1_0
jj_expentries
private Vector jj_expentries
jj_expentry
private int[] jj_expentry
jj_kind
private int jj_kind
StandardTokenizer
public StandardTokenizer(Reader reader)
- Constructs a tokenizer for this Reader.
StandardTokenizer
public StandardTokenizer(CharStream stream)
StandardTokenizer
public StandardTokenizer(StandardTokenizerTokenManager tm)
next
public final Token next() throws ParseException, IOException
- Returns the next token in the stream, or null at the end of the stream (EOS).
The type of the returned token is set to an element of StandardTokenizerConstants.tokenImage.
- Specified by:
next in class TokenStream
- Throws:
ParseException
IOException
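The contract of next() (one token per call, null at the end of the stream) suggests a simple drain loop. The sketch below illustrates that loop only. So that it compiles without GATE on the classpath, it uses two hypothetical stand-ins, `Tok` and `WhitespaceTokens`, which are not part of this package. The stand-in splits on whitespace rather than applying the JavaCC grammar, and it labels every token `<ALPHANUM>` as a placeholder for an entry of tokenImage.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DrainTokensSketch {

    /** Hypothetical stand-in for Token: only a term text and a type label. */
    static final class Tok {
        final String text;
        final String type;

        Tok(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /**
     * Hypothetical stand-in for StandardTokenizer. It splits on whitespace,
     * which is NOT the JavaCC grammar; it only mirrors the next() contract.
     */
    static final class WhitespaceTokens {
        private final Reader input;

        WhitespaceTokens(Reader reader) {
            this.input = reader;
        }

        /** Returns the next token, or null at the end of the stream. */
        Tok next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // whitespace after some text ends the token
                    }
                    // leading whitespace is skipped
                } else {
                    sb.append((char) c);
                }
            }
            // Placeholder type label; the real class assigns a tokenImage entry.
            return sb.length() == 0 ? null : new Tok(sb.toString(), "<ALPHANUM>");
        }
    }

    /** Drains a stream by calling next() until it returns null. */
    static List<String> drain(String text) {
        WhitespaceTokens stream = new WhitespaceTokens(new StringReader(text));
        List<String> terms = new ArrayList<>();
        try {
            for (Tok t = stream.next(); t != null; t = stream.next()) {
                terms.add(t.text);
            }
        } catch (IOException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(drain("A grammar-based tokenizer"));
    }
}
```

With the real class, the same loop would presumably be driven by a tokenizer built with `new StandardTokenizer(reader)`, and the caller would need to handle ParseException as well as IOException, as declared above.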
jj_la1_0
private static void jj_la1_0()
ReInit
public void ReInit(CharStream stream)
ReInit
public void ReInit(StandardTokenizerTokenManager tm)
jj_consume_token
private final Token jj_consume_token(int kind) throws ParseException
- Throws:
ParseException
getNextToken
public final Token getNextToken()
getToken
public final Token getToken(int index)
jj_ntk
private final int jj_ntk()
generateParseException
public ParseException generateParseException()
enable_tracing
public final void enable_tracing()
disable_tracing
public final void disable_tracing()