java.lang.Object
  gate.creole.annic.lucene.LuceneDocument
public class LuceneDocument
Converts a GATE Document into a format that Lucene can understand and store in its indexes. The class also writes the token stream to disk so that it can be retrieved at search time.
| Nested Class Summary | | |
|---|---|---|
| private class | LuceneDocument.OffsetGroup | Internal class used for storing the offsets of annotations. |
| Constructor Summary |
|---|
| LuceneDocument() |
| Method Summary | | |
|---|---|---|
| List<Document> | createDocuments(String corpusPersistenceID, Document gateDoc, String documentID, ArrayList<String> annotSetsToInclude, ArrayList<String> annotSetsToExclude, ArrayList<String> featuresToInclude, ArrayList<String> featuresToExclude, String indexLocation, String baseTokenAnnotationType, Boolean createTokensAutomatically, String indexUnitAnnotationType) | Converts the given GATE Document into a format that Lucene can understand and store in its indexes. |
| private boolean | createTokens(Document gateDocument, AnnotationSet set) | |
| private String | getCompatibleName(String name) | Some file names are not compatible with the underlying file system. |
| private ArrayList<Token>[] | getTokens(Document document, AnnotationSet inputAs, ArrayList<String> featuresToInclude, ArrayList<String> featuresToExclude, String baseTokenAnnotationType, AnnotationSet baseTokenSet, String indexUnitAnnotationType, AnnotationSet indexUnitSet, Set<String> indexedFeatures) | For each annotation of type indexUnitAnnotationType in the given GATE document, creates a separate list of the base tokens that lie within it. |
| private void | writeOnDisk(ArrayList tokenStream, String folderName, String fileName, String location) | Given a token stream and a file name, writes the token stream to the provided location. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public LuceneDocument()
| Method Detail |
|---|
public List<Document> createDocuments(String corpusPersistenceID,
Document gateDoc,
String documentID,
ArrayList<String> annotSetsToInclude,
ArrayList<String> annotSetsToExclude,
ArrayList<String> featuresToInclude,
ArrayList<String> featuresToExclude,
String indexLocation,
String baseTokenAnnotationType,
Boolean createTokensAutomatically,
String indexUnitAnnotationType)
Parameters: corpusPersistenceID, gateDoc, documentID, annotSet, featuresToExclude, indexLocation, baseTokenAnnotationType, indexUnitAnnotationType
private boolean createTokens(Document gateDocument,
AnnotationSet set)
private String getCompatibleName(String name)
Parameters: name
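The method summary notes that some file names are not compatible with the underlying file system. The following is a minimal sketch of one way such a name could be made safe. The class `CompatibleNameSketch`, the method `toCompatibleName`, and the set of replaced characters are all assumptions made for illustration; this page does not describe how getCompatibleName actually transforms its input.

```java
/** Illustrative only: not the implementation used by LuceneDocument. */
final class CompatibleNameSketch {
    private CompatibleNameSketch() {}

    /**
     * Replaces characters that common file systems reject, plus whitespace,
     * with an underscore. The character set is an assumption.
     */
    static String toCompatibleName(String name) {
        // Character class: backslash, slash, colon, asterisk, question mark,
        // double quote, angle brackets, pipe, and any whitespace.
        return name.replaceAll("[\\\\/:*?\"<>|\\s]", "_");
    }
}
```

With these assumptions, `CompatibleNameSketch.toCompatibleName("corpus:doc/1?.xml")` yields `corpus_doc_1_.xml`.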
private void writeOnDisk(ArrayList tokenStream,
String folderName,
String fileName,
String location)
throws Exception
Parameters: tokenStream, fileName, location
Throws: Exception
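The class description says the token stream is stored on disk so that it can be retrieved at search time. A rough sketch of that store-and-reload round trip is given here, assuming standard Java object serialization and a `location/folderName/fileName` layout. The class `TokenStreamStoreSketch`, its `write` and `read` methods, the use of `String` tokens, and the on-disk format are hypothetical; the format writeOnDisk really uses is not stated on this page.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;

/** Illustrative only: not the implementation used by LuceneDocument. */
final class TokenStreamStoreSketch {
    private TokenStreamStoreSketch() {}

    /** Serializes the token list to location/folderName/fileName and returns that path. */
    static Path write(ArrayList<String> tokenStream, String folderName,
                      String fileName, String location) throws IOException {
        Path folder = Paths.get(location, folderName);
        Files.createDirectories(folder); // create the folder if it is missing
        Path file = folder.resolve(fileName);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(tokenStream);
        }
        return file;
    }

    /** Reads back a token list previously stored by write. */
    @SuppressWarnings("unchecked")
    static ArrayList<String> read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ArrayList<String>) in.readObject();
        }
    }
}
```

Returning the written path is a design choice of the sketch so that a caller can reload the same file later; writeOnDisk itself returns void.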
private ArrayList<Token>[] getTokens(Document document,
AnnotationSet inputAs,
ArrayList<String> featuresToInclude,
ArrayList<String> featuresToExclude,
String baseTokenAnnotationType,
AnnotationSet baseTokenSet,
String indexUnitAnnotationType,
AnnotationSet indexUnitSet,
Set<String> indexedFeatures)
Parameters: document, inputAs, featuresToExclude, baseTokenAnnotationType, indexUnitAnnotationType
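According to the method summary, getTokens builds a separate list of base tokens for each annotation of type indexUnitAnnotationType. The idea can be sketched without GATE by treating annotations as plain offset spans and grouping tokens by containment. The `Span` class, the `groupByUnit` method, and the containment rule (a token belongs to a unit when its offsets lie fully inside the unit's offsets) are assumptions for illustration, not the GATE API or the actual logic of getTokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not the implementation used by LuceneDocument. */
final class TokenGroupingSketch {
    private TokenGroupingSketch() {}

    /** Hypothetical stand-in for an annotation: a label plus start and end offsets. */
    static final class Span {
        final String text;
        final long start;
        final long end;

        Span(String text, long start, long end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one list per unit, holding the tokens whose offsets fall inside that unit. */
    static List<List<Span>> groupByUnit(List<Span> units, List<Span> tokens) {
        List<List<Span>> result = new ArrayList<>();
        for (Span unit : units) {
            List<Span> inside = new ArrayList<>();
            for (Span token : tokens) {
                // Assumed rule: the token must be fully contained in the unit.
                if (token.start >= unit.start && token.end <= unit.end) {
                    inside.add(token);
                }
            }
            result.add(inside);
        }
        return result;
    }
}
```

For example, with units covering offsets 0 to 7 and 8 to 11, and tokens at 0 to 3, 4 to 7, and 8 to 11, the first list holds two tokens and the second holds one.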