gate.creole.annic.lucene
Class LuceneIndexer

java.lang.Object
  extended by gate.creole.annic.lucene.LuceneIndexer
All Implemented Interfaces:
Indexer

public class LuceneIndexer
extends Object
implements Indexer

This class provides a Lucene-based implementation of the Indexer interface. It takes the required parameters from the caller and creates the Lucene index.

Author:
niraj

Field Summary
protected  Corpus corpus
          The corpus to be indexed.
protected  boolean DEBUG
           
protected  Map parameters
          Index parameters, such as the location of the index.
 
Constructor Summary
LuceneIndexer(URL indexLocationUrl)
          Constructor
 
Method Summary
 void add(String corpusPersistenceID, List<Document> added)
          Adds new documents to the index.
protected  void checkIndexParameters(Map parameters)
          Checks whether the index parameters are all compatible.
 void createIndex(Map indexParameters)
          Creates the index directory and indexes all documents in the corpus.
 void deleteIndex()
          Deletes the index.
private  String getCompatibleName(String name)
           
 Corpus getCorpus()
          Returns the corpus.
protected  Map getIndexParameters()
          Returns the indexing parameters.
private  List<Document> getLuceneDocuments(String corpusPersistenceID, Document gateDoc, String location)
          Creates a separate Lucene document for each index unit available in the GATE document.
 Set<String> getNamesOfSerializedFiles(String documentID)
          Returns the names of the annotation sets that are indexed.
 Map getParameters()
          Returns the parameters that have been set.
 void optimizeIndex()
          Optimizes the existing index.
private  void readParametersFromDisk(URL indexLocationUrl)
          Searches for the LuceneIndexDefinition.xml file at the provided location.
 void remove(List removedIDs)
          Removes documents from the index.
 void setCorpus(Corpus corpus)
          Sets the corpus.
private  void writeParametersToDisk()
          Stores all index parameters on disk in the file index_location_url/LuceneIndexDefinition.xml.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEBUG

protected boolean DEBUG

corpus

protected Corpus corpus
The corpus to be indexed.


parameters

protected Map parameters
Index parameters, such as the location of the index.

Constructor Detail

LuceneIndexer

public LuceneIndexer(URL indexLocationUrl)
              throws IOException
Constructor

Parameters:
indexLocationUrl - the URL of the index location
Throws:
IOException
Method Detail

checkIndexParameters

protected void checkIndexParameters(Map parameters)
                             throws IndexException
Checks whether the index parameters are all compatible.

Throws:
IndexException

getIndexParameters

protected Map getIndexParameters()
Returns the indexing parameters.

Returns:
the indexing parameters

createIndex

public void createIndex(Map indexParameters)
                 throws IndexException
Creates the index directory and indexes all documents in the corpus.

Specified by:
createIndex in interface Indexer
Parameters:
indexParameters - a map containing the values required to create an index. For LuceneIndexManager, the following values are required:

INDEX_LOCATION_URL - the URL where the index will be created

BASE_TOKEN_ANNOTATION_TYPE

INDEX_UNIT_ANNOTATION_TYPE

FEATURES_TO_EXCLUDE

FEATURES_TO_INCLUDE

Throws:
IndexException
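
As a rough illustration of the parameter map described above, the following sketch assembles the five entries using plain Java collections. The string keys simply mirror the names listed above. The value types (a URL for the location, strings for the annotation types, lists of strings for the feature filters) and the sample values "Token" and "Sentence" are assumptions, not something this page specifies. The class and method names are hypothetical.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class IndexParametersSketch {

    /** Builds a map holding the five entries listed for createIndex. */
    static Map<String, Object> buildParameters(File indexDir) {
        URL location;
        try {
            location = indexDir.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + indexDir, e);
        }
        Map<String, Object> parameters = new HashMap<>();
        // Keys are plain strings mirroring the documented names (assumption).
        parameters.put("INDEX_LOCATION_URL", location);
        // Sample annotation types; the values are assumptions for illustration.
        parameters.put("BASE_TOKEN_ANNOTATION_TYPE", "Token");
        parameters.put("INDEX_UNIT_ANNOTATION_TYPE", "Sentence");
        // Empty feature filters; the list-of-strings type is an assumption.
        parameters.put("FEATURES_TO_EXCLUDE", new ArrayList<String>());
        parameters.put("FEATURES_TO_INCLUDE", new ArrayList<String>());
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, Object> parameters = buildParameters(new File("annic-index"));
        System.out.println(parameters.keySet());
    }
}
```

A map of this shape would then be passed to createIndex(Map indexParameters). That call is left out so the sketch runs without GATE on the classpath.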

optimizeIndex

public void optimizeIndex()
                   throws IndexException
Optimizes the existing index.

Specified by:
optimizeIndex in interface Indexer
Throws:
IndexException

deleteIndex

public void deleteIndex()
                 throws IndexException
Deletes the index.

Specified by:
deleteIndex in interface Indexer
Throws:
IndexException

add

public void add(String corpusPersistenceID,
                List<Document> added)
         throws IndexException
Adds new documents to the index.

Specified by:
add in interface Indexer
Parameters:
corpusPersistenceID - the persistence ID of the corpus
added - the documents to add to the index
Throws:
IndexException

getCompatibleName

private String getCompatibleName(String name)

remove

public void remove(List removedIDs)
            throws IndexException
Removes documents from the index.

Specified by:
remove in interface Indexer
Parameters:
removedIDs - the persistence IDs of the documents to remove. When documents are not persisted, persistence IDs are not available; in that case, provide the document names instead of their IDs.
Throws:
Exception
IndexException
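
The fallback rule described above can be sketched as follows. The DocRef class is a hypothetical stand-in for a GATE document, with a persistence ID that is null when the document has not been persisted. It is not part of the GATE API, and the identifiers used are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemovalIdsSketch {

    /** Hypothetical stand-in for a document: a name plus an optional persistence ID. */
    static final class DocRef {
        final String name;
        final Object persistenceId; // null when the document was never persisted

        DocRef(String name, Object persistenceId) {
            this.name = name;
            this.persistenceId = persistenceId;
        }
    }

    /** Collects one identifier per document: the persistence ID if present, else the name. */
    static List<Object> identifiersFor(List<DocRef> docs) {
        List<Object> ids = new ArrayList<>();
        for (DocRef doc : docs) {
            ids.add(doc.persistenceId != null ? doc.persistenceId : doc.name);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<DocRef> docs = Arrays.asList(
                new DocRef("report.xml", "doc-0001"),
                new DocRef("notes.txt", null));
        System.out.println(identifiersFor(docs)); // prints [doc-0001, notes.txt]
    }
}
```

The resulting list has the shape expected by remove(List): one entry per document, with names standing in wherever no persistence ID exists.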

getLuceneDocuments

private List<Document> getLuceneDocuments(String corpusPersistenceID,
                                          Document gateDoc,
                                          String location)
                                   throws IndexException
Creates a separate Lucene document for each index unit available in the GATE document. The method returns a list of Lucene documents. It uses the indexing parameters set earlier.

Parameters:
corpusPersistenceID -
gateDoc -
location -
Returns:
a list of Lucene documents, one for each index unit
Throws:
IndexException
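
The one-document-per-index-unit idea can be illustrated without Lucene or GATE. In the sketch below, Span is a hypothetical stand-in for an annotation, the index units are sentence spans, and each resulting group of tokens plays the role of one Lucene document. How the real method builds its documents is not shown on this page, so the containment rule used here is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexUnitSketch {

    /** Hypothetical annotation: a text span given by character offsets [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one group of tokens per index unit; a token joins a unit that fully contains it. */
    static List<List<Span>> tokensPerUnit(List<Span> units, List<Span> tokens) {
        List<List<Span>> result = new ArrayList<>();
        for (Span unit : units) {
            List<Span> inside = new ArrayList<>();
            for (Span token : tokens) {
                if (token.start >= unit.start && token.end <= unit.end) {
                    inside.add(token);
                }
            }
            result.add(inside); // one entry per unit, even if it holds no tokens
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> sentences = Arrays.asList(new Span(0, 10), new Span(11, 25));
        List<Span> tokens = Arrays.asList(
                new Span(0, 4), new Span(5, 10), new Span(11, 18), new Span(19, 25));
        List<List<Span>> groups = tokensPerUnit(sentences, tokens);
        System.out.println(groups.size());        // prints 2
        System.out.println(groups.get(0).size()); // prints 2
    }
}
```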

getCorpus

public Corpus getCorpus()
Returns the corpus.

Specified by:
getCorpus in interface Indexer
Returns:
the corpus

setCorpus

public void setCorpus(Corpus corpus)
               throws IndexException
Sets the corpus.

Specified by:
setCorpus in interface Indexer
Throws:
IndexException

readParametersFromDisk

private void readParametersFromDisk(URL indexLocationUrl)
                             throws IOException
Searches for the LuceneIndexDefinition.xml file at the provided location. The file is expected to contain all the required parameters that are used to create an index.

Parameters:
indexLocationUrl - the URL of the location to search
Throws:
IOException

writeParametersToDisk

private void writeParametersToDisk()
                            throws IOException
Stores all index parameters on disk in the file index_location_url/LuceneIndexDefinition.xml.

Throws:
IOException
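
To illustrate the idea of keeping index parameters in an XML file next to the index, the sketch below round-trips a few string parameters through a file named LuceneIndexDefinition.xml. It uses java.util.Properties and its XML format purely as a stand-in. This page does not state which serialization format or library the class actually uses, and the class name, method names, and sample values are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ParameterFileSketch {

    static final String FILE_NAME = "LuceneIndexDefinition.xml";

    /** Writes the parameters as XML into the index directory, creating it if needed. */
    static void write(Path indexDir, Properties parameters) throws IOException {
        Files.createDirectories(indexDir);
        try (OutputStream out = Files.newOutputStream(indexDir.resolve(FILE_NAME))) {
            parameters.storeToXML(out, "index parameters");
        }
    }

    /** Reads the parameters back from the XML file in the index directory. */
    static Properties read(Path indexDir) throws IOException {
        Properties parameters = new Properties();
        try (InputStream in = Files.newInputStream(indexDir.resolve(FILE_NAME))) {
            parameters.loadFromXML(in);
        }
        return parameters;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("annic-index");
        Properties parameters = new Properties();
        parameters.setProperty("BASE_TOKEN_ANNOTATION_TYPE", "Token");
        parameters.setProperty("INDEX_UNIT_ANNOTATION_TYPE", "Sentence");
        write(indexDir, parameters);
        System.out.println(read(indexDir).equals(parameters)); // prints true
    }
}
```

Storing the definition beside the index means the location alone is enough to recover the settings later, which matches the constructor taking only an index location URL.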

getParameters

public Map getParameters()
Returns the parameters that have been set.

Specified by:
getParameters in interface Indexer
Returns:
the parameters that have been set

getNamesOfSerializedFiles

public Set<String> getNamesOfSerializedFiles(String documentID)
                                      throws IndexException
Returns the names of the annotation sets that are indexed.

Returns:
the set of names of the indexed annotation sets
Throws:
IndexException