gate.corpora
Class DocumentXmlUtils

java.lang.Object
  extended by gate.corpora.DocumentXmlUtils

public class DocumentXmlUtils
extends Object

This class contains useful static methods for working with the GATE XML format. Many of these methods were originally in DocumentImpl, but because they are not specific to any one implementation of the Document interface, they have been moved here.


Field Summary
static int DOC_SIZE_MULTIPLICATION_FACTOR
          This field is used when creating StringBuffers for toXml() methods.
static Map entitiesMap
          A map, initialized in init(), containing the entities that need to be replaced in strings.
 
Constructor Summary
DocumentXmlUtils()
           
 
Method Summary
static void annotationSetToXml(AnnotationSet anAnnotationSet, StringBuffer buffer)
          This method saves an AnnotationSet as XML.
static void annotationSetToXml(AnnotationSet anAnnotationSet, String annotationSetNameToUse, StringBuffer buffer)
          This method saves an AnnotationSet as XML.
static void buildEntityMapFromString(String aScanString, TreeMap aMapToFill)
          This method takes aScanString and searches for those chars from entitiesMap that appear in the string.
static StringBuffer combinedNormalisation(String inputString)
          Combines replaceCharsWithEntities and filterNonXmlChars in a single method
static StringBuffer featuresToXml(FeatureMap aFeatureMap, Map normalizedFeatureNames)
          This method saves a FeatureMap as XML elements.
static StringBuffer filterNonXmlChars(StringBuffer aStrBuffer)
          This method replaces every non-XML character with 0x20 (the space character), so that the document can be loaded again without problems. See: http://www.w3c.org/TR/2000/REC-xml-20001006#charsets
static boolean isXmlChar(char ch)
          This method decides whether a char is a valid XML character.
static StringBuffer replaceCharsWithEntities(String anInputString)
          This method replaces every character of anInputString that also appears in entitiesMap with its corresponding entity.
static String textWithNodes(TextualDocument doc, String aText)
          Returns the document's text interspersed with <Node> elements at all points where the document has an annotation beginning or ending.
static String toXml(TextualDocument doc)
          Returns a GateXml document, a custom XML format for which GATE provides a reader called gate.xml.GateFormatXmlHandler.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DOC_SIZE_MULTIPLICATION_FACTOR

public static final int DOC_SIZE_MULTIPLICATION_FACTOR
This field is used when creating StringBuffers for toXml() methods. The size of the StringBuffer will be docContent.size() multiplied by this value. Pre-sizing the buffer in this way is intended to improve StringBuffer performance.

See Also:
Constant Field Values
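
The pre-sizing idea described for this field can be illustrated with a minimal sketch. The class name BufferSizingSketch, the method newBufferFor and the factor value 4 are all hypothetical and chosen only for illustration; the real value is the one listed under Constant Field Values.

```java
class BufferSizingSketch {
    // Hypothetical factor for illustration only; not the value used by GATE.
    static final int SIZE_FACTOR = 4;

    // Reserve capacity up front so that appends made while serializing
    // rarely force the buffer to grow and copy its contents.
    static StringBuffer newBufferFor(String docContent) {
        return new StringBuffer(docContent.length() * SIZE_FACTOR);
    }
}
```

The returned buffer is empty; only its initial capacity depends on the document length.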

entitiesMap

public static Map entitiesMap
A map, initialized in init(), containing the entities that need to be replaced in strings.

Constructor Detail

DocumentXmlUtils

public DocumentXmlUtils()
Method Detail

toXml

public static String toXml(TextualDocument doc)
Returns a GateXml document, a custom XML format for which GATE provides a reader called gate.xml.GateFormatXmlHandler. In other words, this method serializes a GATE document to XML.

Parameters:
doc - the document to serialize.
Returns:
a string representing a Gate Xml document.

featuresToXml

public static StringBuffer featuresToXml(FeatureMap aFeatureMap,
                                         Map normalizedFeatureNames)
This method saves a FeatureMap as XML elements.

Parameters:
aFeatureMap - the feature map that has to be saved as XML.
Returns:
a String like this: ... ......

combinedNormalisation

public static StringBuffer combinedNormalisation(String inputString)
Combines replaceCharsWithEntities and filterNonXmlChars in a single method
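
As a rough illustration of what combining the two steps in one pass could look like, here is a minimal sketch. It is not the GATE implementation: the class name, the entity table (the five predefined XML entities) and the handling of a null input are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CombinedNormalisationSketch {
    // Assumed entity table: the five predefined XML entities.
    static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('&', "&amp;");
        ENTITIES.put('\'', "&apos;");
        ENTITIES.put('"', "&quot;");
    }

    // XML 1.0 character ranges that fit in a single 16-bit Java char.
    static boolean isXmlChar(char ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }

    static StringBuffer normalise(String input) {
        // Assumption: a null input yields an empty buffer.
        if (input == null) return new StringBuffer();
        StringBuffer out = new StringBuffer(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (!isXmlChar(ch)) {
                out.append(' ');            // filter step: invalid char becomes a space
            } else {
                String entity = ENTITIES.get(ch);
                if (entity != null) out.append(entity); // entity step
                else out.append(ch);
            }
        }
        return out;
    }
}
```

Doing both checks per character avoids building an intermediate buffer between the filtering and the entity replacement.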


filterNonXmlChars

public static StringBuffer filterNonXmlChars(StringBuffer aStrBuffer)
This method replaces every non-XML character with 0x20 (the space character), so that the document can be loaded again without problems. See: http://www.w3c.org/TR/2000/REC-xml-20001006#charsets

Parameters:
aStrBuffer - the input to be filtered. If aStrBuffer is null, an empty string is returned.
Returns:
the "purified" StringBuffer version of the aStrBuffer
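
The filtering behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class name is hypothetical, the validity check is a local copy of the XML 1.0 ranges, and editing the buffer in place is a choice made for this sketch rather than something this page specifies.

```java
class FilterNonXmlCharsSketch {
    // XML 1.0 character ranges that fit in a single 16-bit Java char.
    static boolean isXmlChar(char ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }

    static StringBuffer filter(StringBuffer buffer) {
        // A null input yields an empty buffer, as documented for aStrBuffer.
        if (buffer == null) return new StringBuffer();
        for (int i = 0; i < buffer.length(); i++) {
            if (!isXmlChar(buffer.charAt(i))) {
                buffer.setCharAt(i, ' '); // 0x20 replaces the invalid char
            }
        }
        return buffer; // assumption of this sketch: the same buffer, edited in place
    }
}
```

Because the check works on individual char values, this sketch treats surrogate halves as invalid and replaces them with spaces as well.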

isXmlChar

public static boolean isXmlChar(char ch)
This method decides whether a char is a valid XML character.

Parameters:
ch - the char to be tested
Returns:
true if ch is a valid XML character, false otherwise.
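
For orientation, a check of this kind can be written directly from the Char production of the XML 1.0 specification. The sketch below uses a hypothetical class name and may differ in detail from the GATE method.

```java
class XmlCharSketch {
    // XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The last range cannot be held in a single 16-bit Java char, so it is omitted here.
    static boolean isXmlChar(char ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }
}
```

With this sketch, a tab or a letter is accepted, while control characters such as 0x0 or 0x1F, surrogate halves and 0xFFFE are rejected.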

replaceCharsWithEntities

public static StringBuffer replaceCharsWithEntities(String anInputString)
This method replaces every character of anInputString that also appears in entitiesMap with its corresponding entity.

Parameters:
anInputString - the string to analyze. If it is null, an empty string is returned.
Returns:
a string representing the input string with chars replaced with entities
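
A minimal sketch of this kind of replacement is shown below. The class name and the contents of the map are assumptions: the sketch uses only the five predefined XML entities, whereas the actual entitiesMap is filled in init() and may contain other entries.

```java
import java.util.HashMap;
import java.util.Map;

class EntityReplacementSketch {
    // Assumed stand-in for entitiesMap: the five predefined XML entities.
    static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('&', "&amp;");
        ENTITIES.put('\'', "&apos;");
        ENTITIES.put('"', "&quot;");
    }

    static StringBuffer replace(String input) {
        // A null input yields an empty result, as documented for anInputString.
        if (input == null) return new StringBuffer();
        StringBuffer out = new StringBuffer(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            String entity = ENTITIES.get(ch);
            // Append the entity when the char is in the map, the char itself otherwise.
            if (entity != null) out.append(entity);
            else out.append(ch);
        }
        return out;
    }
}
```

Characters that are not keys of the map pass through unchanged, so plain text is returned as is.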

textWithNodes

public static String textWithNodes(TextualDocument doc,
                                   String aText)
Returns the document's text interspersed with <Node> elements at all points where the document has an annotation beginning or ending.
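
The interleaving described above can be pictured with a small sketch that works on a plain string and a set of offsets instead of a GATE document. Everything except the <Node> element name is an assumption here: the class and method names are hypothetical, and using the offset as the value of an id attribute is an illustrative choice, not a statement about the real output format.

```java
import java.util.SortedSet;

class TextWithNodesSketch {
    // Emits the text with a <Node> marker at each offset where an
    // annotation is assumed to begin or end.
    static String intersperse(String text, SortedSet<Integer> offsets) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int offset : offsets) {
            if (offset < 0 || offset > text.length()) continue; // ignore out-of-range offsets
            out.append(text, last, offset);                     // text since the previous marker
            out.append("<Node id=\"").append(offset).append("\"/>");
            last = offset;
        }
        out.append(text, last, text.length());                  // remaining tail
        return out.toString();
    }
}
```

The sketch does not escape the text itself; a real serializer would also need to replace characters such as < and & with entities.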


buildEntityMapFromString

public static void buildEntityMapFromString(String aScanString,
                                            TreeMap aMapToFill)
This method takes aScanString and searches it for the characters from entitiesMap. A tree map (offset2Char) is filled, using as keys the offsets at which those characters appear and as values the characters themselves. If either parameter is null, the method simply returns.
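
The offset-to-character map can be sketched as follows. The class name and the set of scanned characters are assumptions made for this example (the five characters with predefined XML entities stand in for the keys of entitiesMap).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

class EntityOffsetsSketch {
    // Assumed stand-in for the keys of entitiesMap.
    static final Set<Character> ENTITY_CHARS =
        new HashSet<>(Arrays.asList('<', '>', '&', '\'', '"'));

    static void buildEntityMap(String scanString, TreeMap<Integer, Character> mapToFill) {
        // If either parameter is null, simply return.
        if (scanString == null || mapToFill == null) return;
        for (int i = 0; i < scanString.length(); i++) {
            char ch = scanString.charAt(i);
            if (ECHARS_CONTAIN(ch)) {
                mapToFill.put(i, ch); // key: offset in the string, value: the char found there
            }
        }
    }

    // Small helper so the loop above reads as a single membership test.
    private static boolean ECHARS_CONTAIN(char ch) {
        return ENTITY_CHARS.contains(ch);
    }
}
```

A TreeMap keeps its keys sorted, so iterating over the filled map visits the characters in the order in which they occur in the string.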


annotationSetToXml

public static void annotationSetToXml(AnnotationSet anAnnotationSet,
                                      StringBuffer buffer)
This method saves an AnnotationSet as XML.

Parameters:
anAnnotationSet - The annotation set that has to be saved as XML.

annotationSetToXml

public static void annotationSetToXml(AnnotationSet anAnnotationSet,
                                      String annotationSetNameToUse,
                                      StringBuffer buffer)
This method saves an AnnotationSet as XML.

Parameters:
anAnnotationSet - The annotation set that has to be saved as XML.
annotationSetNameToUse - The standard annotationSetToXml(AnnotationSet, StringBuffer) uses the name that belongs to the provided annotation set; this method instead allows the provided annotation set to be stored under a different annotation set name.