gate.util.reporting
Class DocTimeReporter

java.lang.Object
  extended by gate.util.reporting.DocTimeReporter
All Implemented Interfaces:
BenchmarkReportable

public class DocTimeReporter
extends Object
implements BenchmarkReportable

A reporter class that generates a report on the time taken to process each document within a given corpus.


Field Summary
static int ALL_DOCS
          When set as the number of documents, this integer constant indicates that the report includes all the documents matching a given PR.
private  HashSet<String> allDocs
          A HashSet containing the names of the documents matching the given search string.
private  File benchmarkFile
          A File handle to input benchmark file.
private  LinkedHashMap<String,Object> docContainer
          A LinkedHashMap containing the documents matching the given PRs.
private static int FILE_CHUNK_SIZE
          Chunk size in which file will be read
private  float globalTotal
          Total time taken by the given pipeline for the current logical run.
private  String logicalStart
          A marker indicating the start of current logical run.
static String MATCH_ALL_PR_REGEX
          The default value for search string matching PRs for given run.
private  HashSet<String> matchingPRs
          A HashSet containing the PR names matching the search string.
private  int maxDocumentInReport
          Number of documents to be displayed against matching PRs.
static String MEDIA_HTML
          When set as the print media, this string constant indicates that the report is printed in HTML format.
static String MEDIA_TEXT
          When set as the print media, this string constant indicates that the report is printed in TEXT format.
private static String NL
          An OS independent line separator
private  String pipelineName
          Name of the given pipeline
private  String printMedia
          Report media.
private  String PRMatchingRegex
          Search string, could be a PR name.
private  File reportFile
          Path where to save the report file.
private static int STATUS_ERROR
          Status flag for error exit.
private static int STATUS_NORMAL
          Status flag for normal exit.
private  File temporaryDirectory
          Folder where the benchmark.txt files are created for specific pipeline log entries.
 int validEntries
          An integer containing the count of total valid log entries present in input file provided.
 
Constructor Summary
DocTimeReporter()
          No argument constructor.
DocTimeReporter(String[] args)
          A constructor to be used while executing the tool from the command line.
 
Method Summary
 Object calculate(Object reportContainer)
          Calculates the total of the time taken by processing element at each leaf level.
private  void deleteFile(File fileToBeDeleted)
          A method for deleting a given file.
private  LinkedHashMap<String,Object> doTotal(LinkedHashMap<String,Object> reportContainer)
          Computes the sub totals at each processing level.
 void executeReport()
          A single method to execute the report (the API counterpart of the command-line tool).
private  void generateReport()
          Calls store, calculate and printReport for generating the actual report.
 File getBenchmarkFile()
           
 String getLogicalStart()
          Returns the marker indicating logical start of a run.
 int getMaxDocumentInReport()
          Returns the maximum number of documents to be shown in the report.
 String getPrintMedia()
          Returns the name of the media on which the report will be generated, e.g. text or HTML.
 String getPRMatchingRegex()
          Returns the search string to be matched to PR names present in the log entries.
 File getReportFile()
           
private  void initTmpDir()
           
private  boolean isPRMatched(String benchmarkIDs, String searchString)
          Provides the functionality to match a user input string with the PR in the given benchmark ids.
static void main(String[] args)
          A main method which acts as an entry point when executing a report from the command line.
private  void organizeEntries(LinkedHashMap<String,Object> store, String matchedPR, String bTime, String docName)
          Organizes the valid data extracted from the log entries into LinkedHashMap.
 void parseArguments(String[] args)
          Parses the report command line arguments.
private  boolean parseLinesFromLast(byte[] bytearray, Vector<String> lastNlines, long fromPos)
          A method to ensure that the required line is read from the given file part.
 void printReport(Object reportSource, File outputFile)
          Prints a report as per the value provided for print media option.
private  void printToHTML(LinkedHashMap<String,Object> reportSource, File outputFile)
          Prints the document level statistics report in HTML format.
private  void printToText(Object reportContainer, File outputFile)
          Prints benchmark report in text format.
 void setBenchmarkFile(File benchmarkFile)
          Sets the input benchmark file from which the report is generated.
 void setLogicalStart(String logicalStart)
          Sets optionally a string indicating the logical start of a run.
 void setMaxDocumentInReport(int maxDocumentInReport)
          Sets the maximum number of documents contained in the report.
 void setPrintMedia(String printMedia)
          Sets the media on which report will be generated.
 void setPRMatchingRegex(String matchingRegex)
          Sets the search string used to match PR names present in the benchmark file.
 void setReportFile(File reportFile)
          Sets the file to which the report is written. If not set, the default is the file name "report.txt/html" in the system temporary directory.
private  LinkedHashMap sortHashMapByValues(LinkedHashMap passedMap)
          Sorts a LinkedHashMap by its values (natural descending order). Duplicates are kept as they are.
private  void splitBenchmarkFile(File benchmarkFile, File report)
          Provides the functionality to separate out pipeline specific benchmark entries in separate temporary benchmark files in a temporary folder in the current working directory.
 Object store(File inputFile)
          Stores GATE processing elements and the time taken by them in an in-memory data structure for report generation.
private  long tail(File fileToBeRead, int chunkSize)
          A method for reading the file backwards, from its end towards its start.
static void usage()
          Display a usage message
private  boolean validateLogEntry(String benchmarkIDChain, ArrayList<String> startTokens)
          Ignores the inconsistent log entries from the benchmark file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

benchmarkFile

private File benchmarkFile
A File handle to input benchmark file.


printMedia

private String printMedia
Report media.


maxDocumentInReport

private int maxDocumentInReport
Number of documents to be displayed against matching PRs.


PRMatchingRegex

private String PRMatchingRegex
Search string, could be a PR name.


logicalStart

private String logicalStart
A marker indicating the start of current logical run.


reportFile

private File reportFile
Path where to save the report file.


allDocs

private HashSet<String> allDocs
A HashSet containing the names of the documents matching the given search string.


matchingPRs

private HashSet<String> matchingPRs
A HashSet containing the PR names matching the search string. Used for display in the report header.


globalTotal

private float globalTotal
Total time taken by the given pipeline for the current logical run.


docContainer

private LinkedHashMap<String,Object> docContainer
A LinkedHashMap containing the documents matching the given PRs.


temporaryDirectory

private File temporaryDirectory
Folder where the benchmark.txt files are created for specific pipeline log entries.


pipelineName

private String pipelineName
Name of the given pipeline


STATUS_NORMAL

private static final int STATUS_NORMAL
Status flag for normal exit.

See Also:
Constant Field Values

STATUS_ERROR

private static final int STATUS_ERROR
Status flag for error exit.

See Also:
Constant Field Values

FILE_CHUNK_SIZE

private static final int FILE_CHUNK_SIZE
Chunk size in which file will be read

See Also:
Constant Field Values

NL

private static final String NL
An OS independent line separator


validEntries

public int validEntries
An integer containing the count of total valid log entries present in input file provided.


MEDIA_TEXT

public static final String MEDIA_TEXT
When set as the print media, this string constant indicates that the report is printed in TEXT format.

See Also:
Constant Field Values

MEDIA_HTML

public static final String MEDIA_HTML
When set as the print media, this string constant indicates that the report is printed in HTML format.

See Also:
Constant Field Values

ALL_DOCS

public static final int ALL_DOCS
When set as the number of documents, this integer constant indicates that the report includes all the documents matching a given PR.

See Also:
Constant Field Values

MATCH_ALL_PR_REGEX

public static final String MATCH_ALL_PR_REGEX
The default value for search string matching PRs for given run.

See Also:
Constant Field Values
Constructor Detail

DocTimeReporter

public DocTimeReporter()
No argument constructor.


DocTimeReporter

DocTimeReporter(String[] args)
A constructor to be used while executing the tool from the command line.

Parameters:
args - array containing command line arguments.
Method Detail

initTmpDir

private void initTmpDir()

calculate

public Object calculate(Object reportContainer)
Calculates the total of the time taken by processing element at each leaf level. Also calculates the difference between the actual time taken by the resources and system noted time.

Specified by:
calculate in interface BenchmarkReportable
Parameters:
reportContainer - An Object of type LinkedHashMap containing the processing elements (with time in milliseconds) in hierarchical structure.
Returns:
An Object containing modified hierarchical structure of processing elements with totals and All others embedded in it.

sortHashMapByValues

private LinkedHashMap sortHashMapByValues(LinkedHashMap passedMap)
Sorts a LinkedHashMap by its values (natural descending order). Duplicates are kept as they are.

Parameters:
passedMap - An Object of type LinkedHashMap to be sorted by its values.
Returns:
An Object containing the sorted LinkedHashMap.
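
The ordering described for sortHashMapByValues can be illustrated with a minimal, standalone sketch. It is not the class's actual implementation: the class name SortByValueSketch, the method name sortByValuesDescending and the Integer value type are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of gate.util.reporting.
class SortByValueSketch {

    /** Returns a new LinkedHashMap whose iteration order is by value, largest first. */
    public static LinkedHashMap<String, Integer> sortByValuesDescending(Map<String, Integer> input) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(input.entrySet());
        // List.sort is stable, so entries with equal values keep their original relative order.
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        LinkedHashMap<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> times = new LinkedHashMap<>();
        times.put("doc1", 120); // time in milliseconds
        times.put("doc2", 300);
        times.put("doc3", 120); // same value as doc1: both entries are kept
        System.out.println(sortByValuesDescending(times));
    }
}
```

Copying into a fresh LinkedHashMap keeps the sorted order visible to any later iteration, which is presumably why a LinkedHashMap is used here rather than a plain HashMap.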

doTotal

private LinkedHashMap<String,Object> doTotal(LinkedHashMap<String,Object> reportContainer)
Computes the sub totals at each processing level.

Parameters:
reportContainer - An Object of type LinkedHashMap containing the processing elements (with time in milliseconds) in hierarchical structure.
Returns:
An Object containing the LinkedHashMap with the element values totaled.
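
As a rough illustration of computing sub totals over a hierarchical structure, the sketch below walks a nested LinkedHashMap and records a total at every level. The real layout of reportContainer is not documented here, so the nesting, the Number leaf values and the "total" key are all assumptions, as are the class and method names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of gate.util.reporting.
class SubTotalSketch {

    // Assumed key under which each level stores its sub total.
    static final String TOTAL_KEY = "total";

    /** Adds a sub total to this level and to every nested level; returns this level's total. */
    @SuppressWarnings("unchecked")
    public static long addTotals(LinkedHashMap<String, Object> level) {
        long sum = 0;
        for (Map.Entry<String, Object> e : level.entrySet()) {
            if (TOTAL_KEY.equals(e.getKey())) {
                continue; // ignore a total left over from an earlier pass
            }
            Object value = e.getValue();
            if (value instanceof LinkedHashMap) {
                sum += addTotals((LinkedHashMap<String, Object>) value); // nested level
            } else if (value instanceof Number) {
                sum += ((Number) value).longValue(); // leaf: time in milliseconds
            }
        }
        level.put(TOTAL_KEY, sum); // written after iteration has finished
        return sum;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Object> pr = new LinkedHashMap<>();
        pr.put("doc1", 10);
        pr.put("doc2", 20);
        LinkedHashMap<String, Object> root = new LinkedHashMap<>();
        root.put("somePR", pr);
        System.out.println(addTotals(root) + " " + root);
    }
}
```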

printReport

public void printReport(Object reportSource,
                        File outputFile)
Prints a report as per the value provided for print media option.

Specified by:
printReport in interface BenchmarkReportable
Parameters:
reportSource - An Object of type LinkedHashMap containing the processing elements (with time in milliseconds) in hierarchical structure.
outputFile - Path where to save the report.

printToText

private void printToText(Object reportContainer,
                         File outputFile)
Prints benchmark report in text format.

Parameters:
reportContainer - An Object of type LinkedHashMap containing the document names (with time in milliseconds) in hierarchical structure.
outputFile - An object of type File representing the output report file.

store

public Object store(File inputFile)
             throws BenchmarkReportInputFileFormatException
Stores GATE processing elements and the time taken by them in an in-memory data structure for report generation.

Specified by:
store in interface BenchmarkReportable
Parameters:
inputFile - A handle to the input benchmark file.
Returns:
An Object of type LinkedHashMap containing the processing elements (with time in milliseconds) in hierarchical structure. Null if there was an error.
Throws:
BenchmarkReportInputFileFormatException - if the input file provided is not a valid benchmark file.

organizeEntries

private void organizeEntries(LinkedHashMap<String,Object> store,
                             String matchedPR,
                             String bTime,
                             String docName)
Organizes the valid data extracted from the log entries into LinkedHashMap.

Parameters:
store - A global LinkedHashMap containing the processing elements (with time in milliseconds) in hierarchical structure.
matchedPR - A PR matching the given search string.
bTime - Time taken by the specific processing element.
docName - Name of the document being processed.

printToHTML

private void printToHTML(LinkedHashMap<String,Object> reportSource,
                         File outputFile)
Prints the document level statistics report in HTML format.

Parameters:
reportSource - An Object of type LinkedHashMap containing the document names (with time in milliseconds).
outputFile - An object of type File representing the output report file to which the HTML report is to be written.

validateLogEntry

private boolean validateLogEntry(String benchmarkIDChain,
                                 ArrayList<String> startTokens)
Ignores the inconsistent log entries from the benchmark file. Entries from modules like pronominal coreferencer which have not been converted to new benchmarking conventions are ignored.

Parameters:
benchmarkIDChain - the chain of benchmark ids. This is the third token in the benchmark file.
startTokens - an array of first tokens in the benchmark id chain.
Returns:
true if valid log entry; false otherwise.

parseArguments

public void parseArguments(String[] args)
Parses the report command line arguments.

Specified by:
parseArguments in interface BenchmarkReportable
Parameters:
args - array containing the command line arguments.

getPrintMedia

public String getPrintMedia()
Returns the name of the media on which the report will be generated, e.g. text or HTML.

Returns:
printMedia A String containing the name of the media on which report will be generated.

setPrintMedia

public void setPrintMedia(String printMedia)
Sets the media on which report will be generated.

Parameters:
printMedia - Type of media on which the report will be generated. Must be MEDIA_TEXT or MEDIA_HTML. The default is MEDIA_HTML.

isPRMatched

private boolean isPRMatched(String benchmarkIDs,
                            String searchString)
Provides the functionality to match a user input string with the PR in the given benchmark ids.

Parameters:
benchmarkIDs - A string of benchmarkIDs containing the PR name at the start of string.
searchString - The string to be matched for PR name.
Returns:
boolean true if search string matches PR name; false otherwise.
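
The matching step can be sketched as follows. The exact format of a benchmark ID chain is not specified on this page, so the sketch assumes a dot-separated chain whose first token is the PR name, and it assumes the whole PR name must match the expression. The class name PrMatchSketch and the method prMatches are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of gate.util.reporting.
class PrMatchSketch {

    /** Returns true if the leading token of benchmarkIds fully matches searchRegex. */
    public static boolean prMatches(String benchmarkIds, String searchRegex) {
        // Assumption: IDs are dot-separated and the PR name is the first token.
        int dot = benchmarkIds.indexOf('.');
        String prName = dot < 0 ? benchmarkIds : benchmarkIds.substring(0, dot);
        return Pattern.matches(searchRegex, prName);
    }

    public static void main(String[] args) {
        System.out.println(prMatches("tokeniser.doc_1", "token.*"));
        System.out.println(prMatches("gazetteer.doc_1", "token.*"));
    }
}
```

A pattern such as `.*` matches every PR name under these assumptions; it is used here only as an example of a match-all expression, not as the documented value of MATCH_ALL_PR_REGEX.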

deleteFile

private void deleteFile(File fileToBeDeleted)
                 throws BenchmarkReportFileAccessException
A method for deleting a given file.

Parameters:
fileToBeDeleted - A handle of the file to be deleted.
Throws:
BenchmarkReportFileAccessException - if a given file could not be deleted.

splitBenchmarkFile

private void splitBenchmarkFile(File benchmarkFile,
                                File report)
                         throws BenchmarkReportFileAccessException,
                                BenchmarkReportInputFileFormatException
Provides the functionality to separate out pipeline specific benchmark entries in separate temporary benchmark files in a temporary folder in the current working directory.

Parameters:
benchmarkFile - An object of type File representing the input benchmark file.
report - A file handle to the report file to be written.
Throws:
BenchmarkReportFileAccessException - if any error occurs while accessing the input benchmark file or while splitting it.
BenchmarkReportExecutionException - if the given input benchmark file is modified while generating the report.
BenchmarkReportInputFileFormatException

tail

private long tail(File fileToBeRead,
                  int chunkSize)
           throws BenchmarkReportInputFileFormatException
A method for reading the file backwards, from its end towards its start.

Parameters:
fileToBeRead - An object of the file to be read.
chunkSize - An integer specifying the size of the chunks in which file will be read.
Returns:
A long value pointing to the start position of the given file chunk.
Throws:
BenchmarkReportInputFileFormatException
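
Reading a file from its end in fixed-size chunks, as tail and parseLinesFromLast appear to do when looking for the logical start marker, can be sketched with RandomAccessFile. This is a simplified stand-in and not the class's code: it returns the byte offset of the last occurrence of a marker rather than a chunk start position, and the names ReverseChunkSketch and lastOffsetOf are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of gate.util.reporting.
class ReverseChunkSketch {

    /** Returns the byte offset of the last occurrence of marker in file, or -1 if absent. */
    public static long lastOffsetOf(File file, String marker, int chunkSize) throws IOException {
        byte[] needle = marker.getBytes(StandardCharsets.ISO_8859_1);
        if (needle.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("marker and chunkSize must be non-empty and positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            long end = length;
            while (end > 0) {
                long start = Math.max(0, end - chunkSize);
                // Read slightly past the chunk so a marker straddling the boundary is still seen.
                long readEnd = Math.min(length, end + needle.length - 1);
                byte[] buf = new byte[(int) (readEnd - start)];
                raf.seek(start);
                raf.readFully(buf);
                // Scan this chunk from its end, since later occurrences win.
                for (int i = buf.length - needle.length; i >= 0; i--) {
                    if (matchesAt(buf, i, needle)) {
                        return start + i;
                    }
                }
                end = start; // move one chunk closer to the start of the file
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] needle) {
        for (int j = 0; j < needle.length; j++) {
            if (buf[pos + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("benchmark", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "START\na\nb\nSTART\nc\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(lastOffsetOf(tmp, "START", 4));
    }
}
```

Working from the end means only the most recent run has to be read when the benchmark file is large, which is the likely motivation for a chunked reverse read.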

parseLinesFromLast

private boolean parseLinesFromLast(byte[] bytearray,
                                   Vector<String> lastNlines,
                                   long fromPos)
A method to ensure that the required line is read from the given file part.

Parameters:
bytearray - A part of a file being read upside down.
lastNlines - A vector containing the lines extracted from file part.
fromPos - A long value indicating the start of a file part.
Returns:
true if marker indicating the logical start of run is found; false otherwise.

usage

public static void usage()
Display a usage message


main

public static void main(String[] args)
                 throws BenchmarkReportInputFileFormatException,
                        BenchmarkReportFileAccessException
A main method which acts as an entry point when executing a report from the command line.

Parameters:
args - A string array containing the command line arguments.
Throws:
BenchmarkReportExecutionException - if a given input file is modified while generating the report.
BenchmarkReportInputFileFormatException
BenchmarkReportFileAccessException

generateReport

private void generateReport()
                     throws BenchmarkReportInputFileFormatException,
                            BenchmarkReportFileAccessException
Calls store, calculate and printReport for generating the actual report.

Throws:
BenchmarkReportInputFileFormatException
BenchmarkReportFileAccessException

executeReport

public void executeReport()
                   throws BenchmarkReportInputFileFormatException,
                          BenchmarkReportFileAccessException
Description copied from interface: BenchmarkReportable
A single method to execute the report (the API counterpart of the command-line tool). Call this method after setting the report parameters.

Specified by:
executeReport in interface BenchmarkReportable
Throws:
BenchmarkReportInputFileFormatException
BenchmarkReportFileAccessException

getLogicalStart

public String getLogicalStart()
Returns the marker indicating logical start of a run.

Returns:
logicalStart A String containing the marker indicating logical start of a run.

setLogicalStart

public void setLogicalStart(String logicalStart)
Sets optionally a string indicating the logical start of a run.

Parameters:
logicalStart - A String indicating the logical start of a run. Useful when you have marked different runs in your benchmark file with this string at their start. The default value is null.

getBenchmarkFile

public File getBenchmarkFile()
Returns:
benchmarkFile path to input benchmark file.
See Also:
setBenchmarkFile(java.io.File)

setBenchmarkFile

public void setBenchmarkFile(File benchmarkFile)
Sets the input benchmark file from which the report is generated. By default use the file named "benchmark.txt" from the application execution directory.

Parameters:
benchmarkFile - Input benchmark file.

getReportFile

public File getReportFile()
Returns:
reportFile file path where the report file is written.
See Also:
setReportFile(java.io.File)

setReportFile

public void setReportFile(File reportFile)
Sets the file to which the report is written. If not set, the default is the file name "report.txt/html" in the system temporary directory.

Parameters:
reportFile - file path to the report file to write.

getMaxDocumentInReport

public int getMaxDocumentInReport()
Returns the maximum number of documents to be shown in the report.

Returns:
maxDocumentInReport An integer specifying the maximum number of documents to be shown in the report.

setMaxDocumentInReport

public void setMaxDocumentInReport(int maxDocumentInReport)
Sets the maximum number of documents contained in the report.

Parameters:
maxDocumentInReport - Maximum number of documents contained in the report. Use the constant ALL_DOCS for reporting all documents. The default is 10.
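
How a document limit with an "all documents" sentinel might be applied to an already sorted map is sketched below. The numeric value of ALL_DOCS is not given on this page, so the sketch uses its own constant with an assumed value of -1. The class name TopDocsSketch and the method limit are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of gate.util.reporting.
class TopDocsSketch {

    // Assumed sentinel meaning "report every document"; the real ALL_DOCS value may differ.
    static final int ALL_DOCS_SENTINEL = -1;

    /** Returns the first maxDocs entries of sortedDocs, or all of them for the sentinel. */
    public static <V> LinkedHashMap<String, V> limit(LinkedHashMap<String, V> sortedDocs, int maxDocs) {
        LinkedHashMap<String, V> out = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : sortedDocs.entrySet()) {
            if (maxDocs != ALL_DOCS_SENTINEL && out.size() >= maxDocs) {
                break; // limit reached
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> docs = new LinkedHashMap<>();
        docs.put("doc2", 300); // already sorted, slowest document first
        docs.put("doc1", 120);
        docs.put("doc3", 90);
        System.out.println(limit(docs, 2));
        System.out.println(limit(docs, ALL_DOCS_SENTINEL));
    }
}
```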

getPRMatchingRegex

public String getPRMatchingRegex()
Returns the search string to be matched to PR names present in the log entries.

Returns:
PRMatchingRegex A String to be matched to PR names present in the log entries.

setPRMatchingRegex

public void setPRMatchingRegex(String matchingRegex)
Sets the search string used to match PR names present in the benchmark file.

Parameters:
matchingRegex - regular expression to match PR names present in the benchmark file. The default is MATCH_ALL_PR_REGEX.