java.lang.Object
  gate.corpora.DocumentContentImpl
public class DocumentContentImpl
Represents the commonalities between all sorts of document contents.
| Field Summary | |
|---|---|
(package private) String |
content
Just for now - later we have to cater for different types of content. |
private static boolean |
DEBUG
Debug flag |
private static int |
INTERNAL_BUFFER_SIZE
Buffer size for reading. 16k is 4 times the block size on most filesystems, so it should be efficient for most cases. |
(package private) String |
originalContent
For preserving the original content of the document. |
(package private) static long |
serialVersionUID
Freeze the serialization UID. |
| Constructor Summary | |
|---|---|
DocumentContentImpl()
Default construction |
|
DocumentContentImpl(String s)
For ranges |
|
DocumentContentImpl(URL u,
String encoding,
Long start,
Long end)
Construction from URL and offsets. |
|
| Method Summary | |
|---|---|
(package private) void |
edit(Long start,
Long end,
DocumentContent replacement)
Propagate changes to the document content. |
boolean |
equals(Object other)
Two documents are the same if their contents are the same. |
DocumentContent |
getContent(Long start,
Long end)
Return the contents under a particular span. |
String |
getOriginalContent()
Return the original content of the document received during the loading phase or on construction from string. |
int |
hashCode()
Calculate the hash value for the object. |
(package private) boolean |
isValidOffset(Long offset)
Check that an offset is valid |
(package private) boolean |
isValidOffsetRange(Long start,
Long end)
Check that both start and end are valid offsets and that they constitute a valid offset range |
Long |
size()
The size of this content (e.g. character length for textual content). |
String |
toString()
Returns the String representing the content in case of a textual document. |
| Methods inherited from class java.lang.Object |
|---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
private static final boolean DEBUG
private static final int INTERNAL_BUFFER_SIZE
String content
String originalContent
static final long serialVersionUID
| Constructor Detail |
|---|
public DocumentContentImpl()
public DocumentContentImpl(URL u,
String encoding,
Long start,
Long end)
throws IOException
Throws: IOException
public DocumentContentImpl(String s)
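The field summary mentions a 16k internal buffer for reading, and the URL-based constructor takes start and end offsets. As a rough illustration only, a range of characters could be read from a character stream with such a buffer as sketched here. The class name `BufferedRangeRead`, the helper `readRange`, and the behaviour of failing when an offset lies beyond the input are all assumptions made for this sketch; they are not taken from the GATE source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferedRangeRead {
    // Mirrors the documented 16k buffer size; the constant name is illustrative.
    static final int BUFFER_SIZE = 16 * 1024;

    /** Hypothetical helper: reads the characters in [start, end) from the reader. */
    static String readRange(Reader in, long start, long end) throws IOException {
        // Skip to the start offset; skip() may skip fewer characters than asked.
        long toSkip = start;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("start offset beyond end of input");
            }
            toSkip -= skipped;
        }

        // Read the requested range in chunks of at most BUFFER_SIZE characters.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[BUFFER_SIZE];
        long remaining = end - start;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("end offset beyond end of input");
            }
            sb.append(buf, 0, n);
            remaining -= n;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a decoded URL stream in this sketch.
        System.out.println(readRange(new StringReader("abcde"), 1, 3));
    }
}
```

With a real URL, the reader would typically be an `InputStreamReader` over `url.openStream()` created with the requested encoding.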
| Method Detail |
|---|
void edit(Long start,
Long end,
DocumentContent replacement)
public DocumentContent getContent(Long start,
Long end)
throws InvalidOffsetException
Description copied from interface: DocumentContent
Conceptually, the annotation offsets are defined as falling in between characters, with "0" pointing before the first character. Because of that, the offset where an annotation ends and the offset where the space after it starts are the same.
So this is what the "abcde" string looks like with the offsets explicitly included: 0a1b2c3d4e5
"ab cd" would then look like this: 0a1b2 3c4d5
with the following annotations:
Token "ab" [0,2]
SpaceToken " " [2,3]
Token "cd" [3,5]
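The offset model above can be tried out on plain Java strings. The following is a minimal sketch, not GATE code: the class `OffsetModelDemo` and its helpers `withOffsets` and `span` are hypothetical names introduced for illustration, and `String.substring` stands in for `getContent` because it uses the same convention (start inclusive, end exclusive).

```java
class OffsetModelDemo {
    /** Renders a string with its offsets written between the characters. */
    static String withOffsets(String text) {
        StringBuilder sb = new StringBuilder("0");
        for (int i = 0; i < text.length(); i++) {
            sb.append(text.charAt(i)).append(i + 1);
        }
        return sb.toString();
    }

    /** Returns the text between two offsets, start inclusive and end exclusive. */
    static String span(String text, long start, long end) {
        return text.substring((int) start, (int) end);
    }

    public static void main(String[] args) {
        String text = "ab cd";
        System.out.println(withOffsets("abcde")); // 0a1b2c3d4e5
        System.out.println(withOffsets(text));    // 0a1b2 3c4d5

        // The three annotations listed above, recovered from their offsets.
        System.out.println("[" + span(text, 0, 2) + "]"); // [ab]
        System.out.println("[" + span(text, 2, 3) + "]"); // [ ]
        System.out.println("[" + span(text, 3, 5) + "]"); // [cd]
    }
}
```

Because offsets sit between characters, the end offset of one span (2) is also the start offset of the next, and the span length is simply `end - start`.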
Specified by: getContent in interface DocumentContent
Parameters:
start - the beginning index, inclusive.
end - the ending index, exclusive.
Throws: InvalidOffsetException - if start is negative, or end is larger than the length of this DocumentContent object, or start is larger than end.
public String toString()
Overrides: toString in class Object
public Long size()
Specified by: size in interface DocumentContent
boolean isValidOffset(Long offset)
boolean isValidOffsetRange(Long start,
Long end)
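The two validity checks can be read off the error conditions documented for getContent: an offset must not be negative or exceed the content length, and a range additionally needs start to be no larger than end. The sketch below is one plausible reading of that contract, not the actual GATE implementation. The class `OffsetChecks` is hypothetical, the content size is passed in explicitly so the example stands alone, and treating `null` as invalid is an assumption.

```java
class OffsetChecks {
    /** An offset is valid if it lies in [0, size]; null is treated as invalid (assumption). */
    static boolean isValidOffset(long size, Long offset) {
        return offset != null && offset >= 0 && offset <= size;
    }

    /** A range is valid if both offsets are valid and start does not exceed end. */
    static boolean isValidOffsetRange(long size, Long start, Long end) {
        return isValidOffset(size, start)
            && isValidOffset(size, end)
            && start <= end;
    }

    public static void main(String[] args) {
        long size = "ab cd".length(); // 5
        System.out.println(isValidOffsetRange(size, 0L, 5L));  // true: whole content
        System.out.println(isValidOffsetRange(size, 3L, 2L));  // false: start > end
        System.out.println(isValidOffsetRange(size, 0L, 6L));  // false: end past the content
        System.out.println(isValidOffsetRange(size, -1L, 2L)); // false: negative start
    }
}
```

Note that `size` itself is a valid offset, since it points just after the last character.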
public boolean equals(Object other)
Overrides: equals in class Object
public int hashCode()
Overrides: hashCode in class Object
public String getOriginalContent()