Package org.aksw.commons.io.seekable.api
Interface Seekable
- All Superinterfaces:
AutoCloseable,Channel,Closeable,ReadableByteChannel
- All Known Implementing Classes:
PageNavigator,SeekableFromBlock
Interface that enables relative navigation over data of fixed finite
but possibly initially unknown size.
Start and end positions can be 'discovered' when a relative operation reaches them.
A Seekable is a ReadableByteChannel but in addition it includes methods for relative
seeks and pattern matching methods.
The rationale is that certain operations can be carried out faster if they are pushed
down to the underlying implementation.
For example, consider comparing a fixed sequence of bytes to a Seekable: instead of repeatedly
requesting copies of bytes from the channel, the comparison can be pushed to the seekable, which
may find that it can delegate the request to an internal buffer. Or it may detect that the
operation crosses internal buffer boundaries and handle this case accordingly.
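The difference between the two approaches can be sketched over a plain byte array. This is only an illustration under assumptions: the class PrefixCompareSketch and its methods are hypothetical names and are not part of this package, and real implementations may work over several internal buffers.

```java
// Hypothetical sketch; not part of org.aksw.commons.io.seekable.api.
// Assumes 0 <= pos <= data.length.
final class PrefixCompareSketch {

    /** Copy-based variant: materialize the next bytes, then compare the copy. */
    static int compareByCopy(byte[] data, int pos, byte[] prefix) {
        int n = Math.min(prefix.length, data.length - pos);
        byte[] copy = new byte[n];
        System.arraycopy(data, pos, copy, 0, n);
        return compare(copy, 0, prefix, n);
    }

    /** Pushed-down variant: compare directly against the backing array. */
    static int compareInPlace(byte[] data, int pos, byte[] prefix) {
        int n = Math.min(prefix.length, data.length - pos);
        return compare(data, pos, prefix, n);
    }

    // Compares n bytes of a (starting at aOff) with the first n bytes of b,
    // using signed byte order.
    private static int compare(byte[] a, int aOff, byte[] b, int n) {
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(a[aOff + i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

Both variants return the same result; the in-place variant simply avoids the intermediate array.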
The other aspect is that, for relative seeks, it is assumed that checking
the resources associated with the most recent position first is most likely to
generate lookup hits.
This makes it possible to skip the check for whether a global position change should be translated into a relative one.
A seekable is backed by a data supplying entity such as a byte array, a ByteBuffer, a FileChannel
or a composite thereof.
There are two related main features of this interface / trait:
One is that common matching methods are part of the interface, with emphasis on binary search.
The other is that, due to this integrated functionality, internal structures can be abstracted:
A matcher can thus transparently run over a sequence of internal blocks without exposing these details.
Implementations can provide their own optimized overrides of these matchers, thus
significantly speeding up lookups.
Methods that perform matching can move the position one byte before or after the
backing data region as a means of reducing the amount of copying of byte arrays.
- Author:
- raven
-
Method Summary
Modifier and Type | Method | Description
default long | binarySearch(long min, long max, byte delimiter, byte[] prefix) | Delimiter-based binary search.
int | checkNext(int len, boolean changePos) | Attempt to advance the position by the given number of bytes.
int | checkPrev(int len, boolean changePos) |
Seekable | clone() |
default Seekable | cloneObject() | Default method to work around scala bug https://github.com/scala/bug/issues/10501
void | close() |
static int | compareArrays(byte[] a, byte[] b) |
default int | compareToPrefix(byte[] prefix) | Compare the bytes at the current position to a given sequence of bytes. If there are fewer bytes available in the seekable than provided for comparison, then only that many are compared.
default boolean | deltaPos(int delta) | Relative positioning.
default byte | get() | Read a byte at the current position.
default byte | get(int relPos) | Get one byte relative to the current position.
long | getPos() | Optional operation.
boolean | isPosAfterEnd() | The state of a seekable may be one unit beyond the end.
boolean | isPosBeforeStart() | The state of a seekable may be one unit before the start.
default boolean | nextPos(int len) | Attempt to advance the position by the given number of bytes.
default int | peekNextBytes(byte[] dst, int offset, int len) | Attempt to read bytes at the current position without altering the position.
void | posToEnd() | Optional operation.
default boolean | posToNext(byte delimiter) | Move the position to the next delimiter if it exists.
default int | posToNext(byte delimiter, boolean changePos) | 0: match but 0 bytes moved. Negative values indicate no match.
default boolean | posToPrev(byte delimiter) | Move the position to the previous delimiter if it exists, or one element before the start of data such that isPosBeforeStart() yields true.
void | posToStart() | Optional operation.
default boolean | prevPos(int len) | Attempt to step back the position by the given number of bytes.
default int | read(ByteBuffer dst) |
 | readString(int len) |
void | setPos(long pos) | Optional operation.
default long | size() | The currently known size (of the underlying entity).
-
Method Details
-
clone
Seekable clone() -
cloneObject
Default method to work around scala bug https://github.com/scala/bug/issues/10501
- Returns:
-
getPos
Optional operation. Get the position in this seekable.
- Returns:
- Throws:
IOException
-
setPos
Optional operation. Set the position in this seekable.
- Throws:
IOException
-
posToStart
Optional operation. Move one unit before the start of the seekable; raises an exception on infinite seekables.
- Throws:
IOException
-
posToEnd
Optional operation. Move to one unit beyond the end of the seekable; raises an exception on infinite seekables.
- Throws:
IOException
-
get
Get one byte relative to the current position.
- Parameters:
relPos -
- Returns:
- Throws:
IOException
-
get
Read a byte at the current position.
- Returns:
- The byte at the current position if the position is valid
- Throws:
IOException
-
isPosBeforeStart
The state of a seekable may be one unit before the start. In this state, if the seekable is non-empty, nextPos(1) must be a valid position.
- Returns:
- Throws:
IOException
-
isPosAfterEnd
The state of a seekable may be one unit beyond the end. In this state, if the seekable is non-empty, prevPos(1) must be a valid position.
- Returns:
- Throws:
IOException
-
nextPos
Attempt to advance the position by the given number of bytes. If the position is valid before the call, it will always be valid when the call returns, i.e. in that case isPosBeforeStart and isPosAfterEnd will always be false. The argument must not be negative.
- Parameters:
len -
- Returns:
- True if the position was changed by the requested amount of bytes. False means that the position was unchanged.
- Throws:
IOException
-
checkNext
Attempt to advance the position by the given number of bytes. Return the number of bytes by which the position was changed. Returning fewer bytes than requested implies that an end position was reached which cannot be passed. This method cannot pass beyond the end, i.e. isPosAfterEnd cannot change from false to true by calling this method.
- Parameters:
len -
- Returns:
- Throws:
IOException
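The difference between nextPos and checkNext can be sketched with a cursor over a byte array. This is one possible reading of the contract described above, not the library's implementation; the class CursorSketch is a hypothetical name, and the sketch starts at position 0 and ignores the before-start and after-end states.

```java
// Hypothetical sketch; not part of org.aksw.commons.io.seekable.api.
final class CursorSketch {
    private final byte[] data;
    private int pos; // valid positions are 0 .. data.length - 1

    CursorSketch(byte[] data) {
        this.data = data;
        this.pos = 0;
    }

    int getPos() {
        return pos;
    }

    /**
     * Returns how many bytes the position can advance, capped so that the
     * last valid position is never passed. Moves only if changePos is true.
     */
    int checkNext(int len, boolean changePos) {
        int lastValid = data.length - 1;
        int moved = Math.max(0, Math.min(len, lastValid - pos));
        if (changePos) {
            pos += moved;
        }
        return moved;
    }

    /** All-or-nothing advance: either moves by exactly len or not at all. */
    boolean nextPos(int len) {
        if (checkNext(len, false) != len) {
            return false;
        }
        pos += len;
        return true;
    }
}
```

In this sketch, nextPos either moves by the full amount or leaves the position untouched, while checkNext reports (and optionally applies) a partial move.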
-
prevPos
Attempt to step back the position by the given number of bytes. The argument must not be negative.
- Parameters:
len -
- Returns:
- True if the position was changed by the *requested* amount of bytes. False means that the position was unchanged.
- Throws:
IOException
-
checkPrev
- Throws:
IOException
-
deltaPos
Relative positioning. Delegates to nextPos or prevPos based on the sign of delta.
- Parameters:
delta -
- Returns:
- Throws:
IOException
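The sign-based delegation can be sketched as follows. This is a minimal illustration, assuming a fixed number of valid positions; the class DeltaPosSketch is a hypothetical name, and rejecting negative lengths by returning false is a choice of this sketch, not documented behavior.

```java
// Hypothetical sketch; not part of org.aksw.commons.io.seekable.api.
final class DeltaPosSketch {
    private final int size; // valid positions are 0 .. size - 1
    private int pos;

    DeltaPosSketch(int size, int pos) {
        this.size = size;
        this.pos = pos;
    }

    int getPos() {
        return pos;
    }

    /** Moves forward by exactly len, or not at all. */
    boolean nextPos(int len) {
        if (len < 0 || len > size - 1 - pos) {
            return false;
        }
        pos += len;
        return true;
    }

    /** Moves backward by exactly len, or not at all. */
    boolean prevPos(int len) {
        if (len < 0 || len > pos) {
            return false;
        }
        pos -= len;
        return true;
    }

    /** Delegates to nextPos or prevPos based on the sign of delta. */
    boolean deltaPos(int delta) {
        return delta >= 0 ? nextPos(delta) : prevPos(-delta);
    }
}
```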
-
peekNextBytes
Attempt to read bytes at the current position without altering the position.
- Parameters:
dst -
offset -
len -
- Returns:
- Throws:
IOException
-
posToNext
Move the position to the next delimiter if it exists, or one element past the end of data such that isPosAfterEnd() yields true. The position is unchanged if it is already at a delimiter. A positive result is the number of bytes the position was advanced by this invocation. A negative result indicates the number of bytes until the end of the seekable, i.e. within that number of bytes no match was found.
- Parameters:
delimiter -
- Returns:
- true if the position was changed, false otherwise
- Throws:
IOException
-
posToNext
0: match but 0 bytes moved. Negative values indicate no match; add 1 and drop the sign to get the number of bytes moved: -1 means no match and 0 bytes moved, -10 means no match and 9 bytes moved.
- Parameters:
delimiter -
changePos - If no delimiter is found, move the pos to the end
- Returns:
- Throws:
IOException
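The result encoding can be decoded with a small helper. The sketch below is an illustration only: PosToNextResultSketch and its methods are hypothetical names, and the scan method is an assumed array-based stand-in that merely produces values following the encoding described above.

```java
// Hypothetical sketch; not part of org.aksw.commons.io.seekable.api.
final class PosToNextResultSketch {

    /** Non-negative results indicate a match. */
    static boolean isMatch(int result) {
        return result >= 0;
    }

    /** Number of bytes moved: -1 decodes to 0, -10 decodes to 9. */
    static int bytesMoved(int result) {
        return result >= 0 ? result : -(result + 1);
    }

    /**
     * Assumed array-based scan using the same encoding: the distance to the
     * delimiter on a match, otherwise -(bytes until the end) - 1.
     */
    static int scan(byte[] data, int pos, byte delimiter) {
        for (int i = pos; i < data.length; i++) {
            if (data[i] == delimiter) {
                return i - pos;
            }
        }
        return -(data.length - pos) - 1;
    }
}
```

Encoding the no-match case as a negative value offset by one keeps "no match, 0 bytes moved" (-1) distinct from "match, 0 bytes moved" (0).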
-
posToPrev
Move the position to the previous delimiter if it exists, or one element before the start of data such that isPosBeforeStart() yields true. The position is unchanged if it is already at a delimiter.
- Parameters:
delimiter -
- Returns:
- true if the position was changed, false otherwise
- Throws:
IOException
-
read
- Specified by:
read in interface ReadableByteChannel
- Throws:
IOException
-
readString
- Throws:
IOException
-
compareToPrefix
Compare the bytes at the current position to a given sequence of bytes. If there are fewer bytes available in the seekable than provided for comparison, then only that many are compared. This default implementation uses read(ByteBuffer). Other implementations may override this behavior to compare the given prefix directly against their internal data structures without the intermediate buffer copy due to read.
- Parameters:
prefix -
- Returns:
- Throws:
IOException
-
compareArrays
static int compareArrays(byte[] a, byte[] b) -
binarySearch
Delimiter-based binary search. The delimiter must not appear in prefix. The result is the position of the match, or -1 if no match was found. The position is set to the first match.
TODO: The position is undefined if there was no match, which is not optimal; we might want to reset it in that case.
To reiterate: for forward / backward searches, the contract is to move to the next match or beyond the start/end of the stream. We might want to change it so that if there is a match, the position moves to it and true is returned; otherwise the position is left unchanged and false is returned.
- Parameters:
min -
max -
delimiter -
prefix -
- Returns:
- Throws:
IOException
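The general idea of a delimiter-based binary search can be sketched over a byte array of sorted, delimiter-separated records. This is an illustration under assumptions, not the default implementation of this interface: the class DelimitedBinarySearchSketch is a hypothetical name, min is assumed to be the start of a record, and the records in [min, max) are assumed to be sorted in unsigned byte order.

```java
// Hypothetical sketch; not part of org.aksw.commons.io.seekable.api.
final class DelimitedBinarySearchSketch {

    /** Returns the offset of the first record starting with prefix, or -1. */
    static int binarySearch(byte[] data, int min, int max, byte delimiter, byte[] prefix) {
        int lo = min; // records starting before lo compare less than prefix
        int hi = max; // records starting at or after hi compare >= prefix
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            // Widen the probe offset to the record that contains it.
            int start = recordStart(data, lo, mid, delimiter);
            int end = recordEnd(data, mid, max, delimiter);
            if (comparePrefix(data, start, end, prefix) < 0) {
                lo = Math.min(end + 1, hi); // continue after this record
            } else {
                hi = start; // this record may be the first match
            }
        }
        boolean found = lo < max
                && comparePrefix(data, lo, recordEnd(data, lo, max, delimiter), prefix) == 0;
        return found ? lo : -1;
    }

    // Scans backward from pos to the first byte of the enclosing record.
    private static int recordStart(byte[] data, int lo, int pos, byte delimiter) {
        int i = pos;
        while (i > lo && data[i - 1] != delimiter) {
            i--;
        }
        return i;
    }

    // Scans forward from pos to the record's delimiter, or to max.
    private static int recordEnd(byte[] data, int pos, int max, byte delimiter) {
        int j = pos;
        while (j < max && data[j] != delimiter) {
            j++;
        }
        return j;
    }

    // Compares the record data[start, end) with prefix in unsigned byte order;
    // returns 0 if the record starts with prefix.
    private static int comparePrefix(byte[] data, int start, int end, byte[] prefix) {
        int n = Math.min(end - start, prefix.length);
        for (int k = 0; k < n; k++) {
            int c = (data[start + k] & 0xff) - (prefix[k] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return (end - start) < prefix.length ? -1 : 0;
    }
}
```

Because each probe is widened to record boundaries, the search never compares from the middle of a record, which is why the delimiter must not occur in the prefix.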
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Channel
- Specified by:
close in interface Closeable
- Throws:
IOException
-
size
The currently known size (of the underlying entity).
- Throws:
IOException
-