Class BufferOverInputStream
java.lang.Object
org.aksw.commons.io.hadoop.binseach.bz2.BufferOverInputStream
- All Implemented Interfaces:
AutoCloseable, ChannelFactory<Seekable>
FIXME This class should be removed because it is superseded by BufferOverReadableChannel in aksw-commons-io!
Implementation of a byte array that caches data in buckets from
an InputStream.
Instances of this class are thread-safe, but the obtained channels are not; each channel should only be operated on by one thread.
Differences to BufferedInputStream
- this class caches all data read from the input stream, hence there is no mark / reset mechanism
- the buffer is split into buckets (no data copying required when allocating more space)
- data is loaded on demand based on (possibly concurrent) requests to the seekable channels obtained with newChannel()
The closest Hadoop counterpart known to the author is BufferedFSInputStream (which is based on BufferedInputStream)
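The bucket idea described above can be illustrated with a minimal sketch. It is not the implementation of this class: the name BucketedCacheSketch, the method byteAt and the single fixed bucket size are assumptions made for illustration (the documented class accepts preconfigured bucket sizes and hands out channels instead).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in; NOT the BufferOverInputStream implementation. */
final class BucketedCacheSketch {
    private final InputStream in;
    private final int bucketSize;                            // assumption: one fixed bucket size
    private final List<byte[]> buckets = new ArrayList<>();  // filled buckets are never copied
    private long knownDataSize = 0;                          // number of cached bytes
    private boolean consumed = false;                        // true once the stream returned -1

    BucketedCacheSketch(InputStream in, int bucketSize) {
        this.in = in;
        this.bucketSize = bucketSize;
    }

    /** Returns the byte at the given position, or -1 if it lies past the end of the stream. */
    synchronized int byteAt(long pos) {
        // Load on demand: only read from the stream while the position is not yet cached.
        while (pos >= knownDataSize && !consumed) {
            loadChunk();
        }
        if (pos < 0 || pos >= knownDataSize) {
            return -1;
        }
        byte[] bucket = buckets.get((int) (pos / bucketSize));
        return bucket[(int) (pos % bucketSize)] & 0xff;
    }

    synchronized long getKnownDataSize() {
        return knownDataSize;
    }

    private void loadChunk() {
        int offset = (int) (knownDataSize % bucketSize);
        if (offset == 0) {
            // The active bucket is full (or none exists yet): add a bucket instead of
            // growing and copying one large array.
            buckets.add(new byte[bucketSize]);
        }
        byte[] active = buckets.get(buckets.size() - 1);
        try {
            int n = in.read(active, offset, bucketSize - offset);
            if (n < 0) {
                consumed = true;
            } else {
                knownDataSize += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every byte stays cached, an earlier position such as byteAt(0) can be requested again after a later one without any mark / reset bookkeeping.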
- Author:
- raven
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | |
class | |
-
Field Summary
Fields
Modifier and Type | Field | Description
protected BufferOverInputStream.BucketPointer | activeEnd | End marker with two components (idx, pos). It is wrapped in an object to enable atomic replacement of the reference. The pointer is monotonic in the sense that the end marker's logical linear location only increases. Reading an old version while a new one has been set only causes a read to return at the old boundary; a subsequent synchronized check for whether additional data needs to be loaded is made anyway.
protected byte[][] | buckets | The buffered data
protected InputStream | dataSupplier | Supplier for additional data
protected boolean | isDataSupplierConsumed | Flag to indicate that the dataSupplier has been consumed. This is the case when dataSupplier(buffer) returns -1.
protected long | knownDataSize | The number of cached bytes.
protected int | maxReadSize | Maximum number of bytes to read from the dataSupplier in one request
protected int | minReadSize |
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
void | close() |
static BufferOverInputStream | create(InputStream in, int maxReadSize, int... preconfiguredBucketSizes) |
int | doRead(BufferOverInputStream.ByteArrayChannel reader, ByteBuffer dst) |
protected void | ensureCapacityInActiveBucket() |
long | getKnownDataSize() |
static BufferOverInputStream.BucketPointer | getPointer(byte[][] buckets, BufferOverInputStream.BucketPointer end, long pos) |
static long | getPosition(byte[][] buckets, int idx, int pos) |
boolean | isDataSupplierConsumed() |
protected void | loadData(int needed) | Fetch a chunk from the input stream
protected void | loadDataUpTo(long requestedPos) | Preload data up to and including the requested position.
static void | main |
static void | main2 |
 | newChannel() |
protected int | nextBucketSize() |
-
Field Details
-
buckets
protected byte[][] buckets
The buffered data
-
activeEnd
protected BufferOverInputStream.BucketPointer activeEnd
End marker with two components (idx, pos). It is wrapped in an object to enable atomic replacement of the reference. The pointer is monotonic in the sense that the end marker's logical linear location only increases. Reading an old version while a new one has been set only causes a read to return at the old boundary; a subsequent synchronized check for whether additional data needs to be loaded is made anyway.
-
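The reason for wrapping (idx, pos) in one object can be sketched as follows. This is an illustration under assumptions, not the code of this class: the names EndMarkerSketch, Pointer, snapshot and advance are hypothetical, and the use of a volatile field is assumed rather than taken from the source.

```java
/** Hypothetical illustration of an atomically replaceable end marker. */
final class EndMarkerSketch {
    /** Immutable (idx, pos) pair; a hypothetical analogue of BucketPointer. */
    static final class Pointer {
        final int idx;
        final int pos;

        Pointer(int idx, int pos) {
            this.idx = idx;
            this.pos = pos;
        }
    }

    // Assumption: a volatile reference, so readers always see a complete Pointer object.
    private volatile Pointer activeEnd = new Pointer(0, 0);

    /** Readers take one snapshot and never observe a new idx paired with an old pos. */
    Pointer snapshot() {
        return activeEnd;
    }

    /** Writers publish a fresh object; the marker is only allowed to move forward. */
    synchronized void advance(int idx, int pos) {
        Pointer cur = activeEnd;
        if (idx < cur.idx || (idx == cur.idx && pos < cur.pos)) {
            throw new IllegalArgumentException("end marker must not move backwards");
        }
        activeEnd = new Pointer(idx, pos);
    }
}
```

A reader holding an old snapshot simply stops at the old boundary; since the marker only moves forward, everything before that boundary remains valid.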
knownDataSize
protected long knownDataSize
The number of cached bytes. Corresponds to the linear representation of activeEnd.
-
dataSupplier
protected InputStream dataSupplier
Supplier for additional data
-
minReadSize
protected int minReadSize
-
maxReadSize
protected int maxReadSize
Maximum number of bytes to read from the dataSupplier in one request
-
isDataSupplierConsumed
protected boolean isDataSupplierConsumed
Flag to indicate that the dataSupplier has been consumed. This is the case when dataSupplier(buffer) returns -1.
-
-
Constructor Details
-
BufferOverInputStream
-
-
Method Details
-
getKnownDataSize
public long getKnownDataSize()
-
isDataSupplierConsumed
public boolean isDataSupplierConsumed()
-
create
public static BufferOverInputStream create(InputStream in, int maxReadSize, int... preconfiguredBucketSizes)
- Parameters:
maxReadSize - Maximum number of bytes to request from the input stream at once
-
getPosition
public static long getPosition(byte[][] buckets, int idx, int pos)
-
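The method carries no description. One plausible reading, given that knownDataSize is described as the linear representation of activeEnd, is that a bucket pointer (idx, pos) maps to a linear offset by summing the lengths of all preceding buckets. The sketch below shows that arithmetic under this assumption; the class and method names are hypothetical.

```java
/** Hypothetical sketch of mapping a bucket pointer to a linear position. */
final class BucketPositionSketch {
    /**
     * Assumed semantics: the lengths of all buckets before idx, plus the offset pos
     * inside bucket idx.
     */
    static long linearPosition(byte[][] buckets, int idx, int pos) {
        long result = 0;
        for (int i = 0; i < idx; ++i) {
            result += buckets[i].length;
        }
        return result + pos;
    }
}
```

With buckets of sizes 4, 8 and 16, the pointer (2, 3) corresponds to linear position 4 + 8 + 3 = 15.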
getPointer
public static BufferOverInputStream.BucketPointer getPointer(byte[][] buckets, BufferOverInputStream.BucketPointer end, long pos)
- Returns:
Pointer to a valid location in the known data block or null
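The inverse mapping, from a linear position to a bucket pointer, might look like the sketch below. It assumes that the end marker is exclusive, so the position equal to the known data size is not a valid location; the source does not confirm this detail. The names BucketPointerSketch, Pointer and locate are hypothetical.

```java
/** Hypothetical sketch of resolving a linear position to a bucket pointer. */
final class BucketPointerSketch {
    /** Immutable (idx, pos) pair; a hypothetical analogue of BucketPointer. */
    static final class Pointer {
        final int idx;
        final int pos;

        Pointer(int idx, int pos) {
            this.idx = idx;
            this.pos = pos;
        }
    }

    /** Returns the pointer for pos, or null if pos is outside the data known so far. */
    static Pointer locate(byte[][] buckets, Pointer end, long pos) {
        if (pos < 0) {
            return null;
        }
        long remaining = pos;
        for (int i = 0; i <= end.idx && i < buckets.length; ++i) {
            // In the bucket that holds the end marker, only end.pos bytes are valid.
            int limit = (i == end.idx) ? end.pos : buckets[i].length;
            if (remaining < limit) {
                return new Pointer(i, (int) remaining);
            }
            remaining -= limit;
        }
        return null; // assumption: the end marker itself is exclusive
    }
}
```

With buckets of sizes 4, 8 and 16 and an end marker of (1, 5), nine bytes are known: position 6 resolves to (1, 2), while position 9 yields null.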
-
newChannel
- Specified by:
newChannel in interface ChannelFactory<Seekable>
-
nextBucketSize
protected int nextBucketSize()
-
doRead
-
loadDataUpTo
protected void loadDataUpTo(long requestedPos)
Preload data up to and including the requested position. The bound is inclusive so that callers can check whether the requested position is in range.
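The inclusive bound can be illustrated with a small sketch: loading continues until the byte at requestedPos itself is cached or the stream is exhausted, after which a plain comparison decides whether the position is in range. This is a sketch under assumptions, not the code of this class; PreloadSketch, isInRange and the ByteArrayOutputStream used in place of the buckets are hypothetical, and minReadSize is left out because its semantics are not documented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of inclusive preloading with a capped read size. */
final class PreloadSketch {
    private final InputStream dataSupplier;
    private final int maxReadSize;
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream(); // stand-in for the buckets
    private boolean consumed = false;
    int requests = 0; // number of read requests issued, for demonstration only

    PreloadSketch(InputStream dataSupplier, int maxReadSize) {
        this.dataSupplier = dataSupplier;
        this.maxReadSize = maxReadSize;
    }

    /** Loads until the byte AT requestedPos is cached (inclusive) or the stream ends. */
    synchronized void loadDataUpTo(long requestedPos) {
        while (cache.size() <= requestedPos && !consumed) {
            long needed = requestedPos + 1 - cache.size();
            loadData((int) Math.min(needed, Integer.MAX_VALUE));
        }
    }

    /** Fetches one chunk of at most maxReadSize bytes from the input stream. */
    private void loadData(int needed) {
        byte[] chunk = new byte[Math.min(needed, maxReadSize)];
        try {
            int n = dataSupplier.read(chunk, 0, chunk.length);
            requests++;
            if (n < 0) {
                consumed = true;
            } else {
                cache.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Because the preload is inclusive, a simple comparison afterwards is sufficient. */
    synchronized boolean isInRange(long pos) {
        loadDataUpTo(pos);
        return pos >= 0 && pos < cache.size();
    }
}
```

If the loop stopped one byte earlier (an exclusive bound), the cache could end exactly at requestedPos without revealing whether the stream holds a byte there.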
-
loadData
protected void loadData(int needed)
Fetch a chunk from the input stream
-
ensureCapacityInActiveBucket
protected void ensureCapacityInActiveBucket()
-
main
- Throws:
Exception
-
main2
- Throws:
Exception
-
close
- Specified by:
close in interface AutoCloseable
- Throws:
Exception
-