Class CBZip2InputStreamAdapted
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.io.compress.bzip2.BZip2Constants
Decompression requires a large amount of memory, so call the
close() method as soon as possible to force
CBZip2InputStream to release the memory it has allocated. See
CBZip2OutputStream for information about memory
usage.
CBZip2InputStream reads bytes from the compressed source stream
exclusively through the single-byte read() method,
so consider using a buffered source stream.
This Ant code was enhanced so that it can decompress blocks of bzip2 data.
The current position in the stream is an important statistic for Hadoop. For example, LineRecordReader depends solely on the current position in the stream to report progress. The notion of position becomes complicated for compressed files: Hadoop splitting is done in terms of the compressed file, but a compressed file expands to a large amount of data.
This class handles the problem as follows. At object creation time, it finds the next block start delimiter. Once such a marker is found, the stream stops there (any compressed data read during the search is discarded) and the position is reported as the beginning of the block start delimiter. At this point the stream is ready for the actual reading (i.e. decompression) of data, and subsequent read calls return data.
The position is updated when the caller of this class has read the current block plus one byte. While a block is being read, the position is not updated; it can only be updated on block boundaries.
Instances of this class are not threadsafe.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum | | A state machine to keep track of the current state of the decoder
-
Field Summary
Fields
Modifier and Type | Field | Description
static final long | BLOCK_DELIMITER |
static final long | EOS_DELIMITER |
Fields inherited from interface org.apache.hadoop.io.compress.bzip2.BZip2Constants
baseBlockSize, END_OF_BLOCK, END_OF_STREAM, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB -
Constructor Summary
Constructors
Constructor | Description
CBZip2InputStreamAdapted(InputStream in, org.apache.hadoop.io.compress.SplittableCompressionCodec.READ_MODE readMode) | Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
long | getProcessedByteCount() | This method reports the processed bytes so far.
static long | numberOfBytesTillNextMarker(InputStream in) | Returns the number of bytes between the current stream position and the immediate next BZip2 block marker.
int | read() |
int | read(byte[] dest, int offs, int len) | In CONTINUOUS reading mode, this read method starts from the start of the compressed stream and ends at the end of the file, emitting uncompressed data.
protected void | reportCRCError |
boolean | skipToNextMarker(long marker, int markerBitLength) | This method tries to find the marker (passed to it as the first parameter) in the stream.
protected void | updateProcessedByteCount(int count) | This method keeps track of raw processed compressed bytes.
void | updateReportedByteCount(int count) | This method is called by the client of this class in case there are any corrections in the stream position.
Methods inherited from class java.io.InputStream
available, mark, markSupported, nullInputStream, read, readAllBytes, readNBytes, readNBytes, reset, skip, skipNBytes, transferTo
-
Field Details
-
BLOCK_DELIMITER
public static final long BLOCK_DELIMITER
- See Also:
-
EOS_DELIMITER
public static final long EOS_DELIMITER
- See Also:
-
-
Constructor Details
-
CBZip2InputStreamAdapted
public CBZip2InputStreamAdapted(InputStream in, org.apache.hadoop.io.compress.SplittableCompressionCodec.READ_MODE readMode) throws IOException
Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream.
Although BZip2 headers are marked with the magic "BZ", this constructor expects the next byte in the stream to be the first one after the magic. Callers therefore have to skip the first two bytes; otherwise this constructor will throw an exception.
- Throws:
IOException - if the stream content is malformed or an I/O error occurs.
NullPointerException - if in == null
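The caller-side preparation described above (buffer the source, then consume the two magic bytes before handing the stream to the constructor) might look roughly like the following sketch. The class name Bzip2MagicSkipper and the helper bufferAndSkipMagic are hypothetical names introduced for illustration. The constructor call itself appears only in a comment so that the sketch does not depend on the Hadoop classes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of CBZip2InputStreamAdapted.
class Bzip2MagicSkipper {

    /** Wraps the source in a buffer and consumes the two-byte "BZ" magic. */
    static InputStream bufferAndSkipMagic(InputStream raw) throws IOException {
        // Buffering matters because the decoder pulls one byte at a time.
        InputStream in = new BufferedInputStream(raw);
        int b0 = in.read();
        int b1 = in.read();
        if (b0 != 'B' || b1 != 'Z') {
            throw new IOException("Stream does not start with the BZ magic");
        }
        return in; // now positioned at the first byte after the magic
    }

    public static void main(String[] args) throws IOException {
        byte[] header = {'B', 'Z', 'h', '9'}; // start of a bzip2 header only
        try (InputStream in = bufferAndSkipMagic(new ByteArrayInputStream(header))) {
            // A real caller would now construct the decoder, for example:
            //   new CBZip2InputStreamAdapted(in, READ_MODE.BYBLOCK)
            // and close it promptly to release its memory.
            System.out.println("next byte after magic: " + (char) in.read());
        }
    }
}
```

A caller that strips the magic this way may also need to pass the two skipped bytes to updateReportedByteCount so that the reported position stays aligned with the compressed file.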
-
CBZip2InputStreamAdapted
- Throws:
IOException
-
-
Method Details
-
getProcessedByteCount
public long getProcessedByteCount()
This method reports the number of bytes processed so far. Note that this statistic is updated only on block boundaries, and only when the stream was created in BYBLOCK mode.
-
updateProcessedByteCount
protected void updateProcessedByteCount(int count)
This method keeps track of raw processed compressed bytes.
- Parameters:
count - the number of bytes to be added to the raw processed bytes
-
updateReportedByteCount
public void updateReportedByteCount(int count)
This method is called by the client of this class when the stream position needs correcting. One common example is when the client removes the leading BZ characters from the compressed stream.
- Parameters:
count - the number of bytes to add to the reported bytes
-
skipToNextMarker
public boolean skipToNextMarker(long marker, int markerBitLength) throws IOException, IllegalArgumentException
This method tries to find the marker (passed as the first parameter) in the stream. It can find bit patterns of up to 63 bits. Specifically, this method is used in CBZip2InputStream to find the end of block (EOB) delimiter in the stream, starting from the current position of the stream. If the marker is found, the stream position will be at the byte containing the starting bit of the marker.
- Parameters:
marker - The bit pattern to be found in the stream
markerBitLength - Number of bits in the marker
- Returns:
- true if the marker was found, otherwise false
- Throws:
IOException
IllegalArgumentException - if markerBitLength is greater than 63
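Because bzip2 blocks are not byte-aligned, a marker search of this kind has to work bit by bit. The following is a minimal sketch of one way to do that with a sliding bit window. It is not the implementation used by this class, and it returns a bit offset rather than repositioning the stream. The names MarkerScanSketch and findMarker are hypothetical, and the 48-bit constant is the block-start magic from the bzip2 format, assumed here to correspond to BLOCK_DELIMITER.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a bit-level marker search; not the class's own code.
class MarkerScanSketch {

    // bzip2 block-start magic (48 bits); assumed to match BLOCK_DELIMITER.
    static final long BLOCK_MAGIC = 0x314159265359L;

    /**
     * Scans the stream bit by bit and returns the bit offset at which the
     * marker first starts, or -1 if the stream ends without a match.
     */
    static long findMarker(InputStream in, long marker, int markerBitLength)
            throws IOException {
        if (markerBitLength < 1 || markerBitLength > 63) {
            throw new IllegalArgumentException("markerBitLength must be in 1..63");
        }
        long mask = (1L << markerBitLength) - 1; // keeps the low markerBitLength bits
        long target = marker & mask;
        long window = 0;   // the most recent markerBitLength bits seen
        long bitsSeen = 0; // total bits consumed so far
        int b;
        while ((b = in.read()) != -1) {
            for (int i = 7; i >= 0; i--) { // most significant bit first
                window = ((window << 1) | ((b >> i) & 1)) & mask;
                bitsSeen++;
                if (bitsSeen >= markerBitLength && window == target) {
                    return bitsSeen - markerBitLength;
                }
            }
        }
        return -1;
    }
}
```

Under these assumptions, calling findMarker(stream, MarkerScanSketch.BLOCK_MAGIC, 48) locates the first block start even when it begins in the middle of a byte. The 63-bit limit in the sketch mirrors the documented restriction and keeps the window inside a signed long.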
-
reportCRCError
- Throws:
IOException
-
numberOfBytesTillNextMarker
Returns the number of bytes between the current stream position and the immediate next BZip2 block marker.
- Parameters:
in - The InputStream
- Returns:
- long Number of bytes between the current stream position and the next BZip2 block start marker.
- Throws:
IOException
-
read
- Specified by:
read in class InputStream
- Throws:
IOException
-
read
In CONTINUOUS reading mode, this read method starts at the start of the compressed stream and ends at the end of the file, emitting uncompressed data. In this mode the stream position is not announced and should be ignored.
In BYBLOCK reading mode, this read method signals the end of a BZip2 block by returning EOB. At that event the compressed stream position is also announced; it tells how much of the compressed stream has been decompressed and read out of this class. Between EOB events, the stream position is not updated.
- Overrides:
read in class InputStream
- Returns:
- int A return value greater than 0 is the number of bytes read. A value of -1 means end of stream, while -2 represents end of block.
- Throws:
IOException - if the stream content is malformed or an I/O error occurs.
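A caller in BYBLOCK mode therefore has to distinguish three outcomes of read: data, end of block (-2), and end of stream (-1). The loop below is a minimal sketch of that handling, written against a plain InputStream so that it stands alone. The class name ByBlockReadLoop, the method drain, and the local constants are hypothetical names that mirror the return values documented above.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical caller-side loop; the constants mirror the documented return codes.
class ByBlockReadLoop {

    static final int END_OF_STREAM = -1;
    static final int END_OF_BLOCK = -2;

    /** Drains the stream and returns {uncompressed bytes read, block ends seen}. */
    static long[] drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long bytes = 0;
        long blocks = 0;
        while (true) {
            int n = in.read(buf, 0, buf.length);
            if (n == END_OF_STREAM) {
                break; // no more data at all
            }
            if (n == END_OF_BLOCK) {
                // Block boundary: with the real class, this is the point at which
                // getProcessedByteCount() would reflect the new position.
                blocks++;
                continue;
            }
            bytes += n; // ordinary data; buf[0..n) holds uncompressed bytes
        }
        return new long[] {bytes, blocks};
    }
}
```

Treating -2 as a signal rather than as an error or a byte count is the important part. A loop that only checks for -1 would add -2 to its running total and never notice the block boundary.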
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class InputStream
- Throws:
IOException
-