Class BZip2CodecAdapted

java.lang.Object
org.aksw.commons.io.hadoop.compress.bzip2.BZip2CodecAdapted
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.io.compress.CompressionCodec, org.apache.hadoop.io.compress.SplittableCompressionCodec

@Public @Evolving public class BZip2CodecAdapted extends Object implements org.apache.hadoop.conf.Configurable, org.apache.hadoop.io.compress.SplittableCompressionCodec
This class provides output and input streams for bzip2 compression and decompression. It uses the system's native bzip2 library when one is available and otherwise falls back to a pure-Java implementation of the bzip2 algorithm. The configuration parameter io.compression.codec.bzip2.library controls this choice. In pure-Java mode the Compressor and Decompressor interfaces are not implemented, so the CompressionCodec methods that take a Compressor or Decompressor argument throw UnsupportedOperationException. Splitting is currently supported only in pure-Java mode; when a SplitCompressionInputStream is requested, the pure-Java implementation is used regardless of that configuration parameter.
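The description above can be illustrated with a small round trip through the two single-argument stream factories. This is a minimal sketch, not a reference usage: it assumes Hadoop and the library providing BZip2CodecAdapted are on the classpath, that the codec accepts plain in-memory streams for non-split reading, and that the value "java-builtin" selects the pure-Java mode as it does for Hadoop's own BZip2Codec. The class name Bzip2RoundTrip and its helper methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.aksw.commons.io.hadoop.compress.bzip2.BZip2CodecAdapted;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;

public class Bzip2RoundTrip {

    /** Builds a codec with an explicit configuration (hypothetical helper). */
    public static BZip2CodecAdapted newCodec() {
        Configuration conf = new Configuration();
        // Assumption: "java-builtin" requests the pure-Java implementation,
        // mirroring the value understood by Hadoop's own BZip2Codec.
        conf.set("io.compression.codec.bzip2.library", "java-builtin");
        BZip2CodecAdapted codec = new BZip2CodecAdapted();
        codec.setConf(conf);
        return codec;
    }

    /** Compresses raw bytes through createOutputStream(OutputStream). */
    public static byte[] compress(CompressionCodec codec, byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the stream finishes the compressed output.
        try (OutputStream out = codec.createOutputStream(sink)) {
            out.write(raw);
        }
        return sink.toByteArray();
    }

    /** Decompresses bytes through createInputStream(InputStream). */
    public static byte[] decompress(CompressionCodec codec, byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BZip2CodecAdapted codec = newCodec();
        byte[] raw = "hello, bzip2".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(codec, compress(codec, raw));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The helpers are written against the CompressionCodec interface and avoid the overloads that take a Compressor or Decompressor, since those are the ones the description says may be unsupported in pure-Java mode.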
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.hadoop.io.compress.CompressionCodec

    org.apache.hadoop.io.compress.CompressionCodec.Util

    Nested classes/interfaces inherited from interface org.apache.hadoop.io.compress.SplittableCompressionCodec

    org.apache.hadoop.io.compress.SplittableCompressionCodec.READ_MODE
  • Constructor Summary

    Constructors
    Constructor
    Description
    BZip2CodecAdapted()
    Creates a new instance of BZip2Codec.
    BZip2CodecAdapted(int bufferSize)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.hadoop.io.compress.Compressor
    createCompressor()
    Create a new Compressor for use by this CompressionCodec.
    org.apache.hadoop.io.compress.Decompressor
    createDecompressor()
    Create a new Decompressor for use by this CompressionCodec.
    org.apache.hadoop.io.compress.CompressionInputStream
    createInputStream(InputStream in)
    Create a CompressionInputStream that will read from the given input stream and return a stream for uncompressed data.
    org.apache.hadoop.io.compress.CompressionInputStream
    createInputStream(InputStream in, org.apache.hadoop.io.compress.Decompressor decompressor)
    Create a CompressionInputStream that will read from the given InputStream with the given Decompressor, and return a stream for uncompressed data.
    org.apache.hadoop.io.compress.SplitCompressionInputStream
    createInputStream(InputStream seekableIn, org.apache.hadoop.io.compress.Decompressor decompressor, long start, long end, org.apache.hadoop.io.compress.SplittableCompressionCodec.READ_MODE readMode)
    Creates a CompressionInputStream for reading uncompressed data in one of the two reading modes.
    org.apache.hadoop.io.compress.CompressionOutputStream
    createOutputStream(OutputStream out)
    Create a CompressionOutputStream that will write to the given OutputStream.
    org.apache.hadoop.io.compress.CompressionOutputStream
    createOutputStream(OutputStream out, org.apache.hadoop.io.compress.Compressor compressor)
    Create a CompressionOutputStream that will write to the given OutputStream with the given Compressor.
    Class<? extends org.apache.hadoop.io.compress.Compressor>
    getCompressorType()
    Get the type of Compressor needed by this CompressionCodec.
    org.apache.hadoop.conf.Configuration
    getConf()
    Return the configuration used by this object.
    Class<? extends org.apache.hadoop.io.compress.Decompressor>
    getDecompressorType()
    Get the type of Decompressor needed by this CompressionCodec.
    String
    getDefaultExtension()
    .bz2 is recognized as the default extension for compressed BZip2 files
    void
    setConf(org.apache.hadoop.conf.Configuration conf)
    Set the configuration to be used by this object.

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • BZip2CodecAdapted

      public BZip2CodecAdapted()
      Creates a new instance of BZip2Codec.
    • BZip2CodecAdapted

      public BZip2CodecAdapted(int bufferSize)
  • Method Details

    • setConf

      public void setConf(org.apache.hadoop.conf.Configuration conf)
      Set the configuration to be used by this object.
      Specified by:
      setConf in interface org.apache.hadoop.conf.Configurable
      Parameters:
      conf - the configuration object.
    • getConf

      public org.apache.hadoop.conf.Configuration getConf()
      Return the configuration used by this object.
      Specified by:
      getConf in interface org.apache.hadoop.conf.Configurable
      Returns:
      the configuration object used by this object.
    • createOutputStream

      public org.apache.hadoop.io.compress.CompressionOutputStream createOutputStream(OutputStream out) throws IOException
      Create a CompressionOutputStream that will write to the given OutputStream.
      Specified by:
      createOutputStream in interface org.apache.hadoop.io.compress.CompressionCodec
      Parameters:
      out - the location for the final output stream
      Returns:
      a stream the user can write uncompressed data to, to have it compressed
      Throws:
      IOException
    • createOutputStream

      public org.apache.hadoop.io.compress.CompressionOutputStream createOutputStream(OutputStream out, org.apache.hadoop.io.compress.Compressor compressor) throws IOException
      Create a CompressionOutputStream that will write to the given OutputStream with the given Compressor.
      Specified by:
      createOutputStream in interface org.apache.hadoop.io.compress.CompressionCodec
      Parameters:
      out - the location for the final output stream
      compressor - compressor to use
      Returns:
      a stream the user can write uncompressed data to, to have it compressed
      Throws:
      IOException
    • getCompressorType

      public Class<? extends org.apache.hadoop.io.compress.Compressor> getCompressorType()
      Get the type of Compressor needed by this CompressionCodec.
      Specified by:
      getCompressorType in interface org.apache.hadoop.io.compress.CompressionCodec
      Returns:
      the type of compressor needed by this codec.
    • createCompressor

      public org.apache.hadoop.io.compress.Compressor createCompressor()
      Create a new Compressor for use by this CompressionCodec.
      Specified by:
      createCompressor in interface org.apache.hadoop.io.compress.CompressionCodec
      Returns:
      a new compressor for use by this codec
    • createInputStream

      public org.apache.hadoop.io.compress.CompressionInputStream createInputStream(InputStream in) throws IOException
      Create a CompressionInputStream that will read from the given input stream and return a stream for uncompressed data.
      Specified by:
      createInputStream in interface org.apache.hadoop.io.compress.CompressionCodec
      Parameters:
      in - the stream to read compressed bytes from
      Returns:
      a stream to read uncompressed bytes from
      Throws:
      IOException
    • createInputStream

      public org.apache.hadoop.io.compress.CompressionInputStream createInputStream(InputStream in, org.apache.hadoop.io.compress.Decompressor decompressor) throws IOException
      Create a CompressionInputStream that will read from the given InputStream with the given Decompressor, and return a stream for uncompressed data.
      Specified by:
      createInputStream in interface org.apache.hadoop.io.compress.CompressionCodec
      Parameters:
      in - the stream to read compressed bytes from
      decompressor - decompressor to use
      Returns:
      a stream to read uncompressed bytes from
      Throws:
      IOException
    • createInputStream

      public org.apache.hadoop.io.compress.SplitCompressionInputStream createInputStream(InputStream seekableIn, org.apache.hadoop.io.compress.Decompressor decompressor, long start, long end, org.apache.hadoop.io.compress.SplittableCompressionCodec.READ_MODE readMode) throws IOException
      Creates a CompressionInputStream for reading uncompressed data in one of the two reading modes, continuous or by block (see SplittableCompressionCodec.READ_MODE).
      Specified by:
      createInputStream in interface org.apache.hadoop.io.compress.SplittableCompressionCodec
      Parameters:
      seekableIn - The InputStream
      start - The start offset into the compressed stream
      end - The end offset into the compressed stream
      readMode - Controls whether progress is reported continuously or only at block boundaries.
      Returns:
      CompressionInputStream for BZip2 aligned at block boundaries
      Throws:
      IOException
    • getDecompressorType

      public Class<? extends org.apache.hadoop.io.compress.Decompressor> getDecompressorType()
      Get the type of Decompressor needed by this CompressionCodec.
      Specified by:
      getDecompressorType in interface org.apache.hadoop.io.compress.CompressionCodec
      Returns:
      the type of decompressor needed by this codec.
    • createDecompressor

      public org.apache.hadoop.io.compress.Decompressor createDecompressor()
      Create a new Decompressor for use by this CompressionCodec.
      Specified by:
      createDecompressor in interface org.apache.hadoop.io.compress.CompressionCodec
      Returns:
      a new decompressor for use by this codec
    • getDefaultExtension

      public String getDefaultExtension()
      .bz2 is recognized as the default extension for compressed BZip2 files
      Specified by:
      getDefaultExtension in interface org.apache.hadoop.io.compress.CompressionCodec
      Returns:
      A String telling the default bzip2 file extension
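As a rough illustration of how the value returned by getDefaultExtension() might be used, the sketch below derives the name of a decompressed file by stripping the extension from a compressed file name. It uses only the Java standard library. The class Bz2Names and the method stripExtension are hypothetical, and the literal ".bz2" stands in for the value a caller would obtain from the codec.

```java
public final class Bz2Names {

    private Bz2Names() {
    }

    /**
     * Returns fileName without the trailing extension, or fileName
     * unchanged when it does not end with that extension.
     */
    public static String stripExtension(String fileName, String extension) {
        if (extension.isEmpty() || !fileName.endsWith(extension)) {
            return fileName;
        }
        return fileName.substring(0, fileName.length() - extension.length());
    }

    public static void main(String[] args) {
        // Stand-in for codec.getDefaultExtension(); hard-coded to keep the sketch self-contained.
        String extension = ".bz2";
        System.out.println(stripExtension("data.nt.bz2", extension));
        System.out.println(stripExtension("data.nt", extension));
    }
}
```

Passing the extension as a parameter instead of hard-coding it inside the helper keeps the logic reusable for other codecs that report a different default extension.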