Class SeekableSourceOverSplit

java.lang.Object
net.sansa_stack.hadoop.core.SeekableSourceOverSplit
All Implemented Interfaces:
Closeable, AutoCloseable, org.aksw.commons.io.buffer.array.HasArrayOps<byte[]>, org.aksw.commons.io.input.ReadableChannelFactory<byte[]>, org.aksw.commons.io.input.ReadableChannelSource<byte[]>, org.aksw.commons.io.input.SeekableReadableChannelSource<byte[]>

public class SeekableSourceOverSplit extends Object implements org.aksw.commons.io.input.SeekableReadableChannelSource<byte[]>, Closeable
A seekable source over a split (usually a Hadoop input split). An attempt to read past the split boundary triggers a "transition" action, which may scan ahead for an end position beyond the split boundary.
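The "scan ahead" idea can be illustrated with a minimal, self-contained sketch. It does not use this class or its API: the class `SplitBoundarySketch`, the method `scanForEndPosition`, and the newline-delimited record format are all assumptions made for illustration. The sketch only shows how a reader whose split ends mid-record might look past the boundary for the end of that record.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of net.sansa_stack.hadoop.core.
final class SplitBoundarySketch {

    /**
     * Scans forward from the split boundary and returns the index just past
     * the next record delimiter, or data.length if no delimiter follows.
     * Assumes records are terminated by a single delimiter byte.
     */
    static int scanForEndPosition(byte[] data, int splitEnd, byte delimiter) {
        for (int i = splitEnd; i < data.length; i++) {
            if (data[i] == delimiter) {
                return i + 1;
            }
        }
        return data.length;
    }

    public static void main(String[] args) {
        byte[] data = "rec1\nrec2\nrec3\n".getBytes(StandardCharsets.US_ASCII);
        int splitEnd = 7; // the boundary falls in the middle of "rec2"
        int end = scanForEndPosition(data, splitEnd, (byte) '\n');
        // Prints the records that the reader of this split would consume.
        System.out.print(new String(data, 0, end, StandardCharsets.US_ASCII));
    }
}
```

In this simplified model the record that straddles the boundary is read completely by the split in which it starts; how the actual transition action decides on an end position is not stated on this page.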
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected NavigableMap<Long,Long>
    absPosToBlockOffset
     
    protected org.aksw.commons.io.input.ReadableChannel<byte[]>
    baseStream
    The total number of bytes that need to be read from base until the split boundary is reached.
    protected org.aksw.commons.io.input.SeekableReadableChannel<byte[]>
    debufferedHead
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    headBuffer
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    postambleBuffer
    The postamble buffer is only served if a limit is set via SeekableSourceOverSplit.Channel.setLimit(long). If no limit is set, the remainder of the stream is consumed, which is assumed to include the postamble.
    protected NavigableMap<Long,Integer>
    posToIndex
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    tailBuffer
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    SeekableSourceOverSplit(org.aksw.commons.io.input.ReadableChannel<byte[]> baseStream, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> headBuffer, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> tailBuffer, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> postambleBuffer, NavigableMap<Long,Long> absPosToBlockOffset)
    If true then the headStream can no longer be used.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
     
    protected static SeekableSourceOverSplit
    create(org.aksw.commons.io.input.ReadableChannel<byte[]> baseStream, org.aksw.commons.io.input.ReadableChannel<byte[]> headStream, byte[] postambleBytes, NavigableMap<Long,Long> blockOffsetToAbsPos)
     
    static SeekableSourceOverSplit
    createForBlockEncodedStream(org.aksw.commons.io.hadoop.SeekableInputStream inn, long splitPoint, byte[] postambleBytes)
     
    static SeekableSourceOverSplit
    createForNonEncodedStream(org.aksw.commons.io.hadoop.SeekableInputStream in, long splitPoint, byte[] postambleBytes)
     
    NavigableMap<Long,Long>
    getAbsPosToBlockOffset()
     
    org.aksw.commons.io.buffer.array.ArrayOps<byte[]>
    getArrayOps()
     
    long
    getBlockForPos(long pos)
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    getBufferByBaseOffset(long baseOffset)
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    getBufferByIndex(int index)
     
    protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    getBufferByIndexUnsafe(int index)
     
    org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    getHeadBuffer()
     
    long
    getHeadSize()
     
    long
    getKnownSize()
     
    org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]>
    getTailBuffer()
     
    net.sansa_stack.hadoop.core.SeekableSourceOverSplit.Channel
    newReadableChannel()
     
    protected void
    setupTailBuffer()
     
    long
    size()
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.aksw.commons.io.input.SeekableReadableChannelSource

    newReadableChannel, newReadableChannel, newReadableChannel
  • Field Details

    • baseStream

      protected org.aksw.commons.io.input.ReadableChannel<byte[]> baseStream
      The total number of bytes that need to be read from base until the split boundary is reached. A value of -1 indicates unknown. For non-encoded streams this is simply the length of the split.
    • headBuffer

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> headBuffer
    • tailBuffer

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> tailBuffer
    • postambleBuffer

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> postambleBuffer
      The postamble buffer is only served if a limit is set via SeekableSourceOverSplit.Channel.setLimit(long). If no limit is set, the remainder of the stream is consumed, which is assumed to include the postamble.
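The limit/postamble rule described for postambleBuffer can be sketched with plain java.io streams. This is an illustration under stated assumptions, not the behaviour of the real Channel: the class `PostambleSketch`, its methods, and the convention that a negative limit means "no limit" are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of net.sansa_stack.hadoop.core.
final class PostambleSketch {

    /**
     * With a limit: serve the first `limit` bytes of data, then the postamble.
     * Without a limit (negative value, an assumption of this sketch): serve
     * the whole of data, which is assumed to already end with the postamble.
     */
    static InputStream open(byte[] data, byte[] postamble, long limit) {
        if (limit < 0) {
            return new ByteArrayInputStream(data);
        }
        int n = (int) Math.min(limit, data.length);
        return new SequenceInputStream(
                new ByteArrayInputStream(data, 0, n),
                new ByteArrayInputStream(postamble));
    }

    static String readAll(byte[] data, byte[] postamble, long limit) {
        try (InputStream in = open(data, postamble, limit)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "<a/><b/></root>".getBytes(StandardCharsets.UTF_8);
        byte[] postamble = "</root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(data, postamble, 4));  // limited read plus postamble
        System.out.println(readAll(data, postamble, -1)); // unlimited read of the remainder
    }
}
```

In the sketch, a limited read still ends with the closing bytes a downstream parser expects, while an unlimited read relies on the underlying data to supply them.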
    • debufferedHead

      protected org.aksw.commons.io.input.SeekableReadableChannel<byte[]> debufferedHead
    • posToIndex

      protected NavigableMap<Long,Integer> posToIndex
    • absPosToBlockOffset

      protected NavigableMap<Long,Long> absPosToBlockOffset
  • Constructor Details

    • SeekableSourceOverSplit

      public SeekableSourceOverSplit(org.aksw.commons.io.input.ReadableChannel<byte[]> baseStream, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> headBuffer, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> tailBuffer, org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> postambleBuffer, NavigableMap<Long,Long> absPosToBlockOffset)
      If true then the headStream can no longer be used.
  • Method Details

    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • getBlockForPos

      public long getBlockForPos(long pos)
    • getKnownSize

      public long getKnownSize()
    • getAbsPosToBlockOffset

      public NavigableMap<Long,Long> getAbsPosToBlockOffset()
      Returns:
      null if the underlying stream is not based on blocks; otherwise a map of byte offsets (starting from zero) to block offsets
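A map from byte offsets to block offsets is typically queried with a floor lookup. The sketch below shows that pattern using only java.util; the class `BlockLookupSketch`, the method `blockOffsetForPos`, and the sample offsets are hypothetical, and whether getBlockForPos(long) works this way is an assumption rather than something this page states.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; not part of net.sansa_stack.hadoop.core.
final class BlockLookupSketch {

    /**
     * Returns the block offset of the block containing the given absolute
     * byte position, i.e. the value of the greatest key that is <= pos.
     */
    static long blockOffsetForPos(NavigableMap<Long, Long> absPosToBlockOffset, long pos) {
        Map.Entry<Long, Long> entry = absPosToBlockOffset.floorEntry(pos);
        if (entry == null) {
            throw new IllegalArgumentException("No block starts at or before position " + pos);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up sample: decoded byte offset -> offset of the encoded block.
        NavigableMap<Long, Long> map = new TreeMap<>();
        map.put(0L, 0L);
        map.put(900_000L, 120_000L);
        map.put(1_800_000L, 245_000L);

        System.out.println(blockOffsetForPos(map, 950_000L)); // position inside the second block
    }
}
```

Callers of the real method should check for a null return first, since the map is only available for block-based streams.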
    • getBufferByBaseOffset

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> getBufferByBaseOffset(long baseOffset)
    • getBufferByIndex

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> getBufferByIndex(int index)
    • getBufferByIndexUnsafe

      protected org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> getBufferByIndexUnsafe(int index)
    • setupTailBuffer

      protected void setupTailBuffer()
    • getHeadBuffer

      public org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> getHeadBuffer()
    • getTailBuffer

      public org.aksw.commons.io.buffer.array.BufferOverReadableChannel<byte[]> getTailBuffer()
    • createForNonEncodedStream

      public static SeekableSourceOverSplit createForNonEncodedStream(org.aksw.commons.io.hadoop.SeekableInputStream in, long splitPoint, byte[] postambleBytes)
    • createForBlockEncodedStream

      public static SeekableSourceOverSplit createForBlockEncodedStream(org.aksw.commons.io.hadoop.SeekableInputStream inn, long splitPoint, byte[] postambleBytes)
    • create

      protected static SeekableSourceOverSplit create(org.aksw.commons.io.input.ReadableChannel<byte[]> baseStream, org.aksw.commons.io.input.ReadableChannel<byte[]> headStream, byte[] postambleBytes, NavigableMap<Long,Long> blockOffsetToAbsPos)
    • getHeadSize

      public long getHeadSize()
    • newReadableChannel

      public net.sansa_stack.hadoop.core.SeekableSourceOverSplit.Channel newReadableChannel() throws IOException
      Specified by:
      newReadableChannel in interface org.aksw.commons.io.input.ReadableChannelFactory<byte[]>
      Specified by:
      newReadableChannel in interface org.aksw.commons.io.input.SeekableReadableChannelSource<byte[]>
      Throws:
      IOException
    • size

      public long size() throws IOException
      Specified by:
      size in interface org.aksw.commons.io.input.ReadableChannelSource<byte[]>
      Throws:
      IOException
    • getArrayOps

      public org.aksw.commons.io.buffer.array.ArrayOps<byte[]> getArrayOps()
      Specified by:
      getArrayOps in interface org.aksw.commons.io.buffer.array.HasArrayOps<byte[]>