Class FileSplitUtils


  • public class FileSplitUtils
    extends Object
    • Constructor Summary

      Constructors 
      Constructor Description
      FileSplitUtils()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <T> io.reactivex.rxjava3.core.Flowable<T> createFlow​(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?,​T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
      Create a flow of records for a given input split w.r.t.
      static List<Object[]> createTestParameters​(Map<String,​com.google.common.collect.Range<Integer>> fileToNumSplits)
      Util method typically for use with split-related unit tests
      static InputStream getDecodedStreamFromSplit​(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration job)
      Util method to open a decoded stream from a split.
      static List<org.apache.hadoop.mapreduce.InputSplit> listFileSplits​(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)  
      static Stream<org.apache.hadoop.mapreduce.InputSplit> streamFileSplits​(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)
      Utility method to create a specific number of splits for a file.
    • Constructor Detail

      • FileSplitUtils

        public FileSplitUtils()
    • Method Detail

      • getDecodedStreamFromSplit

        public static InputStream getDecodedStreamFromSplit​(org.apache.hadoop.mapreduce.lib.input.FileSplit split,
                                                            org.apache.hadoop.conf.Configuration job)
                                                     throws IOException
        Util method to open a decoded stream from a split. Useful to read out header information from the first split in order to put it into the hadoop configuration before starting parallel processing.
        Parameters:
        split -
        job -
        Returns:
        Throws:
        IOException
      • streamFileSplits

        public static Stream<org.apache.hadoop.mapreduce.InputSplit> streamFileSplits​(org.apache.hadoop.fs.Path path,
                                                                                      long fileLengthTotal,
                                                                                      long numSplits)
                                                                               throws IOException
        Utility method to create a specific number of splits for a file.
        Parameters:
        path -
        fileLengthTotal -
        numSplits -
        Returns:
        Throws:
        IOException
      • listFileSplits

        public static List<org.apache.hadoop.mapreduce.InputSplit> listFileSplits​(org.apache.hadoop.fs.Path path,
                                                                                  long fileLengthTotal,
                                                                                  long numSplits)
                                                                           throws IOException
        Throws:
        IOException
      • createFlow

        public static <T> io.reactivex.rxjava3.core.Flowable<T> createFlow​(org.apache.hadoop.mapreduce.Job job,
                                                                           org.apache.hadoop.mapreduce.InputFormat<?,​T> inputFormat,
                                                                           org.apache.hadoop.mapreduce.InputSplit inputSplit)
        Create a flow of records for a given input split w.r.t. a given input format
      • createTestParameters

        public static List<Object[]> createTestParameters​(Map<String,​com.google.common.collect.Range<Integer>> fileToNumSplits)
        Util method typically for use with split-related unit tests