Package net.sansa_stack.hadoop.util
Class FileSplitUtils
- java.lang.Object
-
- net.sansa_stack.hadoop.util.FileSplitUtils
-
public class FileSplitUtils extends Object
-
-
Constructor Summary
Constructors Constructor Description FileSplitUtils()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static <T> io.reactivex.rxjava3.core.Flowable<T>
createFlow(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?,T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
Create a flow of records for a given input split w.r.t.static List<Object[]>
createTestParameters(Map<String,com.google.common.collect.Range<Integer>> fileToNumSplits)
Util method typically for use with split-related unit testsstatic InputStream
getDecodedStreamFromSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration job)
Util method to open a decoded stream from a split.static List<org.apache.hadoop.mapreduce.InputSplit>
listFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)
static Stream<org.apache.hadoop.mapreduce.InputSplit>
streamFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)
Utility method to create a specific number of splits for a file.
-
-
-
Method Detail
-
getDecodedStreamFromSplit
public static InputStream getDecodedStreamFromSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration job) throws IOException
Util method to open a decoded stream from a split. Useful to read out header information from the first split in order to put it into the hadoop configuration before starting parallel processing.- Parameters:
split
-job
-- Returns:
- Throws:
IOException
-
streamFileSplits
public static Stream<org.apache.hadoop.mapreduce.InputSplit> streamFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits) throws IOException
Utility method to create a specific number of splits for a file.- Parameters:
path
-fileLengthTotal
-numSplits
-- Returns:
- Throws:
IOException
-
listFileSplits
public static List<org.apache.hadoop.mapreduce.InputSplit> listFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits) throws IOException
- Throws:
IOException
-
createFlow
public static <T> io.reactivex.rxjava3.core.Flowable<T> createFlow(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?,T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
Create a flow of records for a given input split w.r.t. a given input format
-
-