Package net.sansa_stack.hadoop.util
Class FileSplitUtils
java.lang.Object
net.sansa_stack.hadoop.util.FileSplitUtils
-
Constructor Summary
Constructors
FileSplitUtils()
-
Method Summary
Modifier and Type, Method, Description
static <T> Stream<T> createFlow(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?, T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
    Create a flow of records for a given input split w.r.t. a given input format.
static <T> io.reactivex.rxjava3.core.Flowable<T> createFlow2(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?, T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
static List<Object[]> createTestParameters(Map<String, com.google.common.collect.Range<Integer>> fileToNumSplits)
    Utility method typically for use with split-related unit tests.
static InputStream getDecodedStreamFromSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration job)
    Utility method to open a decoded stream from a split.
static List<org.apache.hadoop.mapreduce.InputSplit> listFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)
static Stream<org.apache.hadoop.mapreduce.InputSplit> streamFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits)
    Utility method to create a specific number of splits for a file.
-
Constructor Details
-
FileSplitUtils
public FileSplitUtils()
-
-
Method Details
-
getDecodedStreamFromSplit
public static InputStream getDecodedStreamFromSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration job) throws IOException
Utility method to open a decoded stream from a split. Useful for reading header information from the first split so that it can be put into the Hadoop configuration before parallel processing starts.
Parameters:
split
job
Throws:
IOException
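The header-reading use case described above can be illustrated without Hadoop. The following is a minimal sketch, not the implementation of `getDecodedStreamFromSplit`: it assumes gzip compression (via `java.util.zip`) in place of Hadoop's codec lookup, and it assumes the header consists of leading lines that start with a known prefix such as `@prefix`. The class `HeaderSketch` and its methods are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; stands in for reading a header from a decoded split stream.
final class HeaderSketch {

    /** Collects the leading lines of an already decoded stream that start with the prefix. */
    static List<String> readHeaderLines(InputStream decoded, String prefix) throws IOException {
        List<String> header = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(decoded, StandardCharsets.UTF_8));
        String line;
        // Stop at the first line that is not part of the header.
        while ((line = reader.readLine()) != null && line.startsWith(prefix)) {
            header.add(line);
        }
        return header;
    }

    /** Decodes gzip-compressed bytes and returns the header lines (assumption: gzip codec). */
    static List<String> headerOfGzipped(byte[] compressed, String prefix) {
        try (InputStream decoded = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return readHeaderLines(decoded, prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: gzip-compresses a UTF-8 string in memory. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("@prefix ex: <http://example.org/> .\nex:s ex:p ex:o .\n");
        System.out.println(headerOfGzipped(data, "@prefix"));
    }
}
```

In a real job, the collected header lines would be stored in the `Configuration` object so that readers of later splits, which do not see the start of the file, can still access them.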
-
streamFileSplits
public static Stream<org.apache.hadoop.mapreduce.InputSplit> streamFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits) throws IOException
Utility method to create a specific number of splits for a file.
Parameters:
path
fileLengthTotal
numSplits
Throws:
IOException
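To make the idea of "a specific number of splits" concrete, here is a minimal sketch of one way to divide a file of `fileLengthTotal` bytes into `numSplits` contiguous byte ranges. It is an assumption, not the documented behavior of `streamFileSplits`: the source does not say how a remainder is distributed, and this sketch simply gives the first ranges one extra byte each. `SplitSketch` and `evenRanges` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; computes {start, length} pairs that cover a file without gaps.
final class SplitSketch {

    /** Returns numSplits contiguous ranges as long[]{start, length} covering [0, fileLengthTotal). */
    static List<long[]> evenRanges(long fileLengthTotal, long numSplits) {
        if (fileLengthTotal < 0) {
            throw new IllegalArgumentException("fileLengthTotal must be >= 0");
        }
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be > 0");
        }
        long base = fileLengthTotal / numSplits;      // minimum length of every range
        long remainder = fileLengthTotal % numSplits; // leftover bytes to distribute
        List<long[]> ranges = new ArrayList<>();
        long start = 0;
        for (long i = 0; i < numSplits; i++) {
            // Assumption: the first `remainder` ranges absorb one leftover byte each.
            long length = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {start, length});
            start += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : evenRanges(10, 3)) {
            System.out.println(Arrays.toString(range));
        }
    }
}
```

Each `{start, length}` pair could then be turned into a Hadoop split with the `FileSplit(Path file, long start, long length, String[] hosts)` constructor. For 10 bytes and 3 splits, the sketch yields the ranges (0, 4), (4, 3), and (7, 3).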
-
listFileSplits
public static List<org.apache.hadoop.mapreduce.InputSplit> listFileSplits(org.apache.hadoop.fs.Path path, long fileLengthTotal, long numSplits) throws IOException
Throws:
IOException
-
createFlow
public static <T> Stream<T> createFlow(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?, T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
Create a flow of records for a given input split w.r.t. a given input format.
-
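As an illustration of what `createFlow` conceptually provides, the following minimal sketch exposes a pull-style record reader as a `java.util.stream.Stream`. It is not the library's implementation. The interface `SimpleReader` is a hypothetical stand-in that mirrors only the `nextKeyValue()`, `getCurrentValue()`, and `close()` methods of Hadoop's `RecordReader`, so that the example runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class FlowSketch {

    /** Hypothetical stand-in for the record-reading part of Hadoop's RecordReader. */
    interface SimpleReader<T> {
        boolean nextKeyValue();
        T getCurrentValue();
        void close();
    }

    /** Wraps a reader in a sequential Stream; closing the stream closes the reader. */
    static <T> Stream<T> toStream(SimpleReader<T> reader) {
        Iterator<T> it = new Iterator<T>() {
            private boolean advanced = false; // true once nextKeyValue() was called for the pending item
            private boolean available = false;

            @Override
            public boolean hasNext() {
                if (!advanced) {
                    available = reader.nextKeyValue();
                    advanced = true;
                }
                return available;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                advanced = false; // force a fresh nextKeyValue() on the following call
                return reader.getCurrentValue();
            }
        };
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false)
                .onClose(reader::close);
    }

    /** Test helper: a reader backed by an in-memory list. */
    static <T> SimpleReader<T> readerOf(List<T> items) {
        Iterator<T> source = items.iterator();
        return new SimpleReader<T>() {
            private T current;

            @Override
            public boolean nextKeyValue() {
                if (!source.hasNext()) {
                    return false;
                }
                current = source.next();
                return true;
            }

            @Override
            public T getCurrentValue() {
                return current;
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory list.
            }
        };
    }

    public static void main(String[] args) {
        try (Stream<String> records = toStream(readerOf(Arrays.asList("a", "b", "c")))) {
            records.forEach(System.out::println);
        }
    }
}
```

Using the stream in a try-with-resources block, as in `main`, ensures the underlying reader is closed even if processing stops early.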
createFlow2
public static <T> io.reactivex.rxjava3.core.Flowable<T> createFlow2(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.mapreduce.InputFormat<?, T> inputFormat, org.apache.hadoop.mapreduce.InputSplit inputSplit)
-
createTestParameters
public static List<Object[]> createTestParameters(Map<String, com.google.common.collect.Range<Integer>> fileToNumSplits)
Utility method typically for use with split-related unit tests.
-