Class RecordReaderJsonArray
- java.lang.Object
-
- org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
-
- net.sansa_stack.hadoop.core.RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
-
- net.sansa_stack.hadoop.format.gson.json.RecordReaderJsonArray
-
- All Implemented Interfaces:
Closeable
,AutoCloseable
public class RecordReaderJsonArray extends RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
-
-
Field Summary
Fields Modifier and Type Field Description protected com.google.gson.Gson
gson
protected static Pattern
jsonFwdPattern
static String
RECORD_MAXLENGTH_KEY
static String
RECORD_MINLENGTH_KEY
static String
RECORD_PROBECOUNT_KEY
-
Fields inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
accumulating, codec, currentKey, currentValue, datasetFlow, decompressor, EMPTY_BYTE_ARRAY, isEncoded, isFirstSplit, maxRecordLength, maxRecordLengthKey, minRecordLength, minRecordLengthKey, postambleBytes, preambleBytes, probeRecordCount, probeRecordCountKey, raisedThrowable, rawStream, recordStartPattern, split, splitEnd, splitLength, splitStart, stream
-
-
Constructor Summary
Constructors Constructor Description RecordReaderJsonArray()
RecordReaderJsonArray(com.google.gson.Gson gson)
RecordReaderJsonArray(String minRecordLengthKey, String maxRecordLengthKey, String probeRecordCountKey, Pattern recordSearchPattern, com.google.gson.Gson gson)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected InputStream
effectiveInputStream(InputStream base)
Always replace the first character (which is either a comma or open bracket) with an open bracket in order to mimick a JSON array start.void
initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Read out config paramaters (prefixes, length thresholds, ...) and examine the codec in order to set an internal flag whether the stream will be encoded or not.protected io.reactivex.rxjava3.core.Flowable<com.google.gson.JsonElement>
parse(Callable<InputStream> inputStreamSupplier)
Create a flowable from the input stream.-
Methods inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
aggregate, close, createRecordFlow, didHitSplitBound, effectiveInputStreamSupp, findFirstPositionWithProbeSuccess, findNextRecord, getCurrentKey, getCurrentValue, getPos, getProgress, initRecordFlow, lines, logClose, logUnexpectedClose, nextKeyValue, parseFromSeekable, prober, setStreamToInterval
-
-
-
-
Field Detail
-
RECORD_MINLENGTH_KEY
public static final String RECORD_MINLENGTH_KEY
- See Also:
- Constant Field Values
-
RECORD_MAXLENGTH_KEY
public static final String RECORD_MAXLENGTH_KEY
- See Also:
- Constant Field Values
-
RECORD_PROBECOUNT_KEY
public static final String RECORD_PROBECOUNT_KEY
- See Also:
- Constant Field Values
-
jsonFwdPattern
protected static final Pattern jsonFwdPattern
-
gson
protected com.google.gson.Gson gson
-
-
Method Detail
-
initialize
public void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Description copied from class:RecordReaderGenericBase
Read out config paramaters (prefixes, length thresholds, ...) and examine the codec in order to set an internal flag whether the stream will be encoded or not.- Overrides:
initialize
in classRecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Throws:
IOException
-
effectiveInputStream
protected InputStream effectiveInputStream(InputStream base)
Always replace the first character (which is either a comma or open bracket) with an open bracket in order to mimick a JSON array start.- Overrides:
effectiveInputStream
in classRecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Parameters:
base
- The base input stream- Returns:
-
parse
protected io.reactivex.rxjava3.core.Flowable<com.google.gson.JsonElement> parse(Callable<InputStream> inputStreamSupplier)
Description copied from class:RecordReaderGenericBase
Create a flowable from the input stream. The input stream may be incorrectly positioned in which case the Flowable is expected to indicate this by raising an error event.- Specified by:
parse
in classRecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Parameters:
inputStreamSupplier
- A supplier of input streams. May supply the same underlying stream on each call hence only at most a single stream should be taken from the supplier. Supplied streams are safe to use in try-with-resources blocks (possibly using CloseShieldInputStream). Taken streams should be closed by the client code.- Returns:
-
-