Class RecordReaderJsonArray
java.lang.Object
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
net.sansa_stack.hadoop.core.RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
net.sansa_stack.hadoop.format.gson.json.RecordReaderJsonArray
- All Implemented Interfaces:
Closeable, AutoCloseable
public class RecordReaderJsonArray
extends RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
-
Nested Class Summary
Nested classes/interfaces inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
RecordReaderGenericBase.ReadTooFarException
-
Field Summary
Fields
Modifier and Type    Field    Description
protected com.google.gson.Gson    gson
protected static final CustomPattern    jsonFwdPattern
static final String    RECORD_MAXLENGTH_KEY
static final String    RECORD_MINLENGTH_KEY
static final String    RECORD_PROBECOUNT_KEY
Fields inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
accumulating, codec, currentKey, currentValue, datasetFlow, decompressor, EMPTY_BYTE_ARRAY, enableStats, isEncoded, isFirstSplit, maxExtraByteCount, maxRecordLength, maxRecordLengthKey, minRecordLength, minRecordLengthKey, postambleBytes, preambleBytes, probeElementCount, probeElementCountKey, probeRecordCount, probeRecordCountKey, rawStream, recordFlowCloseable, recordStartPattern, regionStartSearchReadOverRegionEnd, regionStartSearchReadOverSplitEnd, skipRecordCount, split, splitEnd, splitId, splitLength, splitName, splitStart, stream, tailByteBuffer, tailBytes, tailEltBuffer, tailElts, tailEltsTime, tailRecordOffset, totalEltCount, totalRecordCount
-
Constructor Summary
Constructors
Constructor    Description
RecordReaderJsonArray()
RecordReaderJsonArray(com.google.gson.Gson gson)
RecordReaderJsonArray(RecordReaderConf conf, com.google.gson.Gson gson)
-
Method Summary
Modifier and Type    Method    Description
protected InputStream    effectiveInputStream
    Always replace the first character (which is either a comma or an open bracket) with an open bracket in order to mimic a JSON array start.
void    initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
    Read out config parameters (prefixes, length thresholds, ...) and examine the codec in order to set an internal flag indicating whether the stream is encoded.
protected Stream<com.google.gson.JsonElement>    parse(InputStream in, boolean isProbe)
    Create a flowable from the input stream.
protected io.reactivex.rxjava3.core.Flowable<com.google.gson.JsonElement>    parse(Callable<InputStream> inputStreamSupplier)
Methods inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
abbreviate, abbreviate, abbreviateAsUTF8, aggregate, aggregate, close, convert, createMatcherFactory, createRecordFlow, detectTail, didHitSplitBound, effectiveInputStreamSupp, findFirstPositionWithProbeSuccess, findNextRegion, getCurrentKey, getCurrentValue, getPos, getPosition, getProgress, getStats, initRecordFlow, lines, logClose, logUnexpectedClose, nextKeyValue, parseFromSeekable, prober, setStreamToInterval, unbufferedStream
-
Field Details
-
RECORD_MINLENGTH_KEY
- See Also:
-
RECORD_MAXLENGTH_KEY
- See Also:
-
RECORD_PROBECOUNT_KEY
- See Also:
-
jsonFwdPattern
-
gson
protected com.google.gson.Gson gson
-
-
Constructor Details
-
RecordReaderJsonArray
public RecordReaderJsonArray()
-
RecordReaderJsonArray
public RecordReaderJsonArray(com.google.gson.Gson gson)
-
RecordReaderJsonArray
-
-
Method Details
-
initialize
public void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Description copied from class: RecordReaderGenericBase
Read out config parameters (prefixes, length thresholds, ...) and examine the codec in order to set an internal flag indicating whether the stream is encoded.
- Overrides:
initialize in class RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Throws:
IOException
-
effectiveInputStream
Always replace the first character (which is either a comma or an open bracket) with an open bracket in order to mimic a JSON array start.
- Overrides:
effectiveInputStream in class RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Parameters:
base - The base input stream
- Returns:
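The first-character replacement described for effectiveInputStream can be sketched with plain JDK streams. This is only an illustration, not the library's implementation: the class ArrayStartSketch and the helper withArrayStart are hypothetical names, and the sketch assumes the leading comma or open bracket occupies exactly one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

public class ArrayStartSketch {
    /**
     * Hypothetical helper (not part of the documented API): consumes the
     * first byte of the base stream and puts '[' in its place, so that a
     * fragment starting mid-array reads like the start of a JSON array.
     */
    static InputStream withArrayStart(InputStream base) throws IOException {
        int first = base.read(); // discard the leading ',' or '['
        if (first == -1) {
            return base; // empty input: nothing to replace
        }
        InputStream bracket = new ByteArrayInputStream(new byte[] {'['});
        // Serve the synthetic '[' first, then the remainder of the base stream.
        return new SequenceInputStream(bracket, base);
    }

    public static void main(String[] args) throws IOException {
        // A fragment as it might appear at a record boundary inside an array.
        byte[] fragment = ",{\"b\":2},{\"c\":3}]".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = withArrayStart(new ByteArrayInputStream(fragment))) {
            // prints [{"b":2},{"c":3}]
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

After the replacement, a standard JSON array parser can read the remaining elements as if the array began at this position.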
-
parse
Description copied from class: RecordReaderGenericBase
Create a flowable from the input stream. The input stream may be incorrectly positioned, in which case the Flowable is expected to indicate this by raising an error event.
- Specified by:
parse in class RecordReaderGenericBase<com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement,com.google.gson.JsonElement>
- Parameters:
in - A supplier of input streams. It may supply the same underlying stream on each call, hence at most a single stream should be taken from the supplier. Supplied streams are safe to use in try-with-resources blocks (possibly using CloseShieldInputStream). Taken streams should be closed by the client code.
isProbe - Whether the parser should be configured for probing. For example, it is desirable to suppress parse errors during probing. Also, for probing the parser may optimize itself for minimizing the latency of yielding items rather than overall throughput.
- Returns:
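The supplier contract described above (take at most one stream, close it in client code, and expect that the underlying stream may be shared) can be illustrated with JDK classes alone. This is a sketch under assumptions: SupplierContractSketch, CloseShield, and readOnce are hypothetical names, and CloseShield merely stands in for a close-shielding wrapper such as CloseShieldInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;

public class SupplierContractSketch {
    /** Hypothetical stand-in for a close shield: close() leaves the wrapped stream open. */
    static final class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally do not close the shared underlying stream.
        }
    }

    /** Takes exactly one stream from the supplier, reads it fully, and closes it. */
    static String readOnce(Callable<InputStream> supplier) throws Exception {
        try (InputStream in = supplier.call()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream shared = new ByteArrayInputStream("[1,2,3]".getBytes(StandardCharsets.UTF_8));
        // Every call hands out a shield around the same underlying stream.
        Callable<InputStream> supplier = () -> new CloseShield(shared);

        System.out.println(readOnce(supplier)); // prints [1,2,3]
        // A second take sees the already consumed stream, which is why
        // callers should take at most one stream from the supplier.
        System.out.println(readOnce(supplier).isEmpty()); // prints true
    }
}
```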
-
parse
protected io.reactivex.rxjava3.core.Flowable<com.google.gson.JsonElement> parse(Callable<InputStream> inputStreamSupplier)
-