Class RecordReaderRdfTrigDataset
java.lang.Object
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
net.sansa_stack.hadoop.core.RecordReaderGenericBase<U,G,A,T>
net.sansa_stack.hadoop.format.jena.base.RecordReaderGenericRdfBase<U,G,A,T>
net.sansa_stack.hadoop.format.jena.base.RecordReaderGenericRdfAccumulatingBase<org.apache.jena.sparql.core.Quad,org.apache.jena.graph.Node,org.aksw.jenax.arq.dataset.api.DatasetOneNg,org.aksw.jenax.arq.dataset.api.DatasetOneNg>
net.sansa_stack.hadoop.format.jena.trig.RecordReaderRdfTrigDataset
- All Implemented Interfaces:
Closeable, AutoCloseable
public class RecordReaderRdfTrigDataset
extends RecordReaderGenericRdfAccumulatingBase<org.apache.jena.sparql.core.Quad,org.apache.jena.graph.Node,org.aksw.jenax.arq.dataset.api.DatasetOneNg,org.aksw.jenax.arq.dataset.api.DatasetOneNg>
RecordReader for the TriG RDF format that groups consecutive quads sharing the
same graph IRI into Datasets.
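The grouping behaviour described above can be illustrated with a minimal, dependency-free sketch. It is not this class's implementation: `SimpleQuad` and `GraphGroup` are hypothetical stand-ins for Jena's `Quad` and for `DatasetOneNg`, and `groupConsecutive` is an illustrative name. The sketch assumes that only adjacent quads are merged, so a graph IRI that reappears later in the stream starts a new group.

```java
import java.util.ArrayList;
import java.util.List;

class ConsecutiveGraphGrouping {

    /** Hypothetical stand-in for a Jena Quad; all components are plain strings here. */
    static final class SimpleQuad {
        final String graph, subject, predicate, object;

        SimpleQuad(String graph, String subject, String predicate, String object) {
            this.graph = graph;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Hypothetical stand-in for a dataset that holds exactly one named graph. */
    static final class GraphGroup {
        final String graph;
        final List<SimpleQuad> quads = new ArrayList<>();

        GraphGroup(String graph) {
            this.graph = graph;
        }
    }

    /** Starts a new group whenever the graph component differs from the previous quad's. */
    static List<GraphGroup> groupConsecutive(Iterable<SimpleQuad> quads) {
        List<GraphGroup> result = new ArrayList<>();
        GraphGroup current = null;
        for (SimpleQuad quad : quads) {
            if (current == null || !current.graph.equals(quad.graph)) {
                current = new GraphGroup(quad.graph);
                result.add(current);
            }
            current.quads.add(quad);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleQuad> input = new ArrayList<>();
        input.add(new SimpleQuad("http://example.org/g1", "s1", "p", "o1"));
        input.add(new SimpleQuad("http://example.org/g1", "s2", "p", "o2"));
        input.add(new SimpleQuad("http://example.org/g2", "s3", "p", "o3"));
        for (GraphGroup group : groupConsecutive(input)) {
            System.out.println(group.graph + " -> " + group.quads.size() + " quad(s)");
        }
    }
}
```

Grouping only adjacent quads keeps memory bounded by the size of one group, which is why a streaming reader would plausibly favour it over a global group-by.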
-
Nested Class Summary
Nested Classes
Nested classes/interfaces inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
RecordReaderGenericBase.ReadTooFarException
-
Field Summary
Fields
Modifier and Type                     Field                                    Description
static final String                   ELEMENT_PROBECOUNT_KEY
static final String                   PREFIXES_MAXLENGTH_KEY
static final String                   RECORD_MAXLENGTH_KEY
static final String                   RECORD_MINLENGTH_KEY
static final String                   RECORD_PROBECOUNT_KEY
protected static final CustomPattern  trigFwdPattern
protected static final CustomPattern  trigFwdPatternGraphFollowedByCurlyBrace  This is the pattern for directives or trig data where graphs are separated by '{'.
protected static final CustomPattern  trigFwdPatternNew                        This is the pattern for directives or (compact) IRIs.
Fields inherited from class net.sansa_stack.hadoop.format.jena.base.RecordReaderGenericRdfBase
baseIri, baseIriKey, headerBytesKey, lang, prefixesMaxLengthKey, prefixMap
Fields inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
accumulating, codec, currentKey, currentValue, datasetFlow, decompressor, EMPTY_BYTE_ARRAY, enableStats, isEncoded, isFirstSplit, maxExtraByteCount, maxRecordLength, maxRecordLengthKey, minRecordLength, minRecordLengthKey, postambleBytes, preambleBytes, probeElementCount, probeElementCountKey, probeRecordCount, probeRecordCountKey, rawStream, recordFlowCloseable, recordStartPattern, regionStartSearchReadOverRegionEnd, regionStartSearchReadOverSplitEnd, skipRecordCount, split, splitEnd, splitId, splitLength, splitName, splitStart, stream, tailByteBuffer, tailBytes, tailEltBuffer, tailElts, tailEltsTime, tailRecordOffset, totalEltCount, totalRecordCount -
Constructor Summary
Constructors -
Method Summary
Modifier and Type  Method  Description
protected Stream<org.apache.jena.sparql.core.Quad>  parse(InputStream in, boolean isProbe)  Create a flowable from the input stream.
protected io.reactivex.rxjava3.core.Flowable<org.apache.jena.sparql.core.Quad>  parse(Callable<InputStream> inputStreamSupplier)
Methods inherited from class net.sansa_stack.hadoop.format.jena.base.RecordReaderGenericRdfBase
initialize, setupParser
Methods inherited from class net.sansa_stack.hadoop.core.RecordReaderGenericBase
abbreviate, abbreviate, abbreviateAsUTF8, aggregate, aggregate, close, convert, createMatcherFactory, createRecordFlow, detectTail, didHitSplitBound, effectiveInputStream, effectiveInputStreamSupp, findFirstPositionWithProbeSuccess, findNextRegion, getCurrentKey, getCurrentValue, getPos, getPosition, getProgress, getStats, initRecordFlow, lines, logClose, logUnexpectedClose, nextKeyValue, parseFromSeekable, prober, setStreamToInterval, unbufferedStream
-
Field Details
-
RECORD_MINLENGTH_KEY
-
RECORD_MAXLENGTH_KEY
-
RECORD_PROBECOUNT_KEY
-
ELEMENT_PROBECOUNT_KEY
-
PREFIXES_MAXLENGTH_KEY
-
trigFwdPatternGraphFollowedByCurlyBrace
This is the pattern for directives or trig data where graphs are separated by '{'. This pattern was used -
trigFwdPatternNew
This is the pattern for directives or (compact) IRIs. Trig graphs must be IRIs. -
trigFwdPattern
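The actual regular expressions behind these pattern fields are not shown on this page. As a rough illustration of the idea, the following sketch searches text for candidate record starts: a line that begins with a directive, or with an IRI or compact IRI followed by '{'. The regex, the class name `TrigStartPatternSketch`, and the method `candidateOffsets` are assumptions made for this example, not the library's definitions, and the character classes are much looser than the TriG grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrigStartPatternSketch {

    // Assumed, simplified pattern (NOT the library's): at the start of a line, either
    //   - a directive keyword (@prefix, @base, PREFIX, BASE), or
    //   - an IRI in angle brackets or a compact IRI, followed by '{'.
    // Group 1 captures the token so that leading indentation is excluded from the offset.
    static final Pattern CANDIDATE_START = Pattern.compile(
            "(?m)^[ \\t]*("
            + "(?:@prefix|@base|PREFIX|BASE)\\b"
            + "|(?:<[^<>\\s]*>|(?:[A-Za-z][\\w.-]*)?:[\\w.-]*)[ \\t]*\\{"
            + ")");

    /** Returns the character offsets at which a record might start. */
    static List<Integer> candidateOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher matcher = CANDIDATE_START.matcher(text);
        while (matcher.find()) {
            offsets.add(matcher.start(1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        String trig =
                "@prefix ex: <http://example.org/> .\n"
                + "ex:g1 {\n"
                + "  ex:s ex:p ex:o .\n"
                + "}\n"
                + "<http://example.org/g2> { ex:s ex:p ex:o . }\n";
        System.out.println(candidateOffsets(trig));
    }
}
```

A match is only a candidate: the same text can occur inside a multi-line literal, so a reader built this way still has to confirm each offset by trying to parse from it.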
-
-
Constructor Details
-
RecordReaderRdfTrigDataset
public RecordReaderRdfTrigDataset()
-
-
Method Details
-
parse
protected Stream<org.apache.jena.sparql.core.Quad> parse(InputStream in, boolean isProbe)
Description copied from class: RecordReaderGenericBase
Create a flowable from the input stream. The input stream may be incorrectly positioned, in which case the Flowable is expected to indicate this by raising an error event.
- Specified by:
parse in class RecordReaderGenericBase<org.apache.jena.sparql.core.Quad,org.apache.jena.graph.Node,org.aksw.jenax.arq.dataset.api.DatasetOneNg,org.aksw.jenax.arq.dataset.api.DatasetOneNg>
- Parameters:
in - A supplier of input streams. May supply the same underlying stream on each call, so at most one stream should be taken from the supplier. Supplied streams are safe to use in try-with-resources blocks (possibly using CloseShieldInputStream). Taken streams should be closed by the client code.
isProbe - Whether the parser should be configured for probing. For example, it is desirable to suppress parse errors during probing. During probing, the parser may also optimize for low latency in yielding items rather than for overall throughput.
- Returns:
-
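The role of probing can be sketched with a toy format, independent of Jena and Hadoop. In this hypothetical example a record is a block of the form `name {`, zero or more three-word lines, then `}`. The parser throws when started at a wrong position, and the `isProbe` flag only suppresses error logging, mirroring the parameter description above. The class `ProbeSketch` and its methods are illustrative assumptions and do not reproduce the inherited probing logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProbeSketch {

    // Toy record syntax (an assumption for this sketch): "name {", statement lines, "}".
    static final Pattern RECORD = Pattern.compile("\\w+ \\{\\n(?:\\w+ \\w+ \\w+\\n)*\\}\\n");

    /**
     * Parses up to 'limit' records starting at 'offset'.
     * Throws if the text at the current position is not a record.
     * With isProbe set, the expected failure is not logged.
     */
    static List<String> parse(String text, int offset, int limit, boolean isProbe) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(text);
        int pos = offset;
        while (pos < text.length() && records.size() < limit) {
            matcher.region(pos, text.length());
            if (!matcher.lookingAt()) {
                if (!isProbe) {
                    System.err.println("Parse error at offset " + pos);
                }
                throw new IllegalStateException("Not a record start at offset " + pos);
            }
            records.add(matcher.group());
            pos = matcher.end();
        }
        return records;
    }

    /** Returns the first line start at or after 'from' where a probe parse succeeds, or -1. */
    static int findFirstValidOffset(String text, int from, int probeRecordCount) {
        for (int offset = from; offset < text.length(); offset++) {
            boolean atLineStart = offset == 0 || text.charAt(offset - 1) == '\n';
            if (!atLineStart) {
                continue;
            }
            try {
                if (!parse(text, offset, probeRecordCount, true).isEmpty()) {
                    return offset;
                }
            } catch (IllegalStateException e) {
                // Expected while probing: this offset lies inside a record.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String data = "g1 {\na b c\n}\ng2 {\nd e f\n}\n";
        // Start the search in the middle of the first record, as a split boundary might.
        System.out.println(findFirstValidOffset(data, 2, 1));
    }
}
```

Limiting the probe to a small number of records keeps each attempt cheap, which is the latency-over-throughput trade-off the `isProbe` description mentions.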
parse
protected io.reactivex.rxjava3.core.Flowable<org.apache.jena.sparql.core.Quad> parse(Callable<InputStream> inputStreamSupplier)
-