Class FileInputFormatRdfBase<T>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase<T>
- Type Parameters:
T -
- All Implemented Interfaces:
CanParseRdf
- Direct Known Subclasses:
FileInputFormatRdfNQuads, FileInputFormatRdfNTriples, FileInputFormatRdfTrigDataset, FileInputFormatRdfTrigQuad, FileInputFormatRdfTurtleTriple
public abstract class FileInputFormatRdfBase<T>
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
implements CanParseRdf
Base class for unit testing of reading an RDF file with
an arbitrary number of splits.
RDF is read as Datasets which means that triples are expanded to quads.
(We could generalize to tuples of RDF terms using Bindings)
The only method that needs to be overridden is createRecordReaderActual(InputSplit, TaskAttemptContext).
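As a hedged sketch of what a concrete subclass might look like (the class name, the Turtle language choice, and the configuration key string below are illustrative assumptions; only createRecordReaderActual must actually be supplied):

```java
import net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.Lang;

// Hypothetical concrete input format reading Turtle into Triple records.
public class FileInputFormatMyTurtle extends FileInputFormatRdfBase<Triple> {
    public FileInputFormatMyTurtle() {
        // The base constructor takes the RDF language and the configuration
        // key under which the maximum prefix length is stored; the key name
        // here is an assumption for illustration.
        super(Lang.TURTLE, "my.turtle.prefixes.length.max");
    }

    @Override
    public RecordReader<LongWritable, Triple> createRecordReaderActual(
            InputSplit inputSplit, TaskAttemptContext context) {
        // Plug in a RecordReader for the concrete record type here.
        throw new UnsupportedOperationException("supply a RecordReader implementation");
    }
}
```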
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
Field Summary
Fields
static final String PREFIXES_KEY
protected org.apache.jena.riot.Lang lang - Input language
static final long PARSED_PREFIXES_LENGTH_DEFAULT
static final String BASE_IRI_KEY
protected String prefixesLengthMaxKey
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE -
Constructor Summary
Constructors
FileInputFormatRdfBase(org.apache.jena.riot.Lang lang, String prefixesLengthMaxKey)
Method Summary
Methods
final org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf) - Extract a Model from a Hadoop conf using PREFIXES_KEY
static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf, String key) - Extract a Model from a Hadoop conf.
long getPrefixByteCount(org.apache.hadoop.conf.Configuration conf)
List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
org.apache.jena.riot.system.PrefixMap parsePrefixes(InputStream in, org.apache.hadoop.conf.Configuration conf) - Public method to parse prefixes w.r.t. this input format configuration
org.apache.jena.riot.system.PrefixMap readPrefixes(Callable<InputStream> inSupp, org.apache.hadoop.conf.Configuration conf)
static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang)
static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang, Long limit) - Read prefixes from an input stream.
static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap prefixModel, Callable<InputStream> inSupp, org.apache.jena.riot.Lang lang, Long limit)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface net.sansa_stack.hadoop.format.jena.base.CanParseRdf
parsePrefixes
-
Field Details
-
PREFIXES_KEY
- See Also:
-
BASE_IRI_KEY
- See Also:
-
PARSED_PREFIXES_LENGTH_DEFAULT
public static final long PARSED_PREFIXES_LENGTH_DEFAULT
- See Also:
-
lang
protected org.apache.jena.riot.Lang lang
Input language
prefixesLengthMaxKey
-
-
Constructor Details
-
FileInputFormatRdfBase
-
-
Method Details
-
isSplitable
public boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
- Overrides:
isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
-
createRecordReader
public final org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
- Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,T>
-
createRecordReaderActual
public abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context) -
readPrefixesIntoModel
public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap prefixModel, Callable<InputStream> inSupp, org.apache.jena.riot.Lang lang, Long limit)
- Parameters:
prefixModel - If null then a default model will be generated
inSupp - An input stream supplier. The taken stream will be closed.
lang - The RDF language. Must not be null.
limit -
- Returns:
- Throws:
Exception
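A hedged usage sketch of the supplier-based overload (the file name and byte limit below are illustrative assumptions, not values prescribed by the API):

```java
import java.io.FileInputStream;
import net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.system.PrefixMap;

public class ReadPrefixesExample {
    public static void main(String[] args) throws Exception {
        // Passing null as the sink lets the method create a default PrefixMap.
        // The stream obtained from the supplier is closed by the method itself.
        PrefixMap prefixes = FileInputFormatRdfBase.readPrefixesIntoModel(
                null,
                () -> new FileInputStream("data.ttl"), // hypothetical input file
                Lang.TURTLE,                            // must not be null
                8192L);                                 // illustrative byte limit
        System.out.println(prefixes.size() + " prefixes read");
    }
}
```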
-
readPrefixes
public org.apache.jena.riot.system.PrefixMap readPrefixes(Callable<InputStream> inSupp, org.apache.hadoop.conf.Configuration conf) -
parsePrefixes
public org.apache.jena.riot.system.PrefixMap parsePrefixes(InputStream in, org.apache.hadoop.conf.Configuration conf)
Public method to parse prefixes w.r.t. this input format configuration
- Specified by:
parsePrefixes in interface CanParseRdf
-
readPrefixesIntoModel
public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang, Long limit)
Read prefixes from an input stream.
- Parameters:
limit - If non-null, limits the number of bytes that can be read from the input stream to the given value.
-
getPrefixByteCount
public long getPrefixByteCount(org.apache.hadoop.conf.Configuration conf) -
readPrefixesIntoModel
public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang) -
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException
- Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
- Throws:
IOException
-
getModel
public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf)
Extract a Model from a Hadoop conf using PREFIXES_KEY
getModel
public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf, String key)
Extract a Model from a Hadoop conf. The result is never null; it is empty if there was no entry for the key or a parse error occurred on the stored value.
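A hedged sketch of reading the prefix model back out of a job configuration; the one-argument overload shown here uses PREFIXES_KEY, and the printed namespace map will simply be empty when nothing was stored under that key:

```java
import net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase;
import org.apache.hadoop.conf.Configuration;
import org.apache.jena.rdf.model.Model;

public class GetModelExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // One-argument form: extracts the Model stored under PREFIXES_KEY.
        Model prefixModel = FileInputFormatRdfBase.getModel(conf);
        // Never null: empty if the key is absent or its value fails to parse.
        System.out.println("Prefixes: " + prefixModel.getNsPrefixMap());
    }
}
```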
-