Class FileInputFormatRdfBase<T>
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
      net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase<T>

Type Parameters:
  T

Direct Known Subclasses:
  FileInputFormatRdfTrigDataset, FileInputFormatRdfTrigQuad, FileInputFormatRdfTurtleTriple
public abstract class FileInputFormatRdfBase<T> extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
Base class for unit testing the reading of an RDF file with an arbitrary number of splits. RDF is read as Datasets, which means that triples are expanded to quads. (We could generalize to tuples of RDF terms using Bindings.)
The only method that needs to be overridden is createRecordReaderActual(InputSplit, TaskAttemptContext).
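The following is a minimal sketch of such a subclass, shown only to illustrate the override point; the class name, the configuration key string, and the record reader it would return are placeholders, not part of this API.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.Lang;

import net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase;

// Hypothetical input format for a Turtle-based source producing Triple records.
public class FileInputFormatMyTurtle extends FileInputFormatRdfBase<Triple> {

    public FileInputFormatMyTurtle() {
        // The base constructor takes the input language and the configuration
        // key under which the maximum prefix-scan length is looked up;
        // "my.turtle.prefixes.maxlength" is a made-up key name.
        super(Lang.TURTLE, "my.turtle.prefixes.maxlength");
    }

    @Override
    public RecordReader<LongWritable, Triple> createRecordReaderActual(
            InputSplit inputSplit, TaskAttemptContext context) {
        // A real subclass (e.g. FileInputFormatRdfTurtleTriple) would return its
        // format-specific record reader here; this placeholder only marks where
        // that construction happens.
        throw new UnsupportedOperationException("plug in a format-specific RecordReader");
    }
}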
Field Summary

Modifier and Type                     Field                            Description
static String                         BASE_IRI_KEY
protected org.apache.jena.riot.Lang   lang                             Input language
static long                           PARSED_PREFIXES_LENGTH_DEFAULT
static String                         PREFIXES_KEY
protected String                      prefixesLengthMaxKey
Constructor Summary

Constructor                                                                           Description
FileInputFormatRdfBase(org.apache.jena.riot.Lang lang, String prefixesLengthMaxKey)
Method Summary

Modifier and Type / Method / Description

org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
  createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)

abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
  createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)

static org.apache.jena.rdf.model.Model
  getModel(org.apache.hadoop.conf.Configuration conf)
  Extract a Model from a Hadoop Configuration using PREFIXES_KEY.

static org.apache.jena.rdf.model.Model
  getModel(org.apache.hadoop.conf.Configuration conf, String key)
  Extract a Model from a Hadoop Configuration.

List<org.apache.hadoop.mapreduce.InputSplit>
  getSplits(org.apache.hadoop.mapreduce.JobContext job)

boolean
  isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Field Detail

PREFIXES_KEY
public static final String PREFIXES_KEY
See Also:
  Constant Field Values

BASE_IRI_KEY
public static final String BASE_IRI_KEY
See Also:
  Constant Field Values

PARSED_PREFIXES_LENGTH_DEFAULT
public static final long PARSED_PREFIXES_LENGTH_DEFAULT
See Also:
  Constant Field Values

lang
protected org.apache.jena.riot.Lang lang
Input language

prefixesLengthMaxKey
protected String prefixesLengthMaxKey
Constructor Detail

FileInputFormatRdfBase
public FileInputFormatRdfBase(org.apache.jena.riot.Lang lang, String prefixesLengthMaxKey)
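As a usage sketch only: one of the concrete subclasses listed above can be set as a job's input format in the usual Hadoop way. The mapper, the subclass's package (assumed here), and any format-specific configuration are placeholders; only the class names on this page are taken from it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// The package of the concrete subclass is an assumption, not stated on this page.
import net.sansa_stack.hadoop.format.jena.trig.FileInputFormatRdfTrigDataset;

public class RdfJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "rdf-input-example");
        job.setJarByClass(RdfJobSetup.class);

        // Use the TriG/Dataset flavour; splitting is handled by the
        // getSplits/isSplitable logic inherited from FileInputFormatRdfBase.
        job.setInputFormatClass(FileInputFormatRdfTrigDataset.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // ... configure mapper, reducer and output format here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}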
Method Detail

isSplitable
public boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
Overrides:
  isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>

createRecordReader
public final org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Specified by:
  createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,T>

createRecordReaderActual
public abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)

getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException
Overrides:
  getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
Throws:
  IOException
getModel
public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf)
Extract a Model from a Hadoop Configuration using PREFIXES_KEY.

getModel
public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf, String key)
Extract a Model from a Hadoop Configuration. The result is never null; it is an empty Model if there was no entry for the given key or if a parse error occurred.
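A minimal sketch of reading the prefix Model back from a Configuration. It assumes the Configuration was already populated under PREFIXES_KEY (normally by the input format itself); printing the namespace map is just for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.jena.rdf.model.Model;

import net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase;

public class PrefixModelExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // ... conf would normally carry the prefixes recorded by the input format ...

        // Never null: an empty Model is returned if the key is absent or unparsable.
        Model prefixes = FileInputFormatRdfBase.getModel(conf);
        prefixes.getNsPrefixMap().forEach((prefix, iri) ->
                System.out.println(prefix + " -> " + iri));

        // The two-argument overload reads from an explicit key; with PREFIXES_KEY
        // it is equivalent to the call above.
        Model same = FileInputFormatRdfBase.getModel(conf, FileInputFormatRdfBase.PREFIXES_KEY);
    }
}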