Class FileInputFormatRdfBase<T>

java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
net.sansa_stack.hadoop.format.jena.base.FileInputFormatRdfBase<T>
Type Parameters:
T - The type of the record values produced by this input format (keys are LongWritable)
All Implemented Interfaces:
CanParseRdf
Direct Known Subclasses:
FileInputFormatRdfNQuads, FileInputFormatRdfNTriples, FileInputFormatRdfTrigDataset, FileInputFormatRdfTrigQuad, FileInputFormatRdfTurtleTriple

public abstract class FileInputFormatRdfBase<T> extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T> implements CanParseRdf
Base class for unit testing of reading an RDF file with an arbitrary number of splits. RDF is read as Datasets, which means that triples are expanded to quads.

(We could generalize to tuples of RDF terms using Bindings)

The only method that needs to be overridden is createRecordReaderActual(InputSplit, TaskAttemptContext).
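
The pairing of a final createRecordReader with an abstract createRecordReaderActual looks like a template-method arrangement. The following is a minimal sketch of that pattern only, using plain JDK types. All names in it (TemplateMethodSketch, Reader, FormatBase, TurtleFormat, createReader, createReaderActual, the log field) are hypothetical stand-ins, and the "shared setup" step is an assumption about what a final entry point might do, not a description of this class's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateMethodSketch {

    /** Hypothetical stand-in for a record reader; not a Hadoop type. */
    public interface Reader {
        String describe();
    }

    /** Hypothetical base class with a final entry point and one abstract hook. */
    public abstract static class FormatBase {
        // Records the shared steps so the effect of the final method is observable.
        public final List<String> log = new ArrayList<>();

        // Final: subclasses cannot replace it, so the shared step always runs.
        public final Reader createReader(String split) {
            log.add("shared setup for " + split); // assumed placeholder for common work
            return createReaderActual(split);
        }

        // The single method a concrete format has to provide.
        protected abstract Reader createReaderActual(String split);
    }

    /** Hypothetical concrete format that only supplies the hook. */
    public static class TurtleFormat extends FormatBase {
        @Override
        protected Reader createReaderActual(String split) {
            return () -> "turtle reader for " + split;
        }
    }

    public static void main(String[] args) {
        FormatBase format = new TurtleFormat();
        Reader reader = format.createReader("split-0");
        System.out.println(reader.describe());
        System.out.println(format.log);
    }
}
```

With this arrangement a subclass stays small: it decides which reader to build, while anything the base class wants to run for every split lives in one place.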

  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     
    protected org.apache.jena.riot.Lang
    Input language
    static final long
     
    static final String
     
    protected String
     

    Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
  • Constructor Summary

    Constructors
    Constructor
    Description
    FileInputFormatRdfBase(org.apache.jena.riot.Lang lang, String prefixesLengthMaxKey)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    final org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
    createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
     
    abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T>
    createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
     
    static org.apache.jena.rdf.model.Model
    getModel(org.apache.hadoop.conf.Configuration conf)
    Extract a Model from a Hadoop configuration using PREFIXES_KEY
    static org.apache.jena.rdf.model.Model
    getModel(org.apache.hadoop.conf.Configuration conf, String key)
    Extract a Model from a Hadoop configuration.
    long
    getPrefixByteCount(org.apache.hadoop.conf.Configuration conf)
     
    List<org.apache.hadoop.mapreduce.InputSplit>
    getSplits(org.apache.hadoop.mapreduce.JobContext job)
     
    boolean
    isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
     
    org.apache.jena.riot.system.PrefixMap
    parsePrefixes(InputStream in, org.apache.hadoop.conf.Configuration conf)
    Public method to parse prefixes with respect to this input format's configuration.
    org.apache.jena.riot.system.PrefixMap
    readPrefixes(Callable<InputStream> inSupp, org.apache.hadoop.conf.Configuration conf)
     
    static org.apache.jena.riot.system.PrefixMap
    readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang)
     
    static org.apache.jena.riot.system.PrefixMap
    readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang, Long limit)
    Read prefixes from an input stream.
    static org.apache.jena.riot.system.PrefixMap
    readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap prefixModel, Callable<InputStream> inSupp, org.apache.jena.riot.Lang lang, Long limit)
     

    Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface net.sansa_stack.hadoop.format.jena.base.CanParseRdf

    parsePrefixes
  • Field Details

    • PREFIXES_KEY

      public static final String PREFIXES_KEY
    • BASE_IRI_KEY

      public static final String BASE_IRI_KEY
    • PARSED_PREFIXES_LENGTH_DEFAULT

      public static final long PARSED_PREFIXES_LENGTH_DEFAULT
    • lang

      protected org.apache.jena.riot.Lang lang
      Input language
    • prefixesLengthMaxKey

      protected String prefixesLengthMaxKey
  • Constructor Details

    • FileInputFormatRdfBase

      public FileInputFormatRdfBase(org.apache.jena.riot.Lang lang, String prefixesLengthMaxKey)
  • Method Details

    • isSplitable

      public boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)
      Overrides:
      isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
    • createRecordReader

      public final org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
      Specified by:
      createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,T>
    • createRecordReaderActual

      public abstract org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReaderActual(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
    • readPrefixesIntoModel

      public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap prefixModel, Callable<InputStream> inSupp, org.apache.jena.riot.Lang lang, Long limit)
      Parameters:
      prefixModel - If null then a default model will be generated
      inSupp - An input stream supplier. The stream obtained from it will be closed.
      lang - The RDF language. Must not be null.
      limit - If non-null, limits the number of bytes that can be read from the input stream to the given value.
      Returns:
      Throws:
      Exception
    • readPrefixes

      public org.apache.jena.riot.system.PrefixMap readPrefixes(Callable<InputStream> inSupp, org.apache.hadoop.conf.Configuration conf)
    • parsePrefixes

      public org.apache.jena.riot.system.PrefixMap parsePrefixes(InputStream in, org.apache.hadoop.conf.Configuration conf)
      Public method to parse prefixes with respect to this input format's configuration.
      Specified by:
      parsePrefixes in interface CanParseRdf
    • readPrefixesIntoModel

      public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang, Long limit)
      Read prefixes from an input stream.
      Parameters:
      limit - If non-null, limits the number of bytes that can be read from the input stream to the given value.
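
To illustrate what a nullable byte limit on an input stream can mean in practice, here is a minimal sketch built only on the JDK. It is not the implementation used by readPrefixesIntoModel; the class BoundedReadSketch and its methods limit and readAll are hypothetical names introduced for this example, and the sample Turtle text is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedReadSketch {

    /** Wraps a stream so that at most limit bytes can be read; a null limit means unbounded. */
    public static InputStream limit(InputStream in, Long limit) {
        if (limit == null) {
            return in;
        }
        return new InputStream() {
            private long remaining = Math.max(0L, limit);

            @Override
            public int read() throws IOException {
                if (remaining <= 0) {
                    return -1; // budget exhausted: report end of stream
                }
                int b = in.read();
                if (b >= 0) {
                    remaining--;
                }
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                if (remaining <= 0) {
                    return -1;
                }
                // Never request more than the remaining budget.
                int n = in.read(buf, off, (int) Math.min(len, remaining));
                if (n > 0) {
                    remaining -= n;
                }
                return n;
            }

            @Override
            public void close() throws IOException {
                in.close();
            }
        };
    }

    /** Reads the whole (possibly bounded) stream as UTF-8 text and closes it. */
    public static String readAll(InputStream in) throws IOException {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "@prefix ex: <http://example.org/> .\nex:s ex:p ex:o .\n"
                .getBytes(StandardCharsets.UTF_8);
        // Only the leading bytes are visible to whatever parser consumes the stream.
        System.out.println(readAll(limit(new ByteArrayInputStream(data), 35L)));
    }
}
```

A bound of this kind keeps a prefix scan from reading an entire large file when the prefix declarations are expected near the start.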
    • getPrefixByteCount

      public long getPrefixByteCount(org.apache.hadoop.conf.Configuration conf)
    • readPrefixesIntoModel

      public static org.apache.jena.riot.system.PrefixMap readPrefixesIntoModel(org.apache.jena.riot.system.PrefixMap sink, InputStream in, org.apache.jena.riot.Lang lang)
    • getSplits

      public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException
      Overrides:
      getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,T>
      Throws:
      IOException
    • getModel

      public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf)
      Extract a Model from a Hadoop configuration using PREFIXES_KEY
    • getModel

      public static org.apache.jena.rdf.model.Model getModel(org.apache.hadoop.conf.Configuration conf, String key)
      Extract a Model from a Hadoop configuration. The result is never null; empty if there was no entry for the key or exception on parse error.