Task 2: Ontology lexicalization
Multilingual information access can be facilitated by the availability of lexica in different
languages, for example allowing for an easy mapping of Spanish, German, and French natural
language expressions to English ontology labels.
Task
The task consists in finding English lexicalizations of a set of classes and properties from the
DBpedia ontology in a Wikipedia corpus.
The submitted lexicalizations are expected to follow the lemon ontology lexicon format.
Full description:
qald3_openchallenge.pdf (Last updated: March 25, 2013)
Training data
The training data consists of a set of 10 classes and 30 properties from the DBpedia ontology,
as well as a lemon lexicon containing lexicalizations of those classes and properties.
A suitable corpus for finding lexicalizations is Wikipedia.
You can either download one of the Wikipedia data dumps,
or directly download an already cleaned-up part of the English Wikipedia (1.54 GB).
Test data
The test data consists of a similar set of 10 additional classes and 30 additional properties from the DBpedia ontology,
for which lexicalizations have to be found.
Evaluation
Submitted lexica will be evaluated with respect to the reference data along three main criteria:
- lexical precision (How many of the lexical entries in the submitted
lexicon are also in the gold standard lexicon?)
- lexical recall (How many of the lexical entries in
the gold standard lexicon are also in the submitted lexicon?)
- lexical accuracy (checking the correctness of the frames
and argument mappings for each lexical entry in the submitted lexicon with respect
to the gold standard lexicon)
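As a rough illustration of the first two criteria, the following minimal sketch computes lexical precision and recall over sets of entries. It assumes that each lexical entry can be reduced to a hashable key and that entries are compared by exact match; the function name, the (written form, ontology reference) pair representation, and the sample entries are hypothetical and not taken from the challenge data. The exact matching procedure used in the official evaluation is given in the full description and may differ. Lexical accuracy is not covered here, since it depends on the frame and argument structure of each entry.

```python
def lexical_precision_recall(submitted, gold):
    """Return (precision, recall) for two collections of lexical entries.

    Entries are compared by exact equality (an assumption of this sketch).
    """
    submitted, gold = set(submitted), set(gold)
    overlap = submitted & gold
    # Precision: share of submitted entries that also appear in the gold standard.
    precision = len(overlap) / len(submitted) if submitted else 0.0
    # Recall: share of gold standard entries that also appear in the submission.
    recall = len(overlap) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical entries, modelled as (written form, ontology reference) pairs.
gold = {
    ("spouse", "dbo:spouse"),
    ("wife", "dbo:spouse"),
    ("husband", "dbo:spouse"),
    ("river", "dbo:River"),
}
submitted = {
    ("spouse", "dbo:spouse"),
    ("wife", "dbo:spouse"),
    ("partner", "dbo:spouse"),
}

precision, recall = lexical_precision_recall(submitted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example two of the three submitted entries are in the gold standard, and two of the four gold standard entries are found in the submission.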
For both the training and test phases, results can be uploaded with the following evaluation form: