Task 2: Ontology lexicalization

Multilingual information access can be facilitated by the availability of lexica in different languages, for example by making it easy to map Spanish, German, and French natural language expressions to English ontology labels.

Task

The task is to find English lexicalizations of a set of classes and properties from the DBpedia ontology in a Wikipedia corpus. Submitted lexicalizations are expected to follow the lemon ontology-lexicon format.
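To give a rough idea of what a lemon lexicon can look like, the following Python sketch serializes a few entries as Turtle. It is only an illustration under stated assumptions: the lemon namespace URI, the base namespace, the entry identifiers, and the choice of dbo:City as an example are not taken from the challenge material, and the full description remains the reference for the exact format expected.

```python
# Minimal sketch: write lemon lexical entries as Turtle text.
# Assumptions (not from the challenge description): the lemon namespace URI,
# the example.org base namespace, and the entry identifiers.
LEMON_NS = "http://lemon-model.net/lemon#"
DBO_NS = "http://dbpedia.org/ontology/"
BASE_NS = "http://example.org/lexicon#"  # hypothetical


def lemon_entry(entry_id, written_rep, dbo_name, lang="en"):
    """Return a Turtle fragment linking one written form to one ontology element."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f":{entry_id} a lemon:LexicalEntry ;\n"
        f'    lemon:canonicalForm [ lemon:writtenRep "{escaped}"@{lang} ] ;\n'
        f"    lemon:sense [ lemon:reference dbo:{dbo_name} ] .\n"
    )


def lemon_lexicon(entries, lang="en"):
    """Build a Turtle document from (entry_id, written_rep, dbo_name) triples."""
    header = (
        f"@prefix lemon: <{LEMON_NS}> .\n"
        f"@prefix dbo: <{DBO_NS}> .\n"
        f"@prefix : <{BASE_NS}> .\n\n"
    )
    ids = " , ".join(f":{entry_id}" for entry_id, _, _ in entries)
    lexicon = (
        ":lexicon a lemon:Lexicon ;\n"
        f'    lemon:language "{lang}" ;\n'
        f"    lemon:entry {ids} .\n\n"
    )
    body = "\n".join(lemon_entry(i, w, d, lang) for i, w, d in entries)
    return header + lexicon + body
```

Calling lemon_lexicon([("city", "city", "City")]) yields a lexicon with a single entry whose canonical form "city"@en refers to dbo:City.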

Full description: qald3_openchallenge.pdf (Last updated: March 25, 2013)

Training data

The training data consists of a set of 10 classes and 30 properties from the DBpedia ontology, as well as a lemon lexicon containing lexicalizations of those classes and properties. A suitable corpus for finding lexicalizations is Wikipedia. You can either download one of the Wikipedia data dumps, or directly download an already cleaned-up part of English Wikipedia (1.54 GB).
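One simple way to search such a corpus for candidate lexicalizations of a property is sketched below in Python. It is not the method prescribed by the challenge, only a common baseline assumed here for illustration: given label pairs of entities known to be related by the property, it counts the short text spans that occur between the two labels in corpus sentences. The function name, the max_tokens cutoff, and the sample data are all hypothetical.

```python
import re
from collections import Counter


def candidate_patterns(sentences, label_pairs, max_tokens=5):
    """Count text spans found between a subject label and an object label.

    sentences   -- iterable of plain-text sentences from the corpus
    label_pairs -- (subject_label, object_label) pairs assumed to be related
                   by the property being lexicalized
    max_tokens  -- hypothetical cutoff; longer spans are ignored as noise
    """
    counts = Counter()
    for sentence in sentences:
        for subj, obj in label_pairs:
            # Subject label, then the shortest span, then the object label.
            regex = re.escape(subj) + r"\s+(.+?)\s+" + re.escape(obj)
            match = re.search(regex, sentence)
            if match is None:
                continue
            span = match.group(1).strip().lower()
            if len(span.split()) <= max_tokens:
                counts[span] += 1
    return counts
```

For example, with the sentences "Barack Obama is married to Michelle Obama." and "Some say Barack Obama is married to Michelle Obama since 1992." and the label pair ("Barack Obama", "Michelle Obama"), the span "is married to" is counted twice. Frequent spans would still need filtering and normalization before being turned into lexicon entries.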

Test data

The test data consists of a similar set of 10 additional classes and 30 additional properties from the DBpedia ontology, for which lexicalizations have to be found.

Evaluation

Submitted lexica will be evaluated with respect to the reference data along three main criteria:

For both training and test phase, results can be uploaded with the following evaluation form: