Class Qald7CreationTool

java.lang.Object
org.aksw.qa.commons.qald.Qald7CreationTool

public class Qald7CreationTool extends Object
  • Constructor Details

    • Qald7CreationTool

      public Qald7CreationTool()
    • Qald7CreationTool

      public Qald7CreationTool(String sparqlEndpoint, int timeout)
  • Method Details

    • getQald7HybridQuestions

      public Set<Qald7Question> getQald7HybridQuestions(Set<Dataset> datasets)
      Returns all hybrid questions for QALD-7. Loads all hybrid questions from the earlier QALD datasets and drops duplicates. Note that this sets "hybrid:true" on every returned question.
      Parameters:
      datasets - All datasets from which questions should be extracted
      Returns:
      All available unique questions from given datasets
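      The deduplicate-and-flag behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: SketchQuestion and HybridSketch are hypothetical stand-ins, and comparing questions by trimmed, lower-cased text is an assumption, since this page does not say how duplicates are detected.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Qald7Question; the real class has more fields.
class SketchQuestion {
    final String text;
    boolean hybrid;

    SketchQuestion(String text, boolean hybrid) {
        this.text = text;
        this.hybrid = hybrid;
    }
}

class HybridSketch {
    // Keeps the first question seen for each normalized text and marks it hybrid.
    // Assumption: duplicates are detected by trimmed, lower-cased question text.
    static Set<SketchQuestion> uniqueHybrid(List<SketchQuestion> loaded) {
        Map<String, SketchQuestion> byText = new LinkedHashMap<>();
        for (SketchQuestion q : loaded) {
            byText.putIfAbsent(q.text.trim().toLowerCase(Locale.ROOT), q);
        }
        Set<SketchQuestion> result = new LinkedHashSet<>(byText.values());
        for (SketchQuestion q : result) {
            q.hybrid = true; // every returned question is flagged as hybrid
        }
        return result;
    }
}
```

      Because the flag is set on every returned question, callers that need the original value would have to copy it before calling such a method.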
    • createQald7HybridDataset

      public void createQald7HybridDataset(Set<Dataset> hybridDatasets, String path, String filenameWithoutExtension)
      Creates the hybrid datasets. Three files are stored in the given location: QALD-JSON, Extended-JSON, and XML.
      Parameters:
      hybridDatasets - The datasets the questions are taken from.
      path - The path to write the datasets to.
      filenameWithoutExtension - The name of the new dataset
    • createQald7MultilingualTrainDataset

      public void createQald7MultilingualTrainDataset(Set<Dataset> datasets, boolean fileReport, boolean autocorrectOnlydbo, String path, String filenameWithoutExtension)
      Creates the multilingual train datasets. Three files are stored in the given location: QALD-JSON, Extended-JSON, and XML.
      Parameters:
      datasets - The datasets the questions are taken from.
      autocorrectOnlydbo - Whether a bad onlydbo flag is an exclusion criterion for a question (the question won't appear in the file) or should be fixed automatically.
      path - The path to write the datasets to.
      filenameWithoutExtension - The name of the new dataset
    • getQald7MultilingualTrainQuestions

      public Set<Qald7Question> getQald7MultilingualTrainQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo)
      Loads all questions from the given datasets and checks each question's integrity: whether the stored answer set is still identical to the one returned for the given SPARQL query, whether a SPARQL query is present and parseable, whether at least six languages are available with keywords, whether an answer type is set, and so on. Duplicates are also filtered out; only the candidate with the fewest error flags (Fail) is kept in the returned set, so all returned questions are clean. To get a duplicate-free dataset annotated with Fail flags, use loadAndAnnotateTrain(Set, boolean).
      Parameters:
      datasets - The datasets from which the questions are gathered
      autocorrectOnlydbo - Whether a bad onlydbo flag is an exclusion criterion for a question (the question won't appear in the file) or should be fixed automatically.
      Returns:
      All clean, duplicate-free questions from the given datasets.
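      The duplicate handling described for this method (keep the candidate with the fewest error flags, then return only clean questions) can be sketched like this. The names SketchFail, AnnotatedSketch and DuplicateSketch are hypothetical, the flag values are invented for illustration because the real Fail enum is not listed on this page, and grouping by exact question text is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical flag names; the real Fail enum is not listed on this page.
enum SketchFail { SPARQL_MISSING, ANSWERTYPE_MISSING, TOO_FEW_LANGUAGES }

// Hypothetical stand-in for an annotated Qald7Question.
class AnnotatedSketch {
    final String text;
    final Set<SketchFail> fails;

    AnnotatedSketch(String text, Set<SketchFail> fails) {
        this.text = text;
        this.fails = fails;
    }
}

class DuplicateSketch {
    // Groups questions by text and keeps, per group, the one with the fewest flags.
    static List<AnnotatedSketch> bestPerText(List<AnnotatedSketch> annotated) {
        Map<String, List<AnnotatedSketch>> groups = new LinkedHashMap<>();
        for (AnnotatedSketch q : annotated) {
            groups.computeIfAbsent(q.text, k -> new ArrayList<>()).add(q);
        }
        List<AnnotatedSketch> best = new ArrayList<>();
        for (List<AnnotatedSketch> group : groups.values()) {
            // Each group has at least one element, so min() is always present.
            best.add(group.stream()
                    .min(Comparator.comparingInt((AnnotatedSketch q) -> q.fails.size()))
                    .get());
        }
        return best;
    }

    // Keeps only questions without any flag, i.e. the "clean" ones.
    static List<AnnotatedSketch> cleanOnly(List<AnnotatedSketch> best) {
        List<AnnotatedSketch> clean = new ArrayList<>();
        for (AnnotatedSketch q : best) {
            if (q.fails.isEmpty()) {
                clean.add(q);
            }
        }
        return clean;
    }
}
```

      In this sketch, the output of bestPerText corresponds roughly to the duplicate-free annotated set, and cleanOnly to the set this method returns.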
    • loadAndAnnotateTrain

      public Set<Qald7Question> loadAndAnnotateTrain(Set<Dataset> datasets, boolean autocorrectOnlyDBO)
    • checkSparqlPresent

      private boolean checkSparqlPresent(IQuestion q)
    • checkAnswertypeSet

      private boolean checkAnswertypeSet(IQuestion q)
    • checkKeywordsPresent

      private boolean checkKeywordsPresent(IQuestion q)
    • checkAtleastSixLanguages

      private boolean checkAtleastSixLanguages(IQuestion q)
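      One of the integrity checks named for getQald7MultilingualTrainQuestions is that at least six languages are available with keywords. A minimal sketch of such a check follows; LanguageCheckSketch and the two map parameters are hypothetical and do not reflect the IQuestion API, which is not shown on this page.

```java
import java.util.List;
import java.util.Map;

class LanguageCheckSketch {
    static final int MIN_LANGUAGES = 6;

    // Counts languages that have a non-empty question text and at least one keyword.
    // Assumption: both maps are keyed by language code such as "en" or "de".
    static boolean hasEnoughLanguages(Map<String, String> textByLang,
                                      Map<String, List<String>> keywordsByLang) {
        int usable = 0;
        for (Map.Entry<String, String> e : textByLang.entrySet()) {
            String text = e.getValue();
            List<String> keywords = keywordsByLang.get(e.getKey());
            boolean hasText = text != null && !text.trim().isEmpty();
            boolean hasKeywords = keywords != null && !keywords.isEmpty();
            if (hasText && hasKeywords) {
                usable++;
            }
        }
        return usable >= MIN_LANGUAGES;
    }
}
```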
    • getAnswersFromServer

      public Set<String> getAnswersFromServer(IQuestion q) throws ExecutionException
      Returns the answers that the official DBpedia endpoint gives for the SPARQL query stored in the IQuestion.
      Parameters:
      q - Question to be answered
      Returns:
      The answers as a set of strings
      Throws:
      ExecutionException
    • addSave

      private void addSave(Map<String, List<Qald7Question>> map, String question, Qald7Question q)
    • findAndSelectBestDuplicate

      private Set<Qald7Question> findAndSelectBestDuplicate(List<Qald7Question> questions)
    • extractGoodTrainQuestionsFromAnnotated

      private Set<Qald7Question> extractGoodTrainQuestionsFromAnnotated(Set<Qald7Question> questions)
    • extractBadQuestionsFromAnnotated

      private Set<Qald7Question> extractBadQuestionsFromAnnotated(Set<Qald7Question> questions, Set<Fail> ignoreFlags)
    • createQald7Dataset

      private void createQald7Dataset(Set<Qald7Question> allQuestions, String path, String filenameWithoutExtension)
    • createFileReportForTestQuestions

      public void createFileReportForTestQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags)
      Creates a file report for all bad questions in the given datasets.
      Parameters:
      datasets - All datasets to be checked
      autocorrectOnlydbo - Whether a bad onlydbo flag is an exclusion criterion for a question (the question won't appear in the file) or should be fixed automatically.
      pathAndFilenameWithExtension - Path and name of new file report
      skipQuestionsWithTooLittleLanguages - Normally, multilingual datasets have at least six languages. When this flag is set, all questions with fewer languages are ignored; otherwise it is an error (Fail) and the question goes into the report.
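      The ignoreFlags parameter in the signature is not described on this page. One plausible reading, offered here only as an assumption, is that a question is reported when it still carries at least one error flag after the ignored flags are removed. ReportFail and ReportSketch below are hypothetical names, and the flag values are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical flag names; the real Fail enum is not listed on this page.
enum ReportFail { SPARQL_MISSING, ANSWERTYPE_MISSING, TOO_FEW_LANGUAGES }

class ReportSketch {
    // Returns true if any flag remains once the ignored flags are removed.
    // Assumption: this is how ignoreFlags is meant to be applied.
    static boolean isReportable(Set<ReportFail> fails, Set<ReportFail> ignoreFlags) {
        Set<ReportFail> remaining = EnumSet.noneOf(ReportFail.class);
        remaining.addAll(fails);
        remaining.removeAll(ignoreFlags);
        return !remaining.isEmpty();
    }
}
```

      Under this reading, passing a set that contains only a "too few languages" flag would keep questions with fewer than six languages out of the report, which matches the behaviour described for skipQuestionsWithTooLittleLanguages.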
    • createFileReport

      public void createFileReport(Set<Qald7Question> allQuestions, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags)
    • checkIsOnlydbo

      private boolean checkIsOnlydbo(String sparqlQuery) throws org.apache.jena.query.QueryParseException
      Throws:
      org.apache.jena.query.QueryParseException
    • destroy

      public void destroy()
      Call this when you no longer need this object. Closes the threads around the connection to the SPARQL server.
    • main

      public static void main(String[] args)