Package org.aksw.qa.commons.qald
Class Qald7CreationTool
java.lang.Object
org.aksw.qa.commons.qald.Qald7CreationTool
-
Field Summary
Fields
- private static String DBO_URI
- private int duplicate
- HYBRID_SETS
- MULTILINGUAL_TRAIN_TEST_SETS: QALD1 and QALD2 are not multilingual.
- private static String RES_URI
- private ThreadedSPARQL sparql
Constructor Summary
Constructors
Method Summary
- private void addSave(Map<String,List<Qald7Question>> map, String question, Qald7Question q)
- private boolean checkAnswertypeSet
- private boolean checkAtleastSixLanguages
- private boolean checkIsOnlydbo(String sparqlQuery)
- private boolean checkKeywordsPresent
- private boolean checkSparqlPresent
- void createFileReport(Set<Qald7Question> allQuestions, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags)
- void createFileReportForTestQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags): Creates a file report of all bad questions in the given datasets.
- private void createQald7Dataset(Set<Qald7Question> allQuestions, String path, String filenameWithoutExtension)
- void createQald7HybridDataset(Set<Dataset> hybridDatasets, String path, String filenameWithoutExtension): Creates the hybrid datasets.
- void createQald7MultilingualTrainDataset(Set<Dataset> datasets, boolean fileReport, boolean autocorrectOnlydbo, String path, String filenameWithoutExtension): Creates the multilingual train datasets.
- void destroy(): Call this when you no longer need this object.
- private Set<Qald7Question> extractBadQuestionsFromAnnotated(Set<Qald7Question> questions, Set<Fail> ignoreFlags)
- private Set<Qald7Question> extractGoodTrainQuestionsFromAnnotated(Set<Qald7Question> questions)
- private Set<Qald7Question> findAndSelectBestDuplicate(List<Qald7Question> questions)
- getAnswersFromServer: Returns answers from the official DBpedia endpoint to the SPARQL query stored in an IQuestion.
- getQald7HybridQuestions(Set<Dataset> datasets): Returns all hybrid questions for Qald7 (loads all previous QALD hybrid questions and drops duplicates).
- Set<Qald7Question> getQald7MultilingualTrainQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo): Loads all questions from the given datasets and checks question integrity (is the stored answer set still identical to the one returned for the given SPARQL query, is a SPARQL query present and parseable, are at least six languages available with keywords, is an answer type set, ...).
- loadAndAnnotateTrain(Set<Dataset> datasets, boolean autocorrectOnlyDBO)
- static void main
-
Field Details
-
DBO_URI
- See Also:
- Constant Field Values
-
RES_URI
- See Also:
- Constant Field Values
-
duplicate
private int duplicate
MULTILINGUAL_TRAIN_TEST_SETS
QALD1 and QALD2 are not multilingual.
HYBRID_SETS
-
sparql
-
Constructor Details
-
Qald7CreationTool
public Qald7CreationTool()
Qald7CreationTool
-
Method Details
-
getQald7HybridQuestions
Returns all hybrid questions for Qald7 (loads all previous QALD hybrid questions and drops duplicates). Note: this sets "hybrid:true" on all questions.
- Parameters:
datasets - All datasets from which questions should be extracted.
- Returns:
- All available unique questions from the given datasets.
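The merge-and-flag behavior described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the tool's implementation: HybridQuestion is a hypothetical stand-in for Qald7Question, and treating two questions as duplicates when their trimmed, lower-cased text matches is an assumed criterion, since the actual duplicate test is not documented here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Qald7Question: an id, the question text, and the hybrid flag.
record HybridQuestion(String id, String text, boolean hybrid) {}

final class HybridMergeSketch {

    // Merges questions from several datasets, keeps the first question seen for each
    // duplicate key, and returns every kept question with hybrid = true.
    static List<HybridQuestion> mergeHybrid(List<List<HybridQuestion>> datasets) {
        Map<String, HybridQuestion> unique = new LinkedHashMap<>();
        for (List<HybridQuestion> dataset : datasets) {
            for (HybridQuestion q : dataset) {
                // Assumption: duplicates are detected by trimmed, lower-cased question text.
                String key = q.text().trim().toLowerCase(Locale.ROOT);
                unique.putIfAbsent(key, new HybridQuestion(q.id(), q.text(), true));
            }
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<HybridQuestion> datasetA = List.of(new HybridQuestion("1", "Who wrote Dune?", false));
        List<HybridQuestion> datasetB = List.of(
                new HybridQuestion("7", "who wrote Dune? ", false),
                new HybridQuestion("8", "Where is Leipzig?", false));
        List<HybridQuestion> merged = mergeHybrid(List.of(datasetA, datasetB));
        System.out.println(merged.size() + " unique, all hybrid: "
                + merged.stream().allMatch(HybridQuestion::hybrid));
    }
}
```

A LinkedHashMap keeps insertion order, so the first dataset that contains a question determines which copy survives.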
-
createQald7HybridDataset
public void createQald7HybridDataset(Set<Dataset> hybridDatasets, String path, String filenameWithoutExtension)
Creates the hybrid datasets. Three files are stored in the given location: QALD JSON, extended JSON and XML.
- Parameters:
hybridDatasets - The datasets the questions are taken from.
path - The path to write the datasets to.
filenameWithoutExtension - The name of the new dataset.
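As an illustration of how path and filenameWithoutExtension combine into the three output files, here is a small sketch. The suffixes (.json, _extended.json, .xml) are a hypothetical naming scheme chosen for the example; the file names the tool actually writes are not given in this documentation.

```java
import java.nio.file.Path;
import java.util.List;

final class HybridOutputFilesSketch {

    // Hypothetical naming scheme: the suffixes the real tool uses are not documented here.
    static List<Path> outputFiles(String path, String filenameWithoutExtension) {
        Path dir = Path.of(path);
        return List.of(
                dir.resolve(filenameWithoutExtension + ".json"),          // QALD JSON
                dir.resolve(filenameWithoutExtension + "_extended.json"), // extended JSON
                dir.resolve(filenameWithoutExtension + ".xml"));          // XML
    }

    public static void main(String[] args) {
        outputFiles("datasets", "qald-7-hybrid").forEach(System.out::println);
    }
}
```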
-
createQald7MultilingualTrainDataset
public void createQald7MultilingualTrainDataset(Set<Dataset> datasets, boolean fileReport, boolean autocorrectOnlydbo, String path, String filenameWithoutExtension)
Creates the multilingual train datasets. Three files are stored in the given location: QALD JSON, extended JSON and XML.
- Parameters:
datasets - The datasets the questions are taken from.
autocorrectOnlydbo - If true, a wrong onlydbo flag is autofixed; otherwise it is an exclusion criterion and the question will not appear in the file.
path - The path to write the datasets to.
filenameWithoutExtension - The name of the new dataset.
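The autocorrectOnlydbo choice can be expressed as a small decision function. In this sketch, OnlydboQuestion is a hypothetical record holding the stored flag and a value recomputed from the SPARQL query; a wrong flag is either overwritten (autocorrect) or causes the question to be dropped from the output. This illustrates the documented choice, not the tool's code.

```java
import java.util.Optional;

// Hypothetical record: the onlydbo flag stored in the dataset and the value
// recomputed from the SPARQL query (for example by a check like checkIsOnlydbo).
record OnlydboQuestion(String id, boolean storedOnlydbo, boolean computedOnlydbo) {}

final class OnlydboPolicySketch {

    // Returns the question to write, or Optional.empty() if it must be left out.
    static Optional<OnlydboQuestion> apply(OnlydboQuestion q, boolean autocorrectOnlydbo) {
        if (q.storedOnlydbo() == q.computedOnlydbo()) {
            return Optional.of(q); // flag is consistent with the query: keep unchanged
        }
        if (autocorrectOnlydbo) {
            // Autofix: overwrite the stored flag with the recomputed value.
            return Optional.of(new OnlydboQuestion(q.id(), q.computedOnlydbo(), q.computedOnlydbo()));
        }
        return Optional.empty(); // bad flag is an exclusion criterion
    }

    public static void main(String[] args) {
        OnlydboQuestion bad = new OnlydboQuestion("42", true, false);
        System.out.println(apply(bad, true));  // kept, with the flag fixed
        System.out.println(apply(bad, false)); // Optional.empty
    }
}
```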
-
getQald7MultilingualTrainQuestions
public Set<Qald7Question> getQald7MultilingualTrainQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo)
Loads all questions from the given datasets and checks question integrity (is the stored answer set still identical to the one returned for the given SPARQL query, is a SPARQL query present and parseable, are at least six languages available with keywords, is an answer type set, ...). Duplicates are also filtered out: only the candidate with the fewest error flags (Fail) ends up in the returned set, so all returned questions are clean. To get a duplicate-free dataset annotated with Fail flags, use loadAndAnnotateTrain(Set, boolean).
- Parameters:
datasets - The datasets from which the questions are gathered.
autocorrectOnlydbo - If true, a wrong onlydbo flag is autofixed; otherwise it is an exclusion criterion and the question will not appear in the result.
- Returns:
- All clean, duplicate-free questions from the given datasets.
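The integrity checks listed above can be sketched as a function that collects every failed check into a set of flags, so a question is clean exactly when the set is empty. Problem and CheckedQuestion are hypothetical stand-ins for the real Fail enum and Qald7Question, whose members are not listed on this page. The six-language threshold comes from the description; the parseability check is omitted because it would need a SPARQL parser.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical problem flags standing in for the real Fail enum.
enum Problem { NO_SPARQL, NO_ANSWERTYPE, TOO_FEW_LANGUAGES, MISSING_KEYWORDS, ANSWERS_CHANGED }

// Hypothetical question shape: keywords are stored per language code.
record CheckedQuestion(String sparql, String answertype,
                       Map<String, String> keywordsByLanguage, Set<String> storedAnswers) {}

final class IntegrityCheckSketch {

    static final int MIN_LANGUAGES = 6;

    // Collects every failed check; an empty result means the question is clean.
    // Parseability of the SPARQL query is not checked here.
    static Set<Problem> annotate(CheckedQuestion q, Set<String> serverAnswers) {
        Set<Problem> problems = EnumSet.noneOf(Problem.class);
        if (q.sparql() == null || q.sparql().isBlank()) {
            problems.add(Problem.NO_SPARQL);
        }
        if (q.answertype() == null || q.answertype().isBlank()) {
            problems.add(Problem.NO_ANSWERTYPE);
        }
        if (q.keywordsByLanguage().size() < MIN_LANGUAGES) {
            problems.add(Problem.TOO_FEW_LANGUAGES);
        }
        if (q.keywordsByLanguage().values().stream().anyMatch(k -> k == null || k.isBlank())) {
            problems.add(Problem.MISSING_KEYWORDS);
        }
        // Set equality ignores order, so only a changed answer set is flagged.
        if (!q.storedAnswers().equals(serverAnswers)) {
            problems.add(Problem.ANSWERS_CHANGED);
        }
        return problems;
    }

    public static void main(String[] args) {
        CheckedQuestion q = new CheckedQuestion("SELECT ?x WHERE { ?x ?p ?o }", "resource",
                Map.of("en", "river", "de", "Fluss"), Set.of("a"));
        System.out.println(annotate(q, Set.of("a"))); // [TOO_FEW_LANGUAGES]
    }
}
```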
-
loadAndAnnotateTrain
-
checkSparqlPresent
-
checkAnswertypeSet
-
checkKeywordsPresent
-
checkAtleastSixLanguages
-
getAnswersFromServer
Returns answers from the official DBpedia endpoint to the SPARQL query stored in an IQuestion.
- Parameters:
q - Question to be answered.
- Returns:
- Answers as a set of strings.
- Throws:
ExecutionException
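Because getAnswersFromServer declares ExecutionException, the query presumably runs on a worker thread. The sketch below shows that pattern with a standard ExecutorService and an order-insensitive comparison of stored and fetched answer sets. The endpoint function is a hypothetical stand-in for the real SPARQL client (ThreadedSPARQL), whose API is not shown here.

```java
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class AnswerFetchSketch {

    // Runs the (hypothetical) endpoint call on the pool and waits for the result.
    // Future.get() wraps any exception thrown by the endpoint in an ExecutionException.
    static Set<String> fetchAnswers(ExecutorService pool, Function<String, Set<String>> endpoint,
                                    String sparql) throws ExecutionException, InterruptedException {
        Future<Set<String>> future = pool.submit(() -> endpoint.apply(sparql));
        return future.get();
    }

    // Order-insensitive comparison of the stored answers with the fetched ones.
    static boolean answersUnchanged(Set<String> stored, Set<String> fromServer) {
        return stored.equals(fromServer);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Fake endpoint returning a fixed answer set instead of querying DBpedia.
            Function<String, Set<String>> fake = sparql -> Set.of("http://dbpedia.org/resource/Leipzig");
            Set<String> fromServer = fetchAnswers(pool, fake,
                    "SELECT ?city WHERE { ?city a <http://dbpedia.org/ontology/City> }");
            System.out.println(answersUnchanged(Set.of("http://dbpedia.org/resource/Leipzig"), fromServer));
        } finally {
            pool.shutdown();
        }
    }
}
```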
-
addSave
-
findAndSelectBestDuplicate
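This entry has no description. Going by getQald7MultilingualTrainQuestions, which keeps only the duplicate candidate with the fewest error flags, the selection can be sketched as follows. DuplicateCandidate is hypothetical, the tie-breaking rule (earliest candidate wins) is an assumption of the sketch, and the real method returns a Set<Qald7Question> rather than a single candidate.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical: one duplicate candidate and the names of the checks it failed.
record DuplicateCandidate(String datasetName, Set<String> fails) {}

final class BestDuplicateSketch {

    // Picks the candidate with the fewest error flags. Only a strictly smaller count
    // replaces the current best, so on a tie the earlier candidate in the list wins.
    static Optional<DuplicateCandidate> selectBest(List<DuplicateCandidate> duplicates) {
        DuplicateCandidate best = null;
        for (DuplicateCandidate c : duplicates) {
            if (best == null || c.fails().size() < best.fails().size()) {
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<DuplicateCandidate> dups = List.of(
                new DuplicateCandidate("A", Set.of("ANSWERS_CHANGED", "NO_ANSWERTYPE")),
                new DuplicateCandidate("B", Set.of()),
                new DuplicateCandidate("C", Set.of()));
        System.out.println(selectBest(dups).map(DuplicateCandidate::datasetName).orElse("none")); // B
    }
}
```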
-
extractGoodTrainQuestionsFromAnnotated
-
extractBadQuestionsFromAnnotated
private Set<Qald7Question> extractBadQuestionsFromAnnotated(Set<Qald7Question> questions, Set<Fail> ignoreFlags)
createQald7Dataset
private void createQald7Dataset(Set<Qald7Question> allQuestions, String path, String filenameWithoutExtension)
createFileReportForTestQuestions
public void createFileReportForTestQuestions(Set<Dataset> datasets, boolean autocorrectOnlydbo, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags)
Creates a file report of all bad questions in the given datasets.
- Parameters:
datasets - All datasets to be checked.
autocorrectOnlydbo - If true, a wrong onlydbo flag is autofixed; otherwise it is an exclusion criterion for the question.
pathAndFilenameWithExtension - Path and name of the new file report.
skipQuestionsWithTooLittleLanguages - Normally, multilingual datasets have at least six languages. When this flag is set, all questions with fewer languages are ignored; otherwise, having fewer languages counts as an error (Fail) and the question goes into the report.
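A minimal sketch of how ignored flags could decide which questions go into the report: a question counts as bad if any flag remains after removing the ignored ones. ReportFlag and AnnotatedQuestion are hypothetical stand-ins for Fail and Qald7Question, and the rule itself is an assumption based on the ignoreFlags parameter in the signatures above.

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical flags standing in for the real Fail enum.
enum ReportFlag { NO_SPARQL, NO_ANSWERTYPE, TOO_FEW_LANGUAGES, ANSWERS_CHANGED }

// Hypothetical annotated question: an id plus the flags its checks produced.
record AnnotatedQuestion(String id, Set<ReportFlag> fails) {}

final class BadQuestionFilterSketch {

    // A question is "bad" if at least one flag remains after removing the ignored ones.
    static Set<AnnotatedQuestion> extractBad(Set<AnnotatedQuestion> questions, Set<ReportFlag> ignoreFlags) {
        Set<AnnotatedQuestion> bad = new LinkedHashSet<>();
        for (AnnotatedQuestion q : questions) {
            Set<ReportFlag> remaining = EnumSet.noneOf(ReportFlag.class);
            remaining.addAll(q.fails());
            remaining.removeAll(ignoreFlags); // ignored flags do not count as errors
            if (!remaining.isEmpty()) {
                bad.add(q);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Set<AnnotatedQuestion> all = Set.of(
                new AnnotatedQuestion("1", Set.of(ReportFlag.TOO_FEW_LANGUAGES)),
                new AnnotatedQuestion("2", Set.of(ReportFlag.NO_SPARQL)),
                new AnnotatedQuestion("3", Set.of()));
        // Ignoring TOO_FEW_LANGUAGES mirrors "skip questions with too few languages".
        System.out.println(extractBad(all, EnumSet.of(ReportFlag.TOO_FEW_LANGUAGES)));
    }
}
```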
-
createFileReport
public void createFileReport(Set<Qald7Question> allQuestions, String pathAndFilenameWithExtension, Set<Fail> ignoreFlags)
checkIsOnlydbo
private boolean checkIsOnlydbo(String sparqlQuery) throws org.apache.jena.query.QueryParseException
- Throws:
org.apache.jena.query.QueryParseException
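The real method declares Jena's QueryParseException, so it presumably parses the query. The sketch below swaps in a much simpler technique, a regular-expression scan over full IRIs, to illustrate the idea of an onlydbo check. The namespace values and the rule that every IRI must be in the ontology or resource namespace are assumptions, not the tool's documented behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OnlydboCheckSketch {

    // Assumed namespace values; the actual DBO_URI and RES_URI constants are not shown on this page.
    static final String DBO_URI = "http://dbpedia.org/ontology/";
    static final String RES_URI = "http://dbpedia.org/resource/";

    // Matches full IRIs written as <...> with no whitespace inside.
    private static final Pattern IRI = Pattern.compile("<([^<>\\s]+)>");

    // Regex approximation, not a SPARQL parser: true if every full IRI in the query lies
    // in the ontology or resource namespace. Prefixed names such as dbp:x are not expanded,
    // and comparisons written without spaces (e.g. ?a<?b&&?c>?d) can be misread as an IRI.
    static boolean isOnlydbo(String sparqlQuery) {
        Matcher m = IRI.matcher(sparqlQuery);
        while (m.find()) {
            String iri = m.group(1);
            if (!iri.startsWith(DBO_URI) && !iri.startsWith(RES_URI)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String dbo = "SELECT ?c WHERE { <http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/ontology/country> ?c }";
        String dbp = "SELECT ?c WHERE { <http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/property/country> ?c }";
        System.out.println(isOnlydbo(dbo) + " " + isOnlydbo(dbp)); // true false
    }
}
```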
-
destroy
public void destroy()
Call this when you no longer need this object. Closes the threads around the connection to the SPARQL server.
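The lifecycle that destroy() implies can be sketched with a standard ExecutorService: shut the pool down, wait briefly, then force termination. ThreadOwningTool is hypothetical, since ThreadedSPARQL's internals are not documented here. Calling destroy() in a finally block releases the threads even when an exception is thrown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an object that owns worker threads for SPARQL requests.
final class ThreadOwningTool {

    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    boolean isTerminated() {
        return workers.isTerminated();
    }

    // Stops accepting work, waits briefly for running requests, then forces shutdown.
    // Without this, the pool's non-daemon threads can keep the JVM from exiting.
    void destroy() {
        workers.shutdown();
        try {
            if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                workers.shutdownNow();
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        ThreadOwningTool tool = new ThreadOwningTool();
        try {
            // Use the tool here.
        } finally {
            tool.destroy(); // always release the threads, even if an exception was thrown
        }
        System.out.println("terminated: " + tool.isTerminated());
    }
}
```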
main
-