public class MiningAssistant
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
protected boolean |
allowConstants
Allow constants for refinements
|
protected boolean |
avoidUnboundTypeAtoms
If true, the assistant will never add atoms of the form type(x, y), i.e., it will always bind
the second argument to a type.
|
protected java.util.Collection<javatools.datatypes.ByteString> |
bodyExcludedRelations
List of excluded relations for the body of rules.
|
protected java.util.Collection<javatools.datatypes.ByteString> |
bodyTargetRelations
List of target relations for the body of rules.
|
protected ConfidenceMetric |
confidenceMetric
Confidence metric used to assess the quality of rules.
|
protected boolean |
countAlwaysOnSubject
Count directly on subject or use functional information
|
protected boolean |
enabledConfidenceUpperBounds
Enable confidence and PCA confidence upper bounds for pruning when given a confidence threshold
|
protected boolean |
enabledFunctionalityHeuristic
Use a functionality vs. suggested functionality heuristic to prune low-confidence rules upfront.
|
protected boolean |
enablePerfectRules
Enable perfect rule pruning, i.e., do not further specialize rules with PCA confidence
1.0.
|
protected boolean |
enableQueryRewriting
Enable query rewriting to optimize runtime.
|
protected boolean |
enforceConstants
Enforce constants in all atoms of rules
|
protected boolean |
exploitMaxLengthOption
If false, the assistant will not exploit the maximum length restriction to improve
runtime.
|
protected java.util.HashMap<java.lang.String,java.lang.Double> |
headCardinalities
Contains the number of triples per relation in the database
|
protected java.util.Collection<javatools.datatypes.ByteString> |
headExcludedRelations
List of excluded relations for the head of rules.
|
protected KB |
kb
Factory object to instantiate query components
|
protected KB |
kbSchema
Exclusively used for schema information, such as subclass and sub-property
relations or relation signatures.
|
protected int |
maxDepth
Maximum number of atoms in a query
|
protected double |
minPcaConfidence
Minimum PCA confidence
|
protected double |
minStdConfidence
Minimum standard confidence
|
protected int |
recursivityLimit
Maximum number of times a relation can appear in
a rule.
|
protected javatools.datatypes.ByteString |
subPropertyString
Subproperty keyword
|
protected long |
totalObjectCount
Number of different objects in the underlying dataset
|
protected long |
totalSubjectCount
Number of different subjects in the underlying dataset
|
protected javatools.datatypes.ByteString |
typeString
Type keyword
|
protected boolean |
verbose
If true, the assistant will output minimal debug information
|
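The maxDepth and recursivityLimit fields above both bound the shape of a rule. The following is a minimal, self-contained sketch of what such checks could look like. It is not the class's actual implementation: the class name LengthLimitsSketch, its method names, and the representation of a rule as a plain list of relation names are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a rule is modelled only as the list of relation
// names of its atoms. The real Rule class is not used here.
final class LengthLimitsSketch {

    // True if the rule has at most maxDepth atoms.
    static boolean withinMaxDepth(List<String> atomRelations, int maxDepth) {
        return atomRelations.size() <= maxDepth;
    }

    // True if no relation occurs more than recursivityLimit times in the rule.
    static boolean withinRecursivityLimit(List<String> atomRelations, int recursivityLimit) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String relation : atomRelations) {
            // merge returns the updated count for this relation
            int count = occurrences.merge(relation, 1, Integer::sum);
            if (count > recursivityLimit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> rule = Arrays.asList("hasAncestor", "hasParent", "hasAncestor");
        System.out.println(withinMaxDepth(rule, 3));         // true
        System.out.println(withinRecursivityLimit(rule, 1)); // false
        System.out.println(withinRecursivityLimit(rule, 2)); // true
    }
}
```

Whether the head atom counts toward either limit in the real class is not stated in this page; the sketch counts every atom it is given.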
| Constructor and Description |
|---|
MiningAssistant(KB dataSource) |
| Modifier and Type | Method and Description |
|---|---|
protected void |
buildInitialQueries(javatools.datatypes.IntHashMap<javatools.datatypes.ByteString> relations,
double minSupportThreshold,
java.util.Collection<Rule> output)
Given a list of relations with their corresponding support (one assistant could count based on the number of pairs,
another could use the number of subjects), it adds one rule per relation to the output.
|
protected boolean |
calculateConfidenceApproximationFor3Atoms(Rule candidate)
Calculate the confidence approximation of the query for the case when the rule has exactly 3 atoms.
|
protected boolean |
calculateConfidenceApproximationForGeneralCase(Rule candidate)
Given a rule with more than 3 atoms and a single path connecting the head variables,
it computes a confidence approximation.
|
boolean |
calculateConfidenceBoundsAndApproximations(Rule candidate)
It computes the confidence upper bounds and approximations for the rule sent as argument.
|
void |
calculateConfidenceMetrics(Rule candidate)
It computes the standard and the PCA confidence of a given rule.
|
protected boolean |
canAddInstantiatedAtoms()
Returns true if the assistant configuration allows the addition of instantiated atoms, i.e., atoms
where one of the arguments is a constant.
|
double |
computeCardinality(Rule rule)
It computes the number of positive examples (cardinality) of the given rule
based on the evidence in the database.
|
double |
computePCAConfidence(Rule rule)
It computes the PCA confidence of the given rule based on the evidence in the database.
|
double |
computeStandardConfidence(Rule candidate)
It computes the standard confidence of the given rule based on the evidence in the database.
|
protected int |
findCountingVariable(javatools.datatypes.ByteString[] headAtom)
It determines the counting variable of an atom with constant relation based on
the functionality of the relation.
|
java.util.Collection<javatools.datatypes.ByteString> |
getBodyExcludedRelations() |
java.util.Collection<javatools.datatypes.ByteString> |
getBodyTargetRelations() |
void |
getClosingAtoms(Rule rule,
double minSupportThreshold,
java.util.Collection<Rule> output)
Returns all rule candidates obtained by adding a new atom that does not contain
fresh variables.
|
ConfidenceMetric |
getConfidenceMetric() |
void |
getDanglingAtoms(Rule rule,
double minSupportThreshold,
java.util.Collection<Rule> output)
Returns all candidates obtained by adding a new dangling atom to the query.
|
java.lang.String |
getDescription()
Brief description of the MiningAssistant capabilities.
|
long |
getFactsCount() |
long |
getHeadCardinality(Rule query) |
java.util.Collection<javatools.datatypes.ByteString> |
getHeadExcludedRelations() |
void |
getInitialAtoms(double minSupportThreshold,
java.util.Collection<Rule> output)
Returns a list of one-atom queries using the relations from the KB
|
void |
getInitialAtomsFromSeeds(java.util.Collection<javatools.datatypes.ByteString> relations,
double minSupportThreshold,
java.util.Collection<Rule> output)
Returns a list of one-atom queries using the head relations provided in the collection relations.
|
void |
getInstantiatedAtoms(Rule rule,
double minSupportThreshold,
java.util.Collection<Rule> danglingEdges,
java.util.Collection<Rule> output)
Returns all candidates obtained by instantiating the dangling variable of the last added
triple pattern in the rule
|
protected void |
getInstantiatedAtoms(Rule queryWithDanglingEdge,
Rule parentQuery,
int danglingAtomPosition,
int danglingPositionInEdge,
double minSupportThreshold,
java.util.Collection<Rule> output)
It returns all the refinements of queryWithDanglingEdge where the fresh variable in the dangling
atom has been bound to all the constants that keep the query above the support threshold.
|
KB |
getKb()
It returns the training dataset from which rule atoms are drawn.
|
KB |
getKbSchema()
It returns the KB containing the schema information (subclass and subproperty relationships,
domains and ranges for relations, etc.) about the training dataset.
|
int |
getMaxDepth() |
double |
getMinConfidence() |
double |
getPcaConfidenceThreshold() |
int |
getRecursivityLimit() |
double |
getRelationCardinality(javatools.datatypes.ByteString relation) |
double |
getRelationCardinality(java.lang.String relation) |
protected java.util.Set<javatools.datatypes.ByteString> |
getSubClasses(javatools.datatypes.ByteString className) |
long |
getTotalCount(int projVarPosition) |
long |
getTotalCount(Rule candidate) |
long |
getTotalObjectCount() |
long |
getTotalSubjectCount()
Returns the total number of subjects in the database.
|
boolean |
isAvoidUnboundTypeAtoms() |
boolean |
isEnabledConfidenceUpperBounds() |
boolean |
isEnabledFunctionalityHeuristic() |
boolean |
isEnablePerfectRules() |
boolean |
isEnableQueryRewriting() |
boolean |
isEnforceConstants() |
boolean |
isExploitMaxLengthOption() |
boolean |
isVerbose() |
boolean |
registerHeadRelation(Rule query) |
void |
setAllowConstants(boolean allowConstants) |
void |
setAvoidUnboundTypeAtoms(boolean avoidUnboundTypeAtoms) |
void |
setBodyExcludedRelations(java.util.Collection<javatools.datatypes.ByteString> excludedRelations) |
void |
setConfidenceMetric(ConfidenceMetric confidenceMetric) |
void |
setCountAlwaysOnSubject(boolean countAlwaysOnSubject) |
void |
setEnabledConfidenceUpperBounds(boolean enabledConfidenceUpperBounds) |
void |
setEnabledFunctionalityHeuristic(boolean enableOptimizations) |
void |
setEnablePerfectRules(boolean enablePerfectRules) |
void |
setEnableQueryRewriting(boolean enableQueryRewriting) |
void |
setEnforceConstants(boolean enforceConstants) |
void |
setExploitMaxLengthOption(boolean exploitMaxLengthOption) |
void |
setHeadExcludedRelations(java.util.Collection<javatools.datatypes.ByteString> headExcludedRelations) |
void |
setKbSchema(KB schemaSource) |
void |
setMaxDepth(int maxAntecedentDepth) |
void |
setPcaConfidenceThreshold(double minConfidence) |
void |
setRecursivityLimit(int recursivityLimit) |
void |
setStdConfidenceThreshold(double minConfidence) |
void |
setTargetBodyRelations(java.util.Collection<javatools.datatypes.ByteString> bodyTargetRelations) |
void |
setVerbose(boolean silent) |
boolean |
testConfidenceThresholds(Rule candidate)
It checks whether a rule satisfies the confidence thresholds and the
skyline heuristic: the strategy that avoids outputting rules that do not
improve the confidence with respect to their parents.
|
protected boolean |
testLength(Rule candidate)
Check whether the rule meets the length criteria configured in the object.
|
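The description of testConfidenceThresholds above combines two conditions: the rule must reach the configured confidence thresholds, and it must improve on the confidence of its parent rules (the skyline heuristic). A minimal sketch of that combination follows. The class SkylineSketch and its methods are hypothetical, and three details are assumptions rather than documented behaviour: that thresholds are inclusive, that "improve" means strictly greater, and that the comparison against parents uses the PCA confidence.

```java
// Hypothetical illustration of a threshold test combined with a skyline check.
final class SkylineSketch {

    // Assumption: a rule passes when both confidences reach their minimum.
    static boolean meetsThresholds(double stdConfidence, double pcaConfidence,
                                   double minStdConfidence, double minPcaConfidence) {
        return stdConfidence >= minStdConfidence && pcaConfidence >= minPcaConfidence;
    }

    // Assumption: a rule improves on its parents when its confidence is
    // strictly greater than the confidence of every parent.
    static boolean improvesOnParents(double confidence, double[] parentConfidences) {
        for (double parent : parentConfidences) {
            if (confidence <= parent) {
                return false;
            }
        }
        return true;
    }

    // A rule is output only if it passes both tests.
    static boolean shouldOutput(double stdConfidence, double pcaConfidence,
                                double minStdConfidence, double minPcaConfidence,
                                double[] parentPcaConfidences) {
        return meetsThresholds(stdConfidence, pcaConfidence, minStdConfidence, minPcaConfidence)
                && improvesOnParents(pcaConfidence, parentPcaConfidences);
    }

    public static void main(String[] args) {
        // Better than both parents and above both thresholds.
        System.out.println(shouldOutput(0.4, 0.8, 0.1, 0.5, new double[] {0.6, 0.7})); // true
        // A parent already has higher PCA confidence.
        System.out.println(shouldOutput(0.4, 0.8, 0.1, 0.5, new double[] {0.9}));      // false
        // Standard confidence is below its threshold.
        System.out.println(shouldOutput(0.05, 0.8, 0.1, 0.5, new double[] {}));        // false
    }
}
```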
protected int recursivityLimit
protected KB kb
protected KB kbSchema
protected long totalObjectCount
protected long totalSubjectCount
protected javatools.datatypes.ByteString typeString
protected javatools.datatypes.ByteString subPropertyString
protected double minStdConfidence
protected double minPcaConfidence
protected int maxDepth
protected java.util.HashMap<java.lang.String,java.lang.Double> headCardinalities
protected boolean allowConstants
protected boolean enforceConstants
protected java.util.Collection<javatools.datatypes.ByteString> bodyExcludedRelations
protected java.util.Collection<javatools.datatypes.ByteString> headExcludedRelations
protected java.util.Collection<javatools.datatypes.ByteString> bodyTargetRelations
protected boolean countAlwaysOnSubject
protected boolean enabledFunctionalityHeuristic
protected boolean enabledConfidenceUpperBounds
protected boolean verbose
protected boolean avoidUnboundTypeAtoms
protected boolean exploitMaxLengthOption
protected boolean enableQueryRewriting
protected boolean enablePerfectRules
protected ConfidenceMetric confidenceMetric
public MiningAssistant(KB dataSource)
dataSource -
public int getRecursivityLimit()
public void setRecursivityLimit(int recursivityLimit)
public long getTotalCount(Rule candidate)
public long getTotalSubjectCount()
public long getTotalObjectCount()
public int getMaxDepth()
public void setMaxDepth(int maxAntecedentDepth)
maxAntecedentDepth -
public double getMinConfidence()
public double getPcaConfidenceThreshold()
public void setPcaConfidenceThreshold(double minConfidence)
minConfidence - the minPcaConfidence to set
public void setStdConfidenceThreshold(double minConfidence)
minConfidence - the minConfidence to set
public KB getKb()
public KB getKbSchema()
public java.lang.String getDescription()
public void setKbSchema(KB schemaSource)
public boolean registerHeadRelation(Rule query)
public long getHeadCardinality(Rule query)
public double getRelationCardinality(java.lang.String relation)
public double getRelationCardinality(javatools.datatypes.ByteString relation)
protected java.util.Set<javatools.datatypes.ByteString> getSubClasses(javatools.datatypes.ByteString className)
protected boolean canAddInstantiatedAtoms()
public void getInitialAtomsFromSeeds(java.util.Collection<javatools.datatypes.ByteString> relations,
double minSupportThreshold,
java.util.Collection<Rule> output)
relations -
minSupportThreshold - Only relations of size greater than or equal to this value will be considered.
output - The results of the method are added directly to this collection.
public void getInitialAtoms(double minSupportThreshold,
java.util.Collection<Rule> output)
minSupportThreshold - Only relations of size greater than or equal to this value will be considered.
output -
protected void buildInitialQueries(javatools.datatypes.IntHashMap<javatools.datatypes.ByteString> relations,
double minSupportThreshold,
java.util.Collection<Rule> output)
relations -
minSupportThreshold - Only relations with support equal to or above this value are considered.
output -
public void getDanglingAtoms(Rule rule, double minSupportThreshold, java.util.Collection<Rule> output)
rule -
minSupportThreshold -
output -
protected int findCountingVariable(javatools.datatypes.ByteString[] headAtom)
headAtom -
public void calculateConfidenceMetrics(Rule candidate)
candidate -
public void getClosingAtoms(Rule rule, double minSupportThreshold, java.util.Collection<Rule> output)
rule -
minSupportThreshold - Only candidates with support above or equal to this value are returned.
output -
public void getInstantiatedAtoms(Rule rule, double minSupportThreshold, java.util.Collection<Rule> danglingEdges, java.util.Collection<Rule> output)
rule -
minSupportThreshold -
danglingEdges -
output -
protected boolean testLength(Rule candidate)
candidate -
public boolean calculateConfidenceBoundsAndApproximations(Rule candidate)
candidate -
protected boolean calculateConfidenceApproximationForGeneralCase(Rule candidate)
candidate -
protected boolean calculateConfidenceApproximationFor3Atoms(Rule candidate)
candidate -
public boolean testConfidenceThresholds(Rule candidate)
candidate -
protected void getInstantiatedAtoms(Rule queryWithDanglingEdge, Rule parentQuery, int danglingAtomPosition, int danglingPositionInEdge, double minSupportThreshold, java.util.Collection<Rule> output)
queryWithDanglingEdge -
parentQuery -
danglingAtomPosition -
danglingPositionInEdge -
minSupportThreshold -
output -
public double computeCardinality(Rule rule)
rule -
public double computePCAConfidence(Rule rule)
rule -
public double computeStandardConfidence(Rule candidate)
candidate -
public void setAllowConstants(boolean allowConstants)
public boolean isEnforceConstants()
public void setEnforceConstants(boolean enforceConstants)
public java.util.Collection<javatools.datatypes.ByteString> getBodyExcludedRelations()
public void setBodyExcludedRelations(java.util.Collection<javatools.datatypes.ByteString> excludedRelations)
public java.util.Collection<javatools.datatypes.ByteString> getHeadExcludedRelations()
public void setHeadExcludedRelations(java.util.Collection<javatools.datatypes.ByteString> headExcludedRelations)
public java.util.Collection<javatools.datatypes.ByteString> getBodyTargetRelations()
public boolean isAvoidUnboundTypeAtoms()
public void setAvoidUnboundTypeAtoms(boolean avoidUnboundTypeAtoms)
public void setTargetBodyRelations(java.util.Collection<javatools.datatypes.ByteString> bodyTargetRelations)
public long getTotalCount(int projVarPosition)
public void setCountAlwaysOnSubject(boolean countAlwaysOnSubject)
public long getFactsCount()
public boolean isEnabledFunctionalityHeuristic()
public void setEnabledFunctionalityHeuristic(boolean enableOptimizations)
public boolean isEnabledConfidenceUpperBounds()
public void setEnabledConfidenceUpperBounds(boolean enabledConfidenceUpperBounds)
public boolean isVerbose()
public void setVerbose(boolean silent)
public boolean isExploitMaxLengthOption()
public void setExploitMaxLengthOption(boolean exploitMaxLengthOption)
public boolean isEnableQueryRewriting()
public void setEnableQueryRewriting(boolean enableQueryRewriting)
public boolean isEnablePerfectRules()
public void setEnablePerfectRules(boolean enablePerfectRules)
public ConfidenceMetric getConfidenceMetric()
public void setConfidenceMetric(ConfidenceMetric confidenceMetric)
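To make the difference between computeCardinality, computeStandardConfidence and computePCAConfidence more concrete, here is a small self-contained sketch that evaluates a two-atom rule of the form head(x, y) <= body(x, y) over in-memory facts. It follows the usual definitions from the rule mining literature: support is the number of pairs satisfying both body and head, standard confidence divides support by the number of body pairs, and PCA confidence divides it by the number of body pairs whose subject has at least one head fact. The class ConfidenceSketch, its methods, the representation of facts as {subject, object} arrays, and the choice to count on the subject are all assumptions for illustration. The sketch does not use the KB or Rule classes and does not claim to mirror their implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration for a rule head(x, y) <= body(x, y).
// Each fact is a two-element array {subject, object}.
final class ConfidenceSketch {

    private static String key(String[] fact) {
        return fact[0] + "\t" + fact[1];
    }

    private static Set<String> pairs(List<String[]> facts) {
        Set<String> result = new HashSet<>();
        for (String[] fact : facts) {
            result.add(key(fact));
        }
        return result;
    }

    // Number of distinct (x, y) pairs that satisfy both body and head.
    static long support(List<String[]> body, List<String[]> head) {
        Set<String> common = pairs(body);
        common.retainAll(pairs(head));
        return common.size();
    }

    // support / number of distinct body pairs (0.0 if the body is empty).
    static double standardConfidence(List<String[]> body, List<String[]> head) {
        long denominator = pairs(body).size();
        return denominator == 0 ? 0.0 : (double) support(body, head) / denominator;
    }

    // support / number of distinct body pairs whose subject has some head fact.
    static double pcaConfidence(List<String[]> body, List<String[]> head) {
        Set<String> headSubjects = new HashSet<>();
        for (String[] fact : head) {
            headSubjects.add(fact[0]);
        }
        Set<String> pcaBody = new HashSet<>();
        for (String[] fact : body) {
            if (headSubjects.contains(fact[0])) {
                pcaBody.add(key(fact));
            }
        }
        return pcaBody.isEmpty() ? 0.0 : (double) support(body, head) / pcaBody.size();
    }

    public static void main(String[] args) {
        // Toy data for the rule livesIn(x, y) <= wasBornIn(x, y).
        List<String[]> wasBornIn = Arrays.asList(
                new String[] {"anna", "paris"},
                new String[] {"bob", "rome"},
                new String[] {"carl", "oslo"});
        List<String[]> livesIn = Arrays.asList(
                new String[] {"anna", "paris"},
                new String[] {"bob", "milan"});
        System.out.println(support(wasBornIn, livesIn));            // 1
        System.out.println(standardConfidence(wasBornIn, livesIn)); // 1/3
        System.out.println(pcaConfidence(wasBornIn, livesIn));      // 0.5
    }
}
```

In the toy data, carl has no livesIn fact at all, so the PCA denominator ignores him while the standard denominator counts him as a counterexample. That is why the PCA confidence (1/2) is higher than the standard confidence (1/3).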