org.aksw.commons.jena.util
Class CommonProperties

java.lang.Object
  extended by org.aksw.commons.jena.util.CommonProperties

@Guarded
public class CommonProperties
extends Object

Author:
Konrad Höffner

Constructor Summary
CommonProperties()
           
 
Method Summary
static LinkedHashMap<String,Integer> getCommonProperties(String endpoint, String where, Double threshold, Integer maxResultSize, Integer sampleSize)
          For a given SPARQL where clause, creates a table of the most common properties of the matching instances (?s).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CommonProperties

public CommonProperties()
Method Detail

getCommonProperties

public static LinkedHashMap<String,Integer> getCommonProperties(@NotEmpty @NotNull
                                                                String endpoint,
                                                                @NotEmpty @NotNull
                                                                String where,
                                                                @Range(min=0.0,max=1.0)
                                                                Double threshold,
                                                                @Min(value=1.0)
                                                                Integer maxResultSize,
                                                                @Min(value=1.0)
                                                                Integer sampleSize)
For a given SPARQL where clause, creates a table of the most common properties of the matching instances (?s). A variant with a file cache is also available: CachedCommonProperties. The following example shows the 5 most common properties for the where clause "?s a dbpedia-owl:Settlement". Attention: the where clause may only use ?s, ?p and ?o as variable names, for subject, predicate and object respectively.
p (property)                                       count
http://www.w3.org/1999/02/22-rdf-syntax-ns#type    50
http://www.w3.org/2000/01/rdf-schema#label         50
http://xmlns.com/foaf/0.1/page                     50
http://www.w3.org/2000/01/rdf-schema#comment       49
http://purl.org/dc/terms/subject                   49
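To make the sampling behaviour concrete, the sketch below shows one plausible shape for a query that produces a table like the one above. It is an assumption, not the query the library actually sends: the class CommonPropertiesQuerySketch and its buildQuery method are hypothetical, and the threshold filter is left out because it needs the total number of instances. Wrapping the where clause in a subquery that projects only ?s also suggests one reason for the variable-name rule: the outer pattern reuses ?s, so the where clause has to bind the instances to that name.

```java
/**
 * Hypothetical sketch, NOT the library's actual query: builds one SPARQL query that
 * counts, for each property ?p, the distinct instances ?s that use it.
 */
final class CommonPropertiesQuerySketch {

    private CommonPropertiesQuerySketch() {}

    /**
     * @param where         graph pattern binding the instances to ?s,
     *                      e.g. "?s a dbpedia-owl:Settlement"
     * @param maxResultSize maximum number of properties to return (at least 1)
     * @param sampleSize    number of instances to examine, or null for all of them
     */
    static String buildQuery(String where, int maxResultSize, Integer sampleSize) {
        StringBuilder q = new StringBuilder();
        q.append("SELECT ?p (COUNT(DISTINCT ?s) AS ?count) WHERE {\n");
        // Subquery selecting the instances. Only ?s is projected, so any ?p/?o used
        // inside the where clause stay local to it. Without ORDER BY, the endpoint
        // decides which instances fall inside the LIMIT: the sample is not random.
        q.append("  { SELECT DISTINCT ?s WHERE { ").append(where).append(" }");
        if (sampleSize != null) {
            q.append(" LIMIT ").append(sampleSize);
        }
        q.append(" }\n");
        // Outer pattern: every property of every selected instance.
        q.append("  ?s ?p ?o .\n");
        q.append("}\n");
        // COUNT(DISTINCT ?s) counts a property at most once per instance.
        q.append("GROUP BY ?p\n");
        q.append("ORDER BY DESC(?count)\n");
        q.append("LIMIT ").append(maxResultSize);
        return q.toString();
    }
}
```

In this sketch, buildQuery("?s a dbpedia-owl:Settlement", 5, 50) would examine at most 50 settlements chosen by the endpoint and return the five properties used by the largest number of them.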

Parameters:
endpoint - the URL of the SPARQL endpoint to be queried
where - the contents of a SPARQL SELECT "where" clause; it may only use ?s, ?p and ?o as variable names, for subject, predicate and object respectively.
threshold - a value between 0 and 1 specifying the minimum fraction of instances that must use a property for it to count as a common property. Set to null for no restriction.
maxResultSize - a positive integer (at least 1) specifying the maximum number of properties to return.
sampleSize - the number of instances whose triples are examined. Set to null to examine all triples, which may take a long time. Note, however, that a sample may give a wrong result even when it is large, because it is not random: which instances are selected depends on the SPARQL server (the subqueries use Virtuoso SPARQL).
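The constraints on these parameters mirror the annotations on the method signature. As an illustration only, the hypothetical helper CommonPropertiesArgs.validate below spells them out as explicit checks. It assumes that null is acceptable wherever only @Range or @Min applies, which is the usual convention for such annotations and matches what the parameter descriptions say for threshold and sampleSize; the library presumably enforces the constraints through its annotation-based guard rather than through code like this.

```java
/**
 * Hypothetical sketch: explicit checks equivalent to the constraint annotations on
 * getCommonProperties (@NotEmpty, @NotNull, @Range, @Min). Not the library's code.
 */
final class CommonPropertiesArgs {

    private CommonPropertiesArgs() {}

    static void validate(String endpoint, String where, Double threshold,
                         Integer maxResultSize, Integer sampleSize) {
        requireNotEmpty(endpoint, "endpoint");
        requireNotEmpty(where, "where");
        // threshold is optional; when present it must be a fraction in [0, 1] (NaN is rejected)
        if (threshold != null && !(threshold >= 0.0 && threshold <= 1.0)) {
            throw new IllegalArgumentException("threshold must be in [0, 1]: " + threshold);
        }
        requireAtLeastOne(maxResultSize, "maxResultSize");
        requireAtLeastOne(sampleSize, "sampleSize");
    }

    // @NotNull plus @NotEmpty: neither null nor "" is accepted.
    private static void requireNotEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must be non-null and non-empty");
        }
    }

    // Assumption: null passes, because @Min alone does not forbid it
    // (and a null sampleSize means "examine all triples").
    private static void requireAtLeastOne(Integer value, String name) {
        if (value != null && value < 1) {
            throw new IllegalArgumentException(name + " must be at least 1: " + value);
        }
    }
}
```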
Returns:
the most common properties, sorted by number of occurrences in descending order. Each property p is counted at most once per instance s, even if there are multiple triples (s,p,o). Example: with threshold = 0.5, only properties used by at least half of the examined instances are returned.
See Also:
CachedCommonProperties
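The counting rule described under "Returns" can be illustrated in memory. The hypothetical CommonPropertiesCounter below takes the triples of the examined instances as String[] {s, p, o} arrays (an assumption made for brevity), counts each property once per distinct subject, drops properties used by fewer than threshold × (number of instances) subjects, and keeps the maxResultSize most frequent ones in a LinkedHashMap so that iteration order follows the ranking. It is a sketch of the documented semantics, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of the documented counting rule. */
final class CommonPropertiesCounter {

    private CommonPropertiesCounter() {}

    static LinkedHashMap<String, Integer> count(List<String[]> triples, Double threshold,
                                                int maxResultSize) {
        Set<String> instances = new HashSet<>();
        // property -> distinct subjects using it, so repeated (s,p,o) triples count once
        Map<String, Set<String>> subjectsByProperty = new HashMap<>();
        for (String[] t : triples) {
            instances.add(t[0]);
            subjectsByProperty.computeIfAbsent(t[1], k -> new HashSet<>()).add(t[0]);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : subjectsByProperty.entrySet()) {
            int n = e.getValue().size();
            // Keep the property only if at least threshold * |instances| subjects use it.
            if (threshold == null || n >= threshold * instances.size()) {
                entries.add(Map.entry(e.getKey(), n));
            }
        }
        // Most common first; ties broken alphabetically so the order is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));

        // LinkedHashMap preserves insertion order, i.e. the ranking computed above.
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() >= maxResultSize) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

Ties are broken alphabetically here only to keep the output deterministic; the documentation does not say how properties with equal counts are ordered.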


Copyright © 2012. All Rights Reserved.