A Practical Guide to Understanding Ontology Structure and Data Patterns in the Wild
DBpedia remains one of the richest publicly available Knowledge Graphs derived from Wikipedia content. Its structure gives a unique window into the shape of real-world data on the Web: entity types, properties, hierarchies, and semantic relationships.
This article explores DBpedia using a sequence of SPARQL queries, each designed to highlight a specific pattern or semantic capability. Every query includes:
- A clickable link to run it directly against DBpedia.
- An explanation of what the query uncovers and why it matters.
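A run link of this kind is a GET request that carries the query text in a URL parameter. The sketch below shows one way to build such a link for the public DBpedia endpoint. The helper name `build_run_url` is hypothetical, and the `format` parameter is an assumption about what the endpoint accepts as a result-format hint; only `query` comes from the standard SPARQL Protocol.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_run_url(query: str, result_format: str = "text/html") -> str:
    """Return a GET URL that submits `query` to the endpoint.

    `query` is the standard SPARQL Protocol parameter. `format` is an
    assumption: a result-format hint the endpoint is expected to honour.
    """
    params = {"query": query, "format": result_format}
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


example_query = "SELECT ?entityType WHERE { ?entity a ?entityType } LIMIT 5"
example_url = build_run_url(example_query)
print(example_url)
```

Pasting any query from this article into `build_run_url` should give a link that opens the result page in a browser, provided the endpoint accepts the assumed parameters.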
1. Entity Types and Representative Instances
This query lists classes (types) in DBpedia, shows a sample instance for each, and counts how many entities belong to that type.
Query
SELECT ?entityType (SAMPLE(?entity) AS ?sampleEntity) (COUNT(*) AS ?count)
WHERE {
?entity a ?entityType .
}
GROUP BY ?entityType
ORDER BY DESC(?count)
Run it
Why It’s Useful
This is the fastest way to understand what types DBpedia actually contains and how many instances each type has.
It highlights:
- Dominant classes (e.g., “Person”, “Place”, “Work”)
- Niche or sparsely populated types
- Unexpected type proliferation due to Wikipedia infobox diversity
It also provides a compact sanity check before doing deeper ontology or property exploration.
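To make the aggregation concrete, here is a minimal Python sketch of what GROUP BY, SAMPLE, and COUNT do in this query, run on a handful of made-up (entity, type) pairs. The `ex:` names and the function `summarize_types` are hypothetical; SAMPLE may return any member of a group, and this sketch simply takes the first one seen.

```python
from collections import defaultdict

# Hypothetical (entity, type) pairs standing in for `?entity a ?entityType`.
TYPE_PAIRS = [
    ("ex:Ada_Lovelace", "ex:Person"),
    ("ex:Alan_Turing", "ex:Person"),
    ("ex:Berlin", "ex:Place"),
    ("ex:Paris", "ex:Place"),
    ("ex:Grace_Hopper", "ex:Person"),
    ("ex:Hamlet", "ex:Work"),
]


def summarize_types(pairs):
    """Group entities by type, keep one sample, and count the members."""
    members = defaultdict(list)
    for entity, entity_type in pairs:
        members[entity_type].append(entity)  # GROUP BY ?entityType
    rows = [(t, ents[0], len(ents)) for t, ents in members.items()]
    # ORDER BY DESC(?count)
    return sorted(rows, key=lambda row: row[2], reverse=True)


type_summary = summarize_types(TYPE_PAIRS)
for row in type_summary:
    print(row)
```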
2. SubProperty/SuperProperty Exploration (Random Representative Start Points)
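This query ranks super-properties by how many sub-property declarations point at them, keeps the top five, selects one representative sub-property for each with SAMPLE, and matches each pair against the transitive rdfs:subPropertyOf* path. Because both ends of the path are already bound by the inner query, the final pattern confirms the link rather than enumerating the whole hierarchy; the queries in sections 3 to 5 leave the sub-property end unbound to list descendants.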
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE {
  {
    SELECT ?superProperty ?subProperty
    WHERE {
      {
        SELECT (?c AS ?superProperty) (SAMPLE(?a) AS ?subProperty) (COUNT(*) AS ?usageCount)
        WHERE {
          ?a rdfs:subPropertyOf ?c .
          ?a a ?type .
          FILTER (?type IN (owl:ObjectProperty, rdf:Property))
        }
        GROUP BY ?c
        ORDER BY DESC(?usageCount)
        LIMIT 5
      }
    }
  }
  ?subProperty rdfs:subPropertyOf* ?superProperty .
}
LIMIT 100
Run it
Why It’s Useful
This demonstrates:
- How DBpedia’s property hierarchy is structured
- Which super-properties collect the most sub-properties
- How transitive closure (*) reveals inherited meaning
This is particularly useful when mapping DBpedia’s ontology to external ontologies or evaluating property alignment for integration tasks.
3. SubProperties Using the {+} Property Path Operator (Strictly Descendant Only)
This query selects the super-property with the most direct sub-properties and retrieves all of its sub-properties at any depth, that is, every property reachable via at least one rdfs:subPropertyOf step.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 1
  }
  ?a rdfs:subPropertyOf+ ?c .
}
LIMIT 500
Run it
Why It’s Useful
The + operator requires at least one rdfs:subPropertyOf step, so in an acyclic hierarchy only proper descendants are returned, not the property itself.
This is ideal for:
- Auditing ontology depth
- Creating visual property hierarchies
- Identifying redundant or overly specific properties
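The difference between the * and + path operators is easy to see on a toy hierarchy. The following Python sketch is an illustration only: the `ex:` properties and both function names are hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical toy hierarchy: property -> its direct super-properties.
SUB_PROPERTY_OF = {
    "ex:birthCity": ["ex:birthPlace"],
    "ex:birthPlace": ["ex:location"],
    "ex:deathPlace": ["ex:location"],
    "ex:location": [],
}


def descendants_plus(root, edges):
    """Properties that reach `root` via one or more subPropertyOf steps (+)."""
    found = set()
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for child, parents in edges.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def descendants_star(root, edges):
    """Zero or more steps (*): the + result together with `root` itself."""
    return descendants_plus(root, edges) | {root}


plus_result = descendants_plus("ex:location", SUB_PROPERTY_OF)
star_result = descendants_star("ex:location", SUB_PROPERTY_OF)
print(sorted(plus_result))
print(sorted(star_result))
```

The only difference between the two results is the starting property, which * includes through its zero-step match.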
4. SubProperties Using the {2} Property Path Operator
This focuses on properties that are exactly two steps below a frequently used super-property.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 1
  }
  ?a rdfs:subPropertyOf{2} ?c .
}
LIMIT 500
Run it
Why It’s Useful
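The {2} operator gives you a controlled look at mid-depth ontology structure. Note that counted path quantifiers such as {2} appeared in SPARQL 1.1 working drafts but are not part of the final SPARQL 1.1 recommendation, so this query depends on the endpoint's engine accepting the extension. The portable equivalent is the sequence path rdfs:subPropertyOf/rdfs:subPropertyOf.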
This is helpful for:
- Ontology debugging
- Identifying second-order refinements of major properties
- Extracting property layers for tools that require bounded depth
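A bounded-depth path selects a single layer of the hierarchy rather than everything beneath a property. The Python sketch below illustrates that idea on made-up data; the `ex:` names and the function `exactly_n_hops` are hypothetical, not part of DBpedia.

```python
# Hypothetical toy hierarchy: property -> its direct super-properties.
PROPERTY_PARENTS = {
    "ex:capitalCity": ["ex:city"],
    "ex:city": ["ex:populatedPlace"],
    "ex:village": ["ex:populatedPlace"],
    "ex:populatedPlace": ["ex:place"],
    "ex:place": [],
}


def exactly_n_hops(root, edges, n):
    """Properties linked to `root` by a chain of exactly n subPropertyOf steps."""
    layer = {root}
    for _ in range(n):
        # Move one step down: keep every property with a parent in the layer.
        layer = {
            child
            for child, parents in edges.items()
            if any(parent in layer for parent in parents)
        }
    return layer


two_hops = exactly_n_hops("ex:place", PROPERTY_PARENTS, 2)
print(sorted(two_hops))
```

With n set to 2 the direct children of the root are skipped, which is what separates a fixed-length path from the + operator.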
5. Two-Hop SubProperty Exploration for the Top 10 Super-Properties
This expands the previous pattern to explore multiple major super-properties simultaneously.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 10
  }
  ?a rdfs:subPropertyOf{2} ?c .
}
LIMIT 100
Run it
Why It’s Useful
Running the two-hop pattern across the ten largest property families at once lets you compare how they are modeled side by side.
You can quickly spot:
- Consistent modeling patterns
- Inconsistencies across similar property families
- Opportunities for ontology normalization
6. Property Usage and Dominance
This query counts the usage of every property (predicate) in the entire knowledge graph. It is the most direct way to discover which relationships form the backbone of DBpedia.
Query
SELECT ?p (COUNT(*) AS ?usageCount)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC (?usageCount)
Run it
Why It’s Useful
This query provides a high-level statistical overview of the graph’s structure. It answers the question: “What are the most common facts stored in DBpedia?”
It helps you immediately identify:
- Core RDF/RDFS properties: rdf:type, rdfs:label, rdfs:comment.
- Dominant data properties: dbo:wikiPageWikiLink, dct:subject.
- Metadata vs. factual properties: Distinguishing between properties about an entity (like prov:wasDerivedFrom) and properties stating a fact about the entity (like dbo:birthPlace).
- The most promising properties to explore in more detail with subsequent queries.
7. Top-5 Property Hierarchies by Usage and Transitive Closure
This advanced query combines statistical analysis with ontology traversal. It first identifies the five most-used properties in the entire graph, calculates their usage count and percentage, and then finds all of their respective sub-properties at any depth.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?startProperty ?usageCount ?usagePercent ?subProperty ?superProperty
WHERE {
  #####################################################################
  # 1. Determine the 5 most-used properties (global property ranking)
  #####################################################################
  {
    SELECT ?startProperty ?usageCount ?usagePercent
    WHERE {
      # Compute usage count per property
      {
        SELECT ?p (COUNT(*) AS ?usageCount)
        WHERE { ?s ?p ?o }
        GROUP BY ?p
      }
      # Compute percentage of total usage
      {
        SELECT (SUM(?cnt) AS ?totalCount)
        WHERE {
          SELECT (COUNT(*) AS ?cnt)
          WHERE { ?s ?p ?o }
        }
      }
      BIND(?p AS ?startProperty)
      BIND((100 * ?usageCount / ?totalCount) AS ?usagePercent)
    }
    ORDER BY DESC(?usageCount)
    LIMIT 5
  }
  #####################################################################
  # 2. Use the ranked properties as starting points of closure
  #####################################################################
  ?subProperty rdfs:subPropertyOf* ?startProperty .
  BIND(?startProperty AS ?superProperty)
}
LIMIT 200
Run it
Why It’s Useful
This is the ultimate “high-impact” exploration query. It directly connects the statistical backbone of the knowledge graph (the most used properties) with its semantic structure (the property hierarchies).
This allows you to:
- Prioritize analysis: Immediately focus on the ontologies of the properties that matter most in practice.
- Understand semantic depth: See if a heavily used property like dct:subject is a standalone predicate or the root of a deeper hierarchy.
- Discover the “semantic backbone”: The results show the main pillars of the graph (rdf:type, dbo:wikiPageWikiLink, etc.) and the full scaffolding that supports them.
- Guide data integration: When mapping an external schema to DBpedia, this query tells you exactly which property families are the most important to align with.
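The ranking step of the query (a usage count per property, its share of all triples, then the top few) can be sketched in Python on toy data. The predicate list and the function `top_properties` are hypothetical stand-ins for the `?s ?p ?o` scan; the percentage follows the same formula as the BIND in the query, 100 * usageCount / totalCount.

```python
from collections import Counter

# Hypothetical predicate column of a tiny graph: one entry per triple.
PREDICATES = (
    ["rdf:type"] * 5
    + ["rdfs:label"] * 3
    + ["ex:birthPlace"] * 2
)


def top_properties(predicates, k):
    """Return (property, usage_count, usage_percent) for the k most-used ones."""
    counts = Counter(predicates)        # GROUP BY ?p with COUNT(*)
    total = sum(counts.values())        # total number of triples
    return [
        (prop, count, 100 * count / total)
        for prop, count in counts.most_common(k)  # ORDER BY DESC + LIMIT k
    ]


ranking = top_properties(PREDICATES, 2)
for prop, count, percent in ranking:
    print(prop, count, percent)
```

Each ranked property would then serve as the starting point for the rdfs:subPropertyOf* traversal in the second half of the query.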