OPTIONAL clause does not seem to behave as expected

Hi,

https://dbpedia.org/sparql

SELECT DISTINCT * WHERE
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Building>.
OPTIONAL { ?s <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> ?o1 . }
OPTIONAL { ?o1 rdf:type ?o1_type . }
}

This query gives unexpected, seemingly random results of 2-3 rows.
It should give 118694 rows: one row for each building, with the optional details blank where absent.
If we remove the last OPTIONAL, the data seems to come back fine.

Logic of the query: get all buildings; for each building, if isPrimaryTopicOf is present then
show it, otherwise leave it blank; for every isPrimaryTopicOf value that is present, additionally load its rdf:type.

Please guide me further on this, and on how the OPTIONAL clause works in Virtuoso.

Thanks,
NYadav

Hi NYadav,

This is all about the Virtuoso “Anytime Query” feature, which returns partial results when a query exceeds its time budget, and how it is used to enforce the “DBpedia Fair Use” policy, i.e., to prevent any single User Agent from using the endpoint in ways that impede others.

Your query is very expensive, that’s all :)

Workarounds?

  1. You can use OFFSET and LIMIT to page through the data
  2. You can instantiate your own DBpedia instance and the query will work just fine.
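
For the first workaround, a minimal sketch of paging could look like the query below. It assumes a page size of 10000 and orders the buildings by IRI so that pages are stable; both choices are illustrative, not something prescribed by the endpoint. Only the buildings are paged, in a subquery, and the optional isPrimaryTopicOf lookup is then applied to that page. The type lookup from the original query is left out for brevity.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s ?o1 WHERE
{
  # Inner query: one page of buildings, in a stable order.
  {
    SELECT DISTINCT ?s WHERE { ?s rdf:type dbo:Building . }
    ORDER BY ?s
    LIMIT 10000   # assumed page size
    OFFSET 0      # next pages: 10000, 20000, ...
  }
  # Outer pattern: optional details for the buildings on this page only.
  OPTIONAL { ?s foaf:isPrimaryTopicOf ?o1 . }
}
```

Increase OFFSET by the page size on each request and stop when a page comes back empty.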


@kidehen

If I add a null check for the ?o1 variable,
replacing a null with a blank value and assigning the result to a new variable,
then it works fine and gives the correct data.

https://dbpedia.org/sparql?default-graph-uri=http://dbpedia.org&query=SELECT

SELECT DISTINCT * WHERE
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Building>.
OPTIONAL { ?s <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> ?o1 . }
BIND ( COALESCE(?o1, "") AS ?o1A)
OPTIONAL { ?o1A rdf:type ?o1_type . }
}

As I understand it, a few buildings (26 in 10k results), i.e. values of ?s, do not have the isPrimaryTopicOf relation. That was the reason I put it in an OPTIONAL clause, so that they also appear in the output.
Possibly those null values in ?o1 are producing something unexpected in the join for rdf:type.

For an OPTIONAL clause, do we always need to do such a null check?
What is the standard best practice for such cases? Please guide.

Thanks,
Nyadav

Such a check is advised, since it reduces the cost of your query in the eyes of Virtuoso’s cost-based Query Optimizer.
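
As a sketch of why the cost differs, based on standard SPARQL semantics rather than anything Virtuoso-specific: with two sibling OPTIONAL groups, a building that has no isPrimaryTopicOf value leaves ?o1 unbound, so the pattern `?o1 rdf:type ?o1_type` is then free to match every typed resource in the dataset for that building. One alternative to the COALESCE check is to nest the second OPTIONAL inside the first, so the type lookup is attempted only when ?o1 was actually found. The prefix declarations below are written out for completeness; the query has not been benchmarked against the public endpoint.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT * WHERE
{
  ?s rdf:type dbo:Building .
  OPTIONAL
  {
    ?s foaf:isPrimaryTopicOf ?o1 .
    # Evaluated only for solutions where ?o1 is bound by the line above.
    OPTIONAL { ?o1 rdf:type ?o1_type . }
  }
}
```

Buildings without isPrimaryTopicOf still produce one row each, with both ?o1 and ?o1_type left unbound.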