Issue when using several FROM clauses and inference

Hello,

I am building a system that generates queries and sends them to my SPARQL endpoint.
The system relies on two graphs: one contains the actual data, and the other contains a few triples representing knowledge added by the user.
I also need inferencing, to exploit any “subclass” and “subproperty” that might be in either graph, so I have defined a rule set that includes the two graphs.

My query looks like this :

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

But this returns only one value, rdfs:Class.
The expected result is a list of all classes that my Papers are linked to.

With the following query, with just the two FROM clauses without the inference, the results are correct.

SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

I first thought the issue was caused by the inference rule set containing the queried graph, but if I remove the second FROM clause, and keep the first one, the results are also correct.

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

Note that the user_g rule set contains both <http://hyperstorylines.com/session/g> and <http://localhost:8890/inria-hal-expand>.

These queries are run against the SPARQL endpoint, both in the web interface and through SPARQLWrapper for Python; the results are the same either way.
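For anyone reproducing this from Python, here is a minimal sketch that builds the three query variants above so they can be compared side by side. The helper names (`build_query`, `run`, `VARIANTS`) are made up for illustration; only the `define input:inference` pragma and the graph IRIs come from the queries in this post, and the SPARQLWrapper part is an assumed typical usage, not necessarily my exact code.

```python
# Sketch only: build the three query variants from this post so they can be
# compared side by side. Helper names here are illustrative, not Virtuoso API.

DATA_GRAPH = "http://localhost:8890/inria-hal-expand"
USER_GRAPH = "http://hyperstorylines.com/session/g"
PATTERN = "{?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}"


def build_query(graphs, rule_set=None):
    """Return the SELECT query, optionally prefixed with the inference pragma."""
    lines = []
    if rule_set is not None:
        # Virtuoso-specific pragma, written exactly as in the queries above.
        lines.append(f"define input:inference '{rule_set}'")
    lines.append("SELECT DISTINCT ?c")
    lines.extend(f"FROM <{g}>" for g in graphs)
    lines.append(f"WHERE {PATTERN}")
    return "\n".join(lines)


def run(endpoint, query):
    """Send a query with SPARQLWrapper and return the set of ?c values."""
    # Imported lazily so the string-building part works without the package.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    rows = client.query().convert()["results"]["bindings"]
    return {row["c"]["value"] for row in rows}


VARIANTS = {
    "inference, both graphs": build_query([DATA_GRAPH, USER_GRAPH], "user_g"),
    "no inference, both graphs": build_query([DATA_GRAPH, USER_GRAPH]),
    "inference, data graph only": build_query([DATA_GRAPH], "user_g"),
}

if __name__ == "__main__":
    for label, query in VARIANTS.items():
        print(f"--- {label} ---\n{query}\n")
```

Calling `run(endpoint, VARIANTS[label])` for each label, with `endpoint` set to the local SPARQL endpoint URL (something like `http://localhost:8890/sparql` on a default install, which is an assumption here), should make the difference between the variants easy to see.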

I’ve tried using define input:default-graph-uri instead of FROM, but the results are the same.

I cannot use a GRAPH clause because I need triples from one graph to blend with triples from the other (most of my results are in the first graph, supplemented by the occasional triple from the second one).

EDIT:
I’ve realized that swapping the two FROM clauses causes the query to return nothing. Is there something I don’t understand about FROM clauses and their implementation in Virtuoso?

For information, right now the graph <http://hyperstorylines.com/session/g> is almost empty.

Please provide a synthetic dataset to aid resolution of this problem. Ditto your actual inference rules graph.

Here is a minimal example for the first dataset (equivalent to <http://localhost:8890/inria-hal-expand>) : inria_hal_minimal.ttl - Pastebin.com
This issue appears even if <http://hyperstorylines.com/session/g> is completely empty.

For the inference I set up in isql

rdfs_rule_set('user_g','http://localhost:8890/inria-hal-expand');
rdfs_rule_set('user_g','http://hyperstorylines.com/session/g');

and verified with

select * from db.dba.sys_rdf_schema
result :
[...]
user_g	 http://hyperstorylines.com/session/g	 <DB NULL>
user_g	 http://localhost:8890/inria_hal_minimal	 <DB NULL>

Are you saying the inria_hal_minimal.ttl - Pastebin.com dataset should be loaded into <http://localhost:8890/inria-hal-expand> only, or into both graphs, i.e. <http://localhost:8890/inria-hal-expand> and <http://hyperstorylines.com/session/g>? It is not clear to me what data should be in the <http://hyperstorylines.com/session/g> graph.

I also don’t understand why you load two inference rulesets ie

rdfs_rule_set('user_g','http://localhost:8890/inria-hal-expand');
rdfs_rule_set('user_g','http://hyperstorylines.com/session/g');

and why in your SPARQL query both graphs are in the query FROM clause ie

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

as I would expect one graph to be the ruleset and the other to be the dataset being queried against?

The pastebin sample should be loaded only in <http://localhost:8890/inria-hal-expand>. <http://hyperstorylines.com/session/g> can stay empty, and the issue is already present.

I really need both graphs in the rule set, because I need to be able to make use of any rdfs:subClassOf that would be defined in either graph.

And I also need both graphs in the query, as they can both contain triples that might complement each other and be relevant to my query. This is also why I can’t use a GRAPH clause.

To be specific about my use case, <http://hyperstorylines.com/session/g> is a kind of “overlay” graph, where one would inject new triples without affecting the base data graph, but those “user knowledge” triples could be relevant both to inference AND to the queries themselves.
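To illustrate what I mean by blending, here is a small sketch in plain Python where sets of tuples stand in for the two graphs. All triples are invented for this example (they are not my real data), and `linked_classes` is just a hand-written evaluation of my query pattern. It relies on the standard SPARQL behaviour that several FROM clauses produce a default graph that is the merge of the named graphs.

```python
# Illustration only: plain Python sets stand in for the two graphs.
# All triples below are made up for this sketch; they are not the real data.

RDF_TYPE = "rdf:type"

base_graph = {
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:Paper", RDF_TYPE, "rdfs:Class"),
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
}

# The "overlay" graph: a user-added link that the base graph does not have.
overlay_graph = {
    ("ex:paper1", "ex:team", "ex:team1"),
    ("ex:team1", RDF_TYPE, "ex:INRIATeam"),
}


def linked_classes(triples):
    """Evaluate: ?s a ex:Paper . ?s [] ?o . ?o a ?c  and return distinct ?c."""
    papers = {s for (s, p, o) in triples if p == RDF_TYPE and o == "ex:Paper"}
    objects = {o for (s, p, o) in triples if s in papers}
    return {o for (s, p, o) in triples if p == RDF_TYPE and s in objects}


# Two FROM clauses: the default graph is the merge (union) of both graphs.
merged = base_graph | overlay_graph

print(sorted(linked_classes(base_graph)))
print(sorted(linked_classes(merged)))
```

In this toy data the paper lives in the base graph while its team link lives only in the overlay, so `ex:INRIATeam` shows up only when the pattern is matched over the merged graph. That cross-graph join is the behaviour I need from the two FROM clauses.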

Actually, is this statement from your first post the reason why you are loading the same data in both graphs and loading both graphs as inference rules?

I also need inferencing, to exploit any “subclass” and “subproperty” that might be in either graph, so I have defined a rule set that includes the two graphs.

Loading the dataset as described, I think I can see the problem you report:

  • The query with the inference rule in place returns no results ie
define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}
  • Whereas the query without the inference rule does return results ie
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

The results being:

c
http://example.com/test#INRIATeam
http://example.com/test#Person
http://www.w3.org/2000/01/rdf-schema#Class

Is this the problem being reported ?

Yes that is the issue.

Normally I’d want the query with inference to give the same results as the one without (and later, when I add triples to session/g, I’d want their data to appear as well).

OK, we shall look into this …

Where are the TBox triples used in your inference rule i.e., the content of both http://localhost:8890/inria-hal-expand and http://hyperstorylines.com/session/g?

You MUST keep the TBox and ABox triples in separate graphs. As for your rule setup, you shouldn’t be using rdfs_rule_set to map two distinct TBox graphs to the same rule, since you can simply make one TBox graph from http://localhost:8890/inria-hal-expand and http://hyperstorylines.com/session/g for the rule.
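As a rough sketch of what building one TBox graph could look like, the Python below only generates standard SPARQL 1.1 Update text that copies schema triples from each source graph into a single dedicated graph. The TBox graph IRI and the helper name `tbox_copy_update` are hypothetical, the choice to copy only rdfs:subClassOf and rdfs:subPropertyOf triples is an assumption about what the rule needs, and the statements have not been verified against a particular Virtuoso build.

```python
# Sketch only: build a standard SPARQL 1.1 Update that copies schema triples
# from a mixed TBox/ABox graph into one dedicated TBox graph.
# TBOX_GRAPH is a made-up IRI for illustration.

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
TBOX_GRAPH = "http://example.com/tbox/user_g"  # hypothetical
SOURCE_GRAPHS = [
    "http://localhost:8890/inria-hal-expand",
    "http://hyperstorylines.com/session/g",
]


def tbox_copy_update(source_graph, tbox_graph=TBOX_GRAPH):
    """Return an INSERT ... WHERE that copies subclass/subproperty triples."""
    return "\n".join([
        f"PREFIX rdfs: <{RDFS}>",
        f"INSERT {{ GRAPH <{tbox_graph}> {{ ?s ?p ?o }} }}",
        "WHERE {",
        f"  GRAPH <{source_graph}> {{",
        "    ?s ?p ?o .",
        "    FILTER (?p IN (rdfs:subClassOf, rdfs:subPropertyOf))",
        "  }",
        "}",
    ])


# One update per source graph; both write into the same TBox graph.
UPDATES = [tbox_copy_update(g) for g in SOURCE_GRAPHS]

if __name__ == "__main__":
    print("\n\n".join(UPDATES))
```

With the schema triples collected this way, the rule would be mapped to the single TBox graph through one rdfs_rule_set('user_g', ...) call, while the queries keep both data graphs in their FROM clauses.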

/cc @hwilliams

I do not get why TBox and ABox need to be separated.
My system is quite complex, and in reality the second graph <http://localhost:8890/inria-hal-expand> is regularly swapped for another graph, depending on the user’s choice of dataset. And I can’t know which TBox graph is associated with the ABox graph selected by the user.
That is why I chose to keep them in the same graph, and that approach is in no way against the RDF specification.

I figured rdfs_rule_set was meant to be used with several graphs at the same time since it doesn’t replace the old graph automatically and needs a specific instruction to delete a mapping. Especially when I realized I couldn’t use several inference rules in the same query.
For the same reason as stated above, even if I had a separate TBox graph associated with the main dataset, I can’t really afford to build a new TBox graph every time I change dataset, since I need the inference graph to be a union of both graphs’ TBox triples and those datasets are quite big.

Would a solution working with owl:imports be preferable ? I’ve avoided that since I didn’t think Virtuoso supported it.
Please correct me if one of my assumptions is incorrect.

The TBox and ABox separation is primarily to ease troubleshooting and problem recreation.

BTW – @hwilliams has informed me that the issue has been recreated in-house using the Open Source Edition. Note, the Commercial Edition doesn’t exhibit the problem.

I understand the motivation for the separate TBox and ABox but I have genuine reasons to prefer a mixed graph.

As for the rdfs_rule_set mapping being unique: is that a restriction of the system or a recommendation?

I am indeed using the Open Source Edition. Does the fact that the issue only appears there mean that it should be considered a bug? Or does it mean I have no way of solving my problem without the Commercial Edition?

Back porting to the open source edition is a grey area regarding prioritization, and incentives.

/cc @hwilliams