Issue when using several FROM clauses and inference

nk-fouque · November 28, 2024, 8:21am

Hello,

I am building a system that generates queries and sends them to my SPARQL endpoint.
The system relies on two graphs: one that contains the actual data, and another that contains a few triples representing knowledge added by the user.
I also need inferencing, to exploit any “subclass” and “subproperty” that might be in either graph, so I have defined a rule set that includes the two graphs.

My query looks like this :

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

But this returns only one value, rdfs:Class.
The expected result is a list of all the classes that my Papers are linked to.

With the following query, with just the two FROM clauses without the inference, the results are correct.

SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

I first thought the issue was caused by the inference rule set containing the queried graph, but if I remove the second FROM clause, and keep the first one, the results are also correct.

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

Note that the user_g rule set contains both <http://hyperstorylines.com/session/g> and <http://localhost:8890/inria-hal-expand>.

These queries are run on the SPARQL endpoint, both in the web interface and through SPARQLWrapper for Python; the results are the same either way.
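For anyone reproducing this from Python, the three query variants above can be generated from one helper so that they differ only in the FROM clauses and the inference pragma. This is a minimal sketch: `build_query`, `DATA`, `SESSION`, and `variants` are hypothetical names introduced here for illustration, and the code only builds the query strings without contacting any endpoint.

```python
# Hypothetical helper: builds the SELECT query from this post for a given
# list of FROM graphs, optionally prefixed with the inference pragma.
def build_query(graphs, rule_set=None):
    lines = []
    if rule_set is not None:
        lines.append(f"define input:inference '{rule_set}'")
    lines.append("SELECT DISTINCT ?c")
    lines.extend(f"FROM <{g}>" for g in graphs)
    lines.append("WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}")
    return "\n".join(lines)

# Graph IRIs and rule set name as given in the post.
DATA = "http://localhost:8890/inria-hal-expand"
SESSION = "http://hyperstorylines.com/session/g"

variants = {
    "two FROM + inference": build_query([DATA, SESSION], "user_g"),
    "two FROM, no inference": build_query([DATA, SESSION]),
    "one FROM + inference": build_query([DATA], "user_g"),
}

for label, query in variants.items():
    print(f"--- {label} ---\n{query}\n")
```

Each generated string can then be passed to SPARQLWrapper's `setQuery` in the usual way.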

I’ve tried using define input:default-graph-uri instead of FROM, but the results are the same.

I cannot use a GRAPH clause because I need triples from one graph to be able to blend with the other (most of my results are in the first graph, supplemented by the occasional triples from the second one).

EDIT:
I’ve realized that swapping the two FROM clauses causes the query to return nothing at all. Is there something I don’t understand about FROM clauses and their implementation in Virtuoso?

For information, right now the graph <http://hyperstorylines.com/session/g> is almost empty.

kidehen · November 28, 2024, 8:05pm

Please provide a synthetic dataset to aid resolution of this problem. Ditto your actual inference rules graph.

nk-fouque · November 29, 2024, 9:45am

Here is a minimal example for the first dataset (equivalent to <http://localhost:8890/inria-hal-expand>): https://pastebin.com/UV9392gR
This issue appears even if <http://hyperstorylines.com/session/g> is completely empty.

For the inference, I set up the rule set in isql:

rdfs_rule_set('user_g','http://localhost:8890/inria-hal-expand');
rdfs_rule_set('user_g','http://hyperstorylines.com/session/g');

and verified with

select * from db.dba.sys_rdf_schema
result :
[...]
user_g	 http://hyperstorylines.com/session/g	 <DB NULL>
user_g	 http://localhost:8890/inria_hal_minimal	 <DB NULL>

hwilliams · November 29, 2024, 2:03pm

Are you saying the inria_hal_minimal.ttl dataset on Pastebin should be loaded into <http://localhost:8890/inria-hal-expand> only, or into both graphs, i.e. <http://localhost:8890/inria-hal-expand> and <http://hyperstorylines.com/session/g>? It is not clear to me what data should be in the <http://hyperstorylines.com/session/g> graph.

I also don’t understand why you load two inference rule sets, i.e.

rdfs_rule_set('user_g','http://localhost:8890/inria-hal-expand');
rdfs_rule_set('user_g','http://hyperstorylines.com/session/g');

and why both graphs are in the FROM clauses of your SPARQL query, i.e.

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

as I would expect one graph to be the rule set and the other to be the dataset being queried against?

nk-fouque · November 29, 2024, 2:30pm

The pastebin sample should be loaded only in <http://localhost:8890/inria-hal-expand>. <http://hyperstorylines.com/session/g> can stay empty, and the issue is already present.

I really need both graphs in the rule set, because I need to be able to make use of any rdfs:subClassOf that would be defined in either graph.

I also need both graphs in the query, as they can both contain triples that might complement each other and be relevant to my query. This is also why I can’t use a GRAPH clause.

To be specific about my use case, <http://hyperstorylines.com/session/g> is a kind of “overlay” graph, where one can inject new triples without affecting the base data graph, but those “user knowledge” triples could be relevant both to inference AND to the queries themselves.
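The reason a GRAPH clause does not fit this use case can be illustrated without any triple store. Under the SPARQL specification, several FROM clauses make the default graph the merge of the named graphs, so a single pattern can match triples spread across both. The sketch below uses plain Python sets with made-up toy triples (the `ex:` names are assumptions for illustration, not taken from the pastebin data) and a hand-written evaluation of the query pattern.

```python
# Toy triples; every name below is made up for illustration.
RDF_TYPE = "rdf:type"

base_graph = {
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:paper1", "ex:author", "ex:alice"),
}
overlay_graph = {
    # "User knowledge" that exists only in the overlay graph.
    ("ex:alice", RDF_TYPE, "ex:Person"),
}

def linked_classes(triples):
    """Evaluate { ?s a ex:Paper . ?s [] ?o . ?o a ?c } over one set of triples."""
    papers = {s for (s, p, o) in triples if p == RDF_TYPE and o == "ex:Paper"}
    neighbours = {o for (s, p, o) in triples if s in papers}
    return {o for (s, p, o) in triples if s in neighbours and p == RDF_TYPE}

# Two FROM clauses: the pattern is matched against the merge of both graphs.
merged = linked_classes(base_graph | overlay_graph)

# One graph at a time (what a single GRAPH block would see): no match spans both.
separate = linked_classes(base_graph) | linked_classes(overlay_graph)

print("merged:", merged)
print("separate:", separate)
```

In this toy case only the merged evaluation finds `ex:Person`, because the paper and its author live in the base graph while the author's type lives in the overlay.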

hwilliams · November 29, 2024, 2:35pm

Actually, is this statement from your first post the reason you are loading the same data in both graphs and loading both graphs as inference rules?

I also need inferencing, to exploit any “subclass” and “subproperty” that might be in either graph, so I have defined a rule set that includes the two graphs.

Having loaded the dataset that way, I think I can see the problem you report:

The query with the inference rule in place returns no results, i.e.

define input:inference 'user_g'
SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

Whereas the query without the inference rule does return results, i.e.

SELECT DISTINCT ?c 
FROM <http://localhost:8890/inria-hal-expand> 
FROM  <http://hyperstorylines.com/session/g>
WHERE {?s a <http://example.com/test#Paper>. ?s [] ?o. ?o a ?c}

The results being:

c
http://example.com/test#INRIATeam
http://example.com/test#Person
http://www.w3.org/2000/01/rdf-schema#Class

Is this the problem being reported?

nk-fouque · November 29, 2024, 2:42pm

Yes that is the issue.

Normally I’d want the query with inference to give the same results as the one without (and later, when I add triples to session/g, for their data to appear as well).

hwilliams · November 29, 2024, 2:46pm

OK, we shall look into this …

kidehen · November 29, 2024, 2:49pm

Where are the TBox triples used in your inference rule i.e., the content of both http://localhost:8890/inria-hal-expand and http://hyperstorylines.com/session/g?

You MUST keep the TBox and ABox triples in separate graphs. As for your rule setup, you shouldn’t be using rdfs_rule_set to map two distinct TBox graphs to the same rule, since you can simply make one TBox graph from http://localhost:8890/inria-hal-expand and http://hyperstorylines.com/session/g for the rule.

/cc @hwilliams

nk-fouque · November 29, 2024, 3:17pm

I do not get why TBox and ABox need to be separated.
My system is quite complex, and in reality the second graph <http://localhost:8890/inria-hal-expand> is regularly swapped for another graph, depending on the user’s choice of dataset. I also can’t know which TBox graph is associated with the ABox graph selected by the user.
That is why I chose to keep them in the same graph, and that approach is in no way against the RDF specification.

I figured rdfs_rule_set was meant to be used with several graphs at the same time since it doesn’t replace the old graph automatically and needs a specific instruction to delete a mapping. Especially when I realized I couldn’t use several inference rules in the same query.
For the same reason as stated above, even if I had a separate TBox graph associated with the main dataset, I can’t really afford to build a new TBox graph every time I change dataset, since I need the inference graph to be a union of both graphs’ TBox triples and those datasets are quite big.

Would a solution working with owl:imports be preferable? I’ve avoided that since I didn’t think Virtuoso supported it.
Please correct me if one of my assumptions is incorrect.

kidehen · November 29, 2024, 5:22pm

The TBox and ABox separation is primarily to ease troubleshooting and problem recreation.

BTW – @hwilliams has informed me that the issue has been recreated in-house using the Open Source Edition. Note, the Commercial Edition doesn’t exhibit the problem.

nk-fouque · December 2, 2024, 8:48am

I understand the motivation for separating TBox and ABox, but I have genuine reasons to prefer a mixed graph.

As for the rdfs_rule_set mapping being unique: is that a restriction of the system or a recommendation?

I am indeed using the Open Source Edition. Does the fact that the issue only appears there mean that it should be considered a bug? Or does it mean I have no way of solving my problem without the Commercial Edition?

kidehen · December 2, 2024, 9:44pm

Backporting to the Open Source Edition is a grey area regarding prioritization and incentives.

/cc @hwilliams

nk-fouque · December 4, 2024, 11:18am

Thank you for your answer.

To be clear for anyone troubleshooting this issue: I ran more tests with several different scenarios.

- Properly separating ABox and TBox doesn’t solve the issue.
- Using only one graph in the inference rule set doesn’t solve the issue.
- Moreover, when I have only one FROM clause, having two graphs in the inference rule set does not introduce any errors.

This issue only happens in the specific scenario where all of these conditions hold:

- 2 FROM clauses (or 2 define input:default-graph-uri pragmas)
- 1 inference rule set
- The predicate rdf:type (or a) appears in the query more than once

I am on Virtuoso Open Source Version: 07.20.3240
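For anyone building a regression test around this, the conditions listed above can be encoded as a small scenario matrix. This is only a sketch of the reported observations, not an explanation of the underlying cause: `matches_reported_conditions` and the scenario tuples are hypothetical names, and treating "more than two FROM clauses" like "two" is an assumption that was not tested in this thread.

```python
from itertools import product

def matches_reported_conditions(from_count, uses_inference, rdf_type_count):
    """True when a query falls into the combination reported as failing.

    Assumption: only from_count == 2 was actually tested in the thread;
    '>= 2' extrapolates that observation.
    """
    return from_count >= 2 and uses_inference and rdf_type_count >= 2

# Enumerate (FROM clause count, inference pragma present, rdf:type occurrences).
scenarios = list(product((1, 2), (False, True), (1, 2)))
affected = [s for s in scenarios if matches_reported_conditions(*s)]

for from_count, uses_inference, rdf_type_count in scenarios:
    flag = "AFFECTED" if (from_count, uses_inference, rdf_type_count) in affected else "ok"
    print(f"FROM x{from_count}, inference={uses_inference}, rdf:type x{rdf_type_count}: {flag}")
```

Of the eight combinations, only the one with two FROM clauses, the inference pragma, and two rdf:type patterns is flagged, which matches the failing query from the first post.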