Performance impact of using multiple FROM clauses

Hello,

I’m working on an application that sends user-customized queries to our Virtuoso server (OS edition).
The user selects one or more entity types, and the query is sent only to the graphs containing these entities, using FROM clauses.

(The aim is both to ‘target’ the relevant graphs and to manage user permissions from the application.)

However, it seems that using FROM clauses is significantly slower than not specifying any graph (which, as I understand it, means the query is run against all available graphs). I did not notice the problem before because I only used small graphs (so the graph aggregation was quick, I think?).

As an example, here is the following query (automatically generated, hence the variable names):

PREFIX : <http://askomics.org/data/>
PREFIX askomics: <http://askomics.org/internal/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX faldo: <http://biohackathon.org/resource/faldo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?Metabolomics_Study1_Label ?Experimentation1_Label ?Concentration_AU1_Label ?Genotype36_uri

FROM <urn:sparql:askomics:1_:brassimet_racines_cau_concentrationau_0.ttl_1649313602>
FROM <urn:sparql:askomics:1_:brassimet_feuilles_cau_concentrationau_0.ttl_1649299093>
FROM <urn:sparql:askomics:1_:brassimet_racines_ms_meteroutputfile_0.ttl_1649317171>
FROM <urn:sparql:askomics:1_:brassimet_feuilles_ms_meteroutputfile_0.ttl_1649313588>
FROM <urn:sparql:askomics:1_:collection-probiodiv.owl_1649317105>

WHERE {
    ?Metabolomics_Study1_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_experimentation> ?Experimentation6_uri .
    ?Experimentation6_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_concentration_au> ?Concentration_AU13_uri .
    ?Genotype36_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_value> ?Concentration_AU13_uri .
    ?Metabolomics_Study1_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#MetabolomicsStudy> .
    ?Metabolomics_Study1_uri rdfs:label ?Metabolomics_Study1_Label .
    ?Experimentation6_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Experimentation> .
    ?Experimentation6_uri rdfs:label ?Experimentation1_Label .
    ?Concentration_AU13_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#ConcentrationAU> .
    ?Concentration_AU13_uri rdfs:label ?Concentration_AU1_Label .
    ?Genotype36_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Genotype> .


    VALUES ?Genotype36_uri { <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Aviso> } .
}

This query takes 6 minutes. If I remove all the FROM clauses, it takes 10 seconds.
Is there any reason for the time difference? I’d rather select the graphs than query all of them, if possible.

Thanks!

Does the following amendment to the query, which uses the GRAPH keyword to invoke the graph index, make the query run faster?

PREFIX : <http://askomics.org/data/>
PREFIX askomics: <http://askomics.org/internal/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX faldo: <http://biohackathon.org/resource/faldo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?Metabolomics_Study1_Label ?Experimentation1_Label ?Concentration_AU1_Label ?Genotype36_uri

FROM <urn:sparql:askomics:1_:brassimet_racines_cau_concentrationau_0.ttl_1649313602>
FROM <urn:sparql:askomics:1_:brassimet_feuilles_cau_concentrationau_0.ttl_1649299093>
FROM <urn:sparql:askomics:1_:brassimet_racines_ms_meteroutputfile_0.ttl_1649317171>
FROM <urn:sparql:askomics:1_:brassimet_feuilles_ms_meteroutputfile_0.ttl_1649313588>
FROM <urn:sparql:askomics:1_:collection-probiodiv.owl_1649317105>

WHERE 
{
GRAPH ?g 
    {
    ?Metabolomics_Study1_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_experimentation> ?Experimentation6_uri .
    ?Experimentation6_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_concentration_au> ?Concentration_AU13_uri .
    ?Genotype36_uri <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#has_value> ?Concentration_AU13_uri .
    ?Metabolomics_Study1_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#MetabolomicsStudy> .
    ?Metabolomics_Study1_uri rdfs:label ?Metabolomics_Study1_Label .
    ?Experimentation6_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Experimentation> .
    ?Experimentation6_uri rdfs:label ?Experimentation1_Label .
    ?Concentration_AU13_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#ConcentrationAU> .
    ?Concentration_AU13_uri rdfs:label ?Concentration_AU1_Label .
    ?Genotype36_uri rdf:type <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Genotype> .

    VALUES ?Genotype36_uri { <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#Aviso> } .
    }
}

Sadly, our graphs are separated by entity type, so adding a GRAPH clause returns no results: a single GRAPH ?g block only matches when all of its triple patterns are found in the same graph. We need the graph aggregation for the query to return anything.

Two of the graphs (the ones with ‘concentration’ in the name) are ‘big’ (4 and 7 million triples); the others have at most a few thousand. We did not notice this issue until now because we were using smaller graphs. I suppose the join is very costly in this case, but I don’t really understand why it would be faster to not specify any graph.
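For reference, here is a sketch of one way to keep explicit graph selection while still allowing joins across graphs, assuming standard SPARQL 1.1 dataset semantics: declare the graphs with FROM NAMED instead of FROM, and give each triple pattern its own GRAPH variable (?g1 … ?g10), so each triple may come from whichever listed graph holds it. The `p2m2:` prefix is only a shorthand introduced here for readability. This query has not been run against this data, and whether Virtuoso plans it faster than the FROM version is something to measure, not something established in this thread.

```sparql
# Untested sketch: assumes standard SPARQL 1.1 FROM NAMED / GRAPH semantics.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p2m2: <https://p2m2.github.io/resource/ontologies/2022/2/p2m2-ontology-3#>

SELECT DISTINCT ?Metabolomics_Study1_Label ?Experimentation1_Label ?Concentration_AU1_Label ?Genotype36_uri

# FROM NAMED (not FROM): graphs stay separately addressable by GRAPH patterns
FROM NAMED <urn:sparql:askomics:1_:brassimet_racines_cau_concentrationau_0.ttl_1649313602>
FROM NAMED <urn:sparql:askomics:1_:brassimet_feuilles_cau_concentrationau_0.ttl_1649299093>
FROM NAMED <urn:sparql:askomics:1_:brassimet_racines_ms_meteroutputfile_0.ttl_1649317171>
FROM NAMED <urn:sparql:askomics:1_:brassimet_feuilles_ms_meteroutputfile_0.ttl_1649313588>
FROM NAMED <urn:sparql:askomics:1_:collection-probiodiv.owl_1649317105>

WHERE {
    VALUES ?Genotype36_uri { p2m2:Aviso }

    # One graph variable per triple pattern: each triple can come from a
    # different listed graph, so cross-graph joins still succeed
    # (a single GRAPH ?g block would force all patterns into one graph).
    GRAPH ?g1  { ?Metabolomics_Study1_uri p2m2:has_experimentation ?Experimentation6_uri . }
    GRAPH ?g2  { ?Experimentation6_uri p2m2:has_concentration_au ?Concentration_AU13_uri . }
    GRAPH ?g3  { ?Genotype36_uri p2m2:has_value ?Concentration_AU13_uri . }
    GRAPH ?g4  { ?Metabolomics_Study1_uri rdf:type p2m2:MetabolomicsStudy . }
    GRAPH ?g5  { ?Metabolomics_Study1_uri rdfs:label ?Metabolomics_Study1_Label . }
    GRAPH ?g6  { ?Experimentation6_uri rdf:type p2m2:Experimentation . }
    GRAPH ?g7  { ?Experimentation6_uri rdfs:label ?Experimentation1_Label . }
    GRAPH ?g8  { ?Concentration_AU13_uri rdf:type p2m2:ConcentrationAU . }
    GRAPH ?g9  { ?Concentration_AU13_uri rdfs:label ?Concentration_AU1_Label . }
    GRAPH ?g10 { ?Genotype36_uri rdf:type p2m2:Genotype . }
}
```

Because the default graph built from several FROM clauses is the merge of those graphs, matching each pattern in its own GRAPH ?gN block over the same graphs should produce the same rows once SELECT DISTINCT drops the extra ?gN bindings from the projection.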

Yes, but what was the response time with the revised query I provided?

Sorry about that. It took roughly 3 seconds.

So that is even faster than when no graphs are specified and the default of ALL graphs is used, as the graph ?g specification in the where clause instructs the compiler to use the graph index when performing the query.

Seems like it. Is there any way to make this work with the way our graphs are set up?

Not sure what you are asking, as my revised query runs against your database the way your graphs are set up, with only the GRAPH ?g specification added to the WHERE clause of your original query.

Well, as I mentioned, the new query (with the GRAPH clause) returns no results, so I cannot use it.
Sorry if that wasn’t clear beforehand.

Please share your query example. Even better, if possible, share your SPARQL Query Service endpoint URL.

This will aid our ability to assist you.

Kingsley