Why
Automated query optimization is an imperfect science, and the optimizer doesn’t always choose the best execution plan for producing a query’s solutions.
What
Virtuoso has a number of pragmas that can be passed in SPARQL queries to control or influence query execution. Among these are the sql:table-option "index {option-id}" and sql:table-option "{ LOOP | HASH }" pragmas, which tell a Virtuoso runtime session which query optimization preferences to apply, through explicit INDEX selection and/or JOIN methods.
Note that almost all keywords discussed in this article are case-insensitive, meaning you may use uppercase, lowercase, or any mix of these. The only case-sensitive string herein is sql:, which is a Virtuoso built-in PREFIX for accessing SQL functions from within SPARQL queries.
How
Basic INDEX Option Usage
Here’s a breakdown of the INDEX option identifiers associated with the sql:table-option "INDEX {option-id}" pragma in relation to SPARQL query execution.
DEFINE sql:table-option "index RDF_QUAD": invokes the PSOG primary key index
DEFINE sql:table-option "index RDF_QUAD_POGS": invokes the POGS index
DEFINE sql:table-option "index S" or DEFINE sql:table-option "index RDF_QUAD_SP": invokes the SP index
DEFINE sql:table-option "index G" or DEFINE sql:table-option "index RDF_QUAD_GS": invokes the GS index
DEFINE sql:table-option "index O" or DEFINE sql:table-option "index RDF_QUAD_OP": invokes the OP index
When these are placed in the query prolog (i.e., preceding any PREFIX declarations), they are applied to the entire query. As will be detailed further below, similar clauses with a somewhat different syntax may be placed within the graph patterns, to be applied only to the specific pattern in which they are found.
Note: If a query is executed with an invalid index option ID, the query will fail with error SQ188: TABLE OPTION index {option-id} not defined for table DB.DBA.RDF_QUAD.
Basic INDEX Option Usage Example, Global to SPARQL query
DEFINE sql:table-option "index S"
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE {
VALUES ?person {
<urn:person_0>
<urn:person_1>
}
?person ex:hasSkill ?skill .
?skill rdfs:label ?name .
}
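When queries are assembled by a client program, the prolog placement described above can be handled by a small helper. The following is a minimal Python sketch, not part of Virtuoso or any client library: the names KNOWN_INDEX_IDS, index_define, and with_global_index are hypothetical, and the sketch assumes index option IDs are matched exactly as spelled in this article. It rejects IDs that are not in the list above before the query is ever sent, and puts the pragma ahead of any PREFIX declarations.

```python
# Index option IDs listed in this article for DB.DBA.RDF_QUAD, mapped to the
# index each one invokes. Exact-spelling matching is an assumption of this sketch.
KNOWN_INDEX_IDS = {
    "RDF_QUAD": "PSOG (primary key)",
    "RDF_QUAD_POGS": "POGS",
    "S": "SP",
    "RDF_QUAD_SP": "SP",
    "G": "GS",
    "RDF_QUAD_GS": "GS",
    "O": "OP",
    "RDF_QUAD_OP": "OP",
}


def index_define(option_id: str) -> str:
    """Return the query-wide pragma line for a known index option ID."""
    if option_id not in KNOWN_INDEX_IDS:
        raise ValueError(f"unknown index option ID: {option_id!r}")
    return f'DEFINE sql:table-option "index {option_id}"'


def with_global_index(query: str, option_id: str) -> str:
    """Prepend the pragma so that it precedes any PREFIX declarations."""
    return index_define(option_id) + "\n" + query.lstrip()


query = """PREFIX ex: <http://example.org/>
SELECT ?person ?skill
WHERE { ?person ex:hasSkill ?skill . }"""

print(with_global_index(query, "S"))
```

Checking the ID on the client side is only a convenience; the server remains the authority on which index options are defined for the table.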
Basic JOIN Option Usage
For queries that touch large quantities of RDF data and have many selection conditions, use of HASH JOIN is often desirable. For short lookup queries, HASH JOIN is usually not desirable.
Syntax for application to an entire SPARQL query is as follows, with the relevant line preceding any PREFIX declarations —
DEFINE sql:table-option "HASH"
DEFINE sql:table-option "LOOP"
Syntax on per triple pattern basis is as follows, with the relevant OPTION clause following the object of the relevant triple pattern —
OPTION ( TABLE_OPTION "HASH" )
OPTION ( TABLE_OPTION "LOOP" )
Basic JOIN Option Usage Example
DEFINE sql:table-option "LOOP"
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE {
VALUES ?person {
<urn:person_0>
<urn:person_1>
}
?person ex:hasSkill ?skill .
?skill rdfs:label ?name .
}
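The per-triple-pattern form of the JOIN option, where the OPTION clause follows the object of the pattern, can be sketched the same way. This is an illustrative Python helper under stated assumptions: pattern_with_join and JOIN_METHODS are hypothetical names, and the method is upper-cased purely for consistency, since the article notes these keywords are case-insensitive.

```python
# The two join methods named in this article.
JOIN_METHODS = ("HASH", "LOOP")


def pattern_with_join(subject: str, predicate: str, obj: str, method: str) -> str:
    """Build one triple pattern with a join-method OPTION clause after its object."""
    method = method.upper()
    if method not in JOIN_METHODS:
        raise ValueError(f"join method must be one of {JOIN_METHODS}, got {method!r}")
    # The OPTION clause sits between the object and the terminating dot.
    return f'{subject} {predicate} {obj} OPTION ( TABLE_OPTION "{method}" ) .'


print(pattern_with_join("?person", "ex:hasSkill", "?skill", "loop"))
# prints: ?person ex:hasSkill ?skill OPTION ( TABLE_OPTION "LOOP" ) .
```

The returned string is a single line of a WHERE clause; the other patterns in the query are left to the optimizer.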
Graph Pattern-specific INDEX Option Usage
If global scope of the INDEX option isn’t desired (for example, when index usage should be scoped to a specific graph pattern), move the option from the query prolog to the triple-pattern level: drop the DEFINE line and add an OPTION clause after the object of the relevant triple pattern, using the syntax shown in the examples that follow.
OPTION ( TABLE_OPTION "index RDF_QUAD" )
OPTION ( TABLE_OPTION "index RDF_QUAD_POGS" )
OPTION ( TABLE_OPTION "index S" )
OPTION ( TABLE_OPTION "index RDF_QUAD_SP" )
OPTION ( TABLE_OPTION "index G" )
OPTION ( TABLE_OPTION "index RDF_QUAD_GS" )
OPTION ( TABLE_OPTION "index O" )
OPTION ( TABLE_OPTION "index RDF_QUAD_OP" )
Graph Pattern-specific Index Option Usage Example
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE {
VALUES ?person {
<urn:person_0>
<urn:person_1>
}
?person ex:hasSkill ?skill
OPTION (table_option "index RDF_QUAD_POGS") .
?skill rdfs:label ?name .
}
Combined Option Usage
The INDEX and JOIN options may be used together, either across the entire SPARQL query or limited to a single graph pattern, as described above.
When combined on a single graph pattern, both options should be declared in one OPTION clause, comma-separated within the double-quoted term, such as OPTION (table_option "HASH, INDEX RDF_QUAD_POGS" ) or OPTION (table_option "LOOP, INDEX O" ).
When applied to the entire SPARQL query, a single DEFINE may be used, like —
DEFINE sql:table-option "hash, index RDF_QUAD_POGS"
— or you may use two DEFINE clauses, as in —
DEFINE sql:table-option "hash"
DEFINE sql:table-option "index RDF_QUAD_POGS"
Combined Option Usage Example
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE {
VALUES ?person {
<urn:person_0>
<urn:person_1>
}
?person ex:hasSkill ?skill
OPTION (table_option "hash, index RDF_QUAD_POGS") .
?skill rdfs:label ?name .
}
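The three ways of writing a combined option described in this section (one DEFINE, two DEFINE clauses, or one per-pattern OPTION clause) differ only in how the same two values are serialized. The Python sketch below makes that explicit; the function names are hypothetical and it performs no validation of the join method or index ID.

```python
from typing import List


def combined_define(join_method: str, index_id: str) -> str:
    """Query-wide form: one DEFINE holding both options, comma-separated."""
    return f'DEFINE sql:table-option "{join_method}, index {index_id}"'


def separate_defines(join_method: str, index_id: str) -> List[str]:
    """Query-wide form: one DEFINE clause per option."""
    return [
        f'DEFINE sql:table-option "{join_method}"',
        f'DEFINE sql:table-option "index {index_id}"',
    ]


def combined_pattern_option(join_method: str, index_id: str) -> str:
    """Per-pattern form: both options inside a single OPTION clause."""
    return f'OPTION ( TABLE_OPTION "{join_method}, index {index_id}" )'


print(combined_define("hash", "RDF_QUAD_POGS"))
print("\n".join(separate_defines("hash", "RDF_QUAD_POGS")))
print(combined_pattern_option("hash", "RDF_QUAD_POGS"))
```

The first two forms go in the query prolog ahead of any PREFIX declarations; the third follows the object of the triple pattern it applies to.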