SPARQL-BI -- Extract and Construct (DRAFT)

sparql-bi
documentation
virtuoso-docs

#1

Here is a raw dump from an email from @imikhailov

**SPARQL-BI Operator EXTRACT {…} FROM CONSTRUCT {…}…**

Rationale

When data are gathered from heterogeneous sources of various quality, running a SPARQL query may require some data enrichment to fetch or calculate missing items, canonicalize some parts of data, etc.
Static data can be enriched once by running a number of INSERT statements after loading the raw data: static enrichment for static data.
That makes queries on that data easy to write and quick to run; however, loading the next version of the data requires re-running the same INSERT statements again.
When the data are dynamic, the cost of everlasting enrichment can easily outweigh the convenience of querying.
SPARQL-BI offers a new operator suitable for ad-hoc enrichment of small portions of raw dynamic data during a run of a specific query.

CONSTRUCT operator as a data source for a group pattern

The W3C SPARQL specification lets a query access numerous RDF graphs identified by IRIs, but does not provide a way to fill in ad-hoc graphs during the query run.
SPARQL-BI lets a specific portion of a query run on the result of a CONSTRUCT statement.
The syntax is

{ EXTRACT { <group_pattern> } FROM CONSTRUCT { <ctor_triples> } WHERE { <group_pattern> } <solution_modifiers> }

First of all, the WHERE {…}… clause forms solutions, as in any CONSTRUCT or SELECT query, using the query's usual dataset in the usual way.
Then CONSTRUCT {…} forms a list of RDF triples using bindings from the solutions and external parameters.

The triples are stored as a temporary RDF graph.
The EXTRACT {…} group pattern is applied to that RDF graph as if it were a plain default graph; the found solutions are the solutions of the whole operator.
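The three-stage pipeline just described can be sketched in plain Python. This is an illustrative model, not the Virtuoso implementation; all data and property names are made up.

```python
# A minimal sketch of the EXTRACT pipeline: (1) WHERE forms solutions over
# the raw data, (2) CONSTRUCT materializes triples into a temporary graph,
# (3) the EXTRACT pattern is evaluated against that graph only.

raw = {  # raw data as (s, p, o) triples; p2 already has a fullName
    ("p1", "firstName", "Alice"),
    ("p1", "lastName", "Smith"),
    ("p2", "fullName", "Bob Jones"),
}

# WHERE { ?s firstName ?f . ?s lastName ?l } -> list of bindings
solutions = [
    {"s": s, "f": f, "l": l}
    for (s, p, f) in raw if p == "firstName"
    for (s2, p2, l) in raw if p2 == "lastName" and s2 == s
]

# CONSTRUCT { ?s fullName (?f + " " + ?l) } -> temporary graph
temp_graph = {(b["s"], "fullName", b["f"] + " " + b["l"]) for b in solutions}

# EXTRACT { ?s fullName ?n } sees only the temporary graph:
# p2's raw fullName triple is NOT visible here.
extracted = sorted((s, o) for (s, p, o) in temp_graph if p == "fullName")
print(extracted)  # [('p1', 'Alice Smith')]
```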

There are no restrictions on CONSTRUCT {…} and WHERE {…}… clauses; any grouping and other solution modifiers are allowed.

There is one restriction on the EXTRACT {…} clause: the group pattern must not contain graph group patterns (i.e., GRAPH …{…} clauses).

Triple patterns of EXTRACT {…} deal with the temporary graph only.
If the EXTRACT clause needs to access some data outside the temporary graph, it should contain an explicit SELECT subquery; the subquery can use its own FROM and FROM NAMED clauses in the usual way.

The temporary graph is a graph of physical triples, but unlike usual triples stored in the usual DB.DBA.RDF_QUAD table, they reside in a dedicated DB.DBA.RDF_QUAD_TMP table used solely by the EXTRACT feature.
The structure of the table is identical to DB.DBA.RDF_QUAD (the same G, S, P, and O columns, etc.), but the indexes differ due to different write/read patterns.

Multiple CONSTRUCTs for single EXTRACT

Data to be enriched are not necessarily uniform, so different parts may require different processing.

Moreover, a single source may require many different enrichment activities that will form different solutions and different groupings of the results of those solutions.
Formally speaking, any combination of queries can be written as a single big CONSTRUCT over a big UNION of subqueries of all sorts, but the resulting behemoth operator is hard to maintain.

To eliminate the problem, the EXTRACT operator can contain a list of CONSTRUCT {…} WHERE {…}… statements delimited by the UNION keyword.
Statements are executed in an unspecified order, and the constructed triples are all placed into a single temporary graph.
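Why the unspecified execution order is harmless can be sketched in plain Python (illustrative names, not the Virtuoso implementation): each branch produces its triples independently, and the temporary graph is a set, so merging the branch outputs is order-free.

```python
# Sketch of UNION-delimited CONSTRUCTs: each branch is an independent
# CONSTRUCT ... WHERE ...; all produced triples land in one temporary graph,
# and since the graph is a set of triples the branch order does not matter.

def ctor_home_pages(raw):
    # CONSTRUCT { ?s page ?o } WHERE { ?s homePage ?o }
    return {(s, "page", o) for (s, p, o) in raw if p == "homePage"}

def ctor_work_pages(raw):
    # CONSTRUCT { ?s page ?o } WHERE { ?s workplaceHomePage ?o }
    return {(s, "page", o) for (s, p, o) in raw if p == "workplaceHomePage"}

raw = {
    ("a", "homePage", "http://a.example/"),
    ("b", "workplaceHomePage", "http://b.example/"),
}

temp_graph = ctor_home_pages(raw) | ctor_work_pages(raw)
# Reversing the branch order yields the same temporary graph.
assert temp_graph == ctor_work_pages(raw) | ctor_home_pages(raw)
print(sorted(temp_graph))
```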

Querying the mix of raw and constructed data

Consider an almost-fine dataset such that almost all objects in question have all desired properties and only a small fraction needs an ad-hoc CONSTRUCT to get some of their properties calculated.

Duplicating numerous existing properties into the temporary graph seems to be a waste of resources, so some properties of some objects are only in the raw data whereas the minor rest is stored temporarily.

Plain EXTRACT is not convenient for processing the result, because every triple pattern of the EXTRACT {…} clause will be applied either to the temporary graph or to the raw data (when wrapped into a subquery with GRAPH…{…}), but not to the union of the two.

Every triple pattern of plain EXTRACT {…} would have to be turned into a union of a triple pattern and a subquery; that is very inconvenient.

The workaround is a special syntax in the UNION of CONSTRUCTs: the list may contain the keyword STORAGE.

EXTRACT {…} FROM CONSTRUCT {…}… UNION STORAGE

will be compiled in such a way that the temporary graph is included into the context set of graphs.
I.e., if EXTRACT {…}… is placed inside a graph group pattern, then the temporary graph is added to the list of named graphs, as if it were in one of the FROM NAMED … clauses;
otherwise, the temporary graph is added to the list of default graphs, as if it were in one of the FROM … clauses.

With this trick, triples created by enrichment and triples of raw data are interchangeable.
However, subqueries inside EXTRACT {…} behave the same way regardless of the “… UNION STORAGE” trick.
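The difference between plain EXTRACT and the "… UNION STORAGE" form can be sketched in plain Python (illustrative data, not the Virtuoso implementation): without the keyword a pattern sees only the temporary graph; with it, the pattern sees the union of the temporary graph and the raw storage.

```python
# Sketch of the "... UNION STORAGE" effect on a triple pattern in EXTRACT.

raw = {("x", "label", "raw label")}            # triples already in storage
temp = {("y", "label", "computed label")}      # triples built by CONSTRUCT

def match_label(graphs):
    # triple pattern { ?s label ?o } evaluated over a list of graphs
    return sorted((s, o) for g in graphs for (s, p, o) in g if p == "label")

print(match_label([temp]))        # plain EXTRACT: temporary graph only
print(match_label([temp, raw]))   # with UNION STORAGE: both sources
```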

CONSTRUCT macro

A CONSTRUCT statement (or a UNION-delimited list of them) can be used as a body of a macro definition.

This definition can be stored as any other macro definition.
The explicit macro invocation can be placed in EXTRACT operator like a plain CONSTRUCT, either alone or in UNION with other explicit macro invocations and CONSTRUCTs.
However, one cannot use the invocation of a macro as a standalone CONSTRUCT operator.

Security considerations

Unlike triple patterns over DB.DBA.RDF_QUAD, EXTRACT always deals with a single temporary graph, and this graph is known in advance; also, every temporary graph is filled in only once, erased as a whole, and not edited in between.

The query with an EXTRACT operator cannot get access to the temporary data of another EXTRACT; however, the temporary data can get “stuck” in the table if the query is aborted by an abnormal client disconnect, and they may be stored in backup images.

Due to this risk, do not use EXTRACT for materializing highly confidential data such as temporarily decoded passwords or de-obfuscated personal data.

DB.DBA.RDF_QUAD_TMP is fully erased at server restart.

Rationale

The whole evolution of programming languages demonstrates a curious fact: the cost of a single line of program code does not depend on the language in use.
(More precisely, individual differences between programmers are far more important than the language in use.)
As a result, high-level languages eliminate minor implementation details from the handwritten code and thus let the developer fit more application logic into the same number of lines and the same development costs.

In order to cut BI costs via SPARQL, queries against existing data should become shorter than the equivalent SQL statements.

Two cases demonstrate a good “compression ratio” right now: complicated unique SPARQL queries on any sources, and most SPARQL over RDF Views.

However, routine SPARQL queries over “native RDF” data and trivial SPARQL over RDF Views are no shorter than the equivalent SQL over “reusable” definitions of SQL views.
The write-once-use-everywhere nature of SQL views lets the development team separate two activities: (1) the infrequent enriching of the database schema with carefully designed views and (2) the everyday routine reuse of these views.

The SQL compiler replaces references to SQL views in a given SQL query with the bodies of these views, which resembles macroexpansion in other languages.
If that cuts costs for SQL, adding macros to SPARQL-BI may cut costs as well.

When a query should use the result of a view as a whole, not just fetch it once, row after row, the result can be temporarily stored.

In SQL, a query can form intermediate tables in a special place for temporary data.
A SPARQL-BI query can do the same, because Virtuoso translates SPARQL to SQL and then applies any nice tricks of the SQL engine.

However, a SPARQL query can contain many more joins, so SPARQL-BI offers an additional operator named EXTRACT.

Basic Syntax

SPARQL macro definitions resemble SQL views; they are implemented as term rewriting, not as a substitution of “macro call” lexemes with “macro body” lexemes right in the source text.
As a result, there are four sorts of SPARQL macro declarations, for four different methods of macro invocation.

These four are the triple template macro, the group pattern macro, the expression macro, and the construct macro.

All four have a few common properties:

  • They are defined at the very end of query prolog, i.e. after all DEFINE, BASE and PREFIX clauses but before SELECT, DESCRIBE, INSERT or the like.
  • They have distinct URIs as identifiers.
    It may be practical to double-check namespace declarations if macro names are written in “namespace:local” notation, because debugging of typos can be costly.
  • The order of macro definitions is important: the body part of a macro cannot refer to the macro itself or to any macro defined later.
    As a consequence, recursive definitions are prohibited even at the syntax level.
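The "only earlier definitions" rule above can be sketched in plain Python (an illustrative checker, not the Virtuoso implementation): a body may reference only macros defined strictly earlier, which makes self-reference and any other recursion impossible by construction.

```python
# Sketch of the definition-order rule for macro prologs.

def check_definitions(defs):
    """defs: list of (macro_name, names_referenced_in_body) in source order."""
    seen = set()
    for name, refs in defs:
        for r in refs:
            if r not in seen:
                raise ValueError(f"{name} refers to {r}, which is not yet defined")
        seen.add(name)

check_definitions([("m:base", []), ("m:derived", ["m:base"])])  # accepted
try:
    check_definitions([("m:loop", ["m:loop"])])                 # self-reference
except ValueError as e:
    print(e)
```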

> [1]* Query ::= Prolog ( QueryBody | SparulAction* | ( QmStmt ('.' QmStmt)* '.'? ) )
> [2]* Prolog ::= Define* BaseDecl? PrefixDecl* Defmacro*
> [Virt] Defmacro ::= 'DEFMACRO' Q_IRI_REF (
> DefmacroArgs ( 'LOCAL' DefmacroArgs )? ( GraphGroupPattern | Expn ) |
> DefmacroPattern ( 'LOCAL' DefmacroArgs )? GraphGroupPattern )
> [Virt] DefmacroPattern ::= (( 'GRAPH' PatternItemGorS ) | ( 'DEFAULT' 'GRAPH' ))?
> '{' PatternItemGorS PatternItemP PatternItemO '}'
> [Virt] DefmacroArgs ::= '(' ((VAR1 | VAR2)* | ((VAR1 | VAR2) ( ',' (VAR1 | VAR2))+)) ')'

A macro head can contain a list of names of local variables, preceded by the keyword LOCAL.
If this clause is present, every variable mentioned in the body of the macro should be listed in the macro head, i.e., in the list of arguments, the pattern, or the list of local variables.

Triple Template Macro

The triple template macro replaces any triple pattern that matches the specified template.
Depending on the type of the template, it can match

  • only patterns that refer to named graphs (to any named graph or to a single specified named graph).
    These macros work for matching triple patterns located inside GRAPH {…} graph group patterns.
  • only patterns that refer to the default graph.
    These macros work for matching triple patterns located outside any GRAPH {…} graph group patterns.
  • any triple patterns, regardless of GRAPH {…}.

The template contains three items, for subject, predicate, and object (plus a fourth for the graph of a named graph template).
Each item is a constant of some sort or a variable.
A given triple pattern of a query matches the given template if and only if, for every constant item of the template, there is an equal constant in the triple pattern.
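The matching rule just stated can be sketched in plain Python (illustrative, not the Virtuoso implementation): a pattern matches a template iff every constant item of the template has an equal constant at the same position in the pattern, while template variables match anything.

```python
# Sketch of template matching for triple template macros.

def is_var(item):
    return item.startswith("?")

def matches(template, pattern):
    return all(is_var(t) or t == p for t, p in zip(template, pattern))

tmpl = ("?s", "foaf:topic", "?t")                   # one constant: the predicate
print(matches(tmpl, ("?x", "foaf:topic", "?y")))    # True
print(matches(tmpl, ("?x", "foaf:knows", "?y")))    # False: constants differ
```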

[Virt] DefmacroPattern ::= (( 'GRAPH' PatternItemGorS ) | ( 'DEFAULT' 'GRAPH' ))? '{' PatternItemGorS PatternItemP PatternItemO '}'
[Virt] PatternItemGorS ::= VAR1 | VAR2 | IRIref
[Virt] PatternItemP ::= VAR1 | VAR2 | 'a' | IRIref
[Virt] PatternItemO ::= VAR1 | VAR2 | IRIref | RDFLiteral | ( '-' | '+' )? NumericLiteral | BooleanLiteral | NIL

The body of the template macro is a graph group pattern.
When the macro is expanded, all members of this pattern are inserted into the graph group pattern that contained the macro call (the “context group pattern”), and the order of members is preserved.

All filters of this pattern are added to the end of the list of filters of the context group pattern.

Template Macro Example 1

Consider query

DEFMACRO <sample_plain> { ?me ?page } LOCAL ( ?author )
{ ?me foaf:knows ?author .
{ ?author foaf:homePage ?page } UNION { ?author foaf:workplaceHomePage ?page } }

DEFMACRO <sample_named_graph> GRAPH ?g { ?s <englishTopicOfGraph> ?t }
{ ?s foaf:topic ?t .
FILTER (?s IN (?g, IRI(bif:concat (STR(?g), '#this')))) .
FILTER (LANGMATCHES (LANG(?t), 'en')) }

SELECT * WHERE {
<http://example.com/#me> ?pg .
GRAPH ?pg { ?s <englishTopicOfGraph> ?topic } }

The template { ?me ?page } matches the first triple pattern of the WHERE clause.
The template GRAPH ?g { ?s <englishTopicOfGraph> ?t } matches the second one.
As a result, the query will be macroexpanded into

SELECT * WHERE {
<http://example.com/#me> foaf:knows ?author1 .
{ ?author1 foaf:homePage ?pg } UNION { ?author1 foaf:workplaceHomePage ?pg }
GRAPH ?pg {
?s foaf:topic ?topic .
FILTER (?s IN (?pg, IRI(bif:concat (STR(?pg), '#this')))) .
FILTER (LANGMATCHES (LANG(?topic), 'en')) } }

The LOCAL clause in the DEFMACRO <sample_plain> enumerates the local variables of the macro, i.e., variables of the template body that are not template arguments.
This provides some protection from typos in variable names.

The LOCAL clause is optional; when it is omitted, no check for variable names is made and the compiler builds its list of local variables by scanning the template body.

Writing LOCAL is boring, but typos in a macro can result in a debugging nightmare.
In SQL, a typo in a column name is detected at compile time; in SPARQL, there are no database schema data and no fixed names.

Whenever possible, do not omit LOCAL (…).

When a “clone” of the macro body replaces the occurrence of a macro call in a query, every local variable gets a new unique name.

With this renaming, local variables of different macro expansions do not interfere.
If the macro body uses blank node notation for some items of triple patterns, these blank nodes will be handled like local variables, but there is no need to list them in the LOCAL (…) list.
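The renaming step can be sketched in plain Python (illustrative names, not the Virtuoso implementation): each expansion clones the macro body with fresh names for its local variables, so expansions of the same macro cannot interfere.

```python
# Sketch of local-variable renaming during macroexpansion.

import itertools
_fresh = itertools.count(1)

def instantiate(body_triples, local_vars):
    renaming = {v: f"{v}{next(_fresh)}" for v in local_vars}
    return [tuple(renaming.get(item, item) for item in t) for t in body_triples]

body = [("?me", "foaf:knows", "?author"),
        ("?author", "foaf:homePage", "?page")]
print(instantiate(body, {"?author"}))  # ?author becomes ?author1
print(instantiate(body, {"?author"}))  # a second expansion gets ?author2
```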

Template Macro Example 2

Note the difference between

DEFMACRO <named_becomes_named> GRAPH ?g { ?s ?o } { ?s ?o }

and

DEFMACRO <named_graph> { ?s ?o } LOCAL (?g) { GRAPH ?g { ?s ?o } }

The first template means that the macro can be expanded only if the triple pattern deals with a named graph, not the default one.

The second template means that the macro can be expanded for triple patterns on both named and default graphs,
but the resulting code will deal with named graphs: each macro call will produce a pattern on some named graph, and the graphs mentioned by different patterns may be different, because
the local ?g variable will become a unique ?g1, ?g2, ?g3, and so on.

Group Pattern Macro

The triple template macro is a convenient shorthand and can be used to quickly tweak existing queries (e.g., to add more FILTERs to remove garbage from data).
However, it is not usable if arguments are too numerous to fit into a triple template or if the macro should get a group graph pattern as an argument.

[Virt] MacroCall ::= 'MACRO' IRIref MacroArgList?
[Virt] MacroArgList ::= '(' ExpnOrGgps? ')'
[Virt] ExpnOrGgps ::= ExpnOrGgp ( ',' ExpnOrGgp )*
[Virt] ExpnOrGgp ::= Expn | GroupGraphPattern

The call of a group pattern macro can be placed in any clause where a plain triple pattern can be placed.

Arguments of a group pattern macro are usually scalar expressions.
However, it is possible to pass a group graph pattern as an argument.
This pattern is used as if it were a fragment of the text of the macro.

To use an argument ?x of this sort, place MACRO ?x into a group pattern where the fragment should appear, in a place where a triple pattern can appear.

All variables used inside the argument pattern should be listed in the header of the group pattern macro.

Expression Macro

The body of an expression macro is a scalar expression, and its call can reside in any place where expressions are allowed.

The syntax is the same as for a plain function call, but the macro call can be prepended by an additional (and optional) MACRO keyword.

The MACRO keyword also permits passing group graph patterns as arguments of a macro call.
Passing patterns as arguments is more frequent for group pattern macros and an exotic but still possible case for expression macros:
arguments of this sort can be used inside scalar subqueries nested into the scalar expression.

The MACRO keyword brings additional safety when used inside a macro body in a long list of macro definitions: the keyword differentiates between a function call and an attempt to call some macro defined below, not above.

Nested Macro Calls

The argument of a macro call can, in turn, contain macro calls.
The body of a macro can also contain macro calls.
These nested macro calls are first placed into the result of the macroexpansion and then macroexpanded there.
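The order of nested expansion can be illustrated in plain Python. Real SPARQL-BI macros are term rewriting, not text substitution, but for expression macros the idea can be shown textually: a nested call is placed into the expansion result first, then expanded there, until no macro call remains. The macro names below are illustrative.

```python
# Sketch of nested macroexpansion as repeated rewriting to a fixpoint.

macros = {  # illustrative expression macros; the outer one is defined later
    "macro:Six()": "(2 + 2 * 2)",
    "macro:Twelve()": "(2 * macro:Six())",
}

def expand(text):
    changed = True
    while changed:
        changed = False
        for call, expansion in macros.items():
            if call in text:
                text = text.replace(call, expansion)
                changed = True
    return text

print(expand("macro:Twelve()"))  # (2 * (2 + 2 * 2))
```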

Example

prefix macro: <http://example.com/macro/>
defmacro macro:DBpedia-ResSetName () "resource"
defmacro macro:DBpedia-IRI (?set_name, ?item_name) iri (bif:concat ("http://dbpedia.org/", ?set_name, "/", ?item_name))
defmacro macro:DBpedia-type (?subset, ?name) local (?t)
((select ?t where { service <http://dbpedia.org/sparql> (define lang:dialect 65535) { graph <http://dbpedia.org> {
<macro:DBpedia-IRI> (?subset, ?name) a ?t } } }))

With these three definitions,

macro:DBpedia-type (macro:DBpedia-ResSetName(), "Novosibirsk")

will become an equivalent of scalar subquery

((select ?t where { service <http://dbpedia.org/sparql> (define lang:dialect 65535) { graph <http://dbpedia.org> {
iri (bif:concat ("http://dbpedia.org/", "resource", "/", "Novosibirsk")) a ?t } } }))

i.e., a selection of some rdf:type of <http://dbpedia.org/resource/Novosibirsk> in DBpedia.

Macro Libraries

[Virt] CreateMacroLib ::= 'CREATE' 'MACRO' 'LIBRARY' IRIref '{' Defmacro* '}'
[Virt] QmAttachMacroLib ::= 'ATTACH' 'MACRO' 'LIBRARY' QmIRIrefConst
[Virt] QmDetachMacroLib ::= 'DETACH' 'SILENT'? 'MACRO' 'LIBRARY' QmIRIrefConst?
[Virt] DropMacroLib ::= 'DROP' 'SILENT'? 'MACRO' 'LIBRARY' PrecodeExpn

Macro definitions can be grouped into named “libraries”, stored in a graph of system metadata, loaded into memory with all other RDF storage metadata, and used in SPARQL queries.

To make a library, one should write a SPARQL query with some DEFMACRO statements grouped at its beginning.

This query should signal an error or return a nonempty result set (e.g., some diagnostic strings) if the library cannot be used on the given database.
An empty result means “no trouble”.

A macro library can be referred to in queries in two ways.

First, a quad storage can have an associated macro library, so every SPARQL query that reads data from that storage can call macros from the library.

This feature can be disabled by placing
define input:disable-storage-macro-lib
at the beginning of the query; this is especially useful when the next version of an existing macro library is being developed and should not interfere with the one already in use.

The other approach is to specify the usage explicitly, by placing one or more
define input:macro-lib
clauses at the beginning of the query.

A library can refer to other libraries, combining big collections from smaller parts.
It is an error if a macro defined in one part is redefined in another, but it is not an error if one part is included more than once in a tree of nested parts.

However, a macro associated with a storage can be redefined in some included library (but not directly in a query).

Trivial Macro Library Example

sparql
define sql:signal-void-variables 1
base <http://base/>
prefix ex: <http://example.com/sample/>
prefix macro: <http://example.com/macro/>
create macro library <http://example.com/numbers> {
defmacro macro:PI() 3.1415926
defmacro macro:Six() (2 + 2 * 2) }
ask from where { graph { ?s ?o } . filter (!bound(?s)) }
;

It is similar to a plain query with macro definitions, but the list of macro definitions is inside a
create macro library <http://example.com/numbers> { … } group.

sparql alter quad storage virtrdf:DefaultQuadStorage { attach macro library <http://example.com/numbers> }

makes all macros of the library automatically available in all queries over default quad storage.

The inverse operation is

sparql alter quad storage virtrdf:DefaultQuadStorage { detach macro library <http://example.com/sml1> }

In some cases, it is important to detach any library and not to signal an error if nothing has been attached before:

sparql alter quad storage virtrdf:DefaultQuadStorage { detach silent macro library }

These two variants of “detach macro library” do not destroy the library itself; they only disable the automatic usage of the library in queries.

The library detached from one storage can stay attached to other storages or be included explicitly.
However, the statement

sparql drop macro library <http://example.com/numbers>
;

will detach the library from all quad storages and then delete it from the metadata.

When a library is changed or dropped, the SQL compiler marks for future recompilation all “related” cached SQL and SPARQL queries and stored procedures that use the macro library.
A query is “related” if it directly mentions the library or if it deals with a storage that has the library attached.

The total number of queries to recompile can be large, resulting in a significant loss of server performance for some period of time.

That is normal, and it is no more inconvenient than the compilation of Virtuoso/PL application code at server startup.

Rule-and-Exception Conflict Resolution

Consider three triple template macros: one for triple patterns like { ?s ?o }, one for { ?o }, and one for { ?s },
so the first macro is a “common rule” whereas the two others are for “exceptional cases”.
Which one should be expanded instead of the triple pattern?

The SPARQL compiler does not check whether one template macro is an “exception” to another, and it does not check declarations for pairwise conflicts.

When the compiler checks whether a given triple pattern should be treated as a macro invocation, it scans the list of macro definitions of the query (i.e., everything
that comes from “define input:macro-lib …” and then DEFMACRO), and if nothing is found there, then the macro library attached to the quad storage, if any.

The first found match is used; the rest does not matter.

Thus, like the majority of languages that use pattern matching, define your exceptions before your rules.
In the above example, the developer should resolve the rule-and-exception conflict by placing the macro for { ?s ?o } after the two others, and choose the desired expansion
by placing the desired “exceptional cases” in front of the others.
(The developer may also wish to create a fourth triple template macro specifically for that template and put it above all three others.)
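The first-match rule can be sketched in plain Python (illustrative names, not the Virtuoso implementation): the compiler uses the first template, in definition order, that matches the triple pattern, so "exceptional case" templates must precede the common rule.

```python
# Sketch of first-match resolution among triple template macros.

def is_var(item):
    return item.startswith("?")

def pick_macro(defs, pattern):
    for name, template in defs:
        if all(is_var(t) or t == p for t, p in zip(template, pattern)):
            return name
    return None  # no match: not a macro invocation

defs = [  # exception first, common rule last
    ("exception_for_subject", ("ex:special", "ex:p", "?o")),
    ("common_rule", ("?s", "ex:p", "?o")),
]
print(pick_macro(defs, ("ex:special", "ex:p", "?x")))  # exception_for_subject
print(pick_macro(defs, ("ex:other", "ex:p", "?x")))    # common_rule
```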

Passing Expressions and Group Patterns to Macro as Parameters

SPARQL-BI uses variables in many different ways:

  • A variable can be used as a field of a triple pattern.
  • A variable can be used as a field of a triple pattern but that is only a guess because the triple pattern may be an invocation of a triple template macro.
  • A variable is an argument of an expression, so its value is used there but can not be changed there.
  • A variable is a macro parameter and the parameter is used as a whole group pattern.
    Let ?x be listed as a parameter in the head of a macro, let the macro body contain “ ?x . ” instead of a triple pattern in a group pattern, and let the macro call contain { ?o } as an argument;
    the macroexpansion will place the passed ?o triple pattern into the instantiation of the group pattern that ?x belongs to.

  • A variable name is used as an alias, like name ?x in BIND … AS ?x, in SELECT … AS ?x, in VALUES (… ?x …), in options of transitive subquery like T_STEP(?x) etc.

When a variable is used as a macro parameter, the developer may create combinations of expressions that would be syntax errors in plain SPARQL.

Consider a macro that contains a clause BIND (2+2) AS ?x, where ?x is a macro parameter, and the macro invocation specifies that ?x is actually 5.

The result of a mechanical substitution, BIND (2+2) AS 5, is absolutely meaningless.
The SPARQL-BI compiler tries to make a more meaningful substitution when possible, so in this case it will check whether FILTER ((2+2) = 5) is suitable instead.
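The BIND-to-FILTER fix-up can be sketched in plain Python over query text. This is an illustrative model only, not the actual compiler, which works on terms rather than strings.

```python
# Sketch: if substitution would bind an expression to a constant, emit the
# equivalent FILTER equality instead of an invalid BIND ... AS <constant>.

def substitute_bind(expr, parameter, actual_args):
    actual = actual_args.get(parameter, parameter)
    if actual.startswith("?"):                  # still a variable: BIND is fine
        return f"BIND ({expr}) AS {actual}"
    return f"FILTER (({expr}) = {actual})"      # constant: rewrite as FILTER

print(substitute_bind("2+2", "?x", {"?x": "?y"}))  # BIND (2+2) AS ?y
print(substitute_bind("2+2", "?x", {"?x": "5"}))   # FILTER ((2+2) = 5)
```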

Obviously, some misuses of macro parameters cannot be fixed.

Say, if a macro parameter is used as a group pattern in one place and as a variable name in another, then the macro cannot be instantiated under any circumstances.

Property Variables

When a SPARQL query is not written from beginning to end for a single purpose but mostly composed from common-purpose macro definitions, it may happen that one and the same property of one subject is queried many times in many macro invocations.

The straightforward solution is to ignore the problem: redundant retrieval of the same property might add extra work for the SQL compiler, but it is relatively cheap at runtime because the second read is almost always from a hot cache.

However, this may bring two sorts of errors if a subject has more than one value for a property.
First of all, there will be more solutions than needed. If a property of subject ?s has two values, A and B, then the expansion of three macro calls will produce triple patterns

?s ?local_variable_from_macro_1 .
?s ?local_variable_from_macro_2 .
?s ?local_variable_from_macro_3 .

and the query will deal with 2 × 2 × 2 = 8 solutions instead of 2.

| ?local_variable_from_macro_1 | ?local_variable_from_macro_2 | ?local_variable_from_macro_3 |
| --- | --- | --- |
| A | A | A |
| A | A | B |
| A | B | A |
| A | B | B |
| B | A | A |
| B | A | B |
| B | B | A |
| B | B | B |
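The blowup shown in the table is just a Cartesian product; a quick check:

```python
# Three independent triple patterns over a two-valued property yield
# 2 * 2 * 2 = 8 combinations instead of 2.

from itertools import product

values = ["A", "B"]
solutions = list(product(values, repeat=3))
print(len(solutions))                  # 8
print(solutions[0], solutions[-1])     # ('A', 'A', 'A') ('B', 'B', 'B')
```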

Even worse, logical problems may occur when different macros in different parts of the same expression deal with different values of a property: not one value used consistently everywhere, but a mix.

The correct but inconvenient solution is to write a triple pattern that gets the property at the beginning of a SPARQL query and then pass the fetched property value as a parameter to every macro that needs it.

The disadvantages are numerous: one should write a triple pattern by hand; one should pass both the subject and the property value to every macro, instead of passing just the subject; one should be ready to debug errors added by this complication.
Multiple properties add to the mess.

SPARQL-BI offers a special extension, named “property variables”, to pass only the subject, not its properties, and still get the behavior of a single triple pattern per property.

If ?s is a variable (or a macro parameter that will be used to pass a variable) and P is a property name, then the operator “+>” makes an expression ?s+>P that is referred to as a “property variable”.

This expression tells the SPARQL compiler to create an invisible variable ?x and add the triple pattern
?s P ?x .

to the group pattern where the expression is used (say, where the macro is called, if the expression is not buried in some group pattern inside the macro).
The value bound to ?x becomes the value of the expression.

If the same ?s+>P expression is used a second time in the same group pattern, then a second triple pattern is NOT added and the previous ?x variable is reused.

Property variables can be “chained”, from left to right. The expression ?s+>P1+>P2+>P3 will first create an invisible variable ?x for ?s+>P1 and add a triple pattern for it if needed; then ?x+>P2 will create an invisible ?y, following the same rules;
finally, ?y+>P3 will make an invisible ?z for the desired sub-sub-property of ?s. Three property variables ?s+>P1, ?s+>P1+>P2, and ?s+>P1+>Q2 will “share” the triple pattern made for ?s+>P1.
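The sharing rule can be sketched in plain Python (an illustrative model, not the Virtuoso implementation): within a group pattern, one invisible variable and one triple pattern are created per distinct (subject, property) pair, and later uses reuse them.

```python
# Sketch of property-variable creation and sharing within one group pattern.

import itertools
_numbers = itertools.count(1)
_cache = {}
patterns = []

def pv(subject, prop):
    """Return the invisible variable for subject+>prop, creating it once."""
    key = (subject, prop)
    if key not in _cache:
        var = f"?x{next(_numbers)}"
        _cache[key] = var
        patterns.append((subject, prop, var))  # the single shared pattern
    return _cache[key]

p1 = pv("?s", "P1")              # ?s+>P1
p12 = pv(p1, "P2")               # ?s+>P1+>P2, chained via the invisible var
p1q2 = pv(pv("?s", "P1"), "Q2")  # ?s+>P1+>Q2 reuses the ?s+>P1 pattern
print(len(patterns))             # 3 triple patterns, not 4
```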

There is also a “*>” property variable operator to deal with properties that may have no known value. An expression ?s*>P tells the SPARQL compiler to add

OPTIONAL { ?s P ?x }

instead of the mandatory

?s P ?x

and this is the only difference.

Property variables can be used outside any macro as well as inside macro bodies.
They can be used inside a SERVICE {…} clause even if the remote SPARQL web service endpoint in question does not support SPARQL-BI, because the SPARQL compiler will convert them to plain triple patterns anyway.

Debugging

New features of the language add new ways of making errors, so the default Virtuoso web service endpoint, “/sparql/”, offers some debugging tools.
First of all, it provides a link to the “Macros” page, with an enumeration of all known macro libraries, individual macro declarations, and warnings about potential risks of misuse of macro parameters.
Next, it has checkboxes to signal errors if some variables are never bound, or if some variables are not logically connected to each other but have the same name and thus look like they are connected.

Finally, it has a checkbox to generate a SPARQL compilation report (instead of executing the query).

The report is long and most of it is for the OpenLink support team, but its beginning contains the details of the query as the compiler understands it.

/cc @hwilliams @TallTed @PvK @danielhm


#2

@kidehen @hwilliams @imikhailov @PvK @danielhm

I’ve made one major pass for better markdown, and fixed many English issues. Some markup may need further cleanup (where I wasn’t sure what was part of the content, and what was accidental markup), especially on the sections flagged as bnf (view source and search these out; they don’t show visibly in human-friendly rendering).

I have not written over the original, to hopefully make it easier for others to see where I may have misunderstood or otherwise erred.

I am certain there are some incorrect section depths below – because levels of headers were not (and are not) entirely clear to me. These are changed with the number of line-leading ###. Discourse doesn’t have automatic TOC generation, so it’s not easy to track nor review section descendance here. I’ve manually built the following to help in this review.

#    SPARQL-BI Operator `EXTRACT {...} FROM CONSTRUCT {...}...`
##   Rationale
##   `CONSTRUCT` operator as a data source for a group pattern
###  Multiple `CONSTRUCTs` for single `EXTRACT`
###  Querying the mix of raw and constructed data
#    `CONSTRUCT` macro
##   Security considerations
##   Rationale
##   Basic Syntax
###  Triple Template Macro
###  Template Macro Example 1
###  Template Macro Example 2
###  Group Pattern Macro
###  Expression Macro
###  Nested Macro Calls
#### Example
##   Macro Libraries
###  Trivial Macro Library Example
##   Rule-and-Exception Conflict Resolution
##   Passing Expressions and Group Patterns to Macro as Parameters
###  Property Variables
##   Debugging


SPARQL-BI Operator EXTRACT {...} FROM CONSTRUCT {...}...

Rationale

When data are gathered from heterogeneous sources of various quality, running a SPARQL query may require some data enrichment to fetch or calculate missing items, canonicalize some parts of data, etc. Static data can be enriched once by running some amount of INSERT statements after loading raw data: static enrichment for static data. That’s good for ease of writing queries on that data and the queries will run quickly; however, loading the next version of the data will require re-running the same INSERT statements again. When the data are dynamic, the cost of everlasting enrichment can easily outweigh the conveniences of querying. SPARQL-BI offers a new operator suitable for ad-hoc enrichment of small portions of raw dynamic data during a run of a specific query.

CONSTRUCT operator as a data source for a group pattern

W3C SPARQL specification lets the query access numerous RDF graphs identified by IRIs, but does not provide a way to fill in ad-hoc graphs during the query run. SPARQL-BI lets a specific portion of a query run on the result of a CONSTRUCT statement. The syntax is –

 { EXTRACT { <group_pattern> } 
   FROM CONSTRUCT { <ctor_triples> } 
   WHERE { <group_pattern> } 
   <solution_modifiers> 
 }

First of all, WHERE {...}... clause forms solutions, like in any CONSTRUCT or SELECT query, using usual dataset of the query in a usual way. Then CONSTRUCT {...} forms a list of RDF triples using bindings from solutions and external parameters.

The triples are stored as a temporary RDF graph. EXTRACT {...} group pattern is applied to that RDF graph as if it were a plain default graph; the found solutions are solutions of the whole operator.

There are no restrictions on CONSTRUCT {...} and WHERE {...}... clauses; any grouping and other solution modifiers are allowed.

There is one restriction on EXTRACT {...} clause: the group pattern must not contain graph group patterns (i.e., GRAPH ...{...} clauses).

Triple patterns of EXTRACT {...} deal with temporary graph only. If EXTRACT clause should access some data outside the temporary graph, it should contain an explicit SELECT subquery; the subquery can use its own FROM and FROM NAMED clauses in a usual way.

The temporary graph is a graph of physical triples, but unlike usual triples stored in the usual DB.DBA.RDF_QUAD table, they reside in a dedicated DB.DBA.RDF_QUAD_TMP table used solely by the EXTRACT feature. The structure of the table is identical to DB.DBA.RDF_QUAD (the same G, S, P, and O columns, etc.), but the indices differ due to different write/read patterns.
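As a sketch of how the pieces fit together (the ex: prefix and property names below are hypothetical), assume raw data where some persons carry a ready-made ex:label while others have only ex:firstName and ex:lastName; the CONSTRUCT computes the missing labels, and EXTRACT then queries the uniform temporary graph:

```sparql
PREFIX ex: <http://example.com/sample/>
SELECT ?person ?label
WHERE
  {
    EXTRACT { ?person ex:label ?label }
    FROM CONSTRUCT { ?p ex:label ?name }
    WHERE
      {
        { ?p ex:label ?name }
        UNION
        { ?p ex:firstName ?fn . ?p ex:lastName ?ln .
          BIND (bif:concat (?fn, ' ', ?ln) AS ?name) }
      }
  }
```

The inner WHERE runs against the usual dataset of the query, the constructed triples land in the temporary graph, and the EXTRACT pattern matches that graph only.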

Multiple CONSTRUCTs for single EXTRACT

Data to be enriched are not necessarily uniform, so different parts may require different processing.

Moreover, a single source may require many different enrichment activities that form different solutions and different groupings of the results of those solutions. Formally speaking, any combination of queries can be written as a single big CONSTRUCT over a big UNION of subqueries of all sorts, but the resulting behemoth operator is hard to maintain.

To eliminate the problem, the EXTRACT operator can contain a list of CONSTRUCT {...} WHERE {...}... statements delimited by the UNION keyword. The statements are executed in an unspecified order, and the constructed triples are all placed into a single temporary graph.
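For example, two independent enrichment activities (with the hypothetical ex:author target property and dc:/foaf: source properties) can feed the same temporary graph:

```sparql
PREFIX ex:   <http://example.com/sample/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?doc ?author
WHERE
  {
    EXTRACT { ?doc ex:author ?author }
    FROM CONSTRUCT { ?d ex:author ?a } WHERE { ?d dc:creator ?a }
    UNION CONSTRUCT { ?d ex:author ?a } WHERE { ?d foaf:maker ?a }
  }
```

Each CONSTRUCT stays small and maintainable, and the EXTRACT pattern sees the union of their results.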

Querying the mix of raw and constructed data

Consider an almost-fine dataset such that almost all objects in question have all desired properties and only a small fraction needs an ad-hoc CONSTRUCT to get some of their properties calculated.

Duplicating numerous existing properties into a temporary graph seems to be a waste of resources, so some properties of some objects are only in raw data whereas the minor rest is stored temporarily.

Plain EXTRACT is not convenient for processing such a result, because every triple pattern of the EXTRACT {...} clause will be applied either to the temporary graph or to the raw data (when wrapped into a subquery with GRAPH ... {...}), but not to the union of them.

Every triple pattern of a plain EXTRACT {...} would have to be turned into a union of a triple pattern and a subquery; that is very inconvenient.

The workaround is a special syntax in UNION of CONSTRUCTs: the list may contain keyword STORAGE.

EXTRACT {...} 
FROM CONSTRUCT {...}... 
UNION STORAGE

– will be compiled in such a way that the temporary graph is included into the context set of graphs; i.e., if the EXTRACT {...}... is placed inside a graph group pattern, then the temporary graph is added to the list of named graphs, as if it were in one of the FROM NAMED ... clauses; otherwise the temporary graph is added to the default graph, as if it were in one of the FROM ... clauses.

With this trick, triples created by enrichment and triples of raw data are interchangeable. However, subqueries inside EXTRACT {...} behave the same way regardless of the ... UNION STORAGE trick.
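A sketch with hypothetical property names: persons that already carry an ex:label in the raw data match the EXTRACT pattern via the storage, while the CONSTRUCT fills in only the missing labels:

```sparql
PREFIX ex: <http://example.com/sample/>
SELECT ?person ?label
WHERE
  {
    EXTRACT { ?person ex:label ?label }
    FROM CONSTRUCT { ?p ex:label ?name }
    WHERE
      { ?p ex:firstName ?fn . ?p ex:lastName ?ln .
        FILTER (NOT EXISTS { ?p ex:label ?anylabel }) .
        BIND (bif:concat (?fn, ' ', ?ln) AS ?name) }
    UNION STORAGE
  }
```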

CONSTRUCT macro

A CONSTRUCT statement (or a UNION-delimited list of them) can be used as a body of a macro definition.

This definition can be stored like any other macro definition. An explicit invocation of the macro can be placed in an EXTRACT operator like a plain CONSTRUCT, either alone or in a UNION with other explicit macro invocations and CONSTRUCTs. However, one cannot use the invocation of the macro as a standalone CONSTRUCT operator.
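As a sketch (the macro name and properties are hypothetical, and the exact FROM MACRO invocation form is an assumption modeled on the explicit MacroCall syntax given later, not stated in the text above):

```sparql
PREFIX ex: <http://example.com/sample/>
DEFMACRO ex:enrich-labels ()
  CONSTRUCT { ?p ex:label ?name } WHERE { ?p ex:firstName ?name }
SELECT ?s ?label
WHERE
  {
    EXTRACT { ?s ex:label ?label }
    FROM MACRO ex:enrich-labels ()
  }
```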

Security considerations

Unlike triple patterns over DB.DBA.RDF_QUAD, EXTRACT always deals with a single temporary graph and this graph is known in advance; also, every temporary graph is filled in only once, erased as a whole, and not edited in between.

A query with an EXTRACT operator cannot get access to the temporary data of another EXTRACT; however, the temporary data can get stuck in the table if a query is aborted by a weird client disconnect, and they may then be stored in backup images.

Due to this risk, do not use EXTRACT to materialize highly-confidential data such as temporarily decoded passwords or de-obfuscated personal data.

DB.DBA.RDF_QUAD_TMP is fully erased at server restart.

Rationale

The whole evolution of programming languages demonstrates a weird fact: the price of a single line of program code does not depend on the language in use. (More precisely, individual differences between programmers are far more important than the language in use.) As a result, high-level languages eliminate minor details of the implementation from the handwritten code, and thus let the developer fit more application logic into the same number of lines and the same development costs.

In order to cut BI costs via SPARQL, queries against existing data should become shorter than equivalent SQL statements.

Two cases demonstrate a good “compression ratio” right now: complicated unique SPARQL queries on any sources, and most SPARQL over RDF Views.

However, routine SPARQL queries over “native RDF” data and trivial SPARQL over RDF Views are no shorter than equivalent SQL over “reusable” definitions of SQL views. The write-once-use-everywhere nature of SQL views led the development team to separate two activities: (1) the infrequent enriching of the database schema with carefully designed views and (2) the everyday routine reuse of these views.

The SQL compiler replaces references to SQL views in a given SQL query with the bodies of these views, which resembles macro-expansion in other languages. If that cuts costs for SQL, adding macros to SPARQL-BI may cut costs as well.

When a query should use the result of a view as a whole, not just fetch it once row after row, the result can be temporarily stored.

In SQL, the query can form intermediate tables in a special place for temporary data. A SPARQL-BI query can do the same, because Virtuoso translates SPARQL to SQL and then applies all the nice tricks of an SQL engine.

However, a SPARQL query can contain many more joins, so SPARQL-BI offers an additional operator named EXTRACT.

Basic Syntax

SPARQL macro definitions resemble SQL views; they are implemented as term rewriting, not as a substitution of “macro call” lexemes with “macro body” lexemes right in the source text. As a result, there are four sorts of SPARQL macro declarations, for four different methods of macro invocation.

These four are –

  • triple template macro
  • group pattern macro
  • expression macro
  • construct macro

All four have a few common properties:

  • They are defined at the very end of query prolog, i.e., after all DEFINE, BASE, and PREFIX clauses, but before SELECT, DESCRIBE, INSERT, or the like.
  • They have distinct URIs as identifiers. It may be practical to double-check namespace declarations if macro names are written in namespace:local notation, because debugging of typos can be costly.
  • The order of macro definitions is important: the body part of a macro cannot refer to the macro itself or to any macro defined below. As a consequence, recursive definitions are prohibited even at syntax level.
> _[1]	Query		 ::=  Prolog ( QueryBody | SparulAction* | ( QmStmt ('.' QmStmt)* '.'? ) )_
> _[2]	Prolog		 ::=  Define* BaseDecl? PrefixDecl* Defmacro*_
> _[Virt]	Defmacro	 ::=  'DEFMACRO' Q_IRI_REF (_
> _			DefmacroArgs ( 'LOCAL' DefmacroArgs )? ( GraphGroupPattern | Expn ) |_
> _			DefmacroPattern ( 'LOCAL' DefmacroArgs )? GraphGroupPattern )_
> _[Virt]	DefmacroPattern	 ::=  (( 'GRAPH' PatternItemGorS ) | ( 'DEFAULT' 'GRAPH' ))?_
> _			'{' PatternItemGorS PatternItemP PatternItemO '}'_
> _[Virt]	DefmacroArgs	 ::=  '(' ((VAR1 | VAR2)* | ((VAR1 | VAR2) ( ',' (VAR1 | VAR2))+)) ')'_

A macro head can contain a list of names of local variables, preceded by the keyword LOCAL. If this clause is present, every variable mentioned in the body of the macro should be listed in the macro head, i.e., in the list of arguments, the pattern, or the list of local variables.

Triple Template Macro

The triple template macro replaces any triple pattern that matches the specified template. Depending on the type of the template, it can match –

  • only patterns that refer to named graphs (to any named graph or to a single specified named graph). These macros work for matching triple patterns located inside GRAPH {...} graph group patterns.
  • only patterns that refer to the default graph. These macros work for matching triple patterns located outside any GRAPH {...} graph group patterns.
  • any triple patterns, regardless of GRAPH {...}.

The template contains three items for subject, predicate, and object (plus a fourth one for the graph of a named graph template). Each item is a constant of some sort or a variable. A given triple pattern of a query matches the given template if and only if, for every constant item of the template, there is an equal constant in the triple pattern.

> _[Virt]	DefmacroPattern	 ::=  (( 'GRAPH' PatternItemGorS ) | ( 'DEFAULT' 'GRAPH' ))?_
> _			'{' PatternItemGorS PatternItemP PatternItemO '}'_
> _[Virt]	PatternItemGorS	 ::=  VAR1 | VAR2 | IRIref_
> _[Virt]	PatternItemP	 ::=  VAR1 | VAR2 | 'a' | IRIref_
> _[Virt]	PatternItemO	 ::=  VAR1 | VAR2 | IRIref_
> _			| RDFLiteral | ( '-' | '+' )? NumericLiteral | BooleanLiteral | NIL_

The body of a template macro is a graph group pattern. When the macro is expanded, all members of this pattern are inserted into the graph group pattern that contained the macro call (the “context group pattern”), and the order of members is preserved.

All filters of this pattern are added to the end of the list of filters of the context group pattern.

Template Macro Example 1

Consider this query –

DEFMACRO <sample_plain> { ?me <fellowPage> ?page } LOCAL ( ?author )
  { ?me foaf:knows ?author .
      { ?author foaf:homePage ?page } 
      UNION
      { ?author foaf:workplaceHomePage ?page } }

DEFMACRO <sample_named_graph> GRAPH ?g { ?s <englishTopicOfGraph> ?t }
  { ?s foaf:topic ?t .
    FILTER (?s in (?g, IRI(bif:concat (STR(?g), '#this')))) .
    FILTER (LANGMATCHES (?t, 'en')) }

SELECT * WHERE { 
    <http://example.com/#me> <fellowPage> ?pg .
    GRAPH ?pg { ?s <englishTopicOfGraph> ?topic } }

The template { ?me <fellowPage> ?page } matches the first triple pattern of the WHERE clause. The template GRAPH ?g { ?s <englishTopicOfGraph> ?t } matches the second one. As a result, the query will be macro-expanded into

SELECT * WHERE { 
    <http://example.com/#me> foaf:knows ?author1 .
      { ?author1 foaf:homePage ?pg } 
      UNION
      { ?author1 foaf:workplaceHomePage ?pg }
    GRAPH ?pg {
        ?s foaf:topic ?topic .
        FILTER (?s in (?pg, IRI(bif:concat (STR(?pg), '#this')))) .
        FILTER (LANGMATCHES (?topic, 'en')) } }

The LOCAL clause in DEFMACRO <sample_plain> enumerates the local variables of the macro, i.e., variables of the template body that are not template arguments. This provides some protection from typos in variable names.

The LOCAL clause is optional; when it is omitted, no check for variable names is made, and the compiler builds its list of local variables by observing the template body.

Writing LOCAL is boring, but a typo in a macro can result in a debugging nightmare.
In SQL, a typo in a column name is detected at compile time; in SPARQL, there is no database schema and no fixed set of names.

Whenever possible, do not omit LOCAL (...).

When a “clone” of the macro body replaces an occurrence of the macro call in a query, every local variable gets a new unique name.

With this renaming, local variables of different macro expansions do not interfere. If a macro body uses blank node notation for some items of triple patterns, these blank nodes are handled like local variables, but there is no need to list them in the LOCAL (...) list.
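For instance, in this sketch (with hypothetical ex: properties), the blank node of the macro body behaves as an implicit local variable and needs no LOCAL entry:

```sparql
PREFIX ex: <http://example.com/sample/>
DEFMACRO <located_in> { ?s ex:locatedIn ?city }
  { ?s ex:address [ ex:city ?city ] }
```

Every expansion of <located_in> gets its own fresh blank node, just as local variables get unique names.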

Template Macro Example 2

Note the difference between –

DEFMACRO <named_becomes_named> GRAPH ?g { ?s <p> ?o } { ?s <p2> ?o }

and

DEFMACRO <named_graph> { ?s <p> ?o } LOCAL (?g) { GRAPH ?g { ?s <p2> ?o } }

The first template means that the macro can be expanded only if the triple pattern deals with a named graph, not the default one.

The second template means that the macro can be expanded for triple patterns on both named and default graphs, but the resulting code will deal with named graphs: each macro call will produce a pattern on some named graph, and the graphs mentioned by different patterns may differ, because the local ?g variable will become a unique ?g1, ?g2, ?g3, and so on.

Group Pattern Macro

The triple template macro is a convenient shorthand and can be used to quickly tweak existing queries (e.g., add more FILTERs to remove garbage from data).
However, triple template macros are not usable if the arguments are too numerous to fit into a triple template or if the macro should get a group graph pattern as an argument.

> _[Virt]	MacroCall	 ::=  'MACRO' IRIref MacroArgList?_
> _[Virt]	MacroArgList	 ::=  '(' ExpnOrGgps? ')'_
> _[Virt]	ExpnOrGgps	 ::=  ExpnOrGgp ( ',' ExpnOrGgp )*_
> _[Virt]	ExpnOrGgp	 ::=  Expn | GroupGraphPattern_

A call of a group pattern macro can be placed in any clause where a plain triple pattern can be placed.

Arguments of a group pattern macro are usually scalar expressions. However, it is possible to pass a group graph pattern as an argument. This pattern is used as if it were a fragment of the text of the macro.

To use an argument ?x of this sort, place MACRO ?x into a group pattern where the fragment should appear, in a place where a triple pattern can appear.

All variables used inside the argument pattern should be listed in the header of the group pattern macro.
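A sketch under these rules (all names hypothetical): the macro takes a subject, a scalar limit, and a group pattern with extra conditions, and the variables of the passed pattern appear in the macro head:

```sparql
PREFIX ex: <http://example.com/sample/>
DEFMACRO ex:offers-below (?offer, ?limit, ?extra_conditions) LOCAL (?price)
  { ?offer ex:price ?price .
    FILTER (?price < ?limit) .
    MACRO ?extra_conditions }

SELECT ?offer
WHERE { MACRO ex:offers-below (?offer, 100, { ?offer ex:inStock true }) }
```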

Expression Macro

The body of an expression macro is a scalar expression, and its call can reside in any place where expressions are allowed.

The syntax is the same as for a plain function call, but a macro call can be prepended with an additional (and optional) MACRO keyword.

The MACRO keyword also permits passing group graph patterns as arguments of the macro call. Passing patterns as arguments is more frequent for group pattern macros and an exotic but still possible case for expression macros: arguments of this sort can be used inside scalar subqueries nested into the scalar expression.

The MACRO keyword brings additional safety when used inside a macro body in a long list of macro definitions: the keyword differentiates between a function call and an attempt to call some macro defined below, not above.

Nested Macro Calls

An argument of a macro call can in turn contain macro calls. The body of a macro can also contain macro calls. These nested macro calls are first placed into the result of the macro-expansion and then macro-expanded there.

Example

PREFIX macro: <http://example.com/macro/>
defmacro macro:DBpedia-ResSetName () "resource"
defmacro macro:DBpedia-IRI (?set_name, ?item_name) iri (bif:concat ("http://dbpedia.org/", ?set_name, "/", ?item_name))
defmacro macro:DBpedia-type (?subset, ?name) local (?t)
  ((SELECT ?t 
    WHERE
      { 
        SERVICE <http://dbpedia.org/sparql> 
          (define lang:dialect 65535) 
          { GRAPH <http://dbpedia.org> 
            { `<macro:DBpedia-IRI> (?subset, ?name)` a ?t  } 
          } 
       }
  ))

With these three definitions,

macro:DBpedia-type (macro:DBpedia-ResSetName(), "Novosibirsk")

will become an equivalent of the scalar subquery

  ((SELECT ?t 
    WHERE
      { 
        SERVICE <http://dbpedia.org/sparql> 
          (define lang:dialect 65535) 
          { GRAPH <http://dbpedia.org> 
            { iri (bif:concat ("http://dbpedia.org/", "resource", "/", "Novosibirsk")) a ?t  } 
          } 
       }
  ))

i.e., selection of some rdf:type of <http://dbpedia.org/resource/Novosibirsk> in DBpedia.

Macro Libraries

> _[Virt] CreateMacroLib   ::=  'CREATE' 'MACRO' 'LIBRARY' IRIref '{' Defmacro* '}'_
> _[Virt] QmAttachMacroLib         ::=  'ATTACH' 'MACRO' 'LIBRARY' QmIRIrefConst_
> _[Virt] QmDetachMacroLib         ::=  'DETACH' 'SILENT'? 'MACRO' 'LIBRARY' QmIRIrefConst?_
> _[Virt] DropMacroLib     ::=  'DROP' 'SILENT'? 'MACRO' 'LIBRARY' PrecodeExpn_

Macro definitions can be grouped into named “libraries”, stored in the graph of system metadata, loaded into memory with all other RDF storage metadata, and used in SPARQL queries.

To make a library, one should write a SPARQL query with some DEFMACRO statements grouped at its beginning.

This query should signal an error or return a nonempty result set (e.g., some diagnostic strings) if the library cannot be used on the given database. An empty result means “no troubles”.

A macro library can be referred to in queries in two ways.

First, a quad storage can have an associated macro library, so every SPARQL query that reads data from that storage can call macros from the library.

This feature can be disabled by placing –

define input:disable-storage-macro-lib <any value>

– at the beginning of the query; this is especially useful when the next version of an existing macro library is being developed and should not interfere with the one already in use.

Another approach is to specify the usage explicitly, by placing one or more –

define input:macro-lib <macro-library-iri>

– at the beginning of the query.

A library can refer to other libraries, combining big collections from smaller parts. It is an error if a macro defined in one part is redefined in another, but it is not an error if one part is included more than once in a tree of nested parts.

However, a macro associated with a storage can be redefined in some included library (but not directly in a query).

Trivial Macro Library Example

DEFINE sql:signal-void-variables 1
BASE <http://base/>
PREFIX ex: <http://example.com/sample/>
PREFIX macro: <http://example.com/macro/>
CREATE MACRO LIBRARY <http://example.com/numbers> {
    defmacro macro:PI() 3.1415926
    defmacro macro:Six() (2 + 2 * 2) }
ASK FROM <nowhere> 
WHERE { GRAPH <nosuch> { ?s <nosuch> ?o } . 
FILTER (!bound(?s)) }

It is similar to a plain query with macro definitions, but the list of macro definitions is inside the CREATE MACRO LIBRARY <http://example.com/numbers> { ... } group.

The following statement makes all macros of the library automatically available in all queries over the default quad storage –

ALTER QUAD STORAGE virtrdf:DefaultQuadStorage
  { ATTACH MACRO LIBRARY <http://example.com/numbers> }

The inverse operation is –

ALTER QUAD STORAGE virtrdf:DefaultQuadStorage
 { DETACH MACRO LIBRARY <http://example.com/numbers> }

In some cases, it is important to detach whatever library is attached and not signal an error if nothing was attached before:

ALTER QUAD STORAGE virtrdf:DefaultQuadStorage
 { DETACH SILENT MACRO LIBRARY }

These two variants of DETACH MACRO LIBRARY do not destroy the library itself; they only disable the automatic usage of the library in queries.

A library detached from one storage can stay attached to other storages or be included explicitly. However, the following statement will both delete the library from the metadata and detach it from all quad storages before deletion –

DROP MACRO LIBRARY <http://example.com/numbers>

When a library is changed or dropped, the SQL compiler marks for future recompilation all “related” cached SQL and SPARQL queries and stored procedures that use the macro library. A query is “related” if it directly mentions the library or if it deals with a storage that has the library attached.

The total number of queries to recompile can be large, resulting in significant loss of server performance for some period of time.

That is normal, and it is no more inconvenient than the compilation of the Virtuoso/PL code of applications at server startup.

Rule-and-Exception Conflict Resolution

Consider three triple template macros – one for triple patterns like { ?s <magicProperty> ?o }, one for { <SpecialSubject> <magicProperty> ?o }, and one for { ?s <magicProperty> <SpecialObject> } – so the first macro is a “common rule”, whereas the other two are for “exceptional cases”. Which one should be expanded instead of the triple pattern { <SpecialSubject> <magicProperty> <SpecialObject> }?

The SPARQL compiler does not check whether one template macro is an “exception” from another, and it does not check declarations for pairwise conflicts.

When the compiler checks whether a given triple pattern should be treated as a macro invocation, it scans the list of macro definitions of the query (i.e., everything that comes from define input:macro-lib ... and then from defmacro), and if nothing is found, then the macro library attached to the quad storage, if any.

The first found match is used, and the rest do not matter.

Thus, as in the majority of languages that use pattern matching, define your exceptions before your rules. In the above example, the developer should resolve the rule-and-exception conflict by placing the macro for { ?s <magicProperty> ?o } after the two others, and choose the expansion of <SpecialSubject> <magicProperty> <SpecialObject> by placing the desired “exceptional case” in front of the other. (The developer may also wish to create a fourth triple template macro specifically for the <SpecialSubject> <magicProperty> <SpecialObject> template and put it above all three others.)
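The resolution above can be sketched as follows (the macro bodies and <storedProperty> are hypothetical; only the three templates come from the example):

```sparql
DEFMACRO <special_subject> { <SpecialSubject> <magicProperty> ?o }
  { <SpecialSubject> <storedProperty> ?o }
DEFMACRO <special_object> { ?s <magicProperty> <SpecialObject> }
  { ?s <storedProperty> <SpecialObject> . ?s a <Special> }
DEFMACRO <common_rule> { ?s <magicProperty> ?o }
  { ?s <storedProperty> ?o }
```

With this order, the pattern <SpecialSubject> <magicProperty> <SpecialObject> expands via <special_subject>, the first matching definition in the list.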

Passing Expressions and Group Patterns to Macro as Parameters

SPARQL-BI uses variables in many different ways:

  • A variable can be used as a field of a triple pattern.
  • A variable can appear to be used as a field of a triple pattern, but that is only a guess, because the triple pattern may be an invocation of a triple template macro.
  • A variable is an argument of an expression, so its value is used there but cannot be changed there.
  • A variable is a macro parameter and the parameter is used as a whole group pattern. Let ?x be listed as a parameter in the head of a macro, let the macro body contain ?x . in place of a triple pattern in a group pattern, and let the macro call contain { <s> <p> ?o } as an argument; the macro-expansion will place the passed <s> <p> ?o triple pattern into the instantiation of the group pattern that ?x belongs to.
  • A variable name is used as an alias, like name ?x in BIND ... AS ?x, in SELECT ... AS ?x, in VALUES (... ?x ...), in options of transitive subquery like T_STEP(?x), etc.

When a variable is used as a macro parameter, a developer may create combinations of expressions that would be syntax errors in plain SPARQL.

Consider a macro that contains a clause BIND (2+2 AS ?x) where ?x is a macro parameter, and the macro invocation specifies that ?x is actually 5.

The result of a mechanical substitution, BIND (2+2 AS 5), is absolutely meaningless. The SPARQL-BI compiler tries to make a more meaningful substitution when that is possible, so in this case it will check whether FILTER ((2+2) = 5) is suitable instead.
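In other words (the macro name is hypothetical):

```sparql
DEFMACRO <four> (?x) { BIND (2+2 AS ?x) }
# MACRO <four> (?y) expands to BIND (2+2 AS ?y)
# MACRO <four> (5) expands to FILTER ((2+2) = 5)
```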

Obviously, some misuses of macro parameters cannot be fixed.

Say, if a macro parameter is used as a group pattern in one place and as a variable name in another, then the macro cannot be instantiated under any circumstances.
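A sketch of such an unfixable macro (names hypothetical): ?x is consumed as a group pattern in one place and as an alias in another, so no argument can satisfy both uses:

```sparql
DEFMACRO <broken> (?x)
  { MACRO ?x .                      # here ?x must be a group pattern
    { SELECT (COUNT(*) AS ?x)       # here ?x must be a variable name
      WHERE { ?s ?p ?o } } }
```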

Property Variables

When a SPARQL query is not written from beginning to end for a single purpose but is mostly composed from common-purpose macro definitions, it may happen that one and the same property of one subject is queried many times in many macro invocations.

The straightforward solution is to ignore the problem: redundant retrieval of the same property might add extra work for the SQL compiler, but it is relatively cheap at runtime, because the second read is almost always from a hot cache.

However, this may bring two sorts of errors if a subject has more than one value for a property. First of all, there will be more solutions than needed. If a property <P> of subject ?s has two values, <A> and <B>, and the expansion of three macro calls produces the triple patterns –

?s <P> ?local_variable_from_macro_1 .
?s <P> ?local_variable_from_macro_2 .
?s <P> ?local_variable_from_macro_3 .

– then the query will deal with 2*2*2 = 8 solutions instead of 2.

| ?local_variable_from_macro_1 | ?local_variable_from_macro_2 | ?local_variable_from_macro_3 |
| --- | --- | --- |
| A | A | A |
| A | A | B |
| A | B | A |
| A | B | B |
| B | A | A |
| B | A | B |
| B | B | A |
| B | B | B |

Even worse, logical problems may occur when different macros in different parts of the same expression deal with different values of the property: not a consistent <A> everywhere or <B> everywhere, but a mix.

The correct but inconvenient solution is to write a triple pattern that gets the property at the beginning of a SPARQL query and then passes the fetched property value as a parameter to every macro that needs it.

The disadvantages are numerous: one should write the triple pattern by hand; one should pass both the subject and the property value to every macro, instead of passing just the subject; one should be ready to debug errors added by this complication. Multiple properties add to the mess.

SPARQL-BI offers a special extension, named “property variables”, to pass only the subject, not its properties, and still get the behavior of a single triple pattern per property.

If ?s is a variable (or a macro parameter that will be used to pass a variable) and P is a property name, then the operator +> makes an expression ?s+>P that is referred to as a “property variable”.

This expression tells the SPARQL compiler to create an invisible variable ?x and add a triple pattern ?s P ?x . to the group pattern where the expression is used (say, where the macro is called, if the expression is not buried into some group pattern inside the macro). The value bound to ?x becomes the value of the expression.

If the same ?s+>P expression is used a second time in the same group pattern, then a second triple pattern is NOT added, and the previous ?x variable is reused.

Property variables can be “chained”, from left to right. The expression ?s+>P1+>P2+>P3 will first create an invisible variable ?x for ?s+>P1 and add a triple pattern for it if needed; then ?x+>P2 will create an invisible ?y, following the same rules; finally, ?y+>P3 will make an invisible ?z for the desired sub-sub-property of ?s. The three property variables ?s+>P1, ?s+>P1+>P2, and ?s+>P1+>Q2 will “share” the triple pattern made for ?s+>P1.

There is also the *> property variable operator, to deal with properties that may have no known value. An expression ?s*>P tells the SPARQL compiler to add OPTIONAL { ?s P ?x } instead of the mandatory ?s P ?x. This is the only difference.

Property variables can be used outside any macro as well as inside macro bodies. They can be used inside a SERVICE {...} clause even if the remote SPARQL web service endpoint in question does not support SPARQL-BI, because the Virtuoso SPARQL compiler will convert them to plain triple patterns anyway.
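A sketch with hypothetical ex: properties: the chained property variable stands for the city of a person's address, and the *> variant tolerates a missing phone:

```sparql
PREFIX ex: <http://example.com/sample/>
SELECT ?p (?p+>ex:address+>ex:city AS ?city) (?p*>ex:phone AS ?phone)
WHERE { ?p a ex:Person }
```

The compiler adds ?p ex:address ?x . ?x ex:city ?y . and OPTIONAL { ?p ex:phone ?z } to the WHERE group pattern and substitutes the invisible variables into the SELECT list.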

Debugging

New features of the language add new ways of making errors, so the default Virtuoso web service endpoint, /sparql/, offers some debugging tools.

First of all, it provides a link to the Macros page, which enumerates all known macro libraries and individual macro declarations, and warns about potential risks of misuse of macro parameters.

Next, it has checkboxes to signal errors if some variables are never bound, or if some variables are not logically connected to each other but have the same name and thus look like they are connected.

Finally, it has a checkbox to generate a SPARQL compilation report (instead of executing the query). The report is long, and most of it is for the OpenLink support team, but its beginning contains the details of the query as the compiler understands it.