Community Forum Note: SPARQL-BI and Transitivity

Transitivity in SPARQL

Virtuoso SPARQL allows access to Virtuoso’s SQL transitivity extension (read the SQL Transitivity Note for details about this feature).

The syntax used in SPARQL is slightly different from the syntax used in SQL, although the option names and meanings are the same.

In SPARQL, the transitive options occur after a subquery enclosed in braces.

The query below produces all the IRI’s that are the same as <http://dbpedia.org/resource/New_York>:

SPARQL
SELECT ?syn
WHERE
  {
    {
      SELECT ?x
             ?syn
      WHERE
        {
          { ?x owl:sameAs ?syn }
          UNION
          { ?syn owl:sameAs ?x }
        }
    }
    OPTION ( TRANSITIVE ,
             T_IN ( ?x ) ,
             T_OUT ( ?syn) ,
             T_DISTINCT ,
             T_MIN ( 0)
           )
    FILTER ( ?x = <http://dbpedia.org/resource/New_York> ) .
  }

In this case, we provide a binding for ?x in the FILTER outside of the transitive subquery. The subquery therefore is made to run from in to out. The same effect would be accomplished if we bound ?syn and SELECT ?x. The designations of in and out are arbitrary, and for transitive steps that can be evaluated equally well in both directions, this makes no difference.

The transitive subquery in the query above is shown below:

SELECT ?syn
WHERE
  {
    { SELECT ?x
             ?syn
      WHERE
       {
         { ?x owl:sameAs ?syn }
         UNION
         { ?syn owl:sameAs ?x}
       }
    }
    OPTION ( TRANSITIVE ,
             T_IN ( ?x ) ,
             T_OUT ( ?syn ) ,
             T_DISTINCT ,
             T_MIN ( 0 ) )
    }
  }

If the OPTION were left out, it would just look for one step of owl:sameAs. Making it TRANSITIVE will apply the subquery to all bindings produced until all are visited at least once (the T_DISTINCT modifier).

If the TRANSITIVE step consists of a single triple pattern, there is a shorthand:

<alice> foaf:knows ?friend
OPTION ( TRANSITIVE
         T_MIN (1)
       )

This will bind ?friend to all directly and indirectly found foaf:known individuals. If T_MIN had been 0, <alice> would have also been in the generated bindings.

The syntax is:

OPTION (TRANSITIVE transitivity_option [,...] )

  transitivity_option ::= T_IN ( <variable_list> )
                        | T_OUT ( <variable_list> )
                        | T_DISTINCT
                        | T_SHORTEST_ONLY
                        | T_NO_CYCLES
                        | T_CYCLES_ONLY
                        | T_MIN ( intnum )
                        | T_MAX ( intnum )
                        | T_END_FLAG ( <variable> )
                        | T_STEP ( <variable_or_step> )
                        | T_DIRECTION intnum

  variable_list ::= <variable> [,...]

  variable_or_step ::= <variable> | 'path_id' | 'step_no'

Where column numbers are used in SQL, variable names are used in SPARQL. Otherwise all the options have the same meaning.

Some examples of the use of transitivity follow.

Collection of Transitivity Option Demo Queries for SPARQL

Example 1: owl:sameAs for <http://dbpedia.org/resource/New_York>

This query finds all graphs that contain owl:sameAs for <http://dbpedia.org/resource/New_York>.

Here we select all paths that start with the initial URI and pass through one or more owl:sameAs statements. Each step produces a result of the transitive subquery. The graph where the owl:sameAs triple was found is returned and used as the grouping column. In this way, we see how many times each graph is used. Note that graphs are counted many times since the graphs containing immediate owl:sameAs statements are counted for paths of length 1, then again as steps on paths that reach to their aliases and so on.

SELECT                ?g
                      ?x
       ( COUNT (*) AS ?count )
WHERE
  {
    {
      SELECT ?x
             ?alias
             ?g
      WHERE
        {
          {
            GRAPH ?g
              { ?x owl:sameAs ?alias }
          }
          UNION
          {
            GRAPH ?g
              {?alias owl:sameAs ?x}
          }
        }
    }
    OPTION ( TRANSITIVE,
             T_IN ( ?x ),
             T_OUT ( ?alias ),
             T_DISTINCT,
             T_MIN ( 1 )
           ) .
    FILTER ( ?x = <http://dbpedia.org/resource/New_York> ) .
  }

Example 2: people known by Tim Berners-Lee

This query takes all the people known by Tim Berners-Lee, to a depth between 1 and 4 applications of the subquery. It then sorts them by the distance and the descending count of connections of each found connection. This is equivalent to the default connections list shown by LinkedIn.

SPARQL
SELECT ?o
       ?dist
       ( ( SELECT ( COUNT (*) AS ?cnt )
           WHERE { ?o foaf:knows ?xx }
       ) )
WHERE
  {
    {
      SELECT ?s
             ?o
      WHERE
        {
          ?s foaf:knows ?o
        }
    } 
    OPTION ( TRANSITIVE ,
             T_DISTINCT ,
             T_IN ( ?s ) ,
             T_OUT ( ?o ) ,
             T_MIN ( 1 ) ,
             T_MAX ( 4 ) ,
             T_STEP ( 'step_no' ) AS ?dist ) .
    FILTER ( ?s = <http://www.w3.org/People/Berners-Lee/card#i> )
  }
ORDER BY ?dist DESC 3
LIMIT 50

Example 3: people known by Tim Berners-Lee (modified)

The following query takes all the people known by <http://www.w3.org/People/Berners-Lee/card#i>, to a depth between 2 and 4 applications of the subquery. It then sorts them by the distance and the descending count of connections of each found connection. This is equivalent to the default connections list shown by LinkedIn.

SPARQL
SELECT ?o 
       ?dist 
       ( ( SELECT ( COUNT (*) as ?cnt ) 
           WHERE { ?o foaf:knows ?xx }
       ) )
WHERE
  {
    {
      SELECT ?s ?o
      WHERE
        {
          ?s foaf:knows ?o
        }
    } 
    OPTION ( TRANSITIVE ,
             T_DISTINCT ,
             T_IN ( ?s ) ,
             T_OUT ( ?o ) ,
             T_MIN ( 2 ) ,
             T_MAX ( 4 ) ,
             T_STEP ( 'step_no' ) AS ?dist) .
    FILTER ( ?s = <http://www.w3.org/People/Berners-Lee/card#i> )
  }
ORDER BY ?dist DESC 3
LIMIT 50

Example 4: How do Tim Berners-Lee and Miguel Paraz Know Each Other?

The next query finds how two people know each other and what graphs are involved in the connection.

This query binds both the T_IN and T_OUT variables. The ?g is left as a free variable. Also, specifying ?s and the system defined constants step_no and path_id as with t_step, we get for each transitive step a row of results with the intermediate binding of ?s, the count of steps from the initial ?s and a distinct identifier for the individual path, since there can be many distinct paths that link the ?s and ?o specified in the filter.

SPARQL
SELECT ?link 
       ?g 
       ?step 
       ?path
WHERE
  {
    {
      SELECT ?s 
             ?o 
             ?g
      WHERE
        {
          GRAPH ?g
            { ?s foaf:knows ?o }
        }
    } 
    OPTION ( TRANSITIVE ,
             T_DISTINCT ,
             T_IN ( ?s ) ,
             T_OUT ( ?o ) ,
             T_NO_CYCLES ,
             T_SHORTEST_ONLY ,
             T_STEP ( ?s ) AS ?link ,
             T_STEP ( 'path_id' ) AS ?path ,
             T_STEP ( 'step_no' ) AS ?step ,
             T_DIRECTION 3) .
    FILTER ( ?s = <http://www.w3.org/People/Berners-Lee/card#i>
          && ?o = <http://www.advogato.org/person/mparaz/foaf.rdf#me> 
           )
  }
LIMIT 20

Related