Virtuoso Replication of Public Only RDF Graphs

What

Virtuoso can replicate only the public RDF graphs from a MASTER publisher node to SLAVE subscriber nodes. This is enabled on the publisher by calling DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');

Why

Some RDF graphs created on the MASTER publisher node must not be replicated to the SLAVE subscriber nodes. For these cases Virtuoso provides the http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world option: when it is in effect, any RDF graph in the special Virtuoso graph group http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs is not replicated to SLAVE subscriber nodes.

How

Replication of public graphs only is enabled by passing the http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world parameter to the DB.DBA.RDF_REPL_GRAPH_INS() function on the MASTER publisher:

   DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');

After that, any RDF graph added to the special http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs graph group with the DB.DBA.RDF_GRAPH_GROUP_INS() function is not replicated to SLAVE subscriber nodes. For example:

  DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'private_graph_name');
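
To confirm which graphs are currently treated as private, one option is to list the members of the PrivateGraphs group. The following is a minimal sketch, not part of the original procedure. It assumes the DB.DBA.RDF_GRAPH_GROUP_MEMBER system table with its RGGM_GROUP_IID and RGGM_MEMBER_IID columns, and the iri_to_id() and id_to_iri() built-ins, none of which are mentioned elsewhere on this page. Verify them against your Virtuoso version before relying on the query.

```sql
-- Assumption: graph group membership is stored in DB.DBA.RDF_GRAPH_GROUP_MEMBER,
-- keyed by the IRI IDs of the group and of each member graph.
-- Run on the MASTER publisher to list every graph registered as private.
SELECT id_to_iri (RGGM_MEMBER_IID) AS private_graph
  FROM DB.DBA.RDF_GRAPH_GROUP_MEMBER
 WHERE RGGM_GROUP_IID = iri_to_id ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs');
```

A graph that appears in this list should be excluded from replication. A graph that does not appear is treated as public.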

The following walkthrough shows that SPARQL update queries and RDF Bulk Load operations performed on a private graph on the MASTER publisher are not replicated to the SLAVE subscriber(s):

Setup Graph Replication Publisher and Subscriber

On Publisher

  • Stop any active RDF Graph replication publications on the MASTER publisher node:
SQL> DB.DBA.RDF_REPL_STOP();

Done. -- 8 msec.
SQL>
  • Start new RDF Graph replication publication of public (world) only graphs:
SQL> rdf_repl_start();

Done. -- 31 msec.
SQL> DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');

Done. -- 6 msec.
SQL> repl_stat();
server   account  level       stat
VARCHAR  VARCHAR  INTEGER     INTEGER
_______________________________________________________________________________

MASTER   MASTER   0           OFF
MASTER   __rdf_repl  2           OFF

2 Rows. -- 1 msec.
SQL>

On Subscriber

  • On the SLAVE subscriber node(s), subscribe to the RDF Graph replication publication of the MASTER publisher node:
SQL> repl_server ('MASTER', 'MASTER_DSN');  

Done. -- 11 msec.
SQL> repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');

Done. -- 81 msec.
SQL> repl_sync_all ();

Done. -- 1 msec.
SQL> DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);

Done. -- 2 msec.
SQL>

Test SPARUL INSERTS to Private and Public Graphs

On Publisher

  • Add graph http://private_graph to the Virtuoso private graph group:
SQL> DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');

Done. -- 4 msec.
SQL> 
  • Insert an RDF triple into each of the http://private_graph and http://public_graph test graphs:
SQL> SPARQL INSERT INTO GRAPH <http://private_graph> { <1> <2> <3> };

Done. -- 2 msec.
SQL> SPARQL SELECT * FROM  <http://private_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL>
SQL> SPARQL INSERT INTO GRAPH <http://public_graph> { <1> <2> <3> };

Done. -- 3 msec.
SQL>
SQL> SPARQL SELECT * FROM  <http://public_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL>

On Subscriber

  • Check which graphs have been replicated to the subscriber:
SQL> SPARQL SELECT * FROM  <http://public_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 8 msec.
SQL>

As can be seen, only the data in http://public_graph is replicated to the subscriber. The data in http://private_graph is not.

Test RDF Bulk Load into private and public graphs

On Publisher

  • Set up a Virtuoso RDF Bulk Load of the sample dataset into http://private_graph on the publisher node:
SQL> ld_dir ('.', 'Dataset.ttl', 'http://private_graph');

Done. -- 3 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./Dataset.ttl                                                      http://private_graph                                                              0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);

Done. -- 62 msec.
SQL>
  • Set up a Virtuoso RDF Bulk Load of the sample dataset into http://public_graph on the publisher node:
SQL> delete from load_list;

Done. -- 0 msec.
SQL> ld_dir ('.', 'Dataset.ttl', 'http://public_graph');

Done. -- 1 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./Dataset.ttl                                                      http://public_graph                                                               0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);

Done. -- 101 msec.
SQL>
  • Check the triple counts of the private and public graphs on the publisher:
SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM  <http://public_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL>

On Subscriber

  • Check on the subscriber node what data has been replicated from the publisher node:
SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 16 msec.
SQL> SPARQL SELECT count(*) FROM  <http://public_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL>  

As can be seen, the data bulk loaded into http://private_graph on the MASTER publisher node is not replicated to the SLAVE subscriber node, whereas the data bulk loaded into http://public_graph is.

Notes

  • Private graphs on the publisher node are protected only from the point at which RDF graph replication is enabled.
  • When a graph is added to the private graph group on the publisher node, it is automatically removed from the subscriber node. Conversely, when a graph is removed from the private graph group on the publisher node (making it public), it is automatically replicated to the subscriber node.
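
The second note can be exercised with the test graphs used above. This is a minimal sketch under one assumption: that DB.DBA.RDF_GRAPH_GROUP_DEL(), the counterpart of DB.DBA.RDF_GRAPH_GROUP_INS() and not otherwise shown on this page, takes the same group and graph arguments. No output is shown because the results depend on replication timing in your setup.

```sql
-- On the MASTER publisher: make http://private_graph public again.
-- Assumption: RDF_GRAPH_GROUP_DEL mirrors the argument order of RDF_GRAPH_GROUP_INS.
DB.DBA.RDF_GRAPH_GROUP_DEL ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');

-- On the SLAVE subscriber, after the next synchronization:
-- per the note above, the graph's triples are expected to appear here.
SPARQL SELECT count(*) FROM <http://private_graph> WHERE { ?s ?p ?o };

-- On the MASTER publisher: put the graph back into the private group.
-- Per the note above, it is then expected to be removed from the subscriber.
DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');
```

Re-running the count on the subscriber after each step is a simple way to check that the behavior described in the note holds in your installation.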

Related