Virtuoso Replication of Public Only RDF Graphs

OpenLink · October 19, 2024, 4:15pm

What

Virtuoso can replicate only the public RDF graphs from a MASTER publisher node to SLAVE subscriber nodes. This is enabled on the publisher with the new DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world'); option.

Why

When some RDF graphs created on the MASTER publisher node must not be replicated to the SLAVE subscriber nodes, Virtuoso provides a new http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world option. With this option enabled, any RDF graph in the special Virtuoso http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs graph group is not replicated to SLAVE subscriber nodes.

How

Replication of public graphs only is enabled by passing the http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world parameter to the DB.DBA.RDF_REPL_GRAPH_INS() function on the MASTER publisher:

   DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');

After that, any RDF graph added to the special http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs graph group is not replicated to SLAVE subscriber nodes. Graphs are added to the group with the DB.DBA.RDF_GRAPH_GROUP_INS() function:

  DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'private_graph_name');
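
To confirm which graphs are currently treated as private, the membership of the group can be listed. The following is a minimal sketch and not part of the original walkthrough; it assumes the DB.DBA.RDF_GRAPH_GROUP_MEMBER system table and the iri_to_id() and id_to_iri() built-ins are available in your Virtuoso build, and that it is run as a user with access to that table (for example dba).

```sql
-- List every graph currently in the PrivateGraphs graph group.
-- Assumption: the DB.DBA.RDF_GRAPH_GROUP_MEMBER system table with
-- columns RGGM_GROUP_IID and RGGM_MEMBER_IID exists in this build.
SELECT id_to_iri (RGGM_MEMBER_IID) AS private_graph
  FROM DB.DBA.RDF_GRAPH_GROUP_MEMBER
 WHERE RGGM_GROUP_IID = iri_to_id ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs');
```

Any graph returned by this query should be excluded from replication once the rdf_repl_world option is active.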

The following shows how SPARQL update queries and RDF Bulk Load operations performed on a private graph on the MASTER publisher are not replicated to the SLAVE subscriber(s):

Setup Graph Replication Publisher and Subscriber

On Publisher

Stop any active RDF Graph replication publications on the MASTER publisher node:

SQL> DB.DBA.RDF_REPL_STOP();

Done. -- 8 msec.
SQL>

Start a new RDF Graph replication publication for public (world) graphs only:

SQL> rdf_repl_start();

Done. -- 31 msec.
SQL> DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');

Done. -- 6 msec.
SQL> repl_stat();
server   account  level       stat
VARCHAR  VARCHAR  INTEGER     INTEGER
_______________________________________________________________________________

MASTER   MASTER   0           OFF
MASTER   __rdf_repl  2           OFF

2 Rows. -- 1 msec.
SQL>

On Subscriber

On the SLAVE subscriber node(s), subscribe to the RDF Graph replication publication published by the MASTER publisher node:

SQL> repl_server ('MASTER', 'MASTER_DSN');  

Done. -- 11 msec.
SQL> repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');

Done. -- 81 msec.
SQL> repl_sync_all ();

Done. -- 1 msec.
SQL> DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);

Done. -- 2 msec.
SQL>

Test SPARUL INSERTS to Private and Public Graphs

On Publisher

Add graph http://private_graph to the Virtuoso private graph group:

SQL> DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');

Done. -- 4 msec.
SQL>

Insert an RDF triple into each of the http://private_graph and http://public_graph test graphs:

SQL> SPARQL INSERT INTO GRAPH <http://private_graph> { <1> <2> <3> };

Done. -- 2 msec.
SQL> SPARQL SELECT * FROM  <http://private_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL>
SQL> SPARQL INSERT INTO GRAPH <http://public_graph> { <1> <2> <3> };

Done. -- 3 msec.
SQL>
SQL> SPARQL SELECT * FROM  <http://public_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL>

On Subscriber

Check which graphs have been replicated to the subscriber:

SQL> SPARQL SELECT * FROM  <http://public_graph>  WHERE { ?s ?p ?o };
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

1                                                                                 2                                                                                 3

1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 8 msec.
SQL>

As can be seen, the data in the private_graph is not replicated to the subscriber; only the data in the public_graph is.

Test RDF Bulk Load into private and public graphs

On Publisher

Set up a Virtuoso RDF Bulk Load of a sample dataset into the private_graph on the publisher node:

SQL> ld_dir ('.', 'Dataset.ttl', 'http://private_graph');

Done. -- 3 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./Dataset.ttl                                                      http://private_graph                                                              0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);

Done. -- 62 msec.
SQL>

Set up a Virtuoso RDF Bulk Load of the same sample dataset into the public_graph on the publisher node:

SQL> delete from load_list;

Done. -- 0 msec.
SQL> ld_dir ('.', 'Dataset.ttl', 'http://public_graph');

Done. -- 1 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./Dataset.ttl                                                      http://public_graph                                                               0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);

Done. -- 101 msec.
SQL>

Check the triple counts of the private and public graphs on the publisher:

SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM  <http://public_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL>

On Subscriber

Check on the subscriber node what data has been replicated from the publisher node:

SQL> SPARQL SELECT count(*) FROM  <http://private_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 16 msec.
SQL> SPARQL SELECT count(*) FROM  <http://public_graph>  WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________

1754

1 Rows. -- 1 msec.
SQL>

As can be seen, the data bulk loaded into the private_graph on the MASTER publisher node is not replicated to the SLAVE subscriber node, whereas the data bulk loaded into the public_graph is.
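
The separate count queries above can also be combined into a single check. This is a sketch only, using standard SPARQL 1.1 aggregation rather than anything specific to replication, with the two test graph names from this walkthrough. Run it on both the publisher and the subscriber and compare the rows.

```sql
-- Per-graph triple counts for the two test graphs in one query.
-- A graph that holds no triples on the node produces no row at all,
-- so a missing row for http://private_graph on the subscriber is expected.
SPARQL
SELECT ?g (COUNT(*) AS ?triples)
WHERE
  {
    GRAPH ?g { ?s ?p ?o }
    FILTER (?g IN (<http://private_graph>, <http://public_graph>))
  }
GROUP BY ?g;
```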

Notes

Protection of private graphs on the publisher node starts only from the point at which RDF graph replication is enabled.
When a graph is added to the private graph group on the publisher node, it is automatically removed from the subscriber node. Conversely, when a graph is removed from the private graph group on the publisher node (i.e. it becomes public), it is automatically replicated to the subscriber node.
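
The second note can be exercised with the test graph used earlier. The following is a hedged sketch, assuming DB.DBA.RDF_GRAPH_GROUP_DEL() is the counterpart of DB.DBA.RDF_GRAPH_GROUP_INS() in your Virtuoso build and takes the same two arguments; it was not run as part of the original test.

```sql
-- On the publisher: remove the graph from the private graph group,
-- which makes it public again.
DB.DBA.RDF_GRAPH_GROUP_DEL ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');

-- On the subscriber, once replication has caught up: count the triples
-- that have arrived for the formerly private graph.
SPARQL SELECT count(*) FROM <http://private_graph> WHERE { ?s ?p ?o };
```

If the behaviour described in the note holds, the count on the subscriber should eventually match the count on the publisher.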
