What
Virtuoso can replicate only the public RDF graphs from a MASTER publisher node to SLAVE subscriber nodes, by calling DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world'); on the publisher.
Why
When some RDF graphs created on the MASTER publisher node must not be replicated to the SLAVE subscriber nodes, Virtuoso provides the http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world option: any RDF graph in the special Virtuoso http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs graph group is not replicated to SLAVE subscriber nodes.
How
Replication of public graphs only is enabled by passing the http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world parameter to the DB.DBA.RDF_REPL_GRAPH_INS() function on the MASTER publisher, with the command:
DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');
Any RDF graph then added to the special http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs graph group with the DB.DBA.RDF_GRAPH_GROUP_INS() function will not be replicated to SLAVE subscriber nodes. A graph is added to the group with the command:
DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'private_graph_name');
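To confirm which graphs are currently treated as private, the group membership can be inspected directly. The following is a minimal sketch, assuming the DB.DBA.RDF_GRAPH_GROUP_MEMBER system table with its RGGM_GROUP_IID and RGGM_MEMBER_IID columns and the id_to_iri() and iri_to_id() built-ins, none of which are mentioned above; check them against your Virtuoso version before relying on it.

```sql
-- List every graph IRI that is a member of the private graph group.
-- Assumption: DB.DBA.RDF_GRAPH_GROUP_MEMBER stores (group, member) pairs as IRI IDs.
SELECT id_to_iri (RGGM_MEMBER_IID) AS private_graph
  FROM DB.DBA.RDF_GRAPH_GROUP_MEMBER
 WHERE RGGM_GROUP_IID = iri_to_id ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs');
```

Any graph returned by this query should be excluded from replication once the rdf_repl_world option is in effect.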
The following shows how SPARQL update queries and RDF Bulk Load operations performed on a private graph on the MASTER publisher are not replicated to the SLAVE subscriber(s):
Setup Graph Replication Publisher and Subscriber
On Publisher
- Stop any active RDF Graph replication publications on the MASTER publisher node:
SQL> DB.DBA.RDF_REPL_STOP();
Done. -- 8 msec.
SQL>
- Start new RDF Graph replication publication of public (world) only graphs:
SQL> rdf_repl_start();
Done. -- 31 msec.
SQL> DB.DBA.RDF_REPL_GRAPH_INS ('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_world');
Done. -- 6 msec.
SQL> repl_stat();
server account level stat
VARCHAR VARCHAR INTEGER INTEGER
_______________________________________________________________________________
MASTER MASTER 0 OFF
MASTER __rdf_repl 2 OFF
2 Rows. -- 1 msec.
SQL>
On Subscriber
- On the SLAVE subscriber node(s) subscribe to the RDF Graph replication publication by the MASTER publisher node:
SQL> repl_server ('MASTER', 'MASTER_DSN');
Done. -- 11 msec.
SQL> repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
Done. -- 81 msec.
SQL> repl_sync_all ();
Done. -- 1 msec.
SQL> DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);
Done. -- 2 msec.
SQL>
Test SPARUL INSERTS to Private and Public Graphs
On Publisher
- Add graph http://private_graph to the Virtuoso private graph group:
SQL> DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');
Done. -- 4 msec.
SQL>
- Insert an RDF triple into the http://private_graph and http://public_graph test graphs:
SQL> SPARQL INSERT INTO GRAPH <http://private_graph> { <1> <2> <3> };
Done. -- 2 msec.
SQL> SPARQL SELECT * FROM <http://private_graph> WHERE { ?s ?p ?o };
s p o
LONG VARCHAR LONG VARCHAR LONG VARCHAR
_______________________________________________________________________________
1 2 3
1 Rows. -- 1 msec.
SQL>
SQL> SPARQL INSERT INTO GRAPH <http://public_graph> { <1> <2> <3> };
Done. -- 3 msec.
SQL>
SQL> SPARQL SELECT * FROM <http://public_graph> WHERE { ?s ?p ?o };
s p o
LONG VARCHAR LONG VARCHAR LONG VARCHAR
_______________________________________________________________________________
1 2 3
1 Rows. -- 1 msec.
SQL>
On Subscriber
- Check what graphs have been replicated to the subscriber:
SQL> SPARQL SELECT * FROM <http://public_graph> WHERE { ?s ?p ?o };
s p o
LONG VARCHAR LONG VARCHAR LONG VARCHAR
_______________________________________________________________________________
1 2 3
1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM <http://private_graph> WHERE { ?s ?p ?o };
callret-0
INTEGER
_______________________________________________________________________________
0
1 Rows. -- 8 msec.
SQL>
As can be seen, the data in private_graph is not replicated to the subscriber; only the data in public_graph is.
Test RDF Bulk Load into private and public graphs
On Publisher:
- Set up a Virtuoso RDF Bulk Load of a sample dataset into the private_graph on the publisher node:
SQL> ld_dir ('.', 'Dataset.ttl', 'http://private_graph');
Done. -- 3 msec.
SQL> select * from load_list;
ll_file ll_graph ll_state ll_started ll_done ll_host ll_work_time ll_error
VARCHAR NOT NULL VARCHAR INTEGER TIMESTAMP TIMESTAMP INTEGER INTEGER VARCHAR
_
./Dataset.ttl http://private_graph 0 NULL NULL NULL NULL NULL
1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);
Done. -- 62 msec.
SQL>
- Set up a Virtuoso RDF Bulk Load of a sample dataset into the public_graph on the publisher node:
SQL> delete from load_list;
Done. -- 0 msec.
SQL> ld_dir ('.', 'Dataset.ttl', 'http://public_graph');
Done. -- 1 msec.
SQL> select * from load_list;
ll_file ll_graph ll_state ll_started ll_done ll_host ll_work_time ll_error
VARCHAR NOT NULL VARCHAR INTEGER TIMESTAMP TIMESTAMP INTEGER INTEGER VARCHAR
_
./Dataset.ttl http://public_graph 0 NULL NULL NULL NULL NULL
1 Rows. -- 0 msec.
SQL> rdf_loader_run(log_enable=>3);
Done. -- 101 msec.
SQL>
- Check the graph counts of the private and public graphs on the publisher:
SQL> SPARQL SELECT count(*) FROM <http://private_graph> WHERE { ?s ?p ?o };
callret-0
INTEGER
_
1754
1 Rows. -- 1 msec.
SQL> SPARQL SELECT count(*) FROM <http://public_graph> WHERE { ?s ?p ?o };
callret-0
INTEGER
_
1754
1 Rows. -- 1 msec.
SQL>
On Subscriber:
- Check on the subscriber node what data has been replicated from the publisher node:
SQL> SPARQL SELECT count(*) FROM <http://private_graph> WHERE { ?s ?p ?o };
callret-0
INTEGER
_
0
1 Rows. -- 16 msec.
SQL> SPARQL SELECT count(*) FROM <http://public_graph> WHERE { ?s ?p ?o };
callret-0
INTEGER
_
1754
1 Rows. -- 1 msec.
SQL>
As can be seen, the data bulk loaded into private_graph on the MASTER publisher node is not replicated to the SLAVE subscriber node, whereas the data loaded into public_graph is.
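The two separate count queries used in the checks above can also be combined into a single statement. This is only a sketch, assuming the test graph names from this walkthrough; run the same statement on the publisher and on each subscriber and compare the rows.

```sql
-- Triple counts for both test graphs in one result set.
-- A graph that holds no triples produces no row at all, so a missing
-- row on the subscriber means nothing was replicated for that graph.
SPARQL
SELECT ?g (COUNT(*) AS ?triples)
WHERE
  {
    GRAPH ?g { ?s ?p ?o } .
    FILTER (?g IN (<http://private_graph>, <http://public_graph>))
  }
GROUP BY ?g
ORDER BY ?g;
```

If replication behaves as in the walkthrough, the publisher should return a row for each graph, while the subscriber should return a row only for http://public_graph.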
Notes
- Protection of private graphs on the publisher node only starts from the point at which RDF graph replication is enabled.
- When a graph is added to the private graph group on the publisher node, it is automatically removed from the subscriber node, and vice versa: if a graph is removed from the private graph group on the publisher node (that is, it is now public), it is automatically replicated to the subscriber node.
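The reverse case in the second note can be tried by hand. The sketch below assumes the DB.DBA.RDF_GRAPH_GROUP_DEL() function, the counterpart of DB.DBA.RDF_GRAPH_GROUP_INS() that is not shown elsewhere on this page, and reuses the http://private_graph test graph from the examples above.

```sql
-- On the MASTER publisher: remove the graph from the private graph group,
-- which makes it a public graph again.
DB.DBA.RDF_GRAPH_GROUP_DEL ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'http://private_graph');

-- On the SLAVE subscriber, after the subscription has had time to sync:
-- the count should no longer be 0 if the graph was replicated.
SPARQL SELECT COUNT(*) FROM <http://private_graph> WHERE { ?s ?p ?o };
```

How long the subscriber takes to catch up depends on the sync interval set with DB.DBA.SUB_SCHEDULE().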
Related
- Installation & Configuration of Virtuoso RDF Graph Replication Cluster
- Virtuoso RDF Graph Replication
- Virtuoso Graph Replication Guide
- Create a Virtuoso RDF Graph Replication slave Subscriber node from a master Publisher node
- Virtuoso RDF Graph Replication log files Purge Function
- Virtuoso RDF Graph Groups
- Virtuoso additional replication keepalive settings
- Virtuoso RDF Bulk Loader