Installation & Configuration of Virtuoso RDF Graph Replication Cluster

This quick start guide covers the installation and configuration of a Virtuoso RDF Graph Replication Cluster with a single Master publisher and a single Slave subscriber. It uses the command line throughout, so the process can be scripted if required.

Installation

Install Virtuoso for the required operating system as detailed in the installation guides. In this document the Master publisher and Slave subscriber nodes will be set up on the same machine. From the Virtuoso installation directory:

  1. Make a copy of the “database” directory
  2. Rename the “database” directory to “master-pub” and rename copy to “slave-sub”
  3. Edit the “virtuoso.ini” file in the “slave-sub” directory and set the SQL port to “1112” and HTTP port to “8892”
  4. Configure an ODBC DSN for connecting to the “master-pub” instance on port 1111 in the “odbc.ini” file, located in the “bin” directory of the Virtuoso installation:
[ODBC Data Sources]
..
MASTER_DSN = OpenLink Virtuoso
..

[MASTER_DSN]
Driver = OpenLink Virtuoso
Address = MASTER_IP:1111
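
Steps 1 to 3 above can be scripted. The following is a minimal sketch, not a definitive implementation. It assumes the stock “virtuoso.ini” keeps the SQL port in the ServerPort key of the [Parameters] section and the HTTP port in the ServerPort key of the [HTTPServer] section. The helper names set_ini_key and prepare_nodes are hypothetical, as is the installation path passed to prepare_nodes.

```shell
# set_ini_key FILE SECTION KEY VALUE   (hypothetical helper)
# Rewrites "KEY = ..." inside [SECTION] only; other sections are left untouched.
set_ini_key() {
    awk -v section="[$2]" -v key="$3" -v value="$4" '
        /^\[/ { in_section = ($0 == section) }
        in_section && ($0 ~ ("^[ \t]*" key "[ \t]*=")) { print key " = " value; next }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# prepare_nodes VIRTUOSO_HOME   (hypothetical helper)
# Copies "database" to "slave-sub", renames the original to "master-pub",
# then moves the subscriber to SQL port 1112 and HTTP port 8892.
prepare_nodes() {
    cp -R "$1/database" "$1/slave-sub" || return 1
    mv "$1/database" "$1/master-pub" || return 1
    set_ini_key "$1/slave-sub/virtuoso.ini" Parameters ServerPort 1112 &&
    set_ini_key "$1/slave-sub/virtuoso.ini" HTTPServer ServerPort 8892
}

# Example call (the path is an assumption, adjust to the local installation):
# prepare_nodes /opt/virtuoso
```

The port edit is restricted to one section at a time because both sections use the same key name, so a plain search and replace on ServerPort would change the wrong line.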

Configure Master Publisher Node

  1. Make the following changes in the “virtuoso.ini” file of the “master-pub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = MASTER
ServerEnable = 1
QueueMax = 5000000
  2. Connect to the “master-pub” instance on port 1111 to initiate replication publication
$ isql MASTER-IP:1111
  3. Initiate replication publication by running the following commands:
rdf_repl_start(); -- enable this instance as a publisher
rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all'); -- add all graphs to replication list
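
The two publisher commands above can also be issued without an interactive session. This is a sketch under stated assumptions: the Virtuoso isql client is on the PATH and accepts an address, user, password and an EXEC= argument, and the account used is allowed to run the replication procedures. The ISQL variable and the start_publisher function are hypothetical names introduced for illustration.

```shell
ISQL=${ISQL:-isql}   # Virtuoso isql client; override if installed under another name

# start_publisher ADDR USER PASSWORD   (hypothetical helper)
# Enables the instance as a publisher, then adds all graphs to the
# replication list. The second call runs only if the first isql
# invocation exits successfully.
start_publisher() {
    "$ISQL" "$1" "$2" "$3" "EXEC=rdf_repl_start();" &&
    "$ISQL" "$1" "$2" "$3" \
        "EXEC=rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all');"
}

# Example call (address and credentials are assumptions):
# start_publisher MASTER-IP:1111 dba dba
```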

Configure Slave Subscriber Node

  1. Make the following changes in the “virtuoso.ini” file of the “slave-sub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[Replication]
; each SLAVE machine needs to have a unique replication server name
ServerName = SLAVE-1 
ServerEnable = 1
QueueMax = 5000000
  2. Connect to the “slave-sub” instance on port 1112 to subscribe to the publisher
$ isql SLAVE-IP:1112
  3. Initiate subscription to the master publisher by running the following commands:
repl_server ('MASTER', 'MASTER_DSN'); -- connect to master
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba'); -- start subscribing to __rdf_repl
repl_sync_all (); -- start initial replication
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1); -- add subscription to scheduler
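
When several subscribers have to be set up, the four subscription commands above can be generated from a template and passed to isql as a script file. The sketch below assumes the Virtuoso isql client accepts a SQL file after the address, user and password. The function names write_subscribe_sql and subscribe, and the ISQL variable, are hypothetical. The credentials inside the generated commands are copied unchanged from the commands above.

```shell
ISQL=${ISQL:-isql}   # Virtuoso isql client; override if installed under another name

# write_subscribe_sql FILE PUBLISHER DSN   (hypothetical helper)
# Writes the subscription commands for the given publisher name and DSN.
write_subscribe_sql() {
    cat > "$1" <<EOF
repl_server ('$2', '$3');
repl_subscribe ('$2', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
repl_sync_all ();
DB.DBA.SUB_SCHEDULE ('$2', '__rdf_repl', 1);
EOF
}

# subscribe ADDR USER PASSWORD PUBLISHER DSN   (hypothetical helper)
# Generates the script in a temporary file, runs it, and cleans up.
subscribe() {
    sql_file=$(mktemp) || return 1
    write_subscribe_sql "$sql_file" "$4" "$5"
    "$ISQL" "$1" "$2" "$3" "$sql_file"
    rc=$?
    rm -f "$sql_file"
    return $rc
}

# Example call (address and credentials are assumptions):
# subscribe SLAVE-IP:1112 dba dba MASTER MASTER_DSN
```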

Test That Replication Is Successful

  1. Perform a SPARQL insert of a triple on the master publisher node:
$ isql MASTER-IP:1111

SQL> SPARQL INSERT INTO GRAPH <http://example.org> { <1> <2> <3> };
Done. -- 1ms
SQL>
  2. Check that the triple has been replicated to the slave subscriber node:
$ isql SLAVE-IP:1112

SQL> SPARQL SELECT * FROM <http://example.org> WHERE { ?s ?p ?o };
s p o
VARCHAR VARCHAR VARCHAR

-------------------------------------------------------------------------------------------------
<1> <2> <3>
Done. -- 1ms
SQL>
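
In a scripted test the check on the subscriber can be retried a few times in case the triple takes a moment to arrive. The following is a sketch only: it assumes the Virtuoso isql client accepts an EXEC= argument, that the dba account with password dba is in use, and that the replicated value appears verbatim in the query output. The names wait_for_output, ISQL and POLL_SECONDS are hypothetical.

```shell
ISQL=${ISQL:-isql}   # Virtuoso isql client; override if installed under another name

# wait_for_output ADDR QUERY MARKER TRIES   (hypothetical helper)
# Runs QUERY against ADDR up to TRIES times and succeeds as soon as
# MARKER shows up in the output. Waits POLL_SECONDS (default 5) between tries.
wait_for_output() {
    tries=$4
    while [ "$tries" -gt 0 ]; do
        if "$ISQL" "$1" dba dba "EXEC=$2" | grep -q -- "$3"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-5}"
    done
    return 1
}

# Example call (address is an assumption):
# wait_for_output SLAVE-IP:1112 \
#     'SPARQL SELECT * FROM <http://example.org> WHERE { ?s ?p ?o };' '<3>' 12
```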

Typical Publisher and Subscriber Logs

Master Publisher

$ tail -f virtuoso.log
22:18:57 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:18:57 Checkpoint started
22:18:57 Checkpoint finished, log reused
22:19:00 HTTP/WebDAV server online at 8890
22:19:00 Server online at 1111 (pid 4058)
22:19:01 ZeroConfig registration virtuoso-pub (OPLLINUX6)
22:39:18 Started replication log '__rdf_repl.log'.
.
.
.
23:00:47 Started replication log '__rdf_repl20141117230047154001.log'.
23:00:47 Subscription of 'SLAVE-1' for '__rdf_repl' sync starts at 2.
23:00:48 Started replication log '__rdf_repl20141117230048575997.log'.
23:00:49 Subscription of 'SLAVE-1' for '__rdf_repl' level 83 moved to sync set.

Slave Subscriber

$ tail -f virtuoso.log
22:23:39 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:23:39 Checkpoint started
22:23:39 Checkpoint finished, log reused
22:23:42 HTTP/WebDAV server online at 8892
22:23:42 Server online at 1112 (pid 4112)
22:23:42 ZeroConfig name conflict on virtuoso-sub (OPLLINUX6)
22:49:29 Connected to replication server 'localhost:1111'.
22:49:29 Requesting sync from 'MASTER' for '__rdf_repl' level 0.
22:49:29 Replication Account MASTER __rdf_repl IN sync, level 1.
.
.
.
23:00:45 Replication server MASTER disconnected, level of __rdf_repl is 2.
23:00:47 Connected to replication server 'localhost:1111'.
23:00:47 Requesting sync from 'MASTER' for '__rdf_repl' level 2.
23:00:47 Sync request sent to publishing server 'MASTER' for account '__rdf_repl' as 'dba'.
23:00:49 Replication Account MASTER __rdf_repl IN sync, level 83.

Troubleshooting RDF Graph Replication Issues

  • Use DB.DBA.REPL_STAT(); to check the replication status on the publisher or subscriber
  • Use the DB.DBA.RDF_REPL_STOP(); function to stop/unpublish the RDF Graph replication publications. The call DB.DBA.REPL_UNPUBLISH ('__rdf_repl'); should not be used.
  • Use REPL_UNSUBSCRIBE ('MASTER', '__rdf_repl'); to unsubscribe a subscriber from the publisher
  • To turn a copy of the MASTER publisher database into a slave subscriber (or another MASTER), first remove its replication items:
delete from sys_repl_accounts;
delete from sys_repl_subscribers;
registry_remove ('DB.DBA.RDF_REPL');
  • Check for information on DSNs being used for replication with the query:
 select * from SYS_SERVERS; 
