Installation & Configuration of Virtuoso RDF Graph Replication Cluster

This document provides a quick start guide detailing the Installation & Configuration of a Virtuoso RDF Graph Replication Cluster with a single Master Publisher and Slave subscriber to demonstrate the command line process, enabling the process to be scripted if required.

Installation

Install Virtuoso for required Operating System as detailed in installation guides. In this document the Master publisher and Slave subscriber nodes with be setup on the same instance. From Virtuoso installation Directory:

  1. Make a copy of the “database” directory
  2. Rename the “database” directory to “master-pub” and rename copy to “slave-sub”
  3. Edit the “virtuoso.ini” file in the “slave-sub” directory and set the SQL port to “1112” and HTTP port to “8892”
  4. Configure an ODBC DSN for connecting the the “master-pub” instance on port 1111 , in the “odbc.ini” file located in the “/etc/odbc.ini” or “bin/odbc.ini” directory of the Virtuoso installation:
[ODBC Data Sources]
..
MASTER_DSN = OpenLink Virtuoso
..

[MASTER_DSN]
Driver = OpenLink Virtuoso
Address = MASTER_IP:1111

Configure Master Publisher Node

  1. Make the following changes in the “virtuoso.ini” file of the “master-pub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = MASTER
ServerEnable = 1
QueueMax = 5000000
  1. Connect to “master-pub” instance on port 1111 , to initiate replication publication
$ isql MASTER-IP:1111
  1. Initiate publisher replication publication by running the following commands:
rdf_repl_start(); -- enable this instance as a publisher
rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all'); -- add all graphs to replication list

Configure Slave Subscriber Node

  1. Make the following changes in the “virtuoso.ini” file of the “slave-sub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[Replication]
; each SLAVE machine needs to have a unique replication server name
ServerName = SLAVE-1 
ServerEnable = 1
QueueMax = 5000000
  1. Connect to “slave-sub” instance on port 1112 , to subscribe to the publisher
$ isql SLAVE-IP:1112
  1. Initiate subscription to the master publisher by running the following commands:
repl_server ('MASTER', 'MASTER_DSN'); -- connect to master
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba'); -- start subscribing to __rdf_repl
repl_sync_all (); -- start initial replication
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1); -- add subscription to scheduler

Test Replication Successful

  1. Perform SPARQL insert of triple on master publisher node:
$ isql MASTER-IP:1111

SQL> SPARQL INSERT INTO GRAPH <http://example.org> { <1> <2> <3> };
Done. -- 1ms
SQL>
  1. Check triple has been replicated to slave subscriber node:
$ isql SLAVE-IP:1112

SQL> SPARQL SELECT * FROM <http://example.org> WHERE { ?s ?p ?o };
s p o
VACHAR VARCHAR VARCHAR

-------------------------------------------------------------------------------------------------
<1> <2> <3>
Done. -- 1ms
SQL>

Typical Publisher and Subscriber Logs

Master Publisher

$ tail -f virtuoso.log
22:18:57 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:18:57 Checkpoint started
22:18:57 Checkpoint finished, log reused
22:19:00 HTTP/WebDAV server online at 8890
22:19:00 Server online at 1111 (pid 4058)
22:19:01 ZeroConfig registration virtuoso-pub (OPLLINUX6)
22:39:18 Started replication log '__rdf_repl.log'.
.
.
.
23:00:47 Started replication log '__rdf_repl20141117230047154001.log'.
23:00:47 Subscription of 'SLAVE-1' for '__rdf_repl' sync starts at 2.
23:00:48 Started replication log '__rdf_repl20141117230048575997.log'.
23:00:49 Subscription of 'SLAVE-1' for '__rdf_repl' level 83 moved to sync set.

Slave Subscriber

$ tail -f virtuoso.log
22:23:39 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:23:39 Checkpoint started
22:23:39 Checkpoint finished, log reused
22:23:42 HTTP/WebDAV server online at 8892
22:23:42 Server online at 1112 (pid 4112)
22:23:42 ZeroConfig name conflict on virtuoso-sub (OPLLINUX6)
22:49:29 Connected to replication server 'localhost:1111'.
22:49:29 Requesting sync from 'MASTER' for '__rdf_repl' level 0.
22:49:29 Replication Account MASTER __rdf_repl IN sync, level 1.
.
.
.
23:00:45 Replication server MASTER disconnected, level of __rdf_repl is 2.
23:00:47 Connected to replication server 'localhost:1111'.
23:00:47 Requesting sync from 'MASTER' for '__rdf_repl' level 2.
23:00:47 Sync request sent to publishing server 'MASTER' for account '__rdf_repl' as 'dba'.
23:00:49 Replication Account MASTER __rdf_repl IN sync, level 83.

Making a Subscriber from copy of Publisher node

A replication subscriber node can be made from a copy of the publisher node it subscribes to and is a replica of as follows:

  • Assume the publisher replication server name is MASTER and the subscriber replication server name is SLAVE, as in the example above.
  • Copy publisher database to the subscriber database directory
  • Start the publisher database
  • Run the following command to disable the replication publications
delete from sys_repl_accounts;
delete from sys_repl_subscribers;
registry_remove ('DB.DBA.RDF_REPL');
  • restart the database
  • Determine the replication level the publisher was at with the query:
select sequence_set ('repl_MASTER___rdf_repl', 0, 2);
  • Select the subscriber replication level to be one less than the publisher level with the query:
sequence_set ('repl_MASTER___rdf_repl', 3462, 0);
  • Setup the new subscriber replication to the original publisher with the normal commands:
repl_server ('MASTER', 'MASTER_DSN'); -- connect to master
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba'); -- start subscribing to __rdf_repl
repl_sync_all (); -- start initial replication
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1); -- add subscription to scheduler
  • Test the replication is working by inserting triple on publisher and check it is replicated to the subscriber, and also confirm repl_stat() reports the subscriber to be IN-SYNC.

Making a Bi-Directional (Publisher/Subscriber) replication node from an existing copy

The steps are similar to making a subscriber from a publisher, except that as the bi-directional replication node needs to both subscribe to and publish data, the copy also needs to be setup as a publisher and any existing nodes in the replication cluster need to re-subscribe to it.

Firstly, the bi-directional replication node database copy needs to be made subscriber node as per the steps in the previous section.

Then the bi-directional replication node database copy can be made a publisher and other nodes of the cluster re-subscribe to it as follows:

  • Any existing replication log files from a previous existence of the cluster node being replaced must be removed or renamed, the file names being repl.cfg and ___rdf_repl*
$ rm repl.cfg 
$ rm __rdf_repl*.log
  • Setup the database copy RDF replication publication with the commands:
rdf_repl_start();
rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all');
  • Run repl_stat() to confirm the publication is setup:
SQL> repl_stat();
server   account  level       stat
VARCHAR  VARCHAR  INTEGER     INTEGER
_______________________________________________________________________________

MASTER    __rdf_repl  62          IN SYNC
SLAVE    __rdf_repl  2           OFF
SLAVE    SLAVE    0           OFF

3 Rows. -- 2 msec.
SQL>
  • On any replication cluster node that previously had a subscription to the publication, connect to it and run the following commands to reset its replication level:
sequence_set ('repl_SLAVE___rdf_repl', 1, 0);
repl_sync_all();
  • Run repl_stat() to confirm the subscription is setup:
SQL> repl_stat();
server   account  level       stat
VARCHAR  VARCHAR  INTEGER     INTEGER
_______________________________________________________________________________

MASTER    MASTER    0           OFF
MASTER    __rdf_repl  63          OFF
SLAVE    __rdf_repl  1           IN SYNC

3 Rows. -- 2 msec.
SQL>
  • Test the replication is working by inserting triples on both nodes of the bi-directional replication nodes and check they have replicated in both directions, and also confirm repl_stat() run on both nodes reports them to be IN-SYNC and replication levels are incrementing.

Disconnection of Subscriber from Publisher

Given the BO publisher replication server name of “MASTER”, which I see in the logs, a replication subscription on a FOx subscriber node can be disconnected with the command:

   REPL_DISCONNECT('MASTER'); 

The Virtuoso scheduled event would also have to be disabled first to prevent it from reconnecting, by toggling the SE_DISABLED column of the SYS_SCHEDULED_EVENT table with the following query:

    update  SYS_SCHEDULED_EVENT SET SE_DISABLED=1  where SE_NAME='repl_MASTER___rdf_repl'; 

The “REPL_STAT();” command can also be run verify the subscription has been disconnected.

For example the following is typical output of “isql” commands run:

SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME                                                                           SE_DISABLED
VARCHAR NOT NULL                                                                  INTEGER
_______________________________________________________________________________

repl_MASTER___rdf_repl                                                            0

1 Rows. -- 1 msec.
SQL> update  SYS_SCHEDULED_EVENT SET SE_DISABLED=1  where SE_NAME='repl_MASTER___rdf_repl'; 

Done. -- 1 msec. 
SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME                                                                           SE_DISABLED
VARCHAR NOT NULL                                                                  INTEGER
_______________________________________________________________________________

repl_MASTER___rdf_repl                                                            1

1 Rows. -- 1 msec.
SQL>  REPL_STAT(); 
server   account  level       stat        ts 
VARCHAR  VARCHAR  INTEGER     INTEGER     VARCHAR 
_______________________________________________________________________________ 

SLAVE    SLAVE    0           OFF          
MASTER   __rdf_repl  8063        IN SYNC     2023-05-17T11:20:06Z 

2 Rows. -- 1 msec. 
SQL> REPL_DISCONNECT('MASTER'); 

Done. -- 1 msec. 
SQL> REPL_STAT(); 
server   account  level       stat        ts 
VARCHAR  VARCHAR  INTEGER     INTEGER     VARCHAR 
_______________________________________________________________________________ 

SLAVE    SLAVE    0           OFF          
MASTER   __rdf_repl  8063        DISCONNECTED  2023-05-17T11:26:04Z 

2 Rows. -- 0 msec. 
SQL> 

When ready re-enable scheduler replication event to re-enable replication on the subscriber node with the query:

   update  SYS_SCHEDULED_EVENT SET SE_DISABLED=0  where SE_NAME='repl_MASTER___rdf_repl';

For example:

SQL> REPL_STAT(); 
server   account  level       stat        ts 
VARCHAR  VARCHAR  INTEGER     INTEGER     VARCHAR 
_______________________________________________________________________________ 

SLAVE    SLAVE    0           OFF          
MASTER   __rdf_repl  8063        DISCONNECTED  2023-05-17T11:26:04Z 

2 Rows. -- 0 msec. 
SQL> update  SYS_SCHEDULED_EVENT SET SE_DISABLED=0  where SE_NAME='repl_MASTER___rdf_repl'; 

Done. -- 0 msec. 
SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME                                                                           SE_DISABLED
VARCHAR NOT NULL                                                                  INTEGER
_______________________________________________________________________________

repl_MASTER___rdf_repl                                                            0

1 Rows. -- 1 msec.
SQL> REPL_STAT(); 
server   account  level       stat        ts 
VARCHAR  VARCHAR  INTEGER     INTEGER     VARCHAR 
_______________________________________________________________________________ 

SLAVE    SLAVE    0           OFF          
MASTER   __rdf_repl  8063        DISCONNECTED  2023-05-17T11:26:04Z 

2 Rows. -- 0 msec. 
SQL>  

After maximum of 10mins the subscriber node will reconnect and SYNC with the publisher node:

SQL> REPL_STAT(); 
server   account  level       stat        ts 
VARCHAR  VARCHAR  INTEGER     INTEGER     VARCHAR 
_______________________________________________________________________________ 

SLAVE    SLAVE    0           OFF          
MASTER   __rdf_repl  8064        IN SYNC     2023-05-17T11:49:14Z 

2 Rows. -- 1 msec. 
SQL> 

Troubleshooting RDF Graph Replication Issues

  • Use DB.DBA.REPL_STAT(); to check the replication status on the publisher or subscriber
  • Use DB.DBA.RDF_REPL_STOP(); function for stopping/unpublishing the RDF Graph replication publications. The call DB.DBA.REPL_UNPUBLISH ('__rdf_repl'); should not be used.
  • Use REPL_UNSUBSCRIBE('MASTER', '__rdf_repl') — to unsubscribe a subscriber from publisher
  • Replication items to be removed from a copy of MASTER publisher database to then make it a slave subscriber (or another MASTER):
delete from sys_repl_accounts;
delete from sys_repl_subscribers;
registry_remove ('DB.DBA.RDF_REPL’);
  • Check for information on DSNs being used for replication with the query:
 select * from SYS_SERVERS; 

Related