This document provides a quick start guide detailing the Installation & Configuration of a Virtuoso RDF Graph Replication Cluster with a single Master Publisher
and Slave subscriber to demonstrate the command line process, enabling the process to be scripted if required.
Installation
Install Virtuoso for required Operating System as detailed in installation guides. In this document the Master publisher
and Slave subscriber
nodes with be setup on the same instance. From Virtuoso installation Directory:
- Make a copy of the “database” directory
- Rename the “database” directory to “master-pub” and rename copy to “slave-sub”
- Edit the “virtuoso.ini” file in the “slave-sub” directory and set the SQL port to “1112” and HTTP port to “8892”
- Configure an ODBC DSN for connecting the the “master-pub” instance on port 1111 , in the “odbc.ini” file located in the “/etc/odbc.ini” or “bin/odbc.ini” directory of the Virtuoso installation:
[ODBC Data Sources]
..
MASTER_DSN = OpenLink Virtuoso
..
[MASTER_DSN]
Driver = OpenLink Virtuoso
Address = MASTER_IP:1111
Configure Master Publisher Node
- Make the following changes in the “virtuoso.ini” file of the “master-pub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = MASTER
ServerEnable = 1
QueueMax = 5000000
- Connect to “master-pub” instance on port 1111 , to initiate replication publication
$ isql MASTER-IP:1111
- Initiate publisher replication publication by running the following commands:
rdf_repl_start(); -- enable this instance as a publisher
rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all'); -- add all graphs to replication list
Configure Slave Subscriber Node
- Make the following changes in the “virtuoso.ini” file of the “slave-sub” directory
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[Replication]
; each SLAVE machine needs to have a unique replication server name
ServerName = SLAVE-1
ServerEnable = 1
QueueMax = 5000000
- Connect to “slave-sub” instance on port 1112 , to subscribe to the publisher
$ isql SLAVE-IP:1112
- Initiate subscription to the master publisher by running the following commands:
repl_server ('MASTER', 'MASTER_DSN'); -- connect to master
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba'); -- start subscribing to __rdf_repl
repl_sync_all (); -- start initial replication
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1); -- add subscription to scheduler
Test Replication Successful
- Perform SPARQL insert of triple on master publisher node:
$ isql MASTER-IP:1111
SQL> SPARQL INSERT INTO GRAPH <http://example.org> { <1> <2> <3> };
Done. -- 1ms
SQL>
- Check triple has been replicated to slave subscriber node:
$ isql SLAVE-IP:1112
SQL> SPARQL SELECT * FROM <http://example.org> WHERE { ?s ?p ?o };
s p o
VACHAR VARCHAR VARCHAR
-------------------------------------------------------------------------------------------------
<1> <2> <3>
Done. -- 1ms
SQL>
Typical Publisher and Subscriber Logs
Master Publisher
$ tail -f virtuoso.log
22:18:57 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:18:57 Checkpoint started
22:18:57 Checkpoint finished, log reused
22:19:00 HTTP/WebDAV server online at 8890
22:19:00 Server online at 1111 (pid 4058)
22:19:01 ZeroConfig registration virtuoso-pub (OPLLINUX6)
22:39:18 Started replication log '__rdf_repl.log'.
.
.
.
23:00:47 Started replication log '__rdf_repl20141117230047154001.log'.
23:00:47 Subscription of 'SLAVE-1' for '__rdf_repl' sync starts at 2.
23:00:48 Started replication log '__rdf_repl20141117230048575997.log'.
23:00:49 Subscription of 'SLAVE-1' for '__rdf_repl' level 83 moved to sync set.
Slave Subscriber
$ tail -f virtuoso.log
22:23:39 PL LOG: Installing Virtuoso Conductor version 1.00.8727 (DAV)
22:23:39 Checkpoint started
22:23:39 Checkpoint finished, log reused
22:23:42 HTTP/WebDAV server online at 8892
22:23:42 Server online at 1112 (pid 4112)
22:23:42 ZeroConfig name conflict on virtuoso-sub (OPLLINUX6)
22:49:29 Connected to replication server 'localhost:1111'.
22:49:29 Requesting sync from 'MASTER' for '__rdf_repl' level 0.
22:49:29 Replication Account MASTER __rdf_repl IN sync, level 1.
.
.
.
23:00:45 Replication server MASTER disconnected, level of __rdf_repl is 2.
23:00:47 Connected to replication server 'localhost:1111'.
23:00:47 Requesting sync from 'MASTER' for '__rdf_repl' level 2.
23:00:47 Sync request sent to publishing server 'MASTER' for account '__rdf_repl' as 'dba'.
23:00:49 Replication Account MASTER __rdf_repl IN sync, level 83.
Making a Subscriber from copy of Publisher node
A replication subscriber node can be made from a copy of the publisher node it subscribes to and is a replica of as follows:
- Assume the publisher replication server name is
MASTER
and the subscriber replication server name isSLAVE
, as in the example above. - Copy publisher database to the subscriber database directory
- Start the publisher database
- Run the following command to disable the replication publications
delete from sys_repl_accounts;
delete from sys_repl_subscribers;
registry_remove ('DB.DBA.RDF_REPL');
- restart the database
- Determine the replication level the publisher was at with the query:
select sequence_set ('repl_MASTER___rdf_repl', 0, 2);
- Select the subscriber replication level to be one less than the publisher level with the query:
sequence_set ('repl_MASTER___rdf_repl', 3462, 0);
- Setup the new subscriber replication to the original publisher with the normal commands:
repl_server ('MASTER', 'MASTER_DSN'); -- connect to master
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba'); -- start subscribing to __rdf_repl
repl_sync_all (); -- start initial replication
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1); -- add subscription to scheduler
- Test the replication is working by inserting triple on publisher and check it is replicated to the subscriber, and also confirm
repl_stat()
reports the subscriber to beIN-SYNC
.
Making a Bi-Directional (Publisher/Subscriber) replication node from an existing copy
The steps are similar to making a subscriber from a publisher, except that as the bi-directional replication node needs to both subscribe to and publish data, the copy also needs to be setup as a publisher and any existing nodes in the replication cluster need to re-subscribe to it.
Firstly, the bi-directional replication node database copy needs to be made subscriber node as per the steps in the previous section.
Then the bi-directional replication node database copy can be made a publisher and other nodes of the cluster re-subscribe to it as follows:
- Any existing replication log files from a previous existence of the cluster node being replaced must be removed or renamed, the file names being
repl.cfg
and___rdf_repl*
$ rm repl.cfg
$ rm __rdf_repl*.log
- Setup the database copy RDF replication publication with the commands:
rdf_repl_start();
rdf_repl_graph_ins('http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all');
- Run
repl_stat()
to confirm the publication is setup:
SQL> repl_stat();
server account level stat
VARCHAR VARCHAR INTEGER INTEGER
_______________________________________________________________________________
MASTER __rdf_repl 62 IN SYNC
SLAVE __rdf_repl 2 OFF
SLAVE SLAVE 0 OFF
3 Rows. -- 2 msec.
SQL>
- On any replication cluster node that previously had a subscription to the publication, connect to it and run the following commands to reset its replication level:
sequence_set ('repl_SLAVE___rdf_repl', 1, 0);
repl_sync_all();
- Run
repl_stat()
to confirm the subscription is setup:
SQL> repl_stat();
server account level stat
VARCHAR VARCHAR INTEGER INTEGER
_______________________________________________________________________________
MASTER MASTER 0 OFF
MASTER __rdf_repl 63 OFF
SLAVE __rdf_repl 1 IN SYNC
3 Rows. -- 2 msec.
SQL>
- Test the replication is working by inserting triples on both nodes of the bi-directional replication nodes and check they have replicated in both directions, and also confirm
repl_stat()
run on both nodes reports them to beIN-SYNC
and replication levels are incrementing.
Disconnection of Subscriber from Publisher
Given the BO publisher replication server name of “MASTER”, which I see in the logs, a replication subscription on a FOx subscriber node can be disconnected with the command:
REPL_DISCONNECT('MASTER');
The Virtuoso scheduled event would also have to be disabled first to prevent it from reconnecting, by toggling the SE_DISABLED
column of the SYS_SCHEDULED_EVENT
table with the following query:
update SYS_SCHEDULED_EVENT SET SE_DISABLED=1 where SE_NAME='repl_MASTER___rdf_repl';
The “REPL_STAT();” command can also be run verify the subscription has been disconnected.
For example the following is typical output of “isql” commands run:
SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME SE_DISABLED
VARCHAR NOT NULL INTEGER
_______________________________________________________________________________
repl_MASTER___rdf_repl 0
1 Rows. -- 1 msec.
SQL> update SYS_SCHEDULED_EVENT SET SE_DISABLED=1 where SE_NAME='repl_MASTER___rdf_repl';
Done. -- 1 msec.
SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME SE_DISABLED
VARCHAR NOT NULL INTEGER
_______________________________________________________________________________
repl_MASTER___rdf_repl 1
1 Rows. -- 1 msec.
SQL> REPL_STAT();
server account level stat ts
VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
SLAVE SLAVE 0 OFF
MASTER __rdf_repl 8063 IN SYNC 2023-05-17T11:20:06Z
2 Rows. -- 1 msec.
SQL> REPL_DISCONNECT('MASTER');
Done. -- 1 msec.
SQL> REPL_STAT();
server account level stat ts
VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
SLAVE SLAVE 0 OFF
MASTER __rdf_repl 8063 DISCONNECTED 2023-05-17T11:26:04Z
2 Rows. -- 0 msec.
SQL>
When ready re-enable scheduler replication event to re-enable replication on the subscriber node with the query:
update SYS_SCHEDULED_EVENT SET SE_DISABLED=0 where SE_NAME='repl_MASTER___rdf_repl';
For example:
SQL> REPL_STAT();
server account level stat ts
VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
SLAVE SLAVE 0 OFF
MASTER __rdf_repl 8063 DISCONNECTED 2023-05-17T11:26:04Z
2 Rows. -- 0 msec.
SQL> update SYS_SCHEDULED_EVENT SET SE_DISABLED=0 where SE_NAME='repl_MASTER___rdf_repl';
Done. -- 0 msec.
SQL> select SE_NAME, SE_DISABLED from SYS_SCHEDULED_EVENT where SE_NAME='repl_MASTER___rdf_repl';
SE_NAME SE_DISABLED
VARCHAR NOT NULL INTEGER
_______________________________________________________________________________
repl_MASTER___rdf_repl 0
1 Rows. -- 1 msec.
SQL> REPL_STAT();
server account level stat ts
VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
SLAVE SLAVE 0 OFF
MASTER __rdf_repl 8063 DISCONNECTED 2023-05-17T11:26:04Z
2 Rows. -- 0 msec.
SQL>
After maximum of 10mins the subscriber node will reconnect and SYNC with the publisher node:
SQL> REPL_STAT();
server account level stat ts
VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
SLAVE SLAVE 0 OFF
MASTER __rdf_repl 8064 IN SYNC 2023-05-17T11:49:14Z
2 Rows. -- 1 msec.
SQL>
Troubleshooting RDF Graph Replication Issues
- Use
DB.DBA.REPL_STAT();
to check the replication status on the publisher or subscriber - Use
DB.DBA.RDF_REPL_STOP();
function for stopping/unpublishing the RDF Graph replication publications. The callDB.DBA.REPL_UNPUBLISH ('__rdf_repl');
should not be used. - Use
REPL_UNSUBSCRIBE('MASTER', '__rdf_repl')
— to unsubscribe a subscriber from publisher - Replication items to be removed from a copy of MASTER publisher database to then make it a slave subscriber (or another MASTER):
delete from sys_repl_accounts;
delete from sys_repl_subscribers;
registry_remove ('DB.DBA.RDF_REPL’);
- Check for information on DSNs being used for replication with the query:
select * from SYS_SERVERS;