Bulk loading an RDF data file into a clustered instance

Hi,
I'm using Virtuoso 8.3 with a one-month license. I'm trying to load a TTL file (RDF data, about 3.5 GB) into a clustered instance using the DB.DBA.TTLP_MT procedure, but after the import starts, this error is printed:

*** Error 08C02: VD [Virtuoso Server]TURTLE RDF loader, line 523711: CLDPN: Cluster operation cancelled because of peer disconnect 22204 at line 5 of Top-Level:

I noticed that one of the cluster nodes (there are 4) is disconnected.
Does anyone have an idea what causes this?

Thanks

There may be more information in the log files of the cluster nodes, primarily the one that seems to be down/disconnected.

That said, I would suggest that you try using Virtuoso’s bulk-loading features, instead of the manual DB.DBA.TTLP_MT load function.
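For the log check suggested above, a minimal sketch follows. It assumes each node writes a plain-text log (commonly named virtuoso.log, though the name and location depend on each node's configuration) and simply prints lines that mention an error or a disconnect. The function name `scan_log` and the search pattern are illustrative choices, not part of Virtuoso.

```shell
#!/bin/sh
# Print (with line numbers) the lines of one log file that mention
# an error or a disconnect. The pattern is an assumption; adjust as needed.
scan_log() {
    grep -n -i -E 'error|disconnect' "$1"
}

# Scan every log file passed on the command line, one per cluster node.
for log in "$@"; do
    [ -f "$log" ] || continue
    echo "== $log"
    scan_log "$log" || echo "(no matching lines)"
done
```

Saved under a hypothetical name such as scan_logs.sh, it could be run as `sh scan_logs.sh <node1 log> <node2 log> ...`, starting with the node that appears to be disconnected.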

Hi,
I used the bulk function

SQL> LD_DIR ('/usr/graphdb2/', 'data_000004.ttl', 'graph1');

SQL> RDF_LOADER_RUN();

Two files were loaded, but when I try the third file, I get this error:

40001 TURTLE RDF loader, line 256175: Over 10 deadlocks in rdf load, please retry load.

This is part of the log file:

18:56:21 OpenLink Virtuoso Universal Server
18:56:21 Version 08.03.3314-pthreads for Linux as of May 24 2019
18:56:21 uses parts of OpenSSL, PCRE, Html Tidy
18:56:21 Registered to http://my.openlinksw.com/dataspace/person/hani2rihan2#this
18:56:21 Personal Edition license for unlimited connections
18:56:21 Issued by OpenLink Software
18:56:21 This license will expire on Mon Jan 27 00:00:00 2020 GMT
18:56:21 Enabled Cluster Extension
18:56:21 Enabled Column Store Extension
18:56:21 Enabled Virtual Database Extension
18:56:21 Enabled Replication Extension
18:56:21 Enabled Scalable ACL Extension
18:56:21 Enabled Custom Reasoning & Inference Rules
18:56:21 Database version 3126
18:56:21 SQL Optimizer enabled (max 1000 layouts)
18:56:23 Compiler unit is timed at 0.000370 msec
18:56:27 Error executing a server init statement : RDFXX: .....: iri_to_id () refers to part$
18:56:27 Error executing a server init statement : RDFXX: .....: iri_to_id () refers to part$
18:56:29 Roll forward started
18:56:29     40 transactions, 3559 bytes replayed (100 %)
18:56:29 Roll forward complete
18:56:29 Initial connection from 4
18:56:29 Txn w ids start at 16200
18:56:29 Error executing a server init statement : 08C01: Host 4: CL...: Host 4 is pending i$
18:56:29 Error executing a server init statement : 08C06: CLNJO: Cluster operations not allo$
18:56:29 Checkpoint started
18:56:29 Checkpoint finished, log reused
18:56:29 HTTP/WebDAV server online at 8897
18:56:29 Server online at 12201 (pid 24463)
18:56:30 ZeroConfig registration CLUSTER (VIRTUOSO-INSTANCE-LINUX-SMALL)
18:56:32 Host 1: Truncated 2pc log to items later than 0000-12-31 00:00:00Z
18:56:39 PL LOG: Installing Virtuoso Conductor version 1.00.8796 (DAV)
18:56:39 PL LOG: Installing with dependencies Virtuoso Conductor version 1.00.8796/2019-05-2$
18:56:39 Checkpoint started
18:56:39 Checkpoint finished, log reused
18:56:40 PL LOG: VAD_INSTALL: Can't create collection (/DAV/VAD/conductor) (42VAD)
18:56:40 PL LOG: Errors were detected during installation of "Virtuoso Conductor".
18:56:40 PL LOG: The installation of this VAD package has failed.
18:56:40 PL LOG: Please delete the transaction file
18:56:40 PL LOG: /root/virtuoso7/cluster_01/database.trx
18:56:40 PL LOG: and then restart your database server.
18:56:40 PL LOG: Note: Your database will be in its pre VAD installation
18:56:40 PL LOG: state after you restart.
18:57:00 PL LOG: Loader started
19:04:54 PL LOG:  File /usr/graphdb2//data_000001.ttl error 40001 TURTLE RDF loader, line 2$
19:04:54 PL LOG: No more files to load. Loader has finished,
19:12:19 PL LOG: Loader started
19:12:36 PL LOG:  File /usr/graphdb2//data_000004.ttl error 40001 TURTLE RDF loader,$
19:12:36 PL LOG: No more files to load. Loader has finished,

Any idea how to solve it?

Thanks

Looking at the errors in your Virtuoso log on startup, it appears there has been a problem initialising the scale-out cluster instance (see the "Error executing a server init statement" entries).

What is the state of the cluster as reported by the output of the status('cluster_d'); command?

What is your specific need for the Virtuoso scale-out cluster?

How many machines are the cluster nodes being installed on, and what is the available pooled memory and CPU count across all the machines hosting the cluster?

How many RDF triples are you seeking to host in Virtuoso?

Hi hwilliams,

The cluster status is:

Cluster 4 nodes, 2 s. 2 m/s 0 KB/s  0% cpu 0%  read 0% clw threads 1r 0w 0i buffers 3881 28 d 0 w 0 pfs
cl 1: 1 m/s 0 KB/s  0% cpu 0%  read 0% clw threads 1r 0w 0i buffers 900 8 d 0 w 0 pfs
cl 2: 0 m/s 0 KB/s  0% cpu 0%  read 0% clw threads 0r 0w 0i buffers 655 7 d 0 w 0 pfs
cl 3: 0 m/s 0 KB/s  0% cpu 0%  read 0% clw threads 0r 0w 0i buffers 821 6 d 0 w 0 pfs
cl 4: 0 m/s 0 KB/s  0% cpu 0%  read 0% clw threads 0r 0w 0i buffers 1505 7 d 0 w 0 pfs

I need the cluster for scalability with large databases, and to distribute my data over multiple cluster nodes or machines.

I'm currently using one machine with 2 vCPUs and 7.5 GB of memory.

About 13,000,000 RDF triples.

Thanks

Virtuoso 8.x has not been certified for use in an Elastic scale-out cluster configuration; thus, if you really need a scale-out cluster, Virtuoso 7.2 should be used instead.

Even though the cluster appears to be online, the startup errors in the log indicate a problem, which will most probably corrupt the database over time.

How many triples are you expecting to host in Virtuoso? It typically requires 10 GB of RAM per billion triples, thus Virtuoso Single Server on an appropriately sized machine (in terms of memory and CPUs) generally suffices for most use cases …
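As a rough worked example of that rule of thumb, the sketch below multiplies a triple count by 10 GB per billion triples. The function name `estimate_ram_gb` is hypothetical, and the result is only an order-of-magnitude estimate, not a sizing guarantee.

```shell
#!/bin/sh
# Rough RAM estimate using the rule of thumb quoted above:
# about 10 GB of RAM per billion triples.
estimate_ram_gb() {
    # $1 = number of triples; prints the estimate in GB with two decimals
    LC_ALL=C awk -v t="$1" 'BEGIN { printf "%.2f\n", t * 10 / 1000000000 }'
}

# The roughly 13,000,000 triples mentioned earlier in this thread
estimate_ram_gb 13000000    # prints 0.13
```

At about 0.13 GB, the data set discussed here fits comfortably within the 7.5 GB of memory on the single machine described earlier.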

Hi,
Thanks for that. I installed the cluster on version 7.2 and it starts fine, but when I run ./isql 12201, this error appears:
./isql: error while loading shared libraries: libtermcap.so.2: cannot open
shared object file: No such file or directory.
I tried it on more than one instance, but I get the same error.

Thanks

What Linux distribution are you running on? It looks as if it is missing the Linux termcap library, which you should be able to install manually with the OS package installer.
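To confirm which shared libraries are missing before installing anything, a small sketch using the standard `ldd` tool is shown below. The function name `missing_libs` is hypothetical, and the script assumes it is run from the directory that contains the isql binary.

```shell
#!/bin/sh
# Print the names of shared libraries that the dynamic loader
# cannot resolve for the given binary (lines ldd marks "not found").
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Assumption: ./isql is the binary that failed to start.
if [ -x ./isql ]; then
    libs=$(missing_libs ./isql)
    if [ -n "$libs" ]; then
        echo "missing libraries:"
        echo "$libs"
    else
        echo "no missing libraries reported"
    fi
fi
```

Any library listed, such as the libtermcap.so.2 named in the error above, then needs to be provided by an OS package; the package name varies by distribution and release.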

Hi,
I'm using Debian GNU/Linux. I switched to another build of Virtuoso 7.2 (64-bit glibc 2.12 x86_64) and it works well.

Thanks