Producing RDF dumps of Virtuoso Quad-store hosted RDF model data

What

How to export RDF model data from Virtuoso’s Quad Store in NQuad and TTL formats.

Why

Every DBMS needs to offer a mechanism for bulk export and import of data.

When exporting RDF model data from Virtuoso’s Quad Store, having the ability to retain and reflect Named Graph IRI based data partitioning provides significant value to a variety of application profiles.

Virtuoso supports dumping and reloading graph model data (e.g., RDF) as well as relational data (e.g., SQL); the latter is discussed elsewhere.

How

Virtuoso provides two procedures for dumping Quad Store data to RDF dump files:

Dump One Graph (RDF_DUMP_GRAPH)

The RDF_DUMP_GRAPH procedure can be used to dump a named graph in the Virtuoso Quad Store to RDF dataset files in TTL format.

Procedure Signature and Parameters

The procedure signature is:

RDF_DUMP_GRAPH (IN srcgraph VARCHAR, IN out_file VARCHAR, IN file_length_limit INTEGER := 1000000000)

  • IN srcgraph VARCHAR – source graph
  • IN out_file VARCHAR – output file
  • IN file_length_limit INTEGER – maximum length of dump files

Usage Example

Call the RDF_DUMP_GRAPH procedure with appropriate arguments:

$ pwd 
/opt/virtuoso/database

$ grep DirsAllowed virtuoso.ini
DirsAllowed              = ., ../vad,

$ /opt/virtuoso/bin/isql 1111
Connected to OpenLink Virtuoso
Driver: 08.03.3323 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> RDF_DUMP_GRAPH ('http://daas.openlinksw.com/data#', './data_', 1000000000); 
Done. -- 1438 msec.

As a result, a dump of the graph http://daas.openlinksw.com/data# is written to files named data_NNNNNN.ttl (located in your Virtuoso database directory):

$ ls
data_000001.ttl
data_000002.ttl
....
data_000001.ttl.graph

Dump to NQuads (RDF_DUMP_NQUADS)

The RDF_DUMP_NQUADS procedure uses SPARQL to dump ALL graphs, excluding the internal predefined "virtrdf:" graph.

Procedure Signature and Parameters

The procedure signature is:

RDF_DUMP_NQUADS (IN dir VARCHAR := 'dumps', IN start_from INT := 1, IN file_length_limit INTEGER := 100000000, IN comp INT := 1)

  • IN dir VARCHAR – folder where the dumps will be stored. Note: The dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the Virtuoso server will not be able to create or access the data files.
  • IN start_from INTEGER – number from which output file numbering starts
  • IN file_length_limit INTEGER – maximum length of dump files
  • IN comp INTEGER – when set to 0, the output files are not gzip-compressed. Defaults to 1 (compress).
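
The DirsAllowed requirement noted above can be checked before running a dump. The following is a minimal Python sketch, not part of Virtuoso, that reads the DirsAllowed line from the configuration text and tests whether a dump directory falls inside one of the listed directories. It assumes relative entries are resolved against the database directory and approximates the server's check as "the path lies inside an allowed directory"; the server's actual matching rules may differ, and the function names are hypothetical.

```python
import os


def dirs_allowed(ini_text):
    """Return the entries of the first DirsAllowed line found in the text."""
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "dirsallowed":
            # Entries are comma-separated; a trailing comma yields an
            # empty entry, which is dropped here.
            return [part.strip() for part in value.split(",") if part.strip()]
    return []


def is_probably_allowed(dump_dir, allowed, base_dir):
    """Approximate check: is dump_dir inside any allowed directory?"""
    target = os.path.normpath(os.path.join(base_dir, dump_dir))
    for entry in allowed:
        root = os.path.normpath(os.path.join(base_dir, entry))
        if os.path.commonpath([root, target]) == root:
            return True
    return False


# Configuration line as shown in the RDF_DUMP_GRAPH usage example.
ini = "DirsAllowed              = ., ../vad,\n"
allowed = dirs_allowed(ini)
print(allowed)
print(is_probably_allowed("dumps", allowed, "/opt/virtuoso/database"))
print(is_probably_allowed("/tmp/dumps", allowed, "/opt/virtuoso/database"))
```

With the configuration shown earlier, the default dumps directory sits under the database directory (the "." entry), whereas a location such as /tmp/dumps would first have to be added to DirsAllowed.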

Usage Example

This example demonstrates calling the RDF_DUMP_NQUADS procedure to dump all graphs to a series of compressed NQuad dumps, each with an uncompressed length of at most 10 MB (e.g., ./dumps/rdf-dump-000001.nq.gz):

SQL> RDF_DUMP_NQUADS ('dumps', 1, 10000000, 1);

As a result, dataset dump files covering ALL the graphs in the Virtuoso Quad Store can be found in the dumps directory (located in your Virtuoso database directory):

$ ls -ltr
total 12740
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-000001.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 132 Aug 24 10:33 rdf-dump-000002.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 118 Aug 24 10:33 rdf-dump-000003.nq.gz
.
.
.
-rw-r--r-- 1 ubuntu ubuntu 113 Aug 24 10:33 rdf-dump-003138.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 110 Aug 24 10:33 rdf-dump-003139.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-003140.nq.gz
$
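
To sanity-check a dump, it can help to count how many quads each named graph contributes to a file. The following is a minimal Python sketch, not part of Virtuoso, that does this for one gzip-compressed N-Quads file. It assumes every statement carries a graph label as its last term, which holds for quads but not for plain triples, and it writes a small made-up sample file so that it runs on its own; the function name and sample IRIs are hypothetical.

```python
import gzip
import os
import tempfile
from collections import Counter


def graph_counts(path):
    """Count statements per graph label in a gzip-compressed N-Quads file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Drop the terminating "." and take the last term as the graph.
            body = line[:-1].rstrip() if line.endswith(".") else line
            counts[body.rsplit(None, 1)[-1]] += 1
    return counts


# Made-up sample data standing in for a real dump file.
sample = (
    '<http://example.org/s1> <http://example.org/p> "hello world" <http://example.org/g1> .\n'
    "<http://example.org/s1> <http://example.org/p> <http://example.org/o> <http://example.org/g1> .\n"
    '<http://example.org/s2> <http://example.org/p> "x" <http://example.org/g2> .\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rdf-dump-000001.nq.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(sample)
    counts = graph_counts(path)

for graph, n in sorted(counts.items()):
    print(graph, n)
```

For a real dump, call graph_counts on each file in the dumps directory and sum the resulting counters.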

Load datasets

Dumped dataset files can then be loaded into another Virtuoso instance using the Virtuoso RDF Bulk Loader process.
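
As a rough illustration of the reload step, the following minimal Python sketch generates the statements typically used with the bulk loader: ld_dir to register files, rdf_loader_run to load them, and checkpoint to persist the result. The directory, file mask, and fallback graph IRI are illustrative assumptions, the helper name is hypothetical, and the Bulk Loader documentation remains the authority on arguments and prerequisites (including DirsAllowed on the target instance).

```python
def bulk_load_statements(directory, mask, fallback_graph):
    """Return isql statements that register and load dump files."""
    for value in (directory, mask, fallback_graph):
        # Hypothetical safety check: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError(f"refusing to embed {value!r} in a SQL literal")
    return [
        f"ld_dir ('{directory}', '{mask}', '{fallback_graph}');",
        "rdf_loader_run ();",
        "checkpoint;",
    ]


# Placeholder values; adjust to the target instance.
load_script = bulk_load_statements(
    "./dumps", "*.nq.gz", "http://example.org/fallback#"
)
print("\n".join(load_script))
```

For N-Quads dumps the graph of each statement comes from the data itself, so the third ld_dir argument acts only as a fallback.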

Related