What
How to export RDF model data from Virtuoso’s Quad Store in NQuad or TTL formats.
Why
Every DBMS needs to offer a mechanism for bulk export and import of data.
When exporting RDF model data from Virtuoso’s Quad Store, having the ability to retain and reflect Named Graph IRI based data partitioning provides significant value to a variety of application profiles.
Virtuoso supports dumping and reloading graph model data (e.g., RDF), as well as relational data (e.g., SQL), which is discussed elsewhere.
How
Virtuoso provides a number of procedures for dumping RDF Quad Store data to RDF dataset dumps, as detailed below.
RDF_DUMP_GRAPH()
The RDF_DUMP_GRAPH procedure can be used to dump a named graph in the Virtuoso Quad Store to RDF dataset files in TTL format.
Procedure Signature and Parameters
The procedure signature is:
RDF_DUMP_GRAPH (IN srcgraph VARCHAR, IN out_file VARCHAR, IN file_length_limit INTEGER := 1000000000)
IN srcgraph VARCHAR – source graph
IN out_file VARCHAR – output file
IN file_length_limit INTEGER – maximum length of dump files
Usage Example
Call the RDF_DUMP_GRAPH procedure with appropriate arguments:
$ pwd
/opt/virtuoso/database
$ grep DirsAllowed virtuoso.ini
DirsAllowed = ., ../vad,
$ /opt/virtuoso/bin/isql 1111
Connected to OpenLink Virtuoso
Driver: 08.03.3323 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> RDF_DUMP_GRAPH ('http://daas.openlinksw.com/data#', './data_', 1000000000);
Done. -- 1438 msec.
As a result, a dump of the graph http://daas.openlinksw.com/data# will be found in files named data_XXXXXX.ttl (located in your Virtuoso database directory):
$ ls
data_000001.ttl
data_000002.ttl
....
data_000001.ttl.graph
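The listing above shows a data_000001.ttl.graph file next to the TTL files. The following shell sketch pairs each dump file with the contents of its .graph companion, which can help when checking what was dumped. It assumes (this is not stated above) that the .graph file holds the source graph IRI as plain text; list_dump_graphs is a hypothetical helper name, not part of Virtuoso.

```shell
#!/bin/sh
# Hypothetical helper: for every .ttl file in directory $1, print
# "<file name><TAB><contents of the matching .graph file>".
# Assumption: a .graph companion file stores the graph IRI as plain text.
list_dump_graphs() {
    for f in "$1"/*.ttl; do
        # Skip the unexpanded pattern when no .ttl files exist.
        [ -f "$f" ] || continue
        if [ -f "$f.graph" ]; then
            printf '%s\t%s\n' "$(basename "$f")" "$(cat "$f.graph")"
        else
            printf '%s\t%s\n' "$(basename "$f")" "(no .graph file)"
        fi
    done
}
```

For the example above, it could be called as list_dump_graphs /opt/virtuoso/database.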
RDF_DUMP_NQUADS
The RDF_DUMP_NQUADS procedure leverages SPARQL to dump ALL graphs, excluding the internal predefined "virtrdf:" graph.
Procedure Signature and Parameters
The procedure signature is:
RDF_DUMP_NQUADS (IN dir VARCHAR := 'dumps', IN start_from INT := 1, IN file_length_limit INTEGER := 100000000, IN comp INT := 1)
IN dir VARCHAR – folder where the dumps will be stored. Note: The dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the Virtuoso server will not be able to create or access the data files.
IN start_from INT – number from which output file numbering starts
IN file_length_limit INTEGER – maximum length of dump files
IN comp INT – when set to 0, no gzip compression is done. The default is 1.
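Because a missing DirsAllowed entry prevents the server from writing the dump files, it can be worth checking the configuration before starting a dump. The shell sketch below lists the DirsAllowed entries of an INI file and tests whether a given directory appears among them. The function names are hypothetical, and the check is a literal string comparison only: it does not resolve relative paths or reproduce whatever matching rules the server itself applies.

```shell
#!/bin/sh
# Print each DirsAllowed entry of the INI file $1, one per line.
list_dirs_allowed() {
    grep -E '^[[:space:]]*DirsAllowed[[:space:]]*=' "$1" \
        | head -n 1 \
        | cut -d= -f2- \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$'
}

# Exit status 0 if $2 appears literally among the DirsAllowed entries of $1.
dir_is_allowed() {
    list_dirs_allowed "$1" | grep -Fqx -- "$2"
}
```

For example, dir_is_allowed virtuoso.ini ./dumps || echo "add ./dumps to DirsAllowed" prints a reminder when the entry is absent.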
Usage Example
This example demonstrates calling the RDF_DUMP_NQUADS procedure to dump all graphs to a series of compressed NQuad dumps, each with an uncompressed length of up to 10 MB (e.g., ./dumps/output000001.nq.gz):
SQL> RDF_DUMP_NQUADS ('dumps', 1, 10000000, 1);
As a result, a dataset dump of ALL the graphs in the Virtuoso Quad Store can be found in the dumps directory (located in your Virtuoso database directory):
$ ls -ltr
total 12740
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-000001.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 132 Aug 24 10:33 rdf-dump-000002.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 118 Aug 24 10:33 rdf-dump-000003.nq.gz
.
.
.
-rw-r--r-- 1 ubuntu ubuntu 113 Aug 24 10:33 rdf-dump-003138.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 110 Aug 24 10:33 rdf-dump-003139.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-003140.nq.gz
$
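Since the NQuad dumps keep the Named Graph IRI on every line, a quick sanity check is to count quads per graph across the compressed files. The following shell sketch does this with gzip and awk. It assumes each statement sits on one line, ends with a ".", and carries a graph term as its last element; a line without a graph term would be counted under its object instead. count_quads_per_graph is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<count><TAB><graph term>" for all *.nq.gz files in directory $1.
count_quads_per_graph() {
    gzip -cd "$1"/*.nq.gz | awk '
        /^[ \t]*#/ { next }   # skip comment lines
        /^[ \t]*$/ { next }   # skip blank lines
        {
            # Drop the terminating "." so the graph term is the last field.
            sub(/[ \t]*\.[ \t]*$/, "")
            count[$NF]++
        }
        END { for (g in count) printf "%d\t%s\n", count[g], g }
    ' | sort -k2
}
```

Run from the Virtuoso database directory, count_quads_per_graph dumps gives one line per graph found in the dump files.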
RDF_DUMP_NQUADS_MT
RDF_DUMP_NQUADS_MT is a new NQuad dump procedure that enables multi-threaded dumping of graphs, which is useful when there are many graphs to be dumped. This function also ensures that all of a graph's NQuads are stored in the same file, so that blank node values are loaded at the same time and remain consistent.
The function signature and parameters are:
RDF_DUMP_NQUADS_MT (IN n_threads INTEGER, IN dir VARCHAR := 'dumps', IN file_length_limit INTEGER := 100000000, IN comp INTEGER := 1, IN fix INTEGER := 1)
IN n_threads INTEGER – number of threads used for the dump, typically equal to the number of available CPUs
IN dir VARCHAR – folder where the dumps will be stored. Note: The dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the Virtuoso server will not be able to create or access the data files.
IN file_length_limit INTEGER – maximum length of dump files
IN comp INTEGER – when set to 0, no gzip compression is done. The default is 1.
IN fix INTEGER – internal fix, enabled by default
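Because n_threads is typically set to the number of available CPUs, the statement can be prepared from the shell. The sketch below reads the online CPU count with getconf and prints an RDF_DUMP_NQUADS_MT call that follows the signature above, leaving comp and fix at 1. The helper names are hypothetical, and the printed statement is meant to be reviewed and then entered at the SQL> prompt.

```shell
#!/bin/sh
# Number of online CPUs; falls back to 1 if getconf cannot report it.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Print an RDF_DUMP_NQUADS_MT call.
# $1: n_threads, $2: dir, $3: file_length_limit (comp and fix stay at 1).
mt_dump_statement() {
    printf "RDF_DUMP_NQUADS_MT (%d, '%s', %d, 1, 1);\n" "$1" "$2" "$3"
}
```

For example, mt_dump_statement "$(cpu_count)" dumps 100000000 prints a call that uses one thread per online CPU.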
Usage Example
This example demonstrates calling the RDF_DUMP_NQUADS_MT procedure with 4 threads to dump all graphs to a series of compressed NQuad dumps, each with an uncompressed length of up to 100 MB (e.g., ./dumps/output000001.nq.gz):
SQL> RDF_DUMP_NQUADS_MT (4, 'dumps', 100000000, 1, 1);
As a result, a dataset dump of ALL the graphs in the Virtuoso Quad Store can be found in the dumps directory (located in your Virtuoso database directory):
$ ls -ltr
total 12740
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-000001.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 132 Aug 24 10:33 rdf-dump-000002.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 118 Aug 24 10:33 rdf-dump-000003.nq.gz
.
.
.
-rw-r--r-- 1 ubuntu ubuntu 113 Aug 24 10:33 rdf-dump-003138.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 110 Aug 24 10:33 rdf-dump-003139.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-003140.nq.gz
$
RDF_DUMP_NQUADS_MT2
RDF_DUMP_NQUADS_MT has a 2M (2 million) limit on the size of a vector(), which can be reached when dumping an RDF store with a large number of graphs. In that case, the RDF_DUMP_NQUADS_MT2() procedure can be used to get around this limit, although it runs slower. The function signature and parameters are:
RDF_DUMP_NQUADS_MT2 (IN n_threads INTEGER, IN n_per_slice INTEGER, IN dir VARCHAR := 'dumps', IN file_length_limit INTEGER := 100000000, IN comp INTEGER := 1, IN fix INTEGER := 1)
IN n_threads INTEGER – number of threads used for the dump, typically equal to the number of available CPUs
IN n_per_slice INTEGER – maximum number of graphs per dataset file
IN dir VARCHAR – folder where the dumps will be stored. Note: The dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the Virtuoso server will not be able to create or access the data files.
IN file_length_limit INTEGER – maximum length of dump files
IN comp INTEGER – when set to 0, no gzip compression is done. The default is 1.
IN fix INTEGER – internal fix, enabled by default
The parameters are the same as for RDF_DUMP_NQUADS_MT(), except that RDF_DUMP_NQUADS_MT2() takes a second argument indicating the maximum number of graphs that can be grouped together in the same dataset file.
Usage Example
This example demonstrates calling the RDF_DUMP_NQUADS_MT2 procedure with 4 threads and at most 100 graphs per dataset file to dump all graphs to a series of compressed NQuad dumps, each with an uncompressed length of up to 100 MB (e.g., ./dumps/output000001.nq.gz):
SQL> RDF_DUMP_NQUADS_MT2(4, 100, 'dumps', 100000000, 1, 1);
As a result, a dataset dump of ALL the graphs in the Virtuoso Quad Store can be found in the dumps directory (located in your Virtuoso database directory):
$ ls -ltr
total 12740
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-000001.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 132 Aug 24 10:33 rdf-dump-000002.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 118 Aug 24 10:33 rdf-dump-000003.nq.gz
.
.
.
-rw-r--r-- 1 ubuntu ubuntu 113 Aug 24 10:33 rdf-dump-003138.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 110 Aug 24 10:33 rdf-dump-003139.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-003140.nq.gz
$
Note: The RDF_DUMP_NQUADS_MT... functions are only available by default in Virtuoso 8.x. If you are using Virtuoso 7 (open source or commercial), the dump_nquads_mt.sql.zip (1.5 KB) script can be loaded manually to make them available.
Load datasets
Dumped dataset files can then be loaded into another Virtuoso instance using the Virtuoso RDF Bulk Loader process.
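As a rough illustration of that reload step, the shell sketch below prints the statements commonly used with the Bulk Loader: ld_dir to register the dump files, rdf_loader_run to load them, and checkpoint to persist the result. This is a sketch under assumptions rather than a complete procedure: bulk_load_sql is a hypothetical helper, the graph IRI in the usage note is a placeholder, and the dump directory must be readable by the target instance (including its DirsAllowed setting).

```shell
#!/bin/sh
# Print the SQL statements for a bulk load of dump files.
# $1: directory holding the dump files
# $2: file mask, e.g. *.nq.gz
# $3: graph IRI passed to ld_dir
# Arguments are inserted verbatim, so they must not contain single quotes.
bulk_load_sql() {
    printf "ld_dir ('%s', '%s', '%s');\n" "$1" "$2" "$3"
    printf 'rdf_loader_run ();\n'
    printf 'checkpoint;\n'
}
```

For example, bulk_load_sql dumps '*.nq.gz' 'http://example.org/fallback-graph' prints three statements that can be reviewed and then run at the SQL> prompt of the target instance.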