What
How to export RDF model data from Virtuoso’s Quad Store in NQuad and TTL formats.
Why
Every DBMS needs to offer a mechanism for bulk export and import of data.
When exporting RDF model data from Virtuoso’s Quad Store, having the ability to retain and reflect Named Graph IRI based data partitioning provides significant value to a variety of application profiles.
Virtuoso supports dumping and reloading graph model data (e.g., RDF) as well as relational data (e.g., SQL); the relational case is discussed elsewhere.
How
Virtuoso provides two procedures for dumping Quad Store data to RDF dump files: RDF_DUMP_GRAPH and RDF_DUMP_NQUADS.
Dump One Graph (RDF_DUMP_GRAPH)
The RDF_DUMP_GRAPH procedure can be used to dump a named graph in the Virtuoso Quad Store to RDF dataset files in TTL format.
Procedure Signature and Parameters
The procedure signature is:
RDF_DUMP_GRAPH (IN srcgraph VARCHAR, IN out_file VARCHAR, IN file_length_limit INTEGER := 1000000000)
- IN srcgraph VARCHAR: source graph
- IN out_file VARCHAR: output file name prefix; a sequence number and the .ttl extension are appended to it
- IN file_length_limit INTEGER: maximum length of each dump file
Usage Example
Call the RDF_DUMP_GRAPH procedure with appropriate arguments:
$ pwd
/opt/virtuoso/database
$ grep DirsAllowed virtuoso.ini
DirsAllowed = ., ../vad,
$ /opt/virtuoso/bin/isql 1111
Connected to OpenLink Virtuoso
Driver: 08.03.3323 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> RDF_DUMP_GRAPH ('http://daas.openlinksw.com/data#', './data_', 1000000000);
Done. -- 1438 msec.
As a result, a dump of the graph http://daas.openlinksw.com/data# will be found in numbered files (data_000001.ttl, data_000002.ttl, and so on) located in your Virtuoso database directory:
$ ls
data_000001.ttl
data_000002.ttl
....
data_000001.ttl.graph
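Before copying or reloading the dump, it can help to enumerate its parts in order. The following is a minimal Python sketch, not part of Virtuoso; the function name list_dump_parts and its defaults are hypothetical and simply mirror the data_ prefix and .ttl extension used in the example above.

```python
import glob
import os


def list_dump_parts(directory, prefix="data_", extension=".ttl"):
    """Return the dump part files for one prefix, in sequence order."""
    pattern = os.path.join(directory, prefix + "*" + extension)
    # The sequence numbers are zero-padded, so a plain string sort
    # also yields numeric order. Companion files such as
    # data_000001.ttl.graph do not end in .ttl and are not matched.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for part in list_dump_parts("."):
        print(part, os.path.getsize(part), "bytes")
```

Run from the Virtuoso database directory, this prints each part with its size, which gives a quick check that file_length_limit split the dump as expected.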
Dump to NQuads (RDF_DUMP_NQUADS)
The dump procedure RDF_DUMP_NQUADS leverages SPARQL to facilitate data dump(s) for ALL graphs excluding the internal predefined "virtrdf:" graph.
Procedure Signature and Parameters
The procedure signature is:
RDF_DUMP_NQUADS (IN dir VARCHAR := 'dumps', IN start_from INT := 1, IN file_length_limit INTEGER := 100000000, IN comp INT := 1)
- IN dir VARCHAR: folder where the dumps will be stored. Note: The dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini), or the Virtuoso server will not be able to create or access the data files.
- IN start_from INT: output starts from file number n
- IN file_length_limit INTEGER: maximum length of each dump file
- IN comp INT: when set to 0, no gzip compression is applied. The default is 1.
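Since a dump directory outside DirsAllowed makes the dump fail, it can be worth checking the configuration before running the procedure. The following Python sketch is an illustration only: the helper names parse_dirs_allowed and is_dir_allowed are hypothetical, and the check assumes that relative entries are resolved against the directory the server runs in and that subdirectories of an allowed directory are also accepted.

```python
import os
import re


def parse_dirs_allowed(ini_text):
    """Return the entries of the first DirsAllowed line in virtuoso.ini text."""
    for line in ini_text.splitlines():
        match = re.match(r"\s*DirsAllowed\s*=\s*(.*)$", line)
        if match:
            # Entries are comma-separated; skip empty items such as
            # the one left by a trailing comma.
            return [item.strip() for item in match.group(1).split(",") if item.strip()]
    return []


def is_dir_allowed(dump_dir, allowed, base_dir):
    """Check whether dump_dir lies inside one of the allowed directories.

    Assumption: relative paths are resolved against base_dir, taken here
    to be the directory the Virtuoso server was started in.
    """
    target = os.path.abspath(os.path.join(base_dir, dump_dir))
    for entry in allowed:
        root = os.path.abspath(os.path.join(base_dir, entry))
        if os.path.commonpath([target, root]) == root:
            return True
    return False


if __name__ == "__main__":
    with open("virtuoso.ini", encoding="utf-8") as handle:
        allowed = parse_dirs_allowed(handle.read())
    print("dumps allowed:", is_dir_allowed("dumps", allowed, os.getcwd()))
```

With the DirsAllowed value shown in the earlier grep output (., ../vad,), a dumps folder under the database directory passes this check, while a location such as /tmp/dumps does not.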
Usage Example
This example demonstrates calling the RDF_DUMP_NQUADS procedure to dump all graphs to a series of compressed NQuad dumps, each with an uncompressed length of up to 10 MB (./dumps/output000001.nq.gz):
SQL> RDF_DUMP_NQUADS ('dumps', 1, 10000000, 1);
As a result, dataset files containing ALL the graphs in the Virtuoso Quad Store can be found in the dumps directory (located in your Virtuoso database directory):
$ ls -ltr
total 12740
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-000001.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 132 Aug 24 10:33 rdf-dump-000002.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 118 Aug 24 10:33 rdf-dump-000003.nq.gz
.
.
.
-rw-r--r-- 1 ubuntu ubuntu 113 Aug 24 10:33 rdf-dump-003138.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 110 Aug 24 10:33 rdf-dump-003139.nq.gz
-rw-r--r-- 1 ubuntu ubuntu 119 Aug 24 10:33 rdf-dump-003140.nq.gz
$
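A quick sanity check on the compressed dumps is to count the statements in each file. The sketch below is an assumption-laden helper rather than a Virtuoso tool: the name count_quads is hypothetical, and it relies only on the N-Quads convention of one statement per line with # starting a comment line.

```python
import glob
import gzip
import os


def count_quads(dump_dir, pattern="*.nq.gz"):
    """Return {file name: number of statement lines} for each compressed dump."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(dump_dir, pattern))):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            # Count lines that are neither blank nor comments.
            counts[os.path.basename(path)] = sum(
                1
                for line in handle
                if line.strip() and not line.lstrip().startswith("#")
            )
    return counts


if __name__ == "__main__":
    totals = count_quads("dumps")
    for name, quads in totals.items():
        print(name, quads)
    print("total quads:", sum(totals.values()))
```

The total can be compared with a quad count taken from the source instance before the files are moved to another server.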
Load datasets
Dumped dataset files can then be loaded into another Virtuoso instance using the Virtuoso RDF Bulk Loader process.