Virtuoso VoID Graph Generation

What

RDF (Resource Description Framework) is a standard model for data interchange on the Web. VoID (Vocabulary of Interlinked Datasets) is an RDF schema for describing linked datasets. In Virtuoso, a VoID graph is a named graph in the RDF Quad Store that holds VoID descriptions of other graphs. Virtuoso Universal Server itself is a middleware and database engine hybrid that combines the functionality of a traditional RDBMS and ORDBMS.

Why

Creating an RDF VoID Graph in Virtuoso allows you to describe the metadata of your RDF datasets. This includes details about the dataset’s creators, the data it contains, and its licensing information. It also allows you to describe the linkage between datasets, which is crucial for navigating the Web of Data. This metadata can be used by data consumers to discover, select, and interpret datasets.

How

The following methods can be used for generating VoID for graph(s) in the Virtuoso RDF Quad Store.

Generation of VoID statistics for SINGLE graph in Quad Store

The DB.DBA.RDF_VOID_STORE (in graph varchar, in to_graph_name varchar) procedure collects statistics for the graph named by the first argument and saves the result into the graph named by the second argument. For correct results, the first graph must already be loaded in the database before the procedure is executed:

SQL>DB.DBA.RDF_VOID_STORE('http://www.openlinksw.com/blog/~kidehen','http://example.com');
Done. -- 201 msec.

SQL>sparql
select *
from <http://example.com>
where {?s ?p ?o}
limit 30;

s                                                                                 p              o
VARCHAR                                                                           VARCHAR        VARCHAR
_______________________________________________________________________________

http://www.openlinksw.com/blog/~kidehen#Dataset                                   http://www.w3.org/1999/02/22-rdf-syntax-ns#type      http://rdfs.org/ns/void#Dataset
http://www.openlinksw.com/blog/~kidehen#Dataset                                   http://www.w3.org/2000/01/rdf-schema#seeAlso         http://www.openlinksw.com/blog/~kidehen
http://www.openlinksw.com/blog/~kidehen#Dataset 
...
http://www.openlinksw.com/blog/~kidehen#sameAsStat                                http://www.w3.org/1999/02/22-rdf-syntax-ns#value     2
http://www.openlinksw.com/blog/~kidehen#sameAsStat                                http://purl.org/NET/scovo#dimension                  http://www.openlinksw.com/blog/~kidehen#sameAsType
http://www.openlinksw.com/blog/~kidehen#sameAsType                                http://www.w3.org/1999/02/22-rdf-syntax-ns#type      http://www.openlinksw.com/blog/~kidehen#TypeOfLink

30 Rows. -- 340 msec.
SQL>

Alternatively, the VoID statistics for a single graph can be written to a Turtle dataset file and then loaded into Virtuoso with the following commands:

string_to_file ('{file-name.ttl}', RDF_VOID_GEN ('{graph-name}'), -2);
ttlp (file_open ('{file-name.ttl}'), '{graph-name-void}', '{graph-name-void}', 255);
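
Once the VoID triples are in a graph, by either route above, they can also be read over the SPARQL Protocol rather than from isql. The following is a minimal Python sketch, not part of Virtuoso, that only builds the request URL. The endpoint http://localhost:8890/sparql and the function name build_void_stats_url are assumptions made for illustration; substitute the endpoint of your own instance.

```python
from urllib.parse import urlencode

VOID_NS = "http://rdfs.org/ns/void#"


def build_void_stats_url(endpoint: str, void_graph: str) -> str:
    """Build a SPARQL Protocol GET URL that lists void:Dataset statistics.

    `endpoint` is an assumed SPARQL endpoint URL; `void_graph` is the graph
    that received the VoID description (the second RDF_VOID_STORE argument).
    """
    query = (
        f"PREFIX void: <{VOID_NS}>\n"
        "SELECT ?dataset ?triples ?classes ?properties\n"
        "WHERE {\n"
        "  ?dataset a void:Dataset ;\n"
        "           void:triples ?triples .\n"
        "  OPTIONAL { ?dataset void:classes ?classes }\n"
        "  OPTIONAL { ?dataset void:properties ?properties }\n"
        "}"
    )
    # "query" and "default-graph-uri" are standard SPARQL Protocol parameters.
    params = {"default-graph-uri": void_graph, "query": query}
    return endpoint + "?" + urlencode(params)


# Hypothetical local endpoint; the graph name matches the example above.
url = build_void_stats_url("http://localhost:8890/sparql", "http://example.com")
print(url)
```

The URL can then be fetched with any HTTP client, sending an Accept header such as application/sparql-results+json to choose the result format.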

Generation of VoID statistics for ALL graphs in Quad Store

Follow these steps to generate a Turtle (TTL) dataset file containing the VoID statistics for all graphs in the Virtuoso RDF Quad Store:

  • Collect all graphs via DB.DBA.void_distinct_graphs()
  • Insert data into the DB.DBA.RDF_VOID_GRAPH_MEMBER RDF system SQL table with the query:
insert into DB.DBA.RDF_VOID_GRAPH_MEMBER select RVG_IID, RVG_IID from RDF_VOID_GRAPH;
  • Run the DB.DBA.RDF_VOID_ALL_GEN() procedure to populate a Turtle (TTL) dataset file with the collated VoID statistics for all graphs:
string_to_file ('void_file_name.ttl',DB.DBA.RDF_VOID_ALL_GEN('%', 1),-2);
  • Load the Turtle dataset file into a graph name of your choice in the Virtuoso RDF Quad Store with the TTLP command:
TTLP (file_open ('void_file_name.ttl'), '{graph-name}', '{graph-name}', 255);

For example:

SQL> DB.DBA.void_distinct_graphs();

Done. -- 70 msec.
SQL> insert into DB.DBA.RDF_VOID_GRAPH_MEMBER select RVG_IID, RVG_IID from RDF_VOID_GRAPH;

Done. -- 1 msec.
SQL> string_to_file ('void_file_name.ttl',DB.DBA.RDF_VOID_ALL_GEN('%', 1),-2);

Done. -- 1050 msec.
SQL> quit;
ubuntu@ip-172-30-0-77:~/vos/virtuoso-opensource/database$ ls -ltr void_file_name.ttl 
-rw-r--r-- 1 ubuntu ubuntu 5439 Jan 12 19:03 void_file_name.ttl
ubuntu@ip-172-30-0-77:~/vos/virtuoso-opensource/database$ more void_file_name.ttl 
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix void: <http://rdfs.org/ns/void#> .

@prefix ns1: <%/> .
ns1:Dataset a void:Dataset ; 
 void:sparqlEndpoint <http://hfw.openlinksw.com:8890/sparql> . 
@prefix ns2: <http://www.openlinksw.com/schemas/virtrdf/> .
ns2:Dataset a void:Dataset . 
ns1:Dataset void:subset ns2:Dataset . 
@prefix ns3: <%/http://www.openlinksw.com/schemas/virtrdf/> .

ns3:Dataset a void:Dataset ; 
 rdfs:seeAlso <http://www.openlinksw.com/schemas/virtrdf#> ; 
 void:sparqlEndpoint <http://hfw.openlinksw.com:8890/sparql> ; 
 void:triples 2479 ; 
 void:classes 12 ; 
 void:entities 270 ; 
 void:distinctSubjects 287 ; 
 void:properties 92 ; 
 void:distinctObjects 726 . 

ns2:Dataset void:subset ns3:Dataset . 
ns2:Dataset void:statItem ns2:Stat . 
ns2:Stat a scovo:Item ; 
 rdf:value 2479 ; 
 scovo:dimension void:numOfTriples . 

@prefix ns4: <http://www.w3.org/ns/ldp/> .
ns4:Dataset a void:Dataset . 
ns1:Dataset void:subset ns4:Dataset . 
@prefix ns5: <%/http://www.w3.org/ns/ldp/> .

ns5:Dataset a void:Dataset ; 
 rdfs:seeAlso <http://www.w3.org/ns/ldp#> ; 
 void:sparqlEndpoint <http://hfw.openlinksw.com:8890/sparql> ; 
 void:triples 3 ; 
 void:classes 0 ; 
 void:entities 0 ; 
 void:distinctSubjects 3 ; 
 void:properties 1 ; 
 void:distinctObjects 1 . 
...
ns1:Dataset void:statItem ns1:Stat . 
ns1:Stat a scovo:Item ; 
 rdf:value 665400 ; 
 scovo:dimension void:numOfTriples . 
ubuntu@ip-172-30-0-77:~/vos/virtuoso-opensource/database$

SQL> ttlp (file_open ('void_file_name.ttl'), 'http://graphs/void/', 'http://graphs/void/', 255);

Done. -- 70 msec.
SQL>
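
To get a quick per-dataset overview of a generated file without loading it, the numeric void: statistics can be pulled out with a few lines of Python. This is a rough sketch that assumes the line layout shown in the output above (one statistic per indented line); it is a line-based scan, not a Turtle parser, and the name summarize_void is hypothetical.

```python
import re
from collections import defaultdict

# An indented statistic line, e.g. " void:triples 2479 ;"
STAT_LINE = re.compile(r"^\s+void:(\w+)\s+(\d+)\s*[;.]\s*$")
# The line that opens a dataset description, e.g. "ns3:Dataset a void:Dataset ;"
SUBJECT_LINE = re.compile(r"^(\S+)\s+a\s+void:Dataset\b")


def summarize_void(turtle_text: str) -> dict:
    """Map each dataset name to its numeric void: statistics.

    Assumes the layout produced in the example above; other Turtle
    layouts (for example several predicates on one line) are not handled.
    """
    stats = defaultdict(dict)
    current = None
    for line in turtle_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new statement (or a @prefix).
            opener = SUBJECT_LINE.match(line)
            current = opener.group(1) if opener else None
            continue
        stat = STAT_LINE.match(line)
        if stat and current is not None:
            stats[current][stat.group(1)] = int(stat.group(2))
    return dict(stats)


# Excerpt taken from the void_file_name.ttl output shown above.
SAMPLE = """\
@prefix ns3: <%/http://www.openlinksw.com/schemas/virtrdf/> .

ns3:Dataset a void:Dataset ;
 rdfs:seeAlso <http://www.openlinksw.com/schemas/virtrdf#> ;
 void:triples 2479 ;
 void:classes 12 ;
 void:distinctObjects 726 .

ns2:Dataset void:statItem ns2:Stat .
ns2:Stat a scovo:Item ;
 rdf:value 2479 ;
 scovo:dimension void:numOfTriples .
"""

summary = summarize_void(SAMPLE)
print(summary)
```

For anything beyond a quick check, loading the file with TTLP and querying it with SPARQL, as shown above, is the more robust route.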

Generation of VoID statistics for Multiple graphs in Quad Store

Creating a VoID graph of multiple graphs in Virtuoso involves generating VoID descriptions for each of the RDF datasets and then combining them, and can be done as follows:

  1. Generate VoID Descriptions for Each Dataset: Use the DB.DBA.RDF_VOID_STORE procedure to generate a VoID description for each of your RDF datasets. As described above, it takes two arguments: the IRI of the graph to describe and the IRI of the graph into which the VoID description is saved. Pass both as plain strings, without angle brackets. For example: DB.DBA.RDF_VOID_STORE('http://example.com/graph1', 'http://example.com/void1'); Repeat this for each dataset.
  2. Combine the VoID Descriptions: To create a VoID graph of multiple graphs, combine the VoID descriptions generated in the previous step by copying each individual VoID graph into one new graph. For example: SPARQL CREATE SILENT GRAPH <http://example.com/void>; followed by SPARQL ADD <http://example.com/void1> TO <http://example.com/void>; Repeat the ADD command for each VoID graph.
  3. Query the Combined VoID Graph: You can now query the combined VoID graph to retrieve metadata about all your datasets. For example: SPARQL SELECT * FROM <http://example.com/void> WHERE { ?s ?p ?o };

Please note that these are basic steps and might need to be adjusted based on your specific setup and requirements.

For example:

SQL> DB.DBA.RDF_VOID_STORE('http://dataset.org/g1/', 'http://example.com/void1');

Done. -- 4 msec.
SQL> DB.DBA.RDF_VOID_STORE('http://dataset.org/g2/', 'http://example.com/void2');

Done. -- 4 msec.
SQL> SPARQL ADD <http://example.com/void1> TO <http://example.com/void>;

Done. -- 2 msec.
SQL> SPARQL SELECT * FROM <http://example.com/void> WHERE {?s ?p ?o};
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://dataset.org/g1/Dataset                                                     http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://rdfs.org/ns/void#Dataset
http://dataset.org/g1/Dataset                                                     http://www.w3.org/2000/01/rdf-schema#seeAlso                                      http://dataset.org/g1/
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#classes                                                   0
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#distinctObjects                                           1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#distinctSubjects                                          1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#entities                                                  0
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#properties                                                1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#sparqlEndpoint                                            http://hfw.openlinksw.com:8890/sparql
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#triples                                                   1

9 Rows. -- 1 msec.
SQL> SPARQL ADD <http://example.com/void2> TO <http://example.com/void>;

Done. -- 1 msec.
SQL> SPARQL SELECT * FROM <http://example.com/void> WHERE {?s ?p ?o};
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://dataset.org/g1/Dataset                                                     http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://rdfs.org/ns/void#Dataset
http://dataset.org/g2/Dataset                                                     http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://rdfs.org/ns/void#Dataset
http://dataset.org/g1/Dataset                                                     http://www.w3.org/2000/01/rdf-schema#seeAlso                                      http://dataset.org/g1/
http://dataset.org/g2/Dataset                                                     http://www.w3.org/2000/01/rdf-schema#seeAlso                                      http://dataset.org/g2/
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#classes                                                   0
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#classes                                                   0
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#distinctObjects                                           1
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#distinctObjects                                           1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#distinctSubjects                                          1
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#distinctSubjects                                          1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#entities                                                  0
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#entities                                                  0
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#properties                                                1
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#properties                                                1
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#sparqlEndpoint                                            http://hfw.openlinksw.com:8890/sparql
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#sparqlEndpoint                                            http://hfw.openlinksw.com:8890/sparql
http://dataset.org/g1/Dataset                                                     http://rdfs.org/ns/void#triples                                                   1
http://dataset.org/g2/Dataset                                                     http://rdfs.org/ns/void#triples                                                   2

18 Rows. -- 1 msec.
SQL>
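
With more than a handful of graphs, typing these statements by hand becomes tedious. The sketch below, in Python, generates the same sequence of isql statements used in the session above from a mapping of source graphs to VoID graphs. It only produces text to paste into isql or save as a script. The helper names sql_literal and void_merge_statements are hypothetical, and the IRI check is a deliberately simple assumption, not full IRI validation.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def check_iri(iri: str) -> str:
    """Reject characters that would break the <...> form used by SPARQL."""
    if any(ch in iri for ch in "<>\"{}|^`\\") or any(ch.isspace() for ch in iri):
        raise ValueError(f"not usable as an IRI: {iri!r}")
    return iri


def void_merge_statements(graphs: dict, combined_graph: str) -> list:
    """Return isql statements that describe each graph and merge the results.

    `graphs` maps each source graph IRI to the graph that receives its
    VoID description; `combined_graph` is the target of the SPARQL ADD steps.
    """
    statements = []
    for source, void_graph in graphs.items():
        statements.append(
            "DB.DBA.RDF_VOID_STORE("
            f"{sql_literal(check_iri(source))}, {sql_literal(check_iri(void_graph))});"
        )
    for void_graph in graphs.values():
        statements.append(
            f"SPARQL ADD <{check_iri(void_graph)}> TO <{check_iri(combined_graph)}>;"
        )
    return statements


# Same graph names as in the session above.
statements = void_merge_statements(
    {
        "http://dataset.org/g1/": "http://example.com/void1",
        "http://dataset.org/g2/": "http://example.com/void2",
    },
    "http://example.com/void",
)
print("\n".join(statements))
```

All RDF_VOID_STORE calls are emitted before the ADD statements so that every individual VoID graph exists before it is copied into the combined graph.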

Related

I’d like to know what the void:entities property describes. According to the VoID documentation, it should be the count of URIs that match the pattern given by void:uriRegexPattern [1], yet the VoID data generated by Virtuoso does not include that property.
[1] Describing Linked Datasets with the VoID Vocabulary