Virtuoso 08.03.3326 Database Upgrade Notes

hwilliams · September 29, 2022, 12:20pm

Virtuoso 08.03.3326 Upgrade Notes

The Virtuoso 08.03.3326+ engine has been enhanced to use 64-bit prefix IDs in RDF_IRI, which allows for even larger databases.

This enhancement fixes two important problems in Virtuoso:

When Virtuoso was upgraded to use 64-bit IRI_IDs around v6.0.x, we forgot to upgrade the RDF_IRI table to use a 64-bit prefix ID. This meant that Virtuoso could only store around 2 billion distinct prefixes before returning an error.
The algorithm to generate distinct prefixes resulted in too many prefixes being created.

While this is not a problem for small or even medium-sized data sets, it becomes one when you want to host large databases the size of UniProt, which now contains over 90 billion triples.

To compare sizes, here are some of the databases that OpenLink hosts, all of which were unaffected by this issue:

Live SPARQL Endpoint	triples	distinct prefixes
URIBurner	138,881,702	9,197,016
DBpedia	1,104,129,087	27,528,113
Wikidata	12,216,143,296	990,992
LOD Cloud Cache	35,875,699,899	175,697,066
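
To see where your own database falls in this comparison, the counts can be read from the RDF system tables. This is only a minimal sketch: it assumes the "distinct prefixes" column above corresponds to the number of rows in DB.DBA.RDF_PREFIX, and counting DB.DBA.RDF_QUAD on a large database can take a long time.

```sql
-- Number of triples (quads) stored; may be slow on large databases
select count(*) from DB.DBA.RDF_QUAD;

-- Number of distinct IRI prefixes
-- (assumption: this is the figure reported as "distinct prefixes" above)
select count(*) from DB.DBA.RDF_PREFIX;
```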

When starting an existing 8.x database with the new 08.03.3326+ binary, the following message will appear in the virtuoso.log file:

NOTE: Your database is using 32-bit prefix IDs in RDF_IRI

    This Virtuoso engine has been upgraded to use 64-bit prefix IDs
    in RDF_IRI to allow for even larger databases.

    To take advantage of this new feature, your database needs to
    be upgraded.

    The performance of your existing database should not be affected,
    except when performing certain bulkload operations.

    Please contact OpenLink Support <support@openlinksw.com> for
    more information.

As stated in the message, the engine will use a backward compatibility function to handle existing databases without causing a performance degradation when running SPARQL queries, inserts and deletes.

Caveats

Bulkloading operations on an existing database using 32-bit prefix IDs will be restricted to use non-vectored functions. This will cause a drop in bulkload performance, so users who rely on this functionality should upgrade their database as soon as possible.
Bulkloading using the rdf_loader_run() function will also automatically downgrade to non-vectored functions.
Bulkloading of NQUAD datasets using the “with_delete” option fails to delete existing triples in the graph that have been removed from the dataset file(s) being loaded.
RDF Graph Replication will not work with a 64-bit prefix ID binary and a 32-bit prefix ID database, so the database must be upgraded to 64-bit prefix IDs for use with the 08.03.3326+ binary.
Calling vectored functions like TTLP_V() and RDF_LOAD_RDFXML_V() will automatically call their non-vectored equivalents like TTLP() and RDF_LOAD_RDFXML().
Some functions may fail with the following error:

[42000] Can not use dpipe IRI operations before upgrading the RDF_IRI table t

Upgrade method 1

The preferred way of upgrading to the new 08.03.3326+ format is to perform an NQUAD dump of all your triples using the RDF_DUMP_NQUADS() function and bulk loading them into a new database.
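
The dump-and-reload route could look roughly like the following isql session. Treat it as a sketch under assumptions, not the exact procedure OpenLink prescribes: the 'dumps' directory, the file size limit, the '*.nq.gz' file mask, and the fallback graph IRI are illustrative values, and the dump directory is assumed to be listed in DirsAllowed in virtuoso.ini on both instances. The argument order shown for RDF_DUMP_NQUADS() follows the procedure as published in the Virtuoso documentation (directory, starting file number, file length limit, gzip flag); check it against your build before relying on it.

```sql
-- Step 1: on the EXISTING database (32-bit prefix IDs), dump all quads
-- into gzipped N-Quads files in the 'dumps' directory (illustrative values)
RDF_DUMP_NQUADS ('dumps', 1, 100000000, 1);

-- Step 2: on a NEW, empty database created with the 08.03.3326+ binary,
-- register the dump files with the bulk loader. For N-Quads the graph is
-- taken from each quad; the third argument is only a fallback (hypothetical IRI).
ld_dir ('dumps', '*.nq.gz', 'http://example.org/fallback-graph');

-- Step 3: run the bulk loader and make the result durable
rdf_loader_run ();
checkpoint;

-- Step 4: list any files that failed to load
select ll_file, ll_state, ll_error
  from DB.DBA.LOAD_LIST
 where ll_error is not null;
```

Because the new database starts out in the 64-bit prefix ID format, the load in step 3 is not subject to the non-vectored restriction described under Caveats.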

Upgrade method 2

To upgrade an existing database in-place, make sure you have a proper backup of your existing database before performing the following commands:

set echo on; 
scheduler_interval(0);
backup '/dev/null'; -- make sure db is consistent
log_enable (2,0);

-- copy
create table DB.DBA.RDF_IRI_64 (RI_NAME varchar not null primary key, RI_ID IRI_ID_8 not null);
insert into DB.DBA.RDF_IRI_64 (RI_ID, RI_NAME) select RI_ID, __iri_name_id_64(RI_NAME) from DB.DBA.RDF_IRI;
checkpoint;

-- rename
drop table DB.DBA.RDF_IRI;
alter table DB.DBA.RDF_IRI_64 rename DB.DBA.RDF_IRI;
create unique index DB_DBA_RDF_IRI_UNQC_RI_ID on DB.DBA.RDF_IRI (RI_ID);

-- mark db as upgraded
__dbf_set('rdf_rpid64_mode',1);
shutdown;

Note, however, that depending on the number of records in the DB.DBA.RDF_IRI table, this can take a long time and will increase the size of your database.
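
To get a rough idea of how much work the in-place copy involves, you can count the rows in DB.DBA.RDF_IRI before starting. This is just a quick check, not a timing estimate; how long the copy actually takes will depend on your hardware and configuration.

```sql
-- Run BEFORE the upgrade commands above:
-- every row counted here is copied into DB.DBA.RDF_IRI_64
select count(*) from DB.DBA.RDF_IRI;
```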