We are trying to load the full Wikidata (latest TTL dump) into a Virtuoso Open-Source (VOS) instance (version 18.104.22.168).
We encountered some problems when loading the first batch of files. The latest-all.ttl file was divided into 82 smaller files of 200M triples each (around 7GB per file). When we started the bulk upload, the process terminated with the following errors:
/data/VOS/bulk-loading/latest-all_from_1_to_200000000.ttl **42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content**
/data/VOS/bulk-loading/latest-all_from_200000001_to_400000000.ttl **37000 [Vectorized Turtle loader] SP029: TURTLE RDF loader, line 1: Undefined namespace prefix at skos:prefLabel**
/data/VOS/bulk-loading/latest-all_from_400000001_to_600000000.ttl **37000 [Vectorized Turtle loader] SP029: TURTLE RDF loader, line 1: syntax error**
/data/VOS/bulk-loading/latest-all_from_600000001_to_800000000.ttl **37000 [Vectorized Turtle loader] SP029: TURTLE RDF loader, line 1: syntax error**
/data/VOS/bulk-loading/latest-all_from_800000001_to_1000000000.ttl **37000 [Vectorized Turtle loader] SP029: TURTLE RDF loader, line 1: syntax error**
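Separately from the geometry error, the "Undefined namespace prefix" and "line 1: syntax error" messages on the later files might come from how we split the dump: a chunk that starts without the @prefix declarations, or in the middle of a statement, is not valid Turtle on its own. We have not confirmed this. As a minimal sketch of a splitter that would avoid both issues, assuming that every statement in the dump ends on a line whose last non-blank character is "." and that no multi-line literal ends a line that way (the function name `split_turtle` and its parameters are ours, not part of any Virtuoso tooling):

```python
from typing import Iterable, Iterator, List


def split_turtle(lines: Iterable[str], max_lines: int) -> Iterator[List[str]]:
    """Yield chunks of Turtle lines, each starting with the prefix header.

    Assumption: a statement ends on a line whose last non-blank character
    is '.', and no multi-line literal ends a line that way.
    """
    prefixes: List[str] = []  # @prefix / @base declarations seen so far
    body: List[str] = []      # non-declaration lines of the current chunk
    for line in lines:
        stripped = line.strip()
        if stripped.lower().startswith(("@prefix", "@base", "prefix ", "base ")):
            prefixes.append(line)
            continue
        body.append(line)
        # Close the chunk only at a statement boundary, never mid-statement.
        if len(body) >= max_lines and stripped.endswith("."):
            yield prefixes + body
            body = []
    if body:
        yield prefixes + body
```

This splits by line count rather than by triple count, so chunk sizes would differ from our 200M-triple files; the point is only that each chunk repeats the declarations and ends on a complete statement.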
We found the following description on Stack Overflow from Peter F. Patel-Schneider. In it, Peter indicates that there’s a bug in Virtuoso related to the handling of geo-coordinates.
Peter goes on to explain that one has to change the VOS source code in order to correct this problem.
We thought this problem might have been fixed in a newer version of VOS. However, as far as we know, the latest version of VOS is 22.214.171.124 (from 2018-08-15), which is the one we are using. We also haven’t found any further information about a fix for this bug. The closest reference we found is an evaluation of Virtuoso as an alternative to Blazegraph for Wikidata.
- Is Peter’s description on Stack Overflow correct?
- If so, Peter also mentions that “if one is loading the complete Wikidata dump one needs a machine with at least 256GB of main memory (maybe even at least 512GB)”. Would we need 512GB of main memory to run Wikidata in VOS? If not, how much main memory would be needed to run it smoothly?
Having a local copy of Wikidata on our premises will help us greatly in several ongoing research projects.
We appreciate your kind attention and assistance. Looking forward to your reply.