Low performance while traversing a whole graph

Leandro_Tabares · June 2, 2021, 6:47am

Hello,

I’m using Virtuoso for an experiment on the DBpedia dataset. For my experiment, I need to run many queries similar to:

SELECT DISTINCT ?s ?p ?o FROM <http://dbpedia.org> WHERE { ?s ?p ?o } LIMIT 5000 OFFSET 45225000

where the offset increases with each query. The goal is to obtain all the triples of DBpedia in batches of 5000 triples each. I noticed that after 32 minutes of processing, Virtuoso was retrieving around 437500 triples per minute. However, after 6 hours the rate had dropped to 125000 triples per minute. I’m running Virtuoso on a node of an HPC cluster with 36 cores and 192 GB of RAM. Is there something I can do to avoid this performance decrease? Please find my virtuoso.ini file here.
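For reference, the batching scheme described above can be sketched as follows. This is a minimal, hypothetical illustration: the helper names (`paged_query`, `batches_per_minute`) and the assumption that batches are numbered from 0 (so batch k uses OFFSET k × 5000) are mine, not the poster's. It only builds the query strings and converts the reported rates into batches per minute; it does not talk to Virtuoso.

```python
# Hypothetical sketch of the paging loop described in the post.
# Assumption: batches are numbered from 0, so batch k uses OFFSET k * BATCH_SIZE.

BATCH_SIZE = 5000
GRAPH_IRI = "http://dbpedia.org"


def paged_query(batch_index: int, batch_size: int = BATCH_SIZE) -> str:
    """Return the SPARQL query text for one batch of triples."""
    offset = batch_index * batch_size
    # Doubled braces in the f-string produce literal { and } in the query.
    return (
        f"SELECT DISTINCT ?s ?p ?o FROM <{GRAPH_IRI}> "
        f"WHERE {{ ?s ?p ?o }} LIMIT {batch_size} OFFSET {offset}"
    )


def batches_per_minute(triples_per_minute: int, batch_size: int = BATCH_SIZE) -> float:
    """Convert a triples-per-minute rate into batches per minute."""
    return triples_per_minute / batch_size


if __name__ == "__main__":
    # The query quoted in the post is batch 45225000 / 5000 = 9045.
    print(paged_query(9045))
    # Reported rates: 437500 triples/min is 87.5 batches/min; 125000 is 25 batches/min.
    print(batches_per_minute(437_500), batches_per_minute(125_000))
```

Put in these terms, the reported slowdown means the client went from roughly 87.5 completed batches per minute to 25, a factor of 3.5.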

hwilliams · June 2, 2021, 4:58pm

Which Virtuoso version are you using? The version and build details can be obtained by running either:

 ./virtuoso-t -?           (open source)

 ./virtuoso-iodbc-t -?     (commercial)

Which interface are you using to connect to Virtuoso, i.e., the /sparql endpoint, SQL (ODBC, JDBC, Jena, RDF4J), or something else?

If you connect to the Virtuoso server with the “isql” command-line tool after the 6-hour mark, once the rate has slowed down, what output does the command status(); return? Likewise, what does the Linux top command report for memory and CPU usage, both overall and for the Virtuoso process?