2.3 million queries

Hi,

I need to run 2.3 million queries against 10 billion triples stored in Virtuoso (512 GB RAM, 32 cores, SSD disks in RAID 0).

In virtuoso.ini, I have configured:

[Parameters]
MaxClientConnections = 128

On the Java side, I have configured:

  • VirtuosoConnectionPoolDataSource.setMaxPoolSize(500)
  • ThreadPoolExecutor with 32 threads and an ArrayBlockingQueue of capacity 500
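
For readers who want to reproduce the client-side setup, the following is a minimal sketch of a 32-thread executor with a bounded queue of 500. The class name BoundedQueryRunner, the runAll method, and the runQuery callback are hypothetical names chosen for illustration; the callback stands in for whatever code actually sends a query to Virtuoso. The CallerRunsPolicy rejection handler is an assumption on my part, not something stated in the post: it makes the submitting thread run a task itself when the queue is full, which throttles submission instead of throwing.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class BoundedQueryRunner {

    /**
     * Runs one task per parameter on a fixed-size pool with a bounded queue
     * and returns the number of tasks that completed without throwing.
     */
    public static long runAll(List<String> params, int threads, int queueCapacity,
                              Consumer<String> runQuery) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Assumption: when the queue is full, the submitting thread
                // runs the task itself rather than rejecting it.
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicLong completed = new AtomicLong();
        for (String param : params) {
            pool.execute(() -> {
                runQuery.accept(param);       // hypothetical query callback
                completed.incrementAndGet();  // counted only on success
            });
        }
        pool.shutdown();
        try {
            // Wait until every submitted task has finished.
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for queries", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        List<String> params = List.of("http://example.org/a", "http://example.org/b");
        long done = runAll(params, 32, 500, p -> { /* send the query here */ });
        System.out.println("completed " + done + " queries");
    }
}
```

With this shape, the queue capacity only bounds how many pending tasks are held in memory; the number of queries in flight at any moment is still limited by the thread count.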

The queries are simple; there are just two types:

select ?s ?p ?o {
?s ?p ?o.
filter (?s = ?param)
}

select ?s ?p ?o {
?s ?p ?o.
filter (?o = ?param)
}
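
One variation that might be worth trying, which I have not measured on this data set, is to bind several parameters per request with a SPARQL 1.1 VALUES clause instead of sending one query per parameter, so that fewer round trips are needed. The sketch below only builds the query text. The class BatchedQueryBuilder and its methods are hypothetical, and it assumes every parameter is an IRI (for literals in the object position, the FILTER form may match differently, since it compares values rather than terms).

```java
import java.util.List;
import java.util.stream.Collectors;

public class BatchedQueryBuilder {

    /** Builds one query that looks up all given IRIs in the subject position. */
    public static String bySubjects(List<String> iris) {
        return build("?s", iris);
    }

    /** Builds one query that looks up all given IRIs in the object position. */
    public static String byObjects(List<String> iris) {
        return build("?o", iris);
    }

    private static String build(String variable, List<String> iris) {
        String values = iris.stream()
                .map(BatchedQueryBuilder::toIriRef)
                .collect(Collectors.joining(" "));
        return "select ?s ?p ?o { values " + variable + " { " + values + " } ?s ?p ?o . }";
    }

    /** Wraps an IRI in angle brackets, rejecting characters that would break the query. */
    private static String toIriRef(String iri) {
        if (iri.isEmpty() || iri.matches(".*[<>\"\\s].*")) {
            throw new IllegalArgumentException("not a plain IRI: " + iri);
        }
        return "<" + iri + ">";
    }

    public static void main(String[] args) {
        System.out.println(bySubjects(List.of("http://example.org/a", "http://example.org/b")));
    }
}
```

The batch size is a tuning knob I cannot recommend a value for; it would need to be found by measurement.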

It works fine, but it takes 6 hours to get the data. Is there a way to speed up the queries?
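
To put the 6 hours in perspective, here is a back-of-the-envelope calculation of the implied throughput. It assumes all 32 worker threads stay busy for the whole run, which may not hold in practice; the class ThroughputEstimate and its method names are hypothetical.

```java
public class ThroughputEstimate {

    /** Average queries per second for a run of the given length. */
    public static double queriesPerSecond(long queries, double hours) {
        return queries / (hours * 3600.0);
    }

    /** Average time per query in milliseconds if every worker is always busy. */
    public static double millisPerQuery(long queries, double hours, int workers) {
        return workers / queriesPerSecond(queries, hours) * 1000.0;
    }

    public static void main(String[] args) {
        long queries = 2_300_000L;
        double hours = 6.0;
        int workers = 32;
        System.out.printf("%.1f queries/s%n", queriesPerSecond(queries, hours));
        System.out.printf("%.0f ms per query per worker%n",
                millisPerQuery(queries, hours, workers));
    }
}
```

That works out to roughly 106 queries per second overall, or about 300 ms per query on each worker under the stated assumption.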

Best,

Hi Adam,

As per my Wikidata list response:

You need to increase the memory available to Virtuoso. If you have reached the limit of a single machine, that is where the Cluster Edition comes in handy: it lets you build a large pool of memory from a sharded database that is horizontally partitioned over a collection of commodity computers.

There is a public Google Spreadsheet covering a variety of public Virtuoso instances that should aid you in this process [1].

Links:

[1] https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5OITtrbFw/edit#gid=812792186