Insert performance in Virtuoso

Bambus · February 13, 2019, 2:47pm

I need to insert 200 objects into Virtuoso in less than 30 ms. This is the structure of one object:

<cim:Substation rdf:ID="_0425e670-fcbd-11e6-835d-f0def1611578">
  <cim:IdentifiedObject.name>RTP Domžale</cim:IdentifiedObject.name>
  <cim:Substation.Region rdf:resource="#_052259c0-fcbb-11e6-835d-f0def1611578"/>
</cim:Substation>

Is this achievable with Virtuoso, and if so, can you send me an example implementation in Java?

TallTed · February 13, 2019, 3:06pm

As I said when you raised this initially – it is unclear what you need.

Do you need to insert 200 sets of data like the RDF/XML shown here within a total of 30ms? Or do you need each of those 200 sets of data to be inserted in less than 30ms, for a total of less than 6000ms?
Is your timer running on the end-user client-side? Is that end-user client application running on the same host as Virtuoso, or are they talking over the network? If the latter, is that network local or wide-area (e.g., Internet)? All communication factors are relevant to this kind of question…
Will all inserts be using the RDF/XML serialization of RDF?
Will you be inserting 200 such sets of data every time period (30 or 6000 ms), or do these inserts just need to be this fast whenever they happen?

The more specific requirements you can provide, within a big-picture sketch of what you’re trying to do, the better we can advise and assist toward achieving those goals.

Bambus · February 13, 2019, 3:28pm

I need to insert 200 sets of data like the one shown above; the whole batch needs to be inserted in under 30 ms.
The timer is running on the same PC, in Java code.
All the inserts will use RDF/XML.
Inserts need to be this fast whenever they happen.

sif7en · February 14, 2019, 11:41am

Hi. I’m joining this question alongside the user with the alias Bambus.
The data set mentioned above is a simple one and has to be inserted in under 30 ms. However, we might have a single dataset with up to 2000 triples, and that one has to be inserted/extracted in under 30 ms as well.

Could you advise on a design (resources, configuration) for a single Virtuoso instance that is to store up to 0.5 billion triples and achieve the throughput mentioned above? In other words, we’re looking to achieve a throughput of about 70k operations per second.
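
For reference, the rates implied by these numbers can be worked out directly. The sketch below assumes each Substation object parses to three triples (its type, its name, and its region link), which is how the RDF/XML shown earlier reads; the class and method names are made up for illustration.

```java
// Back-of-the-envelope arithmetic for the rates discussed in this thread.
final class ThroughputSketch {

    // Triples per second needed to move `triples` triples within `millis` milliseconds.
    static double requiredTriplesPerSecond(long triples, double millis) {
        return triples / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // 200 Substation objects, assumed to be 3 triples each, in 30 ms.
        System.out.printf("%.0f triples/s%n", requiredTriplesPerSecond(200 * 3, 30));
        // The larger case: one dataset of 2000 triples in 30 ms.
        System.out.printf("%.0f triples/s%n", requiredTriplesPerSecond(2000, 30));
    }
}
```

The first case works out to roughly 20,000 triples per second and the second to roughly 66,700, which is where the "about 70k" figure comes from.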

hwilliams · February 14, 2019, 3:37pm

@sif7en: As indicated previously, we’ve had customers achieve throughput rates of 30-40K triples per second for SPARUL (CRUD) operations, with commodity-level hardware (2x Intel Xeon CPU E5-2630 running at 2.30GHz; 12 physical CPU cores; 24 logical cores from 2012) and the database tuned for optimum use with available memory. The latest generation Intel Xeon E7-8894 v4 processors claim to deliver up to 3.69x performance gains compared to the previous generation, and this should suffice. That said, this can only be confirmed through actual testing, as it will still depend on the actual query workload and datasets in question, and on how optimally the queries run.
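
Since a Java example was requested, here is a minimal sketch of one way to keep per-statement overhead low: build a single SPARQL 1.1 `INSERT DATA` statement that carries the whole batch, rather than sending one statement per object. The graph IRI, the base IRI used to resolve `rdf:ID`, and the CIM namespace are assumptions (none of them appear in this thread), and the class and method names are hypothetical. Sending the string to the server is left out because it needs a live instance; whether this meets the 30 ms target would have to be measured.

```java
// Sketch: build ONE SPARQL 1.1 "INSERT DATA" statement for many Substation
// objects, so the whole batch can be submitted in a single request.
final class BatchInsertSketch {

    // Assumptions, not taken from the thread: graph IRI, base IRI, CIM namespace.
    static final String GRAPH = "http://example.org/graph/substations";
    static final String BASE = "http://example.org/model#";
    static final String CIM = "http://iec.ch/TC57/2013/CIM-schema-cim16#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Escape backslashes, double quotes and newlines so the value is a
    // valid SPARQL string literal.
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Append the three triples that correspond to one <cim:Substation> element.
    static void appendSubstation(StringBuilder sb, String id, String name, String regionId) {
        String subject = "<" + BASE + id + ">";
        sb.append(subject).append(" <").append(RDF_TYPE).append("> <")
          .append(CIM).append("Substation> .\n");
        sb.append(subject).append(" <").append(CIM).append("IdentifiedObject.name> \"")
          .append(escapeLiteral(name)).append("\" .\n");
        sb.append(subject).append(" <").append(CIM).append("Substation.Region> <")
          .append(BASE).append(regionId).append("> .\n");
    }

    // ids[i], names[i] and regionIds[i] describe the i-th object.
    static String buildInsert(String[] ids, String[] names, String[] regionIds) {
        StringBuilder sb = new StringBuilder("INSERT DATA { GRAPH <" + GRAPH + "> {\n");
        for (int i = 0; i < ids.length; i++) {
            appendSubstation(sb, ids[i], names[i], regionIds[i]);
        }
        return sb.append("} }").toString();
    }

    public static void main(String[] args) {
        String[] ids = {"_0425e670-fcbd-11e6-835d-f0def1611578"};
        String[] names = {"RTP Domžale"};
        String[] regions = {"_052259c0-fcbb-11e6-835d-f0def1611578"};
        System.out.println(buildInsert(ids, names, regions));
    }
}
```

When timing this from Java, it helps to measure only the round trip of the prepared statement (for example with `System.nanoTime()` around the execute call) after a few warm-up runs, so that connection setup and JIT warm-up do not count against the 30 ms budget.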

sif7en · February 15, 2019, 6:57am

Thank you for your answer. Can you also provide us with some examples for optimal db memory tuning?

hwilliams · February 15, 2019, 1:23pm

We provide documentation on Virtuoso performance tuning including memory allocation settings based on available memory. Typically, Virtuoso requires 10GB RAM per billion triples, depending on how well the datasets can be compressed for storage, which in turn depends on how regular the datasets are (i.e., whether they have common predicates).
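
As a rough illustration of that rule of thumb, the sketch below turns a triple count into a RAM estimate and then into candidate buffer settings. Only the 10GB-per-billion figure comes from this post. The buffer formula (about two thirds of RAM, about 8000 bytes per buffer, dirty buffers at three quarters of the total) is an assumption based on the commonly quoted tuning guidance for the `NumberOfBuffers` and `MaxDirtyBuffers` INI parameters, so check it against the tuning documentation before relying on it. Class and method names are hypothetical.

```java
// Rough sizing sketch; the results are starting points for testing, not settings to copy.
final class MemorySketch {

    // From the post: roughly 10 GB of RAM per billion triples.
    static final double GB_PER_BILLION_TRIPLES = 10.0;

    static double estimatedRamGb(double billionsOfTriples) {
        return billionsOfTriples * GB_PER_BILLION_TRIPLES;
    }

    // ASSUMPTION: about 2/3 of the given RAM goes to buffers of ~8000 bytes each.
    static long numberOfBuffers(double ramGb) {
        double bytes = ramGb * 1024 * 1024 * 1024;
        return (long) (bytes * 0.66 / 8000);
    }

    // ASSUMPTION: dirty buffers are capped at 3/4 of the buffer count.
    static long maxDirtyBuffers(long numberOfBuffers) {
        return numberOfBuffers * 3 / 4;
    }

    public static void main(String[] args) {
        double ramGb = estimatedRamGb(0.5); // the 0.5 billion triples from this thread
        long buffers = numberOfBuffers(ramGb);
        System.out.println("Estimated RAM (GB): " + ramGb);
        System.out.println("NumberOfBuffers  = " + buffers);
        System.out.println("MaxDirtyBuffers  = " + maxDirtyBuffers(buffers));
    }
}
```

For the 0.5 billion triples mentioned earlier, the rule of thumb gives about 5GB for the data itself; the operating system and any other processes on the host need memory on top of that.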