I need to insert millions of triples into Virtuoso as quickly as possible. My goal is something like an R2RML implementation based on Spark.
The data in the RDB is first fetched and converted into Jena Triple objects, and then VirtGraph sends them to Virtuoso, for example:
    Node foo1 = Node.createURI("http://example.org/#foo1");
    Node bar1 = Node.createURI("http://example.org/#bar1");
    Node baz1 = Node.createURI("http://example.org/#baz1");
    VirtGraph graph = new VirtGraph("GraphName", url, "dba", "dba");
    graph.add(new Triple(foo1, bar1, baz1));
However, the insert speed is not good enough: only 400,000 to 500,000 triples per 5 minutes. Our hardware seems to be mostly asleep, with about 70% of the CPU idle.
Our environment is:
Virtuoso: Virtuoso Open Source Edition v188.8.131.52
Jena Provider: VirtJena3
Data scale: 1 million rows per table in the RDB are converted into 6 million or more triples per graph.
Ideal speed: 1 million or more triples inserted per 5 minutes
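To put those numbers side by side, here is a quick back-of-the-envelope sketch. It only uses the figures stated above (400,000 to 500,000 triples per 5 minutes now, 1 million per 5 minutes as the target, 6 million triples per graph); the class and method names are made up for illustration.

```java
class ThroughputEstimate {

    /** Triples per second, given a triple count loaded in a number of minutes. */
    static double triplesPerSecond(long triples, double minutes) {
        return triples / (minutes * 60.0);
    }

    /** Minutes needed to load totalTriples at the given rate (triples per second). */
    static double minutesToLoad(long totalTriples, double triplesPerSecond) {
        return totalTriples / triplesPerSecond / 60.0;
    }

    public static void main(String[] args) {
        long triplesPerGraph = 6_000_000L;                  // "6 million or more triples per graph"

        double currentLow  = triplesPerSecond(400_000, 5);  // observed lower bound
        double currentHigh = triplesPerSecond(500_000, 5);  // observed upper bound
        double target      = triplesPerSecond(1_000_000, 5); // desired rate

        System.out.printf("current rate: %.0f to %.0f triples/s%n", currentLow, currentHigh);
        System.out.printf("target rate:  %.0f triples/s%n", target);
        System.out.printf("one graph now:       %.0f to %.0f minutes%n",
                minutesToLoad(triplesPerGraph, currentHigh),
                minutesToLoad(triplesPerGraph, currentLow));
        System.out.printf("one graph at target: %.0f minutes%n",
                minutesToLoad(triplesPerGraph, target));
    }
}
```

So a single 6-million-triple graph currently takes about 60 to 75 minutes, and I would like to get that down to 30 minutes or less.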
And these are the parameters in virtuoso.ini that I think are relevant:
    [Parameters]
    MaxClientConnections = 10000
    ServerThreads = 1000
    ServerThreadSize = 500000
    MainThreadSize = 1000000
    ThreadCleanupInterval = 0
    ThreadThreshold = 10
    SingleCPU = 0
    ThreadsPerQuery = 16
    AsyncQueueMaxThreads = 10
    NumberOfBuffers = 5450000
    MaxDirtyBuffers = 4000000
    ColumnStore = 1

    [Flags]
    enable_mt_txn = 1
    enable_mt_transact = 1
    enable_qp = 1
    qp_thread_min_usec = 100
    mp_local_rc_sz = 0
    dbf_explain_level = 0
    enable_exact_p_stat = 1
    hash_join_enable = 2
    enable_g_in_sec = 1
    qrc_tolerance = 40
    dbf_max_itc_samples = 5
By the way, would it be quicker to convert the RDB data into SPARQL Update strings and use the SPARQL protocol?
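To make clear what I mean by that alternative, here is a minimal sketch that groups triples into batches and builds one `INSERT DATA` request per batch. It uses plain Java only; the class name, the batch size, and the graph IRI `http://example.org/graph` are assumptions for illustration, it handles IRI terms only (no literals or blank nodes), and it does no IRI escaping. Actually sending the strings to the endpoint is left out.

```java
import java.util.ArrayList;
import java.util.List;

class SparqlUpdateBatcher {

    /** Builds a single INSERT DATA request for one batch of {subject, predicate, object} IRIs. */
    static String buildInsertData(String graphIri, List<String[]> triples) {
        StringBuilder sb = new StringBuilder();
        sb.append("INSERT DATA { GRAPH <").append(graphIri).append("> {\n");
        for (String[] t : triples) {
            sb.append("  <").append(t[0]).append("> <")
              .append(t[1]).append("> <")
              .append(t[2]).append("> .\n");
        }
        sb.append("} }");
        return sb.toString();
    }

    /** Splits the triples into batches of at most batchSize and builds one request per batch. */
    static List<String> buildBatches(String graphIri, List<String[]> triples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> updates = new ArrayList<>();
        for (int start = 0; start < triples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, triples.size());
            updates.add(buildInsertData(graphIri, triples.subList(start, end)));
        }
        return updates;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {
            "http://example.org/#foo1", "http://example.org/#bar1", "http://example.org/#baz1"});
        triples.add(new String[] {
            "http://example.org/#foo2", "http://example.org/#bar1", "http://example.org/#baz2"});

        // Hypothetical graph IRI and a tiny batch size, just to show the output shape.
        for (String update : buildBatches("http://example.org/graph", triples, 1)) {
            System.out.println(update);
            System.out.println("----");
        }
    }
}
```

Each resulting string would be one update request, so the batch size decides how many triples travel per round trip. I have not measured whether this is faster than `graph.add(...)`.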
Thank you very much!