I am new to Virtuoso, so I faced many problems while reading their chaotic documentation and implementing some code in Java.
I finally managed to write an RDF loader, but I can see in their documentation that there is a special bulk loader, which I do not know how to use from Java.
Here is my code which is working fine:
// Open a Virtuoso-backed Jena dataset and load one RDF document
// into the default graph inside a single write transaction.
VirtDataset virtuosoDataset = new VirtDataset(virtuosoLocation, virtuosoUser, virtuosoPass);
virtuosoDataset.begin(ReadWrite.WRITE);
Model model = virtuosoDataset.getDefaultModel();
model.read(reader, ConfigurationFile.getProperty(JENAXMLBASE)); // second argument: base URI for resolving relative IRIs
virtuosoDataset.commit();
virtuosoDataset.close();
When I query Virtuoso with a get method, I can retrieve the objects that were inserted.
The best way to bulk load RDF data into Virtuoso is via the SQL Stored Procedures provided in the Bulk Loader Documentation.
Conceptually, you are performing the following steps (via the ISQL command-line or variant in the HTML-based Conductor UI):
Identify the location of RDF documents to be uploaded – this MUST be a folder (directory) listed as a value of the “DirsAllowed” INI setting
Determine a destination Named Graph IRI that identifies the internal (Virtuoso-hosted) target document into which you plan to load this RDF data
Direct the Bulk Loader to the source directory containing the RDF documents that you want to schedule for loading, an RDF document selection pattern or specific name (if a single file), and a target Named Graph IRI
Verify that the Bulk Loader successfully registered the target RDF documents
Run the Bulk Loader
Verify that the Bulk Loading process was successful
Given the following –
a directory named /tmp/data/
RDF-Turtle documents with the extension .ttl have been copied to the directory above
Virtuoso Named Graph IRI file:tmp:data is the target of the bulk upload
– here are the steps you would perform, using the following “;”-separated SQL Stored Procedure calls:
LD_DIR ('/tmp/data/', '*.ttl', 'file:tmp:data') ;
SELECT * from DB.DBA.load_list ; – where ll_state value 0 indicates an item hasn’t been loaded
RDF_LOADER_RUN() ;
SPARQL SELECT COUNT (*) FROM <file:tmp:data> WHERE {?s ?p ?o} ;
SPARQL SELECT SAMPLE (?s) as ?sample COUNT (*) ?o FROM <file:tmp:data> WHERE {?s a ?o} GROUP BY ?o; – to get a quick analysis of entities and entity types associated with the Named Graph <file:tmp:data>
Example (everything happens via EXEC=): isql 1111 dba dba "EXEC=LD_DIR ('/tmp/data/', '*.ttl', 'file:tmp:data') ; SELECT * from DB.DBA.load_list ; RDF_LOADER_RUN() ; SPARQL SELECT SAMPLE (?s) as ?sample COUNT (*) ?o FROM <file:tmp:data> WHERE {?s a ?o} GROUP BY ?o ;"
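If you would rather drive the bulk loader from Java than from the ISQL shell, the same stored procedures can be called over JDBC. A minimal sketch, assuming the Virtuoso JDBC driver (virtuoso.jdbc4.Driver) is on the classpath and the server’s DirsAllowed INI setting includes /tmp/data/ – the class and helper names here are my own:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoad {
    // The same stored-procedure calls as the ISQL example, as plain SQL strings.
    static String[] bulkLoadStatements(String dir, String pattern, String graph) {
        return new String[] {
            String.format("ld_dir('%s', '%s', '%s')", dir, pattern, graph),
            "rdf_loader_run()",
            "checkpoint"  // persist the freshly loaded data
        };
    }

    public static void main(String[] args) throws Exception {
        String[] stmts = bulkLoadStatements("/tmp/data/", "*.ttl", "file:tmp:data");
        if (args.length < 3) {             // no connection details given: just print the plan
            for (String s : stmts) System.out.println(s);
            return;
        }
        Class.forName("virtuoso.jdbc4.Driver");
        // args: e.g. jdbc:virtuoso://localhost:1111  dba  <password>
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement st = con.createStatement()) {
            for (String s : stmts) st.execute(s);
        }
    }
}
```

The key point is that ld_dir() only registers the files in DB.DBA.load_list; nothing is loaded until rdf_loader_run() executes.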
If I set BatchSize and remove transactions, will the loading be faster?
I am using a Jena Model, so I need to change it to a VirtModel, which needs a VirtGraph rather than the VirtDataset in my code – and then, if I set the BatchSize, the loader will be properly configured?
About the ISQL command line: I won’t use that in my Java code… I thought there was some way to do this from Java code without calling ISQL…
Nobody answered my second question about the Virtuoso GUI… I still cannot find where to see the imported model at localhost:8890/conductor
Virtuoso is an RDF Quad Store so your “Model” ultimately equates to an RDF Graph in Virtuoso, which can be seen from the Linked Data → Graphs → Graphs tab of the Conductor …
And clear is fine – it removes the objects from Virtuoso – but the virtuoso.db file is still 3 GB. Is there some other way to remove the objects from Virtuoso and also shrink the virtuoso.db file?
I also want to ask what the average time of creating one object in Virtuoso is, because I need to create 200 objects in Virtuoso in less than 30 ms.
The “transaction aborted” message results from an INI setting being too low for your current bulk-loading activity – but note that the current setting may be appropriate for later ongoing activity. See the manual’s discussion of TransactionAfterImageLimit –
TransactionAfterImageLimit = N bytes (default 50000000). When the roll-forward log entry of a transaction exceeds this size, the transaction is considered too large and is marked as uncommittable; this acts as an upper limit on otherwise unbounded transactions. The default is 50 MB. Also note that transaction roll-back data takes about 2x the space of roll-forward data, so when the transaction roll-forward data is 50 MB, the total transient consumption is closer to 150 MB.
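For reference, the parameter lives in the [Parameters] section of virtuoso.ini; a sketch raising it to roughly 200 MB (the value is purely illustrative, and the server must be restarted for the change to take effect):

```ini
[Parameters]
; bytes; default 50000000 (~50 MB)
TransactionAfterImageLimit = 200000000
```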
Deleting data from the DB does not immediately free the disk space previously occupied by that data. Virtuoso has an auto-compaction feature which will eventually free space. New data will be loaded into the space previously occupied by deleted data, but of course this reuse will be imperfect. You may be able to free some space by running a CHECKPOINT; or the DB..VACUUM (); procedure (depending on your workflow, it may make sense to run CHECKPOINT; and DB..VACUUM (); twice or three times in a cycle). You can also do a backup-dump-and-reload to immediately reclaim disk – though you must temporarily consume substantially more disk space during this process.
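The CHECKPOINT / DB..VACUUM () cycle mentioned above can also be driven from Java over JDBC; a sketch under the same connection assumptions as before, with names of my own choosing:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class Compact {
    // Build the CHECKPOINT / DB..VACUUM () cycle described above.
    static List<String> compactionCycle(int cycles) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            stmts.add("CHECKPOINT");
            stmts.add("DB..VACUUM ()");
        }
        return stmts;
    }

    // Run the cycle over an open JDBC connection to Virtuoso.
    static void compact(Connection con, int cycles) throws SQLException {
        try (Statement st = con.createStatement()) {
            for (String sql : compactionCycle(cycles)) {
                st.execute(sql);
            }
        }
    }
}
```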
I would encourage you to describe what you’re really trying to achieve – both starting points and desired end results – and to define your terms as you go.
You have expressed a wish to “create 200 objects in Virtuoso for less than 30 ms” (by which I think you mean “≤ 30 ms total to create 200 objects”). However, it is not clear if those “objects” would be triples of a few dozen or hundred bytes each, for which your wish should be easily granted, or if those “objects” are graphs of thousands or millions of triples and multiple GB each, for which your wish might not be so easily granted, if at all, especially if everything is moving over the network and not moving between disk and RAM and disk on a single box…
As is true of all software, the speed of all Virtuoso activity is impacted by infrastructure between action points, and by the resources Virtuoso has to work with on its local host(s). Proper tuning matters a great deal – such as ensuring that Virtuoso knows how much memory it should use for active work, and leaving enough free RAM and disk for task-specific activities.
You might benefit by exploring some of the benchmarks in active use –
I want to ask another question. By default I am now using the Quad Store DB. Is there an option to save RDF triples via a SPARQL query into a SQL database? I saw some converter in the documentation, but I did not find a Java example. I can connect to the DB with virtuoso.jdbc4.Driver, but I don’t know how triples can be saved in SQL (should I create separate tables, etc., or will Virtuoso handle that just by changing the connection driver?).
I am glad to hear that adjusting the TransactionAfterImageLimit resolved your data load issue.
For many reasons, it is best to keep each topic focused on one question or issue. Your new questions (about insertion speeds, and about mixing RDF Graphs with SQL Tables) would each be better raised in a new thread/question/topic.
Note – you can avoid the SR325: Transaction aborted because its log after image size went above the limit error – and possibly having to increase the TransactionAfterImageLimit INI file parameter – with the log_enable() function (available through the DEFINE sql:log-enable N SPARQL pragma), which turns auto-commit behaviour on (level 2) and can also turn transaction logging off (level 3).
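As a sketch of what that pragma looks like when sent from Java (the helper name is mine; level 2 = row-autocommit on, level 3 = autocommit on plus transaction logging off):

```java
public class LogEnable {
    // Prefix a SPARQL update with the sql:log-enable pragma so a large
    // update commits row-by-row instead of growing the transaction log.
    static String withLogEnable(int level, String sparqlUpdate) {
        return "SPARQL DEFINE sql:log-enable " + level + " " + sparqlUpdate;
    }

    public static void main(String[] args) {
        // e.g. pass this string to Statement.execute() on a Virtuoso JDBC connection
        System.out.println(withLogEnable(3, "CLEAR GRAPH <file:tmp:data>"));
    }
}
```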