I installed Virtuoso with Docker and imported all the triples of DBpedia, about 1 billion triples. After the import, Virtuoso's database file (virtuoso.db) took 66 GB of disk space.
But several days later, virtuoso.db takes 93 GB, and I'm sure I didn't add any more triples. Does Virtuoso's database file (virtuoso.db) grow automatically?
Why is this? Can I stop it from taking up more disk space?
Here's some information that may help:
Virtuoso version: Virtuoso version 07.20.3230 on Linux (x86_64-generic_glibc25-linux-gnu), Single Server Edition.
Virtuoso has a very powerful freetext index which can be used to efficiently find triples within the quad store.
?s ?p ?o .
?o bif:contains 'Berlin'.
This index is not built during the bulk loading process; it is generated by a background task that the scheduler starts after the database has been restarted.
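For readers who want to try the pattern above, here is a minimal Python sketch that wraps the two triple patterns in a complete query and builds a request URL. The `SELECT ?s ?p ?o` clause, the `LIMIT`, the helper names, and the use of the public endpoint at https://dbpedia.org/sparql are assumptions for illustration, not the exact query from the example above. The script only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint; replace with your own Virtuoso instance if needed.
ENDPOINT = "https://dbpedia.org/sparql"


def build_freetext_query(word: str, limit: int = 10) -> str:
    """Build a SPARQL query that searches object literals via bif:contains."""
    # This sketch only accepts a single alphanumeric word, so no quoting
    # or escaping of the free-text expression is needed.
    if not word.isalnum():
        raise ValueError("this sketch only handles a single alphanumeric word")
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  ?o bif:contains '{word}' .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL for the endpoint, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    query = build_freetext_query("Berlin")
    print(query)
    print(build_request_url(query))
```

The printed URL can be opened in a browser or fetched with any HTTP client. Note that such a query relies on the free-text index, so it is only useful while that index is enabled.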
Thanks very much. I now understand why Virtuoso's database file (virtuoso.db) kept growing.
I ran the isql command [ DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null); ] and I'm going to keep watching to see whether virtuoso.db still keeps growing.
By the way, can you tell me how large it is going to get if I keep the full-text index?
The database behind the official DBpedia endpoint (OpenLink Virtuoso SPARQL Query Editor) is about 90 GB.
In your situation it depends on exactly which Databus set you loaded, whether you loaded additional datasets beyond what we load in the official endpoint, and so on.
Here is some information from our endpoint:
$ isql 1111
File size 2717908992, 11341824 pages, 3774183 free.
4000000 buffers, 3816209 used, 250233 dirty 1 wired down, repl age 6340052 0 w. io 1 w/crsr.
If you look at the number of pages, and take 8192 bytes (8 KB) per page, then the db size in bytes is
pages * 8192 = 11341824 * 8192 = 92912222208 bytes = 86.5 GB (counting 1 GB as 1024^3 bytes)
Note there is a lot of free space (3774183 pages, roughly 28.8 GB) in the database, so at some point the database growth will level off as it will reuse these pages.
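To repeat this calculation on your own instance, a small Python sketch like the following could be used. It assumes the 8192-byte page size from the calculation above and a status line in the same format as the one shown; the function names are only illustrative.

```python
import re

PAGE_SIZE = 8192  # bytes per database page, as in the calculation above

# Status line copied from the endpoint output shown earlier.
STATUS_LINE = "File size 2717908992, 11341824 pages, 3774183 free."


def parse_pages(line: str) -> tuple[int, int]:
    """Extract (total_pages, free_pages) from a status line."""
    match = re.search(r"(\d+) pages, (\d+) free", line)
    if match is None:
        raise ValueError("unexpected status line format")
    return int(match.group(1)), int(match.group(2))


def to_gib(pages: int) -> float:
    """Convert a page count to GiB (1024^3 bytes)."""
    return pages * PAGE_SIZE / 1024**3


total, free = parse_pages(STATUS_LINE)
print(f"allocated: {to_gib(total):.1f} GiB")         # 86.5 GiB
print(f"free:      {to_gib(free):.1f} GiB")          # 28.8 GiB
print(f"in use:    {to_gib(total - free):.1f} GiB")  # 57.7 GiB
```

The "in use" figure is the more useful one when judging how much further the file is likely to grow, since free pages are reused before the file is extended.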
You can run the following command to see how many records still need to be indexed:
SQL> select count(1) from vtlog_db_dba_rdf_obj;
1 Rows. -- 0 msec.
If it is 0 like here, then Virtuoso has probably already completed the free-text index.
On our development server, which is a machine with 24 cores and 370 GB of memory, this indexing process only takes a couple of hours.
Thanks very much. I confirmed that all my records have been indexed by running the isql command ' select count(1) from vtlog_db_dba_rdf_obj; '.