OOM issue during bulk load in Kubernetes but not on a local machine

We are using the Docker image openlink/virtuoso-opensource-7:7.2 and are trying to load about 60 million items into Virtuoso, split over about 2,000 datasets. We load the data per dataset, using one thread with a maximum of 10,000 items per batch, with the following script:

log_enable(2);
SPARQL CLEAR GRAPH <http://myorganization.org/dataset/##DATASET_ID##_new>;
delete from DB.DBA.load_list;
ld_dir ('##IMPORT_FOLDER##', '##TTL_FILENAME##.ttl.gz', 'http://myorganization.org/dataset/##DATASET_ID##_new');
rdf_loader_run();
log_enable(1);
sparql select 'Result triples: ', count(*) FROM <http://myorganization.org/dataset/##DATASET_ID##_new> WHERE {?s ?p ?o};

Once a dataset is fully loaded we run a script to rename it from <dataset_id>_new to <dataset_id>.

This is working fine on my computer. I set a limit of 8 GB RAM on the Docker container and see RAM usage go up to about 5 to 5.5 GB during the bulk load, after which it stabilizes between 4.5 and 5.5 GB. However, when I deploy this in a Kubernetes cluster with an 8 GB RAM limit, Virtuoso keeps using more and more RAM until the pod runs out of memory and is restarted. What could cause this difference between Kubernetes and running it on a single computer? I first thought the problem was that we needed to add a checkpoint to the bulk load script, but adding one didn’t help. Reducing the batch size to 1,000 items didn’t help either.

While inspecting RAM usage with top inside the k8s container, I did notice that top reports 7.9 GB as about 25% of the available RAM. Could it be that Virtuoso likewise doesn’t see the 8 GB RAM limit and thinks there is 32 GB available?

What do 60 million items equate to in terms of the number of RDF triples being bulk loaded?

When you say the Kubernetes cluster has an 8GB RAM limit and Virtuoso continues to use more and more RAM until the pod runs out of memory and is restarted, how is the memory usage being measured? If Virtuoso consumes 4.5 - 5.5 GB RAM on a physical machine, it should consume the same memory on Kubernetes or any other deployment method, given the same Virtuoso configuration.

What does the output of running the Virtuoso status(); command during the bulk load process, ideally just before the OOM error, return?

Can you provide the virtuoso.ini and virtuoso.log files for review?

Thank you for your quick response.

We estimate that 60 million items means about 4.5 to 5 billion triples.

RAM usage in Kubernetes is measured by our cluster’s Grafana monitoring, and I verified its numbers by running the Unix command top just before it ran out of memory. top confirmed that 7.9 GB RAM was in use just before the pod was killed because of OOM. I double-checked multiple times that the correct RAM limit was set, and our RAM limit configuration is fine. We let the pod run for several days and it would run out of memory and be restarted every 9-15 hours (see below).

I don’t see an option to attach a file to this post, so I uploaded our virtuoso.ini somewhere else. It’s available here: virtuoso.ini. The only changes we made to the default Virtuoso settings are:

ENV VIRT_Parameters_DirsAllowed=.,../vad,/database/tmp-ingest
ENV VIRT_Parameters_NumberOfBuffers=340000
ENV VIRT_Parameters_MaxDirtyBuffers=250000

I also uploaded a file with the status() command output and the virtuoso.log file after starting with an empty Virtuoso database and ingesting data for about an hour. RAM usage was about 4.1GB at that point. I need to check back in several hours to see if I can do the same just before it runs out of memory again.
status() output
virtuoso.log

ENV VIRT_Parameters_NumberOfBuffers=340000 is the recommended setting for a machine with 4GB RAM.

But if, as indicated, your dataset size is 4.5 to 5 billion triples, for which Virtuoso on average requires about 10GB RAM per billion triples to host them in memory for best performance, you should ideally have about 50GB RAM for hosting this dataset, with NumberOfBuffers set accordingly to something like 4000000 (and MaxDirtyBuffers = 3000000).
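For reference, with the Docker image those values would presumably be passed the same way as your existing overrides, i.e. something along these lines (illustrative, using the figures above):

ENV VIRT_Parameters_NumberOfBuffers=4000000
ENV VIRT_Parameters_MaxDirtyBuffers=3000000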

I suspect that if you continue running the status() command, the buffer usage (i.e. 340000 buffers, 142771 used in the output provided) will soon show that ALL the buffers are used, and the load rate will slow significantly as the system starts swapping between memory and disk.

Thanks again for your reply. I am aware that our buffer settings are for 4GB RAM. Our normal RAM limit after loading is 4GB; only during loading do we increase it to 8GB. We have one installation where we loaded almost all data (through a lot of Kubernetes pod restarts) and there 4GB works fine for most queries. We may increase our current RAM limit later, but we probably won’t be able to go beyond 12-15 GB, unfortunately.

I checked the status of the loading process after it had run for about 7 hours today. By that time we had loaded around 310.5M triples, memory usage was around 6.2 GB, and the status() command returned:

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Nov 11 2024 (ffed4676d)
Started on: 2025-04-01 07:03 GMT+0 (up 07:10)
CPU: 100.05% RSS: 6015MB VSZ: 7321MB PF: 2
 
Database Status:
  File size 7933526016, 968448 pages, 259569 free.
  340000 buffers, 312908 used, 19635 dirty 15 wired down, repl age 8437689 0 w. io 0 w/crsr.
  Disk Usage: 169406 reads avg 0 msec, 0% r 0% w last  3882 s, 37578925 writes flush      50.26 MB/s,
    2562 read ahead, batch = 60.  Autocompact 1165579 in 912717 out, 21% saved col ac: 15645255 in 8% saved.
Gate:  1676 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap. 
Log = ../database/virtuoso.trx, 185 bytes
687234 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 595 connects, max 2 concurrent
RPC: 9221 calls, 2 pending, 2 max until now, 0 queued, 484 burst reads (5%), 0 second 330M large, 614M max
Checkpoint Remap 2000 pages, 0 mapped back. 3065 s atomic time.
    DB master 968448 total 259554 free 2000 remap 48 mapped back
   temp  256 total 251 free
 
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 6 threads running 0 threads waiting 1 threads in vdb.
Pending:
 
Client 1111:595:  Account: dba, 207 bytes in, 288 bytes out, 1 stmts.
PID: 816, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
Client 1111:594:  Account: dba, 859 bytes in, 543 bytes out, 1 stmts.
PID: 815, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
 
Running Statements:
 Time (msec) Text
       11433 rdf_loader_run()
         190 status()
 
 
Hash indexes
 

44 Rows. -- 190 msec.

FWIW when I checked earlier (around 6 hours) the buffers were at:

  File size 6855589888, 836864 pages, 210999 free.
  340000 buffers, 320603 used, 31890 dirty 5 wired down, repl age 7533925 0 w. io 0 w/crsr.

I understand that the low buffer count will reduce performance, but that’s not directly related to the OOM, or is it? I vaguely remember we did try loading with the 8GB buffer settings last week, but IIRC that had little effect. I can retry this tomorrow to be sure.
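For clarity, by “the 8GB buffer settings” I mean the values from the sizing table in the stock virtuoso.ini, which, if I read that table correctly, would translate to the following in our Docker setup:

ENV VIRT_Parameters_NumberOfBuffers=680000
ENV VIRT_Parameters_MaxDirtyBuffers=500000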

Unfortunately I won’t be around to run status() again when the pod is almost out of memory, but I can upload the virtuoso.log file tomorrow morning. So far I don’t see much in the logs other than a lot of

14:15:29 Scheduler events are disabled.
14:15:29 PL LOG: Loader started
14:15:41 PL LOG: No more files to load. Loader has finished,
14:16:05 Checkpoint started
14:16:16 Checkpoint finished, log reused

and occasionally a line like this one, for example:

14:22:07 Checkpoint removed 555 MB of remapped pages, leaving 15 MB. Duration     5.441 s.  To save this time, increase MaxCheckpointRemap and/or set Unremap quota to 0 in ini file.
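If I understand that message correctly, acting on it would mean raising MaxCheckpointRemap above the default 2000 in the [Database] section of virtuoso.ini; in our Docker setup that would presumably be an environment override along these lines (the value here is just a placeholder, not something we have tested):

ENV VIRT_Database_MaxCheckpointRemap=200000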

The status() output is showing that ALL the buffers are being consumed, which will reduce performance. This can only be mitigated to a degree by having fast storage devices to minimise the swap time, but performance will still be degraded.

Why are there continuous bulk load start and stop operations, each lasting a few minutes, in the log file? i.e.

08:13:45 Checkpoint is disabled.
08:13:45 Scheduler events are disabled.
08:13:45 PL LOG: Loader started
08:14:11 PL LOG: No more files to load. Loader has finished,
08:14:25 Checkpoint started
08:14:30 Checkpoint finished, log reused

How exactly are you running the bulk loader? I would expect there to be only one start and one end of the bulk load, once all the data has been loaded. It almost seems as if you are bulk loading one or a few files at a time and restarting the process from scratch until all data is loaded, whereas the whole point of the bulk loader is that you register all files at once and load them with multiple rdf_loader_run() invocations for optimum load times, as detailed in the RDF bulk loader docs.
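As a rough sketch of that documented pattern (the folder, file mask and graph IRI below are placeholders, not a prescription for your setup), you register every file once and then run several loader threads against the single load list:

-- register all files to be loaded in one pass; the third argument is the default target graph
ld_dir ('##IMPORT_FOLDER##', '*.ttl.gz', 'http://myorganization.org/dataset/bulk');

-- run this on several isql connections in parallel, e.g. one per available core, and wait for all of them to return
rdf_loader_run ();

-- once every loader has finished, make the loaded data durable
checkpoint;

If each file needs its own target graph, the bulk loader documentation also describes per-file .graph side files (or a global.graph file per directory), which avoids re-running the whole register/load/checkpoint cycle per dataset.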

Our data is organized in data sets ranging from 10 items up to 600,000 items in size. We load the data per data set because that’s most convenient for us and because we want to do updates per set later.

The uploading is done using the script I listed in my first post. We first upload a set as http://myorganization.org/dataset/<datasetid_goes_here>_new, then check if there is already a dataset named http://myorganization.org/dataset/<datasetid_goes_here>. If so, we delete it, and finally we rename the newly uploaded dataset, removing the _new postfix.
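The delete step is essentially the same CLEAR GRAPH statement we already use at the start of the load script, wrapped the same way as the renaming script shown below, i.e. roughly:

log_enable(2);
SPARQL CLEAR GRAPH <http://myorganization.org/dataset/##DATASET_ID##>;
checkpoint;
log_enable(1);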

The log lines

14:15:29 Scheduler events are disabled.
14:15:29 PL LOG: Loader started
14:15:41 PL LOG: No more files to load. Loader has finished,

are generated by our loading script. The checkpoint started and finished logs are from our renaming script:

log_enable(2);
UPDATE DB.DBA.RDF_QUAD TABLE OPTION (index RDF_QUAD_GS)
   SET g = iri_to_id ('http://data.europeana.eu/dataset/##DATASET_ID##')
 WHERE g = iri_to_id ('http://data.europeana.eu/dataset/##DATASET_ID##_new', 0);
checkpoint;
log_enable(1);

This certainly isn’t the most efficient way to do our initial loading of all datasets, and we probably should introduce separate scripts for the initial load of all datasets and for updating a single set afterwards, but that’s something we want to work on later. Right now my main concern is understanding where the OOM issues come from and trying to fix them.

I am starting to wonder if the OOM isn’t simply a logical consequence of configuring so little RAM compared to what you recommend. Our initial tests on a bare metal machine (also with an 8GB limit so we could compare RAM usage) showed that memory usage was more stable there. However, the test upload I started yesterday on the bare metal server also failed after about 11 hours because of OOM. It did load much more before being killed than our Kubernetes deployment does, because the bare metal server has more CPU available.

This all gives me the impression that the OOM issue is not directly related to the number of items we load, as I initially thought, but rather to the time Virtuoso spends on bulk loading. Also, as mentioned before, once we complete loading all files (after many OOMs and restarts), we are able to run Virtuoso with as little as 4GB RAM.

Anyway, I uploaded the virtuoso.log file of our k8s deployment from about 4 minutes before the pod was killed, as well as the one from the bare metal server:
virtuoso.log bare metal server up to the point it got killed.
partial virtuoso.log kubernetes while at 7.8GB RAM usage

Here’s also the output of the Kubernetes pod’s status(), from about 4 minutes before the pod was killed.

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Nov 11 2024 (ffed4676d)
Started on: 2025-04-01 22:06 GMT+0 (up 09:43)
CPU: 62.71% RSS: 7751MB VSZ: 8847MB PF: 315
 
Database Status:
  File size 18278776832, 2231296 pages, 520511 free.
  340000 buffers, 319144 used, 9693 dirty 24 wired down, repl age 9297348 0 w. io 0 w/crsr.
  Disk Usage: 2144181 reads avg 0 msec, 0% r 0% w last  0 s, 73838540 writes flush      48.56 MB/s,
    52245 read ahead, batch = 35.  Autocompact 701791 in 559388 out, 20% saved col ac: 31314479 in 6% saved.
Gate:  46707 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap. 
Log = ../database/virtuoso.trx, 185 bytes
1698932 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 528 connects, max 2 concurrent
RPC: 8587 calls, 2 pending, 2 max until now, 0 queued, 218 burst reads (2%), 0 second 197M large, 790M max
Checkpoint Remap 2000 pages, 0 mapped back. 6099 s atomic time.
    DB master 2231296 total 519900 free 2000 remap 7 mapped back
   temp  256 total 251 free
 
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 6 threads running 0 threads waiting 1 threads in vdb.
Pending:
 
Client 1111:527:  Account: dba, 207 bytes in, 288 bytes out, 1 stmts.
PID: 716, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
Client 1111:528:  Account: dba, 853 bytes in, 543 bytes out, 1 stmts.
PID: 718, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
 
Running Statements:
 Time (msec) Text
        3290 rdf_loader_run()
         420 status()
 
 
Hash indexes
 

44 Rows. -- 451 msec.

I see now that I didn’t properly configure the batch size on the bare metal server, so it may have tried to load a very large set in one go. I’m not sure whether the bulk loader can handle that, or whether that could be the cause of the OOM there.

So, basically, you appear to be loading new datasets periodically, some of which are new graphs and others updates to existing graphs. And in the latter case the existing graph is deleted and the new graph (*_new) is renamed to the existing graph name?

In that case this seems very similar to the Virtuoso delta-aware bulk loading option, which automatically and optimally loads the difference/delta between the existing and new datasets as part of the bulk load operation. Have you considered this option? Note that it would require converting the TTL dataset files to N-Quads.

You stated in a previous post:

virtuoso.log bare metal server up to the point it got killed.

but I thought the OOM error did not occur on bare metal (i.e. a physical computer), which loaded the datasets successfully, and only occurred on Kubernetes, as per the title of the post?

In your Kubernetes virtuoso.log, after every checkpoint following a bulk load, the message “Checkpoint removed xxx MB of remapped pages, leaving 15 MB. Duration yyyy s. To save this time, increase MaxCheckpointRemap and/or set Unremap quota to 0 in ini file.” appears, but it does not appear in the bare metal server log. Do both machines not have the same MaxCheckpointRemap = 2000, which is the default for a Virtuoso installation (Docker or standalone), or do you have different settings on each?

The delta-aware bulk loading looks interesting. I’m not sure how easy it will be for us to generate N-Quads files, or whether there is budget to switch from the open-source version of Virtuoso to the enterprise edition, but we’ll investigate this.

I see now that our initial conclusion that Virtuoso doesn’t get killed on a bare metal server was drawn too hastily. Sorry for that. What I saw earlier was that RAM usage increases more quickly on the bare metal server and then stabilizes, unlike in Kubernetes where RAM usage mostly keeps going up over time. But eventually an OOM occurs in both cases somewhere after 9-15 hours, so I guess we have little option but to increase the available RAM.

Regarding the “Checkpoint removed…” messages, I was also surprised that we don’t see those on the bare metal server. I double-checked, and we do have the same (default) MaxCheckpointRemap=2000 setting on the bare metal server as in Kubernetes.

I think we’ll abandon our attempts to run this in Kubernetes because of the RAM limitations we have there. Instead, we’ll now focus on getting this working on the bare metal server with more RAM. Thank you for your help.