Virtuoso Writing Issue - Process Stalling and Crashing

Hello,

We are encountering an issue with our Virtuoso setup and would greatly appreciate your assistance in diagnosing and resolving the problem.

Environment:

  • Virtuoso Version: 07.20.3239 (the same issue was observed in previous versions)
  • Apache Jena Libraries Version: 4.9.0
  • Server Configuration:
    • CPU: 2 cores
    • RAM: 8 GB

Data Ingestion:

We are saving data to Virtuoso using the Jena library with the following code snippet:

private void saveWithConnection(String graphName, Model model, RDFConnection connection) {
    try {
        // Push the in-memory Jena model into the named graph over the open RDFConnection
        connection.load(graphName, model);
    } catch (Exception e) {
        log.error("Could not flush!", e);
        if (e instanceof HttpException) {
            HttpException httpException = (HttpException) e;
            // Include the HTTP response body returned by the endpoint to ease diagnosis
            log.error("HttpException: {}", httpException.getResponse());
        }
        throw new TripleStoreRepositoryException(format("Could not save model to '%s'", graphName), e);
    }
}

You can view the full code here.

Problem:

We’re experiencing issues when attempting to write a relatively large dataset (82 MB) from this TTL file.

During the write operation, Virtuoso appears to hang indefinitely without completing the write, eventually crashing after a significant delay.

Virtuoso Status Output during the Write Operation:

Virtuoso Server
Version 07.20.3239-pthreads for Linux as of Feb 13 2024 (d698f21712)
Started on: 2024-08-26 13:36 GMT+0 (up 20:32)
CPU: 0.05% RSS: 6249MB VSZ: 7139MB PF: 0

Database Status:
File size 174063616, 21248 pages, 10366 free.
680000 buffers, 6217 used, 3736 dirty 1 wired down, repl age 0 0 w. io 6 w/crsr.
Disk Usage: 76365 reads avg 0 msec, 0% r 0% w last 102 s, 2631 writes flush 30.12 MB/s,
113 read ahead, batch = 25. Autocompact 225 in 157 out, 30% saved.
Gate: 299 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = ../database/virtuoso.trx, 65461820 bytes
682 pages have been changed since last backup (in checkpoint state)
Current backup prefix: bck_dev_
Current backup timestamp: 0x091C-0x02-0x00
Last backup date: Mon Aug 26 23:57:28 2024

Clients: 0 connects, max 0 concurrent
RPC: 1 calls, 0 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 131M large, 293M max
Checkpoint Remap 0 pages, 0 mapped back. 1 s atomic time.
DB master 21248 total 10366 free 0 remap 0 mapped back
temp 1024 total 1016 free

Lock Status: 0 deadlocks of which 0 2r1w, 6 waits,
Currently 8 threads running 6 threads waiting 1 threads in vdb.
Pending:
...
(100 instances of "IER NO_CONN")

Memory Status during the Stall:


P.S.: The issue does not occur consistently. Sometimes the write operation completes successfully within a few minutes.


Has anyone encountered a similar issue or have any insights on what might be causing Virtuoso to hang and eventually crash during large data writes?
Any suggestions on what we could try to resolve this would be greatly appreciated.

Thank you!

Have you reviewed the Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework post we provide, which details the optimal means of bulk loading RDF datasets into Virtuoso with Jena? Note there is also the Virtuoso RDF Bulk Loader documentation, detailing the most optimal means of bulk loading RDF datasets into Virtuoso using its built-in functions.

Hello hwilliams,
thank you for your response.

Your suggestion is interesting, but we are currently using REST APIs to load semantic assets into Virtuoso. Our approach does not involve multithreaded writing, so it’s unclear why the system occasionally goes into a lock state.

We would like to further investigate this issue to understand the reasons behind these deadlocks. Moreover, we’re looking for ways to handle these situations - such as managing timeouts or other strategies - since we currently don’t have a clear approach to address this problem.

Any insights or suggestions on how to better analyze or handle these scenarios would be greatly appreciated.

Thank you!

Even if you are not inserting the data multi-threaded, there are still internal operations, like the scheduled updating of free-text indexes etc., that can be occurring in the background, reading and writing to the tables hosting the RDF data. So you should still try changing the transaction isolation level or the concurrency mode used when connecting, to minimise the locks and pending transactions, as is done in the Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework post.
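For illustration, here is a minimal JDBC sketch of setting the isolation level on a direct connection to Virtuoso (the URL, credentials and sample statement are placeholders, and this uses the JDBC channel rather than the SPARQL-over-HTTP path you are currently using):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IsolationSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("virtuoso.jdbc4.Driver");                      // Virtuoso JDBC 4 driver must be on the classpath
        try (Connection conn = DriverManager.getConnection(
                "jdbc:virtuoso://localhost:1111", "dba", "dba")) {    // placeholder host and credentials
            // Choose the level you want for the load session, e.g. READ UNCOMMITTED to minimise lock waits
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // The SPARQL keyword runs a SPARQL statement over the SQL channel
                st.execute("SPARQL INSERT DATA { GRAPH <urn:example:g> { <urn:example:s> <urn:example:p> \"o\" } }");
            }
            conn.commit();
        }
    }
}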

Are any errors being reported in the virtuoso.log file during the load operations? It would also be worth enabling additional tracing with the Virtuoso trace_on() function to get more detailed information written to the log.

If the database is crashing at some point and a core file is being generated (when ulimit -c unlimited is set), the core file can be analysed to see where the crash is occurring, as detailed in the Virtuoso open source “gdb” core file stack trace generation for analysis post. If you are compiling Virtuoso yourself, you can also rebuild a debug binary to make that analysis more informative.

I’m happy to report that we were able to resolve the issue by increasing the MaxClientConnections parameter from 10 to 50, as suggested in the Virtuoso community thread.

However, I’m still unclear on why this change resolved the problem. If you can provide further insights into the reasoning behind this solution, I’d greatly appreciate it.

Thank you for your support!

The MaxClientConnections setting in the [Parameters] section controls the number of SQL server threads the Virtuoso server is allowed to use for external and internal requests. If your application requires more server threads than the Virtuoso server configuration allows, then you will start hitting bottlenecks, which would explain the intermittent behaviour being experienced. As indicated in the Virtuoso "MaxClientConnections" configuration parameter post, it is recommended this typically be set to at least 100, although it can be set higher, as the threads are allocated on demand, i.e. as and when required.
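For reference, the setting lives in the [Parameters] section of virtuoso.ini and, with the recommendation above, would look something like the following (value illustrative):

[Parameters]
...
MaxClientConnections = 100   ; threads are allocated on demand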

I would like to ask for further clarification regarding the issue we encountered. Although the previous problem was resolved during testing, we are concerned that it might reoccur in the future. The issue did not cause a full crash of Virtuoso but rather a stall where Virtuoso did not generate any logs and did not progress, even after waiting for hours. The crash only occurred when we tried to query the graphs (via the Conductor) during this operation.

We would like to know if there is a configuration parameter that can help manage these stall situations more effectively. Specifically, we are interested in setting a timeout that would terminate the connection in case of a stall, so our software can handle the response and exception, rather than remaining in a pending state.

Is there a parameter or a method to manage this behavior?

Thank you

The Virtuoso Jena provider's VirtuosoQueryExecution method has a timeout parameter to limit how long a query will be allowed to run.
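If you stay on the RDFConnection/HTTP path, one option is an application-side guard around the load call, so your code stops waiting after a fixed period and can raise an exception instead of hanging. A minimal sketch in plain Java follows (no Virtuoso-specific API; note it stops the caller from waiting, it does not abort whatever the server is still doing):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedLoad {
    private static final ExecutorService EXEC = Executors.newSingleThreadExecutor();

    public static void runWithTimeout(Runnable loadCall, long seconds) throws Exception {
        Future<?> task = EXEC.submit(loadCall);
        try {
            task.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            task.cancel(true);   // interrupt the worker thread; the caller gets an exception instead of a hang
            throw new IllegalStateException("Load did not complete within " + seconds + " s", e);
        }
    }
}

It could wrap the method you posted in the first message, e.g. runWithTimeout(() -> saveWithConnection(graphName, model, connection), 300);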

In terms of knowing what is causing the stall or crash, we would need more information to determine this, such as:

  • What does the output of the status(); command report when in the stalled state?
  • When a crash occurs, is any error reported in the virtuoso.log or Linux kernel log file?
  • When the Virtuoso server is started with ulimit -c unlimited set, does a core file get created that could be analysed with gdb to determine the cause of the crash?

Hi,

The output of the status(); command is the one I posted in the first message:

For example, locally:

 OpenLink Virtuoso Server
 Version 07.20.3233-pthreads for Linux as of Jun 30 2021
 Started on: 2024-09-13 10:25 GMT+0
 
 Database Status:
  File size 150994944, 18432 pages, 11051 free.
  20000 buffers, 5978 used, 5116 dirty 0 wired down, repl age 0 0 w. io 2 w/crsr.
  Disk Usage: 1079 reads avg 0 msec, 0% r 0% w last 351 s, 117 writes flush 0 MB/s,
  41 read ahead, batch = 12. Autocompact 0 in 0 out, 0% saved.
 Gate: 151 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
 Log = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.trx, 328292724 bytes
 2215 pages have been changed since last backup (in checkpoint state)
 Current backup timestamp: 0x0000-0x00-0x00
 Last backup date: unknown
 Clients: 0 connects, max 0 concurrent
 RPC: 1 calls, 0 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 113M large, 480M max
 Checkpoint Remap 19 pages, 0 mapped back. 0 s atomic time.
  DB master 18432 total 11051 free 19 remap 4 mapped back
  temp 256 total 251 free
 
 Lock Status: 0 deadlocks of which 0 2r1w, 1 waits,
  Currently 3 threads running 1 threads waiting 1 threads in vdb.
 Pending:
  101: IER NO_CONN 172.20.0.1
  8: ISR 172.20.0.1
  4: ISR 172.20.0.1
  9: ISR 172.20.0.1
  5: ISR 172.20.0.1
  1: ISR NO_CONN
  6: ISR 172.20.0.1
  2: ISR 172.20.0.1
  7: ISR NO OWNER
  3: ISR 172.20.0.1
  13824: IER NO_CONN 172.20.0.1
  ...

There is nothing of interest in the log (even with trace_on()).

As mentioned, the crash only occurs if forced; it is not automatic. What happens is that it remains in this stalled state. Therefore, I would like to have a parameter that closes the connection in this condition.

Regarding the timeout parameter, is this a client-side parameter?
Isn’t there a parameter in Virtuoso itself?
How can we integrate it into our writing method that I posted in the first message?

Thanks

Why, from the status() output, are you now running an old Version 07.20.3233-pthreads for Linux as of Jun 30 2021 binary, whereas previously you were running a Version 07.20.3239-pthreads for Linux as of Feb 13 2024 (d698f21712) binary?

Why, in the status() output, are there 20000 buffers, whereas previously they were set to 680000 buffers? The buffers value in the status output is the NumberOfBuffers parameter in the INI config file.

How many triples are in the database? With 20000 buffers, which is the minimal Virtuoso default, you would not be able to load much data before running out of memory buffers, forcing the database to start swapping to disk to process requests, which would slow it down, resulting in a growth of pending transactions and a general slowing of the database.
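For an 8 GB machine, the sizing table in the stock virtuoso.ini suggests roughly the following in the [Parameters] section (leave headroom for other processes on the host):

[Parameters]
NumberOfBuffers = 680000
MaxDirtyBuffers = 500000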

The Jena RDFConnection method being used connects directly to the Virtuoso SPARQL endpoint, which does not support transactions or the setting of different transaction isolation levels in the database, and that can impact performance. The default transaction isolation level for Virtuoso is READ COMMITTED, but the DefaultIsolation setting in the [Parameters] section can be used to set it to any of the standard levels:

  Numeric Value   Transaction Isolation Level
  unset           as if set to 2, READ COMMITTED
  1               READ UNCOMMITTED
  2               READ COMMITTED
  4               REPEATABLE READ
  8               SERIALIZABLE

So you can try setting the isolation level to something like REPEATABLE READ, i.e. DefaultIsolation = 4, to see if it improves performance and prevents the excessive locking/pending transactions.
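That is, in virtuoso.ini:

[Parameters]
DefaultIsolation = 4   ; REPEATABLE READ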

As said previously, to better control transactions in the database you should use the methods outlined in the Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework post we provide, which details the optimal means of bulk loading RDF datasets into Virtuoso with Jena using the Virtuoso Jena Provider.
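As a rough sketch of what that provider-based path looks like (the class and method names are taken from the provider's published examples and should be checked against the virt_jena jar you deploy; the URL and credentials are placeholders):

import org.apache.jena.rdf.model.Model;
import virtuoso.jena.driver.VirtModel;

public class VirtJenaLoadSketch {
    public static void load(String graphName, Model data) {
        // Connects over JDBC, so server-side isolation and transaction settings apply to the load
        VirtModel virtModel = VirtModel.openDatabaseModel(
                graphName, "jdbc:virtuoso://localhost:1111", "dba", "dba");
        try {
            virtModel.add(data);
        } finally {
            virtModel.close();
        }
    }
}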