Segmentation fault when a pyodbc multi-threaded process accesses Virtuoso 7.2.13

On CentOS 7, unixODBC 2.3.12, pyodbc 5.2.0, Python 3.12.3, and Virtuoso Open Source 7.2.13, a multi-threaded Python process terminates abnormally with a segmentation fault.
The Python script uses the concurrent.futures module and spawns threads via executor.map with max_concurrent_threads set to 3.
There is a list of SPARQL queries to be issued in parallel; at first everything runs without any problem, but later the crash happens.
The query that triggers the termination is different each time, which makes the cause hard to pin down.
The OS dmesg output is as follows.

python[89760]: segfault at 831 ip 00007f4c7f995d20 sp 00007f4c7e708b00 error 4 in virtodbcu_r.so[7f4c7f914000+279000]
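
Since the failing query differs between runs, I am also considering enabling Python's faulthandler so that the Python-level tracebacks of all threads are dumped when the SIGSEGV arrives (just a sketch; it will not show the native frames inside virtodbcu_r.so, only which query each thread was executing):

import faulthandler
import sys

# Dump the Python traceback of every thread to stderr if the process
# receives a fatal signal such as SIGSEGV; call this once, early in the script.
faulthandler.enable(file=sys.stderr, all_threads=True)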

And here is the code that issues a query; it runs as a thread.

# Excerpt from the full script; endpoint_name and passlist are defined elsewhere in it.
import sys
import time
import pyodbc
from concurrent.futures import ThreadPoolExecutor

def concurrent_query_executer(query: str):
  time.sleep(0.1)
  conn = connectDB(endpoint_name, passlist[endpoint_name])
  with conn.cursor() as cursor:
    try:
      cursor.execute(query)
      result_set = cursor.fetchall()
    except Exception as e:
      result_set = []
      print(f"Error: {e}", flush = True, file = sys.stderr)
  conn.close()
  return result_set

def connectDB(endpoint, password):
  conn_string = "DSN={};UID=dba;PWD={}".format(endpoint, password)
  conn = pyodbc.connect(conn_string, autocommit=True)
  return conn

def spin_all_queeies(query_list: list[str], max_concurrent_threads: int = 3):
  with ThreadPoolExecutor(max_workers = max_concurrent_threads) as executor:
    result_set = executor.map(concurrent_query_executer, query_list)
  return list(result_set)

And here is the .odbc.ini

[endpoint]
Description = VOS for MyRDF
Driver=/home/vosuser/vos72_latest/lib/virtodbcu_r.so
Address     = localhost:51120
PWDClearText       = 0
LastUser           = dba
RoundRobin         = No
NoSystemTables     = No
TreatViewsAsTables = No
wideAsUTF16 = Y

I appreciate any suggestions.
Thanks.

I have tested on Ubuntu 18.04, unixODBC 2.3.11, pyodbc 4.0.30, python 3.6.9 and Virtuoso Open Source 7.2.13.

I made the same code runnable:

#!/usr/bin/env python3

import time
import sys
import pyodbc
import argparse
from typing import List
from concurrent.futures import ThreadPoolExecutor

# Hardcoded DSN credentials
DSN = "VOS"
UID = "dba"
PWD = "dba"

def connectDB():
    conn_string = "DSN={};UID={};PWD={}".format(DSN, UID, PWD)
    conn = pyodbc.connect(conn_string, autocommit=True)
    return conn

def concurrent_query_executer(query):
    time.sleep(0.1)  # slight delay to simulate staggered start
    conn = connectDB()
    result_set = []
    with conn.cursor() as cursor:
        try:
            cursor.execute(query)
            result_set = cursor.fetchall()
        except Exception as e:
            print("Error: {}".format(e), flush=True, file=sys.stderr)
    conn.close()
    return result_set

def spin_all_queries(query_list, max_concurrent_threads=3):
    with ThreadPoolExecutor(max_workers=max_concurrent_threads) as executor:
        result_set = executor.map(concurrent_query_executer, query_list)
    return list(result_set)

def main():
    parser = argparse.ArgumentParser(description="Run SQL queries concurrently via ODBC")
    parser.add_argument("query_file", help="Path to a file containing SQL queries (one per line)")
    parser.add_argument("--threads", type=int, default=3, help="Number of concurrent threads")
    args = parser.parse_args()

    # Load queries from file
    try:
        with open(args.query_file, "r") as f:
            queries = [line.strip() for line in f if line.strip()]
    except Exception as e:
        print("Failed to read queries file: {}".format(e), file=sys.stderr)
        sys.exit(1)

    results = spin_all_queries(queries, args.threads)

    for i, res in enumerate(results):
        print("--- Result for Query {} ---".format(i + 1))
        for row in res:
            print(row)

if __name__ == "__main__":
    main()

Using a sample set of test queries:

$ cat queries.txt 
SPARQL SELECT count(*) WHERE {?s ?p ?o}
SPARQL SELECT * WHERE {?s ?p ?o} LIMIT 5 
SELECT NOW()

and the program runs successfully for me multiple times:

$ python3 run_queries.py queries.txt --threads 5
--- Result for Query 1 ---
(351583, )
--- Result for Query 2 ---
('http://www.w3.org/2001/XMLSchema#date', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2000/01/rdf-schema#Datatype')
('http://www.w3.org/2001/XMLSchema#gDay', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2000/01/rdf-schema#Datatype')
('http://www.w3.org/2001/XMLSchema#gMonth', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2000/01/rdf-schema#Datatype')
('http://www.w3.org/2001/XMLSchema#gYear', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2000/01/rdf-schema#Datatype')
('http://www.w3.org/2001/XMLSchema#time', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2000/01/rdf-schema#Datatype')
--- Result for Query 3 ---
(datetime.datetime(2025, 3, 26, 10, 48, 27, 545210), )
$

So, do you have a test set of queries and a dataset with which the problem can be recreated?

Also, did you compile Virtuoso yourself, or did you use prebuilt binaries?
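
One client-side variable that may also be worth ruling out is ODBC connection pooling in pyodbc, since many threads are opening and closing connections at once. A minimal sketch for a test run (the flag has to be set before the first connection is made):

import pyodbc

# pyodbc turns ODBC connection pooling on by default; setting this to False
# before the first pyodbc.connect() call disables it, so every thread gets a
# fully independent connection for the test.
pyodbc.pooling = False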


Your virtuoso.ini file settings would also be useful for determining what may be causing this problem.

Thanks for your comments.
I use prebuilt binaries.
The number of queries to be issued is on the order of millions.
The endpoint has 76 graphs and 18 billion triples in total.
The machine has 3.9 TB of RAM.

Here is the excerpt of virtuoso.ini relevant to this topic.

[Parameters]
ServerPort                      = 51120
LiteMode                        = 0
DisableUnixSocket               = 1
DisableTcpSocket                = 0
MaxClientConnections            = 20
CheckpointInterval              = 60
O_DIRECT                        = 0
CaseMode                        = 2
MaxStaticCursorRows             = 50000
CheckpointAuditTrail            = 0
AllowOSCalls                    = 0
SchedulerInterval               = 10
DirsAllowed                     = ., /data/rdfportal, /usr/share/proj, /home/vosuser/vos72_latest
ThreadCleanupInterval           = 0
ThreadThreshold                 = 15
ResourcesCleanupInterval        = 0
FreeTextBatchSize               = 100000
SingleCPU                       = 0
VADInstallDir                   = /home/vosuser/vos72_latest/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize            = 100
IndexTreeMaps                   = 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
MaxQueryMem                     = 2G
VectorSize                      = 1000
MaxVectorSize                   = 1000000
AdjustVectorSize                = 0
ThreadsPerQuery                 = 5
AsyncQueueMaxThreads            = 15
NumberOfBuffers          = 1360000
MaxDirtyBuffers          = 1000000

[Client]
SQL_PREFETCH_ROWS               = 50000
SQL_PREFETCH_BYTES              = 64000000
SQL_QUERY_TIMEOUT               = 0
SQL_TXN_TIMEOUT                 = 0

Is there any reason why the complete virtuoso.ini file cannot be provided? There may be settings in other sections that affect behaviour.

I note in the INI file snippet provided NumberOfBuffers = 1360000, which is the setting for a machine with 16GB RAM; is there a reason it is set so low? Given you indicate having about 18 billion triples, and we recommend 10GB RAM per billion triples on average, about 180GB RAM is required for hosting the data in memory, i.e. NumberOfBuffers = 15300000 & MaxDirtyBuffers = 1125000, which should be possible given the machine is indicated to have 3.9T RAM.
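
As a rough illustration of where a figure like NumberOfBuffers = 15300000 comes from: the commonly cited rule of thumb is roughly two thirds of the RAM dedicated to Virtuoso divided by the 8 KB page size, with MaxDirtyBuffers at about three quarters of that. A minimal sketch of the arithmetic (the exact recommended values may differ slightly):

# Rule-of-thumb sizing for Virtuoso buffers (8 KB database pages):
#   NumberOfBuffers ~= (RAM dedicated to Virtuoso * 0.66) / 8000
#   MaxDirtyBuffers ~= 3/4 of NumberOfBuffers
ram_bytes = 180 * 10**9  # ~10 GB RAM per billion triples * 18 billion triples
number_of_buffers = int(ram_bytes * 0.66 / 8000)
max_dirty_buffers = int(number_of_buffers * 3 / 4)
print(number_of_buffers, max_dirty_buffers)  # roughly 14,850,000 and 11,137,500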

With the current settings it would be interesting to see the output of the Virtuoso status(); command when the typical database working set is in use, as I suspect all the buffers are consumed, resulting in a lot of swapping between memory and disk and reducing performance.

BTW, I presume from the error initially reported, i.e.

python[89760]: segfault at 831 ip 00007f4c7f995d20 sp 00007f4c7e708b00 error 4 in virtodbcu_r.so[7f4c7f914000+279000]

that it is the Virtuoso client ODBC driver that is crashing and not the Virtuoso database server?

It would also be useful to see the virtuoso.log file, to see the state of the Virtuoso server when running these queries, especially with so little memory allocated for use.

Please note this public Google Spreadsheet containing INI settings for some well-known public Virtuoso Open Source Edition instances hosting large datasets. I suggest you check your settings against those.

I thought it would be redundant to include the whole virtuoso.ini.
Here is the entire content.
There is no special reason for the NumberOfBuffers value.
No swapping has been observed.
Is there any possibility that this parameter causes the segmentation fault?

[Database]
DatabaseFile                    = /home/vosuser/vdbs/pubchem.db
ErrorLogFile                    = /home/vosuser/vdbs/pubchem.log
LockFile                        = /home/vosuser/vdbs/pubchem.lck
TransactionFile                 = /home/vosuser/vdbs/pubchem.trx
xa_persistent_file              = /home/vosuser/vdbs/pubchem.pxa
ErrorLogLevel                   = 7
FileExtend                      = 200
MaxCheckpointRemap              = 2000
Striping                        = 0
TempStorage                     = TempDatabase

[TempDatabase]
DatabaseFile                    = /home/vosuser/vdbs/pubchem-temp.db
TransactionFile                 = /home/vosuser/vdbs/pubchem-temp.trx
MaxCheckpointRemap              = 2000
Striping                        = 0

[Parameters]
ServerPort                      = 51120
LiteMode                        = 0
DisableUnixSocket               = 1
DisableTcpSocket                = 0
MaxClientConnections            = 20
CheckpointInterval              = 60
O_DIRECT                        = 0
CaseMode                        = 2
MaxStaticCursorRows             = 50000
CheckpointAuditTrail            = 0
AllowOSCalls                    = 0
SchedulerInterval               = 10
DirsAllowed                     = ., /data/rdfportal, /usr/share/proj, /home/vosuser/vos72_latest
ThreadCleanupInterval           = 0
ThreadThreshold                 = 15
ResourcesCleanupInterval        = 0
FreeTextBatchSize               = 100000
SingleCPU                       = 0
VADInstallDir                   = /home/vosuser/vos72_latest/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize            = 100
IndexTreeMaps                   = 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
MaxQueryMem                     = 2G
VectorSize                      = 1000
MaxVectorSize                   = 1000000
AdjustVectorSize                = 0
ThreadsPerQuery                 = 5
AsyncQueueMaxThreads            = 15
NumberOfBuffers          = 1360000
MaxDirtyBuffers          = 1000000

[HTTPServer]
ServerPort                      = 58888
ServerRoot                      = /home/vosuser/vos72_latest/vsp
MaxClientConnections            = 10
DavRoot                         = DAV
EnabledDavVSP                   = 0
HTTPProxyEnabled                = 0
TempASPXDir                     = 0
DefaultMailServer               = localhost:25
MaxKeepAlives                   = 10
KeepAliveTimeout                = 10
MaxCachedProxyConnections       = 10
ProxyConnectionCacheTimeout     = 15
HTTPThreadSize                  = 280000
HttpPrintWarningsInOutput       = 0
Charset                         = UTF-8
MaintenancePage                 = atomic.html
EnabledGzipContent              = 1

[AutoRepair]
BadParentLinks                  = 0

[Client]
SQL_PREFETCH_ROWS               = 50000
SQL_PREFETCH_BYTES              = 64000000
SQL_QUERY_TIMEOUT               = 0
SQL_TXN_TIMEOUT                 = 0

[VDB]
ArrayOptimization               = 0
NumArrayParameters              = 10
VDBDisconnectTimeout            = 1000
KeepConnectionOnFixedThread     = 0

[Replication]
ServerName                      = db-RDFP03
ServerEnable                    = 1
QueueMax                        = 50000

[Striping]
Segment1                        = 100M, db-seg1-1.db, db-seg1-2.db
Segment2                        = 100M, db-seg2-1.db

[Zero Config]
ServerName                      = virtuoso (RDFP03)

[Mono]
[URIQA]
DynamicLocal                    = 0
DefaultHost                     = localhost:8890

[SPARQL]
MaxConstructTriples             = 1000000
ResultSetMaxRows                = 100000
MaxQueryCostEstimationTime      = 4000  ; in seconds
MaxQueryExecutionTime           = 3600  ; in seconds
DefaultQuery                    = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit         = 0  ; controls inference rules loading
MaxMemInUse                     = 0  ; limits the amount of memory for construct dict (0=unlimited)

[Plugins]
#LoadPath                       = /data/rdfportal/virtuoso/stat/lib/virtuoso/hosting
#Load1                          = plain, wikiv
#Load2                          = plain, mediawiki
#Load3                          = plain, creolewiki
#Load8          = plain, shapefileio
#Load9          = plain, graphql

Here is the status(); output.

SQL> status();
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Jun 10 2024 (a1fd8195bf)
Started on: 2025-03-26 08:34 GMT+9 (up 13:07)
CPU: 6.48% RSS: 17022MB VSZ: 18499MB PF: 0

Database Status:
  File size 2445279232, 119836160 pages, 39658221 free.
  1360000 buffers, 1336557 used, 3 dirty 0 wired down, repl age 47378048 0 w. io 0 w/crsr.
  Disk Usage: 415606861 reads avg 0 msec, 3% r 0% w last  43767 s, 89158 writes flush          0 MB/s,
    5234179 read ahead, batch = 78.  Autocompact 0 in 0 out, 0% saved.
Gate:  19382926 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /home/vosuser/vdbs/pubchem.trx, 1254 bytes
80175254 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1011577 connects, max 4 concurrent
RPC: 4046226 calls, 62 pending, 64 max until now, 0 queued, 12815 burst reads (0%), 0 second 0M large, 19M max
Checkpoint Remap 0 pages, 0 mapped back. 8 s atomic time.
    DB master 119836160 total 39658221 free 0 remap 0 mapped back
   temp  256 total 249 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 2 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 51120:1:  Account: dba, 562 bytes in, 7185 bytes out, 1 stmts.
PID: 34292, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:

Client 51120:2:-1011579:  Account: dba, 500 bytes in, 505 bytes out, 1 stmts.
PID: 34601, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:


Running Statements:
 Time (msec) Text
          33 SPARQL select distinct ?graph ?sclass ?pred ?oclass {values (?graph ?sclass) { (
        1196 status()


Hash indexes


44 Rows. -- 1196 msec.

And there are no relevant records in virtuoso.log (pubchem.log in our case).

08:34:35 OpenLink Virtuoso Universal Server
08:34:35 Version 07.20.3240-pthreads for Linux as of Jun 10 2024 (a1fd8195bf)
08:34:35 uses OpenSSL 1.0.2u  20 Dec 2019
08:34:35 uses parts of PCRE, Html Tidy
08:34:36 Database version 3126
08:34:37 SQL Optimizer enabled (max 1000 layouts)
08:34:38 Compiler unit is timed at 0.000123 msec
08:34:45 Roll forward started
08:34:45     63 transactions, 5530 bytes replayed (100 %)
08:34:45 Roll forward complete
08:34:49 Checkpoint started
08:34:49 Checkpoint finished, log reused
08:34:49 HTTP/WebDAV server online at 58888
08:34:49 Server online at 51120 (pid 34098)
09:34:53 Checkpoint started
09:34:53 Checkpoint finished, log reused
10:34:55 Checkpoint started
...

virtodbcu_r.so is the one provided with the prebuilt binaries, in the lib directory.

I assume there must have been an initial reason the NumberOfBuffers was set to the value for a 16GB RAM machine? Have you reviewed the Virtuoso RDF Performance Tuning guide when configuring the instance?

The status output shows 1360000 buffers, 1336557 used, i.e. nearly all of the memory buffers allocated to Virtuoso are in use, after which it would have to start swapping data. So unless the query workload only touches a small section of the datasets loaded, you should consider increasing NumberOfBuffers as suggested before.

I doubt the NumberOfBuffers setting on the Virtuoso server would cause the crash in the ODBC driver on the client side. You did not confirm whether the Virtuoso server was still accessible after the client crash occurred; although, if the status() output is from after such a run, then it seems to still be accessible?

Is it the public PubChem RDF dataset, or some variant of it, that you have loaded into your Virtuoso instance?

I raised the NumberOfBuffers to the following, but the segmentation fault still occurred.

NumberOfBuffers          = 12000000
MaxDirtyBuffers          = 10000000

status() reports the following after the crash, and the server is still accessible.

SQL> status();
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Jun 10 2024 (a1fd8195bf)
Started on: 2025-03-28 09:35 GMT+9 (up 00:38)
CPU: 0.05% RSS: 64784MB VSZ: 99729MB PF: 0

Database Status:
  File size 2445279232, 119836160 pages, 39658167 free.
  12000000 buffers, 3385445 used, 3 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
  Disk Usage: 3391800 reads avg 0 msec, 0% r 0% w last  767 s, 6427 writes flush          0 MB/s,
    7884 read ahead, batch = 428.  Autocompact 0 in 0 out, 0% saved.
Gate:  6520 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /home/vosuser/vdbs/pubchem.trx, 3392 bytes
80175254 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 212416 connects, max 11 concurrent
RPC: 849711 calls, 164 pending, 169 max until now, 0 queued, 5832 burst reads (0%), 0 second 0M large, 19M max
Checkpoint Remap 57 pages, 0 mapped back. 0 s atomic time.
    DB master 119836160 total 39658167 free 57 remap 3 mapped back
   temp  256 total 251 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 51120:76628:  Account: dba, 342 bytes in, 3768 bytes out, 1 stmts.
PID: 69077, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:


Running Statements:
 Time (msec) Text
        1493 status()


Hash indexes


38 Rows. -- 1495 msec.

The virtuoso instance stores not only the official PubChem RDF, but other data as well.

After raising the NumberOfBuffers and the MaxDirtyBuffers to 24000000 and 20000000, the process still exited with a segmentation fault.
After this crash, the server was still accepting connections.
Then I ran the identical script, and this time the client exited abnormally with the following message.
After this termination as well, the server is still running and accepts queries.

Traceback (most recent call last):
  File "/home/vosuser/git/rdfportal_metadata/get_gspo.py", line 142, in <module>
    result_set = spin_all_queeies(query_list)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/git/rdfportal_metadata/get_gspo.py", line 66, in spin_all_queries
    return list(result_set)
           ^^^^^^^^^^^^^^^^
  File "/home/vosuser/.pyenv/versions/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/.pyenv/versions/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/.pyenv/versions/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/.pyenv/versions/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/vosuser/.pyenv/versions/3.12.3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/git/rdfportal_metadata/get_gspo.py", line 50, in concurrent_query_executer
    conn = connectDB(endpoint_name, passlist[endpoint_name])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vosuser/git/rdfportal_metadata/get_gspo.py", line 120, in connectDB
    conn = pyodbc.connect(conn_string, autocommit=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyodbc.Error: ('S2801', '[S2801] [OpenLink][Virtuoso ODBC Driver]CL033: Connect failed to localhost:51120 = localhost:51120. (-1) (SQLDriverConnect)')

After this, status(); says:

SQL> status();
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Jun 10 2024 (a1fd8195bf)
Started on: 2025-03-28 14:23 GMT+9 (up 03:13)
CPU: 0.05% RSS: 137768MB VSZ: 197100MB PF: 1

Database Status:
  File size 2445279232, 119836160 pages, 39658221 free.
  24000000 buffers, 7259288 used, 3 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
  Disk Usage: 7265677 reads avg 0 msec, 0% r 0% w last  0 s, 25522 writes flush          0 MB/s,
    25970 read ahead, batch = 278.  Autocompact 0 in 0 out, 0% saved.
Gate:  66540 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /home/vosuser/vdbs/pubchem.trx, 2323 bytes
80175254 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1142232 connects, max 10 concurrent
RPC: 4569019 calls, 492 pending, 501 max until now, 0 queued, 37832 burst reads (0%), 0 second 0M large, 20M max
Checkpoint Remap 0 pages, 0 mapped back. 10 s atomic time.
    DB master 119836160 total 39658221 free 0 remap 0 mapped back
   temp  256 total 251 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 51120:1142232:  Account: dba, 220 bytes in, 297 bytes out, 1 stmts.
PID: 9155, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:


Running Statements:
 Time (msec) Text
        1387 status()


Hash indexes


38 Rows. -- 1388 msec.

With NumberOfBuffers = 12000000 the status() output shows:

12000000 buffers, 3385445 used
...
Clients: 212416 connects, max 11 concurrent

indicating 3385445 of the 12000000 buffers were in use at the point the virtodbcu_r.so client driver crashed, with 212416 client connections having been made and a maximum of 11 concurrent connections at any one time.

With NumberOfBuffers = 24000000 the status() output shows:

24000000 buffers, 7259288 used,
...
Clients: 1142232 connects, max 10 concurrent

indicating 7259288 buffers were in use at the point the virtodbcu_r.so client driver's SQLDriverConnect call failed, with 1142232 client connections having been made and a maximum of 10 concurrent connections at any one time.
Does the failure now consistently occur as the SQLDriverConnect error with NumberOfBuffers = 24000000, with no more crashes? As only 7259288 buffers were used, the 12000000 buffers set previously were never exhausted, so I would not expect the buffer increase to make a difference.

BTW, is your test Python application being run on the same machine as Virtuoso is installed on, or on a different machine? I would suggest running them on separate machines, so they are not competing for resources.

I ask this because, given your application is making millions of connections in quick succession, could it be that the OS is running out of sockets or similar resources for opening new connections?
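
If that is the case, one way to take pressure off the socket pool, as a sketch only (assuming your workload allows a connection to be kept open per worker thread rather than per query), would be to reuse one pyodbc connection per thread, e.g.:

import threading
import pyodbc

_tls = threading.local()

def get_thread_connection(conn_string):
    # Create one connection per worker thread and reuse it for all the
    # queries that thread executes, instead of connecting per query.
    if getattr(_tls, "conn", None) is None:
        _tls.conn = pyodbc.connect(conn_string, autocommit=True)
    return _tls.conn

def concurrent_query_executer(query):
    # "DSN=endpoint;UID=dba;PWD=dba" is a placeholder connection string.
    conn = get_thread_connection("DSN=endpoint;UID=dba;PWD=dba")
    with conn.cursor() as cursor:
        cursor.execute(query)
        return cursor.fetchall()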

The kernel has a socket timeout setting, sysctl net.ipv4.tcp_fin_timeout, which is normally about 60 seconds and can be reduced so that sockets time out quicker and are released by the kernel for reuse. You can view the number of sockets in WAIT state with the command netstat -an | grep WAIT | wc -l.
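
For completeness, the same count can also be taken without netstat by reading /proc/net/tcp directly; a minimal sketch for Linux (state code 06 is TIME_WAIT):

def count_time_wait():
    # The 4th column of /proc/net/tcp(6) holds the socket state as a hex
    # code; 06 means TIME_WAIT.
    count = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                count += sum(1 for line in f if line.split()[3] == "06")
        except FileNotFoundError:
            pass
    return count

print(count_time_wait())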

Finally, I note in the [Parameters] section of the INI file MaxClientConnections = 20, and would recommend that setting be increased to about 1000, as it controls the threads required for both client connections and internal Virtuoso server processes and so should be set higher, such that if the server wants to allocate threads on demand it can do so.

Thank you very much for your helpful tips.
Now it seems that the issue has gone.
I reduced sysctl net.ipv4.tcp_fin_timeout from 60 to 30.
In addition, I increased MaxClientConnections from 20 to 1000.
I run both the client and virtuoso server on the same machine.
Now I am curious about some of the data shown by status();.
I have run a query counting unique subjects and objects, which has taken a long time.
The following is an excerpt of the output.

Database Status:
  File size 434110464, 373870336 pages, 137396831 free.
  20000000 buffers, 20000000 used, 1204988 dirty 2 wired down, repl age -129792777 0 w. io 2 w/crsr.
  Disk Usage: 10898724785 reads avg 0 msec, 26% r 0% w last  18096 s, 188582236 writes flush      8.696 MB/s,
    3238601 read ahead, batch = 64.  Autocompact 0 in 0 out, 0% saved.

In the data, could you tell me how to interpret repl age -129792777 0 w. io 2 w/crsr?
And is a way of working around to raise NumberOfBuffers only?

Thanks.

Good to hear the sysctl net.ipv4.tcp_fin_timeout = 30 and MaxClientConnections = 1000 settings appear to have resolved the problem.

The repl age -129792777 output is an old status parameter for the buffer replace age, which is not used anymore and can be ignored.

I am not sure what you mean by "And is a way of working around to raise NumberOfBuffers only?"

Looking at the status snippet provided, I see 20000000 buffers, 20000000 used, indicating all the memory buffers are in use, which will result in the system having to swap between memory and disk to service queries whose data is not already loaded into memory. Is there a reason why NumberOfBuffers was reduced from the 24000000 shown in a previous status output, given the system is running out of buffers?

Thanks.
The Virtuoso server I mentioned here is a different one.
Because the host machine runs multiple Virtuoso servers, it is desirable, as long as swapping does not occur, to keep the memory footprint of each process as small as possible.