I have a query like this which is used to retrieve all iri_id in numerical form from the RDF_QUAD table given that they are really “IRIs” and not blanknodes for a specific very large graph.
select iri_id_num(RDF_QUAD.S) from RDF_QUAD table option (index RDF_QUAD_GS) where RDF_QUAD.G = iri_to_id('https://sparql.uniprot.org/uniparc') and isiri_id(RDF_QUAD.S) > 0 and is_bnode_iri_id(RDF_QUAD.S) = 0
This query takes multiple days to run, consuming not much CPU. And most time on the JDBC side is spend waiting/filling reading data from the socket input stream.
The JDBC statement is made as
try (final Statement createStatement = quadStoreConnection.createStatement()) {
createStatement.setFetchDirection(ResultSet.FETCH_FORWARD);
createStatement.setFetchSize(4 * 1024);
// actual java work that does not show up in the profile
/extractUniqueIRIIdsPerGraph(createStatement, roaringBitmap);
}
Is there another thing I could change?
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(java.base@11.0.17/Native Method)
at java.net.SocketInputStream.socketRead(java.base@11.0.17/SocketInputStream.java:115)
at java.net.SocketInputStream.read(java.base@11.0.17/SocketInputStream.java:168)
at java.net.SocketInputStream.read(java.base@11.0.17/SocketInputStream.java:140)
at java.io.BufferedInputStream.fill(java.base@11.0.17/BufferedInputStream.java:252)
at java.io.BufferedInputStream.read(java.base@11.0.17/BufferedInputStream.java:271)
- locked <0x00007f2ef8c17f90> (a virtuoso.jdbc4.VirtuosoInputStream)
at virtuoso.jdbc4.VirtuosoInputStream.read(VirtuosoInputStream.java:48)
at virtuoso.jdbc4.VirtuosoInputStream.readshortint(VirtuosoInputStream.java:580)
at virtuoso.jdbc4.VirtuosoInputStream.read_object(VirtuosoInputStream.java:348)
at virtuoso.jdbc4.VirtuosoInputStream.read_object(VirtuosoInputStream.java:179)
at virtuoso.jdbc4.VirtuosoConnection.read_request(VirtuosoConnection.java:918)
at virtuoso.jdbc4.VirtuosoFuture.nextResult(VirtuosoFuture.java:150)
at virtuoso.jdbc4.VirtuosoResultSet.process_result(VirtuosoResultSet.java:460)
at virtuoso.jdbc4.VirtuosoResultSet.next(VirtuosoResultSet.java:2265)
- locked <0x00007f2ef8c17fd0> (a virtuoso.jdbc4.VirtuosoConnection)
The above is the most common stacktrace for this method. For a smaller graph (in the same humongous database) there does not seem to be a significant difference between using the isql and jdbc client.
The status(‘c’) output might be interesting too
isql ** ** ** exec="status('c')"
Connected to OpenLink Virtuoso
Driver: 07.20.3214 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
REPORT
VARCHAR
_______________________________________________________________________________
OpenLink Virtuoso Server
Version 07.20.3236-pthreads for Linux as of Apr 4 2023 (7d0f3bb)
Started on: 2023-05-24 16:59 GMT+2
Database Status:
File size 603979776, 986259456 pages, 358817590 free.
52605124 buffers, 52605077 used, 3 dirty 0 wired down, repl age 699110807 0 w. io 3 w/crsr.
Disk Usage: 185211414 reads avg 0 msec, 0% r 0% w last 24933 s, 51817 writes flush 0 MB/s,
3202784 read ahead, batch = 57. Autocompact 0 in 0 out, 0% saved.
Gate: 4539624 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /work/nobackup/sparql/uniprot/main/2023_02/./data/triples/virtuoso.trx, 939319 bytes
627420308 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 119327 connects, max 78 concurrent
RPC: 32055695 calls, 1248 pending, 1259 max until now, 0 queued, 1207 burst reads (0%), 2 second 0M large, 1200M max
Checkpoint Remap 38 pages, 0 mapped back. 2 s atomic time.
DB master 986259456 total 358817590 free 38 remap 1 mapped back
temp 13829888 total 13829882 free
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:
Client 1112:46: Account: dba, 528212985 bytes in, 1533292794636 bytes out, 1 stmts.
PID: 0, OS: Linux, Application: JDBC, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1112:119324: Account: dba, 219 bytes in, 294 bytes out, 1 stmts.
PID: 2549729, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:
Running Statements:
Time (msec) Text
13006 status('c')
38 Rows. -- 13006 msec.
Using the /proc/PID/io counter of the kernel I note that the server process is reading data at about 50Kilobytes per second.
The virtuoso server CPU utilization is about 9 to 10% of a single core, while the reading jvm uses also about 6% to 10% cpu of a single core.