Struggling with parallel uploads and queries to Virtuoso

Thank you! Hopefully that example can show us what we’re doing wrong

Any chance you can give an ETA if/when that example will be provided?

@Joel_Vik
Could you share the code behind this.getRepositoryConnection()?
Does it simply return a reference to a VirtuosoRepositoryConnection object, or something else?
Is your code similar to the following sample snippet?

VirtuosoRepository repo = new VirtuosoRepository("jdbc:virtuoso://" + instance + ":" + port, uid, pwd);

VirtuosoRepositoryConnection vc = repo.getConnection();

vc.begin(IsolationLevels.READ_UNCOMMITTED);

IRI context = vc.getValueFactory().createIRI(namedGraphURI);
vc.clear(context);

Reader graphReader = new StringReader(graph);
vc.add(graphReader, namedGraphURI, RDFFormat.RDFXML, context);

vc.commit();

Sure! I can share a bit more details about our code:

The getter is a simple getter that returns a org.eclipse.rdf4j.repository.RepositoryConnection:

public RepositoryConnection getRepositoryConnection() {
        return repositoryConnection;
    }

Just before getRepositoryConnection() is called, the following method is called to initialize the repository and the connection:

protected void setupRepository() throws ConnectionErrorException {
        try {
            if (this.isRepositoryConnectionActive()) {
                this.logger.warn("Repository connection is already active. Closing it and opening a new one");
                this.closeRepositoryConnection();
            }
            
            String url = String.format("jdbc:virtuoso://%s:%d", this.getAddress(), this.getPort());
            this.setRepository(new VirtuosoRepository(url, this.getUsername(), this.getPassword()));
            this.getRepository().initialize();
            this.getRepository().setQueryTimeout(MAX_SQL_QUERY_EXECUTION_TIME);
            /*
            * Optimistic Concurrency Control.
            *  - presumes that probability and frequency of multiple users and
            *    processes instigating changes to the same database records is low.
            *    As a result, when an end-user or process attempts to change records
            *    it first of all determines if the record values at the point of change are
            *    still the same as what they were at the time of retrieval.
            *    If they are unchanged at the point of change then the change occurs otherwise
            *    the change process is rejected and then re-attempted.
            *    Although this reduces concurrent user latency,
            *    it does have the knock-on effect of reducing data integrity if change rejections aren't managed carefully.

            * Pessimistic Concurrency Control.
            *  - presumes that the probability and frequency of multiple user processing and
            *    instigating changes to the same records is high.
            *    As a result, when an end-user or process attempts to change records, it
            *    first of all secures Exclusive Locks on the records in question, performs the changes,
            *    and then releases the locks.
            *    Although this increases and preserves data integrity it does introduce concurrent use latency,
            *    which is perceived as performance degradation by the end-user or application developer.
            */
            this.getRepository().setConcurrencyMode(VirtuosoRepository.CONCUR_OPTIMISTIC);
            
            this.setRepositoryConnection(this.getRepository().getConnection());
        } catch (Exception e) {
            throw new ConnectionErrorException("Failed to setup repository", e);
        }
    }

The MAX_SQL_QUERY_EXECUTION_TIME is set to:

private static final int MAX_SQL_QUERY_EXECUTION_TIME = 7100; // Slightly less than the timeout set in the database

The clear method is called like this:

IRI context = this.getRepositoryConnection().getValueFactory().createIRI(namedGraphURI);
            
logger.debug("Clearing existing data for named graph \"{}\" within {}", namedGraphURI, transactionID);
this.getRepositoryConnection().clear(context);

And add is called like this:

Reader graphReader = new StringReader(graph);
IRI context = this.getRepositoryConnection().getValueFactory().createIRI(namedGraphURI);
            
this.logger.debug("Adding data to transaction {} for named graph \"{}\"", transactionID, namedGraphURI);
this.getRepositoryConnection().add(graphReader, namedGraphURI, RDFFormat.RDFXML, context);

Commit is called like this:

this.getRepositoryConnection().commit();

Hope this helps!

@kidehen @imitko
Test sample
Test_Perf_1.zip (425.3 KB)

The sample can be run via

gradlew run

The connection settings must be changed in build.gradle.
The defaults are:

  args = ['localhost', '1111', 'dba', 'dba', 'repeatable_read', 'default']

The sample runs 10 threads, where each thread:

  • starts a transaction
  • clears the graph
  • inserts data, a random count of between 50 and 5000 triples
  • commits the transaction

The sample also has a deadlock handler that restarts the task when a deadlock occurs.
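
For readers without the zip at hand, a deadlock handler of this kind might look roughly like the sketch below. It is not the code from Test_Perf_1.zip. It assumes the deadlock surfaces somewhere in the exception's cause chain as an SQLException with SQLSTATE 40001 (the standard serialization-failure code); check what your driver actually throws. The names DeadlockRetry, isDeadlock and runWithRetry are made up for illustration.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

final class DeadlockRetry {
    /** SQLSTATE treated as a deadlock (assumption; verify against your driver). */
    static final String DEADLOCK_SQLSTATE = "40001";

    /** Walks the cause chain looking for an SQLException with the deadlock SQLSTATE. */
    static boolean isDeadlock(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && DEADLOCK_SQLSTATE.equals(((SQLException) c).getSQLState())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the task and restarts it when it fails with a deadlock, up to maxAttempts
     * attempts in total. The task is expected to roll back its own transaction
     * before rethrowing.
     */
    static <T> T runWithRetry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e; // not a deadlock, or out of attempts
                }
                // Linear backoff so the competing transaction has a chance to finish.
                Thread.sleep(100L * attempt);
            }
        }
    }
}
```

Each worker thread would wrap its whole begin/clear/insert/commit sequence in one runWithRetry call, so a deadlocked attempt starts again from a fresh transaction.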

The issue:

  • the first run works fine, because there isn’t any data in the DB yet
  • on the second run, the threads hang on clear graph, BUT if I kill the process and run the sample again, it works FINE, though only once; the next run hangs again.

Thank you for providing an actual reproducible sample. Hope this helps the Virtuoso team!

@Joel_Vik

Have you tried the latest develop/7 branch with your application? If not, please try it and let us know whether the issue still persists.

@kidehen @imitko
Revised sample program.
Test_Perf_1.zip (425.3 KB)

Notes

The previous version generated incorrect graph names if more than 20 threads were in use.
I’ve retested this new sample program with Virtuoso Opensource 7.2.3235 and everything worked fine without any hangs, BUT I had to increase the following entry in the virtuoso.ini

;  Server parameters
;
[Parameters]
...
MaxClientConnections     = 50

The default was 5 in my setup.
Once this change was applied and the server restarted, I could run the revised sample test program successfully with up to 50 connections/threads.

I assume you are set now?

That looks like promising news. I haven’t tested the new version yet. I’ll make sure to try it asap!

Note, your MaxClientConnections INI setting is the vital clue here. We are going to double-check that default settings for the open source edition are unrestricted going forward.

Thank you, we have set that to 100.

Hi again,

So I set up a new instance of Virtuoso built from develop/7, version 07.20.3236, and still have the same problems with concurrency. As @smalinin explained, the first iteration is fine, but on the second iteration clear() is blocked for all clients but one.

To make this easier to replicate, I’ve created a public repo with the application we use to test Virtuoso:

Maybe this can help you replicate the issue.

Virtuoso was run on AWS, t3.medium, Ubuntu 20.04.
I’m not allowed to attach the .ini file, so here are the notable settings from it:

[Parameters]
ServerPort                      = 5820
ServerThreads                   = 200
LiteMode                        = 0
DisableUnixSocket               = 1
DisableTcpSocket                = 0
;SSLServerPort                  = 2111
;SSLCertificate                 = cert.pem
;SSLPrivateKey                  = pk.pem
;X509ClientVerify               = 0
;X509ClientVerifyDepth          = 0
;X509ClientVerifyCAFile         = ca.pem
MaxClientConnections            = 150
CheckpointInterval              = 60
O_DIRECT                        = 0
CaseMode                        = 2
MaxStaticCursorRows             = 5000
CheckpointAuditTrail            = 0
AllowOSCalls                    = 0
SchedulerInterval               = 10
DirsAllowed                     = ., /opt/virtuoso/share/virtuoso/vad, /usr/share/proj
ThreadCleanupInterval           = 0
ThreadThreshold                 = 20
ResourcesCleanupInterval        = 0
FreeTextBatchSize               = 100000
SingleCPU                       = 0
VADInstallDir                   = /opt/virtuoso/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize            = 100
IndexTreeMaps                   = 512
MaxMemPoolSize                  = 800000000
PrefixResultNames               = 0
MacSpotlight                    = 0
MaxQueryMem                     = 5G            ; memory allocated to query processor
VectorSize                      = 2000          ; initial parallel query vector (array of query operations) size
MaxVectorSize                   = 2000000       ; query vector size threshold.
AdjustVectorSize                = 1
ThreadsPerQuery                 = 8
AsyncQueueMaxThreads            = 20
DefaultIsolation                = 2             ; 1 for read uncommitted, 2 for read committed, 4 for repeatable read and 8 for serializable
                                                ; If nothing is specified, the default is repeatable read
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.

;; Uncomment next two lines if there is 4 GB system memory free
NumberOfBuffers                = 340000
MaxDirtyBuffers                  = 250000


TransactionAfterImageLimit      = 50000000000
QueryLog                        = /opt/virtuoso/logs/querylog.log

@kidehen @imitko @hwilliams @Joel_Vik
The new Virtuoso-Concurrency-Test ran successfully against both Virtuoso 8.3 and Virtuoso 7.2.3234.
I ran the test twice.
VOS 7.2 settings:

;
;  Server parameters
;
[Parameters]
ServerPort			= 1111
LiteMode			= 0
DisableUnixSocket		= 1
DisableTcpSocket		= 0
;SSLServerPort			= 2111
;SSLCertificate			= cert.pem
;SSLPrivateKey			= pk.pem
;X509ClientVerify		= 0
;X509ClientVerifyDepth		= 0
;X509ClientVerifyCAFile		= ca.pem
MaxClientConnections		= 50
CheckpointInterval		= 60
O_DIRECT			= 0
CaseMode			= 2
MaxStaticCursorRows		= 5000
CheckpointAuditTrail		= 0
AllowOSCalls			= 0
SchedulerInterval		= 10
DirsAllowed			= ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /usr/share/proj, /mnt/hgfs/WORK
ThreadCleanupInterval		= 0
ThreadThreshold			= 10
ResourcesCleanupInterval	= 0
FreeTextBatchSize		= 100000
SingleCPU			= 0
VADInstallDir			= /usr/local/virtuoso-opensource/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize		= 100
IndexTreeMaps			= 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
IndexTreeMaps                   = 64
MaxQueryMem 		 	= 2G		; memory allocated to query processor
VectorSize 		 	= 1000		; initial parallel query vector (array of query operations) size
MaxVectorSize 		 	= 1000000	; query vector size threshold.
AdjustVectorSize 	 	= 0
ThreadsPerQuery 	 	= 4
AsyncQueueMaxThreads 	 	= 10
;;
NumberOfBuffers          = 10000
MaxDirtyBuffers          = 6000

Great! I’ll try it with your config. How do I install 7.2.3234 or 7.2.3235? Latest develop/7 is 7.2.3236?

@Joel_Vik
VOS 7.2.3236 also works fine. I rebuilt the binary and reran the test twice.

Note I have tested the app you provided against the latest develop/7 branch (Version 07.20.3236-pthreads for Linux as of Apr 21 2023 (e5312f0a3)) which works for me both with my existing INI file and the setting you provided for your setup above:

ubuntu@ip-172-30-0-77:~/git/Virtuoso-Concurrency-Test$ mvn clean compile exec:java
[INFO] Scanning for projects...
[INFO] 
[INFO] -----------------------< se.iquest:stress-test >------------------------
[INFO] Building stress-test 0.1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ stress-test ---
[INFO] Deleting /home/ubuntu/git/Virtuoso-Concurrency-Test/target
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ stress-test ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/ubuntu/git/Virtuoso-Concurrency-Test/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ stress-test ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 15 source files to /home/ubuntu/git/Virtuoso-Concurrency-Test/target/classes
[INFO] 
[INFO] --- exec-maven-plugin:3.1.0:java (default-cli) @ stress-test ---
Main 2023-05-17 20:47:18:841: Logging initialized
Main 2023-05-17 20:47:18:970: Starting client STClientThread14
Main 2023-05-17 20:47:18:971: Starting client STClientThread15
Main 2023-05-17 20:47:18:971: Starting client STClientThread16
Main 2023-05-17 20:47:18:973: Starting client STClientThread17
Main 2023-05-17 20:47:18:973: Starting client STClientThread18
STClientThread14 2023-05-17 20:47:19:766: Replacing UUIDs and miner name in miner file
STClientThread17 2023-05-17 20:47:19:770: Replacing UUIDs and miner name in miner file
STClientThread18 2023-05-17 20:47:19:771: Replacing UUIDs and miner name in miner file
...
STClientThread18 2023-05-17 20:54:14:853: Adding data to transaction d065edd9-0e52-4a46-a7d3-df49177736b4 for named graph ":5f563c9c-f0c3-44b9-b73e-9dc82b812326"
STClientThread18 2023-05-17 20:54:14:870: Adding data to transaction d065edd9-0e52-4a46-a7d3-df49177736b4 for named graph ":5f563c9c-f0c3-44b9-b73e-9dc82b812326"
STClientThread18 2023-05-17 20:54:14:915: Successfully committed transaction: d065edd9-0e52-4a46-a7d3-df49177736b4
STClientThread18 2023-05-17 20:54:14:916: Closing Repository connection
STClientThread18 2023-05-17 20:54:14:916: Shutting down Repository
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  06:59 min
[INFO] Finished at: 2023-05-17T20:54:14Z
[INFO] ------------------------------------------------------------------------
ubuntu@ip-172-30-0-77:~/git/Virtuoso-Concurrency-Test$

That sounds great. Hopefully that means I’m doing something wrong. I just want to make sure we all agree on what a successful run is. When I start the application, all client threads are uploading to new empty named graphs with excellent concurrency, as shown by the output. All threads are concurrently uploading data:

STClientThread16 2023-05-22 09:25:18:117: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread13 2023-05-22 09:25:18:123: Adding data to transaction be417e99-d99b-4617-8456-42c315d33926 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread14 2023-05-22 09:25:18:176: Adding data to transaction 30bdf024-b20f-4b92-bf03-ddf5d24b884e for named graph ":c2c92006-7f0c-4255-8014-60ca64a7cb65"
STClientThread15 2023-05-22 09:25:18:192: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread12 2023-05-22 09:25:18:255: Adding data to transaction db61daf4-0943-4598-bb66-2a1d5895bd45 for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread16 2023-05-22 09:25:18:286: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread13 2023-05-22 09:25:18:286: Adding data to transaction be417e99-d99b-4617-8456-42c315d33926 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread14 2023-05-22 09:25:18:348: Adding data to transaction 30bdf024-b20f-4b92-bf03-ddf5d24b884e for named graph ":c2c92006-7f0c-4255-8014-60ca64a7cb65"
STClientThread15 2023-05-22 09:25:18:364: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread12 2023-05-22 09:25:18:426: Adding data to transaction db61daf4-0943-4598-bb66-2a1d5895bd45 for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"

However, after they complete their transactions and begin the second iteration, now uploading to named graphs that already have data, only one transaction for one client is uploading data, while the others are stuck on clear():

STClientThread12 2023-05-22 09:28:27:770: Successfully committed transaction: db61daf4-0943-4598-bb66-2a1d5895bd45
STClientThread12 2023-05-22 09:28:27:770: Closing Repository connection
STClientThread12 2023-05-22 09:28:27:770: Shutting down Repository
STClientThread12 2023-05-22 09:28:27:770: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread12 2023-05-22 09:28:27:801: Began transaction with internal transaction ID: 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d with Isolation Level READ_UNCOMMITTED
STClientThread12 2023-05-22 09:28:27:801: Clearing existing data for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8" within 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d
STClientThread13 2023-05-22 09:28:27:927: Adding data to transaction be417e99-d99b-4617-8456-42c315d33926 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread16 2023-05-22 09:28:28:021: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread14 2023-05-22 09:28:28:021: Adding data to transaction 30bdf024-b20f-4b92-bf03-ddf5d24b884e for named graph ":c2c92006-7f0c-4255-8014-60ca64a7cb65"
STClientThread15 2023-05-22 09:28:28:021: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread14 2023-05-22 09:28:28:307: Adding data to transaction 30bdf024-b20f-4b92-bf03-ddf5d24b884e for named graph ":c2c92006-7f0c-4255-8014-60ca64a7cb65"
STClientThread15 2023-05-22 09:28:28:311: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:28:313: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread13 2023-05-22 09:28:28:442: Successfully committed transaction: be417e99-d99b-4617-8456-42c315d33926
STClientThread13 2023-05-22 09:28:28:442: Closing Repository connection
STClientThread13 2023-05-22 09:28:28:442: Shutting down Repository
STClientThread13 2023-05-22 09:28:28:442: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread15 2023-05-22 09:28:28:504: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread13 2023-05-22 09:28:28:520: Began transaction with internal transaction ID: 511cc2b0-f32b-4c97-ac52-cf781f403964 with Isolation Level READ_UNCOMMITTED
STClientThread13 2023-05-22 09:28:28:520: Clearing existing data for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d" within 511cc2b0-f32b-4c97-ac52-cf781f403964
STClientThread16 2023-05-22 09:28:28:520: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread15 2023-05-22 09:28:28:692: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:28:692: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread14 2023-05-22 09:28:28:707: Successfully committed transaction: 30bdf024-b20f-4b92-bf03-ddf5d24b884e
STClientThread14 2023-05-22 09:28:28:707: Closing Repository connection
STClientThread14 2023-05-22 09:28:28:707: Shutting down Repository
STClientThread14 2023-05-22 09:28:28:707: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread14 2023-05-22 09:28:28:739: Began transaction with internal transaction ID: b0223298-3769-4c18-84cd-40da0b487548 with Isolation Level READ_UNCOMMITTED
STClientThread14 2023-05-22 09:28:28:739: Clearing existing data for named graph ":c2c92006-7f0c-4255-8014-60ca64a7cb65" within b0223298-3769-4c18-84cd-40da0b487548
STClientThread15 2023-05-22 09:28:28:849: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:28:858: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread15 2023-05-22 09:28:29:006: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:29:021: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread15 2023-05-22 09:28:29:173: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:29:182: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread15 2023-05-22 09:28:29:329: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:29:337: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread16 2023-05-22 09:28:29:458: Adding data to transaction 91e3725a-c4c2-4260-9f52-0860b8279afd for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94"
STClientThread15 2023-05-22 09:28:29:474: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread15 2023-05-22 09:28:29:661: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread15 2023-05-22 09:28:29:817: Adding data to transaction a15496af-f980-4c98-913f-5d8549db4c8f for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc"
STClientThread16 2023-05-22 09:28:29:836: Successfully committed transaction: 91e3725a-c4c2-4260-9f52-0860b8279afd
STClientThread16 2023-05-22 09:28:29:836: Closing Repository connection
STClientThread16 2023-05-22 09:28:29:838: Shutting down Repository
STClientThread16 2023-05-22 09:28:29:839: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread16 2023-05-22 09:28:29:849: Began transaction with internal transaction ID: cf7a3247-be8e-4708-b328-50d48d539783 with Isolation Level READ_UNCOMMITTED
STClientThread16 2023-05-22 09:28:29:849: Clearing existing data for named graph ":7e282e37-8ebd-4856-9e44-04efbb0b3f94" within cf7a3247-be8e-4708-b328-50d48d539783
STClientThread15 2023-05-22 09:28:30:177: Successfully committed transaction: a15496af-f980-4c98-913f-5d8549db4c8f
STClientThread15 2023-05-22 09:28:30:177: Closing Repository connection
STClientThread15 2023-05-22 09:28:30:177: Shutting down Repository
STClientThread15 2023-05-22 09:28:30:177: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread15 2023-05-22 09:28:30:208: Began transaction with internal transaction ID: 76d11a0d-290c-43af-bc4d-8c3c9e91ffb8 with Isolation Level READ_UNCOMMITTED
STClientThread15 2023-05-22 09:28:30:208: Clearing existing data for named graph ":392fe0ee-336a-4a57-86d1-7d6d445842dc" within 76d11a0d-290c-43af-bc4d-8c3c9e91ffb8
STClientThread12 2023-05-22 09:28:31:259: Beginning to add data for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:31:380: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:31:520: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:31:692: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:31:833: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:013: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:160: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:323: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:481: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:652: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:807: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:32:974: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:130: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:307: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:458: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:629: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:787: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:33:958: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:114: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:271: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:443: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:598: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:754: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:34:927: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:083: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:255: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:411: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:583: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:739: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:35:879: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:36:052: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:36:208: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:36:378: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:36:536: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:28:36:692: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"

Note how only STClientThread12 is adding data, while the other threads’ last log message was “Clearing existing data for named graph …”. After STClientThread12 is done with its transaction and commit, another client that was waiting on clear() resumes its transaction. In this test run it was STClientThread13. The other clients are still stuck on clear():

STClientThread12 2023-05-22 09:31:20:324: Adding data to transaction 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8"
STClientThread12 2023-05-22 09:31:20:610: Successfully committed transaction: 297a44ff-525f-4abc-8bd2-ae1e44dc2b7d
STClientThread12 2023-05-22 09:31:20:610: Closing Repository connection
STClientThread12 2023-05-22 09:31:20:610: Shutting down Repository
STClientThread12 2023-05-22 09:31:20:611: Starting to upload graphs to db "Whatever" at 172.31.7.144:5820
STClientThread12 2023-05-22 09:31:20:642: Began transaction with internal transaction ID: 2053a8c9-d385-42ac-aea2-590631d20d34 with Isolation Level READ_UNCOMMITTED
STClientThread12 2023-05-22 09:31:20:642: Clearing existing data for named graph ":9320a557-8bd7-4579-92be-bda0eaabdac8" within 2053a8c9-d385-42ac-aea2-590631d20d34
STClientThread13 2023-05-22 09:31:22:208: Beginning to add data for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:22:538: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:22:660: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:22:826: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:22:996: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:144: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:299: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:455: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:611: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:782: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:23:937: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:078: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:261: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:420: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:585: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:751: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:24:914: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:25:063: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"
STClientThread13 2023-05-22 09:31:25:214: Adding data to transaction 511cc2b0-f32b-4c97-ac52-cf781f403964 for named graph ":8680e10a-6011-468b-ac47-6cf93178a44d"

So after the first iteration, they take turns uploading data, which is what I mean by no concurrency. @smalinin and @hwilliams, are you not experiencing this behaviour? Does your log show concurrent uploads even during the second iteration?

@Joel_Vik Yes, we do observe the threads executing one at a time after the first iteration. This is because the application loads a large dataset with a mixture of inserts and deletes, which creates many locks in the RDF_QUAD table that stores the data in the database.

The performance of your test application increased by a factor of 3 after improving the use of batch inserts: the data is first preloaded from the files into the RDF4J Memory Store and then copied from the Memory Store to Virtuoso. Committing after the clear graph operations gave a further small gain of about 12%.
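
To make the ordering concrete, here is a minimal sketch of that transaction layout. It is not the code from the modified repository. GraphTx is a hypothetical stand-in for the connection (not an RDF4J or Virtuoso type), and the triples are assumed to have been parsed into memory before any transaction is opened.

```java
import java.util.List;

/** Hypothetical minimal view of a transactional graph connection (illustration only). */
interface GraphTx {
    void begin();
    void clear(String graph);
    void addAll(String graph, List<String> triples);
    void commit();
    void rollback();
}

final class GraphReplacer {
    /**
     * Replaces a named graph using two short transactions: one that only clears
     * the graph, and one that bulk-adds triples already held in memory.
     */
    static void replaceGraph(GraphTx tx, String graph, List<String> preloadedTriples) {
        runInTx(tx, () -> tx.clear(graph));                    // commit right after the clear
        runInTx(tx, () -> tx.addAll(graph, preloadedTriples)); // then one batched insert
    }

    /** Runs the work inside begin/commit and rolls back if it throws. */
    private static void runInTx(GraphTx tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}
```

The trade-off with this layout is that the replacement is no longer atomic: between the two commits, other clients can see the graph empty.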

The changes made by @smalinin are contained in this git clone of your original git tree, which you can try to see the improvements made.

Thank you, I appreciate the time you’ve taken to investigate this and to commit the performance improvements. My takeaway is that the concurrency can’t be improved after the first iteration, mainly due to the size of our datasets, but hopefully the other improvements can help us offset some of that. I’ll create a pull request and merge your changes.