Problem with violating unique index RDF_OBJ

Hello dear Virtuoso,

We get the following error after making multiple insert queries with log_enable(3,1) and then making checkpoint.

SPARQL Error: SQLSTATE[23000]: Integrity constraint violation: -1 [OpenLink][Virtuoso iODBC Driver][Virtuoso Server]SR175: Uniqueness violation : Violating unique index RDF_OBJ on table DB.DBA.RDF_OBJ. Transaction killed.

We can’t add any new triples with literals to the database.

Also we found some duplicate id-s in Primary Key with the query:

SELECT TOP 100 RO_ID, COUNT(*) FROM DB.DBA.RDF_OBJ GROUP BY RO_ID HAVING COUNT(*) > 1

Query result:

RO_ID
BIGINT    aggregate
INTEGER
421454300     2
421454297     2
421454298     2
421454299     2
421454301     2
421454302     2
421454303     2
421454304     2
421454305     2
421454306     2
421454307     2
421454308     2
...

Could you recommend possible course of action to resolve the problem?

Thank you!
Stas

virtuoso version

Version 7.2.6-rc1.3230-pthreads as of Aug 9 2019 (e2a413e)

virtuoso.ini

[Database]
DatabaseFile       = /var/lib/virtuoso-opensource-7/db/virtuoso.db
ErrorLogFile       = /var/lib/virtuoso-opensource-7/db/virtuoso.log
LockFile           = /var/lib/virtuoso-opensource-7/db/virtuoso.lck
TransactionFile    = /var/lib/virtuoso-opensource-7/db/trxlog/virtuoso20190904090128.trx
xa_persistent_file = /var/lib/virtuoso-opensource-7/db/virtuoso.pxa
;DatabaseFile       = /var/lib/virtuoso/virtuoso.db
;ErrorLogFile       = /var/lib/virtuoso/virtuoso.log
;LockFile           = /var/lib/virtuoso/virtuoso.lck
;TransactionFile    = /var/lib/virtuoso/trxlog/virtuoso20181220201625.trx
;xa_persistent_file = /var/lib/virtuoso/virtuoso.pxa
ErrorLogLevel      = 7
FileExtend         = 200
MaxCheckpointRemap = 500000
Striping           = 0
TempStorage        = TempDatabase
#TransactionFile   = virtuoso20190522051400.trx

[TempDatabase]
DatabaseFile       = /var/lib/virtuoso-opensource-7/db/virtuoso-temp.db
TransactionFile    = /var/lib/virtuoso-opensource-7/db/virtuoso-temp.trx
;DatabaseFile       = /var/lib/virtuoso/virtuoso-temp.db
;TransactionFile    = /var/lib/virtuoso/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping           = 0

;
; Server parameters
;
[Parameters]
DefaultIsolation           = 2
ServerPort                 = 1111
LiteMode                   = 0
DisableUnixSocket          = 1
DisableTcpSocket           = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
TransactionAfterImageLimit = 99999999
MaxClientConnections       = 100
CheckpointInterval         = 60
O_DIRECT                   = 0
CaseMode                   = 2
MaxStaticCursorRows        = 5000
CheckpointAuditTrail       = 1
AllowOSCalls               = 0
SchedulerInterval          = 10
;DirsAllowed              = ., /usr/share/virtuoso/vad, /import, /EGRUL
DirsAllowed                = ., /usr/share/virtuoso-opensource-7/vad, /import, /EGRUL
ThreadCleanupInterval      = 1
ThreadThreshold            = 10
ResourcesCleanupInterval   = 1
FreeTextBatchSize          = 100000
SingleCPU                  = 0
VADInstallDir              = /usr/share/virtuoso-opensource-7/vad/
;VADInstallDir            = /usr/share/virtuoso/vad/
PrefixResultNames          = 0
RdfFreeTextRulesSize       = 100
IndexTreeMaps              = 256
;IndexTreeMaps = 256
MaxMemPoolSize             = 6000000000
;MaxMemPoolSize = 200000000
PrefixResultNames          = 0
MacSpotlight               = 0
IndexTreeMaps              = 64
MaxQueryMem                = 32G	; memory allocated to query processor
;HashJoinSpace = 8G
VectorSize                 = 10000	; initial parallel query vector (array of query operations) size
MaxVectorSize              = 3500000	; query vector size threshold.
AdjustVectorSize           = 0
ThreadsPerQuery            = 32
AsyncQueueMaxThreads       = 32
TraceOn                    = user_log, failed_log, user_names, compile, ddl_log, client_sql, errors, dsn, sql_send, transact, remote_transact, exec, soap, cursor
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers = 170000
;MaxDirtyBuffers = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers = 340000
; MaxDirtyBuffers = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;NumberOfBuffers = 680000
;MaxDirtyBuffers = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;NumberOfBuffers = 1360000
;MaxDirtyBuffers = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
;NumberOfBuffers = 2720000
;MaxDirtyBuffers = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;NumberOfBuffers = 4000000
;MaxDirtyBuffers = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;NumberOfBuffers = 5450000
;MaxDirtyBuffers = 4000000
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers            = 20710000
MaxDirtyBuffers            = 15200000
;NumberOfBuffers          = 7425000
;MaxDirtyBuffers          = 5568750

[HTTPServer]
ServerPort                  = 8190
SSLPort                     = 8890
SSLCertificate              = /data/ssl/fullchain.pem
SSLPrivateKey               = /data/ssl/privkey.pem
X509ClientVerify            = 0
ServerRoot                  = /var/lib/virtuoso-opensource-7/vsp
;ServerRoot                  = /USR/lib/virtuoso/vsp
MaxClientConnections        = 10
DavRoot                     = DAV
EnabledDavVSP               = 0
HTTPProxyEnabled            = 0
TempASPXDir                 = 0
DefaultMailServer           = localhost:25
MaxKeepAlives               = 10
KeepAliveTimeout            = 10
MaxCachedProxyConnections   = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize              = 680000;280000
HttpPrintWarningsInOutput   = 0
Charset                     = UTF-8
;HTTPLogFile = /var/log/virtuoso/http.log
MaintenancePage             = atomic.html
EnabledGzipContent          = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS  = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT  = 0
SQL_TXN_TIMEOUT    = 0
;SQL_NO_CHAR_C_ESCAPE = 1
;SQL_UTF8_EXECS = 0
;SQL_NO_SYSTEM_TABLES = 0
;SQL_BINARY_TIMESTAMP = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization           = 0
NumArrayParameters          = 10
VDBDisconnectTimeout        = 1000
KeepConnectionOnFixedThread = 0

[Replication]
;ServerName   = db-CE05D567F527
;ServerEnable = 1
;QueueMax     = 50000

;
; Striping setup
;
; These parameters have only effect when Striping is set to 1 in the
; [Database] section, in which case the DatabaseFile parameter is ignored.
;
; With striping, the database is spawned across multiple segments
; where each segment can have multiple stripes.
;
; Format of the lines below:
; Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
;
; <number> must be ordered from 1 up.
;
; The <size> is the total size of the segment which is equally divided
; across all stripes forming the segment. Its specification can be in
; gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
; (b, the default)
;
; Note that the segment size must be a multiple of the database page size
; which is currently 8k. Also, the segment size must be divisible by the
; number of stripe files forming the segment.
;
; The example below creates a 200 meg database striped on two segments
; with two stripes of 50 meg and one of 100 meg.
;
; You can always add more segments to the configuration, but once
; added, do not change the setup.
;
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...
;[TempStriping]
;Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2 = 100M, db-seg2-1.db
;...
;[Ucms]
;UcmPath = <path>
;Ucm1 = <file>
;Ucm2 = <file>
;...

[Zero Config]
ServerName = virtuoso (CE05D567F527)
;ServerDSN = ZDSN
;SSLServerName =
;SSLServerDSN =

[Mono]
;MONO_TRACE = Off
;MONO_PATH = <path_here>
;MONO_ROOT = <path_here>
;MONO_CFG_DIR = <path_here>
;virtclr.dll =

[URIQA]
DynamicLocal = 0
DefaultHost  = localhost:8890

[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
ResultSetMaxRows           = 1000
MaxQueryCostEstimationTime = 60000	; in seconds
MaxQueryExecutionTime      = 0	; in seconds
DefaultQuery               = PREFIX crm2: <http://sp7.ru/ontology/> SELECT * WHERE { ?a ?sp crm2:Organization. ?sp rdf:singletonPropertyOf rdf:type. } LIMIT 100
DeferInferenceRulesInit    = 0	; controls inference rules loading
;PingService = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath = /usr/lib/virtuoso-opensource-7/hosting
;LoadPath = /usr/lib/virtuoso/hosting
Load1    = plain, wikiv
Load2    = plain, mediawiki
Load3    = plain, creolewiki
Load4    = plain, im
;Load5 = plain, wbxml2
;Load6 = plain, hslookup
;Load7 = attach, libphp5.so
;Load8 = Hosting, hosting_php.so
;Load9 = Hosting,hosting_perl.so
;Load10 = Hosting,hosting_python.so
;Load11 = Hosting,hosting_ruby.so
;Load12 = msdtc,msdtc_sample

Triple count: 5 142 593 917

Hi Stas,

I would suggest following:

  1. compare

    SELECT TOP 100 RO_ID, COUNT(*) FROM DB.DBA.RDF_OBJ table option (index RDF_OBJ) GROUP BY RO_ID HAVING COUNT(*) > 1;
    

    vs

    SELECT TOP 100 RO_ID, COUNT(*) FROM DB.DBA.RDF_OBJ table option (index RO_VAL) GROUP BY RO_ID HAVING COUNT(*) > 1
    
  2. see if secondary index (RO_VAL) is broken or primary key.

  3. if secondary index is broken then make full online backup of database, drop the index, and re-create it with

    create index RO_VAL on DB.DBA.RDF_OBJ (RO_VAL, RO_DT_AND_LANG)
    

Also, please could you specify what operations have been done in order to get to this situation, i.e., how many clients are used for inserts, number of triples, distribution of the data, etc., as much as is possible to reproduce this problem. The database log file (e.g., virtuoso.log) also might be be interesting, especially if you turned on error logging via trace_on('errors') during the operation which leads to the problem.

HTH

Mitko

Thank you, Mitko,

I am sorry it took us so long to answer.

Our database started crashing regularly.

The problem with index most likely arose after we executed this query multiple times.

DEFINE sql:log-enable 3
PREFIX crm2: <http://sp7.ru/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT INTO GRAPH <http://sp7.ru/>
{ ?name crm2:label_lower ?name_label_lower }
WHERE {
?name rdf:type ?type. ?name rdfs:label ?name_label.
?type crm2:need_calculate_crm2pr_predicate crm2:need_calculate_crm2pr_object.
MINUS { ?name crm2:label_lower ?lower.}
BIND (lcase(str(?name_label)) as ?name_label_lower).
}
LIMIT 500000

There was some additional details.

We have our checkpoint turned off. They are made once per day at night.

So trx log grew up to 1 GB in size.

Also after those commands, Virtuoso started to crash several seconds after start.

So we allowed checkpoints after start again.

Checkpoint was made right after start.

And Virtuoso stopped crashing.

But after that it also stopped accepting new INSERTs.


Thank you for your recommendations.

We tried COUNT commands, but unfortunately they gave us timeout through iSQL.

root@test12 ~ # cat t.txt 
Initializing PRNG
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Mar 13 2019
Type HELP; for help and EXIT; to exit.
*** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
at line 0 of Top-Level:
SELECT TOP 100 RO_ID, COUNT(*) FROM DB.DBA.RDF_OBJ table option (index RDF_OBJ) 
GROUP BY RO_ID HAVING COUNT(*) > 1
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver

We also rebuilt the index like this.

DROP INDEX RO_VAL DB.DBA.RDF_OBJ;
create index RO_VAL on DB.DBA.RDF_OBJ (RO_VAL, RO_DT_AND_LANG);

But it didn’t help.

Finally we took earlier database backup, so the problem is gone for now.

Virtuoso.log is gone unfortunately. :frowning:

Maybe we can close this ticket.

Thank you again for your help!

Stas