Hello team,
I am running 7.2.13 community edition Virtuoso.
Queries like
SELECT DISTINCT ?p WHERE { GRAPH <http://my.host/my_graph/> { ?s ?p ?o . } }
with default indexing schema are performed via scanning RDF_QUAD_PGOS index.
According to your suggestion given on [Performance Tuning Virtuoso for RDF Queries and Other Use] I tried to create bitmap index RDF_QUAD_GPOS . Unfortunately the index creation ended up with error
[Code: -8, SQL State: 23000] SR175: Uniqueness violation : Violating unique index RDF_QUAD_GPOS on table DB.DBA.RDF_QUAD. Transaction killed.
This error is quite misleading, as the index was not asked to be unique. Moreover, uniqueness of tripples in RDF_QUAD table are guaranteed by its primary key.
Going further, I created an index on P, like
CREATE DISTINCT NO PRIMARY KEY REF COLUMN INDEX RDF_QUAD_P
ON DB.DBA.RDF_QUAD (P)
but this index is ignored, according to EXPLAIN
Does anybody have a suggestion on an index that might be more selective with the above query ?
Thanks a lot !
The SR175: Uniqueness violation : Violating unique index RDF_QUAD_GPOS on table DB.DBA.RDF_QUAD. Transaction killed.
error probably indicates there are some broken RDF indexes or other corruption in the database.
Thus you should perform a database integrity check with the command:
backup '/dev/null';
Check the primary key on the RDF_QUAD table is not broken by running the command:
select count (s), count (p ), count (o ), count (g ) from rdf_quad table option (index rdf_quad, check);
The counts of the s, p, o, g
columns should all be the same
Check the RDF Index counts with the queries:
select count (*) from rdf_quad a table option (index rdf_quad) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_pogs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index primary key) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_op) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_sp) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_gs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
If any counts are not 0
it indicates a bad index.
Hugh, thanks for directions.
- backup went through without any errors
- 4 counts from the PK index are all same, though negative -962394511
- All 5 queries comparing the 5 RDF indices return 0
I also tried to create columnar index RDF_QUAD_GPSO, as you suggested in the same post.
CREATE COLUMN INDEX RDF_QUAD_GPSO ON RDF_QUAD (G, P, S, O)
This went ok, and this index has lower cardinality, hence taken by optimiser for my query in question. However, the respond contains only a part of predicates, several less omnipresent ones [that I know are in my RDF dataset] are missed. This confirms your concern about data integrity. Or about validity of columnar index.
Any thoughts ?
thanks
Dmitry.
What does the output of the query select count(*) from rdf_quad
return ?
What does the output of the command status();
return ?
What does the output of the query select count(*) from rdf_quad return ?
7627540081
What does the output of the command status();
return ?
OpenLink Virtuoso Server
Version 07.20.3240-pthreads for Linux as of Sep 30 2024 (000000)
Started on: 2025-04-10 13:40 GMT+2 (up 3 days 20:41)
CPU: 0.60% RSS: 104113MB VSZ: 106380MB PF: 433
Database Status:
File size 973078528, 55169024 pages, 23103037 free.
10900000 buffers, 10899943 used, 10 dirty 0 wired down, repl age 19285907 0 w. io 0 w/crsr.
Disk Usage: 39860611 reads avg 0 msec, 0% r 0% w last 0 s, 245065 writes flush 0.5967 MB/s,
84138 read ahead, batch = 337. Autocompact 4 in 2 out, 40% saved.
Gate: 450608 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /db/wrk2/vdb-12/virtuoso.trx, 3911 bytes
32064893 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 10 connects, max 4 concurrent
RPC: 145 calls, 1 pending, 2 max until now, 0 queued, 0 burst reads (0%), 0 second 0M large, 3134M max
Checkpoint Remap 38 pages, 0 mapped back. 299 s atomic time.
DB master 55169024 total 23103037 free 38 remap 0 mapped back
temp 43264 total 43259 free
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:
Client 1111:10: Account: dba, 341 bytes in, 406 bytes out, 1 stmts.
PID: 2640697, OS: unix, Application: unknown, IP#: 192.168.3.251
Transaction status: PENDING, 1 threads.
Locks:
Running Statements:
Time (msec) Text
886 status()
Hash indexes
38 Rows. -- 888 msec.
SQL> STATISTICS DB.DBA.RDF_QUAD;
Showing SQLStatistics of table(s) 'DB.DBA.RDF_QUAD'
TABLE_QUALIFIER TABLE_OWNER TABLE_NAME NON_UNIQUE INDEX_QUALIFIER INDEX_NAME TYPE SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY PAGES FILTER_CONDITION
VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR SMALLINT SMALLINT VARCHAR VARCHAR INTEGER INTEGER VARCHAR
_______________________________________________________________________________
DB DBA RDF_QUAD NULL NULL NULL 0 NULL NULL NULL 7627540098 NULL NULL
DB DBA RDF_QUAD 0 DB RDF_QUAD 3 1 P NULL 7627540098 NULL NULL
DB DBA RDF_QUAD 0 DB RDF_QUAD 3 2 S NULL 7627540098 NULL NULL
DB DBA RDF_QUAD 0 DB RDF_QUAD 3 3 O NULL 7627540098 NULL NULL
DB DBA RDF_QUAD 0 DB RDF_QUAD 3 4 G NULL 7627540098 NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GPSO 3 1 G NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GPSO 3 2 P NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GPSO 3 3 S NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GPSO 3 4 O NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 1 G NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 2 S NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 1 O NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 2 P NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 1 P NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 2 O NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 3 S NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 4 G NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 1 S NULL NULL NULL NULL
DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 2 P NULL NULL NULL NULL
19 Rows. -- 23 msec.
Not sure if it matters, the database version is 3126
thanks,
Dmitry.