Index to ameliorate SELECT DISTINCT ?p WHERE { GRAPH ... queries

Hello team,
I am running 7.2.13 community edition Virtuoso.

Queries like
SELECT DISTINCT ?p WHERE { GRAPH <http://my.host/my_graph/> { ?s ?p ?o . } }
with default indexing schema are performed via scanning RDF_QUAD_PGOS index.

According to your suggestion given on [Performance Tuning Virtuoso for RDF Queries and Other Use] I tried to create bitmap index RDF_QUAD_GPOS . Unfortunately the index creation ended up with error

[Code: -8, SQL State: 23000] SR175: Uniqueness violation : Violating unique index RDF_QUAD_GPOS on table DB.DBA.RDF_QUAD. Transaction killed.

This error is quite misleading, as the index was not asked to be unique. Moreover, uniqueness of tripples in RDF_QUAD table are guaranteed by its primary key.

Going further, I created an index on P, like

CREATE DISTINCT NO PRIMARY KEY REF COLUMN INDEX RDF_QUAD_P 
  ON DB.DBA.RDF_QUAD (P)

but this index is ignored, according to EXPLAIN

Does anybody have a suggestion on an index that might be more selective with the above query ?

Thanks a lot !

The SR175: Uniqueness violation : Violating unique index RDF_QUAD_GPOS on table DB.DBA.RDF_QUAD. Transaction killed. error probably indicates there are some broken RDF indexes or other corruption in the database.

Thus you should perform a database integrity check with the command:

backup '/dev/null';

Check the primary key on the RDF_QUAD table is not broken by running the command:

select count (s), count (p ), count (o ), count (g ) from rdf_quad   table option (index rdf_quad, check);

The counts of the s, p, o, g columns should all be the same

Check the RDF Index counts with the queries:

select count (*) from rdf_quad a table option (index rdf_quad) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_pogs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s); 
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index primary key) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s); 
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_op) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s); 
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_sp) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s); 
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_gs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

If any counts are not 0 it indicates a bad index.

Hugh, thanks for directions.

  1. backup went through without any errors
  2. 4 counts from the PK index are all same, though negative -962394511
  3. All 5 queries comparing the 5 RDF indices return 0

I also tried to create columnar index RDF_QUAD_GPSO, as you suggested in the same post.
CREATE COLUMN INDEX RDF_QUAD_GPSO ON RDF_QUAD (G, P, S, O)

This went ok, and this index has lower cardinality, hence taken by optimiser for my query in question. However, the respond contains only a part of predicates, several less omnipresent ones [that I know are in my RDF dataset] are missed. This confirms your concern about data integrity. Or about validity of columnar index.

Any thoughts ?

thanks
Dmitry.

What does the output of the query select count(*) from rdf_quad return ?

What does the output of the command status(); return ?

What does the output of the query select count(*) from rdf_quad return ?
7627540081

What does the output of the command status(); return ?

OpenLink Virtuoso  Server
Version 07.20.3240-pthreads for Linux as of Sep 30 2024 (000000)
Started on: 2025-04-10 13:40 GMT+2 (up 3 days 20:41)
CPU: 0.60% RSS: 104113MB VSZ: 106380MB PF: 433

Database Status:
  File size 973078528, 55169024 pages, 23103037 free.
  10900000 buffers, 10899943 used, 10 dirty 0 wired down, repl age 19285907 0 w. io 0 w/crsr.
  Disk Usage: 39860611 reads avg 0 msec, 0% r 0% w last  0 s, 245065 writes flush     0.5967 MB/s,
    84138 read ahead, batch = 337.  Autocompact 4 in 2 out, 40% saved.
Gate:  450608 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /db/wrk2/vdb-12/virtuoso.trx, 3911 bytes
32064893 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 10 connects, max 4 concurrent
RPC: 145 calls, 1 pending, 2 max until now, 0 queued, 0 burst reads (0%), 0 second 0M large, 3134M max
Checkpoint Remap 38 pages, 0 mapped back. 299 s atomic time.
    DB master 55169024 total 23103037 free 38 remap 0 mapped back
   temp  43264 total 43259 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 1111:10:  Account: dba, 341 bytes in, 406 bytes out, 1 stmts.
PID: 2640697, OS: unix, Application: unknown, IP#: 192.168.3.251
Transaction status: PENDING, 1 threads.
Locks:


Running Statements:
 Time (msec) Text
         886 status()


Hash indexes


38 Rows. -- 888 msec.

SQL> STATISTICS DB.DBA.RDF_QUAD;
Showing SQLStatistics of table(s) 'DB.DBA.RDF_QUAD'
TABLE_QUALIFIER  TABLE_OWNER      TABLE_NAME       NON_UNIQUE  INDEX_QUALIFIER  INDEX_NAME       TYPE        SEQ_IN_INDEX  COLUMN_NAME      COLLATION  CARDINALITY  PAGES       FILTER_CONDITION
VARCHAR          VARCHAR          VARCHAR          SMALLINT    VARCHAR          VARCHAR          SMALLINT    SMALLINT    VARCHAR          VARCHAR  INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

DB               DBA              RDF_QUAD         NULL        NULL             NULL             0           NULL        NULL             NULL     7627540098  NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           1           P                NULL     7627540098  NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           2           S                NULL     7627540098  NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           3           O                NULL     7627540098  NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           4           G                NULL     7627540098  NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GPSO    3           1           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GPSO    3           2           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GPSO    3           3           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GPSO    3           4           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GS      3           1           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GS      3           2           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_OP      3           1           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_OP      3           2           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           1           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           2           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           3           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           4           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_SP      3           1           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_SP      3           2           P                NULL     NULL        NULL        NULL

19 Rows. -- 23 msec.

Not sure if it matters, the database version is 3126

thanks,
Dmitry.