Gathering statistics for the RDF tables, useful or not?

Is it useful to gather table statistics and column histograms on the RDF tables after a bulk load?


And if so, what histogram bucket counts are reasonable for different numbers of distinct values in a column?

Are there any criteria to keep in mind before trying to create a histogram for a column?
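For context, the kind of command being asked about might look like the sketch below. The procedure name `DB.DBA.SYS_STAT_ANALYZE`, the target table `DB.DBA.RDF_QUAD`, and the bucket count of 30 are illustrative assumptions, not a recommendation:

```sql
-- Hypothetical example of gathering statistics and histograms for an
-- RDF table after a bulk load (table name and bucket count are only
-- placeholders for illustration):
DB.DBA.SYS_STAT_ANALYZE ('DB.DBA.RDF_QUAD', 30);
```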

As RDF storage is a total mess, with different statistics in different predicates and graphs, per-table statistics are next to useless. They are useful primarily for mixed queries where relational data and RDF are joined. For SPARQL, with its snowflake-like joins, the only way to make an execution plan accurate is to sample the specific joins during query compilation. So if you want to stay on the safe side and ensure you have the best plans after a bulk load, just re-connect the client to clear the cache of queries compiled with old samples.

Thank you. This is what I remembered but was not sure about. In practice, running those commands leads to errors from the query planner; but since they are not helpful, we won't run them, so it is not an issue.

For reference, the error was:

SQ156: Internal Optimized compiler error : sqlo table has no index in sqldf.c:3782.
Please report the statement compiled