This query is so slow, took 10 seconds for the result, the only difference is the ?catId is replaced with literal value 1. I thought this should be faster because it’s more specific.
Why is it so? I am still new to SPARQL and triplestore, seems like I have to do some indexing on the object?
I have tried creating the GOPS index with, I don’t understand about the Partition part, is using O correct? What does that partition mean? Creating this index has improved the execution for the second query to 4 seconds, but still too slow for a DBMS.
CREATE COLUMN INDEX RDF_QUAD_GOPS
ON RDF_QUAD (G, O, P, S)
PARTITION (O VARCHAR (-1, 0hexffff));
Btw, does Virtuoso store triplets in the regular RDBMS table? in the table DB.DBA.RDF_QUAD? So virtuoso does not have a native storage of graph? Or what I don’t understand?
What is the specific version and gitid of the Virtuoso 7 server being used? This can be obtained from the output of running the command ./virtuoso-t -? from the bin directory of your Virtuoso installation?
The second query checking for a specific catId value and taking a longer time to run is probably because it is resulting in a Full Table Scan on the RDF_QUAD table which, depending on the number of triples in the database, can take significantly longer. This can be determined by obtaining a query compilation plan, showing how the SPARQL query is being prepared for execution against the database, as detailed in Generate SPARQL Query Compilation Report from a Virtuoso /sparql Endpoint.
Rather than checking for the catID value in the triple pattern you can try adding a SPARQL filter clause to the first query to check for the specific value required which should eliminate the need for the Full Table Scan and hence run much faster, i.e., filter (?catId = 1), See:
Generally, there is no need to create additional indexes on the RDF_QUAD table, as the 5 default indexes (2 Full & 3 partial) suffice for most uses with few exceptions, as indicated in the Virtuoso RDF Performance Tuning Guide, thus I would suggest deleting the index you created.
C:\Program Files\OpenLink Software\Virtuoso OpenSource 7.2\bin> .\virtuoso-t.exe -?
Virtuoso Open Source Edition (multi threaded)
Version 07.20.0.3229-threads as of Aug 27 2018 (17c4ba1d5)
Compiled for Win64 (x86_64-generic-win-64)
Copyright (C) 1998-2018 OpenLink Software
When I used your FILTER query, the query has reduced from 8 seconds to 4 seconds, there’s improvement, but still pretty slow.
Previously I only have 4 indexes, after adding my own then it become 5 indexes, the first one was mine. So it should have 5 indexes by default ?
Your screenshot is missing the PRIMARY KEY INDEX on the table itself which is the main PSOG index, and visible if you view the RDF_QUAD table Definition in the conductor and select the Show SQL option to see the DDL command for the table creation:
Hi, I see the main primary Index now, I have dropped my GOPS index.
EXPLAIN for fast query
EXPLAIN for slow query (3 to 4 seconds) with ?catAtt qq:catId 1;
Btw, now my initial query by specifying the value for ?catId 1 that took 8 seconds now only 3 seconds. Which become slightly faster than using FILTER, it’s strange. I know first time query is always slower than the second time, all the time I tested the query multiple times.
I used the “statistic” tab to count the triplet in “graph” that was imported from MySQL D2RQ; here it is, I deleted the graph URL because this forum does not allow me to post more than 2 links.
Can I ask a separate question, is triplestore suitable to store day to day programming data in RDF form? I have a lot of linked data which is suitable for triplestore, but what about application data such as user login info, customer, orders are they suitable to be stored in triplestore? I would prefer that I could so that I only have to deal with one type of database to reduce complexity but I been reading a lot saying that triplestore is only suitable for linked data and inferencing. Because for the same query triplestore seems to be slower than regular RDBM. This is discussed here https://stackoverflow.com/questions/2501892/are-rdf-triple-stores-suitable-for-everyday-programming But it’s 10 years, ago, I don’t know if things has changed since.
I know Virtuoso is multi-model that support RDBMS, but I prefer to store in triplestore so that I only maintain one system and talk in one language, SPARQL
Let me know if it’s better to start a new thread for this.
Hi, I have a new finding for the slow query with literal value specified, when that ?catAttName is removed from the query, the query is almost instant fast! ?catAttName is a string field, why querying string field slow it down?
EXPLAIN output (combine the URL below), it’s kind of annoying that I cannot post more than 2 links, then now i cannot post this pastebin domain, so just delete the space in between.
UPDATE: When I import the same RDF into GraphDB by Ontotext, the same query return instantly, took just 0.1 seconds, clearly something is not right with Virtuoso.
That is an interesting observation, as it would appear the query is for some reason generating a bad query plan, in some way related to the inclusion of the ?catAtt qq:catId 1; and qq:catAttName ?catAttName; triple patterns in the query.
We are analysing the query plans provided to try and determine the cause …
Hi, I suppose this won’t be fixed anytime soon right? I am guessing it will take months/year before a new version released? I would have to use other triplestore like GraphDb or Blazegraph then.
Hi,
Please make explain of the slow query and send us the report so we can see what happens. The sparql endpoint have a checkbox to make explain, this results in a report shows:
The pastebin link do not working for me, gives me timeout. Please send us the raw output of explain and profile. The way to do either of them is to use ISQL tool and do:
@anatta – If the community.openlinksw.com server rejects your file attachment for any reason, or it says your inline pasted text is too large, you can email the output (optimally as attached text files) to support@openlinksw.com, referencing this thread, and we’ll connect the dots.
Hi, I have emailed to support@openlinksw.com, your forum doesn’t allow me to include more than 2 links. As there are many graph links in the EXPLAIN.
After many days I didn’t use Virtuoso, somehow I cannot start the service anymore, can help me on this? I can reinstall but will it deleted my config and database? So I can only provide the EXPLAIN query.
In what way is the Virtuoso service not starting? What is in the virtuoso.log file from the start attempts?
Might the machine have been shutdown unexpectedly, leaving the virtuoso.lck in the database directory, which will then prevent it from starting ?
You can reinstall and the database file (virtuoso.db) should remain in place (but best make a copy of it), as it should not be uninstalled or overwritten as the installer does not install a default database, but reinstallation should NOT be required …
Hi, indeed that virtuoso.lck was the issue, deleting it, solve it.
I don’t have the option to upload file; the only option that I have for upload is the image. If I tried to upload file via that “Sorry, the file you are trying to upload is not authorized (authorized extensions: jpg, jpeg, png, gif, heic, heif).”
Could it be you have more privilege than me?
This is what I saw. If I directly copy paste a file, it will also show the error like above.