Hi all,
I have run into an issue that seems clearly related to SPARQL query limits, and I am not sure whether it is a bug or a “feature”…
Context
The dataset I am working on is DBnary, available at http://kaiko.getalp.org/, but I access it through a direct isql connection (as the dba user). Reproducing the issue requires loading the full dataset, which is possible but takes several hours; feel free to contact me if you want to do so.
Preamble
I am aware that the public SPARQL endpoint has limits (10000 rows, plus caps on execution time and effort for complex queries), but such limits should not apply when connected through the isql command on the Linux server (authenticated as dba). To illustrate, the query:
SPARQL SELECT * WHERE {?sle ontolex:canonicalForm ?tle} ;
returns 10000 rows when executed on the public SPARQL endpoint (OpenLink Virtuoso SPARQL Query Editor) and 15511921 rows when executed directly on the server with the isql command line. So far, so good!
The problem
I want to execute a huge UPDATE query every time I reload the DBnary data into a fresh server. I do not really care how long it takes, as I run it behind the scenes when publishing a new version of the data.
SPARQL INSERT
  { GRAPH <http://kaiko.getalp.org/dbnary/vartrans> { ?src vartrans:translatableAs ?tgt } }
WHERE {
  { SELECT (SAMPLE(?sle) AS ?src) (SAMPLE(?le) AS ?tgt)
    WHERE {
      ?trans
        a dbnary:Translation ;
        dbnary:isTranslationOf ?sle ;
        dbnary:targetLanguage ?lg ;
        dbnary:writtenForm ?wf .
      ?sle a ontolex:LexicalEntry ;
        lexinfo:partOfSpeech ?pos .
      ?le a ontolex:LexicalEntry ;
        dcterms:language ?lg ;
        rdfs:label ?wf ;
        lexinfo:partOfSpeech ?pos .
      FILTER (REGEX(STR(?le), "^http://kaiko.getalp.org/dbnary/.../[^_]")) .
    } GROUP BY ?trans
    HAVING (COUNT(*) = 1)
  }
};
When I run this query in isql, authenticated as dba, it appears to succeed, but when I count the relations inserted into the target graph, I get only 10001 rows, whereas I should have millions…
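For reference, the number of inserted relations can be checked in isql with a count over the target graph, for example as below. This is a sketch that assumes the vartrans: prefix is declared on the server, as the INSERT query already relies on; ?s, ?o and ?n are arbitrary variable names.

```sparql
SPARQL SELECT (COUNT(*) AS ?n)
WHERE {
  GRAPH <http://kaiko.getalp.org/dbnary/vartrans> { ?s vartrans:translatableAs ?o }
};
```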
My understanding is that the inner query is limited to 10000 rows (or is it 10001?) and that the outer query has no way of knowing that the results are partial.
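One way to probe this hypothesis without writing anything is to count the rows produced by the inner SELECT alone, wrapped in an outer COUNT, in the same isql session. The sketch below reuses the pattern of the INSERT query; the projected variables ?src and ?tgt are new names, chosen so they do not reuse ?sle, which is already bound inside the subquery. A count in the millions would suggest the subquery is not truncated when evaluated this way, while a count near 10000 would point at the subquery itself, though the optimizer may plan a counting query differently from an INSERT, so the result is indicative rather than conclusive.

```sparql
SPARQL SELECT (COUNT(*) AS ?n)
WHERE {
  { SELECT (SAMPLE(?sle) AS ?src) (SAMPLE(?le) AS ?tgt)
    WHERE {
      ?trans
        a dbnary:Translation ;
        dbnary:isTranslationOf ?sle ;
        dbnary:targetLanguage ?lg ;
        dbnary:writtenForm ?wf .
      ?sle a ontolex:LexicalEntry ;
        lexinfo:partOfSpeech ?pos .
      ?le a ontolex:LexicalEntry ;
        dcterms:language ?lg ;
        rdfs:label ?wf ;
        lexinfo:partOfSpeech ?pos .
      FILTER (REGEX(STR(?le), "^http://kaiko.getalp.org/dbnary/.../[^_]")) .
    } GROUP BY ?trans
    HAVING (COUNT(*) = 1)
  }
};
```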
My question is: why is there such a limit when I am connected as dba and want an exhaustive answer, regardless of how long it takes?
Is this a bug or what?
Gilles