Technology Update -- Virtuoso Anytime Query Functionality for Query Scalability

Technology Update - Virtuoso Anytime Query Functionality

Copyright © 2022 OpenLink Software

Introduction

Anytime Query” is a core feature of Virtuoso that enables it handle challenges inherent in providing a high-performance and accessible interface (public e.g., Web or private e.g., internal intranet) for ad-hoc querying at scale. This extension allows an SPARQL- and HTTP-protocol based application or service to issue queries irrespective of query complexity and/or solution size. Fundamentally, it handles query solution production pipelines that would typically result in no solutions due to exceeding configured DBMS query timeouts and/or solution size limits; in addition, this feature enables the use of LIMIT and OFFSET (typically combined with ORDER BY and/or GROUP BY) to create windows (also known as sliding windows or cursors) to iterate through a complete query solution without being adversely affected by insert or delete operations.

Virtuoso Anytime Query extension for SPARQL

Server-side settings

An instance administrator (e.g., a DBA) can impose limits on a Virtuoso SPARQL endpoint, by via the following virtuoso.ini file entries:

[SPARQL]
ResultSetMaxRows      = 60000  ; Limit query results (or solution sizes) to 60000 rows (default is no limit)
MaxQueryExecutionTime = 120    ; Set server-side timeout to 120 seconds (default=0)
ExecutionTimeout      = 30     ; Set client-side timeout to 30 seconds (default=0)
HTTPAnytimeStatus     = 206    ; Preferred HTTP Status code, with the default being 206

When using the Virtuoso 8 Commercial Edition, the ResultSetMaxRows and MaxQueryExecutionTime can also be restricted by using ACLs in the Virtuoso Authentication Layer (VAL).

The ExecutionTimeout setting is used by the SPARQL endpoint in the Execution Timeout field (converted to milliseconds), which the user can override either via the form or a SPARQL URL parameter (i.e., &timeout).

NOTE: If none of these settings are added to the virtuoso.ini, there are no constraints set by Virtuoso. However the amount of memory, proxy timeout settings, etc., can still cause SPARQL queries to fail with various HTTP status codes.

Client-side parameter

HTTP-based client applications such as cURL, Python, Node.js, Javascript libraries, or the Virtuoso SPARQL endpoint itself, can use the &timeout=30000 parameter of a SPARQL URL to control how long it wants to wait for a result (or query solution) to be produced. Virtuoso’s query engine will sanitize the timeout parameter ensuring that its between 0 and MaxQueryExecutionTime (converted to milliseconds).

If a query executes to completion within the configured timeout (which can be unlimited) the HTTP status is set to 200 (OK) and the results will be returned in the requested format.

The following table shows what HTTP status code Virtuoso will return, If either &timeout and/or MaxQueryExecutionTime are set:

Timeout MaxQueryExecutionTime Description HTTP status on exceeding limit
0 0 seconds No limits imposed by the Virtuoso engine n/a
0 > 0 seconds Treat Anytime Query timeout as an error 500 (Server Error)
> 999 0 seconds Client requests optional timeout 206 (Partial Result)
> 999 > 0 seconds Mandatory Anytime Query timeout 206 (Partial Result)

Virtuoso HTTP status codes and response headers

The following HTTP status codes and response headers can be returned by the Virtuoso SPARQL endpoint.

Note: Headers that start with an X- are custom headers that Virtuoso returns.

HTTP status code 200 (OK)

This HTTP status code is returned if a valid query was executed within the allotted time.

Virtuoso returns the following response headers:

Connection: keep-alive
Content-disposition: filename=sparql_2022-10-06_12-00-00Z.html
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Date: Thu, 06 Oct 2022 12:00:00 GMT
Server: Virtuoso/08.03.3326 (Linux) x86_64-generic-linux-glibc212  VDB
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-SPARQL-default-graph: http://dbpedia.org

The response payload is the full result-set (or query solution) returned in the requested format.

HTTP status code 206 (Partial)

This HTTP status code is returned when the query solution production pipeline exceeded either the current Anytime Query timeout setting, or the current ResultSetMaxRows limit.

Some internet specifications such as RFC 2616: Hypertext Transfer Protocol and blogs like HTTP Status: 206 Partial Content and range requests indicate that this status code can only be initiated on the client (request) side.

However RFC 9110: HTTP Semantics from June 2022 which supersedes the previous RFC states:

… a server might want to send only a subset of the data requested for reasons of its own,
such as temporary unavailability, cache efficiency, load balancing, etc. Since a 206 response is
self-descriptive, the client can still understand a response that only partially satisfies its range request.

Our testing confirms that all common browsers, javascript frameworks, NodeJS etc. all work fine with HTTP status code 206 in combination with the response headers as well as payload that Virtuoso returns.

However since we anticipate users might wish to use HTTP status codes 200 or 500 inline with their preferred application’s behavior, we’ve added an INI configuration option, configurable by an instance DBA, along the following lines:

[SPARQL]
HTTPAnytimeStatus = 500     ; Default is 206

Virtuoso returns the following response headers for a partial result:

Accept-Ranges: none
Connection: keep-alive
Content-disposition: filename=sparql_2022-10-06_12-00-00Z.html
Content-Length: 1095
Content-Type: text/html; charset=UTF-8
Date: Thu, 06 Oct 2022 12:00:00 GMT
Server: Virtuoso/08.03.3326 (Linux) x86_64-generic-linux-glibc212  VDB
Strict-Transport-Security: max-age=15768000
X-Exec-DB-Activity: 55.41K rnd  1.383M seq      0 same seg   9.066K same pg  4.708K same par      0 disk      0 spec disk      0B /      0 messages      0 fork
X-Exec-Milliseconds: 2826
X-SPARQL-Anytime: timeout=1000; max_timeout=30000
X-SPARQL-default-graph: http://dbpedia.org
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  55.41K rnd  1.383M seq      0 same seg   9.066K same pg  4.708K same par      0 disk      0 spec disk      0B /      0 m
X-SQL-State: S1TAT

The Accept-Ranges header returns a value of none to make sure a client-side application does not attempt to automatically start requesting byte ranges based on the Content-Length.

The X-SPARQL-Anytime header returns both the current timeout value as well as the MaxQueryExecutionTime value both in milliseconds. It can be used by a client-side application to generate a form where the user can change the timeout and resend the request to the SPARQL endpoint using a new value for the &timeout parameter.

The X-SPARQL-default-graph header returns the SPARQL default graph.

The X-Exec-DB-Activity, X-Exec-Milliseconds, X-SQL-Message and X-SQL-State headers can be used by the DBA or by the OpenLink Support staff to examine some query statistics.

The response payload is the partial result-set returned in the requested format, even if the HTTPAnytimeStatus has been changed from 206 to another http status code.

HTTP status code 400 (Bad Request)

This status code can be returned when the query contains a syntax error.

Virtuoso returns the following response headers:

Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 351
Content-Type: text/plain
Date: Thu, 06 Oct 2022 14:06:08 GMT
Server: Virtuoso/08.03.3326 (Linux) x86_64-generic-linux-glibc212  VDB

The result payload contains a description of the error which the client side application can log or put in a dialog box for the user, e.g.:

Virtuoso 37000 Error SP030: SPARQL compiler, line 5: syntax error at 'string' before '('

SPARQL query:
#output-format:text/html
define sql:signal-unconnected-variables 1
define sql:signal-void-variables 1
define input:default-graph-uri <http://dbpedia.org>
select distinct ?Concept where {[] a ?Concept. filter (string(?Concept) like '%dbpedia%') } LIMIT 100

HTTP status 500 (Server Error)

This status code is be returned when the query hits the AnyTime Query timeout, but the client-side specified it did not want to a partial result by setting the &timeout=0

Virtuoso returns the following response headers:

Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 69
Content-Type: text/plain
Date: Thu, 06 Oct 2022 12:00:00 GMT
Server: Virtuoso/08.03.3326 (Linux) x86_64-generic-linux-glibc212  VDB

The result payload contains a description of the error which the client side application can log or put in a dialog box for the user:

Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout.

HTTP status code 504 (Gateway Timeout)

This status can be returned by reverse proxies such as Nginx, HAProxy or Traefik when the query takes too long to produce results. This can happen when the timeout set by the proxy is smaller than the AnyTime Query timeout.

Virtuoso Anytime Query Functionality & GraphQL Queries

The Virtuoso GraphQL endpoint uses the same MaxQueryExecutionTime, HTTPAnytimeStatus and ResultSetMaxRows settings as the SPARQL endpoint.

It also uses similar HTTP response headers as the ones described for the SPARQL endpoint.

Virtuoso Anytime Query Functionality for ODBC, JDBC, iSQL or Virtuoso PL Clients

Virtuoso also allows applications written in ODBC, JDBC, iSQL, Virtuoso stored procedures (PL) etc., to use the Anytime Timeout extension for both SQL and SPARQL queries.

Since these types of connections are not anonymous like the SPARQL or GraphQL endpoint, there currently is no maximum timeout.

Example using Virtuoso iSQL/PL

Running the following example on dbpedia.org using the Virtuoso isql tool:

$ isql 1111
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 08.03.3318 as of Aug 13 2020
Type HELP; for help and EXIT; to exit.

SQL> set RESULT_TIMEOUT = 1000;     -- timeout in milliseconds

SQL> SPARQL SELECT SAMPLE(?s) AS ?sample COUNT(*) AS ?count ?o 
FROM <http://dbpedia.org>
WHERE { ?s a ?o. }
ORDER BY DESC 2 LIMIT 10;

sample                                                          count  o
LONG VARCHAR                                                    LONG   LONG VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/1979_Gulf_Cup_of_Nations            62342  http://dbpedia.org/ontology/CareerStation
http://dbpedia.org/resource/Category:Ancyloceratoidea           48506  http://www.w3.org/2004/02/skos/core#Concept
http://dbpedia.org/resource/1969_in_baseball                    26261  http://www.w3.org/2002/07/owl#Thing
http://dbpedia.org/resource/501(c)(3)                           10959  http://dbpedia.org/ontology/PersonFunction
http://dbpedia.org/resource/2021_in_spaceflight                 9527   http://dbpedia.org/ontology/TimePeriod
http://dbpedia.org/resource/1969_uprising_in_East_Pakistan      8940   http://dbpedia.org/ontology/Organisation
http://dbpedia.org/resource/2020–21_Glasgow_Warriors_season     8727   http://dbpedia.org/ontology/Agent
http://dbpedia.org/resource/2020–21_Glasgow_Warriors_season     8706   http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent
http://dbpedia.org/resource/2020–21_Glasgow_Warriors_season     8706   http://www.wikidata.org/entity/Q24229398
http://dbpedia.org/resource/2020–21_Glasgow_Warriors_season     8700   http://schema.org/Organization

*** Error S1TAT: VD [Virtuoso Server]RC...: Returning incomplete results, query interrupted by result timeout.  Activity:    561K rnd  561.3K seq      0 same seg   172.5K same pg   1.29K same par      0 disk      0 spec disk      0B /      0 m
in lines 3-6 of Top-Level:
#line 3 "(console)"
SPARQL SELECT SAMPLE(?s) AS ?sample COUNT(*) AS ?count ?o  FROM <http://dbpedia.org> WHERE { ?s a ?o. } ORDER BY DESC 2 LIMIT 10

SQL> set RESULT_TIMEOUT = 0;

This will fetch a number and display a number of rows, until the timeout exceeds and the next fetch results in a SQL state S1TAT.

*** Error S1TAT: VD [Virtuoso Server]RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  616.6K rnd  584.1K seq      0 same seg   184.6K same pg  9.519K same par      0 disk      0 spec disk      0B /      0 m
at line 6 of Top-Level:
SPARQL SELECT SAMPLE(?s) AS ?sample COUNT(*) AS ?count ?o  FROM <http://dbpedia.org> WHERE { ?s a ?o. } ORDER BY DESC 2

This SQL state can be checked by simple application logic using a WHENEVER statement in Virtuoso PL.

Example using SPARQL inside SQL (SPASQL) via iODBC

Using the iODBC iodbctest tool to run the same test:

$ iodbctest DSN=dbpedia
iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.1216.0712
Driver: 08.03.3314 OpenLink Virtuoso ODBC Driver (virtodbc.so)

SQL> set RESULT_TIMEOUT = 1000
Statement executed. 0 rows affected.

SQL> SPARQL SELECT SAMPLE(?s) AS ?sample COUNT(*) AS ?count ?o FROM <http://dbpedia.org> WHERE { ?s a ?o. } ORDER BY DESC 2 limit 10

sample                        |count                         |o
------------------------------+------------------------------+------------------------------
http://dbpedia.org/resource/19|63307                         |http://dbpedia.org/ontology/Ca
http://dbpedia.org/resource/Ca|48506                         |http://www.w3.org/2004/02/skos
http://dbpedia.org/resource/19|26261                         |http://www.w3.org/2002/07/owl#
http://dbpedia.org/resource/50|11087                         |http://dbpedia.org/ontology/Pe
http://dbpedia.org/resource/20|9686                          |http://dbpedia.org/ontology/Ti
http://dbpedia.org/resource/19|8940                          |http://dbpedia.org/ontology/Or
http://dbpedia.org/resource/20|8727                          |http://dbpedia.org/ontology/Ag
http://dbpedia.org/resource/20|8706                          |http://www.ontologydesignpatte
http://dbpedia.org/resource/20|8706                          |http://www.wikidata.org/entity
http://dbpedia.org/resource/20|8700                          |http://schema.org/Organization
1: Fetch = [OpenLink][Virtuoso ODBC Driver][Virtuoso Server]RC...: Returning incomplete results, query interrupted by result timeout.  Activity:    562K rnd  562.3K seq      0 same seg   173.5K same pg  1.293K same par      0 disk      0 spec disk      0B /      0 m (-1) SQLSTATE=S1TAT

 result set 1 returned 10 rows.

SQL> set RESULT_TIMEOUT = 0
Statement executed. 0 rows affected.

If iODBC tracing is enabled, we can see the following in the iODBC trace log:

[000024.502533]
iodbctest       7FA9E8E3A700 ENTER SQLFetchScroll
                SQLHSTMT          0x1b2e280
                SQLUSMALLINT      1 (SQL_FETCH_NEXT)
                SQLLEN            1

[000024.502558]
iodbctest       7FA9E8E3A700 EXIT  SQLFetchScroll with return code -1 (SQL_ERROR)
                SQLHSTMT          0x1b2e280
                SQLUSMALLINT      1 (SQL_FETCH_NEXT)
                SQLLEN            1

[000024.502584]
iodbctest       7FA9E8E3A700 ENTER SQLGetDiagRec
                SQLSMALLINT       3 (SQL_HANDLE_STMT)
                SQLHSTMT          0x1b2e280
                SQLSMALLINT       1
                SQLCHAR         * 0x7ffd7aaf6a80
                SQLINTEGER      * 0x7ffd7aaf6a9c
                SQLCHAR         * 0x7ffd7aaf6880
                SQLSMALLINT       512
                SQLSMALLINT     * 0x0

[000024.502636]
iodbctest       7FA9E8E3A700 EXIT  SQLGetDiagRec with return code 0 (SQL_SUCCESS)
                SQLSMALLINT       3 (SQL_HANDLE_STMT)
                SQLHSTMT          0x1b2e280
                SQLSMALLINT       1
                SQLCHAR         * 0x7ffd7aaf6a80
                                  | S1TAT                                    |
                SQLINTEGER      * 0x7ffd7aaf6a9c (-1)
                SQLCHAR         * 0x7ffd7aaf6880
                                  | [OpenLink][Virtuoso ODBC Driver][Virtuos |
                                  | o Server]RC...: Returning incomplete res |
                                  | ults, query interrupted by result timeou |
                                  | t.  Activity:  1.047M rnd  971.5K seq    |
                                  |    0 same seg   229.9K same pg     23K s |
                                  | ame par      0 disk      0 spec disk     |
                                  |   0B /      0 m                          |
                SQLSMALLINT       512
                SQLSMALLINT     * 0x0

This SQL state can be checked by simple application logic.

See Also