Querying on local endpoint through Python is slow

I have set up a local SPARQL endpoint with the DBpedia dataset using OpenLink Virtuoso, following this guide. Then I tried to query it from Python with the help of RDFLib and SPARQLWrapper.

The problem is that a query sent through Python takes a long time: usually 2 to 3 seconds before I get a result back. But when I query the endpoint directly from my browser (opening localhost in Chrome), I get the result for the same type of query instantly.

I don’t think the problem is my Python code: if I keep the same code and only switch to the public DBpedia endpoint, I get a query result within 0.1 to 0.2 seconds. My database file is around 6 GB, and I have configured the virtuoso.ini file to use more memory as instructed.

Here’s what my query looks like (public DBpedia endpoint, or local endpoint queried from Chrome: almost instant result; local endpoint through Python: 2+ seconds):

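```python
from SPARQLWrapper import SPARQLWrapper, CSV

sparql = SPARQLWrapper('http://localhost:8890/sparql')
sparql.setQuery('''
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?id
    WHERE {
      ?linkto dbo:wikiPageID ?id .
      ?origin dbo:wikiPageWikiLink ?linkto .
      ?origin dbo:wikiPageID 9186 .
    }
''')
sparql.setReturnFormat(CSV)
# For CSV results, convert() returns the raw response bytes, so decode them as UTF-8
qres = sparql.query().convert().decode('utf-8')
```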

I printed the runtime of several hundred queries against the local endpoint, and every single one took between 2.01 and 2.05 seconds; not one was below 2 seconds. So I suspect there is a fixed 2-second delay somewhere along the pipeline, and the actual query only takes 10 to 50 ms to complete.
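To check whether every run shares a fixed floor, one option is to time the same call many times and look at the spread rather than at a single number. Below is a minimal sketch; `time_repeated` is a hypothetical helper, and in practice `fn` would wrap something like `sparql.query().convert()`. If the minimum is already around 2 s while the spread between minimum and maximum is tiny, the cost is most likely a constant overhead outside the query itself. The first run is reported separately because it may hit a cold cache.

```python
import statistics
import time
from typing import Callable, Dict


def time_repeated(fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call fn() `runs` times and summarise the wall-clock durations in seconds.

    The first call is kept separately because it may run against a cold cache,
    while later calls may be answered from cache.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "first": durations[0],
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; with SPARQLWrapper this could be
    # lambda: sparql.query().convert()
    summary = time_repeated(lambda: sum(range(10_000)), runs=50)
    for name, seconds in summary.items():
        print(f"{name:>6}: {seconds * 1000:.2f} ms")
```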

Any help would be appreciated!

@mentats: If querying your Virtuoso SPARQL endpoint directly gives the expected response time, that would indicate the problem is not with the Virtuoso instance. You should, however, check the time of the first execution of a query: subsequent runs may be answered from cache and appear instantaneous, because the query does not have to be re-executed.

What does the Virtuoso “status();” command report about the state of your Virtuoso instance? And what are your virtuoso.ini settings, system memory, and number of CPUs?

You could also log the elapsed time around each Python call to see where the time is being spent, i.e. something like:

```python
from SPARQLWrapper import SPARQLWrapper, CSV
import time

# Each printout is cumulative: seconds elapsed since start_time
start_time = time.time()

sparql = SPARQLWrapper('http://localhost:8890/sparql')
print("--- SPARQLWrapper %s seconds ---" % (time.time() - start_time))
sparql.setQuery('''
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?id
    WHERE {
      ?linkto dbo:wikiPageID ?id .
      ?origin dbo:wikiPageWikiLink ?linkto .
      ?origin dbo:wikiPageID 9186 .
    }
''')
print("--- setQuery %s seconds ---" % (time.time() - start_time))
sparql.setReturnFormat(CSV)
print("--- setReturnFormat %s seconds ---" % (time.time() - start_time))
qres = sparql.query().convert().decode('utf-8')
print("--- Query + decode %s seconds ---" % (time.time() - start_time))
print(qres)
print("--- Print result %s seconds ---" % (time.time() - start_time))
```
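The printouts in that script are cumulative, so each phase’s cost has to be found by subtracting consecutive lines. A variant that records each phase’s own duration might look like the sketch below; `phase` is a hypothetical helper name, and it uses `time.perf_counter()`, a monotonic clock better suited to measuring intervals than `time.time()`. Wrapping `sparql.query()` and `.convert()` in separate phases is the useful split here: `query()` sends the HTTP request (including connection setup), so if the ~2 s lands there rather than in `convert()`, that suggests the delay is in reaching the server, not in Virtuoso or in parsing the result.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def phase(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record how long the enclosed block takes, in seconds, under `label`.

    The duration is stored even if the block raises an exception.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    # Stand-in work; with SPARQLWrapper the phases would wrap
    # SPARQLWrapper(...), sparql.query(), and .convert() separately.
    with phase("setup", timings):
        data = list(range(100_000))
    with phase("work", timings):
        total = sum(data)
    for label, seconds in timings.items():
        print(f"--- {label}: {seconds:.4f} seconds ---")
```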

Hi, thanks for the reply. I have actually found the cause of my issue: it’s some weird behavior with DNS resolution. Changing the endpoint URL from localhost to 127.0.0.1 makes the query instantaneous now.
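The thread only describes the cause as odd DNS behavior. A frequent explanation for this exact symptom (an assumption here, not confirmed in the thread) is that `localhost` resolves to the IPv6 loopback `::1` before `127.0.0.1`. If Virtuoso listens only on IPv4, the client tries `::1` first and falls back to IPv4 only after that attempt fails, which on some systems (reportedly Windows in particular) costs about 2 seconds per connection. The sketch below prints the resolution order and rewrites the endpoint URL; `resolved_addresses` and `with_ipv4_loopback` are hypothetical helper names.

```python
import socket
from typing import List
from urllib.parse import urlsplit, urlunsplit


def resolved_addresses(host: str, port: int) -> List[str]:
    """Return the unique addresses getaddrinfo yields for host:port, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = sockaddr[0]  # first element is the IP for both IPv4 and IPv6
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def with_ipv4_loopback(url: str) -> str:
    """Replace a 'localhost' host in url with 127.0.0.1, keeping port and path.

    Assumes the URL has no user:password@ part (it would be dropped).
    """
    parts = urlsplit(url)
    if parts.hostname != "localhost":
        return url
    netloc = "127.0.0.1" if parts.port is None else f"127.0.0.1:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # If '::1' is listed before '127.0.0.1', clients may try IPv6 first.
    print(resolved_addresses("localhost", 8890))
    print(with_ipv4_loopback("http://localhost:8890/sparql"))
    # -> http://127.0.0.1:8890/sparql
```

The rewritten URL would then be passed to `SPARQLWrapper(...)` in place of the `localhost` one.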