FCT aggregates and filters

It may be a good idea to kill the process with signal 11 (kill -11) to get a core dump,
or to attach with gdb and run thread apply all where.
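Both suggestions can be scripted. Below is a minimal Python sketch. It assumes the pid of the Virtuoso server process is already known and that gdb is installed. The helper names are hypothetical.

```python
import os
import signal
import subprocess


def core_dump(pid: int) -> None:
    """Equivalent of `kill -11 <pid>`: signal 11 is SIGSEGV on Linux.
    A core file is written only if core dumps are enabled (e.g. ulimit -c)."""
    os.kill(pid, signal.SIGSEGV)


def gdb_all_threads_cmd(pid: int) -> list:
    """Non-interactive gdb invocation that attaches to `pid`, prints a
    backtrace for every thread, then exits (detaching from the process)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all where"]


def dump_thread_stacks(pid: int) -> str:
    # Needs gdb on PATH and permission to ptrace the server process.
    result = subprocess.run(
        gdb_all_threads_cmd(pid), capture_output=True, text=True, check=False
    )
    return result.stdout
```

The gdb route leaves the server running, so it is the gentler option on a live instance. The kill -11 route terminates the process.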

Got it, thanks. I will try this to help with troubleshooting.

@imitko

The LOD instance does not show symptoms of the filter bug. Is it possible to send the FCT VAD running at LOD? The filter bug is best exemplified in these two queries:

DBPedia (this instance is affected)
LOD (this instance is not affected)

Here are the two responses:

DBPedia Response (malformed)

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 10 Mar 2019 03:31:12 GMT
Content-Type: text/xml; charset=UTF-8
Connection: close
X-Powered-By: Express
access-control-allow-origin: *
access-control-allow-headers: Depth,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
vary: Accept-Encoding
access-control-allow-credentials: true
access-control-allow-methods: GET, POST, OPTIONS
Content-Length: 7752

 filter (?s1 = <http://dbpedia.org/resource/Bay_of_Kotor>) .<fct:facets xmlns:fct="http://openlinksw.com/services/facets/1.0/">
<fct:sparql>     select ?s1p as ?c1 count (*) as ?c2  where  {?s1 a &lt;http://dbpedia.org/class/yago/BodyOfWater109225146&gt; . quad map virtrdf:DefaultQuadMap { graph ?g {  ?s1 ?s1textp ?o1 . ?o1 bif:contains  &#39;&quot;sea&quot;&#39;  . } }  ?s1 &lt;http://dbpedia.org/property/year&gt; ?s2 . ?s1 ?s1p ?s1o . } group by ?s1p order by desc 2 limit 30  offset 0 </fct:sparql>
<fct:time>955</fct:time>
<fct:complete>yes</fct:complete>
<fct:timeout>13560</fct:timeout>

LOD Response (well-formed)

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 10 Mar 2019 03:31:38 GMT
Content-Type: text/xml; charset=UTF-8
Content-Length: 8113
Connection: close
X-Powered-By: Express
access-control-allow-origin: *
access-control-allow-headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
accept-ranges: bytes
access-control-allow-credentials: true
access-control-allow-methods: GET, POST, OPTIONS

<fct:facets xmlns:fct="http://openlinksw.com/services/facets/1.0/">
<fct:sparql>     select ?s1p as ?c1 count (*) as ?c2  where  {?s1 a &lt;http://dbpedia.org/class/yago/BodyOfWater109225146&gt; .  quad map virtrdf:DefaultQuadMap { graph ?g {  ?s1 ?s1textp ?o1 . ?o1 bif:contains  &#39;&quot;sea&quot;&#39;  . }}  quad map virtrdf:DefaultQuadMap { ?s1 &lt;http://dbpedia.org/property/year&gt; ?s2 . } filter (?s1 = &lt;http://dbpedia.org/resource/Bay_of_Kotor&gt;) . ?s1 ?s1p ?s1o . } group by ?s1p order by desc 2 limit 30  offset 0 </fct:sparql>
<fct:time>96</fct:time>
<fct:complete>yes</fct:complete>
<fct:timeout>13560</fct:timeout>
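The visible symptom is the text that precedes <fct:facets> in the DBPedia response. Below is a minimal Python check for that symptom, built on the standard-library XML parser. It assumes you have the complete response body, not the truncated excerpts shown here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ROOT_TAG = "<fct:facets"


def stray_prefix(body: str) -> str:
    """Text emitted before the <fct:facets> root element, stripped.
    Searching for the root tag (not the first "<") matters here, because
    the leaked fragment itself contains an IRI in angle brackets."""
    start = body.find(ROOT_TAG)
    return body.strip() if start == -1 else body[:start].strip()


def is_well_formed(body: str) -> bool:
    """True if the whole body parses as a single XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True
```

An affected instance yields a non-empty stray_prefix and is_well_formed returns False. A healthy one yields an empty prefix.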

The LOD instance now also exhibits the filter bug symptoms. This query returns malformed XML:

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 12 Mar 2019 21:26:46 GMT
Content-Type: text/xml; charset=UTF-8
Connection: close
X-Powered-By: Express
access-control-allow-origin: *
access-control-allow-headers: Depth,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
vary: Accept-Encoding
access-control-allow-credentials: true
access-control-allow-methods: GET, POST, OPTIONS
Content-Length: 3412

 filter (?s5 = <http://dbpedia.org/resource/Orson_Welles>) .<fct:facets xmlns:fct="http://openlinksw.com/services/facets/1.0/">
<fct:sparql>     select ?s1 as ?c1 count (*) as ?c2 where { select distinct ?s1  {?s1 a &lt;http://dbpedia.org/class/yago/WikicatBlack-and-whiteFilms&gt; .?s1 a &lt;http://dbpedia.org/class/yago/WikicatDystopianFilms&gt; . ?s1 &lt;http://dbpedia.org/ontology/producer&gt; ?s2 . ?s1 &lt;http://dbpedia.org/ontology/distributor&gt; ?s3 . ?s1 &lt;http://dbpedia.org/ontology/director&gt; ?s4 . ?s1 &lt;http://dbpedia.org/ontology/starring&gt; ?s5 . } } group by ?s1 order by desc 2 limit 15  offset 0 </fct:sparql>
<fct:time>77</fct:time>

Note, this was not the case a couple of weeks ago.

The SPARQL query below does not return the expected results. The subject ?s1 should have “STAR WARS” in its rdfs:label, but some of the results do not:

select distinct ?s1 as ?c1 ?o1 ?g where  { 

quad map virtrdf:DefaultQuadMap { 
  graph ?g {  
  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . 
  ?o1 bif:contains  '(STAR AND WARS)'  . 
}}  

quad map virtrdf:DefaultQuadMap { 
  ?s1 <http://purl.org/dc/terms/subject> ?s2 . 
} 

optional { graph virtrdf:IRI_Rank_c {  
  ?s1 virtrdf:IRI_Rank_rnk_c_int ?srank1 . 
} }  

quad map virtrdf:DefaultQuadMap { ?s1 <http://purl.org/dc/terms/subject> ?s3 . } 

filter (?s3 = <http://dbpedia.org/resource/Category:Game_Boy_Advance_games>) . 

} 

order by desc (?srank1) 
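The offending rows can be picked out mechanically. Below is a minimal Python sketch. It assumes the query is run with JSON output requested, in the standard SPARQL 1.1 JSON results format. The function name is hypothetical. Its word-based, case-insensitive match only approximates bif:contains.

```python
import re


def rows_missing_terms(results: dict, var: str = "o1", terms=("STAR", "WARS")) -> list:
    """Return result rows whose `var` binding lacks any of `terms`.

    `results` is a SPARQL 1.1 JSON results document
    (application/sparql-results+json)."""
    wanted = {t.upper() for t in terms}
    offending = []
    for row in results["results"]["bindings"]:
        label = row.get(var, {}).get("value", "")
        words = set(re.findall(r"\w+", label.upper()))
        if not wanted <= words:  # some required word is absent
            offending.append(row)
    return offending
```

An empty return value means every returned ?o1 label contains both words, which is what the bif:contains pattern asks for.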

@imitko Hi, are there any updates or fixes for this issue?

Please include URLs for your problematic instance, and one of our live instances (URIBurner or LOD Cloud Cache or DBpedia) to help diagnose this matter.

Here is a test query:

select distinct ?s1 ?o1 ?g where  { 

quad map virtrdf:DefaultQuadMap { 
  graph ?g {  
  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . 
  ?o1 bif:contains  '(STAR AND WARS)'  . 
}}  

quad map virtrdf:DefaultQuadMap { 
  ?s1 <http://purl.org/dc/terms/subject> ?s2 . 
} 

optional { graph virtrdf:IRI_Rank_c {  
  ?s1 virtrdf:IRI_Rank_rnk_c_int ?srank1 . 
} }  

quad map virtrdf:DefaultQuadMap { ?s1 <http://purl.org/dc/terms/subject> ?s3 . } 

filter (?s3 = <http://dbpedia.org/resource/Category:Game_Boy_Advance_games>) . 
filter (lang(?o1) = "en")
filter (contains(str(?o1),"Star"))
filter (contains(str(?o1),"Wars"))

} 

Current Results:

  1. LOD Cloud Cache, which does return matches

Here is the label bug example in LOD Cloud Cache. [Note: neither LOD nor DBpedia seems to produce this SPARQL anymore.] It asks only for results where STAR+WARS appears in the rdfs:label, but in some of the results STAR+WARS is missing from all of the item’s labels.

I can no longer reproduce the filter bug in the LOD Cloud Cache instance. The issue is still present in the DBPedia, URIBurner, and VIOS dataspaces, and in general in every /fct instance I’ve tried other than LOD Cloud Cache. It manifests only in the /fct/service output, so the only way to reproduce it is to inspect the server’s response to a /fct/service request. Here is a comparison:

This DBPedia request should only produce a single result, but instead returns multiple items:

POST /fct/service HTTP/1.1
Accept: text/xml
Content-type: text/xml
cache-control: no-cache
Postman-Token: 541ee03e-f39c-4e75-abd7-f075bcf875e6
User-Agent: PostmanRuntime/7.6.0
Host: dbpedia.org
Accept-Encoding: gzip, deflate
Content-Length: 385
Connection: close

<query label="Record" timeout="13560"><class iri="http://dbpedia.org/class/yago/BodyOfWater109225146" label="Body Of Water"></class><text label="sea">sea</text><property iri="http://dbpedia.org/property/year" label="year"></property><value label="Bay of Kotor" datatype="uri">http://dbpedia.org/resource/Bay_of_Kotor</value><view limit="15" type="list-count" offset="0"></view></query>

This is the same request sent to LOD; it correctly produces 1 result:

POST /fct/service HTTP/1.1
Accept: text/xml
Content-type: text/xml
cache-control: no-cache
Postman-Token: 97c9c6d2-d178-43f0-8d71-9396b0fcb338
User-Agent: PostmanRuntime/7.6.0
Host: lod.openlinksw.com
Accept-Encoding: gzip, deflate
Content-Length: 385
Connection: close

<query label="Record" timeout="13560"><class iri="http://dbpedia.org/class/yago/BodyOfWater109225146" label="Body Of Water"></class><text label="sea">sea</text><property iri="http://dbpedia.org/property/year" label="year"></property><value label="Bay of Kotor" datatype="uri">http://dbpedia.org/resource/Bay_of_Kotor</value><view limit="15" type="list-count" offset="0"></view></query>
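The same comparison can be scripted instead of run in Postman. Below is a minimal Python sketch using only the standard library. The helper names are hypothetical. The plain http:// scheme mirrors the requests above; switch to https:// if the server redirects, because urllib turns a redirected POST into a GET without a body.

```python
import urllib.request

# The request body used in both examples above.
FCT_QUERY = (
    '<query label="Record" timeout="13560">'
    '<class iri="http://dbpedia.org/class/yago/BodyOfWater109225146" label="Body Of Water"></class>'
    '<text label="sea">sea</text>'
    '<property iri="http://dbpedia.org/property/year" label="year"></property>'
    '<value label="Bay of Kotor" datatype="uri">http://dbpedia.org/resource/Bay_of_Kotor</value>'
    '<view limit="15" type="list-count" offset="0"></view>'
    '</query>'
)


def fct_request(host: str, query_xml: str) -> urllib.request.Request:
    """Build the POST /fct/service request shown above for `host`."""
    return urllib.request.Request(
        url=f"http://{host}/fct/service",
        data=query_xml.encode("utf-8"),
        headers={"Accept": "text/xml", "Content-Type": "text/xml"},
        method="POST",
    )


def fetch(host: str, query_xml: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(fct_request(host, query_xml), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

To reproduce the comparison, call fetch("dbpedia.org", FCT_QUERY) and fetch("lod.openlinksw.com", FCT_QUERY) and diff the two bodies.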

You can use the permalink feature to share the issue report using a Faceted Browser Services (FCT) URL.

It is much easier to diagnose these matters starting from FCT or SPARQL URLs.

I am not able to reproduce the issue at /fct or /sparql endpoints. I encounter it with /fct/service endpoint only.

I wrote

The stray filter (?s1 = <http://dbpedia.org/resource/Bay_of_Kotor>) . means that some code responsible for composing the query omits the session argument to http(), http_value(), or a similar BIF; as a result, the string is not appended to the text of the query but is sent to the client.

I could cheat and add aliases for http() etc. that make the session argument mandatory.

But now I see different /fct/service outputs in Sherman’s description of the FCT problem, so this comment of mine is now totally invalid. I had seen garbage before <query label="Record" there, but now the browser shows an empty line, not garbage.

@imikhailov

This one (the filter bug) is different from the other one (the count bug). I am now working on feedback for your post here.

@imikhailov

I see many references to http(string, string), for example. When you say “misses session as an argument to http()”, which of these two arguments is the “session”?

I see, for example, that at line 1043, where the <value> element is processed, the http() call is missing the second argument:

http (sprintf (' filter (%s %s %s) .', t_s, op, val));  

This seems to be the line that needs fixing.

Can you also explain your proposed fix? Is there a deeper design issue that needs to be addressed?

“Session” (or “stream”) is the second, optional argument of http(), http_value(), etc., as described in the function documentation. If it is present but is the integer zero, it behaves as if it were missing. This can mask errors, because

declare ses any;
...
http (something, ses);

seems to be safe, but if ses is not initialized before the http() call, it is equal to zero, so the code writes to the web output anyway.
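A small Python analogy makes the failure mode concrete. This is not Virtuoso code. The http() below is a hypothetical stand-in that mimics only the behavior just described: an omitted or integer-zero session sends the text to the client.

```python
import io


def http(text: str, ses=0, *, client: io.StringIO) -> None:
    """Analogy only: write `text` to `ses`, unless `ses` is omitted,
    None, or the integer 0, in which case it goes to the client output."""
    if ses is None or (isinstance(ses, int) and ses == 0):
        client.write(text)
    else:
        ses.write(text)


def compose(pass_session: bool, placeholder=None):
    """Build a tiny query piece by piece, then emit the XML reply.

    placeholder=None models the forgotten argument; placeholder=0 models
    an uninitialized session variable. Returns (query_text, client_body)."""
    client, txt = io.StringIO(), io.StringIO()
    http("select ?s1p where { ?s1 ?s1p ?s1o .", txt, client=client)
    ses = txt if pass_session else placeholder
    http(" filter (?s1 = <http://dbpedia.org/resource/Bay_of_Kotor>) .", ses, client=client)
    http(" }", txt, client=client)
    # The XML reply itself is meant to go to the client.
    http('<fct:facets xmlns:fct="http://openlinksw.com/services/facets/1.0/"/>', client=client)
    return txt.getvalue(), client.getvalue()
```

With pass_session=False, the client body starts with " filter (?s1 = <...>) .<fct:facets". That is the same shape as the malformed DBPedia response. The query text also loses its FILTER, which is consistent with the DBPedia request returning multiple items instead of one.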

That is clear, thank you. Why is there no second argument in the call at line 1043? I would expect it to be:

  http (sprintf (' filter (%s %s %s) .', t_s, op, val), txt);

As for the fix, instead of the alias, we should comb through facet.sql and fix every case where the second argument is missing or is not initialized. Otherwise, if your alias were ever removed for some reason, the bug would reappear.

It would be nice if http() threw an error when the second argument is null, so that the problem could be caught during testing.
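Part of combing facet.sql for the missing-argument case can be automated. Below is a rough Python sketch that finds http(...) calls with only one top-level argument. It is a heuristic, not a parser. It ignores comments and does not track whether a session variable was initialized. It checks only http(); confirm the session position for http_value() and similar BIFs against their documentation before adding them. The function names are hypothetical.

```python
import re

# Matches "http (" / "http(" but not http_value, http_url, http2 or "http://".
HTTP_CALL = re.compile(r"\bhttp\s*\(")


def top_level_arg_count(src: str, open_paren: int) -> int:
    """Count the top-level arguments of the call whose "(" is at open_paren.
    Single-quoted SQL strings (with '' escapes) are skipped."""
    depth, commas, in_str = 0, 0, False
    i = open_paren
    while i < len(src):
        ch = src[i]
        if in_str:
            if ch == "'":
                if src.startswith("''", i):  # escaped quote inside a string
                    i += 2
                    continue
                in_str = False
        elif ch == "'":
            in_str = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                inner = src[open_paren + 1:i]
                return commas + 1 if inner.strip() else 0
        elif ch == "," and depth == 1:
            commas += 1
        i += 1
    raise ValueError("unbalanced parentheses")


def sessionless_http_calls(src: str) -> list:
    """1-based line numbers of http(...) calls that pass no session argument."""
    hits = []
    for m in HTTP_CALL.finditer(src):
        if top_level_arg_count(src, m.end() - 1) < 2:
            hits.append(src.count("\n", 0, m.start()) + 1)
    return hits
```

Each reported line still needs a manual look, because some single-argument http() calls are meant to write to the client.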

I guess there’s just a forgotten argument there, nothing more.
For testing purposes, one can

create procedure http2 (in v any, in ses any) { http (v,ses); }

and temporarily replace http() with http2() in all files that don’t have to write to web output. Ditto http_value2, etc. This will result in visible run-time errors.
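The temporary swap can be done mechanically. Below is a minimal Python sketch. It assumes wrappers named http2 and http_value2 have been created like http2 above; http_value2 is hypothetical. Do not run it over the file that defines the wrappers, or their bodies would end up calling themselves.

```python
import re

# Hypothetical checked wrappers: each must be defined so that its session
# argument is mandatory, like http2 above.
WRAPPERS = {"http": "http2", "http_value": "http_value2"}
CALL = re.compile(r"\b(http_value|http)(\s*\()")


def use_checked_wrappers(src: str) -> str:
    """Rename http(...) / http_value(...) calls to their checked wrappers."""
    return CALL.sub(lambda m: WRAPPERS[m.group(1)] + m.group(2), src)
```

The pattern does not match http2, http_url, or http:// URLs, so running it twice over the same file is harmless.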

@imikhailov @imitko

A pull request containing the fix has been submitted on GitHub.

To test the fix, deploy the code, submit an FCT request containing a <value> element, and check that the server’s response XML is not malformed and that the returned SPARQL contains the FILTER corresponding to the <value> element. An example test request (for dbpedia.org) would be:

POST /fct/service HTTP/1.1
Accept: text/xml
Content-type: text/xml
cache-control: no-cache
Postman-Token: 541ee03e-f39c-4e75-abd7-f075bcf875e6
User-Agent: PostmanRuntime/7.6.0
Host: dbpedia.org
Accept-Encoding: gzip, deflate
Content-Length: 385
Connection: close

<query label="Record" timeout="13560"><class iri="http://dbpedia.org/class/yago/BodyOfWater109225146" label="Body Of Water"></class><text label="sea">sea</text><property iri="http://dbpedia.org/property/year" label="year"></property><value label="Bay of Kotor" datatype="uri">http://dbpedia.org/resource/Bay_of_Kotor</value><view limit="15" type="list-count" offset="0"></view></query>

Here is the cURL command:

curl -X POST \
  http://dbpedia.org/fct/service \
  -H 'Accept: text/xml' \
  -H 'Content-Type: text/xml' \
  -H 'Postman-Token: eccb8b64-0fd6-49ac-8109-34b40bbcf184' \
  -H 'cache-control: no-cache' \
  -d '<query label="Record" timeout="13560"><class iri="http://dbpedia.org/class/yago/BodyOfWater109225146" label="Body Of Water"></class><text label="sea">sea</text><property iri="http://dbpedia.org/property/year" label="year"></property><value label="Bay of Kotor" datatype="uri">http://dbpedia.org/resource/Bay_of_Kotor</value><view limit="15" type="list-count" offset="0"></view></query>'
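Both checks in the test procedure can be automated. Below is a minimal Python sketch; check_fix is a hypothetical name. It assumes the fct:sparql element carries the generated query, as in the responses shown earlier.

```python
import xml.etree.ElementTree as ET

FCT_NS = {"fct": "http://openlinksw.com/services/facets/1.0/"}


def check_fix(body: str, value_iri: str) -> bool:
    """True if the /fct/service body parses as XML and its fct:sparql
    text contains a FILTER on the IRI from the <value> element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # still malformed, e.g. a stray filter before the root
    sparql = root.findtext("fct:sparql", default="", namespaces=FCT_NS)
    # ElementTree unescapes &lt; and &gt;, so the IRI appears as <...>.
    return "filter (" in sparql and f"<{value_iri}>" in sparql
```

For the request above, the IRI to pass is http://dbpedia.org/resource/Bay_of_Kotor.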

To make this easy for others to follow, you can grab a live, cURL command-based example via the copy-and-paste-friendly inspector feature of your browser.

Gotcha. Post updated.

Wonderful!


Output snippet screenshot