FCT returns unintuitive type/property counts

#1

I have this list of 5 individuals:

FCT query

<query timeout="15010">
<text property="http://www.w3.org/2000/01/rdf-schema#label">Star Wars</text>
<class iri="http://dbpedia.org/class/yago/Movie106613686"/>
<property iri="http://purl.org/dc/terms/subject"></property>
<property iri="http://dbpedia.org/ontology/starring"></property>
<property iri="http://dbpedia.org/ontology/director"></property>
<view limit="30" type="list" offset="0"></view>
</query>

SPARQL query from FCT

select distinct ?s1 as ?c1 ?g where  { quad map virtrdf:DefaultQuadMap { graph ?g {  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . ?o1 bif:contains  '(STAR AND WARS)'  . }} ?s1 a <http://dbpedia.org/class/yago/Movie106613686> .  quad map virtrdf:DefaultQuadMap { ?s1 <http://purl.org/dc/terms/subject> ?s2 . } optional { graph virtrdf:IRI_Rank_c {  ?s1 virtrdf:IRI_Rank_rnk_c_int ?srank1 . } }  quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/starring> ?s3 . } optional { graph virtrdf:IRI_Rank_c {  ?s1 virtrdf:IRI_Rank_rnk_c_int ?srank1 . } }  quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/director> ?s4 . } optional { graph virtrdf:IRI_Rank_c {  ?s1 virtrdf:IRI_Rank_rnk_c_int ?srank1 . } }  } order by desc (?srank1)  limit 30  offset 0

I now ask for the classes of those individuals by changing the type attribute on the view element to ‘classes’ rather than ‘list’; everything else stays the same:

FCT query

<query timeout="15010">
<text property="http://www.w3.org/2000/01/rdf-schema#label">Star Wars</text>
<class iri="http://dbpedia.org/class/yago/Movie106613686"/>
<property iri="http://purl.org/dc/terms/subject"></property>
<property iri="http://dbpedia.org/ontology/starring"></property>
<property iri="http://dbpedia.org/ontology/director"></property>
<view limit="30" type="classes" offset="0"></view>
</query>

SPARQL query from FCT

select ?s1c as ?c1 count (*) as ?c2  where  { quad map virtrdf:DefaultQuadMap { graph ?g {  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . ?o1 bif:contains  '(STAR AND WARS)'  . }} ?s1 a <http://dbpedia.org/class/yago/Movie106613686> .  quad map virtrdf:DefaultQuadMap { ?s1 <http://purl.org/dc/terms/subject> ?s2 . } quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/starring> ?s3 . } quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/director> ?s4 . } ?s1 a ?s1c . } group by ?s1c order by desc 2 limit 30  offset 0

Note the wildcard in the count function of the classes query. We would prefer that the wildcard be replaced with distinct ?s1. The user does not need whatever * is counting; the user needs the number of instances of each class. Can you please make this change, so that the number badges on this list make sense? Ditto for the properties count: the user needs the number of objects of each property, not whatever * is counting, i.e. replace the wildcard with a distinct count there too.

The actual vars to count are these:

select ?s1p count (distinct ?s1o)
select ?s1ip count (distinct ?s1) # not ?s1o
select ?s1c count (distinct ?s1)
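
To illustrate why count (*) and count (distinct ?s1) diverge, here is a minimal sketch in Python rather than SPARQL. It assumes a hypothetical two-movie data set and a simplified version of the join (only the subject, starring, director, and type patterns); the names m1, m2, Movie, and Work are invented for the example and do not come from DBpedia.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy triples (subject, predicate, object); not real DBpedia data.
TRIPLES = [
    ("m1", "type", "Movie"), ("m1", "type", "Work"),
    ("m1", "subject", "c1"), ("m1", "subject", "c2"),
    ("m1", "starring", "a1"), ("m1", "starring", "a2"), ("m1", "starring", "a3"),
    ("m1", "director", "d1"),
    ("m2", "type", "Movie"),
    ("m2", "subject", "c1"),
    ("m2", "starring", "a4"), ("m2", "starring", "a5"),
    ("m2", "director", "d2"),
]


def objects(s, p):
    """Return all objects of predicate p for subject s."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]


def class_counts():
    """Group the joined rows by class and count them two ways."""
    star = defaultdict(int)      # analogue of count (*)
    distinct = defaultdict(set)  # analogue of count (distinct ?s1)
    for s in sorted({t[0] for t in TRIPLES}):
        # Each combination of subject/starring/director/type values is one joined row.
        rows = product(objects(s, "subject"), objects(s, "starring"),
                       objects(s, "director"), objects(s, "type"))
        for (_s2, _s3, _s4, cls) in rows:
            star[cls] += 1
            distinct[cls].add(s)
    return dict(star), {c: len(v) for c, v in distinct.items()}


star_counts, distinct_counts = class_counts()
print(star_counts)      # row counts, inflated by the join
print(distinct_counts)  # number of individuals per class
```

In this toy data, m1 alone contributes 2 x 3 x 1 x 2 = 12 joined rows, so the wildcard count reflects how many subject/starring/director combinations exist, not how many individuals belong to each class.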

@kidehen @hwilliams

#2

This is a little confusing. Please share the permalink of the FCT page in question, or the sequence of pages leading to the problem as individual permalinks.

Then I or others can just click to see what you mean.

#3

So I assume you seek:

select ?s1p as ?c1 count (distinct ?s1) as ?c2  
where  
	{ quad map virtrdf:DefaultQuadMap 
		{ graph ?g 
			{  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . ?o1 bif:contains  '(STAR AND WARS)'  . }
		} 
	 ?s1 a <http://dbpedia.org/class/yago/Movie106613686> .  
	 
	 quad map virtrdf:DefaultQuadMap 
	 	{ ?s1 <http://purl.org/dc/terms/subject> ?s2 . } 
	quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/starring> ?s3 . } 
	quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/director> ?s4 . } 
	
	?s1 ?s1p ?s1o . 
  } 
group by ?s1p order by desc 2 limit 30  offset 0

rather than

select ?s1p as ?c1 count (*) as ?c2  
where  
	{ quad map virtrdf:DefaultQuadMap 
		{ graph ?g 
			{  ?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?o1 . ?o1 bif:contains  '(STAR AND WARS)'  . }
		} 
	 ?s1 a <http://dbpedia.org/class/yago/Movie106613686> .  
	 
	 quad map virtrdf:DefaultQuadMap 
	 	{ ?s1 <http://purl.org/dc/terms/subject> ?s2 . } 
	quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/starring> ?s3 . } 
	quad map virtrdf:DefaultQuadMap { ?s1 <http://dbpedia.org/ontology/director> ?s4 . } 
	
	?s1 ?s1p ?s1o . 
  } 
group by ?s1p order by desc 2 limit 30  offset 0

/cc @hwilliams @pvk @imitko

#4

FCT permalinks en route; waiting for the LOD server to reboot.

#5

So I assume you seek:

Yes, exactly. And the same for property and property-of query too.

#6

Just change the host from lod.openlinksw.com to dbpedia.org in your SPARQL or FCT URI, and you should be fine. LOD has more data, but DBpedia is ample for diagnosing issues. It should also be the default rather than LOD, which is suffering from issues that we are working on.
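
A minimal sketch of the host swap in Python, assuming the permalink is a plain URL whose host part is lod.openlinksw.com. The helper name swap_host and the example URL are hypothetical; only the two host names come from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, new_host: str = "dbpedia.org") -> str:
    """Return url with its host replaced; path, query, and fragment are kept.

    Note: this overwrites the whole netloc, so any port or user info
    in the original URL is dropped as well.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))


# Illustrative URL only; substitute the real SPARQL or FCT permalink.
example = "http://lod.openlinksw.com/sparql?query=select%201"
print(swap_host(example))
```

Parsing the URL rather than doing a string replace keeps the change confined to the host, even when the host name also appears inside the encoded query text.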

/cc @hwilliams

#7

Ah, thank you for the info, I will make the changes…

#8

Okay, that’s enough for it to be looked into.

/cc @PvK @imitko

#9

Okay thanks. And actually the vars to count are these:

select ?s1p count (distinct ?s1o)
select ?s1ip count (distinct ?s1) # not ?s1o
select ?s1c count (distinct ?s1)

#10

Just change the host from lod.openlinksw.com to dbpedia.org in your SPARQL or FCT URI

FYI, the VIOS PoC UI now defaults to DBpedia.

-sherman

#11

@kidehen The count issue still persists across all the FCT instances I’ve tested, including LOD.

#12

Please provide a live SPARQL results page link. That simplifies diagnosis.

#13

The first post in this thread contains the live SPARQL result page links.

#14

@kidehen @imikhailov @imitko

The property count logic and the classes count logic each have two branches, one for when inference is present (line 843) and one for when it is absent (line 845):

In facet.sql, lines 843 - 845:

  http (sprintf ('select ?s%dp as ?c1 count (distinct (?s%d)) as ?c2 ', this_s, this_s), pre);
      else
  http (sprintf ('select ?s%dp as ?c1 count (*) as ?c2 ', this_s), pre);

  1. Why is inference a factor in the count logic?
  2. Why do we count the triple’s subject (i.e. ?s%d) when inference is present?

I can provide feedback on the first branch once those questions are cleared up.

For the second branch, the user would like the following:

In facet.sql, line 845 should read:

http (sprintf ('select ?s%dp as ?c1 count (distinct ?s%do) as ?c2 ', this_s, this_s), pre);

For other count logic, the lines should read:

Line 853:

 http (sprintf ('select ?s%dip as ?c1 count (distinct ?s%d) as ?c2 ', this_s, this_s), pre);

Line 869:

 http (sprintf ('select ?s%dc as ?c1 count (distinct ?s%d) as ?c2 ', this_s, this_s), pre);
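
The three proposed select heads can be checked outside Virtuoso with a small Python sketch. It uses Python's % operator as a stand-in for Virtuoso's sprintf, which is an assumption about equivalent behaviour for %d only. The dictionary keys are hypothetical labels, not FCT identifiers. Each template contains two %d placeholders, so the index has to be supplied twice.

```python
# Format strings as proposed in this post; keys are illustrative labels only.
TEMPLATES = {
    "properties": "select ?s%dp as ?c1 count (distinct ?s%do) as ?c2 ",
    "properties_of": "select ?s%dip as ?c1 count (distinct ?s%d) as ?c2 ",
    "classes": "select ?s%dc as ?c1 count (distinct ?s%d) as ?c2 ",
}


def select_head(mode: str, this_s: int) -> str:
    """Expand the select head for one view mode and subject index."""
    # Two placeholders per template, so this_s is passed twice.
    return TEMPLATES[mode] % (this_s, this_s)


for mode in TEMPLATES:
    print(select_head(mode, 1))
```

Passing the index only once (for example `TEMPLATES["classes"] % (1,)`) raises a TypeError in Python, which is a quick way to catch a placeholder/argument mismatch before editing facet.sql.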

#15

@imikhailov @imitko

A pull request containing the fix has been submitted on GitHub.

To verify the fix, with respect to this list of 5 individuals, when we list their classes, the counts (var c2) should all be 5.
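
A minimal sketch of that check in Python, assuming the classes query result has been fetched in the standard SPARQL JSON results format. The function name counts_all_equal and the sample bindings (including the example.org class IRIs) are hypothetical; a real check would load the JSON returned by the endpoint.

```python
def counts_all_equal(results: dict, expected: int, var: str = "c2") -> bool:
    """True if every binding's count variable equals expected (and there is at least one row)."""
    bindings = results["results"]["bindings"]
    return bool(bindings) and all(int(b[var]["value"]) == expected for b in bindings)


XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical result shaped like a SPARQL JSON response; not real endpoint output.
SAMPLE = {
    "head": {"vars": ["c1", "c2"]},
    "results": {"bindings": [
        {"c1": {"type": "uri", "value": "http://example.org/ClassA"},
         "c2": {"type": "literal", "datatype": XSD_INT, "value": "5"}},
        {"c1": {"type": "uri", "value": "http://example.org/ClassB"},
         "c2": {"type": "literal", "datatype": XSD_INT, "value": "5"}},
    ]},
}

print(counts_all_equal(SAMPLE, 5))
```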

To add to Kingsley’s comment: at first I tried applying my changes to /DAV/VAD/fct/facet.sql, but they did not take effect. To apply them, I followed Kingsley’s instructions, essentially pasting the fixed procedure declarations (in this case the create fct_view and create fct_value declarations from facet.sql) into the isql command console.

#16

Great!

Note the steps for applying and testing a fix are:

  1. Drop the stored procedure to which the fix has been applied – DROP PROCEDURE <Procedure-Name> ;
  2. Create the new version of the stored procedure – CREATE PROCEDURE <Procedure-Name> ;
  3. Test the effects of the fix.