Hi,
The rdfs:label is units_sold.
?o bif:contains "'unit'" gives results
?o bif:contains "'units'" gives results
?o bif:contains "'units_'" gives results
?o bif:contains "'units_s'" does NOT give any results
?o bif:contains "'units_so'" does NOT give any results
?o bif:contains "'units_sol'" does NOT give any results
?o bif:contains "'units_sold'" gives results
How can I make all of the above cases work?
Thanks
Which Virtuoso version is being used?
Do you have a sample test case, with data and the actual SPARQL queries being executed, for recreating this?
OpenLink Virtuoso Universal Server
Version 07.20.3240-threads for Win64 as of Jun 10 2024 (a1fd8195b)
SPARQL INSERT DATA {
GRAPH <example.com/testorggraph> {
<http://example.com/node1> rdfs:label "units_sold" .
<http://example.com/node2> rdfs:label "units_sold_out" .
<http://example.com/node3> rdfs:label "units_sales" .
<http://example.com/node4> rdfs:label "unit_sold" .
<http://example.com/node4> rdfs:label "total unit sold" .
}
}
Select query:
SPARQL SELECT *
FROM <example.com/testorggraph>
WHERE {
?s rdfs:label ?o .
?o bif:contains "'unit'"
}
LIMIT 200
We can replace the bif:contains line in the above query with each of the following to simulate all the cases:
?o bif:contains "'unit'"
?o bif:contains "'units'"
?o bif:contains "'units_'"
?o bif:contains "'units_s'"
?o bif:contains "'units_so'"
?o bif:contains "'units_sol'"
?o bif:contains "'units_sold'"
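To run the seven cases above without editing the query by hand each time, the statements can be generated with a small script. This is only a convenience sketch in Python: `QUERY_TEMPLATE`, `TERMS` and `build_queries` are names made up for illustration, and the trailing `;` assumes the statements are pasted into isql.

```python
# Hypothetical helper: builds one test statement per search term from this post.
# The query text mirrors the SELECT shown above; only the search term changes.
QUERY_TEMPLATE = """SPARQL SELECT *
FROM <example.com/testorggraph>
WHERE {{
  ?s rdfs:label ?o .
  ?o bif:contains "'{term}'"
}}
LIMIT 200;"""

# The seven search terms listed in the post.
TERMS = ["unit", "units", "units_", "units_s", "units_so", "units_sol", "units_sold"]


def build_queries(terms=TERMS):
    """Return one complete statement per search term."""
    return [QUERY_TEMPLATE.format(term=term) for term in terms]


if __name__ == "__main__":
    for query in build_queries():
        print(query)
        print()
```

Each printed statement can then be executed in turn to see which terms return rows.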
Additional requirement:
Without introducing a new formatted label or index, is it possible to make the case below work (the ignoring-spaces use case)?
?o bif:contains "'unitsold'"
The expected result is to return total unit sold.
The Virtuoso FT index requires a minimum of 4 chars in a bif:contains search. But there is some confusion when the underscore (_) is used in the search terms: the underscore is omitted if it separates two words of >= 4 chars each, but it is taken as part of the word if the second part is less than 4 chars. For example, unit_sold is seen as the words unit and sold, whereas unit_so is seen as the single word unit_so.
If you want all unit* values and want to ignore the _, then search for "'unit*'", which treats the underscore as a delimiter and not as part of the word, i.e.:
SPARQL SELECT * FROM <example.com/testorggraph> WHERE { ?s rdfs:label ?o . ?o bif:contains "'unit*'" };
It is not possible to make ?o bif:contains "'unitsold'" return total unit sold.
@hwilliams OK, thanks. I have a better understanding of it now.
The underscore part can be managed by cleaning the source data.
What other characters in a label can create problems?
It seems bif:contains cannot provide full partial-match logic.
Say we need to get all labels that have the word sold anywhere in them:
?o bif:contains "'*sold*'"
This gives the error: Wildcard word needs at least 4 leading characters
regex() queries are slow and take a lot of time, and bif:contains has the limitations above.
Is there any other optimized way to support partial matching on rdfs:label for large data?
You just put the wildcard to the right of the search pattern in bif:contains, and it will find matching words wherever they occur in the label, i.e. ?o bif:contains "'sold*'":
SQL> SPARQL SELECT * FROM <example.com/testorggraph> WHERE { ?s rdfs:label ?o . ?o bif:contains "'sold*'" };
s                         o
LONG VARCHAR              LONG VARCHAR
_______________________________________________________________________________
http://example.com/node4  total unit sold
http://example.com/node4  unit_sold
http://example.com/node1  units_sold
http://example.com/node2  units_sold_out
4 Rows. -- 15 msec.
SQL> SPARQL SELECT * FROM <example.com/testorggraph> WHERE { ?s rdfs:label ?o . ?o bif:contains "'unit*'" };
s                         o
LONG VARCHAR              LONG VARCHAR
_______________________________________________________________________________
http://example.com/node4  total unit sold
http://example.com/node4  unit_sold
http://example.com/node3  units_sales
http://example.com/node2  units_sold_out
http://example.com/node1  units_sold
5 Rows. -- 16 msec.
SQL>
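If the search term comes from user input, it may help to check it on the client side before sending the query, so that the "Wildcard word needs at least 4 leading characters" error never reaches the server. The following is a minimal Python sketch of that idea, not part of Virtuoso: `wildcard_pattern` and `MIN_LEADING_CHARS` are hypothetical names, and treating the 4-character limit as a simple `len()` check is an assumption.

```python
# Hypothetical client-side check, based only on the rules quoted in this thread:
# the wildcard goes to the right of the word, and the word before it needs at
# least 4 leading characters.
MIN_LEADING_CHARS = 4  # assumption: counted the same way as Python's len()


def wildcard_pattern(prefix: str) -> str:
    """Return a quoted bif:contains pattern such as "'sold*'" for a bare prefix."""
    word = prefix.strip()
    if "*" in word:
        # Covers leading wildcards like *sold*, which the thread shows failing.
        raise ValueError("pass the bare prefix; the trailing * is added here")
    if "'" in word or '"' in word:
        raise ValueError("quotes are not allowed in the prefix")
    if len(word) < MIN_LEADING_CHARS:
        raise ValueError(
            f"wildcard word needs at least {MIN_LEADING_CHARS} leading characters"
        )
    # Inner single quotes delimit the search word, outer double quotes the literal.
    return f"\"'{word}*'\""


if __name__ == "__main__":
    print("?o bif:contains " + wildcard_pattern("sold"))
    print("?o bif:contains " + wildcard_pattern("unit"))
```

The returned string is meant to be placed directly after `?o bif:contains` in a query like the ones shown above.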