bif:contains issue with specific cases

Hi,

The rdfs:label value is units_sold.

?o bif:contains "'unit'" returns results
?o bif:contains "'units'" returns results
?o bif:contains "'units_'" returns results
?o bif:contains "'units_s'" does NOT return any results
?o bif:contains "'units_so'" does NOT return any results
?o bif:contains "'units_sol'" does NOT return any results
?o bif:contains "'units_sold'" returns results

How can I make all of the above cases work?

Thanks

What Virtuoso version is being used?

Do you have a sample test case, with data and the actual SPARQL queries being executed, so we can recreate the issue?

OpenLink Virtuoso Universal Server
Version 07.20.3240-threads for Win64 as of Jun 10 2024 (a1fd8195b)

SPARQL INSERT DATA {
  GRAPH <example.com/testorggraph> {
    <http://example.com/node1> rdfs:label "units_sold" .
    <http://example.com/node2> rdfs:label "units_sold_out" .
    <http://example.com/node3> rdfs:label "units_sales" .
    <http://example.com/node4> rdfs:label "unit_sold" .
    <http://example.com/node4> rdfs:label "total unit sold" .
  }
}

Select query:

SPARQL SELECT * 
FROM <example.com/testorggraph> 
WHERE { 
  ?s rdfs:label ?o .
  ?o bif:contains "'unit'"
}
LIMIT 200

Replace the bif:contains pattern in the query above with each of the following to reproduce all of the cases:

?o bif:contains "'unit'" 
?o bif:contains "'units'"
?o bif:contains "'units_'" 
?o bif:contains "'units_s'" 
?o bif:contains "'units_so'"
?o bif:contains "'units_sol'"
?o bif:contains "'units_sold'"

Additional requirement:
Without introducing a new formatted label or index, is it possible to make the case below work (i.e. ignoring spaces)?
?o bif:contains "'unitsold'"
The expected result is "total unit sold".

The Virtuoso free-text (FT) index requires a minimum of 4 characters in a bif:contains search word. There is some confusion, though, when the underscore (_) is used in search terms. If the underscore joins two parts of 4 or more characters each, it is dropped and the parts are treated as two separate words; if the second part is shorter than 4 characters, the underscore is taken as part of the word. For example, unit_sold is seen as the two words unit and sold, whereas unit_so is seen as the single word unit_so.
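To make the rule above concrete, here is a toy Python model of it. It is a hypothetical sketch, not Virtuoso's actual word breaker: the names model_words and phrase_matches, the lowercasing, the dropping of empty pieces (so the trailing underscore in units_ is ignored) and the gluing of any short piece onto the preceding word are all assumptions, chosen only so that the model reproduces the results reported at the top of this thread for the sample labels.

```python
# Toy model (hypothetical, NOT Virtuoso's real word breaker) of the
# underscore behaviour described above.
MIN_WORD = 4  # the 4-character minimum mentioned in the reply


def model_words(text: str) -> list[str]:
    """Split text into 'words' the way this thread suggests the FT index does."""
    words: list[str] = []
    for chunk in text.lower().split():  # whitespace always separates words
        # Assumption: empty pieces are dropped, e.g. the trailing "_" in "units_".
        parts = [p for p in chunk.split("_") if p]
        current = None
        for part in parts:
            if current is None:
                current = part
            elif len(part) >= MIN_WORD:
                words.append(current)  # long enough: it starts a new word
                current = part
            else:
                current += "_" + part  # assumption: too short, glue onto the previous word
        if current is not None:
            words.append(current)
    return words


def phrase_matches(label: str, query: str) -> bool:
    """True if the query's words appear consecutively among the label's words."""
    q, w = model_words(query), model_words(label)
    return bool(q) and any(w[i:i + len(q)] == q for i in range(len(w) - len(q) + 1))


LABELS = ["units_sold", "units_sold_out", "units_sales", "unit_sold", "total unit sold"]

for query in ["unit", "units", "units_", "units_s", "units_so", "units_sol", "units_sold"]:
    hits = [label for label in LABELS if phrase_matches(label, query)]
    print(f"{query!r:14} -> {hits}")
```

Under these assumptions units_s, units_so and units_sol each become one glued word that no label contains, so they match nothing, while units_sold splits into units and sold and matches the units_sold label.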

If you want all values beginning with unit, search for "'unit*'". With the trailing wildcard, the underscore is treated as a delimiter rather than as part of the word, i.e.:

SPARQL SELECT *  FROM <example.com/testorggraph>  WHERE {    ?s rdfs:label ?o .   ?o bif:contains "'unit*'" };

It is not possible to make ?o bif:contains "'unitsold'" return "total unit sold".

@hwilliams OK, thanks.
I have a better understanding of it now.
The underscore part can be managed by cleaning the source data.
What other characters in a label can cause problems?

bif:contains does not seem to provide full partial-match logic.
Say we need to get all labels containing the word sold anywhere:
?o bif:contains "'*sold*'"
This gives the error "Wildcard word needs at least 4 leading characters".

regex() queries are slow and take a lot of time.
bif:contains() has the limitations above.

Is there any other optimized way to support partial matching on rdfs:label for large data?

Just put the wildcard to the right of the search pattern in bif:contains; it then matches words beginning with that pattern wherever they occur in the label (start, middle or end), i.e. ?o bif:contains "'sold*'":

SQL> SPARQL SELECT *  FROM <example.com/testorggraph>  WHERE {    ?s rdfs:label ?o .   ?o bif:contains "'sold*'" };
s                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://example.com/node4                                                          total unit sold
http://example.com/node4                                                          unit_sold
http://example.com/node1                                                          units_sold
http://example.com/node2                                                          units_sold_out

4 Rows. -- 15 msec.
SQL> SPARQL SELECT *  FROM <example.com/testorggraph>  WHERE {    ?s rdfs:label ?o .   ?o bif:contains "'unit*'" };
s                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://example.com/node4                                                          total unit sold
http://example.com/node4                                                          unit_sold
http://example.com/node3                                                          units_sales
http://example.com/node2                                                          units_sold_out
http://example.com/node1                                                          units_sold

5 Rows. -- 16 msec.
SQL>
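The trailing-wildcard behaviour can be sketched the same way. Again this is a hypothetical toy model, not Virtuoso's implementation: wildcard_matches and MIN_PREFIX are illustrative names, and, following the reply above, it assumes whitespace and underscores both act as word delimiters for a wildcard search. It reproduces the two result sets shown above (in a different row order) and rejects a leading wildcard such as '*sold*' with the error text quoted earlier in the thread. It says nothing about index performance on large data.

```python
import re

MIN_PREFIX = 4  # "at least 4 leading characters", per the error quoted in the thread


def wildcard_matches(label: str, pattern: str) -> bool:
    """Toy right-truncation match: does any word of the label start with the prefix?"""
    if not pattern.endswith("*"):
        raise ValueError("this sketch only handles patterns ending in '*'")
    prefix = pattern[:-1].lower()
    # A leading wildcard ('*sold*') or a too-short prefix ('sol*') is rejected.
    if "*" in prefix or len(prefix) < MIN_PREFIX:
        raise ValueError("Wildcard word needs at least 4 leading characters")
    # Assumption: whitespace and underscores are both word delimiters here.
    words = re.split(r"[\s_]+", label.lower())
    return any(word.startswith(prefix) for word in words)


LABELS = ["units_sold", "units_sold_out", "units_sales", "unit_sold", "total unit sold"]

for pattern in ["sold*", "unit*"]:
    print(pattern, [label for label in LABELS if wildcard_matches(label, pattern)])

try:
    wildcard_matches("total unit sold", "*sold*")
except ValueError as err:
    print("*sold* ->", err)
```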