Using SPARQL Query Templates to Fine-Tune ChatGPT's Large Language Model (LLM)

Using SPARQL Query Templates to fine-tune ChatGPT (3.5 or 4.0) turned out to be a remarkably powerful and delightful surprise. In this post, I provide live examples of using this functionality to enhance the discovery of data across knowledge graphs associated with SPARQL Query Service endpoints.

It’s important to note that live SPARQL Query Service endpoints across the LOD Cloud Knowledge Graph have been in place since 2007. As part of the Linked Data effort to reorient the notion of a Semantic Web back to the Web, courtesy of Linked Data principles, they have been quietly growing in size, quality, and uptime. Today, they offer vital infrastructure that reinforces the symbiotic relationship between Large Language Model (LLM)-based Natural Language processors and the idea of a Semantic Web, usable publicly or privately.

Here’s a generic SPARQL Query that enables the discovery of an aggregate of entity types associated with a sample of entities existing in a Knowledge Graph accessible via a SPARQL Query Service endpoint:

SELECT (SAMPLE(?s) AS ?EntityID) 
       (COUNT(*) AS ?count) 
       (?o AS ?EntityTypeID) 
FROM <urn:schemaorg:data:abox> 
WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema'))} GROUP BY ?o ORDER BY DESC (?count) 

I will use this query as the foundation of the templates used in this post.
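
For readers less familiar with SPARQL aggregates, here is a minimal Python sketch of what the query above computes, run against a tiny in-memory list instead of a live endpoint. The sample data and the function name `entity_type_summary` are hypothetical and exist only for illustration. SPARQL's SAMPLE may return any member of a group, whereas this sketch simply takes the first one.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, type) pairs standing in for "?s a ?o" matches.
TYPE_PAIRS = [
    ("https://example.com/#offer1", "http://schema.org/Offer"),
    ("https://example.com/#offer2", "http://schema.org/Offer"),
    ("https://example.com/#product1", "http://schema.org/Product"),
    ("https://example.com/#thing1", "http://www.w3.org/2002/07/owl#Thing"),
]


def entity_type_summary(pairs, needle="schema"):
    """Return (sample_entity, count, entity_type) rows, most frequent type first."""
    groups = defaultdict(list)
    for subject, entity_type in pairs:
        # FILTER (CONTAINS(STR(?o), 'schema'))
        if needle in entity_type:
            # GROUP BY ?o
            groups[entity_type].append(subject)
    # SAMPLE(?s), COUNT(*), ?o (the first subject stands in for SAMPLE)
    rows = [(subjects[0], len(subjects), entity_type)
            for entity_type, subjects in groups.items()]
    # ORDER BY DESC(?count)
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for row in entity_type_summary(TYPE_PAIRS):
        print(row)
```

Each output row pairs one example entity with its type and the number of entities of that type, which is what makes the query a useful starting point for exploring an unfamiliar graph.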

Transcript

Q: Load the following templates.

Provide a starting point for exploring knowledge graph <G> using schema.org terms;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema.org'))} GROUP BY ?o ORDER BY DESC (?count) ;
Provide a starting point for exploring schema:offers associated with knowledge graph <G> using schema.org terms;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema.org/Offer'))} GROUP BY ?o ORDER BY DESC (?count) ;
Provide a starting point for exploring across all knowledge graphs using schema.org terms;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema.org'))} GROUP BY ?o ORDER BY DESC (?count) ;
Provide a starting point for exploring knowledge graph <G>;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o . } GROUP BY ?o ORDER BY DESC (?count) ;
Provide a starting point for exploring across all knowledge graphs;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) WHERE { ?s a ?o . } GROUP BY ?o ORDER BY DESC (?count) ;

-- Problematic --
Provide a starting point for exploring across all knowledge graphs, with sponging enabled;DEFINE get:soft "soft" SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) WHERE { ?s a ?o . } GROUP BY ?o ORDER BY DESC (?count) ;


# Introspection Query with Built-in rdfs:subClassOf Inference

DEFINE input:inference "urn:rdfs:subclass:subproperty:inference:rules"
SELECT (SAMPLE(?s) AS ?EntityID) 
       (COUNT(*) AS ?count) 
       (?o AS ?EntityTypeID) 
FROM <urn:schemaorg:data:abox> 
WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema'))} GROUP BY ?o ORDER BY DESC (?count) 

## Derived Templates -- Built-in Reasoning & Inference

### Across all Vocabularies
Provide a starting point for exploring knowledge graph <G> with subClassOf reasoning enabled;DEFINE input:inference "urn:rdfs:subclass:subproperty:inference:rules" SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o.} GROUP BY ?o ORDER BY DESC (?count) ;

### Scoped to Schema.org
Provide a starting point for exploring knowledge graph <G> using schema.org terms with subClassOf reasoning enabled;DEFINE input:inference "urn:rdfs:subclass:subproperty:inference:rules" SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o . FILTER (CONTAINS(STR(?o),'schema.org'))} GROUP BY ?o ORDER BY DESC (?count) ;

# Organization oriented queries

SELECT (SAMPLE(?s) AS ?EntityID) 
       (COUNT(*) AS ?count) 
       (?o AS ?EntityTypeID) 
FROM <urn:schemaorg:data:abox> 
WHERE { ?s a ?o . FILTER (CONTAINS(STR(?s),'apple.com')) FILTER (CONTAINS(STR(?o),'schema'))} GROUP BY ?o ORDER BY DESC (?count) 

## Derived Templates

Provide a starting point for exploring an <Org> centric knowledge graph <G> using schema.org terms;SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) FROM <G> WHERE { ?s a ?o . FILTER (CONTAINS(STR(?s),'<Org>')) FILTER (CONTAINS(STR(?o),'schema.org'))} GROUP BY ?o ORDER BY DESC (?count) ;

# Item Listing Queries
## Offers 
List offers and prices associated with knowledge graph <G>, using terms from schema.org;SELECT ?offer ?price FROM <G> WHERE { ?offer a schema:Offer; schema:price ?price .} ORDER BY ASC (?price) ;
List offers and prices from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT ?offer ?price FROM <G> WHERE { ?offer a schema:Offer; schema:price ?price . FILTER (CONTAINS(STR(?offer),'<Org>'))} ORDER BY ASC (?price) ;

## Products & Offers 
List products associated with knowledge graph <G>, using terms from schema.org;SELECT ?product ?name ?offer ?price FROM <G> WHERE { ?product a schema:Product; schema:name ?name; (schema:offers | schema:itemOffered) ?offer. ?offer a schema:Offer; schema:price ?price .} ORDER BY ASC (?price) ;
List products from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT ?product ?name ?offer ?price FROM <G> WHERE { ?product a schema:Product; schema:name ?name; (schema:offers | schema:itemOffered) ?offer. ?offer a schema:Offer; schema:price ?price . FILTER (CONTAINS(STR(?offer),'<Org>'))} ORDER BY ASC (?price) ;

## Products 
List products associated with knowledge graph <G>, using terms from schema.org;SELECT ?product ?name FROM <G> WHERE { ?product a schema:Product; schema:name ?name.} ORDER BY ASC (?name) ;
List products from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT ?product ?name FROM <G> WHERE { ?product a schema:Product; schema:name ?name. FILTER (CONTAINS(STR(?product),'<Org>'))} ORDER BY ASC (?name) ;

## FAQ Page

List FAQs associated with knowledge graph <G>, using terms from schema.org;SELECT ?page ?question ?name ?answerText FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. ?question (schema:name|schema:title) ?name; (schema:acceptedAnswer|schema:suggestedAnswer) ?answer. ?answer schema:text ?answerText.} ORDER BY ASC (?name) ;
List FAQs from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT ?page ?question ?name ?answerText FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. ?question (schema:name|schema:title) ?name; (schema:acceptedAnswer|schema:suggestedAnswer) ?answer. ?answer schema:text ?answerText. FILTER (CONTAINS(STR(?page),'<Org>'))} ORDER BY ASC (?name) ;

## Jobs
List Job Postings associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?job ?name FROM <G> WHERE { ?job a schema:JobPosting; (schema:name|schema:title) ?name. } ;
List Job Postings from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?job ?name ?org FROM <G> WHERE { ?job a schema:JobPosting; (schema:name|schema:title) ?name; schema:hiringOrganization ?org . FILTER (CONTAINS(STR(?org),'<Org>'))} ;
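
Outside the ChatGPT session, the same "description;query ;" convention can be handled mechanically. The following is a minimal sketch, assuming each template sits on one line with the first semicolon separating the natural-language description from the SPARQL text. The function names `parse_template` and `fill` are hypothetical. Note that `<G>` is filled with the angle brackets kept (it becomes an IRI reference), while `<Org>` is filled with a bare string because it sits inside a quoted literal.

```python
# One template line copied from the collection above.
LINE = (
    "List offers and prices from <Org> associated with knowledge graph <G>, "
    "using terms from schema.org;SELECT ?offer ?price FROM <G> WHERE "
    "{ ?offer a schema:Offer; schema:price ?price . "
    "FILTER (CONTAINS(STR(?offer),'<Org>'))} ORDER BY ASC (?price) ;"
)


def parse_template(line):
    """Split 'description;query ;' into (description, query)."""
    # Only the first ';' separates the two parts; later ones belong to SPARQL.
    description, _, query = line.partition(";")
    query = query.strip()
    if query.endswith(";"):
        # Drop the trailing template terminator.
        query = query[:-1].rstrip()
    return description.strip(), query


def fill(text, **values):
    """Replace each <Name> placeholder with the value supplied for Name."""
    for name, value in values.items():
        text = text.replace("<" + name + ">", value)
    return text


if __name__ == "__main__":
    description, query = parse_template(LINE)
    # G keeps its angle brackets; Org is a plain substring to match.
    print(fill(query, G="<urn:schemaorg:data:abox>", Org="apple.com"))
```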

GPT Generated Template additions

Q: Based on my current template collection, can you create an addition for schema:SearchAction class instances based on my Products template?

A (GPT 4.0):
List SearchActions associated with knowledge graph <G>, using terms from schema.org;SELECT ?action ?query ?target FROM <G> WHERE { ?action a schema:SearchAction; schema:query ?query; schema:target ?target .} ORDER BY ASC (?query) ;
List SearchActions from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT ?action ?query ?target FROM <G> WHERE { ?action a schema:SearchAction; schema:query ?query; schema:target ?target. FILTER (CONTAINS(STR(?action),'<Org>'))} ORDER BY ASC (?query) ;

Q: Using the recent FAQ related templates as guidelines, generate template variants for schema:HowTo instances that include HowToStep instance associations.

A:
Here are the templates for schema:HowTo instances that include schema:HowToStep instance associations:

List HowTo guides associated with knowledge graph <G>, using terms from schema.org;

SELECT ?guide ?step ?name ?text FROM <G> WHERE { 
  ?guide a schema:HowTo; 
  schema:step ?step. 
  ?step (schema:name|schema:text) ?text.
  ?guide (schema:name|schema:title) ?name.
} 
ORDER BY ASC(?name)

List HowTo guides from <Org> associated with knowledge graph <G>, using terms from schema.org;

SELECT ?guide ?step ?name ?text FROM <G> WHERE { 
  ?guide a schema:HowTo; 
  schema:step ?step. 
  ?step (schema:name|schema:text) ?text.
  ?guide (schema:name|schema:title) ?name.
  FILTER (CONTAINS(STR(?guide),'<Org>'))
} 
ORDER BY ASC(?name)

Q: Generate an Ontology for my templates that includes a Template instance for schema:Recipe

A:

Sure, I can help generate an ontology that includes a Template instance for the schema:Recipe class. Here is an example:

@prefix ont: <http://example.org/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

ont:Template rdf:type rdfs:Class .
ont:Prompt rdf:type rdfs:Class ; rdfs:subClassOf ont:Template .
ont:Response rdf:type rdfs:Class ; rdfs:subClassOf ont:Template .

ont:PromptRecipe rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List recipes associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?recipe ?name FROM <G> WHERE {?recipe a schema:Recipe; schema:name ?name.} ORDER BY ASC (?name);" ;
  ont:hasResponse ont:Response4 .
ont:Response4 rdf:type ont:Response .

ont:PromptRecipeByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List recipes from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?recipe ?name FROM <G> WHERE {?recipe a schema:Recipe; schema:name ?name. FILTER (CONTAINS(STR(?recipe),'<Org>'))} ORDER BY ASC (?recipe);" ;
  ont:hasResponse ont:Response5 .
ont:Response5 rdf:type ont:Response .

This ontology defines two templates:

  • ont:PromptRecipe for listing recipes from a knowledge graph
  • ont:PromptRecipeByOrg for listing recipes from a specific organization within a knowledge graph.

Please replace <G> and <Org> with the IRI of your knowledge graph and your organization respectively when using these templates. The FILTER function in the second template is used to filter the results by the specified organization.

Q: Redo, but replace example.com with relative hyperlinks using <#> .

A:

Sure, I can modify the ontology to use relative hyperlinks. Here’s the revised ontology:

## Turtle Start ##
@prefix : <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

:Template rdf:type rdfs:Class .
:Prompt rdf:type rdfs:Class ; rdfs:subClassOf :Template .
:Response rdf:type rdfs:Class ; rdfs:subClassOf :Template .

:PromptRecipe rdf:type :Prompt ;
  :hasUser :UserA ;
  :hasInput "List recipes associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?recipe ?name FROM <G> WHERE {?recipe a schema:Recipe; schema:name ?name.} ORDER BY ASC (?name);" ;
  :hasResponse :Response4 .
:Response4 rdf:type :Response .

:PromptRecipeByOrg rdf:type :Prompt ;
  :hasUser :UserA ;
  :hasInput "List recipes from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?recipe ?name FROM <G> WHERE {?recipe a schema:Recipe; schema:name ?name. FILTER (CONTAINS(STR(?recipe),'<Org>'))} ORDER BY ASC (?recipe);" ;
  :hasResponse :Response5 .
:Response5 rdf:type :Response .
## Turtle End ##

This version uses relative hyperlinks (i.e., <#>) to define resources within the ontology. This is a common practice in RDF and is used when the base URL of the ontology is not known or can vary. In this case, the base URL will be the URL where the ontology is hosted.
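
The Prompt/Response pattern in the generated ontology is regular enough to produce programmatically. Here is a minimal sketch, assuming the same `:Prompt`, `:hasUser`, `:hasInput`, and `:hasResponse` terms shown in the transcript. The helper names `turtle_string` and `prompt_to_turtle` are hypothetical, and the escaping covers only backslashes, double quotes, and line breaks.

```python
def turtle_string(text):
    """Escape a Python string as a Turtle double-quoted literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return '"' + escaped + '"'


def prompt_to_turtle(prompt_name, response_name, template_text, user="UserA"):
    """Render one template as a :Prompt description plus its :Response stub."""
    lines = [
        f":{prompt_name} rdf:type :Prompt ;",
        f"  :hasUser :{user} ;",
        f"  :hasInput {turtle_string(template_text)} ;",
        f"  :hasResponse :{response_name} .",
        f":{response_name} rdf:type :Response .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    template = (
        "List recipes associated with knowledge graph <G>, using terms from "
        "schema.org;SELECT DISTINCT ?recipe ?name FROM <G> WHERE "
        "{?recipe a schema:Recipe; schema:name ?name.} ORDER BY ASC (?name);"
    )
    print(prompt_to_turtle("PromptRecipe", "Response4", template))
```

The output still needs the `@prefix` declarations from the transcript in front of it before it parses as a complete Turtle document.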

GPT Generated SPARQL Template Ontology Screenshot 1

GPT Generated SPARQL Template Ontology Screenshot 2

Testing Natural Language Query using GPT Generated Templates

Additional Sample Natural Language Queries Collection

These queries work with both GPT 3.5 and 4.0.

Note:
You can also incorporate the following into your prompts to control the behavior of your hyperlink lookups, if you are targeting a Virtuoso-based endpoint with the Faceted Search & Browsing module installed:

Redo, setting the value of the &format to text/x-html+tr .

Tools Used

Combined Multi-Model Database & HTTP Server Technology

OpenLink Structured Data Sniffer – Multi-Browser Extension


SPARQL INSERTs based Fine-Tuning

Here’s how the goal outlined above is achieved, using a template based on the following generic INSERT statement.

INSERT DATA {
  GRAPH <http://example.com/graphURI> {
    <S> <P> <O> .
  }
}

Template

Using relative hyperlinks for subject and object denotation, add data expressed in the RDF triple: <S> <P> <O> to the knowledge graph <G>;PREFIX : <urn:records:test#> INSERT DATA { GRAPH <G> { :<S> :<P> :<O> .}} ;

Here’s a usage example using a ChatGPT session transcript.

Transcript

Q:
Using the following template.

Using relative hyperlinks for subject and object denotation, add data expressed in the RDF triple: <S> <P> <O> to the knowledge graph <G>;PREFIX : <urn:records:test#> INSERT DATA { GRAPH <G> { :<S> :<P> :<O> .}} ;

Create a clickable SPARQL link for inserting the following into the knowledge graph denoted by urn:records:test: this4 relatedTo this . Using the endpoint https://linkeddata.uriburner.com/sparql

A:
Sure, to insert the RDF triple this4 relatedTo this into the graph urn:records:test , the SPARQL command would be:

PREFIX : <urn:records:test#> 
INSERT DATA { GRAPH <urn:records:test> { :this4 :relatedTo :this .}} 

Here’s the clickable SPARQL link for the given endpoint: SPARQL Insert Link
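
For reference, the link ChatGPT produces here can be reconstructed by hand. The sketch below fills the INSERT template and percent-encodes the result into a `query` parameter. The function name `insert_link` is hypothetical. Whether an endpoint accepts an INSERT through a GET `query` parameter is endpoint-specific (the SPARQL 1.1 Protocol itself defines updates over POST), so treat the URL shape as an assumption about the target service rather than a general rule.

```python
from urllib.parse import urlencode

ENDPOINT = "https://linkeddata.uriburner.com/sparql"
INSERT_TEMPLATE = (
    "PREFIX : <urn:records:test#> "
    "INSERT DATA { GRAPH <G> { :<S> :<P> :<O> .}}"
)


def insert_link(graph, subject, predicate, obj, endpoint=ENDPOINT):
    """Fill the INSERT template and return (statement, clickable link)."""
    statement = (INSERT_TEMPLATE
                 .replace("<G>", "<" + graph + ">")  # graph stays an IRI reference
                 .replace("<S>", subject)            # local names under the ':' prefix
                 .replace("<P>", predicate)
                 .replace("<O>", obj))
    # urlencode percent-encodes '<', '>', '#', ':' and turns spaces into '+'.
    return statement, endpoint + "?" + urlencode({"query": statement})


if __name__ == "__main__":
    statement, link = insert_link("urn:records:test", "this4", "relatedTo", "this")
    print(statement)
    print(link)
```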

Screenshot 1

Screenshot 2 (Multi-Protocol Authentication Challenge)


Note:

  1. The URIBurner Service implements a Data Access Policy informed by attribute-based access control (ABAC) that allows authenticated users to perform read-write operations against public knowledge graphs (i.e., those that aren’t protected by more constraining data access policies).
  2. Data Access policies are created via RDF statements too!

Screenshot 3 (Successful Insert)


Screenshot 4 SPARQL URL (via the OpenLink Structured Data Sniffer Browser Extension)

Screenshot 5 (SPARQL SELECT Query Displaying Effects of INSERT)

Screenshot 6 (Single-Page Read-Write Application View)

Here’s an alternative view showing SPARQL INSERT operation effects.


Extending existing SPARQL-based Template Collection using ChatGPT

It’s a new day, and I am continuing my ChatGPT session on LLM fine-tuning using SPARQL. The following transcript showcases the usefulness of software that understands the structure and syntax of natural language sentences.

Transcript

Q: Can you explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox ?

A:
In order to explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox, we would need a SPARQL query that selects such instances and their associated titles or names. However, currently, there is no template for schema:ScholarlyArticle in your collection.

Should I create a template for schema:ScholarlyArticle based on your existing templates and then use it to generate a SPARQL query, or do you prefer to provide a specific template?

Q: Yes

A:
Sure, based on your existing templates, we could create a SPARQL query template for schema:ScholarlyArticle instances as follows:

## Turtle Start ##
ont:PromptScholarlyArticle rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List scholarly articles associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?article ?name FROM <G> WHERE {?article a schema:ScholarlyArticle; (schema:name | schema:title) ?name.} ORDER BY ASC (?name);" ;
  ont:hasResponse ont:Response6 .
ont:Response6 rdf:type ont:Response .

ont:PromptScholarlyArticleByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List scholarly articles from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?article ?name FROM <G> WHERE {?article a schema:ScholarlyArticle; (schema:name | schema:title) ?name. FILTER (CONTAINS(STR(?article),'<Org>'))} ORDER BY ASC (?name);" ;
  ont:hasResponse ont:Response7 .
ont:Response7 rdf:type ont:Response .
## Turtle End ##

With these templates, you can generate SPARQL queries to list scholarly articles from a specific knowledge graph and to filter scholarly articles by organization.

Now, let’s generate a SPARQL query for exploring schema:ScholarlyArticle instances from urn:schemaorg:data:abox:

SELECT DISTINCT ?article ?name 
FROM <urn:schemaorg:data:abox> 
WHERE {
  ?article a schema:ScholarlyArticle; 
           (schema:name | schema:title) ?name.
} 
ORDER BY ASC(?name)

You can run this query on a SPARQL endpoint that contains the urn:schemaorg:data:abox graph.
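
One portability note on the query above: it uses the `schema:` prefix without declaring it, which only works where the endpoint predefines that prefix (an assumption about the target service). A minimal sketch of a helper that adds the declaration when it is missing follows. The function name `with_schema_prefix` is hypothetical, and the namespace IRI is the one used in the Turtle earlier in this post.

```python
SCHEMA_PREFIX = "PREFIX schema: <http://schema.org/>"


def with_schema_prefix(query):
    """Prepend a schema: PREFIX declaration unless the query already has one."""
    if "prefix schema:" in query.lower():
        return query
    return SCHEMA_PREFIX + "\n" + query


if __name__ == "__main__":
    query = (
        "SELECT DISTINCT ?article ?name "
        "FROM <urn:schemaorg:data:abox> "
        "WHERE { ?article a schema:ScholarlyArticle; "
        "(schema:name | schema:title) ?name. } "
        "ORDER BY ASC(?name)"
    )
    print(with_schema_prefix(query))
```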

Q: Okay, add the new template to the collection and then address my earlier request using the endpoint Protected SPARQL Query Service .

A:
Alright, let’s first add the ScholarlyArticle templates to the collection:

## Turtle Start ##
ont:PromptScholarlyArticle rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List scholarly articles associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?article ?name FROM <G> WHERE {?article a schema:ScholarlyArticle; (schema:name | schema:title) ?name.} ORDER BY ASC (?name);" ;
  ont:hasResponse ont:Response6 .
ont:Response6 rdf:type ont:Response .

ont:PromptScholarlyArticleByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "List scholarly articles from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT DISTINCT ?article ?name FROM <G> WHERE {?article a schema:ScholarlyArticle; (schema:name | schema:title) ?name. FILTER (CONTAINS(STR(?article),'<Org>'))} ORDER BY ASC (?name);" ;
  ont:hasResponse ont:Response7 .
ont:Response7 rdf:type ont:Response .
## Turtle End ##

Screenshot (Generated Prompt Ontology Instance Data Snippet)

Next, I’ll generate a clickable SPARQL link to explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox using the endpoint https://linkeddata.uriburner.com/sparql:

Explore Scholarly Articles

You can click the link to run the SPARQL query and explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox.

Q: Set the following as the default value of the &format parameter for all clickable sparql links generated in this session: text/x-html+tr, and then redo the last request.

A:
Sure, I will set the default value of the &format parameter to text/x-html+tr for all clickable SPARQL links generated in this session.

Here’s the updated SPARQL link to explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox using the endpoint https://linkeddata.uriburner.com/sparql:

Explore Scholarly Articles

You can click the link to run the SPARQL query and explore schema:ScholarlyArticle instances from urn:schemaorg:data:abox.
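
The effect of this request on a link can be reproduced outside the session. Here is a minimal sketch, assuming the link carries its settings as ordinary URL query parameters. The function name `with_format` is hypothetical, and the `format` parameter name comes from the &format mentioned in the prompt above. The example link in the demo is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(link, fmt="text/x-html+tr"):
    """Return the link with its format parameter set (or replaced) to fmt."""
    parts = urlsplit(link)
    # Keep every existing parameter except a previous format value.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "format"]
    params.append(("format", fmt))
    # urlencode re-encodes values, so '/' and '+' in fmt become %2F and %2B.
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    link = ("https://linkeddata.uriburner.com/sparql"
            "?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+1&format=text%2Fhtml")
    print(with_format(link))
```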

Screenshot (Query Solution Page)

Screenshot (Selected Scholarly Article Description)

Q: Redo, but across any existing knowledge graphs associated with the designated sparql endpoint.

A:
I understand you want to explore schema:ScholarlyArticle instances across all knowledge graphs associated with the designated SPARQL endpoint. Here is the updated SPARQL query without specifying a particular graph:

Explore Scholarly Articles

You can click the link to run the SPARQL query and explore schema:ScholarlyArticle instances across all available knowledge graphs.
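
Dropping the graph scope amounts to removing the FROM clause from the template, as the earlier "across all knowledge graphs" templates do. A minimal sketch using a regular expression follows. The function name `across_all_graphs` is hypothetical. It leaves FROM NAMED clauses alone and does not guard against the text "FROM <...>" appearing inside a string literal. What an endpoint treats as its default dataset when no FROM is given depends on the endpoint.

```python
import re

# Matches a FROM clause with an IRI reference, including the whitespace before it.
FROM_CLAUSE = re.compile(r"\s+FROM\s+<[^>]*>", re.IGNORECASE)


def across_all_graphs(query):
    """Remove 'FROM <iri>' clauses so the query runs over the default dataset."""
    return FROM_CLAUSE.sub("", query)


if __name__ == "__main__":
    scoped = (
        "SELECT DISTINCT ?article ?name FROM <urn:schemaorg:data:abox> "
        "WHERE { ?article a schema:ScholarlyArticle; "
        "(schema:name | schema:title) ?name. } ORDER BY ASC(?name)"
    )
    print(across_all_graphs(scoped))
```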

Screenshot (Query Solution Page)

Screenshot (Selected Scholarly Article Description)

Additional SPARQL-Template Collection – Aggregates

The ChatGPT transcript that follows includes the addition of aggregate queries to the template collection for fine-tuning the GPT 4.0 and 3.5 LLMs.

Transcript

Q: Looking at the current template for FAQs, can you generate a template for answering requests along the following lines:
Count the number of questions associated with an FAQ?

A:
Certainly, here are the templates you requested:

Count Questions in FAQ

Count the number of questions associated with FAQs in knowledge graph <G>, using terms from schema.org:

SELECT ?page (COUNT(?question) as ?questionCount) 
FROM <G> WHERE { 
  ?page a schema:FAQPage; 
        schema:mainEntity ?question. 
} GROUP BY ?page 
ORDER BY DESC(?questionCount);

Count the number of questions associated with FAQs from <Org> in knowledge graph <G>, using terms from schema.org:

SELECT ?page (COUNT(?question) as ?questionCount) 
FROM <G> WHERE { 
  ?page a schema:FAQPage; 
        schema:mainEntity ?question. 
  FILTER (CONTAINS(STR(?page),'<Org>'))
} GROUP BY ?page 
ORDER BY DESC(?questionCount);

These templates allow you to count the number of questions associated with each FAQ page within a given knowledge graph or associated with a specific organization.

Q: Okay, generate this template and add to the collection.

A:
Adding the templates to the collection would look like this in the ontology:

## Turtle Start ##
ont:PromptCountFAQQuestions rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of questions associated with FAQs in knowledge graph <G>, using terms from schema.org;SELECT ?page (COUNT(?question) as ?questionCount) FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. } GROUP BY ?page ORDER BY DESC(?questionCount);" ;
  ont:hasResponse ont:ResponseCountFAQQuestions .
ont:ResponseCountFAQQuestions rdf:type ont:Response .

ont:PromptCountFAQQuestionsByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of questions associated with FAQs from <Org> in knowledge graph <G>, using terms from schema.org;SELECT ?page (COUNT(?question) as ?questionCount) FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. FILTER (CONTAINS(STR(?page),'<Org>')) } GROUP BY ?page ORDER BY DESC(?questionCount);" ;
  ont:hasResponse ont:ResponseCountFAQQuestionsByOrg .
ont:ResponseCountFAQQuestionsByOrg rdf:type ont:Response .
## Turtle End ##

These templates can be used to generate queries for counting the number of questions associated with FAQ pages in a knowledge graph.

Q: Redo, but for steps in a HowTo.

A:
Here are the templates for counting the number of steps in a HowTo:

## Turtle Start ##
ont:PromptCountHowToSteps rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of steps associated with HowTos in knowledge graph <G>, using terms from schema.org;SELECT ?howto (COUNT(?step) as ?stepCount) FROM <G> WHERE { ?howto a schema:HowTo; schema:step ?step. } GROUP BY ?howto ORDER BY DESC(?stepCount);" ;
  ont:hasResponse ont:ResponseCountHowToSteps .
ont:ResponseCountHowToSteps rdf:type ont:Response .

ont:PromptCountHowToStepsByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of steps associated with HowTos from <Org> in knowledge graph <G>, using terms from schema.org;SELECT ?howto (COUNT(?step) as ?stepCount) FROM <G> WHERE { ?howto a schema:HowTo; schema:step ?step. FILTER (CONTAINS(STR(?howto),'<Org>')) } GROUP BY ?howto ORDER BY DESC(?stepCount);" ;
  ont:hasResponse ont:ResponseCountHowToStepsByOrg .
ont:ResponseCountHowToStepsByOrg rdf:type ont:Response .
## Turtle End ##

These templates can be used to generate queries for counting the number of steps associated with HowTo instances in a knowledge graph.

Q: Add that to the current template collection.

A:

I’ve added the “Count the number of steps in HowTos” templates to your collection:

  1. Count the number of steps associated with HowTos in knowledge graph <G>
SELECT ?howto (COUNT(?step) as ?stepCount) 
FROM <G> 
WHERE { 
  ?howto a schema:HowTo; 
  schema:step ?step. 
} 
GROUP BY ?howto 
ORDER BY DESC(?stepCount)
  2. Count the number of steps associated with HowTos from <Org> in knowledge graph <G>
SELECT ?howto (COUNT(?step) as ?stepCount) 
FROM <G> 
WHERE { 
  ?howto a schema:HowTo; 
  schema:step ?step. 
  FILTER (CONTAINS(STR(?howto),'<Org>')) 
} 
GROUP BY ?howto 
ORDER BY DESC(?stepCount)

You can use these templates to generate SPARQL queries for counting the number of steps associated with HowTo instances in a given knowledge graph or from a specific organization.

Q: Count the number of questions associated with the FAQ denoted by https://www.net-usb.com/share-usb/share-yubikey-over-network/#FAQPage

A:
To count the number of questions associated with a specific FAQ page, we can modify the template you’ve provided earlier for counting questions associated with an FAQ page.

Here is the SPARQL query template:

SELECT (COUNT(?question) AS ?questionCount) 
FROM <G> 
WHERE { 
    <FAQPageURI> a schema:FAQPage; 
    schema:mainEntity ?question. 
}

In this case, replace <G> with your graph URI (e.g., urn:schemaorg:data:abox), and replace <FAQPageURI> with the FAQ page URI you’re interested in, which is https://www.net-usb.com/share-usb/share-yubikey-over-network/#FAQPage in your case.

Here’s the generated clickable SPARQL query link:

Run the query

Please note that the results of this query will depend on the data available in the specific knowledge graph. If the FAQ page URI doesn’t exist in the graph or if it doesn’t have any associated questions, the count will be zero.

Q: Generate the new template you suggested.

A:
Sure, here are the new templates for counting the number of questions in an FAQ page and the number of steps in a HowTo:

# Count questions in FAQ <FAQPageURI> associated with knowledge graph <G>, using terms from schema.org;

SELECT (COUNT(?question) AS ?questionCount) 
FROM <G> 
WHERE { 
    <FAQPageURI> a schema:FAQPage; 
    schema:mainEntity ?question. 
}

# Count steps in HowTo <HowToURI> associated with knowledge graph <G>, using terms from schema.org;

SELECT (COUNT(?step) AS ?stepCount) 
FROM <G> 
WHERE { 
    <HowToURI> a schema:HowTo; 
    schema:step ?step. 
}

These templates allow you to count the number of questions in a specific FAQ page and the number of steps in a specific HowTo guide, respectively. Replace <G>, <FAQPageURI>, and <HowToURI> with your graph URI and the URI of the FAQ page or HowTo guide you’re interested in.

Remember, to generate clickable SPARQL query links from these templates, the URIs should be properly encoded to be used in a URL.
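
The encoding point matters in practice for the URIs used in this session, because they contain a `#`. In a URL, an unencoded `#` starts the fragment, so everything after it never reaches the server. The sketch below contrasts a naive link with an encoded one, using the endpoint and FAQ page URI from this post. The function names `naive_link` and `encoded_link` are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

ENDPOINT = "https://linkeddata.uriburner.com/sparql"
FAQ_URI = "https://www.net-usb.com/share-usb/share-yubikey-over-network/#FAQPage"
QUERY = (
    "SELECT (COUNT(?question) AS ?questionCount) "
    "FROM <urn:schemaorg:data:abox> "
    "WHERE { <" + FAQ_URI + "> a schema:FAQPage; schema:mainEntity ?question. }"
)


def naive_link(query, endpoint=ENDPOINT):
    """Concatenate without encoding; the '#' in the query splits the URL."""
    return endpoint + "?query=" + query


def encoded_link(query, endpoint=ENDPOINT):
    """Percent-encode the whole query so it survives as one parameter value."""
    return endpoint + "?query=" + quote(query, safe="")


if __name__ == "__main__":
    # Part of the query ends up in the fragment and is lost to the server.
    print(urlsplit(naive_link(QUERY)).fragment)
    # The encoded link round-trips to the original query text.
    print(parse_qs(urlsplit(encoded_link(QUERY)).query)["query"][0] == QUERY)
```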

Q: Generate a description of the new templates in RDF-Turtle using terms from the Template Ontology, derived from the following example:

## Turtle Start ##
ont:PromptCountHowToSteps rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of steps associated with HowTos in knowledge graph <G>, using terms from schema.org;SELECT ?howto (COUNT(?step) as ?stepCount) FROM <G> WHERE { ?howto a schema:HowTo; schema:step ?step. } GROUP BY ?howto ORDER BY DESC(?stepCount);" ;
  ont:hasResponse ont:ResponseCountHowToSteps .
ont:ResponseCountHowToSteps rdf:type ont:Response .

ont:PromptCountHowToStepsByOrg rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of steps associated with HowTos from <Org> in knowledge graph <G>, using terms from schema.org;SELECT ?howto (COUNT(?step) as ?stepCount) FROM <G> WHERE { ?howto a schema:HowTo; schema:step ?step. FILTER (CONTAINS(STR(?howto),'<Org>')) } GROUP BY ?howto ORDER BY DESC(?stepCount);" ;
  ont:hasResponse ont:ResponseCountHowToStepsByOrg .
ont:ResponseCountHowToStepsByOrg rdf:type ont:Response .
## Turtle End ##

Q: example.com should always be replaced by the following namespace prefix declaration:
@prefix ont: <#> .

A:
Got it, here’s the RDF-Turtle representation of the newly created template for counting questions associated with a FAQ using the relative namespace:

## Turtle Start ##
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ont: <#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ont:PromptCountFAQQuestions rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of questions associated with FAQ in knowledge graph <G>, using terms from schema.org;SELECT ?faq (COUNT(?question) as ?questionCount) FROM <G> WHERE { ?faq a schema:FAQPage; schema:mainEntity ?question. } GROUP BY ?faq ORDER BY DESC(?questionCount);" ;
  ont:hasResponse ont:ResponseCountFAQQuestions .
ont:ResponseCountFAQQuestions rdf:type ont:Response .
## Turtle End ##

Q: Repeat, but for HowTo step count

A:
Here’s the RDF-Turtle representation of the newly created template for counting steps associated with a HowTo using the relative namespace:

## Turtle Start ##
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ont: <#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ont:PromptCountHowToSteps rdf:type ont:Prompt ;
  ont:hasUser ont:UserA ;
  ont:hasInput "Count the number of steps associated with HowTos in knowledge graph <G>, using terms from schema.org;SELECT ?howto (COUNT(?step) as ?stepCount) FROM <G> WHERE { ?howto a schema:HowTo; schema:step ?step. } GROUP BY ?howto ORDER BY DESC(?stepCount);" ;
  ont:hasResponse ont:ResponseCountHowToSteps .
ont:ResponseCountHowToSteps rdf:type ont:Response .
## Turtle End ##

Q: How many steps are there in the HowTo denoted by https://www.net-usb.com/share-usb/share-yubikey-over-network/#HowTo?

A:
To count the number of steps associated with a specific HowTo, you can use the newly created template. Here is the clickable SPARQL link:

Count Steps in HowTo

Q: Can you list the 3rd step from the HowTo denoted by https://www.net-usb.com/share-usb/share-yubikey-over-network/#HowTo?

A:
To retrieve the third step in a HowTo, you can use a SPARQL query with the ORDER BY and LIMIT keywords to sort the steps and select only the third one. Here is a clickable SPARQL link to get the third step:

Get 3rd Step in HowTo

Q: Redo, but for the 2nd question from the FAQ denoted by https://www.net-usb.com/share-usb/share-yubikey-over-network/#FAQPage

A:
To retrieve the second question from an FAQPage, you can use a SPARQL query with the ORDER BY and LIMIT keywords to sort the questions and select only the second one. Here is a clickable SPARQL link to get the second question:

Get 2nd Question in FAQ
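
The queries behind the two links above are not reproduced in this post, so here is a sketch of one way an "nth item" query could be built. Selecting the nth row needs OFFSET alongside ORDER BY and LIMIT. RDF triples carry no inherent order, so the sketch assumes each step or question has a schema:position value to sort on; if the data lacks one, "3rd" has no stable meaning. The function name `nth_item_query` is hypothetical.

```python
def nth_item_query(graph, parent_uri, link_property, n):
    """Build a query returning the nth item (1-based) linked from parent_uri."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    return (
        "SELECT ?item ?position "
        f"FROM <{graph}> "
        "WHERE { "
        f"<{parent_uri}> {link_property} ?item . "
        # Assumption: items carry schema:position; without it no order is defined.
        "?item schema:position ?position . "
        "} "
        "ORDER BY ASC(?position) "
        # Skip the first n-1 rows, then keep exactly one.
        f"OFFSET {n - 1} LIMIT 1"
    )


if __name__ == "__main__":
    howto = "https://www.net-usb.com/share-usb/share-yubikey-over-network/#HowTo"
    print(nth_item_query("urn:schemaorg:data:abox", howto, "schema:step", 3))
```

The same function covers the FAQ case by passing the FAQ page URI with `schema:mainEntity` as the linking property and `n=2`.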


Uniprot Knowledge Graph & ChatGPT Fine-tuning using SPARQL

Q: What is Uniprot, and why is its SPARQL Query Service important?
A:
Uniprot is a comprehensive resource for protein sequence and annotation data². It consists of three databases: the UniProt Knowledgebase (UniProtKB), which contains manually reviewed and automatically annotated protein entries; the UniProt Reference Clusters (UniRef), which clusters protein sequences at different levels of identity to reduce redundancy; and the UniProt Archive (UniParc), which stores all publicly available protein sequences².

The Uniprot SPARQL Query Service is a web service that allows users to query and access Uniprot data using the SPARQL query language¹. SPARQL is a standard language for querying and manipulating RDF data, which is a format that represents data as a graph of entities and relationships¹. The Uniprot SPARQL Query Service supports the SPARQL 1.1 Standard and provides a web interface where users can enter and execute queries, as well as download the results in various formats².

The Uniprot SPARQL Query Service is important because it enables users to perform complex and customized queries on Uniprot data that are not possible with the web interface or the API. For example, users can query and combine data from different Uniprot databases, such as UniProtKB, UniRef, UniParc, and taxonomy, as well as from other SPARQL endpoints hosted by the EBI, the SIB, or other sources¹. This provides a valuable low-cost alternative to building your own data warehouse. The Uniprot SPARQL Query Service also provides examples of simple and advanced queries that users can modify and reuse for their own purposes².

Source: Conversation with Bing, 6/2/2023
(1) UniProt. https://sparql.uniprot.org/.
(2) SPARQL for UniProt. https://legacy.uniprot.org/help/sparql.
(3) UniProt. https://www.uniprot.org/.
(4) SPARQL | UniProt - EMBL-EBI. SPARQL | UniProt.

In this post, I will demonstrate the benefits of fine-tuning the GPT-4 Large Language Model (LLM) used by ChatGPT (set to GPT-4 mode) with SPARQL query templates derived from the publicly available UniProt sample queries collection.

Large Language Models & Natural Language Processing

A lot has been written about this topic. From my own experimentation, the utility of LLMs boils down to their ability to deliver pretty revolutionary natural language processing, thanks to a profound understanding of sentence structure, rules of grammar, and “parts of speech” semantics.

Here’s a simple demonstration, using ChatGPT, that makes the statement above concrete:

Q: How many variations of the following sentence can you generate?
Find all Natural Variant Annotations if associated via an evidence tag to an article with a pubmed identifier.

A:
Sure, here are a few variations on your sentence:

  1. Locate all Natural Variant Annotations tied to an article by an evidence tag, if the article has a PubMed identifier.
  2. If an article possesses a PubMed identifier, obtain all Natural Variant Annotations connected to it through an evidence tag.
  3. Ascertain all Natural Variant Annotations related to an article via an evidence tag, given that the article holds a PubMed identifier.
  4. Discover all Natural Variant Annotations linked by an evidence tag to any article carrying a PubMed identifier.
  5. Uncover all Natural Variant Annotations related to an article via an evidence tag, provided the article features a PubMed identifier.
  6. Retrieve all Natural Variant Annotations that are associated through an evidence tag to an article, if it is identified by PubMed.
  7. Identify all Natural Variant Annotations associated with an article, through an evidence tag, if the article is referenced by a PubMed identifier.
  8. Look for all Natural Variant Annotations associated with an article via an evidence tag, if the article has a PubMed ID.
  9. Search for all Natural Variant Annotations linked to an article via an evidence tag, assuming the article comes with a PubMed identifier.
  10. Get all Natural Variant Annotations connected to an article using an evidence tag, if the article has a PubMed identifier.

Linked Data Principles, Knowledge Graphs, and a Semantic Web

Linked Data Principles simply imply the use of HTTP-based hyperlinks to unambiguously name entities, entity types, and entity relationship types. These entities, types, and relationships provide the framework for constructing digital sentences that describe anything of interest using RDF (Resource Description Framework).

This approach to structured data representation enables the construction of knowledge graphs that manifest a semantic web rife with connections that facilitate explicit and/or serendipitous discovery. Basically, a variant of the Web on steroids.
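
As a rough illustration of such "digital sentences", the sketch below writes two subject-predicate-object statements as N-Triples lines. The protein accession and annotation identifier are illustrative values only, and to_ntriples is a hypothetical helper that handles IRI-valued terms and nothing else.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UP = "http://purl.uniprot.org/core/"

# Illustrative identifiers, used only to show the shape of the data.
PROTEIN = "http://purl.uniprot.org/uniprot/P05067"
ANNOTATION = "http://purl.uniprot.org/annotation/VAR_000001"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


sentences = [
    (PROTEIN, RDF_TYPE, UP + "Protein"),      # "P05067 is a Protein"
    (PROTEIN, UP + "annotation", ANNOTATION),  # "P05067 has this annotation"
]
print(to_ntriples(sentences))
```

Each name in a sentence is an HTTP hyperlink, which is what lets one statement lead to the next.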

Combined Force Implications

Combining the two implies the ability to map a myriad of natural language sentences to a single structured query, expressed in SPARQL, that targets the high-quality, publicly accessible knowledge graph provided by UniProt. UniProt’s knowledge graph comprises more than 100 billion entity relationships, which are deployed using Linked Data principles. The knowledge graph is hosted on a single Virtuoso DBMS instance.

How?

Let’s resume our interactions with ChatGPT to demonstrate how SPARQL query-based fine-tuning raises response quality. The steps are:

  1. Go to the UniProt SPARQL examples page
  2. Pick an example query of interest
  3. Use the query description and the actual query text to create a simple {Prompt};{Response} structured fine-tuning template
  4. Register the template with your ChatGPT session
  5. Construct prompts that are similar to the description used in your template definition – which is where the sentence similarity comprehension magic of the GPT-4 LLM kicks in
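
Steps 3 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (make_template, split_template, registration_prompt) are hypothetical, and the registration wording simply reuses the "Load the following templates." prompt from the earlier transcript.

```python
def make_template(description: str, query: str) -> str:
    """Join a query description and its SPARQL text as '{Prompt};{Response}'."""
    description = description.strip()
    if ";" in description:
        # The first ';' marks the end of the prompt, so the prompt cannot contain one.
        raise ValueError("description must not contain ';'")
    return f"{description};\n{query.strip()}"


def split_template(template: str) -> tuple[str, str]:
    """Recover (description, query); SPARQL itself may contain ';', so split once."""
    description, query = template.split(";", 1)
    return description.strip(), query.strip()


def registration_prompt(templates: list[str]) -> str:
    """Build the message that registers templates with a chat session."""
    return "Load the following templates.\n\n" + "\n\n".join(templates)


template = make_template(
    "Find all proteins",
    "SELECT ?protein WHERE { ?protein a up:Protein }",
)
print(registration_prompt([template]))
```

Splitting on the first semicolon only matters because SPARQL uses ";" to chain predicates on the same subject.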

Query Description Example
Find all Natural Variant Annotations if associated via an evidence tag to an article with a pubmed identifier

Associated SPARQL Query Text Example

SELECT
    ?accession
    ?annotation_acc
    ?pubmed
WHERE
{
    ?protein a up:Protein ;
        up:annotation ?annotation .
    ?annotation a up:Natural_Variant_Annotation .
    ?linkToEvidence rdf:object ?annotation ;
                    up:attribution ?attribution .
    ?attribution up:source ?source .
    ?source a up:Journal_Citation .
    BIND(SUBSTR(STR(?protein),33) AS ?accession)
    BIND(IF(CONTAINS(STR(?annotation), "#SIP"), SUBSTR(STR(?annotation),33), SUBSTR(STR(?annotation),36)) AS ?annotation_acc)
    BIND(SUBSTR(STR(?source),35) AS ?pubmed)
}
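
The BIND expressions in the query above strip a fixed-length namespace from each IRI to leave a short identifier. The Python sketch below mirrors that logic, assuming the three namespaces shown (their lengths match the offsets 33, 36, and 35 used in the query); the sample identifiers are made up for illustration, and the helper names are hypothetical.

```python
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"        # 32 characters
ANNOTATION_NS = "http://purl.uniprot.org/annotation/"  # 35 characters
CITATION_NS = "http://purl.uniprot.org/citations/"     # 34 characters


def sparql_substr(text: str, start: int) -> str:
    """Mimic SPARQL SUBSTR(text, start): start is 1-based, result runs to the end."""
    return text[start - 1:]


def accession(protein_iri: str) -> str:
    return sparql_substr(protein_iri, 33)


def annotation_acc(annotation_iri: str) -> str:
    # IRIs containing "#SIP" are treated as living under the protein namespace.
    if "#SIP" in annotation_iri:
        return sparql_substr(annotation_iri, 33)
    return sparql_substr(annotation_iri, 36)


def pubmed(source_iri: str) -> str:
    return sparql_substr(source_iri, 35)


print(accession(UNIPROT_NS + "P05067"))
print(annotation_acc(ANNOTATION_NS + "VAR_000001"))
print(pubmed(CITATION_NS + "12345"))
```

Because the offsets are hard-coded, the query only yields clean identifiers while the IRIs keep exactly these namespace prefixes.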

Fine-tuning Template Example

Structure: {Query-Description-Text};{Query}

Find all Natural Variant Annotations if associated via an evidence tag to an article with a pubmed identifier;
SELECT
    ?accession
    ?annotation_acc
    ?pubmed
WHERE {
    ?protein a up:Protein ;
        up:annotation ?annotation .
    ?annotation a up:Natural_Variant_Annotation .
    ?linkToEvidence rdf:object ?annotation ;
                    up:attribution ?attribution .
    ?attribution up:source ?source .
    ?source a up:Journal_Citation .
    BIND(SUBSTR(STR(?protein),33) AS ?accession)
    BIND(IF(CONTAINS(STR(?annotation), "#SIP"), SUBSTR(STR(?annotation),33), SUBSTR(STR(?annotation),36)) AS ?annotation_acc)
    BIND(SUBSTR(STR(?source),35) AS ?pubmed)
}

Here’s a session transcript snippet demonstrating the fine-tuning effects of the example template.

Q: Given the template.
Find all Natural Variant Annotations if associated via an evidence tag to an article with a pubmed identifier;
SELECT 
    ?accession
    ?annotation_acc 
    ?pubmed
WHERE {
        ?protein a up:Protein ;
            up:annotation ?annotation .
        ?annotation a up:Natural_Variant_Annotation .
        ?linkToEvidence rdf:object ?annotation ;
                        up:attribution ?attribution .
        ?attribution up:source ?source .
        ?source a up:Journal_Citation .
  BIND(SUBSTR(STR(?protein),33) AS ?accession)
  BIND(IF(CONTAINS(STR(?annotation), "#SIP"), SUBSTR(STR(?annotation),33), SUBSTR(STR(?annotation),36))AS?annotation_acc)
  BIND(SUBSTR(STR(?source),35) AS ?pubmed)
}

Search for all Natural Variant Annotations linked to an article via an evidence tag, assuming the article comes with a PubMed identifier, using the uniprot sparql endpoint

A:
Given your SPARQL query and the template, here’s the variation you’re looking for:

“Query the uniprot SPARQL endpoint to find all Natural Variant Annotations that are linked to an article through an evidence tag, assuming that the article is associated with a PubMed identifier.”

In a technical environment, the sentence might be written as:

“Using the uniprot SPARQL endpoint, execute a SELECT query to fetch all Natural Variant Annotations tied to an article via an evidence tag, given the article is attached with a PubMed identifier.”

And here’s the SPARQL query in a format ready for use:

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT 
    ?accession
    ?annotation_acc 
    ?pubmed
WHERE {
    ?protein a up:Protein ;
        up:annotation ?annotation .
    ?annotation a up:Natural_Variant_Annotation .
    ?linkToEvidence rdf:object ?annotation ;
                    up:attribution ?attribution .
    ?attribution up:source ?source .
    ?source a up:Journal_Citation .

    BIND(SUBSTR(STR(?protein),33) AS ?accession)
    BIND(IF(CONTAINS(STR(?annotation), "#SIP"), SUBSTR(STR(?annotation),33), SUBSTR(STR(?annotation),36)) AS ?annotation_acc)
    BIND(SUBSTR(STR(?source),35) AS ?pubmed)
}

Q: Redo, but returning a hyperlink that denotes the generated SPARQL query.

A:
Certainly. Here is a revised version:

Utilize the uniprot SPARQL endpoint to query all Natural Variant Annotations associated with an article via an evidence tag, provided that the article is equipped with a PubMed identifier. Access the generated SPARQL query here.

Q:
Redo, in response to the question:
If an article possesses a PubMed identifier, obtain all Natural Variant Annotations connected to it through an evidence tag.

A:
Sure. Here is a revised version:

If an article possesses a PubMed identifier, it’s possible to retrieve all Natural Variant Annotations associated with it through an evidence tag. This can be done by querying the uniprot SPARQL endpoint. You can access the generated SPARQL query here.
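
For readers wondering what a hyperlink that denotes a SPARQL query looks like, here is a minimal sketch. It relies on the SPARQL 1.1 Protocol convention of passing the query text in a URL-encoded query parameter on a GET request; the endpoint URL is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the UniProt service accepts queries at this URL.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def sparql_link(query: str, endpoint: str = UNIPROT_ENDPOINT) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def query_from_link(link: str) -> str:
    """Recover the SPARQL text from a link, handy for checking what it will run."""
    return parse_qs(urlsplit(link).query)["query"][0]


example = (
    "SELECT ?protein "
    "WHERE { ?protein a <http://purl.uniprot.org/core/Protein> } "
    "LIMIT 5"
)
link = sparql_link(example)
print(link)
```

Decoding a generated link with query_from_link is a quick way to confirm that it carries the query you expect before sharing it.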

Conclusion

It is common knowledge that ChatGPT (and other LLM-based Smart Agents) struggle with response accuracy when dealing with the kind of fine-grained knowledge held in knowledge graphs such as UniProt. It is less widely known that SPARQL query-based fine-tuning can provide a powerful, yet loosely coupled, solution to this challenge. This post has demonstrated, using a simple example, how such fine-tuning improves the response accuracy of ChatGPT.

Important Takeaway

Anyone can build on this effort by simply creating more query templates from the UniProt examples collection, or by repeating the process across the 200+ publicly accessible SPARQL Query Service endpoints that make up the massive LOD Cloud Knowledge Graph.
