Web crawling is an exciting feature that is now being incorporated into both Google Bard and OpenAI’s ChatGPT services. In this post, I use a challenging crawl to gauge the current state of affairs. In particular, I test my ongoing thesis that combining the GPT (4 and 3.5) LLMs with SPARQL-based fine-tuning is a potent way to explore the Semantic Web, especially when facilitated by Virtuoso’s Sponger Middleware module.
Before proceeding, here is some vital background information about SPARQL and the Virtuoso Sponger.
SPARQL
SPARQL is a combination of a declarative query language, an HTTP-based wire protocol, and query solution serialization formats. A SPARQL query solution can be represented as a document identified by a Web Address (URL), with content negotiated by an HTTP client.
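To make the protocol side of this concrete, here is a minimal Python sketch of how a client might prepare a SPARQL request. It assumes a hypothetical endpoint at `https://example.org/sparql` and a placeholder query; the `query` parameter and the `application/sparql-results+json` media type come from the SPARQL 1.1 specifications. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(query: str,
                         accept: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # The SPARQL 1.1 Protocol carries the query text in the `query` parameter,
    # so the resulting URL identifies the query solution document.
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    # The Accept header is how the HTTP client negotiates the solution format.
    return Request(url, headers={"Accept": accept})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.get_method())
print(request.full_url)
print(request.get_header("Accept"))
```

Changing the `accept` argument (for example to `text/csv`) asks for the same solution in a different serialization, which is the content negotiation described above.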
Fine-tuning ChatGPT using SPARQL Query Templates
Here’s the template that’s ultimately invoked for the Virtuoso Sponger Crawling exercise showcased later on in this post.
Count FAQs associated with knowledge graph <G>, using terms from schema.org;SELECT COUNT(*) FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. ?question (schema:name|schema:title) ?name; (schema:acceptedAnswer|schema:suggestedAnswer) ?answer. ?answer schema:text ?answerText.} ;
Count FAQs from <Org> associated with knowledge graph <G>, using terms from schema.org;SELECT COUNT(*) FROM <G> WHERE { ?page a schema:FAQPage; schema:mainEntity ?question. ?question (schema:name|schema:title) ?name; (schema:acceptedAnswer|schema:suggestedAnswer) ?answer. ?answer schema:text ?answerText. FILTER (CONTAINS(STR(?page),'<Org>'))} ;
This template can be loaded using prompts such as:
- Remember the following …
- Load the following template
Note: in my recent experiments, the second prompt style isn’t always understood.
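For readers who want to see what filling in such a template amounts to, here is a rough Python sketch of the substitution ChatGPT is expected to perform on the second template. The function name `fill_template` is hypothetical, and the reading of `<G>` as a bracketed graph IRI and `<Org>` as plain text inside a string literal is my assumption, based on the query shown later in this post.

```python
# Query text copied from the second template; <G> and <Org> are its placeholders.
FAQ_COUNT_TEMPLATE = (
    "SELECT COUNT(*) FROM <G> WHERE { "
    "?page a schema:FAQPage; schema:mainEntity ?question. "
    "?question (schema:name|schema:title) ?name; "
    "(schema:acceptedAnswer|schema:suggestedAnswer) ?answer. "
    "?answer schema:text ?answerText. "
    "FILTER (CONTAINS(STR(?page),'<Org>'))}"
)


def fill_template(template: str, graph: str, org: str) -> str:
    """Substitute a graph IRI and an organization string into the template."""
    # Assumption: <G> stands for the IRI itself, so the angle brackets are kept,
    # while <Org> sits inside a string literal, so only the bare text is inserted.
    # This is plain string replacement; it does not escape quotes in `org`.
    return template.replace("<G>", f"<{graph}>").replace("'<Org>'", f"'{org}'")


query = fill_template(
    FAQ_COUNT_TEMPLATE,
    "https://www.apple.com/shop/buy-watch/apple-watch-ultra"
    "?preSelect=false&product=MQF33LL/A&step=detail",
    "apple.com",
)
print(query)
```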
Virtuoso Sponger
Virtuoso Sponger is a middleware layer hosted by a DBMS that allows deep integration of HTTP functionality into the production pipeline of SPARQL query solutions. This enables comprehensive data access by reference, both for constants and variables, within a SPARQL query. It can crawl the Web (including private intranets) and the Semantic Web, where links represent relationship types that resolve to machine-computable entity relationship type semantics, as part of the query solution production pipeline.
Web Page Crawling Exercise
As noted in numerous past posts about the increasing occurrence of RDF-based structured data islands on web pages, Apple has consistently led the way in this endeavor, as evidenced by the structured data islands integrated into its product pages.
For this exercise, I will use a page about the Apple Watch Ultra as the test case, where I aim to determine the number of embedded FAQ questions.
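As a rough illustration of what "embedded FAQ questions" means at the markup level, the Python sketch below counts `mainEntity` entries in a schema.org `FAQPage` island. It assumes the island is serialized as JSON-LD inside a `<script type="application/ld+json">` element; the sample HTML, the class name, and the function name are all made up for illustration and are not taken from Apple's page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_island = False
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_island = True
            self.islands.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append rather than assign.
        if self._in_island:
            self.islands[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_island = False


def count_faq_questions(html: str) -> int:
    """Count mainEntity entries across all FAQPage islands in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    total = 0
    for text in collector.islands:
        doc = json.loads(text)
        if isinstance(doc, dict) and doc.get("@type") == "FAQPage":
            entities = doc.get("mainEntity", [])
            total += len(entities) if isinstance(entities, list) else 1
    return total


# Made-up sample page with two questions.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [
  {"@type": "Question", "name": "Example question one?",
   "acceptedAnswer": {"@type": "Answer", "text": "Example answer one."}},
  {"@type": "Question", "name": "Example question two?",
   "acceptedAnswer": {"@type": "Answer", "text": "Example answer two."}}
 ]}
</script>
</head><body></body></html>
"""

print(count_faq_questions(SAMPLE_HTML))
```

A crawler that only reads the rendered text of a page never sees this island, which is one plausible reason the answers below differ.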
What follows are the results for Bard, Bing + ChatGPT, ChatGPT + Web Plugin, and ChatGPT + Virtuoso Sponger.
Bard
Q: How many FAQ questions are there in the following page
https://www.apple.com/shop/buy-watch/apple-watch-ultra?preSelect=false&product=MQF33LL/A&step=detail
?
A:
The page you linked does not have any FAQs.
Bard Response Screenshot
Bing + ChatGPT
Q: How many FAQ questions are there in the following page
https://www.apple.com/shop/buy-watch/apple-watch-ultra?preSelect=false&product=MQF33LL/A&step=detail
?
A:
I’m sorry but I couldn’t find the exact number of FAQ questions on the page you provided. However, I can tell you that there is a “Frequently Asked Questions” section on the page that contains several questions and answers about the Apple Watch Ultra¹. You can also find more information about the Apple Watch Ultra on the official Apple website⁵.
Is there anything else I can help you with?
Source: Conversation with Bing, 5/17/2023
(1) Frequently asked questions about the Apple Watch Ultra. Frequently asked questions Apple Watch Ultra.
(2) Apple Watch - Official Apple Support. Apple Watch - Official Apple Support.
(3) 5 Reasons the Apple Watch Ultra Isn’t a Great Choice for Most People - MUO. https://www.makeuseof.com/apple-watch-ultra-isnt-for-most-people/.
(4) I wore the Apple Watch Ultra for 6 months - Tom’s Guide. I wore the Apple Watch Ultra for 6 months — what I like (and don't like) | Tom's Guide.
(5) Apple Watch Ultra User Guide - Apple Support. Apple Watch User Guide - Apple Support.
Bing+GPT Response Screenshot
ChatGPT + Web Plugin
Q: How many FAQ questions are there in the following page
https://www.apple.com/shop/buy-watch/apple-watch-ultra?preSelect=false&product=MQF33LL/A&step=detail
?
A:
The page you provided contains a total of 8 Frequently Asked Questions (FAQs). Here they are:
- How do I find my band size?
- How do I customize my Apple Watch?
- What’s the difference between GPS and GPS + Cellular?
- Which carriers support Apple Watch cellular service?
- Which iPhone models are compatible with Apple Watch?
- Is it easy to transfer my data and set up my new Apple Watch?
- How can I pay using Apple Card Monthly Installments?
- How does Apple Trade In work?
ChatGPT + Web Plugin Screenshot
ChatGPT + Virtuoso Sponger
Q: How many FAQ questions are there in the following page
https://www.apple.com/shop/buy-watch/apple-watch-ultra?preSelect=false&product=MQF33LL/A&step=detail
?
A:
Sure, let’s construct the SPARQL query using the template:
DEFINE get:soft "soft"
SELECT (COUNT(?question) AS ?questionCount)
FROM <https://www.apple.com/shop/buy-watch/apple-watch-ultra?preSelect=false&product=MQF33LL/A&step=detail>
WHERE {
?page a schema:FAQPage;
schema:mainEntity ?question.
}
And here is the clickable SPARQL link:
Virtuoso SPARQL Query Solution Page Screenshot
In addition, here is a Faceted Search & Browsing Page link that shows details of the FAQ Questions used in this exercise.
Virtuoso Entity Description Page for Apple Watch Ultra
Virtuoso Entity Description Page for Apple Watch Ultra Screenshot
This page comprises entries for each of the ten FAQ questions.
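For reference, a clickable SPARQL link of the kind mentioned in the response is essentially the query text percent-encoded into an endpoint URL. The Python sketch below shows one way such a link could be assembled. The endpoint `https://example.org/sparql` is a hypothetical stand-in (the actual endpoint is not shown in this post), and the query is the one from the response, which relies on the endpoint already knowing the `schema:` prefix.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/sparql"
PAGE = ("https://www.apple.com/shop/buy-watch/apple-watch-ultra"
        "?preSelect=false&product=MQF33LL/A&step=detail")

# Query text as produced in the response above, with the page URL as the graph IRI.
QUERY = f'''DEFINE get:soft "soft"
SELECT (COUNT(?question) AS ?questionCount)
FROM <{PAGE}>
WHERE {{
  ?page a schema:FAQPage;
        schema:mainEntity ?question.
}}'''

# The page URL carries its own "?" and "&" characters, so the whole query must
# be percent-encoded before it can travel as a single `query` parameter.
link = f"{ENDPOINT}?{urlencode({'query': QUERY})}"

# Decoding the link recovers the query unchanged.
recovered = parse_qs(urlsplit(link).query)["query"][0]
print(link)
print(recovered == QUERY)
```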
Tools Used
Virtuoso Sponger Middleware
- What is the Virtuoso Sponger Middleware about, and why is it important?
- What is a Virtuoso SPARQL Endpoint, and why is it important?
Combined Multi-Model Database & HTTP Server Technology
OpenLink Structured Data Sniffer – Multi-Browser Extension
- Google Chrome Store for Chromium-based Browsers
- Mozilla Store for Firefox
- Mac App Store for Safari
- Github Open Source Repository
Related
- Using SPARQL Query Templates to Fine-Tune ChatGPT’s Large Language Model (LLM) - OpenLink Software Community
- GitHub Repo – Collection of SPARQL Query Templates for Fine-tuning ChatGPT
- What is Small Data, and why is it Important?