Introduction
The Wikidata Snapshot (Virtuoso PAGO) starts as a static instance, preloaded with the Wikidata 2022-12
dataset dump, mirroring the public Wikidata instance found at our live Wikidata SPARQL Query Service Endpoint.
This type of AMI provides several fundamental benefits, including:
- Virtuoso DBMS Server is preinstalled with basic tuning for the host operating system. (That said, since we support many AMI machine types/sizes, you should still tune the configuration to suit the available RAM in your instance.)
- Wikidata Dataset is preloaded and preconfigured (and may be configurable to auto-update).
- You can start and stop the Wikidata instance without having to terminate its host AMI.
- With the hourly model, you pay only for the time the AMI is used.
Who is this for?
- Anyone seeking a preloaded and pre-configured Wikidata instance for personal or service-specific use
- Anyone seeking SQL and GraphQL access to Wikidata
- Anyone seeking ODBC or JDBC access to Wikidata via client productivity tools and development environments that support those data access protocols
Usage
Prerequisites
- An Amazon Web Services (AWS) account.
- Recently created AWS accounts are automatically signed up for the Amazon S3 and EC2 web services. If you created your AWS account a long time ago, you may need to sign up for these services manually.
- Ensure the instance uses an AWS security group that allows access to ports 22 (standard SSH), 80 (standard HTTP), and 8890 (Virtuoso HTTP-based Admin). (This is the setup of the AMI offerings.)
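For readers who manage security groups from the command line, the port requirements above could be expressed with the AWS CLI roughly as follows. This is a minimal sketch, not part of the AMI: the function only prints the commands so they can be reviewed first, and the security group ID and source CIDR are hypothetical placeholders.

```shell
# Print (not run) the AWS CLI commands that would open the ports this AMI uses.
# The group ID and source CIDR are hypothetical placeholders supplied by the caller.
print_ingress_commands() {
    group_id="$1"   # e.g. sg-0123456789abcdef0 (placeholder)
    cidr="$2"       # e.g. 203.0.113.10/32 (placeholder; prefer your own address range)
    for port in 22 80 8890; do
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
    done
}

# Usage: review the output, then run the lines in a shell with AWS credentials configured.
# print_ingress_commands sg-0123456789abcdef0 203.0.113.10/32
```

Restricting the CIDR to your own address range, rather than opening the ports to everyone, limits who can reach SSH and the Virtuoso admin port.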
Instantiating Wikidata Snapshot (Virtuoso PAGO) via Web Interface
- Locate the Wikidata Snapshot (Virtuoso PAGO) image in AWS Marketplace and click the Continue to Subscribe button.
- Click on the Accept Terms button.
- Click on the Continue to Configuration button.
- Select the Region where the AMI should be deployed and click on the Continue to Launch button.
- Select the EC2 Instance Type, Security Group Settings, and Key Pair Settings the AMI should be started with, and click on the Launch button.
- The deployment is now complete. Click on the EC2 Console link to view the launched instance in the AWS EC2 console.
- From the EC2 Console, note the Public IP address of the instance for accessing it via SSH, HTTP, etc.
First-time Setup & Usage Notes
The steps in this section are only necessary the first time you start the Virtuoso instances on the AMI. They are not needed after later AMI reboots.
There are two Virtuoso instances in this AMI. One comes up quickly, with no significant content, so you know the AMI is basically functional. The other holds the full Wikidata dataset and takes significant time to start, due to some Amazon requirements for such AWS instances.
Wikidata Instance
- ssh into your instantiated AMI using a command of the form:
ssh -i {secure-pem-file} ubuntu@{amazon-ec2-dns-name-or-ip-address}
- The Virtuoso DBMS Server for the Basic Instance will have started with the AMI. You can verify this with:
ps -ef | grep "virt*" | grep -v grep
- If you do not see a running instance, execute the following commands, and then repeat the command above.
sudo service virtuoso status
sudo service virtuoso start
sudo service virtuoso status
- We strongly recommend you now use the Conductor to change the password for the dba user from its initial value, the AMI instance-id.
- Retrieve the AMI instance-id by either checking the AMI properties presented by the Amazon AWS console UI, or executing the following command in the Linux shell:
curl http://169.254.169.254/latest/meta-data/instance-id
- Load the Conductor interface.
http://{amazon-ec2-ami-dns-name-or-ip-address}/conductor
- If you get any error at this point, execute the following commands, and then re-try loading the Conductor in your web browser.
sudo service virtuoso start
sudo service virtuoso status
- At the authentication challenge, log in as the dba user, with the AMI instance-id as the password. Note: If you are unable to connect to the Virtuoso server using the instance-id as the password, please register with our Support Site and create a Support Case for fastest assistance.
- Drill down to System Admin → User Accounts.
- Locate the dba user, and click the associated Edit link.
- The form allows many things to be changed. For now, just input your desired password into both the Password and Confirm Password boxes, and click the Save button.
- You can now perform other administrative tasks through the Conductor interface, or return to basic use.
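The instance-id lookup in the steps above uses the unauthenticated form of the EC2 metadata request. On instances where the metadata service is set to require session tokens (IMDSv2), that request is rejected and a token has to be fetched first. A minimal sketch of the token-based variant, assuming curl is available on the instance; the function names are illustrative and not part of the AMI.

```shell
# Fetch the EC2 instance-id via IMDSv2 (token-based metadata access).
# Only works when run on the EC2 instance itself.
get_instance_id() {
    token="$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 300")" || return 1
    curl -s -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/instance-id"
}

# Sanity-check that a string has the shape of an instance-id:
# "i-" followed by 8 or 17 lowercase hexadecimal characters.
looks_like_instance_id() {
    printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Usage (on the instance):
# id="$(get_instance_id)" && looks_like_instance_id "$id" && echo "initial dba password: $id"
```

The format check guards against pasting an error page or an empty string into the Conductor login form.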
Wikidata Snapshot (Virtuoso PAGO) Database Interaction via Web Interface
Once online, your Wikidata Snapshot instance will be ready for use from:
- Faceted Browsing Endpoint
http://{amazon-ec2-ami-dns-name-or-ip-address}/fct
- Advanced Faceted Browsing Page
http://{amazon-ec2-ami-dns-name-or-ip-address}/describe/?uri=http://dbpedia.org/resource/DBpedia
- SPARQL Query Service Endpoint
http://{amazon-ec2-ami-dns-name-or-ip-address}/sparql
- Virtuoso Instance Administration Page (Virtuoso Conductor)
http://{amazon-ec2-ami-dns-name-or-ip-address}/conductor
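The SPARQL endpoint listed above can also be queried from a command line. The following is a minimal sketch, assuming curl is installed on the client; the host value is a placeholder for your instance's public DNS name or IP address, the query is an arbitrary example, and the function names are illustrative.

```shell
# Build the URL of one of the service paths listed above for a given host.
endpoint_url() {
    host="$1"   # public DNS name or IP address of the instance
    path="$2"   # one of: fct, sparql, conductor
    echo "http://$host/$path"
}

# Send a SPARQL query over HTTP GET and request JSON results.
run_sparql() {
    curl -s -G "$(endpoint_url "$1" sparql)" \
        -H "Accept: application/sparql-results+json" \
        --data-urlencode "query=$2"
}

# Usage (example query: fetch any five triples):
# run_sparql 54.221.25.206 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'
```

Using --data-urlencode leaves the escaping of spaces, braces, and question marks in the query to curl.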
Administering the Virtuoso Instance via SSH
- Make an ssh connection to the VM using the private key file (pem-file) and username (ubuntu by default) chosen when creating the deployment, and the Public IP address noted earlier from the EC2 Console, as follows:
ssh -i {pem-file} ubuntu@{Public IP address}
- Once connected, it is strongly recommended to update the VM to get the latest operating system and Virtuoso updates with the commands:
sudo apt-get update
sudo apt-get upgrade
- Check the Virtuoso server is automatically started post deployment with the command:
sudo service virtuoso status
- The following commands can be used to administer the Virtuoso server:
- Start the Virtuoso Server:
sudo service virtuoso start
- Stop the Virtuoso Server:
sudo service virtuoso stop
- Restart the Virtuoso Server:
sudo service virtuoso restart
- Check status of Virtuoso Server:
sudo service virtuoso status
- Determine the initial password set for the dba user with the command:
sudo cat /opt/virtuoso/database/.initial-password
- A SQL connection can then be made to Virtuoso on port 1111 with the isql command line tool:
isql 1111
- Typical output from running these steps is:
$ ssh -i certificates/virtuoso.pem ubuntu@54.221.25.206
The authenticity of host '54.221.25.206 (54.221.25.206)' can't be established.
ECDSA key fingerprint is SHA256:QGsOFcQoa4x5DBavtdHWDQUUQtBdHJ/OkizKep8UOcM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.221.25.206' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1025-aws x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Fri Jan 29 12:41:03 UTC 2021
System load: 0.0 Processes: 104
Usage of /: 2.0% of 116.27GB Users logged in: 0
Memory usage: 4% IP address for eth0: 10.0.0.214
Swap usage: 0%
* Canonical Livepatch is available for installation.
- Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
9 packages can be updated.
0 updates are security updates.
Last login: Tue Sep 22 19:26:19 2020 from 108.26.205.225
ubuntu@ip-10-0-0-214:~$ cd /opt/virtuoso/database
ubuntu@ip-10-0-0-214:/opt/virtuoso/database$ sudo bash
root@ip-10-0-0-214:/opt/virtuoso/database# cat .initial-password
i-0343ad51fe5e4f196
root@ip-10-0-0-214:/opt/virtuoso/database# service virtuoso status
● virtuoso.service - OpenLink Virtuoso Database
Loaded: loaded (/lib/systemd/system/virtuoso.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-29 12:04:31 UTC; 38min ago
Process: 878 ExecStart=/opt/virtuoso/bin/virtuoso-start.sh $VIRTUOSO_DB_NAMES (code=exited, status=0/SUC
Main PID: 1170 (virtuoso)
Tasks: 15 (limit: 4915)
CGroup: /system.slice/virtuoso.service
└─1170 ./virtuoso
Jan 29 12:04:25 ip-10-0-0-214 systemd[1]: Starting OpenLink Virtuoso Database...
Jan 29 12:04:26 ip-10-0-0-214 virtuoso-start.sh[878]: Starting Virtuoso instance in [database]
Jan 29 12:04:26 ip-10-0-0-214 virtuoso-start.sh[878]: - Starting the database
Jan 29 12:04:31 ip-10-0-0-214 systemd[1]: Started OpenLink Virtuoso Database.
root@ip-10-0-0-214:/opt/virtuoso/database# /opt/virtuoso/bin/isql 1111
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 08.03.3323 as of Apr 19 2022
Type HELP; for help and EXIT; to exit.
Enter password for dba :
Connected to OpenLink Virtuoso
Driver: 08.03.3323 OpenLink Virtuoso ODBC Driver
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________
OpenLink Virtuoso VDB Server
Version 08.03.3323-pthreads for Linux as of Apr 19 2022
Started on: 2021-01-29 12:45 GMT+0
CPU: 0.05% RSS: 148MB PF: 0
Database Status:
File size 67108864, 8192 pages, 5733 free.
20000 buffers, 1115 used, 85 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
Disk Usage: 1074 reads avg 0 msec, 0% r 0% w last 23 s, 138 writes flush 0 MB/s,
34 read ahead, batch = 17. Autocompact 0 in 0 out, 0% saved.
Gate: 166 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = virtuoso.trx, 8325 bytes
VDB: 0 exec 0 fetch 0 transact 0 error
2309 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1 connects, max 1 concurrent
RPC: 6 calls, 1 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 0M large, 10M max
Checkpoint Remap 38 pages, 0 mapped back. 0 s atomic time.
DB master 8192 total 5733 free 38 remap 1 mapped back
temp 256 total 251 free
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
Currently 1 threads running 0 threads waiting 0 threads in vdb.
24 Rows. -- 2 msec.
SQL>
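The interactive session above can also be run non-interactively, which is convenient in scripts. A minimal sketch, assuming the isql binary at the path shown in the session and the exec= argument that isql accepts for passing a statement on the command line; the run_isql name and the ISQL override variable are illustrative. Note that a password passed as an argument is visible in the process list while the command runs.

```shell
# Path to the Virtuoso isql client; can be overridden for a dry run (e.g. ISQL=echo).
ISQL="${ISQL:-/opt/virtuoso/bin/isql}"

# Run one SQL statement against the local server on port 1111 as the dba user.
# Arguments: dba password, SQL statement.
run_isql() {
    "$ISQL" 1111 dba "$1" exec="$2"
}

# Usage (on the instance, with your own dba password):
# run_isql 'your-dba-password' "status('');"
```

Setting ISQL=echo before calling run_isql prints the argument list instead of connecting, which is a quick way to check quoting.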
Performance Notes
A range of AWS VM instance types is available, with different combinations of system memory and CPU cores. Both factors affect the performance of your Virtuoso instance, so use an AWS VM Instance Type with more memory and CPU cores for best performance.
Note: This VM is configured to use minimal system memory. For the instance type chosen, the NumberOfBuffers and MaxDirtyBuffers parameters in the /opt/virtuoso/database/virtuoso.ini configuration file should be increased to match the available memory, as detailed in the Virtuoso Performance Tuning Guide, for example:
VM Instance Type | System RAM | NumberOfBuffers | MaxDirtyBuffers |
---|---|---|---|
m5.large | 8 GB | 680000 | 500000 |
m5.xlarge | 16 GB | 1360000 | 1000000 |
m5.2xlarge | 32 GB | 2720000 | 2000000 |
m5.4xlarge | 64 GB | 5450000 | 4000000 |
r6i.4xlarge | 128 GB | 10900000 | 8000000 |
Then restart the Virtuoso server as detailed above.
Extrapolate the NumberOfBuffers and MaxDirtyBuffers parameters accordingly for different sized VMs.
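One way to extrapolate is to scale the 8 GB row of the table linearly, which works out to 85000 buffers and 62500 dirty buffers per GB of RAM. The sketch below does only that arithmetic. The linear scaling and the function name are assumptions, and the larger rows of the table are rounded slightly differently (for example 5450000 rather than 5440000 at 64 GB), so treat the output as a starting point.

```shell
# Estimate NumberOfBuffers and MaxDirtyBuffers for a given amount of RAM (whole GB),
# by linear extrapolation from the 8 GB row of the table
# (680000 and 500000, i.e. 85000 and 62500 per GB). The scaling is an assumption.
suggest_buffers() {
    ram_gb="$1"
    echo "NumberOfBuffers = $((ram_gb * 85000))"
    echo "MaxDirtyBuffers = $((ram_gb * 62500))"
}

# Usage: print suggested values for a 16 GB instance.
# suggest_buffers 16
```

The two printed lines use the same key names as the virtuoso.ini parameters mentioned above, so they can be compared directly against the existing settings.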
Troubleshooting
If the Virtuoso server fails to start:
- Run the command sudo service virtuoso status to see if the Virtuoso server is running
- Check the /opt/virtuoso/database/virtuoso.log file to see why the server might have failed to start
- Ensure the file /opt/virtuoso/database/virtuoso.lck does not exist before starting the server
- Attempt to start the Virtuoso server with the command sudo service virtuoso start
- Run the command sudo service virtuoso status again to see if the Virtuoso server is running
- If it is now running, attempt to connect via the SQL or HTTP interfaces as detailed above
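The log and lock-file checks above can be wrapped in two small helpers. This is a minimal sketch using the paths given in this section; the function names are illustrative, and the lock-file helper only reports what it finds rather than deleting anything.

```shell
# Report whether a Virtuoso lock file is present in the given database directory.
# Returns non-zero when the lock file exists, so it can gate a start attempt.
check_lock_file() {
    db_dir="${1:-/opt/virtuoso/database}"
    if [ -e "$db_dir/virtuoso.lck" ]; then
        echo "lock file present: $db_dir/virtuoso.lck"
        return 1
    fi
    echo "no lock file in $db_dir"
}

# Show the last lines of the server log (default: 20 lines) to look for a startup error.
show_recent_log() {
    tail -n "${2:-20}" "${1:-/opt/virtuoso/database}/virtuoso.log"
}

# Usage (on the instance):
# sudo service virtuoso status || { show_recent_log; check_lock_file && sudo service virtuoso start; }
```

Before removing a reported lock file by hand, confirm that no Virtuoso process is still running, for example with the ps command shown earlier.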
Related Items
- DBpedia Snapshot (Virtuoso PAGO) EBS-backed EC2 AMI
- DBpedia Snapshot (Virtuoso PAGO) for Microsoft Azure Cloud
- Additional Amazon AWS related documentation
- Virtuoso Pay As You Go (PAGO) EBS-backed EC2 AMI
- Instance-backed Virtuoso EC2 AMI
- Creating Your Own Neurocommons Instance
- Creating Your Own Bio2RDF Instance
- Creating Your Own MusicBrainz Instance
- Backup your Virtuoso EC2 AMI to S3
- Configure your Virtuoso EC2 AMI for use with Amazon Elastic Block Storage (EBS)
- Amazon-provided AWS Simple Monthly Cost Calculator
- Protecting your Virtuoso-hosted SPARQL Endpoint
- Virtuoso documentation
- Virtuoso Tips and Tricks