DBpedia Live (Virtuoso PAGO) EBS-backed EC2 AMI

Introduction

In addition to the Instance-backed EC2 AMI that has been available since 2008, a standard unpopulated Virtuoso instance is available as an EBS-backed EC2 AMI on either a BYOL (Bring Your Own License) or a PAGO (Pay As You Go) basis. In each case, the AMI delivers a preconfigured Virtuoso instance.

We also now offer two PAGO variants, each pre-loaded with a DBpedia dataset.

  • The DBpedia Live (Virtuoso PAGO) (documented on this page) starts as a static instance, preloaded with the DBpedia 2016-04 dataset, and includes an optional switch that enables data updates based on the Wikipedia firehose, effectively giving you a mirror of the public DBpedia-Live instance found at http://live.dbpedia.org/sparql.

  • The DBpedia Snapshot (Virtuoso PAGO) (documented on another page) starts as a static instance, preloaded with the DBpedia 2016-10 dataset, mirroring the public DBpedia instance found at http://dbpedia.org/sparql. You can make changes to this data, but it will not track changes made to Wikipedia or DBpedia-Live.

This type of AMI provides several fundamental benefits including —

  • Virtuoso DBMS Server is preinstalled with basic tuning for the host operating system. (That said, since we support many AMI machine types/sizes, you should still tune the configuration to suit the available RAM in your instance.)
  • The DBpedia dataset is preloaded and preconfigured (and can be configured to auto-update).
  • You can start and stop the DBpedia instance without having to terminate its host AMI.
  • With the hourly model, you pay only for the time the AMI is used.

Prerequisites

  • An Amazon Web Services (AWS) account.
  • Recently created AWS accounts will have been automatically signed up for the Amazon S3 and EC2 Web Service. If you created your AWS account a long time ago, you may now need to manually sign up for these services.
  • An AWS security group that allows access to ports 22 (standard SSH), 80 (standard HTTP), and 8890 (Virtuoso HTTP-based Admin). (This is the setup of the AMI offerings.)
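
If you manage security groups from the command line rather than the console, the port list above could be applied with the AWS CLI roughly as follows. This is a minimal sketch, not part of the AMI: the function name, the group id, and the CIDR range are hypothetical placeholders, and by default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: open the three ports listed above on an existing security group.
# Hypothetical inputs: $1 = security group id, $2 = CIDR allowed to connect.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
open_virtuoso_ports() (
    group_id=$1
    cidr=$2
    for port in 22 80 8890; do
        set -- aws ec2 authorize-security-group-ingress \
            --group-id "$group_id" --protocol tcp \
            --port "$port" --cidr "$cidr"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@" || exit 1
        fi
    done
)

# Example (prints three commands, changes nothing):
#   open_virtuoso_ports sg-0123456789abcdef0 203.0.113.10/32
```

Restricting the CIDR to your own address range, rather than opening the ports to everyone, is usually the safer choice for ports 22 and 8890.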

Instantiating DBpedia Live (Virtuoso PAGO) via Web Interface

  1. Locate the DBpedia Live (Virtuoso PAGO) image in AWS Marketplace and click the Continue to Subscribe button.

AWS Marketplace DBpedia Live (Virtuoso PAGO)

  2. Choose a suitably sized EC2 Instance Type and License Quantity, then click the Continue to Configuration button. An EC2 Instance Type with at least 16 GB of RAM is recommended, for example m5.xlarge.

AWS Marketplace DBpedia Live (Virtuoso PAGO) Launch on EC2

  3. Click the Continue to Launch button.

AWS Marketplace DBpedia Live (Virtuoso PAGO) now Deployed

  4. Review the configuration settings and, once satisfied, click the Launch button.

AWS Marketplace DBpedia Live (Virtuoso PAGO) now Deployed

  5. Check in the AWS Console EC2 images Web Interface that the image has been successfully instantiated.

AWS EC2 Launched Image

First-time Setup & Usage Notes

The steps in this section are only necessary when you start the DBpedia DB for the first time, immediately after instantiating the AMI.

This section may be ignored thereafter, as it is not necessary after AMI reboots.

  1. ssh into your instantiated AMI using:
ssh -i {secure-pem-file} ec2-user@{ec2-dns-name-or-ip-address}
  2. Start the Virtuoso DBMS Server against the DBpedia Database by issuing the following command. Note: At initial launch, it takes the Virtuoso DBMS Server a few minutes to bring the DBpedia database online, due to its size.
sudo service virtuoso restart
  3. We strongly recommend that you now use the Conductor to change the password for the dba user from its default, the AMI instance-id, as described in the remaining steps.

  4. Retrieve the AMI instance-id by either:

  • checking the AMI properties presented by the Amazon AWS console UI

AWS Console

  • executing the following command in the Linux shell
curl http://169.254.169.254/latest/meta-data/instance-id
  5. Load the Conductor interface
http://{amazon-ec2-ami-dns-name-or-ip-address}/conductor
  6. At the authentication challenge, log in as the dba user, with the AMI instance-id as the password. Note: If you are unable to connect to the Virtuoso server using the instance-id as the password, please create a Support Case for fastest assistance.
  7. Drill down to System Admin → User Accounts.
  8. Locate the dba user, and click the associated Edit link.
  9. The form allows many things to be changed. For now, just enter your desired password into both the Password and Confirm Password boxes, and click the Save button.
  10. You can now perform other administrative tasks through the Conductor interface, or return to basic DBpedia use.
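
On instances whose metadata service is set to require session tokens (IMDSv2), the plain curl request for the instance-id shown above is rejected. The sketch below first asks for a short-lived token and falls back to the plain request if no token is issued. The function name and the CURL override are illustrative assumptions, not something shipped with the AMI.

```shell
#!/bin/sh
# Sketch: print this instance's id, trying an IMDSv2 token first.
# CURL may be overridden for rehearsal; it defaults to the real curl.
get_instance_id() (
    curl_cmd=${CURL:-curl}
    base=http://169.254.169.254/latest
    # Request a session token that is valid for 60 seconds.
    token=$("$curl_cmd" -s -f -X PUT \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
        "$base/api/token") || token=
    if [ -n "$token" ]; then
        "$curl_cmd" -s -f -H "X-aws-ec2-metadata-token: $token" \
            "$base/meta-data/instance-id"
    else
        # No token issued: fall back to the unauthenticated request.
        "$curl_cmd" -s -f "$base/meta-data/instance-id"
    fi
)

# Example, run on the instance itself:
#   get_instance_id
```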

DBpedia Live (Virtuoso PAGO) Database Interaction via Web Interface

Once online, your DBpedia Live instance will be ready for use from —

  • Basic Linked Data Exploration Page — an obvious starting point
http://{amazon-ec2-ami-dns-name-or-ip-address}/resource/DBpedia
  • Faceted Browsing Endpoint
http://{amazon-ec2-ami-dns-name-or-ip-address}/fct
  • Advanced Faceted Browsing Page
http://{amazon-ec2-ami-dns-name-or-ip-address}/describe/?uri=http://dbpedia.org/resource/DBpedia
  • SPARQL Query Service Endpoint
http://{amazon-ec2-ami-dns-name-or-ip-address}/sparql
  • Virtuoso Instance Administration Page (Virtuoso Conductor)
http://{amazon-ec2-ami-dns-name-or-ip-address}/conductor
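
The SPARQL endpoint listed above can also be queried from a shell. The following is a minimal sketch that sends one SELECT query with curl and asks for JSON results; the function name, the CURL override, and the sample query are illustrative assumptions, and the host argument stands for your own {amazon-ec2-ami-dns-name-or-ip-address}.

```shell
#!/bin/sh
# Sketch: send one SPARQL SELECT query to the instance's /sparql endpoint.
# $1 = host name or IP address of the instance, $2 = query text.
# CURL may be overridden for rehearsal; it defaults to the real curl.
sparql_select() (
    host=$1
    query=$2
    # -G places the URL-encoded query in the request's query string.
    "${CURL:-curl}" -sS -G "http://$host/sparql" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$query"
)

# Example:
#   sparql_select {amazon-ec2-ami-dns-name-or-ip-address} \
#     'SELECT ?p ?o WHERE { <http://dbpedia.org/resource/DBpedia> ?p ?o } LIMIT 5'
```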

Administering the Virtuoso Instance via SSH

All scripts for starting and stopping the Virtuoso instance are found in the following locations —

  • System V init scripts, which start the database server automatically at operating system (AMI) boot or reboot time, and which can be controlled manually from a command terminal with the service command as detailed below.

  • /opt/virtuoso — scripts for starting and stopping the database server within a running operating system (AMI)

License Manager

The OpenLink License Manager must be launched before you launch the Virtuoso instance, and must remain running at all times for Virtuoso to run.

  • Start the License Manager
sudo service oplmgr start
  • Stop the License Manager
sudo service oplmgr stop
  • Restart the License Manager
sudo service oplmgr restart

Virtuoso Server

  • Start the Virtuoso Server
sudo service virtuoso start
  • Stop the Virtuoso Server
sudo service virtuoso stop
  • Restart the Virtuoso Server
sudo service virtuoso restart
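
Because the License Manager has to be running before Virtuoso, a manual start can be wrapped so that the order is always respected. This is only a sketch under the assumption that both services are controlled through the service command as shown above; the function name and the SERVICE override are hypothetical.

```shell
#!/bin/sh
# Sketch: start the License Manager first, then the Virtuoso server.
# SERVICE defaults to "sudo service"; override it to rehearse the order.
start_virtuoso_stack() (
    svc=${SERVICE:-sudo service}
    # Left unquoted on purpose so "sudo service" splits into two words.
    $svc oplmgr start || {
        echo "License Manager failed to start" >&2
        exit 1
    }
    $svc virtuoso start
)

# Example:
#   start_virtuoso_stack
```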

Command-line Interaction with the Virtuoso Database Instance

  1. Set the Virtuoso environment variables by running the command below. Note: The command must start with dot-space-slash, so that the script is sourced into the current shell.
. /opt/virtuoso/virtuoso-environment.sh
  2. Run the Virtuoso isql command line tool to connect to the database. Note: your EC2 AMI’s instance-id will be the dba user’s password, until you change it (as recommended above).
$ isql 1111 -U dba -P {Password}
Connected to OpenLink Virtuoso
Driver: 07.10.3214 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL>
  3. Run the tables command to obtain a list of tables in the default schema
SQL> tables;
Showing SQLTables of tables like 'NULL.NULL.NULL', tabletype/colname like 'NULL'
TABLE_QUALIFIER  TABLE_OWNER      TABLE_NAME       TABLE_TYPE       REMARKS
VARCHAR          VARCHAR          VARCHAR          VARCHAR          VARCHAR
_______________________________________________________________________________

DB               DBA              ADMIN_SESSION    SYSTEM TABLE     NULL
DB               DBA              ADM_OPT_ARRAY_TO_RS_PVIEW  SYSTEM TABLE     NULL
DB               DBA              ADM_XML_VIEWS    SYSTEM TABLE     NULL
.
.
.
DB               DBA              SYS_SQL_INVERSE  SYSTEM TABLE     NULL
DB               DBA              SYS_TRIGGERS     SYSTEM TABLE     NULL
DB               DBA              SYS_VIEWS        SYSTEM TABLE     NULL

209 Rows. -- 1890 msec.
SQL>
  4. You can stop the Virtuoso Database Server by running:
virtuoso-stop.sh
  5. You can start the Virtuoso Database Server again by running:
virtuoso-start.sh
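
For scripted checks it can be handy to run a single statement without opening an interactive session. The sketch below assumes the positional isql form (port, user, password) with an exec= argument, that virtuoso-environment.sh has already been sourced, and that a DBA_PASSWORD variable holds the current dba password. The function name and the ISQL override are hypothetical.

```shell
#!/bin/sh
# Sketch: run one statement through isql non-interactively.
# Assumes DBA_PASSWORD is set and the Virtuoso isql is first on PATH.
# ISQL may be overridden for rehearsal; it defaults to the real isql.
run_isql() (
    "${ISQL:-isql}" 1111 dba "$DBA_PASSWORD" exec="$1"
)

# Example:
#   DBA_PASSWORD={Password} run_isql 'status();'
```

Note that a password passed on the command line is visible to other users of the host in the process list while the command runs.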

Enabling DBpedia Live Updates

The provided DBpedia Integrator utility program ( dbpintegrator ) downloads change-sets from the DBpedia Live website and applies them to the local Virtuoso instance on this AMI, keeping your DBpedia datasets in step with changes to Wikipedia.

To enable the DBpedia Live updates —

  1. Go to the /opt/virtuoso/dbpintegrator directory.
-bash-4.2$ cd /opt/virtuoso/dbpintegrator
  2. Edit the file dbpedia_updates_downloader.ini and set the Store.pw parameter to the dba user's password. By default this is the AMI instance-id; if you have changed the password as recommended, enter the new password instead.

  3. Run the command sudo sh update_ontology.sh once to check the setup and attempt to update the database with the latest ontology fixes.

-bash-4.2$ sudo sh update_ontology.sh 
-bash-4.2$ 

  4. Run the command sudo sh update_changesets.sh to start loading the available change-sets.

Note 1: The update_changesets.sh script is written to use the default dba password that is derived from the AMI instance-id . Thus, if you have changed that password as recommended, you will need to update the script with the same password.

-bash-4.2$ sudo sh update_changesets.sh
nohup: appending output to 'nohup.out'
-bash-4.2$

Note 2: The first time these change-sets are applied to your instance, it may take several hours or even days, depending on server resources and bandwidth, for all the change-sets to be loaded, and so for the DBpedia instance to be brought up to date and subsequently to obtain realtime updates from Wikipedia. You can monitor the Latest changes and Top 20 Most Recently Updated Entities sections of the live update web page ( http://{amazon-ec2-ami-dns-name-or-ip-address}/live ) to see the current state of the live update process.
  5. A web page showing the live updates to the AMI instance as they occur is available at http://{amazon-ec2-ami-dns-name-or-ip-address}/live.

AWS EC2 Launched Image

If anything goes wrong, it will be logged in the associated log file dbpedia_dbms_errors.log; otherwise, update progress is written to dbp.log.
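
The Store.pw edit in dbpedia_updates_downloader.ini can also be scripted. The sketch below assumes the setting sits on a single line of the form "Store.pw = value", which you should confirm against your copy of the file first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: replace the value on the Store.pw line of an INI-style file.
# Assumption: the file holds one "Store.pw = value" line.
# $1 = path to the file, $2 = new password.
set_store_pw() (
    ini_file=$1
    tmp=$(mktemp) || exit 1
    # The password travels via the environment so awk does not
    # reinterpret backslashes or other special characters in it.
    NEW_PW=$2 awk '
        match($0, /^[ \t]*Store\.pw[ \t]*=[ \t]*/) {
            print substr($0, 1, RLENGTH) ENVIRON["NEW_PW"]
            next
        }
        { print }
    ' "$ini_file" > "$tmp" || { rm -f "$tmp"; exit 1; }
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$ini_file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example, run from /opt/virtuoso/dbpintegrator:
#   set_store_pw dbpedia_updates_downloader.ini '{Password}'
```

If the file is not writable by your user, the function needs to run in a root shell.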

Setting up cron job

The Linux cron utility can be used to automatically (re)start the scripts by adding a few lines to the cron setup for the root user.

  1. Start the cron editor (based on vi ) with —
# crontab -e
  2. Navigate to the bottom of the file with the single keystroke, capital- G .

  3. Use the single keystroke, lowercase- O , to start a new line at the bottom, and add the following two lines (you can just copy-and-paste):

@hourly     /opt/virtuoso/dbpintegrator/update_changesets.sh
@daily      /opt/virtuoso/dbpintegrator/update_ontology.sh
  4. Save the edited file by pressing ESC , then typing the four-character string below, followed by ENTER :
:wq!
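
As a non-interactive alternative to the editor steps above, the cron entries can be merged into the existing crontab from a script. This is a sketch only: the function name is hypothetical, and SCRIPT_DIR in the example must be set to the directory that actually holds the two update scripts on your instance.

```shell
#!/bin/sh
# Sketch: append cron entries to an existing crontab listing, skipping
# any entry that is already present verbatim.
# Reads the current crontab on stdin, writes the merged result to stdout.
merge_cron_lines() (
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    for line in "$@"; do
        printf '%s\n' "$current" | grep -Fqx -- "$line" \
            || printf '%s\n' "$line"
    done
)

# Example, run as root:
#   crontab -l 2>/dev/null \
#     | merge_cron_lines "@hourly $SCRIPT_DIR/update_changesets.sh" \
#                        "@daily $SCRIPT_DIR/update_ontology.sh" \
#     | crontab -
```

Running the example a second time leaves the crontab unchanged, because entries that already exist are not added again.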

Performance Notes

Please be aware of the following, which impact the performance and utility of your AMI:

  • This AMI includes a bundled Virtuoso license which enables 10 Database Sessions and the use of 4 logical processors. Licenses that upgrade these attributes are available as paid upgrade options.

  • Virtuoso always takes full advantage of the memory it’s configured to use. This may be much less than is found in its host environment/AMI. This AMI is pre-configured for an m5.xlarge EC2 Instance Type, so it will use 16 GB of RAM. If you choose a larger EC2 Instance Type, then the NumberOfBuffers and MaxDirtyBuffers parameters in the /opt/virtuoso/database/virtuoso.ini configuration file should be increased to correspond to the chosen Instance Type’s available memory, as detailed in the Virtuoso Performance Tuning Guide. A few examples are shown below. After changing these or any other settings in the INI file, restart the Virtuoso server as described above.

EC2 Instance Type   System RAM   NumberOfBuffers   MaxDirtyBuffers
m5.xlarge           16 GB        1360000           1000000
m5.2xlarge          32 GB        2720000           2000000
m5.4xlarge          64 GB        5440000           4000000
m5.8xlarge          128 GB       10880000          8000000
  • There is a wide range of EC2 Instance Type choices, offering various combinations of system memory and logical processors. To improve performance, use an EC2 Instance Type with more memory and more logical processors. To make use of additional processors, you will also need to acquire an upgraded Virtuoso license.
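
The example values in the table scale linearly with memory: 85000 NumberOfBuffers and 62500 MaxDirtyBuffers per GB of system RAM. For an instance size that is not listed, the same ratios could be applied as in the sketch below. The function name is hypothetical, and extrapolating the ratios beyond the listed sizes is an assumption that should be checked against the Virtuoso Performance Tuning Guide.

```shell
#!/bin/sh
# Sketch: derive the two buffer settings from system RAM in GB, using
# the per-GB ratios implied by the table (85000 and 62500).
virtuoso_buffers() {
    ram_gb=$1
    echo "NumberOfBuffers = $((ram_gb * 85000))"
    echo "MaxDirtyBuffers = $((ram_gb * 62500))"
}

# Example for an instance with 32 GB of RAM:
#   virtuoso_buffers 32
```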

Troubleshooting

If you encounter any problems resolving the sample DBpedia URIs listed in the steps above, please:

  1. Determine whether Virtuoso is running, with this command
ps -ef | grep "virt*" | grep -v grep
  2. Check the log of Virtuoso’s most recent activity, with this command
tail /opt/virtuoso/database/database.log

The output of those commands will show you whether the initial Virtuoso DBpedia DB setup (which can take a while due to DB size) is still in progress, the setup encountered some error, or the setup has completed but Virtuoso awaits one of the following commands:

  • Startup command
sudo service virtuoso start
  • Restart command
sudo service virtuoso restart
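
Reading the log by eye can be replaced with a small check that reports the most recent server state it records. The sketch below assumes that database.log marks startup with a line containing "Server online at" and shutdown with "Server shutdown complete". Adjust the patterns if your log uses different wording. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: classify the most recent server state recorded in database.log.
# Assumption: the log contains the marker lines "Server online at" and
# "Server shutdown complete".
# $1 = log path (defaults to the location used in this guide).
virtuoso_log_state() (
    log=${1:-/opt/virtuoso/database/database.log}
    if [ ! -r "$log" ]; then
        echo "unreadable"
        exit 1
    fi
    # The last marker seen in the file decides the reported state.
    awk '
        /Server online at/         { state = "online" }
        /Server shutdown complete/ { state = "stopped" }
        END { print (state == "" ? "starting-or-unknown" : state) }
    ' "$log"
)

# Example:
#   virtuoso_log_state
```

A result of starting-or-unknown right after the first launch usually just means the initial database setup is still in progress.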
