Automated Virtuoso Cluster Deployment

I was wondering what the go-to way is for automated deployment of Virtuoso clusters with easy rescaling and monitoring.

This post suggests using Amazon EKS:

This would be a BYOL cluster deployment. Could the Virtuoso Elastic Cluster be deployed using EKS (or equivalent)? Would it be able to leverage the scaling facilities of Kubernetes?

Looking at the AWS offering (AWS Marketplace: Virtuoso 8.3.3329 (PAGO Edition) for RedHat Linux 7.x), another possibility might be to use AWS CloudFormation with instances defined in an AutoScalingGroup to deploy and scale clusters using the Marketplace AMI. This would lead to a PAGO deployment. Is that a feasible solution?

Am I missing something here?

The Virtuoso Elastic Cluster is only supported in the 7.x product release, and not in the 8.x release that is currently available from the AWS and Azure Marketplaces.

By autoscaling in Kubernetes (or equivalents like CloudFormation etc.), are you referring to defining replica sets/pods, which can be dynamically increased or decreased, with a minimum threshold set for the number of replicas to be kept online at any given time? If so, this will not work with Virtuoso as a transactional database: replica sets share the same files, including database files, which the Virtuoso engine does not allow, so the replicas would fail to start. Replica sets are more suitable for front-end applications or read-only databases where no updates are performed.

YAML, Helm, or other such scripts can be created to set up and administer a Virtuoso 7 Elastic Cluster, and Docker or other images can be created for deployment in Kubernetes or other platforms, but these would need to be created for a specific cluster size and configuration, and amended accordingly if the cluster size is to change.

What is your intended use case for a Virtuoso Elastic Cluster? We may be able to provide advice/guidance on a more suitable Virtuoso configuration.

The standard use case would be: you have a running cluster deployed in an automated/reproducible way, you monitor its load and resource usage, and based on whether it’s over- or under-utilised you scale it up or down.
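To make the monitor-then-scale loop concrete, here is a minimal Python sketch of the decision step only. Everything in it is an assumption for illustration: the function name, the CPU-utilisation metric, and the thresholds and limits are hypothetical, and it says nothing about how a particular Virtuoso deployment would act on the result.

```python
def desired_instances(current, cpu_utilisation, low=0.30, high=0.75,
                      minimum=1, maximum=8):
    """Return the instance count to aim for.

    cpu_utilisation is the average utilisation across instances (0.0 to 1.0).
    The thresholds and limits are illustrative defaults, not recommendations.
    """
    if cpu_utilisation > high:
        target = current + 1   # over-utilised: add one instance
    elif cpu_utilisation < low:
        target = current - 1   # under-utilised: remove one instance
    else:
        target = current       # within the comfortable band: no change
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))
```

For example, desired_instances(2, 0.9) asks for a third instance, while desired_instances(1, 0.1) stays at the minimum of one.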

Is that achievable with Elastic Cluster? What would be the recommended way to automate this process?

What about 8.x then? Is there no way to scale it horizontally?

You seem to be seeking a horizontal cluster for high availability and load balancing, scaling the number of nodes up or down depending on query workload?

The Virtuoso Elastic Cluster is a horizontal scale-out cluster that makes use of additional resources, mainly memory but also CPU, to host a large volume of data in a single clustered database spread across multiple Virtuoso instances. While this can provide load balancing, as you can query across each node of the cluster, it does not provide high availability: if one node goes down, the entire cluster goes down.

A Virtuoso replication cluster seems more like what you are seeking. It is available for RDF Graph Replication and can be used to provide high availability and load balancing, and it can be scaled up and down in size as demand dictates by adding Virtuoso instances to, or removing them from, the front-end proxy.

See the Virtuoso Clustering Deployment Architecture Diagrams, which outline the architectures of the two different cluster types.

We do not provide scripts for setting up Virtuoso clusters in Kubernetes and other cloud platforms, but customers have created their own such scripts for their specific use cases in YAML, Helm, and other such scripting languages for microservice-type deployments.
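As a rough illustration of what scaling at the front-end proxy means, the Python sketch below models a pool of backends with round-robin selection, where instances can be added or removed at runtime. It is a toy model, not an OpenLink tool or a real proxy configuration; the class name and host names are hypothetical.

```python
class BackendPool:
    """Toy model of a front-end proxy's list of replica instances."""

    def __init__(self, backends=()):
        self._backends = list(backends)
        self._next = 0  # index of the backend to hand out next

    def add(self, address):
        """Register a new replica so it starts receiving requests."""
        if address not in self._backends:
            self._backends.append(address)

    def remove(self, address):
        """Take a replica out of rotation."""
        if address in self._backends:
            self._backends.remove(address)
            self._next = 0  # restart the rotation after a removal

    def pick(self):
        """Return the next backend in round-robin order."""
        if not self._backends:
            raise RuntimeError("no backends registered")
        address = self._backends[self._next % len(self._backends)]
        self._next = (self._next + 1) % len(self._backends)
        return address
```

Scaling up is then a call such as pool.add("virtuoso-2:8890") once the new replica is in sync, and scaling down is pool.remove(...) before the instance is stopped.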

@hwilliams I was wondering if you knew where I could find such a Helm chart? Turning our single-replica Virtuoso deployment in EKS into a replication cluster seems like a perfect solution. Currently we are dedicating an i3.2xlarge node with 64 GB of memory to Virtuoso but would really like to be able to give it more power, as the data is about to become hundreds of times larger. Thanks!

We do not have sample Helm charts, as we have not looked at or used them in-house. As indicated previously, though, customers who have Helm expertise have created their own Helm charts for their Virtuoso deployments. We have documentation on Virtuoso Docker Swarm service creation, showing how Virtuoso deployments can be made with Docker Swarm, which can be replicated in Helm charts by someone with such expertise, as others have done.

OK thanks. We do currently manage our deployment via Helm and Terraform on EKS, using one replica as indicated in the Docker Swarm article you sent. I suppose my question was really: is it possible to use more than one replica for the Virtuoso pod? What the vendor and I have concluded is that, because the db file is locked, this would not be possible.

Your conclusion is correct: you cannot have multiple Virtuoso replicas in a pod sharing the same database file for a read/write database, as file locking would prevent multiple Virtuoso instances from reading/writing to the same database file(s). Each Virtuoso instance would have to be part of a separate pod.
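The general effect can be reproduced with a small Python sketch using ordinary advisory file locks on Linux. This is only an analogy: it does not use or reproduce Virtuoso's own locking mechanism, and the file name is hypothetical. It shows a first holder taking an exclusive lock on a shared file and a second attempt on the same file being refused.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to take an exclusive, non-blocking lock on path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if the file is already locked by another holder.
    """
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    """Simulate two instances pointed at the same database file."""
    with tempfile.TemporaryDirectory() as directory:
        db_file = os.path.join(directory, "shared.db")  # hypothetical file
        first = try_exclusive_lock(db_file)    # first instance gets the lock
        second = try_exclusive_lock(db_file)   # second instance is refused
        result = (first is not None, second is not None)
        if first is not None:
            first.close()  # closing the file releases the lock
        return result
```

Giving each instance its own file, which is what separate pods with separate storage amount to, means both attempts would succeed.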