Automated Virtuoso Cluster Deployment

I was wondering what the go-to way is for automated deployment of Virtuoso clusters with easy rescaling and monitoring.

This post suggests using Amazon EKS:

This would be a BYOL cluster deployment. Could the Virtuoso Elastic Cluster be deployed using EKS (or equivalent)? Would it be able to leverage the scaling facilities of Kubernetes?

Looking at the AWS offering (AWS Marketplace: Virtuoso 8.3.3319 (PAGO Edition) for RedHat Linux 7.x), another possibility might be to use AWS CloudFormation with instances defined in an AutoScalingGroup to deploy and scale clusters using the Marketplace AMI. This would lead to a PAGO deployment. Is that a feasible solution?

Am I missing something here?

The Virtuoso Elastic Cluster is only supported in the 7.x product release, and not in the 8.x release that is currently available from the AWS and Azure Marketplaces.

By autoscaling in Kubernetes (or equivalents such as CloudFormation), are you referring to defining replica sets/pods whose number can be dynamically increased or decreased, with a minimum threshold set for the number of replicas kept online at any given time? If so, this will not work with Virtuoso as a transactional database: replica sets share the same files, including database files, which the Virtuoso engine does not allow, so the replicas would fail to start. Replica sets are more suitable for front-end applications or read-only databases where no updates are performed.

YAML, Helm, or other such scripts can be created to set up and administer a Virtuoso 7 Elastic Cluster, and Docker or other images can be created for deployment in Kubernetes or other platforms, but these would need to be created for a specific cluster size and configuration, and amended accordingly if the cluster size is to change.

What is your intended use case for a Virtuoso Elastic Cluster? We may be able to provide advice/guidance on a more suitable Virtuoso configuration.

The standard use case would be: you have a running cluster deployed in an automated/reproducible way, you monitor its load and resource usage, and based on whether it's over- or under-utilised you scale it up or down.
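To make the "monitor, then scale" step concrete, here is a minimal sketch of the kind of decision logic meant. Everything in it is an assumption for illustration: the function name, the CPU-utilisation metric, the thresholds and the node limits are made up and are not taken from Virtuoso, Kubernetes or AWS.

```python
def desired_node_count(
    cpu_utilisation: float,
    current_nodes: int,
    low: float = 0.30,    # hypothetical "under-utilised" threshold
    high: float = 0.75,   # hypothetical "over-utilised" threshold
    min_nodes: int = 1,   # hypothetical lower bound on cluster size
    max_nodes: int = 8,   # hypothetical upper bound on cluster size
) -> int:
    """Return the node count the cluster should move to, one step at a time."""
    if cpu_utilisation > high and current_nodes < max_nodes:
        return current_nodes + 1  # over-utilised: scale up by one node
    if cpu_utilisation < low and current_nodes > min_nodes:
        return current_nodes - 1  # under-utilised: scale down by one node
    return current_nodes          # within bounds: leave the cluster as it is
```

In a real deployment the result would be handed to whatever tooling actually adds or removes instances; that part is deliberately left out here.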

Is that achievable with Elastic Cluster? What would be the recommended way to automate this process?

What about 8.x then? Is there no way to scale it horizontally?

You seem to be seeking a horizontally scalable cluster for high availability and load balancing, scaling the number of nodes up or down depending on query workload?

The Virtuoso Elastic Cluster is a horizontal scale-out cluster, intended mainly to make use of additional memory, and also CPU resources, for hosting large volumes of data in a single clustered database spread across multiple Virtuoso instances. Whilst this can provide load balancing, as you can query across each node of the cluster, it does not provide high availability: if one node goes down, the entire cluster goes down.
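As a back-of-the-envelope illustration of why "all nodes must be up" hurts availability as the cluster grows, the sketch below compares it with a setup in which any single surviving node can still serve requests. It assumes nodes fail independently and share the same per-node availability, which is a simplification and not a statement about measured Virtuoso behaviour; the function names are made up for this example.

```python
def availability_all_required(node_availability: float, nodes: int) -> float:
    """Probability that every node is up (the cluster is down if any node is down)."""
    return node_availability ** nodes


def availability_any_sufficient(node_availability: float, nodes: int) -> float:
    """Probability that at least one node is up (any single node can serve requests)."""
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Assumed per-node availability of 99%, purely for illustration.
    for n in (1, 2, 4, 8):
        print(n, availability_all_required(0.99, n), availability_any_sufficient(0.99, n))
```

Under these assumptions the first figure shrinks with every node added, while the second grows.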

A Virtuoso replication cluster seems more like what you are seeking. It is available for RDF Graph Replication and can be used to provide high availability and load balancing, and it can be scaled up and down in size as demand dictates by adding Virtuoso instances to, or removing them from, the front-end proxy.
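The idea of growing and shrinking the set of instances behind a front-end proxy can be sketched as a simple round-robin pool. This is only an illustration of the pattern: the `ReplicaPool` class, its methods and the endpoint strings are hypothetical, and a real deployment would use an actual proxy or load balancer rather than this code.

```python
class ReplicaPool:
    """Round-robin pool of replica endpoints that can grow or shrink at runtime."""

    def __init__(self, min_replicas: int = 1) -> None:
        self._replicas: list[str] = []
        self._next = 0
        self.min_replicas = min_replicas  # never shrink below this many replicas

    def add(self, endpoint: str) -> None:
        """Register a new replica; adding a known endpoint again is a no-op."""
        if endpoint not in self._replicas:
            self._replicas.append(endpoint)

    def remove(self, endpoint: str) -> None:
        """Deregister a replica, refusing to go below the configured minimum."""
        if len(self._replicas) <= self.min_replicas:
            raise ValueError("cannot go below the minimum number of replicas")
        self._replicas.remove(endpoint)

    def pick(self) -> str:
        """Return the next replica in round-robin order."""
        if not self._replicas:
            raise RuntimeError("no replicas registered")
        endpoint = self._replicas[self._next % len(self._replicas)]
        self._next += 1
        return endpoint


if __name__ == "__main__":
    pool = ReplicaPool(min_replicas=1)
    pool.add("replica-1:8890")  # hypothetical endpoint names
    pool.add("replica-2:8890")
    print([pool.pick() for _ in range(4)])
    pool.remove("replica-2:8890")  # scale down when demand drops
    print(pool.pick())
```

The minimum-replica check mirrors the "minimum number kept online" idea raised earlier in the thread; here it is just a guard in the pool itself.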

See the Virtuoso Clustering Deployment Architecture Diagrams, which outline the architectures of the two different cluster types.

We do not provide scripts for setting up Virtuoso clusters in Kubernetes and other cloud platforms, but customers have created their own such scripts for their specific use cases in YAML, Helm and other such scripting languages for microservice-type deployments.