Activeeon Team Blog: October 2016

Monday, October 31, 2016

Leverage Hybrid Infrastructure with SPOT instances or Preemptible VMs

Cloud computing allows companies to accelerate their business processes, optimize infrastructure costs and scale more quickly. However, integrating these new services with an existing infrastructure could be complex and not fully leverage this new opportunity. This article focuses on unstable instances like SPOT (AWS) instances or Preemptible VMs (GCP) which offers cheaper computational power.

What is a SPOT instance and a Preemptible VM?

AWS offers a service called Spot instances. It allows customers to bid on unused EC2 resources in any availability zone. GCP offers a similar service called Preemptible VMs. Customers can then use compute capacity with no upfront commitment at an hourly rate lower than the on-demand rate. The main drawback is that GCP and AWS can withdraw this instance at any time with little upfront warning depending on the market price of the resource and the bidding price.

Workloads Requirements

As explained in those two descriptions, instances can be withdrawn from customers at any time. The workloads leveraging the computing capacity require an advanced error management tool to support uncertainty on the lifecycle, otherwise any previous work will be lost and defeat the overall purpose. These workloads also require to be split in smaller tasks to ensure computation can be perform in a short period of time. This will ensure any work done to be saved for other dependent tasks.

When to leverage those?

One the common use cases of this services is when time is less a constraint than cost. For instance, a workload running at night could take as much time as required as long as it is completed at 8am.

Another common use case is when demand spikes in the current infrastructure occur which generate a large queue. This could often be seen in R&D environment using a single HPC or a limited infrastructure. In that case, time constraints have to be balanced with price. Spot instances offer the ability to unload the queue with cheaper than on-demand price instances.

Some business applications require a stable environment to perform efficiently. However, this stable resource might be taken when needed. Leveraging these unstable resources will enable to free up future stable resources beforehand.

Many other use cases can be found on GCP and AWS websites or on the Netflix blog (e.g. Netflix Blog)

What offers ProActive in these situations?

Reliable Execution in each Environment with Docker and ProActive

Using Docker containers allows the code to be independent from the underlying environment. Being heavily multi-platforms, ProActive offers different way to manage dependencies including the use of containers, detailed in this article.

How to start a container in ProActive

In ProActive, launching a task within a container is a simple 3 steps:

go to the “Fork Environment” tab,
select on the Java Home field the path to the Java installation directory on the node,
finally in the environment script field, fill the “preJavaHomeCmd” variable with the docker command that will start the appropriate container. For instance, preJavaHomeCmd = ‘docker run -p 80:8080 -rm java’.

After that, you’re done.

Let’s start with an example

A picture paints a thousand words. Below is an example based on a previous article that will be containerized. It is possible to follow those instructions in try.activeeon.com or after installing ProActive on your server.

Error Management Best Practices in Production?

In various situations, advanced error management tools save time and money. For instance, in production where problems must be solved without human intervention to ensure high-availability, unstable environment like SPOT instances where errors are commonplace and disrupting, as well as resources-intensive workflows which mustn't be run twice due to their cost.
To manage instabilities, ProActive offers many functionalities helping you with errors.

Functionalities and Support

The Advanced Error Management feature provides multiple options in case of failure. First, the number of attempts for any given task can be selected with the ability to choose whether to run subsequent iterations on a different node. Then,

Links