Cloud computing allows companies to accelerate their business processes, optimize infrastructure costs and scale more quickly. However, integrating these new services with an existing infrastructure could be complex and not fully leverage this new opportunity. This article focuses on unstable instances like SPOT (AWS) instances or Preemptible VMs (GCP) which offers cheaper computational power.
What is a SPOT instance and a Preemptible VM?
AWS offers a service called Spot instances. It allows customers to bid on unused EC2 resources in any availability zone. GCP offers a similar service called Preemptible VMs. Customers can then use compute capacity with no upfront commitment at an hourly rate lower than the on-demand rate. The main drawback is that GCP and AWS can withdraw this instance at any time with little upfront warning depending on the market price of the resource and the bidding price.
Workloads Requirements
As explained in those two descriptions, instances can be withdrawn from customers at any time. The workloads leveraging the computing capacity require an advanced error management tool to support uncertainty on the lifecycle, otherwise any previous work will be lost and defeat the overall purpose. These workloads also require to be split in smaller tasks to ensure computation can be perform in a short period of time. This will ensure any work done to be saved for other dependent tasks.
When to leverage those?
One the common use cases of this services is when time is less a constraint than cost. For instance, a workload running at night could take as much time as required as long as it is completed at 8am.
Another common use case is when demand spikes in the current infrastructure occur which generate a large queue. This could often be seen in R&D environment using a single HPC or a limited infrastructure. In that case, time constraints have to be balanced with price. Spot instances offer the ability to unload the queue with cheaper than on-demand price instances.
Some business applications require a stable environment to perform efficiently. However, this stable resource might be taken when needed. Leveraging these unstable resources will enable to free up future stable resources beforehand.
Many other use cases can be found on GCP and AWS websites or on the Netflix blog (e.g. Netflix Blog)