Deploy a ProActive cluster using Docker
The goal of this blog post is to give an overview of how easy it is to deploy a ProActive cluster with Docker and what benefits you can gain from it.
Docker containers are a great way to deploy and re-deploy a pre-configured ProActive cluster quickly.
As you can see in the figure above, a ProActive cluster is composed of three different components:
- the ProActive Scheduler Server
- the database
- the ProActive Nodes
All these components will be set up in Docker containers thanks to a container orchestrator.
First of all, we assume that you have several hosts available for Docker containers and that you manage them with an orchestrator such as Swarm, Mesos, or Kubernetes. This gives you a way to abstract the network across your cluster, thanks to Docker overlay networking if you use Swarm (1), Mesos networking (2), or Kubernetes networking (3).
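As a minimal sketch, assuming you use Swarm, creating a shared network could look like this (the network name proactive-net is just an example):

```
# Create an attachable overlay network so containers running on
# different Swarm hosts can reach each other by container name.
docker network create --driver overlay --attachable proactive-net
```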
By default, the protocol used for communication between the ProActive Nodes and the ProActive Scheduler Server is PNP, but several other protocols are available, such as PNPS and PAMR (ProActive Message Routing).
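For illustration, the protocol is typically selected through a JVM property passed to the Scheduler or Node process; the property name below is an assumption to check against your ProActive version:

```
# Hypothetical example: switch from the default PNP to PAMR
# by passing a JVM property to the ProActive process.
JAVA_OPTS="-Dproactive.communication.protocol=pamr"
```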
Once you are sure that your network allows containers to ping each other across hosts, you can first run your database container. Obviously, you can persist its data thanks to Docker Volumes and configure users for the Scheduler Server at runtime.
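Here is a sketch of this step, assuming MariaDB is used as the Scheduler database; the container name, credentials, database name and volume name are example values, not ProActive defaults:

```
# Create a named volume so database files survive container re-creation
docker volume create proactive-db-data

# Run the database container on the shared overlay network
# (image tag, credentials and database name are example values)
docker run -d --name proactive-db \
  --network proactive-net \
  -v proactive-db-data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -e MYSQL_DATABASE=proactive \
  -e MYSQL_USER=proactive \
  -e MYSQL_PASSWORD=changeme \
  mariadb:10
```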
The second step is to launch the Scheduler Server container and link it to the database container. If you access the Web UI on the Docker host, thanks to the port redirection, you will notice that there is no Node running in the Resource Manager portal. This is the normal behaviour: our goal is to have the Nodes running in other containers.
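A sketch of this step, assuming a hypothetical activeeon/proactive-scheduler image, a DB_HOST environment variable for the database link, and the web portal exposed on port 8080 (all three are assumptions to adapt to your own image):

```
# Run the Scheduler Server on the same overlay network as the database.
# Image name, environment variable, ports and paths are example values.
docker run -d --name proactive-scheduler \
  --network proactive-net \
  -p 8080:8080 \
  -v proactive-scheduler-data:/opt/proactive/data \
  -e DB_HOST=proactive-db \
  activeeon/proactive-scheduler
```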
Finally, the last step is the deployment of the Nodes. You just have to configure them to connect to the Resource Manager and choose how many workers you want per Node. You can launch as many Nodes as you want, on as many hosts as you want, as in the sketch below.
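As an illustration, launching one Node container could look like the following; the image name, the RM_URL and NB_WORKERS variables, and the PNP port are assumptions, so adapt them to your own Node image:

```
# Launch a ProActive Node container pointing at the Resource Manager.
# Image name, variable names and the PNP port are example values.
docker run -d --name proactive-node-1 \
  --network proactive-net \
  -e RM_URL=pnp://proactive-scheduler:64738 \
  -e NB_WORKERS=4 \
  activeeon/proactive-node
```

Repeating this command on other hosts (with different container names) is enough to grow the cluster.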
Obviously, you can also keep data and logs for the Nodes and the Scheduler Server in Docker Volumes.
Once everything is running, you can write some Workflows, execute them, and check on which Nodes they are executed.
And here you are: you now have an entire cluster ready to execute jobs, and you can enjoy all the benefits of our generic Scheduler, which allows you to run Spark jobs, MapReduce jobs, ETL processes…