Monday, August 29, 2016

Deploy a ProActive cluster using Docker

This post gives an overview of how easy it is to deploy a ProActive cluster and of the benefits you gain from doing so.

Docker containers are a great way to deploy and re-deploy a pre-configured ProActive cluster quickly.

As you can see in the figure above, a ProActive cluster is composed of three components:
  • the ProActive Scheduler Server
  • the database
  • the ProActive Nodes
All of these components are set up in Docker containers with the help of a container orchestrator.

First, we assume that you have several Docker hosts and an orchestrator for your containers, for instance Swarm, Mesos, or Kubernetes. The orchestrator gives you a way to abstract the network across the cluster: a Docker overlay network if you use Swarm (1), any Mesos network (2), or a Kubernetes network (3).
By default, the ProActive Nodes and the ProActive Scheduler Server communicate over the PNP protocol, but other protocols are available, such as PNPS and PAMR (ProActive Message Routing).
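
For the Swarm case, the network prerequisite can be sketched with two small helpers. This is only a sketch under assumptions: the network and container names are passed in by the caller, the alpine image is used purely as a throwaway ping client, and --attachable applies to swarm mode, where it lets standalone containers join the overlay network.

```shell
# Create an overlay network that containers on different hosts can share.
# $1 = network name (chosen by the caller)
create_cluster_network() {
  # --attachable lets containers started with "docker run" join the
  # network in swarm mode; drop it on setups that do not know the flag.
  docker network create --driver overlay --attachable "$1"
}

# Check that a container on the network can reach another one by name.
# $1 = network name, $2 = name of the container to ping
check_reachable() {
  docker run --rm --network "$1" alpine ping -c 1 "$2"
}
```

A call such as create_cluster_network proactive-net creates the network; check_reachable proactive-net followed by a container name can then be used once that container is running.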

Once you are sure that your network allows containers to ping each other across hosts, you can start the database container. You can persist its data with Docker volumes and configure the users for the Scheduler Server at run time.
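
As an illustration of the database step, here is a minimal sketch. It assumes PostgreSQL as the database, which this post does not specify; the container name, volume name, user and database name are placeholders, not ProActive defaults.

```shell
# Start the database container with a named volume for its data.
# Assumption: PostgreSQL and its official image; every name below is a
# placeholder chosen for this example.
# $1 = network name, $2 = database password
start_database() {
  docker run -d --name proactive-db --network "$1" \
    -v proactive-db-data:/var/lib/postgresql/data \
    -e POSTGRES_USER=scheduler \
    -e POSTGRES_PASSWORD="$2" \
    -e POSTGRES_DB=scheduler \
    postgres
}
```

The named volume keeps the data when the container is removed and re-created, and the -e options are what lets the user be set at run time rather than baked into the image.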

The second step is to launch the Scheduler Server container and link it to the database container. If you open the Web UI on the Docker host, which is reachable thanks to port redirection, you will notice that no Node is running in the Resource Manager portal. This is expected: the goal is to run the Nodes in other containers.
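
A sketch of this step could look as follows. The image is passed in by the caller because no image name is given here, the DB_HOST variable is purely hypothetical (a real image will define its own settings), and port 8080 is assumed for the Web UI.

```shell
# Start the Scheduler Server on the same network as the database and
# publish the Web UI port on the Docker host.
# Assumptions: DB_HOST is a hypothetical setting, proactive-db is a
# placeholder database container name, 8080 is the assumed Web UI port.
# $1 = network name, $2 = Scheduler Server image
start_scheduler() {
  docker run -d --name proactive-scheduler --network "$1" \
    -p 8080:8080 \
    -e DB_HOST=proactive-db \
    "$2"
}
```

On a shared user-defined network the containers can resolve each other by name, which is what stands in for the link to the database container in this sketch.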

The last step is to deploy the Nodes. You only have to configure them to connect to the Resource Manager and choose how many workers you want per Node. You can launch as many Nodes as you want, on as many hosts as you want.
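
Launching several Nodes can be sketched as a simple loop. The RM_URL and WORKERS variables are hypothetical names for the two settings mentioned above, and the image and Resource Manager URL are supplied by the caller.

```shell
# Launch several Node containers that connect to the Resource Manager.
# Assumptions: RM_URL and WORKERS are hypothetical setting names; the
# container names are placeholders.
# $1 = network name, $2 = Node image, $3 = Resource Manager URL,
# $4 = number of Node containers, $5 = workers per Node
start_nodes() {
  i=1
  while [ "$i" -le "$4" ]; do
    docker run -d --name "proactive-node-$i" --network "$1" \
      -e RM_URL="$3" \
      -e WORKERS="$5" \
      "$2"
    i=$((i + 1))
  done
}
```

With an orchestrator in place, each docker run can land on a different host, so the same loop covers both a single machine and a larger cluster.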

You can also keep the data and logs of the Nodes and the Scheduler Server in Docker volumes.

Once everything is running, you can write Workflows, execute them, and see which Nodes they run on.

And there you are: you now have an entire cluster ready to execute jobs, and you can enjoy all the benefits of our generic Scheduler, which allows you to run Spark jobs, MapReduce jobs, ETL processes…

Friday, August 26, 2016

Submitting ProActive Workflows with Linux cURL

Using cURL from a Linux command line (bash) is a very convenient and resource-efficient way to submit workflows.
First, we need to log in and store the current session id with the command:

sessionid=$(curl --data 'username=admin&password=admin' \
 http://localhost:8080/rest/scheduler/login)

curl logs in by sending the username and password as POST form parameters, transmitted with --data. The result is written into the sessionid variable. The session id can be displayed with echo $sessionid.
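
In a script it helps to notice a failed login early. The helper below is a sketch of the login step above, not part of ProActive: the function name is made up, the REST base URL is passed in by the caller, and --data-urlencode is used so that special characters in the credentials survive.

```shell
# Log in and print the session id; return non-zero when the request
# fails or the server sends back an empty body.
# $1 = REST base URL, $2 = username, $3 = password
proactive_login() {
  id=$(curl --silent --fail \
    --data-urlencode "username=$2" \
    --data-urlencode "password=$3" \
    "$1/login") || return 1
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

It can be used as sessionid=$(proactive_login http://localhost:8080/rest/scheduler admin admin) followed by a check of the exit status before any workflow is submitted.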

Workflows can be submitted with cURL:

curl --header "sessionid:$sessionid" \
 --form "file=@filename.xml;type=application/xml" \

The session id variable is inserted into the header, and the @ notation makes curl send the contents of the file directly to the server.

Advanced: Workflow Submission with Variables

Workflow variables can be sent in the submission URL as matrix parameters. The values given there replace those of the variables defined in the workflow.

curl --header "sessionid:$sessionid" \
 --form "file=@file.xml;type=application/xml" \

Important: the URL is now enclosed in double quotes (""); only then are the matrix parameters transferred properly, because an unquoted semicolon is treated by the shell as a command separator. Variables are separated by semicolons (;).
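
Building such a URL by hand gets error-prone with more than a couple of variables. The helper below is a small sketch with a made-up name: it takes whatever submission URL you use and appends each name=value pair as a matrix parameter. Values are appended as-is, so special characters would need to be encoded beforehand.

```shell
# Append workflow variables to a submission URL as matrix parameters.
# $1 = submission URL, remaining arguments = name=value pairs
with_variables() {
  url=$1
  shift
  for pair in "$@"; do
    # Each variable is introduced by a semicolon.
    url="$url;$pair"
  done
  printf '%s\n' "$url"
}
```

The result should still be passed to curl inside double quotes, for example "$(with_variables "$submit_url" var1=a var2=b)", where submit_url is a shell variable holding your submission URL.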