Thursday, October 13, 2016

Error Management Best Practices in Production?

Advanced error management tools save time and money in many situations: in production, where problems must be solved without human intervention to ensure high availability; in unstable environments such as Spot instances, where errors are commonplace and disruptive; and in resource-intensive workflows that must not be run twice because of their cost.
To manage these instabilities, ProActive offers several features that help you handle errors.

Functionalities and Support

The Advanced Error Management feature provides multiple options in case of failure. First, the number of attempts for any given task can be selected, with the ability to choose whether to run subsequent attempts on a different node. Then, different policies control the job behaviour: cancel or continue the job after the last attempt, or suspend the task or the job after the first failure. These parameters, defined at job level, can be overridden at task level.
This feature is fully available in the Enterprise version (>7.17).
(Screenshots: default error management settings at workflow level.)

The Selection Script offers the ability to select nodes with the desired characteristics. For instance, it can prevent errors related to software or package availability. It also ensures that the relevant access rights and security settings are present on the execution node. Finally, it allows tasks to be executed dynamically on nodes where the required resources (CPU, RAM, GPU) are available, even when other resources are under heavy use. (Selection scripts can be found under “Node Selection” in the task menu.)
Samples of such scripts are available under PROACTIVE_HOME/samples/scripts/selection.
The Console Output and Logs give details about the faulty task and its errors. In some circumstances, looking at the logs is required to retrieve additional information. Both are available in the Scheduler portal.
The logs are also available at PROACTIVE_HOME/logs.
The Pre-script allows you to prepare the node environment where the task will be executed. For instance, it can install packages through a bash script before a Python task. A Post-script may be used to reverse these changes. However, a Clean-script is better suited to this situation, since it is executed even if the pre-script, the post-script or the task itself fails. Even though its output is not displayed, information about the state of the environment (variables, packages, etc.) is written in the logs.
Examples of pre/post/clean-scripts are available in the Studio under the “Templates” menu.
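As an illustration, here is a minimal pre-script/clean-script pair in bash; the pip package used here ("requests") is only an example, not part of the original post.

# Pre-script (bash): install the package required by the task
pip install --user requests

# Clean-script (bash): runs even if the pre-script, post-script or task fails,
# so the node is returned to its pre-task state
pip uninstall -y requests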
The ProActive Agent can restart failing nodes and thus ensure high availability of resources. More information is available in the administrator guide.
ProActive support will help you recover from more complex issues. The team will reproduce the errors and send you recommendations that fit your environment and constraints.
All of these features help you create a workflow adapted to your error management needs. The following best practices will also help you ensure high availability and predictable behaviour.

Best practices

When writing a workflow, the error handling policy should be part of the design process. You should consider:
  • how critical is the workflow: can it be paused for hours?
  • how long does it take: can you afford to execute it multiple times?
  • how stable is the environment: will connections drop during the job and cause errors?
  • how skilled are your developers: how likely are errors?
  • how skilled is your maintenance team: what can they do in case of error?
  • how many workflows do you run per unit of time?
After answering these questions, select the error handling policy which fits the global needs and then update the parameters for individual tasks requiring different behaviour.
When using tasks with special requirements (OS, packages, etc.) make sure to use a selection script to target the right nodes.
When packages are installed, usually in the pre-script, do not forget to uninstall them in the clean-script of the same task; this leaves the node in its pre-task state. Alternatively, Docker containers can be used to avoid these issues.

Thursday, September 29, 2016

How to accelerate your R algorithm with ProActive?

When analyzing data sets, getting “instant” insight is often critical for business decisions, which is why you may want to dedicate all of your resources to the set of tasks delivering the analysis to stakeholders. Here is an example of Twitter feed analysis using R (a statistics-oriented language) and ProActive: it gathers messages, then performs distributed analysis on distinct computing nodes.

How it works

In selected cities, this workflow searches Twitter feeds for a chosen set of keywords and returns a chosen number of tweets. (Due to Twitter API restrictions, the number of cities cannot exceed 15: “Rate limits are divided into 15 minute intervals [...] 15 calls every 15 minutes.”)

ProActive first creates as many tasks as there are cities, then runs them in parallel on ProActive Nodes. In these R-based tasks, each node will:

  • connect to Twitter,
  • perform the desired search,
  • analyze the information,
  • store the result in the ProActive dataspace (a shared storage environment adapted to distributed systems).
A final task analyzes all the data before storing the results in the above-mentioned dataspace.

Let's code

Monday, August 29, 2016

Deploy a ProActive cluster using Docker

The goal of this blog post is to give an overview of how easy it is to deploy a ProActive cluster with Docker, and of the benefits you can gain from it.

Docker containers are a great way to quickly deploy and re-deploy a pre-configured ProActive cluster.

As you can see in the figure above, a ProActive cluster is composed of three components:
  • the ProActive Scheduler Server
  • the database
  • the ProActive Nodes
All these components will be set up in Docker containers with the help of a container orchestrator.

First of all, we assume that you have several hosts for Docker containers and that you use an orchestrator for them, such as Swarm, Mesos or Kubernetes. This gives you a way to abstract the network in your cluster: a Docker overlay network with Swarm, or the networking provided by Mesos or Kubernetes.
By default, the protocol used for communication between the ProActive Nodes and the ProActive Scheduler Server is PNP, but several other protocols are available, such as PNPS and PAMR (ProActive Message Routing).

Once you are sure that you have a network that allows your containers to reach each other across hosts, the first step is to run the database container. You can persist its data with Docker volumes and configure users for the Scheduler Server at runtime.
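As a minimal sketch, assuming a Swarm overlay network and PostgreSQL as the external database (the network name, image, credentials and volume name are placeholders, not taken from the original post):

# Create an overlay network shared by all ProActive containers (Swarm example)
docker network create -d overlay proactive-net

# Run the database container, persisting its data in a named volume
docker run -d --name proactive-db --network proactive-net \
  -e POSTGRES_USER=proactive -e POSTGRES_PASSWORD=proactive \
  -v proactive-db-data:/var/lib/postgresql/data \
  postgres:9.5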

The second step is to launch the Scheduler Server container and connect it to the database container. If you access the web UI on the Docker host, through the port redirection, you will notice that no Node is running in the Resource Manager portal. This is the expected behaviour: our goal is to have the Nodes running in other containers.
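For example (the image name activeeon/proactive-scheduler is hypothetical; use the image you built or pulled for your ProActive version):

# Run the Scheduler Server on the same network and publish the web UI port
docker run -d --name proactive-scheduler --network proactive-net \
  -p 8080:8080 \
  activeeon/proactive-scheduler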

Finally, the last step is the deployment of the Nodes. You just have to configure them to connect to the Resource Manager and choose how many workers you want per Node container. You can launch as many Nodes as you want, on as many hosts as you want.
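A sketch of launching two Node containers follows; the image name, the environment variables and the PNP URL/port are placeholders for illustration, and the actual startup options depend on how your Node image is built.

# Start Node containers on any host attached to the overlay network.
# RM_URL and WORKERS are hypothetical variables consumed by the image's entrypoint.
docker run -d --name proactive-node-1 --network proactive-net \
  -e RM_URL=pnp://proactive-scheduler:64738 -e WORKERS=4 \
  activeeon/proactive-node
docker run -d --name proactive-node-2 --network proactive-net \
  -e RM_URL=pnp://proactive-scheduler:64738 -e WORKERS=4 \
  activeeon/proactive-node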

You can also keep the data and logs of the Nodes and of the Scheduler Server in Docker volumes.

Once everything is running, you can write some Workflows, execute them, and see on which Nodes they are executed.

And there you have it: an entire cluster ready to execute jobs, with all the benefits of our generic Scheduler, which allows you to run Spark jobs, MapReduce jobs, ETL processes and more.

Friday, August 26, 2016

Submitting ProActive Workflows with Linux cURL

Using cURL from a Linux command line (bash) is a very convenient and resource-efficient way to submit workflows.
We first need to log in and store the current session id with the following command:

sessionid=$(curl --data 'username=admin&password=admin' \
  http://localhost:8080/rest/scheduler/login)

The credentials are transmitted as form data with --data, and the result is written into the sessionid variable; it can be displayed with echo $sessionid. In the following requests, the session id is transmitted as a header parameter with --header (-H).

Workflows can be submitted with cURL:

curl --header "sessionid:$sessionid" \
  --form "file=@filename.xml;type=application/xml" \
  http://localhost:8080/rest/scheduler/submit

The session id variable is inserted into the header, and the @ notation tells cURL to send the file content directly to the server.

Advanced: Workflow Submission with Variables

Workflow variables can be sent in the submission URL; their values will replace those defined in the workflow.

curl --header "sessionid:$sessionid" \
  --form "file=@file.xml;type=application/xml" \
  "http://localhost:8080/rest/scheduler/submit;var1=value1;var2=value2"

Important: the URL is now embedded in double quotes ""; only then are the matrix parameters properly transferred. Variables are separated by semicolons (;).

Wednesday, November 25, 2015

ProActive in Docker containers

ProActive Workflows & Scheduling 7.0.0 is now available on Docker Hub!

How to run ProActive Workflows & Scheduling inside Docker

1) Have Docker and Docker Compose installed
2) Get the ProActive Docker Compose file here
3) Save the ProActive Docker Compose file as docker-compose.yml
4) Run "docker-compose up" in the same directory as the ProActive Docker Compose file
5) Wait for the "*** Get started at http://[IP]:8080 ***" message
6) Browse to http://[IP]:8080
7) Log in with the default credentials: admin:admin
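The same steps can be run from a terminal as follows (assuming the compose file has been saved as docker-compose.yml in the current directory; -d and -f are optional flags):

# Start the containers in the background and follow the logs
docker-compose up -d
docker-compose logs -f
# Wait for the "*** Get started at http://[IP]:8080 ***" message, then open that URL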

Get 10 days of free support by registering on our website.

Be part of our open-source community on GitHub.

Tuesday, November 24, 2015

Add Docker to your legacy systems with ProActive

ProActive Workflows & Scheduling 7.0 is out, with an exciting new feature: Docker containers are now supported inside Workflows.

Docker container support is now available in ProActive Workflows & Scheduling 7.0. It combines the strength of supporting and keeping legacy systems with the newly added Docker container support, enabling everyone to combine legacy systems with cutting-edge technology.
Get access through:

What is ProActive Workflows & Scheduling?

Quick introduction to our:

  • Workflow based Scheduler
  • Resource Manager
  • Web interfaces

Quick introduction

Our Workflow Scheduler allows you to compose bigger tasks out of smaller components by creating Workflows. Workflows can mix many languages and can contain services and batch tasks, so they provide a lot of freedom for solving problems, collecting and analyzing data, and much more. The best part is that execution is automatically distributed, but more about that in the Scheduler section.

Scheduler
The Scheduler knows all about jobs and tasks: a ProActive Workflow is represented as a job, and the Workflow's building blocks are called tasks. It communicates with the Resource Manager to distribute all the work. Resources are maximally utilized because the Scheduler knows what needs to run first for efficient resource utilization, while respecting the constraints of the Workflows. Your workload is thus automatically distributed, and the advanced fault tolerance ensures recovery after hardware and software failures.

Resource Manager
The Resource Manager knows all resources and everything about them. Resources can be added and removed at runtime, and ProActive’s fault tolerance ensures proper execution of Workflows even if nodes are removed or fail. Best of all, it is very easy to scale: a single command adds or removes resources.

Many more features

ProActive Workflows & Scheduling has many more features than covered in this quick introduction. If you are interested in something in particular, contact us or visit our website.

Legacy systems? - ProActive has you covered

ProActive Workflows & Scheduling is a perfect fit for your legacy systems. We at Activeeon have many years of experience bringing legacy systems together and adding new functionality to the mix. Legacy systems can be re-written inside the ProActive Studio or accessed from Workflows without touching them. Thanks to the flexibility of Workflows, new systems can easily be added alongside them.

Working with customers for many years has brought wide language support to our Studio, including:

  • Python
  • Bash
  • R
  • Java
  • Ruby
  • And more....

Our ProActive Workflows allow you to combine all types of languages in one Workflow, to reach your goal faster, and without changing your working legacy code.

If you are familiar with High Performance Computing, you might know:

  • IBM Platform LSF
  • PBS Pro

We connect to all of them and combine access in a single web interface.
If you have worked with legacy systems, you know that it is better to keep them running as long as they do, because changes can be expensive.
But Docker has shaken the IT world and made many things easier and faster than before, and now, with ProActive Workflows & Scheduling you can use the newest technologies next to your legacy systems.

Docker support in Workflows & Scheduling

Finally, the day has arrived to announce Docker support inside ProActive Workflows & Scheduling.
It gives you many advantages:

  • Advanced ProActive Workflows & Scheduling fault tolerance for your Docker containers
  • Run Docker and legacy systems without updating existing code
  • Improve your resource utilization significantly

ProActive Workflows & Scheduling brings advanced fault tolerance features to Docker containers:

  • Resilient to hardware failures
  • Resilient to software errors

Re-execute your Docker containers automatically on different hardware if one or more fail.
If your software experiences errors, re-execute X times in Y different environments.

Run Docker containers alongside legacy systems:

  • Integrate Docker containers in your legacy system mix
  • Don't touch any legacy system
  • ProActive Workflows & Scheduling combines legacy and new systems

Improve your resource utilization significantly
Docker containers, in a distributed environment, need efficient resource management, otherwise machines will be over- or under-utilized.

ProActive Workflows & Scheduling has an effective resource management which reaches high efficiency, and scales your infrastructure according to the current demand.

Want to have a try?

Open source and free!
No installation required: use it as a web app, or download it at:

Tuesday, July 21, 2015

Portfolio risk diversification with Spark & ProActive Workflows

Authors: Michael Benguigui and Iyad Alshabani

In order to design a financial tool to diversify risks, we describe the following Spark jobs orchestrated by the ProActive Scheduler. Since we need to process huge amounts of asset prices and run streaming k-means to keep models up to date, Spark is definitely the right technology. Firstly, MLlib (Spark’s scalable machine learning library) offers functionality for streaming data mining. Secondly, Spark lets us work with advanced map/reduce operations on RDDs (Resilient Distributed Datasets), i.e. partitioned collections of elements that can be operated on in parallel. All these jobs require a powerful framework to be deployed, orchestrated and monitored, as provided by ProActive.


Spark streaming jobs 


Spark Cluster

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program). Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers to allocate resources across applications: Spark’s own standalone cluster manager, Mesos or YARN. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks for the executors to run.
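As a concrete illustration, this is how a driver program is typically handed to a standalone Spark cluster with spark-submit (the master URL, class and jar names below are placeholders):

# Submit the application; the driver creates the SparkContext,
# which then acquires executors from the standalone cluster manager
spark-submit \
  --master spark://spark-master:7077 \
  --class com.example.StreamingCorrelations \
  --executor-memory 2G \
  streaming-correlations-assembly.jar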

ProActive Workflow, Scheduler & Resource Manager

Spark jobs are created and deployed using ProActive Workflows, which offer task templates to facilitate workflow elaboration. Indeed, ProActive Workflows & Scheduling is an integrated solution for deploying, executing and managing complex workflows, in the cloud and on premises alike. All jobs are orchestrated by the ProActive Scheduler, while the ProActive Resource Manager virtualizes and monitors resources.
The ProActive Scheduler and Resource Manager are deployed in a way that monitors each Spark cluster, which in turn monitors its own applications. The ProActive Scheduler is used as a meta-scheduler for the set of Spark clusters; indeed, each Spark cluster has its own Spark scheduler. ProActive then orchestrates the set of clusters and jobs, as shown in the next figure.

Architecture of our application dedicated to risk diversification

Spark Streaming jobs

We focus on the analysis of CAC40 stock correlations over short periods. Our Spark streaming jobs require a batch duration parameter to be set: it controls the time period, in seconds, during which the job streams input data before processing it. The main Spark jobs of our financial tool are described as follows:
  • The first streaming job queries data from Yahoo! or Google web services (user choice), cleans data for the following jobs, and writes data in a shared directory. The batch duration is set by the user, and only successive distinct quotes are stored. For instance, considering a 10s batch duration, quotes will be queried during 10s before being processed and written in a shared directory.
  • A second Spark streaming job keeps the correlation matrix coefficients up to date by streaming the processed stock quotes from the previous job (“Spearman” correlations in MLlib, rather than the commonly used “Pearson” correlations, which are not applicable if we consider the price processes to be lognormally distributed, as the Black & Scholes model states).
  • A specific Scala job is in charge of drawing a heatmap from a matrix. Applied to the estimated correlations, it depicts the constantly updated correlation coefficients with associated colors.

CAC40 correlation heatmap

  • Relying on the KMeans algorithm (MLlib), another Spark streaming job clusters the correlations to build a well-diversified portfolio. K-means clustering aims to partition a set of observations into k clusters in which each observation belongs to the cluster with the nearest mean.
  • Here, the drawing job is used to represent the clustered correlations: companies with similar correlations (considering as many features/correlations as there are stocks, i.e. 40) are assigned to the same cluster.

CAC40 clusterized correlation heatmap

We could use the resulting correlations to simulate the underlying stocks of an option and, through Monte Carlo simulations, estimate the delta greek to delta hedge it (i.e. find positions on the market to reduce the exposure to variations of the underlying stocks). Of course, all embarrassingly parallel jobs can benefit from the map/reduce paradigm offered by Spark. In the automated trading context, it would be interesting to let ProActive automatically deploy such an application involving specialized Spark streaming jobs and dynamically orchestrate them: some streaming jobs would be specific to trading strategies, while others would be in charge of market data analysis. ProActive Workflows & Scheduling affords such dynamic deployment and orchestration in a fully scalable way.