Friday, March 6, 2015

ProActive and R: a Machine Learning Example

The ProActive Scheduler is open-source software that orchestrates, scales and monitors tasks across many hosts. It supports several languages, one of which is R, the environment for statistical computing and graphics. R is known for computationally intensive work, so you can write your R scripts on a laptop and execute them on different, more powerful machines.

Docker container for portability and isolation

At Cloud Expo Europe in London you can see an exciting feature of the ProActive Scheduler that is under heavy development: Docker container support. Running tasks in containers increases isolation between tasks and provides self-defined environments in which you can run them. Taking the idea further, a container could stand in for a task, so that the work runs in your environment inside a container. The possibilities are endless, and you do not have to care about error recovery, network outages or other complications of running in distributed environments, because the ProActive software deals with them.

Machine Learning with ProActive and R: Local Setup

The following steps show how to install and run ProActive and then execute an R script with the ProActive Scheduler. They were carried out on an Ubuntu operating system.

Requirement: Installing the R Environment and rJava

Install the R Environment and rJava by typing:

    $ sudo apt-get install r-base r-cran-rjava

Download ProActive

  1. Create an account on
  2. Download the current ProActive Workflows & Scheduling
  3. Unzip
  4. Download the ProActive-R-Connector (
  5. Unzip into the ‘ProActiveWorkflowsScheduling-linux-x64-6.1.0/addons’ folder

Done: you have just installed ProActive with R support.

Start ProActive Server


    $ ./ProActiveWorkflowsScheduling-linux-x64-6.1.0/bin/proactive-server

With the default settings, this starts the ProActive Scheduler and 4 local nodes.

Note: ProActiveWorkflowsScheduling-linux-x64-6.1.0 is the ProActive home directory; it may have a different name if you downloaded a newer version.

Wait until you see “Get started at”, followed by the link to the web interface.
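Since the directory name changes between versions, it can help to keep it in a shell variable. The sketch below assumes the archive was unzipped into your home directory. The variable names PROACTIVE_HOME and PROACTIVE_SERVER are chosen here purely for illustration.

```shell
# Assumption: the archive was unzipped into $HOME; adjust the path otherwise.
# PROACTIVE_HOME is only a convenience variable for the commands in this post.
PROACTIVE_HOME="${PROACTIVE_HOME:-$HOME/ProActiveWorkflowsScheduling-linux-x64-6.1.0}"

# Path of the server start script inside the home directory.
PROACTIVE_SERVER="$PROACTIVE_HOME/bin/proactive-server"

echo "ProActive home: $PROACTIVE_HOME"
echo "Start the server with: $PROACTIVE_SERVER"
```

After upgrading, only the first assignment needs to change.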

Start the ProActive Studio

The interface shows three options. The leftmost orange circle is a link to the ProActive Studio, which is used to create and execute workflows. Click it to open the Studio, then log in with the username admin and the password admin.

Create an R task

After you create a workflow and open it, the interface shows a 'Tasks' drop-down menu. Select 'Language R' to create an R task.

Add your R code

Add your R code, here you can download an altered example from
Add your code under the "Execution" menu, which appears after you select the R_Task.

Note: When R is executed on another machine, that machine must have all necessary packages installed and loaded. Ensure this by installing the packages in advance; it can also be done within the script by specifying the library and the mirror.
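As a rough sketch of the in-script approach, the commands below write a small R prologue to a file so that it can be pasted at the top of the R task. The package name randomForest is only a stand-in (use whatever your script needs), and the file name, the per-user library from R_LIBS_USER and the cloud.r-project.org mirror are assumptions, not settings required by ProActive.

```shell
# Where to write the R snippet (assumed location; any writable path works).
PROLOGUE_FILE="${TMPDIR:-/tmp}/r-task-prologue.R"

# Quoted heredoc delimiter: the shell leaves the R code untouched.
cat > "$PROLOGUE_FILE" <<'EOF'
pkg     <- "randomForest"                 # example package name only
lib_dir <- Sys.getenv("R_LIBS_USER")      # library: per-user directory on the node
mirror  <- "https://cloud.r-project.org"  # mirror: CRAN server to download from

# Make sure the library directory exists and is searched first.
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(lib_dir, .libPaths()))

# Install only when the package is missing, then load it.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, lib = lib_dir, repos = mirror)
}
library(pkg, character.only = TRUE)
EOF

echo "R prologue written to $PROLOGUE_FILE"
```

Installing inside the script costs time on the first run on each node, so installing in advance is preferable when you control the machines.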

Add datasets to R_Task

The script loads the SP500_Shiller dataset, which you can download here. The R script will be sent to one ProActive Node; to ensure that the node has the data, we need to specify the data dependency in the 'Data Management' settings of the R_Task. Specify SP500_Shiller.csv as an input file from user-space. The file must be copied to ProActiveWorkflowsScheduling-linux-x64-6.1.0/data/defaultuser/admin, which is the user-space of the admin user.
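Copying the file into the user-space could look like the sketch below. It assumes ProActive was unzipped into your home directory and that the CSV sits in the current directory. stage_input is a hypothetical helper written for this post, not a ProActive command.

```shell
# Assumption: ProActive was unzipped into $HOME; adjust PROACTIVE_HOME otherwise.
PROACTIVE_HOME="${PROACTIVE_HOME:-$HOME/ProActiveWorkflowsScheduling-linux-x64-6.1.0}"
# User-space of the admin user, as described in the text.
USER_SPACE="$PROACTIVE_HOME/data/defaultuser/admin"

# stage_input FILE: copy FILE into the admin user-space (hypothetical helper).
stage_input() {
    if [ ! -f "$1" ]; then
        echo "input file not found: $1" >&2
        return 1
    fi
    # Create the directory if it is missing, then copy the file into it.
    mkdir -p "$USER_SPACE" && cp "$1" "$USER_SPACE/"
}

# Usage, from the directory the CSV was downloaded to:
#   stage_input SP500_Shiller.csv
```

The helper refuses to run when the file is missing, which catches a mistyped name before the task is submitted.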

Specify output file inside Scheduler

The R script writes an image, 'ml-result.png'. To see the result, we need to tell the task to copy it into our user-space after the task finishes. This is done by adding ml-result.png as an output file to user-space.

Access result

To see the result, open ProActiveWorkflowsScheduling-linux-x64-6.1.0/data/defaultuser/admin (the user-space of the admin user), which contains 'ml-result.png' after the R script has finished.
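A quick check from the terminal, assuming ProActive was unzipped into your home directory and that the output file is named ml-result.png as written by the script. result_ready is a hypothetical helper for this post.

```shell
# Assumption: ProActive was unzipped into $HOME; adjust PROACTIVE_HOME otherwise.
PROACTIVE_HOME="${PROACTIVE_HOME:-$HOME/ProActiveWorkflowsScheduling-linux-x64-6.1.0}"
USER_SPACE="$PROACTIVE_HOME/data/defaultuser/admin"

# result_ready FILE: succeed if FILE exists in the user-space and is non-empty.
result_ready() {
    [ -s "$USER_SPACE/$1" ]
}

if result_ready ml-result.png; then
    echo "Result available: $USER_SPACE/ml-result.png"
else
    echo "ml-result.png is not in $USER_SPACE yet; check that the task has finished."
fi
```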
