Tuesday, July 30, 2013

Distributed Groovy with ProActive - Part 1

Being a new member of the ProActive team, I tried to teach to myself a bit of active objects, or at least create one. Active Objects are a core concept of ProActive, a distributed computing framework which is the foundation of the Scheduler and Resource Manager

I will not detail how the active objects work but illustrate how they can be used with concrete examples and especially showing how a language such as Groovy could integrate with to provide a nice and fluent distributed API. Groovy is a dynamic language running on the JVM providing scripting capacities and a transparent integration with Java. It also enables you to adopt functional programming in Java with closures for instance. To me, it is really a Java++ where you can increase your code readability. And if you are afraid of the performance of a dynamic language, well the latest version brought static typing so you can still use all the nice features and keep a coding style closer to Java with good performance. 

This is the first item of two blog posts where I will explain how to use Groovy with ProActive to execute code in a distributed manner and how GPars could be distributed with ProActive. 

The code presented here is available on GitHub and can be compiled and ran using Gradle wrapper (distributed with the sources). Useful commands are gradlew tasks, gradlew build. To build it you will also need the programming library (where the active objects live), available here and to set the project.ext.programmingLibDir property inside the build.gradle script. The programming library sources are available here

Let’s start with a very simple example showing the deployment of a remote execution node, the creation of a remote active object on this node and the remote execution of code.

This program finds Wally and his friend the Wizard whitebeard. Wally being a normal guy lives here in the present world (main thread) whereas the Wizard with its magic powers lives elsewhere (on a remote node). So when you run this program, you will see where each of them live.

$> gradle WhereIsWallyJava
Wally is here: main 
Wizard whitebeard is here: WallyFingerPointer on rmi:// 

Here we deployed the node locally using a GCM descriptor. WallyFingerPointer is the active object, created by the call PAActiveObject.newActive(). It is very simple class with one method, printing the current thread name from which we can infer Wally’s location. Now this is all Java code and we will now try to improve it with some Groovy.

Here we have the same program written in Groovy, the output is exactly the same. Let’s explain how it works. We create a WallyFingerPointer object and using the with() method we express that all the code inside the closure (inside the braces) will be invoked on the WallyFingerPointer object. It is a shortcut to avoid repeating the variable name when calling several methods on the same object. Then we have the remote code execution using the same closure block but with a call to a static method remote(). This methods takes care of creating the remote deployment context and cleans it when the closure is executed. The foundHim() calls inside the closure will be executed by the active object. Groovy enables us to hide the technical details around active objects and helps to provide a clean API to the user. Of course this a very limited example, a few technical limitations exist related to object serialization for instance.

Now what would be nice is to be able to execute remote closure because closures are core to Groovy, used everywhere for conciseness (and that will be useful for the second blog post of this series). A closure can be seen as an interface with one method so it is itself an object, with call() methods, eventually taking parameters. The closure has also access to variables outside its scope but we will not allow that in our example to facilitate serialization. 

We still use the same program with the first line printing out the location of Wally (local execution) and the second line using a remote closure. The code inside the braces will be executed on the remote node. The way it works is that we created an active object that has one method taking a closure as a parameter and executing it. The closure object will be serialized and sent to the remote node for execution. To serialize the closure we use the dehydrate() method that simply remove references to object out of the closure scopes (the script itself for example). It is now even simpler to execute remote code with Groovy!

We have shown a very simple example of remote code execution with ProActive, adapted this code to use Groovy and introduced a way to run remote closures. In the next blog post, we will use these remote closures to turn GPars, a concurrency framework, into a distributed concurrent framework!