Tuesday, December 20, 2016

Resource Reservation

Each IT infrastructure includes heterogeneous resources that serve different purposes. Some resources might be better suited to memory-intensive tasks, others might host a secured database, offer low latency to customers, or provide higher bandwidth.

This article takes as an example a task that needs to access a database, or any other specific resource. A selection script can ensure that the task is executed on the server hosting this resource, but other requirements also have to be considered.
What if other tasks are waiting and yours has a higher priority? What if your task requires the whole capacity of the machine?

ProActive offers features to take care of those situations.

  • Multi-node execution allows a single task to use several nodes. Those nodes can be selected according to a chosen rule (for instance, all nodes on the same host). This feature was created for MPI applications but can be used here as well.
  • An access token ensures that a given node is only used by tasks presenting the matching token, and that those tasks only run on nodes presenting the matching token.
  • A selection script is more flexible than a token and enables custom selection rules, including dynamic parameters such as latency or bandwidth.
To learn more about those features, scroll down.
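In ProActive, a selection script runs on each candidate node and reports whether that node should be selected. The kind of decision it encodes can be sketched in plain Python, outside ProActive. The `NodeInfo` fields, the `select` function, the node URLs and the thresholds below are all hypothetical, chosen only to show a rule that combines a fixed requirement (being on the host of the database) with a dynamic parameter (latency).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    # Hypothetical view of a node; a real selection script inspects the node it runs on.
    url: str
    host_ip: str
    latency_ms: float


def select(node: NodeInfo, db_host_ip: str, max_latency_ms: float) -> bool:
    """Accept the node only if it is on the database host and its latency is low enough."""
    return node.host_ip == db_host_ip and node.latency_ms <= max_latency_ms


nodes: List[NodeInfo] = [
    NodeInfo("pnp://10.0.0.5:64738/node1", "10.0.0.5", 12.0),
    NodeInfo("pnp://10.0.0.5:64738/node2", "10.0.0.5", 45.0),
    NodeInfo("pnp://10.0.0.7:64738/node1", "10.0.0.7", 3.0),
]

eligible = [n.url for n in nodes if select(n, "10.0.0.5", 20.0)]
print(eligible)  # ['pnp://10.0.0.5:64738/node1']
```

The third node has the best latency but is rejected because it does not host the database; a token alone could not express the latency part of this rule.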

Reserving a host that has a specific token

This case lies on the borderline between two features: tokens and multi-node execution. Multi-node execution only takes nodes carrying the right token, which limits what it can do here: if the host is also made available for normal use through tokens, multi-node execution cannot be used as-is to reserve the entire host.
One solution is to lock all the other nodes on the host before executing the main algorithm, and to unlock them afterwards. A token is still used, but exclusive access is obtained through this locking rather than through multi-node execution.

A pre-script is suited to locking the nodes, and a clean-script to unlocking them. A post-script could have been used instead, but a clean-script is a better choice: it is executed whatever the final status of the task, whereas a post-script is not executed if the task fails.
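The difference between a post-script and a clean-script can be modelled with Python's `try`/`finally`. This is only an analogy, not how ProActive actually runs scripts: the hypothetical `run_task` below calls the post step only when the pre step and the main step succeed, while the clean step runs in every case, which is why unlocking belongs there.

```python
from typing import Callable, List

Step = Callable[[], None]


def run_task(pre: Step, main: Step, post: Step, clean: Step) -> None:
    """Hypothetical model of a task's script lifecycle."""
    try:
        pre()    # e.g. lock the other nodes on the host
        main()   # the critical work
        post()   # skipped if pre() or main() raises
    finally:
        clean()  # always executed, e.g. unlock the nodes


events: List[str] = []


def failing_main() -> None:
    events.append("main")
    raise RuntimeError("task failed")


try:
    run_task(
        pre=lambda: events.append("lock"),
        main=failing_main,
        post=lambda: events.append("post"),
        clean=lambda: events.append("unlock"),
    )
except RuntimeError:
    pass  # the failure still reaches the caller; the nodes were unlocked anyway

print(events)  # ['lock', 'main', 'unlock']
```

Had the unlocking been placed in the post step, the failed task would have left the other nodes of the host locked.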

How to do it

Locking and unlocking nodes can be done with REST calls (documentation here). A node is unlocked by sending a POST request to {REST server}/rm/node/unlock/, with the URL of the node to unlock as a form parameter and a session id (for authentication) as a header.
The session id can be retrieved by calling the {REST server}/rm/login/ resource. The URL of the node, however, may not be known; in that case, it can be found in the list of all alive nodes by matching the IP address of the target host.
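A minimal sketch of these two steps with Python's standard library follows. It assumes node URLs of the form `pnp://<ip>:<port>/<name>`, a form field named `url` and a header named `sessionid`; these names, like the REST server address `http://localhost:8080/rest`, are assumptions to check against the REST documentation. Retrieving the session id and the list of alive nodes is not shown, and the request is only built here: `urllib.request.urlopen(request)` would send it.

```python
from typing import List
from urllib.parse import urlencode, urlparse
from urllib.request import Request


def nodes_on_host(node_urls: List[str], host_ip: str) -> List[str]:
    """Keep the node URLs whose host part is host_ip (assumes pnp://<ip>:<port>/<name>)."""
    return [url for url in node_urls if urlparse(url).hostname == host_ip]


def build_unlock_request(rest_server: str, node_url: str, session_id: str) -> Request:
    """Build (but do not send) the POST request that unlocks one node."""
    # Assumption: form field "url" and header "sessionid"; check the REST documentation.
    body = urlencode({"url": node_url}).encode("ascii")
    return Request(
        rest_server.rstrip("/") + "/rm/node/unlock/",
        data=body,
        headers={
            "sessionid": session_id,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


alive = [
    "pnp://10.0.0.5:64738/node1",
    "pnp://10.0.0.7:64738/node1",
]
on_host = nodes_on_host(alive, "10.0.0.5")
request = build_unlock_request("http://localhost:8080/rest", on_host[0], "my-session-id")
print(on_host)           # ['pnp://10.0.0.5:64738/node1']
print(request.full_url)  # http://localhost:8080/rest/rm/node/unlock/
```

Matching on the parsed host name rather than on a substring of the URL avoids confusing, for example, `10.0.0.5` with `10.0.0.50`.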

Locking nodes is very similar to unlocking them; the request is sent to {REST server}/rm/node/lock/ instead. However, a busy node simply ignores the request, so the script needs to make sure the node was actually locked. Fortunately, the request returns a boolean: true if the node was actually locked. Looping over the not-yet-locked nodes makes the task wait until all the other nodes on the host are free and locked.
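The waiting loop can be sketched as follows. `try_lock` stands for a function that sends the lock request for one node URL and returns the boolean from the response; passing it as a parameter is a hypothetical design choice so that the loop can be exercised without a server. The list given to `lock_all` should contain only the other nodes of the host, not the node running the task itself.

```python
import time
from typing import Callable, Dict, Iterable, Set


def lock_all(
    node_urls: Iterable[str],
    try_lock: Callable[[str], bool],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Retry the nodes that are not locked yet until every one of them is locked."""
    pending: Set[str] = set(node_urls)
    while pending:
        for url in sorted(pending):  # sorted() copies the set, so discarding is safe
            if try_lock(url):        # True: the node was free and is now locked
                pending.discard(url)
        if pending:
            sleep(poll_seconds)      # busy nodes ignored the request; wait and retry


# Usage with a fake lock function: node2 is busy on the first attempt.
attempts: Dict[str, int] = {}


def fake_try_lock(url: str) -> bool:
    attempts[url] = attempts.get(url, 0) + 1
    return not (url.endswith("node2") and attempts[url] == 1)


lock_all(
    ["pnp://10.0.0.5:64738/node1", "pnp://10.0.0.5:64738/node2"],
    fake_try_lock,
    sleep=lambda seconds: None,
)
print(attempts)  # {'pnp://10.0.0.5:64738/node1': 1, 'pnp://10.0.0.5:64738/node2': 2}
```

This loop has no timeout, so a node that never becomes free keeps the pre-script waiting; a real version would probably bound the number of retries.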

EDIT: A new feature has been added in the most recent version to simplify this task. See Youtube Video.

When the job runs, the task first waits in its pre-script, locking each of the other nodes on the host as soon as it becomes free and before a new task can be assigned to it (including when two nodes are freed at the same time), until the host is running only this task. It then executes the critical script that needs exclusive access to the resource, and finally releases the nodes.

Example of clean-script for this situation in Groovy
Example of pre-script for this situation in Groovy

Using the described features

For more information about selection scripts, as well as pre-, post- and clean-scripts, see a previous article or the documentation.
An access token is defined when creating a node source, in the “userAccessType” of its policy. Only tasks that specify the right “NODE_ACCESS_TOKEN” in their “Generic Info” can use the nodes from this source, and such a task can only be executed on a node with the right token.
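The matching rule described above can be expressed as a small predicate. This is a hypothetical model, not ProActive code: the task's token is read from its Generic Info under `NODE_ACCESS_TOKEN`, the node's token comes from its node source (`None` when the source defines none), and a task may run on a node only when both are equal, which covers both directions of the restriction.

```python
from typing import Dict, Optional


def can_run(task_generic_info: Dict[str, str], node_token: Optional[str]) -> bool:
    """Hypothetical model: the task's token must equal the node's token (None if absent)."""
    task_token = task_generic_info.get("NODE_ACCESS_TOKEN")
    return task_token == node_token


print(can_run({"NODE_ACCESS_TOKEN": "db-host"}, "db-host"))  # True
print(can_run({}, "db-host"))                                # False: node reserved for token holders
print(can_run({"NODE_ACCESS_TOKEN": "db-host"}, None))       # False: task needs a node with its token
print(can_run({}, None))                                     # True: ordinary task, ordinary node
```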
Multi-node execution is set through the “Multi-Node Execution” tab of a task. There, the number of nodes the task will occupy can be selected and a topology given. A blank or “none” topology makes the task execute as though the tab were empty. For a list of the topologies, see here.
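To illustrate what a topology constrains, the hypothetical helper below picks `count` free nodes that all live on the same host, which is the kind of constraint needed to occupy a whole machine. It only models the idea; the actual topologies and their names are those listed in the documentation, and the URL format `pnp://<ip>:<port>/<name>` is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional
from urllib.parse import urlparse


def pick_single_host(free_node_urls: List[str], count: int) -> Optional[List[str]]:
    """Return `count` node URLs from one host, or None if no host has enough free nodes."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for url in free_node_urls:
        by_host[urlparse(url).hostname or ""].append(url)
    for urls in by_host.values():  # hosts in order of first appearance
        if len(urls) >= count:
            return urls[:count]
    return None


free = [
    "pnp://10.0.0.7:64738/node1",
    "pnp://10.0.0.5:64738/node1",
    "pnp://10.0.0.5:64738/node2",
]
print(pick_single_host(free, 2))  # ['pnp://10.0.0.5:64738/node1', 'pnp://10.0.0.5:64738/node2']
print(pick_single_host(free, 3))  # None
```

As the article notes, this only sees free nodes that the task is allowed to use; nodes busy with other tasks or carrying another token are invisible to it, which is why locking is used instead to reserve the whole host.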
