Activeeon Team Blog: 2017

Tuesday, December 5, 2017

Agility for Data Scientists

The process followed by data scientists can be summarized by the diagram below. As shown, it is an iterative process in which hypotheses are made and the model is improved incrementally. Multiple platforms compete to support data scientists’ needs, but too often they focus on standard workflows and methods, whereas each use case is different and requires specific development.

At Activeeon, our data science team focuses on building templates that can be easily used and tweaked at each iteration.

Scheduling Recurring Jobs - Best Practices

This article aims to briefly present the best practices and suggest tools to plan and monitor recurring jobs with Activeeon’s solution. The main concerns are addressed through different features and services:

Schedule management through the Job Planner service,
Workflow validity and management through the Catalog,
Notifications on events through an integrated feature, to be more proactive,
Requirement checks prior to execution through a selection script, to avoid unnecessary issues.
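
As a rough illustration of the last point, the kind of check a selection script might perform can be sketched in Python. The function name, the thresholds and the convention of reporting the outcome through a boolean named `selected` are assumptions made for this sketch, not a description of the exact ProActive API.

```python
import shutil


def meets_requirements(path, min_free_bytes, required_commands):
    """Return True when the host has enough free disk space at `path`
    and every command in `required_commands` is found on the PATH."""
    free_bytes = shutil.disk_usage(path).free
    if free_bytes < min_free_bytes:
        return False
    return all(shutil.which(cmd) is not None for cmd in required_commands)


# Hypothetical requirements: 1 GiB free in the current directory, no extra tools.
selected = meets_requirements(".", 1024 ** 3, [])
```

Running such a check before the job starts lets a host that cannot serve the job be skipped instead of failing halfway through execution.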

Job Planner

The Job Planner is a service included in ProActive to manage recurring jobs. The main benefits are:

Dedicated and centralized interface,
Clear forecast of workflows,
Simplified management of exceptions: additional executions and/or exclusion periods.

Generate cron expressions

Activeeon supports Python natively

With the objective of building an open platform for Machine Learning workflows and better data analytics, the latest release of Activeeon's solution includes a native Python task.

In short, the main benefits are:

Analyzing data in Python using numpy, pandas, TensorFlow, etc. is now greatly simplified.
Native Python tasks run 10 to 100 times faster than Jython tasks.
They integrate fully with existing mechanisms such as Generic Information and variable propagation.
Multiple Python versions are supported, even within the same workflow.
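
To give an idea of what the body of such a task could look like, here is a minimal sketch. It assumes that the task receives propagated workflow variables as a dictionary-like object and hands back a result. The names `run_task`, `variables`, `result` and `INPUT_VALUES` are placeholders chosen for this example, and only the standard library is used so that the snippet runs anywhere.

```python
import statistics


def run_task(variables):
    """Parse comma-separated numbers from a workflow variable and summarize them."""
    values = [float(v) for v in variables["INPUT_VALUES"].split(",")]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Stand-in for the variables a workflow would propagate to the task.
variables = {"INPUT_VALUES": "1,2,3,4"}
result = run_task(variables)
```

In a real task, the same function could just as well rely on numpy or pandas, provided they are installed for the Python version used by the task.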

How to

AZURE PoC in the Box

As part of the $3 billion invested in Europe, Microsoft is opening multiple datacenters to support the fast-growing cloud adoption of Azure. The partnership between Microsoft and Activeeon resulted in the “AZURE PoC in the BOX” program. Its main objective is to support companies on their path to gaining a competitive advantage through cloud services, flexibility and open solutions, and to support their transition.

AZURE PoC in the BOX principles

The AZURE PoC in the BOX program aims to ease the transition to the cloud by tackling adoption barriers. The challenges usually arise at multiple levels:

Management approval,
Investment justification and money allocation,
Environment configuration and time to market deployments,
Workload transition.

Compute resources will be allocated on the fly for the AZURE PoC to replicate your environment. Thus, investment is greatly reduced and a replicated infrastructure is instantly available.

Activeeon is an open source software vendor whose solution aims to simulate the workload on the platform. Its flexibility lets any business (i.e. “métier”) user of the PoC orchestrate application frameworks across environments (e.g. local and Azure) while controlling execution. Scenarios can then be tested in a timely manner, giving instant feedback to the testing teams.

Moreover, Activeeon supports the transition to the cloud in multiple ways. Activeeon’s engineers are Azure-certified experts. A workflow conversion tool has been developed to quickly adapt hundreds of workflows to ProActive standards. Finally, because Activeeon’s solution is resource agnostic, the transition to the cloud is seamless and progressive, without interrupting IT processes and therefore without disrupting end users.

Leveraging those technologies, the AZURE PoC in the BOX also aims to show companies how to boost agility at both the process and infrastructure levels and how to estimate real cost savings. Moreover, it will stimulate access to new services that enable new insights or free up additional resources, for instance highly available databases, user-friendly replication mechanisms, etc.

AZURE PoC in the BOX journey

In three simple steps, get set up and explore new opportunities.

Easily Try ProActive Integration with TORQUE

In addition to deploying and managing its own resources, ProActive can be used as a meta-scheduler and benefit from infrastructures that are already deployed and configured. ProActive can interface with several schedulers and resource managers, including TORQUE. In this post, we show how ProActive can manage native scheduler node sources, whose nodes belong to the TORQUE resource manager, and how ProActive can submit jobs to these resources. We provide the ‘activeeon/proactive-torque-integration’ Docker image, which allows our users to try this particular integration easily in a Docker container. This Docker image includes an installation of TORQUE and an entrypoint that downloads and runs the latest release of ProActive.

Setup the Docker Container

First start the Docker container:

$ docker run -d -h docker -p 10022:22 -p 18080:8080 --privileged --name proactive-container activeeon/proactive-torque-integration

Before using ProActive, you need to monitor the Docker container until the ProActive scheduler is running (a few minutes are needed). You can do this with the following command:

$ docker logs proactive-container --follow

As long as the command prints nothing, the ProActive scheduler is not yet running.

When the SchedulerStarter java process is displayed, open a web browser and go to http://localhost:18080/. If the page cannot be displayed, wait a little longer: ProActive is still starting.
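
Instead of refreshing the browser by hand, the wait can be scripted. The sketch below is one possible approach using only the Python standard library. The helper names `http_probe` and `wait_until_up`, the number of attempts and the delay are assumptions for this example. Only the URL comes from the steps above.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout=5):
    """Return True if `url` answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_up(probe, attempts=60, delay=5.0, sleep=time.sleep):
    """Call `probe()` until it returns True.

    Returns the number of the successful attempt, or None when all
    `attempts` tries have failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For the container started above, a call such as `wait_until_up(lambda: http_probe("http://localhost:18080/"))` would block until the page responds or the attempts run out.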

Create a Native Scheduler Node Source

Now we can create a native scheduler node source that will eventually manage the nodes of TORQUE. In order to manage the resources of a scheduler other than ProActive, and to have these resources represented as ProActive nodes, you need to create a node source with an infrastructure and a policy that are dedicated to native schedulers.

Go to the ProActive Resource Manager and log in with the admin/admin credentials. Click on ‘Add Nodes’. Choose a name for your node source and fill the form with the following values (remove the quotes):

Orchestration in the Machine Learning and Big Data Ages

Modern big data and Machine Learning ecosystems are growing fast into multi-layered and heterogeneous systems. Businesses are not considering a single technology but an architecture of interconnected solutions that bring value to the company. The landscape is expected to consolidate as the market matures but, at the moment, tech articles promote multiple solutions to answer specific use cases.

Fortunately, most tech companies are now embracing open development. This supports initial integration and setup through REST API, SDK, CLI, GraphQL, etc. However, managing diverse solutions at scale quickly leads to new challenges for longer-term control and governance. Indeed, each company deploys specific data flows and manages access differently, so only custom solutions are suitable.

To control and orchestrate these new environments, ActiveEon developed a product to support business IT, sysadmins and data scientists. The concepts followed by the team behind the solution are:

“A picture paints a thousand words.” This is why, through the workflow studio, users can easily manage dependencies in order to visualize and design business processes. Advanced controls such as replication simplify the use of advanced behaviors.
Flexibility and agility drive successful businesses. Integration at customer sites is supported by an open REST API, CLI and SDK, together with a resource-agnostic solution. Moreover, users can work in their preferred language, from bash scripts to Python and R.
Stability through control and governance drives further innovation. IT admins build and test universal workflows. Operators monitor the progress of workflows and manage errors at a granular level.
Resource consumption directly impacts projects’ ROI. Resource-aware applications improve utilization and reduce overprovisioning. Granular management of resources leads to better distribution.

Activeeon Job Planner: in depth

In the previous article, we presented the new job planner, a tool to execute jobs periodically. Now let’s explore how to add exceptions, that is, how to include or exclude specific dates. In short, how to be more flexible.
For example, if you want to send analytic reports every Monday except on days off, you can create a calendar that skips these days. Or, if you execute a job once a month but need two extra executions, at the beginning and at the end of the summer, that is possible too.
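
The logic of such a calendar can be illustrated with a few lines of Python. This is only a sketch of the idea (a weekly recurrence plus exclusion dates and additional dates), not the way the job planner is implemented. The function name `planned_dates` and the sample dates are made up for the example.

```python
from datetime import date, timedelta


def planned_dates(start, end, weekday, excluded=(), extra=()):
    """List the execution dates between `start` and `end` (inclusive).

    `weekday` follows date.weekday(): Monday is 0. Dates in `excluded`
    are skipped by the weekly recurrence; dates in `extra` are added
    whenever they fall inside the range.
    """
    excluded = set(excluded)
    dates = set()
    day = start
    while day <= end:
        if day.weekday() == weekday and day not in excluded:
            dates.add(day)
        day += timedelta(days=1)
    dates.update(d for d in extra if start <= d <= end)
    return sorted(dates)


# Every Monday of December 2017, except the 25th, plus one extra run on the 29th.
december_runs = planned_dates(
    date(2017, 12, 1),
    date(2017, 12, 31),
    weekday=0,
    excluded=[date(2017, 12, 25)],
    extra=[date(2017, 12, 29)],
)
```

Here `december_runs` contains December 4, 11, 18 and 29: the recurrence provides the Mondays, the exclusion removes the day off, and the extra date is added on top.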

HOW TO USE IT ?

Pilot with your Applications (ETL, CI/CD, etc.)

As with any software solution nowadays, the key to a company's success is the ability to integrate all solutions into one. Indeed, as technology evolves, software solutions are becoming more and more specialized and require tight integration with each other to bring true value.

ActiveEon ProActive has been developed with an open approach. The solution is easy to integrate into any architecture and to connect to external services. To be more precise, the solution has a comprehensive open REST API that lets any third-party application integrate with it. Conversely, tasks can be developed in any language to execute REST calls or simply to execute a command line on the relevant host.

Benefits of using ProActive as a pilot

Without going into too much detail, some of the benefits of using ProActive to pilot third-party software are:

Shared resources: allocate resources dynamically depending on the needs of each service and application,
Priority: preempt resources from low-priority tasks to give them to urgent ones,
Multi-language: automate business workflows through custom scripts written in the most suitable language,
Automation of preparation tasks before starting third-party services,
Error handling: monitor and manage errors at a higher level for more control.

How simple is it?

Nowadays, most companies provide ways to connect to their systems through APIs, CLIs or SDKs.

API - If the company is providing an API, a ProActive task will be responsible for connecting and submitting REST calls to the ETL endpoints. In Groovy, it is as simple as json = ['curl', '-X', 'GET', '--header', '', 'http://api.giphy.com/v1/stickers/search?q='+input+'&api_key=dc6zaTOxFJmzC'].execute().text
SDK - If the company is providing a library in Java/Python/etc., a ProActive task in the relevant language will be responsible for connecting and submitting the relevant requests to the ETL service. In that case, the library will have to be loaded within the ProActive folder or through a fork environment such as Docker.
CLI - If the company is providing a CLI, a ProActive task will be responsible for connecting and submitting requests to the CLI service. In that case, a selection script may be used to select the relevant host and execute the command on it or, as explained above, a Docker container with the relevant SDK can be used.
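
For comparison with the Groovy one-liner in the API case, the same kind of REST call could be written in a Python task roughly as follows. This is a sketch using only the standard library. The helper names `build_search_url` and `fetch_json` are invented for the example, and the API key is passed as a parameter rather than hard-coded. Unlike simple string concatenation, `urlencode` escapes the user input.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query, api_key):
    """Build a GET URL with properly escaped query parameters."""
    params = urllib.parse.urlencode({"q": query, "api_key": api_key})
    return f"{base_url}?{params}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A task would then call `fetch_json(build_search_url("http://api.giphy.com/v1/stickers/search", user_input, key))`, where `user_input` and `key` stand for values provided by the workflow.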

Do not hesitate to try these solutions on our user platform.

Successful integrations at customers

Accelerating machining processes

The MC-SUITE project proposes a new generation of ICT enabled process simulation and optimization tools enhanced by physical measurements and monitoring that can increase the competence of the European manufacturing industry, reducing the gap between the programmed process and the real part.

Automatization of the full machining process using ProActive workflows

Figure 1: The full machining process

The workflow controls the complete execution of all tools involved in the virtual machining process and automatically manages file transfers between tool executions. Figure 1 depicts the graphical representation of the orchestration XML file. Using dataspaces is crucial since tasks are submitted to ProActive nodes that could live remotely. Therefore, the files required by a task must be placed in the scheduler dataspace so that they are automatically transferred to the temporary directory of the running task. To achieve that, ProActive provides dedicated tags (transferFromUserSpace, transferToUserSpace,...). Moreover, the task script refers to files by file name only, i.e. without specifying the path.

This workflow suffers from a lack of automatization. Indeed, the CAD task pops up a GUI, requiring parameters to be set for the Himill configuration file generation. This step breaks the automatization of the full procedure. To tackle that, we proposed an updated version of the workflow in three steps: first, migrating all the CAD parameters into the workflow parameters section, which is easily achieved since the orchestration code follows the XML syntax and is clearly separated from the functional code; second, removing the CAD task and the CAD installation path parameter; third, adding a Groovy section to dynamically generate the Himill configuration file according to the workflow parameters. Each task supports most of the main programming languages, and we used Groovy, which offers advanced methods to easily work with .ini files.
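
The project used Groovy for this step. Purely as an illustration of the idea, the sketch below shows how a flat set of workflow parameters could be turned into .ini text with Python's standard `configparser` module. The `section.option` naming convention and the parameter names are hypothetical and do not reflect the real Himill configuration keys.

```python
import configparser
import io


def render_ini(parameters):
    """Render {"section.option": value} workflow parameters as .ini text."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names case-sensitive
    for key, value in parameters.items():
        section, option = key.split(".", 1)
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, option, str(value))
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical parameters, as they might be declared in the workflow.
ini_text = render_ini({"tool.diameter": 10, "machine.spindleSpeed": 12000})
```

Because the values come from the workflow parameters, no GUI interaction is needed and the file can be regenerated at every run.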

High Availability / Disaster Recovery Plan

Today let's discuss high availability (HA) or, more precisely, disaster recovery plans (DRP).

As with any system, downtime can have major consequences for businesses. This quick article simply discusses two ways of achieving HA for ProActive.

Overall architecture

There are multiple ways for ProActive to be configured for High Availability (HA) / Disaster Recovery Plan (DRP).

ProActive stores its state within a database and includes an abstraction layer to configure the connection to a database. By default, the database is embedded within the ProActive folder. The objective is consequently to connect ProActive to an HA database (e.g. MariaDB can be configured this way, AWS RDS, ...).
With the state stored in an external database, it is important to monitor the behavior of ProActive. If it does not respond, it can be restarted, which will restart the scheduler and its various interfaces and reconnect to the database.

Below are two simple examples.

Introduction to Job Planner

Job Planner Methods

polling information from other websites,
regularly updating your data,
performing verifications and maintenance,
testing for new files in a folder,
etc.

In this article, we will review these methods and go deeper into the most recent one: the job planner.

Workflow Catalog through Examples

Introduction with an example : the workflow lifecycle management

Today let’s discover the new workflow catalog from ProActive. In a few words, the Workflow Catalog is a ProActive component that provides storage and versioning of Workflows through a REST API.

For a simplified explanation, here is an example of ProActive usage with three buckets. Each bucket represents a different stage of the workflow lifecycle. For instance, workflow1, in the development bucket, has been edited three times so far. Each edit corresponds to a revision. Everyone who has access to the same bucket can read and write all of its workflows and their revisions.
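
To make the vocabulary concrete, the relationship between buckets, workflows and revisions can be modeled with a toy in-memory structure. This sketch is not the catalog's REST API. The class `Bucket`, the function `promote` and the bucket name "qa" are invented for the illustration.

```python
class Bucket:
    """Toy in-memory bucket: each workflow name maps to a list of revisions."""

    def __init__(self, name):
        self.name = name
        self.workflows = {}

    def save(self, workflow, content):
        """Store `content` as a new revision and return its 1-based number."""
        revisions = self.workflows.setdefault(workflow, [])
        revisions.append(content)
        return len(revisions)

    def latest(self, workflow):
        """Return the content of the most recent revision."""
        return self.workflows[workflow][-1]

    def delete_latest(self, workflow):
        """Drop the most recent revision, falling back to the previous one."""
        revisions = self.workflows[workflow]
        revisions.pop()
        if not revisions:
            del self.workflows[workflow]


def promote(workflow, source, target):
    """Copy the latest revision of `workflow` from one bucket to another."""
    return target.save(workflow, source.latest(workflow))


development = Bucket("development")
for content in ("first edit", "second edit", "third edit"):
    development.save("workflow1", content)
```

After the three calls to `save`, workflow1 has three revisions in the development bucket, which matches the situation described above.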

A few use cases :

How would you handle sharing workflows? Since buckets can be accessed by several users, transferring workflows between buckets simplifies the sharing process.

What about when you need one specific workflow among hundreds? Don’t worry, use the search tool to narrow down the list returned. Parameters such as the owner can be used, and other custom fields are also available thanks to generic information (e.g. infrastructure, language, etc.).

Did you find new bugs in the latest workflow revision? The delete function can remove a selected revision to revert to an earlier version.

Now let’s try some of these functions :

Legal & General Use Case

Resource consumption optimization and cloud leverage

Financial institutions are heavy consumers of computer resources for the calculation of risks, opportunities, etc. They take advantage of schedulers to ensure a good distribution of workloads onto their existing infrastructure and to minimize computing needs. Today let’s focus on the Legal & General (L&G) case study and their transition to the new generation of open source scheduler.

Background and Specifications

Legal & General Group plc is a British multinational financial services company headquartered in London (UK). Its products include life insurance, general insurance, pensions and investments. It has operations in the United Kingdom, Egypt, France, Germany, the Gulf, India, the Netherlands and the United States. Their market capitalisation is at £13.5bn and they have £746bn assets under management.

Technologically, L&G used to base its Economic Capital and Solvency II simulation on IBM AlgoBatch. Their objective was to migrate from a private datacenter and Tibco DataSynapse to Azure Cloud and a hybrid scheduling solution. As part of this migration, the specifications were to handle Solvency II analysis on 2.5 million Monte Carlo scenarios, to dynamically define and prioritize workloads, and to minimize the time to delivery of results.

Orchestration and Meta-Scheduling for Big Data Architecture

MAPREDUCE, HADOOP, ETL, ELT, INFRASTRUCTURE, CLOUD, WORKFLOW, META-SCHEDULER AND COST OPTIMISATION

In today’s world, the adoption of Big Data is critical to the survival of most companies. Storing, processing and extracting value from data are becoming the main focus of IT departments. This huge amount of data, known as Big Data, has four properties: Volume, Variety, Value and Velocity. Systems such as Hadoop, Spark, Storm, etc. are the de facto building blocks of Big Data architectures (e.g. data lakes), but they fulfill only part of the requirements. Moreover, in addition to this mix of features, which already represents a challenge for businesses, new opportunities will add even more complexity. Companies are now looking at integrating even more sources of data, at breaking silos (variety is increasing with structured and unstructured data), and at real-time and actionable data. All of these are becoming key for decision makers.

Multiple solutions in the market have been supporting Big Data strategies, but none of them fits every company’s use cases. Consequently, each of these solutions will be responsible for extracting some meaning from the data. Although this mix of solutions adds complexity to infrastructure management, it also leverages the full information that can be extracted from the data. New questions are then raised like: How do I break company silos? How to make sense of this pool of unrelated and unstructured data? How to leverage supervised and unsupervised machine learning? How do I synchronize and orchestrate the different Big Data solutions? How can I allocate relevant resources? How do I ensure critical reports get prioritized? How do I enforce data locality rules and spread of the information? How do I monitor the whole data journey?

This article highlights two points on technical, operational and economic challenges around orchestration solutions. To leverage Big Data, companies will need to address these in order to optimize their infrastructure, extract faster and deeper insight from the data, and thus gain a competitive edge. For a more detailed and complete document, do not hesitate to download the full white paper.

Build Your Employee Self-Service Portal With ProActive

Over the past few years, people have been consuming more and more services to perform their jobs properly. Managing them is the role of the IT department. However, some employees might want to bypass this department to go faster, which can create what is called “shadow IT”. Others might be frustrated at being queued and waiting for additional checks to be performed.

A solution for this situation needs to balance IT and non-IT needs by providing a user friendly interface which is fast for non-IT (from another department) users and provides governance for IT users. More precisely, this could be achieved by joining all possible applications and services into a single platform. This way, application and service lifecycles can be easily managed and custom templates can be made available for all to use (and create). This would allow for faster and more agile deployment as well as improve governance by giving visibility over the current services and by using a common standard.

ProActive Cloud Automation offers a solution through a self-service portal to monitor and manage application and service lifecycles. The IT department can easily create templates which follow business policies and could be made available to user groups.

Network State Selection for Data Transfer

Many applications such as online meetings and video streaming need to transmit large amounts of data as quickly as possible. A good signal is consequently required for those transmissions, otherwise you might experience web conferences cutting out or videos pausing at unexpected times. Other applications need to send content with a quality adapted to the signal strength.

Resource selection based on network properties

However, the quality of this signal can vary due to external parameters. Fortunately, networks are redundant, which allows multiple paths with different properties to lead to the same device. This is why it is useful for flows to pass through carefully chosen resources. This choice may be made according to values such as ping and/or bandwidth.
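
A selection of this kind could look like the following sketch. It assumes that ping and bandwidth have already been measured for each candidate resource. The function name `pick_resource`, the thresholds and the tie-breaking rule are choices made for this example rather than a description of an existing feature.

```python
def pick_resource(measurements, max_ping_ms, min_bandwidth_mbps):
    """Return the name of the best resource, or None if none qualifies.

    `measurements` maps a resource name to (ping_ms, bandwidth_mbps).
    Candidates must satisfy both thresholds; among them the lowest ping
    wins, with the higher bandwidth as the tie-breaker.
    """
    candidates = [
        (ping, -bandwidth, name)
        for name, (ping, bandwidth) in measurements.items()
        if ping <= max_ping_ms and bandwidth >= min_bandwidth_mbps
    ]
    if not candidates:
        return None
    return min(candidates)[2]


# Hypothetical measurements: (ping in ms, bandwidth in Mbit/s).
measured = {"node-a": (20, 100), "node-b": (20, 500), "node-c": (5, 10)}
best = pick_resource(measured, max_ping_ms=50, min_bandwidth_mbps=50)
```

With these values, node-c is rejected for its low bandwidth, and node-b is preferred over node-a because both have the same ping but node-b offers more bandwidth.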

Cloud Wastes, 7 Recommendations

Accessing new resources has never been easier than today, with public cloud providers offering compute power on an hourly basis with varying specs such as GPU, RAM, CPU, etc. However, recent articles across the web have come to the same conclusion: companies are wasting money in the cloud.

“RightScale reveals that cloud users could reduce their spend by an average of 35 percent by making better use of resources.” BetaNews

“The benefits of shifting business applications to Web-friendly cloud services is proving far more complex than lining up a partner and flipping a switch” Wall Street Journal

“More and more companies are migrating to the cloud for computing power, but many are actually wasting too much money on unused services.” CFO.com

To quickly summarize, the main reasons behind these cloud leaks mentioned across these articles are oversized resources, resources that are not required at all times, and algorithms that are not resource aware.

Links

Tuesday, December 5, 2017

Monday, November 27, 2017

Job Planner

Generate cron expressions

Monday, November 20, 2017

How to

Monday, October 23, 2017

AZURE PoC in the BOX principles

AZURE PoC in the BOX journey

Tuesday, October 17, 2017

Setup the Docker Container

Create a Native Scheduler Node Source

Friday, August 18, 2017

Friday, August 11, 2017

Activeeon Job Planner: in depth

HOW TO USE IT ?

Saturday, August 5, 2017

Benefits of using ProActive as a pilot

How simple is it?

Successful integrations at customers

Thursday, July 27, 2017

Automatization of the full machining process using ProActive workflows

Friday, July 21, 2017

Overall architecture

Monday, July 10, 2017

Job Planner Methods

Friday, June 23, 2017

Introduction with an example : the workflow lifecycle management

A few use cases :

Now let’s try some of these functions :

Tuesday, April 4, 2017

Resource consumption optimization and cloud leverage

Background and Specifications

Friday, March 24, 2017

Wednesday, January 18, 2017

Wednesday, January 11, 2017

Resource selection based on network properties

Friday, January 6, 2017

7 Recommendations and Best Practices