Friday, August 18, 2017

Orchestration in the Machine Learning and Big Data Ages

Modern Big Data and Machine Learning ecosystems are growing fast into multi-layered, heterogeneous systems. Businesses are no longer considering a single technology but an architecture of interconnected solutions that together bring value to the company. The market is expected to consolidate as it matures, but at the moment tech articles promote multiple solutions to answer specific use cases.

Fortunately, most tech companies are now embracing open development. This supports initial integration and setup through REST APIs, SDKs, CLIs, GraphQL, etc. However, managing diverse solutions at scale quickly leads to new challenges for longer-term control and governance. Indeed, each company deploys specific data flows and manages access differently, so only custom solutions are suitable.

To control and orchestrate these new environments, ActiveEon developed a product to support business IT, sysadmins and data scientists. The concepts followed by the team behind the solution are:

  • “A picture paints a thousand words.” This is why, through the workflow studio, users can easily manage dependencies in order to visualize and design business processes. Advanced controls such as replication simplify the use of advanced behaviors.
  • Flexibility and agility drive successful businesses. Integration at the customer's site is supported by an open REST API, CLI and SDK with a resource-agnostic solution. Moreover, users can work in their preferred language, from bash scripts to Python and R.
  • Stability through control and governance drives further innovation. IT admins build and test universal workflows. Operators monitor the progress of workflows and manage errors at a granular level.
  • Resource consumption directly impacts projects’ ROI. Resource-aware applications improve utilization and reduce overprovisioning. Granular management of resources leads to better distribution.

The values carried by ActiveEon ProActive specifically benefit architectures made of heterogeneous applications with substantial resource consumption. The main challenges addressed here are:

  • Clear control over global execution to gain confidence on expected result delivery time;
  • Visual management of dependencies to ease maintenance, knowledge sharing and further integration;
  • Awareness of resource consumption and resource availability to optimize overall resource requirement;
  • Overall error management to highlight issues and implement automated recovery;
  • Granular workflows to improve distribution and parallelization and to offer better error containment.

The final result is a solution that can manage various Machine Learning and Big Data solutions such as Hadoop, Spark, SAS, TIBCO Spotfire, Greenplum, Python Anaconda, Elastic, etc., together with capacity control and resource management.
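To make the idea of granular workflows and visual dependency management more concrete, here is a minimal Python sketch. It does not use the ProActive API; the task names and the `execution_waves` helper are hypothetical and only illustrate how declaring dependencies lets an orchestrator work out which tasks can run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract_sales": set(),
    "extract_stock": set(),
    "clean": {"extract_sales", "extract_stock"},
    "train_model": {"clean"},
    "report": {"clean"},
    "publish": {"train_model", "report"},
}


def execution_waves(dependencies):
    """Group tasks into waves.

    Tasks in the same wave do not depend on each other, so an
    orchestrator could distribute them across hosts and run them in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(workflow)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

In this sketch the two extraction tasks form the first wave, and `publish` only becomes eligible once both `train_model` and `report` have completed. A failure in one fine-grained task therefore only blocks the tasks that depend on it.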

This short article aims to explore some of the use cases supported by ActiveEon ProActive.

Government - Criminality Analysis

To reduce the crime rate in the UK, the government increasingly relies on Big and Small Data technologies to compute meaningful information and focus its efforts.

This specific architecture spans 4 different environments separated by firewalls. ProActive offers a single-pane view to manage workflows in all those environments. The features described in the Visa Application use case are also relevant here. In addition, ProActive handles Docker containers to run Python scripts within consistent environments across the different hosts.

The variety of applications that a company has to handle is significant: data storage systems such as Hadoop and Greenplum, data analytics such as Kibana and ElasticSearch, custom scripts in Python with Anaconda and Docker, and finally data visualization with Spotfire. At scale, orchestration becomes a bottleneck that can be handled by ActiveEon ProActive.

Public Sector - Visa Application

To assess individuals requesting visas to enter the UK, the Home Office relies heavily on ETL technologies that transform data from multiple sources.

The current architecture spans 7 different environments for security purposes and mainly relies on transforming data from heterogeneous sources to extract and compute key information. In this use case, users can trigger business workflows from a web portal. The logic and flow of these workflows are handled by ProActive, which ensures efficient distribution of the load as well as the prioritization of the tasks to be processed.

One key feature of such complex multi-environment systems is the ability to handle errors at a higher level. It enables consistent behavior and better stability. A single-pane dashboard gives operators the ability to set up policies on errors, to contain errors while keeping intermediary work, and to analyze logs quickly.
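As an illustration of what a policy on errors can look like, here is a minimal Python sketch. It is not ProActive's implementation: the `Task` class, the `max_attempts` parameter and the `"stop"` / `"continue"` policy names are assumptions made for this example. It shows retries, error containment and the preservation of intermediary results.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    action: Callable[[], str]
    max_attempts: int = 1     # hypothetical retry policy
    on_failure: str = "stop"  # hypothetical policy: "stop" or "continue"


def run_workflow(tasks):
    """Run tasks in order; return (results kept so far, last error per failed task)."""
    results, errors = {}, {}
    for task in tasks:
        for attempt in range(1, task.max_attempts + 1):
            try:
                results[task.name] = task.action()
                errors.pop(task.name, None)  # a successful retry clears the error
                break
            except Exception as exc:
                errors[task.name] = f"attempt {attempt}: {exc}"
        if task.name in errors and task.on_failure == "stop":
            break  # contain the error; earlier results are kept
    return results, errors


calls = {"count": 0}


def flaky():
    # Fails on the first call only, to simulate a transient error.
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("temporary network error")
    return "loaded"


def broken():
    raise ValueError("bad input file")


tasks = [
    Task("load", flaky, max_attempts=3),
    Task("optional_stats", broken, on_failure="continue"),
    Task("transform", lambda: "transformed"),
    Task("export", broken, on_failure="stop"),
    Task("notify", lambda: "sent"),
]
results, errors = run_workflow(tasks)
print(results)
print(errors)
```

Here `load` succeeds on its second attempt, the failure of `optional_stats` is tolerated, and the failure of `export` stops the workflow before `notify` while the results of `load` and `transform` remain available for a later restart.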

Distribution / Supply Chain Analytics

To improve performance in an ever more competitive market, supply-chain-dependent businesses need to combine Big Data, Machine Learning and IoT approaches to follow lean principles. Tracking each of the 7 wastes is their key to efficiency.

To do so, companies invest in state-of-the-art technologies in order to build new insights. To deal with the variety of inputs, architectures include:

  • Streaming processes with Kafka, Spark Streaming, Storm;
  • Data back-end with ElasticSearch and Cassandra;
  • Batch processing with Spark, Hadoop;
  • Other data services such as Hive, Spark SQL;
  • DevOps tools such as Ansible, Grafana, Logstash;
  • Container management systems with Docker, Swarm, Kubernetes or Mesos.

Connecting those tools and handling dependencies is critical to ensure the various processes run consistently. In this architecture, each element performs a key function of the overall system and each of them offers a way to be controlled.

Challenges arise when those tools depend on each other. This is where an open orchestration system such as ActiveEon ProActive supports overall integration, control and interconnectivity at scale.

Telecom - Heavy Virtualization

From hardware to function virtualization, telecom companies require multiple levels of abstraction to ensure sustainable growth. Properly abstracting physical hardware and individual systems leads to competitive advantages.

In the telecom ecosystem, it is common to find architectures where:

  • Compute, storage and network are abstracted to enhance capabilities, ensure SLAs and improve ROI;
  • Another layer above, such as OpenStack, merges those individual systems into one single pane for unified utilization. This enables software to run efficiently no matter the underlying architecture;
  • Connecting those software components meets business requirements / functions. This is where VNF (Virtual Network Functions) or NFV (Network Function Virtualization) come into play;
  • Those final functions need to communicate and rely on each other. A final layer of orchestration and meta-scheduling ensures proper connections and flow of information.

In this final use case, businesses face the same challenges. Applications and functions need to take into consideration their impact on the overall resource pool. Moreover, managing dependencies between applications and functions is critical at scale.
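The following Python sketch illustrates what taking the overall resource pool into consideration can mean in practice. It is a simplified best-fit placement written for this article, not the ProActive scheduler: the node names, the task names and the `place_tasks` helper are hypothetical, and CPU cores are the only resource modeled.

```python
def place_tasks(tasks, nodes):
    """Assign tasks to nodes without exceeding the free cores of any node.

    tasks: dict of task name -> cores required
    nodes: dict of node name -> cores currently free
    Returns (placement, pending): the chosen node per task, and the tasks
    that must wait because no node has enough free cores.
    """
    free = dict(nodes)  # work on a copy so the caller's view is untouched
    placement, pending = {}, []
    # Place the largest tasks first; ties are broken by name for determinism.
    for name, cores in sorted(tasks.items(), key=lambda item: (-item[1], item[0])):
        candidates = [node for node, available in free.items() if available >= cores]
        if not candidates:
            pending.append(name)  # wait instead of overloading a node
            continue
        # Best fit: the node left with the fewest spare cores.
        node = min(candidates, key=lambda n: (free[n], n))
        free[node] -= cores
        placement[name] = node
    return placement, pending


nodes = {"node-a": 8, "node-b": 4}
tasks = {"spark_job": 6, "etl": 4, "python_script": 2, "training": 4}
placement, pending = place_tasks(tasks, nodes)
print(placement)
print(pending)
```

With these numbers, 16 cores are requested for 12 available, so `training` is kept pending rather than overcommitting a node. A real meta-scheduler would also account for memory, network, licenses and priorities.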

Machine Learning for Prescriptive Maintenance & Smart Factory

As the Big Data ecosystem matures, pressure to deliver real value increases. Machine Learning solutions enable this value creation by analyzing large amounts of data. The resulting insight then needs to drive actions. For instance, businesses expect a migration from predictive to prescriptive maintenance. Data would be collected on a regular basis from various applications and data sources to enrich the Machine Learning models. From those regular updates, the overall system will have to automatically edit the rules and actions required to prevent issues.

This use case gathers ideas and requirements similar to those of the use cases above. An orchestrator will have to connect to heterogeneous applications and collect data. It will have to manage various dependencies between solutions and drive actions as required by the Machine Learning model.

In another use case, within Industry 4.0, ActiveEon ProActive is used to orchestrate Machine Learning image analysis to automatically detect defects on satellites under construction. One can imagine the cost savings when satellite flaws are detected as early as possible.

ActiveEon ProActive supports and orchestrates libraries such as TensorFlow, scikit-learn, MLlib and Keras, third-party applications such as Watson, and custom scripts in Python, R, Java, etc. This enables users to code in their favorite environment while connecting to the overall business architecture in a controlled way.


The complexity brought by Big Data and Machine Learning environments leads companies to focus more and more on control and governance while keeping resource consumption under scrutiny. ProActive from ActiveEon is an open source orchestrator and meta-scheduler that aims to let users be as agile as possible while giving them control at scale.

Do not hesitate to experience it on our online try platform or contact us.
