Tuesday, January 16, 2018

Machine Learning Industrialization

The Machine Learning Open Studio (ML-OS) from Activeeon is a complete platform for machine learning industrialization. Its main objective is to reduce the time needed to automate, deploy, and govern workflows and their execution.

Simplified Deployment and Integration

Data scientists and DevOps engineers have developed different interests and skills over time. A platform aimed at deployment needs to acknowledge this and provide interfaces that support each role. This section presents a few features that ease deployment tasks for data scientists.

Language Support

To accommodate any language preference, Activeeon users can build workloads in multiple languages such as Python, R, Bash, Groovy, JavaScript, …

Docker Support

In the past few years, Docker has helped many developers and data scientists ship code faster. With Activeeon, they can request that their algorithm run within a specific container. This ensures consistent execution on any machine and in any environment.

Variable and file sharing

In most cases, the overall analytic workflow is split into multiple phases that collect the data, prepare it, transform it, and send the results. There is therefore a need for a simple system to share variables and files. With Activeeon, any user can share this data across the whole workflow and within containers. This facilitates integration with the overall architecture.
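The idea of sharing variables between phases can be illustrated with a minimal, generic Python sketch. It is not Activeeon's API: the `run_workflow` helper, the phase functions, and the variable names are all hypothetical and only show how a shared dictionary lets each phase read what the previous one produced.

```python
from typing import Any, Callable, Dict, List

# A phase is any function that reads and writes the shared variables.
Phase = Callable[[Dict[str, Any]], None]


def run_workflow(phases: List[Phase]) -> Dict[str, Any]:
    """Run phases in order, passing the same variable store to each one."""
    variables: Dict[str, Any] = {}
    for phase in phases:
        phase(variables)
    return variables


def collect(variables: Dict[str, Any]) -> None:
    # Hypothetical collection phase: stores raw data for later phases.
    variables["raw"] = [3, 1, 2]


def prepare(variables: Dict[str, Any]) -> None:
    # Reads what the collection phase wrote and stores a cleaned version.
    variables["clean"] = sorted(variables["raw"])


def report(variables: Dict[str, Any]) -> None:
    # Final phase: summarizes the prepared data.
    clean = variables["clean"]
    variables["summary"] = {"count": len(clean), "max": max(clean)}


result = run_workflow([collect, prepare, report])
print(result["summary"])
```

In a real deployment the store would have to be serialized and propagated between machines or containers; the sketch keeps everything in one process to show only the data flow.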

Resource selection

As described above, the overall business process is split into multiple phases with different resource requirements. For instance, a task training a model will prefer a resource with a GPU, a task connecting to a database will require low latency, … With Activeeon, while building the overall workflow, users can define the type of resource required to launch their tasks without needing to know the underlying infrastructure.
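One common way to decouple tasks from infrastructure is to let each task declare the capabilities it needs and let a scheduler match them against the capabilities each node advertises. The sketch below assumes that approach; `Node`, `TaskSpec`, `select_node`, and the tag names are hypothetical and do not reflect how Activeeon expresses resource selection.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    tags: FrozenSet[str]  # capabilities advertised by the node


@dataclass(frozen=True)
class TaskSpec:
    name: str
    requires: FrozenSet[str]  # capabilities the task needs


def select_node(task: TaskSpec, nodes: List[Node]) -> Optional[Node]:
    """Return the first node whose tags cover every requirement, or None."""
    for node in nodes:
        if task.requires <= node.tags:
            return node
    return None


# Hypothetical pool mixing on-premises and cloud resources.
nodes = [
    Node("cpu-1", frozenset({"cpu"})),
    Node("gpu-1", frozenset({"cpu", "gpu"})),
    Node("db-edge", frozenset({"cpu", "low-latency-db"})),
]

train = TaskSpec("train-model", frozenset({"gpu"}))
load = TaskSpec("load-data", frozenset({"low-latency-db"}))

print(select_node(train, nodes).name)
print(select_node(load, nodes).name)
```

The task author only names the requirement ("gpu"); which machine satisfies it is decided at scheduling time, so the pool can change without touching the workflow.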

Data connectors

In the initial phase of a project, data scientists have to connect to various heterogeneous data sources. To ease the process, Activeeon’s solution includes multiple connectors. This accelerates the development and deployment phases by providing standardized and custom output. The image below presents a few of the supported connectors. These connectors can also be used to store final results and share them with other teams.

Parallelization on premises and any cloud

IT services and DevOps engineers are increasingly asked to stay flexible regarding the resource provider. They therefore adopt strategies such as multi-cloud, hybrid cloud, and cloud bursting, and must ensure that any workload can run anywhere.

Nevertheless, data scientists are not experts in each platform and require consistency for faster deployment. Activeeon’s solution includes a Resource Manager that provides an abstraction layer over the resources. DevOps teams can then connect heterogeneous resources to the platform without affecting any workload.

For instance, in some cases, workflows are developed on-prem and then moved to the cloud for scalability purposes. With Activeeon, no extra work will be required to ensure consistency.

Control of execution

Without going into more depth in this article, industrialization also includes control and management of the execution. With Activeeon, companies are able to monitor executions, automate error responses, set up alerts, and schedule regular executions. These features help realize the full potential of the workflow and algorithm over its lifetime: history can be saved for analysis, alerts are sent when required, human intervention is controlled, …
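As a rough illustration of automated error response combined with alerting and history, the following generic Python sketch retries a failing task, records each attempt, and raises an alert only when every attempt has failed. The `run_with_retries` function, the `alert` callback, and the sample task are hypothetical; they are not part of Activeeon's product.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_with_retries(
    task: Callable[[], Any],
    max_attempts: int,
    alert: Callable[[str], None],
) -> Tuple[Optional[Any], List[Tuple[int, str]]]:
    """Run task up to max_attempts times and keep a history of attempts."""
    history: List[Tuple[int, str]] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = task()
        except Exception as exc:
            # Record the failure and try again on the next iteration.
            history.append((attempt, f"failed: {exc}"))
        else:
            history.append((attempt, "succeeded"))
            return result, history
    # Every attempt failed: notify a human instead of retrying forever.
    alert(f"task failed after {max_attempts} attempts")
    return None, history


# Hypothetical task that fails twice before succeeding.
calls = {"count": 0}


def flaky_training() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("node unavailable")
    return "model-v1"


alerts: List[str] = []
result, history = run_with_retries(flaky_training, max_attempts=3, alert=alerts.append)
print(result, history, alerts)
```

Keeping the attempt history separate from the alert channel means routine, recovered failures remain available for analysis without notifying anyone.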

Customization of the Dev Interface

Finally, in data science, as in any other development environment, people specialize in different domains and require different interfaces. With ML-OS, data scientists and data analysts can set up and use their environment according to their level of expertise. A beginner will simply use drag-and-drop functions built by others, while experts will leverage those drag-and-drop functions to get things done faster and then dive into the code to optimize their algorithm.

These libraries of functions are customizable. Activeeon provides a few standard ones to get started, but the libraries can be extended to fit more use cases. The whole studio can therefore be customized for better efficiency.

A full role-based access control (RBAC) system is included in the solution to ensure that only authorized users can access those libraries.

In conclusion, on the one hand, data scientists can define compute requirements, set up distribution rules, dockerize their code, visualize intermediate and final results, set up a customized toolkit, and access trained models. On the other hand, IT and DevOps engineers can set up cloud strategies (multi-cloud, hybrid, etc.), set up alerts and monitoring, provide relevant resources, control costs, and ease deployment for all other users.

For more information, check the documentation here, start on our trial platform or check our latest video.
