Activeeon Team Blog: 2018

Tuesday, December 4, 2018

OCR and Text Enrichment at Scale

Have you ever needed to copy information from a document such as a business card or a paper form, or to copy part of a PDF presentation? Or to extract value from the many documents sitting idle in your storage repositories? The need to process documents is increasingly common as the world moves further toward Big Data.

Optical Character Recognition (OCR) algorithms have evolved and can now meet most of those needs. The next challenges lie in the subsequent steps, such as content enrichment and indexing.

A solution like QWAM Text Analytics enables you to extract key elements and indicators from text of any nature (Web crawl, documents from your repositories, your RSE), in any format (OCR, PDF, Office, HTML, plain text), and in four languages (English, French, German, Spanish).

Key indicators are:

Named entities: Persons, Companies, Organizations, Events, Locations, Countries, Regions, Cities, Products, Objects, whether they are simply cited in the document or are its main subject
Concepts: Text expressions that are frequent on the web, frequent in a given category (Business, Politics, Environment, Technology, Justice, etc.), or simply frequent in your corpus
Relations between entities: Company buys another Company, Company hires a Person, Company takes part in an Event
Sentiment analysis: Sentiment on your products or services (Voice of the customer), on life inside your company (Voice of the employee), or any topic in your business

In this article, we will go through a complete use case: document download, OCR processing in a distributed architecture, and text enrichment with QWAM Text Analytics, which is also parallelized. This processing will be done at scale, using the Activeeon scheduler for distribution.
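The three stages described above can be sketched as a small parallel pipeline. This is only an illustration under stated assumptions: the `download`, `run_ocr` and `enrich` functions are hypothetical stand-ins (they are not the QWAM Text Analytics or Activeeon APIs), and a local thread pool stands in for the distributed scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


# Hypothetical stand-ins: a real pipeline would call a download client,
# an OCR engine and a text-enrichment service at these three points.
def download(url: str) -> bytes:
    return f"scanned content of {url}".encode("utf-8")


def run_ocr(document: bytes) -> str:
    return document.decode("utf-8")


def enrich(text: str) -> Dict[str, object]:
    return {"text": text, "word_count": len(text.split())}


def process(url: str) -> Dict[str, object]:
    """Run the three stages in order for a single document."""
    return enrich(run_ocr(download(url)))


def process_all(urls: List[str], workers: int = 4) -> List[Dict[str, object]]:
    """Process documents in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, urls))
```

Because each document is processed independently, the same structure scales out when the local pool is replaced by tasks distributed over many machines.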

Smart Resource Policies for Cloud Savings

As organizations adopt cloud technologies through hybrid and multi-cloud strategies, workloads and applications require more attention to resource management. To reduce the cloud bill, elastic resource policies need to be configured and set up.

Traditional schedulers and orchestrators focus on executing workloads and pay little attention to the underlying resources. Activeeon ProActive includes a Resource Manager that enables admins to set up smart behaviors on the resource pool. For instance, elasticity can be driven by specific working hours or, in more advanced cases, by the load waiting in the queue. Once this is configured, business line users can focus on their processes and on how to add value.

Infrastructures and Policies

Activeeon ProActive's Resource Manager is built on two key concepts: infrastructure and policy. The infrastructure defines the mechanism used to interact with the actual resource. This can vary from SSH to a REST API, depending on the resource provider. The policy defines the triggers and the actions to perform.

Infrastructure

The solution natively supports multiple infrastructures: Azure, AWS, IBM SoftLayer, OpenStack, SSH, etc. For a complete list, create an account and open the Resource Manager interface.

Policy

The solution includes policies useful for multiple purposes. Below are a few examples:

Cron Policy: Deploy resources at 8am and close them at 8pm

Total Core Policy: Keep a specific number of resources available

Load Based Policy: Deploy new resources if the queue is above a threshold and close them when they have not been used for 5 minutes
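The decision logic behind these three example policies can be sketched as plain functions. This is a minimal illustration, not the ProActive Resource Manager API: the function names, the queue threshold of 10 tasks and the one-node-at-a-time scaling step are assumptions chosen for the example, while the 8 am / 8 pm window and the 5-minute idle limit come from the descriptions above.

```python
from datetime import time
from typing import Dict, List


def cron_policy(now: time, start: time = time(8, 0), stop: time = time(20, 0)) -> bool:
    """Cron-style policy: resources should be up from `start` until `stop`."""
    return start <= now < stop


def total_core_policy(current_nodes: int, target_nodes: int) -> int:
    """Return how many nodes to add (positive) or remove (negative)."""
    return target_nodes - current_nodes


def load_based_policy(
    pending_tasks: int,
    idle_seconds_per_node: List[int],
    queue_threshold: int = 10,  # assumed value, not from the product
    idle_limit_s: int = 300,    # 5 minutes, as in the example above
) -> Dict[str, object]:
    """Scale up when the queue is long, otherwise release idle nodes."""
    if pending_tasks > queue_threshold:
        return {"deploy": 1, "release": []}
    idle = [i for i, s in enumerate(idle_seconds_per_node) if s >= idle_limit_s]
    return {"deploy": 0, "release": idle}
```

For example, `load_based_policy(2, [0, 400, 300])` asks to release the nodes at indexes 1 and 2, since both have been idle for at least 300 seconds and the queue is short.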

Beyond traditional use cases

Now, let's explore some ways Activeeon customers are leveraging those features to improve their service levels and efficiency.

Analytics on Energy Consumption

The energy industry is evolving to face new challenges such as population growth, clean energy, consumption growth driven by electric vehicles, etc. Analyzing user consumption is key to planning for the future.

Recently, Activeeon obtained real user consumption data from a lead. We took this opportunity to analyze it with our solution.

In this quick video, we present four steps for a proper analysis:

Performing data fusion: Merging different data sources into a single usable one with Activeeon's embedded data connectors
Clustering electrical consumption: Grouping user consumption patterns into multiple clusters
Detecting anomalies in electricity consumption: Automatically detecting anomalies based on each user's normal behavior
Analyzing electricity consumption with ELK stack: Managing the lifecycle of ELK for real-time data analysis
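The idea of detecting anomalies "based on each user's normal behavior" can be illustrated with a simple per-user z-score test. This is a sketch under assumptions, not the method used in the video: the z-score technique, the threshold of 3 standard deviations and the function names are all chosen for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List


def is_anomalous(history: List[float], new_value: float, z_threshold: float = 3.0) -> bool:
    """Compare a new reading with this user's own past readings."""
    if len(history) < 2:
        return False  # not enough data to define "normal" behavior
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold


def detect_anomalies(
    histories: Dict[str, List[float]],
    latest: Dict[str, float],
    z_threshold: float = 3.0,
) -> List[str]:
    """Return the users whose latest reading deviates from their own baseline."""
    return [
        user
        for user, value in latest.items()
        if user in histories and is_anomalous(histories[user], value, z_threshold)
    ]
```

Because each reading is compared with that user's own history, a value of 25 can be an anomaly for a household that usually consumes about 10 units and also for one that usually consumes about 100.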

Friday, April 6, 2018

Low-Code IoT Anomaly Detection with ActiveEon Workflows

The IoT and datacenter worlds generate information of all kinds, from sensor data to software logs. This data carries information about environmental parameters, system health, software behavior, etc., along with their previous states. It can consequently be used to train predictive models and alert IT operations to current and potential future issues. Predictive maintenance and predictive analytics of this kind help companies prevent issues and save money.

In this blog post, we will focus on parallelizing model training for machine learning. It will be done at scale with low code and little or no need to configure the underlying infrastructure. We will take you through the main steps to set up a training workflow, edit the training model, the input and the output in seconds, and finally parallelize it with Activeeon Workflow & Scheduling.
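As a rough illustration of parallel model training, the sketch below trains one trivial model per dataset concurrently. It rests on several assumptions: the "model" is just an alert threshold at the mean plus three standard deviations, the function names are hypothetical, and a local thread pool stands in for the workers that a scheduler would spread across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev
from typing import Dict, List


def train_threshold_model(samples: List[float]) -> Dict[str, float]:
    """'Train' a trivial model: an alert threshold at mean + 3 std."""
    mu = mean(samples)
    return {"mean": mu, "threshold": mu + 3 * stdev(samples)}


def train_all(datasets: Dict[str, List[float]], workers: int = 4) -> Dict[str, Dict[str, float]]:
    """Train one model per dataset in parallel; returns name -> model."""
    names = list(datasets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        models = pool.map(train_threshold_model, (datasets[n] for n in names))
        return dict(zip(names, models))
```

Each training call depends only on its own dataset, which is what makes the work easy to distribute: swapping the training function or the input data does not change the parallel structure.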

This post complements a previous one and shows how to achieve a simple, scalable solution with less code.

To keep it simple, as shown below, we will focus on IoT sensors, simulated by tasks that generate random logs at fixed intervals.

Grey arrows represent the flow of the data
Orange arrows represent the interaction of Activeeon Workflow & Scheduling with the various components.
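A simulated sensor of this kind could look like the following sketch. The log format, the temperature field and the function name are assumptions made for illustration. Timestamps are computed at a fixed interval rather than by sleeping, so the example runs instantly.

```python
import random
from datetime import datetime, timedelta
from typing import Iterator, Optional


def simulate_sensor(
    sensor_id: str,
    start: datetime,
    interval_s: int,
    count: int,
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield `count` log lines whose timestamps are `interval_s` seconds apart."""
    rng = random.Random(seed)  # a seed makes the random values reproducible
    for i in range(count):
        timestamp = start + timedelta(seconds=i * interval_s)
        temperature = round(rng.gauss(20.0, 2.0), 2)  # assumed sensor reading
        yield f"{timestamp.isoformat()} sensor={sensor_id} temperature={temperature}"
```

Calling `list(simulate_sensor("s1", datetime(2018, 4, 6, 12, 0, 0), 5, 3, seed=42))` produces three log lines stamped 12:00:00, 12:00:05 and 12:00:10.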

Why Docker for Automation and Analytics

Since the introduction of Docker and its rapid growth, container technologies have focused on continuous delivery, integration, etc., or on the orchestration of containers, with two main leaders: Kubernetes and Swarm. In early 2015, Activeeon started building more and more integrated features to fit its unique use cases and leverage the core values of containers.

Indeed, most Activeeon users support business line needs for scheduling regular jobs, automating processes, improving the speed of analytics, etc. The objective is to provide consistent and reliable execution no matter the environment. Activeeon and Docker consequently share the same goal and benefit from each other's technology.

Docker for Scheduling, Automation, Analytics, etc.

With the rapid growth of the cloud, computing resources keep evolving, which may impact the actual execution. Activeeon is well placed to face this trend since it includes a Resource Manager and focuses on abstracting away the resource. With Docker containers, Activeeon also enables business lines to execute their jobs within an environment containing the relevant libraries. Thus, it also benefits from Docker's consistency, reliability and fast startup time.

To clarify, if a job executes once a semester and requires specific packages, Activeeon will abstract away the computing resource, and the libraries will be included in a specific Docker image. IT operations can then change providers and/or leverage multi-cloud strategies without impacting job execution.

In conclusion, with Docker and Activeeon, business line users can focus on automating their processes and analytics and on improving time to result, while IT operations continue to evolve.

Let’s get technical now

Machine Learning Industrialization

The Machine Learning Open Studio (ML-OS) from Activeeon is a complete platform for machine learning industrialization. The main objective is to reduce the time needed to automate, deploy and govern/control workflows and their execution.

Simplified Deployment and Integration

Data scientists and DevOps engineers have developed different interests and skills over time. A platform aimed at deployment needs to acknowledge this and provide interfaces that support each role. This section presents a few features that ease deployment tasks for data scientists.
