Thursday, December 8, 2016

Big Data, Languages and Software solutions

Inspired by an article from ThinkR

Big Data is a term that everyone uses without a clear definition of what it means. It appears in every industry, from transportation to human resources to healthcare, which makes it harder to pin down. For this article, let’s agree that data is “big” when it cannot be handled by a single computer, even though machines with 1 TB of RAM are available these days. In a growing number of cases, the data needed for a computation is not even stored in one place, which calls for advanced distributed algorithms and software.

Big Data for Hadley Wickham

Hadley Wickham, a well-known developer for the R language who wrote several libraries used in most projects, defined 3 main categories for Big Data:

Monday, December 5, 2016

ProActive Calendar Step-by-Step




Calendars offer a user-friendly interface that lets you plan ahead, manipulate events, duplicate them and much more.

By extending traditional applications, more than just events can be handled.




Introducing our calendar service. With it, ProActive Workflows&Scheduling offers more flexibility and a way to automatically synchronize the scheduler with your favorite calendar manager (Google Calendar, Thunderbird, Outlook, Apple Calendar, etc.). It provides a new UI adapted to repetitive tasks, enabling more intuitive control through better visualization and simplified job handling (drag&drop). To use the calendar service, navigate to the scheduler, click on “calendar”, retrieve the URL (generate it if needed) and use it to create a new calendar in your calendar manager. You can test this on our free online test platform.


Tuesday, November 29, 2016

Risk Model Optimization and Traceability for Solvency II

Solvency II and Basel III, the new standards for insurance and bank regulations, codify and unify capital, data management, and disclosure requirements for insurers and banks in order to protect companies and consumers from risk.

More specifically, under Solvency II the effect of every insurance contract, capital market relation, optionality and risk source has to be modeled and measured. The goal is to ensure insurers hold enough equity to cope with any forecasted risk. Given the number of heterogeneous assets and liabilities owned by an insurance company, algorithms such as Monte Carlo are well suited to estimating the outcome of different scenarios with a given accuracy and to deriving metrics such as VaR (value at risk).

A typical process using a Monte Carlo algorithm involves multiple steps:

  1. Define a model for each asset, liability, economic scenario, etc. using tools such as QuantLib, Bloomberg solution, Wall Street system, Apollo, etc.
  2. Create scenarios based on these models (this can easily exceed 2 million scenarios)
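
As an illustration of how Monte Carlo scenarios lead to a VaR figure, here is a minimal Python sketch. It is not the process used by any of the tools listed above: the single-asset model, the normally distributed annual return and all parameter values are assumptions chosen for readability, and the function names are hypothetical. The 99.5% confidence level is the one-year level that Solvency II refers to for capital requirements.

```python
import math
import random


def simulate_portfolio_losses(n_scenarios, mean_return=0.05, volatility=0.2,
                              initial_value=1_000_000.0, seed=42):
    """Draw one-year scenarios for a toy single-asset portfolio.

    Assumption: the annual return is normally distributed. A real model
    would cover every asset, liability and economic scenario.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        annual_return = rng.gauss(mean_return, volatility)
        final_value = initial_value * (1.0 + annual_return)
        losses.append(initial_value - final_value)  # positive value = loss
    return losses


def value_at_risk(losses, confidence=0.995):
    """Empirical VaR: smallest loss that at least `confidence` of scenarios do not exceed."""
    ordered = sorted(losses)
    index = math.ceil(confidence * len(ordered)) - 1
    index = max(0, min(index, len(ordered) - 1))
    return ordered[index]


losses = simulate_portfolio_losses(100_000)
var_995 = value_at_risk(losses)
print(f"99.5% one-year VaR: {var_995:,.0f}")
```

Each scenario is independent of the others, which is why this kind of computation distributes well: the scenarios can be generated in separate batches and only the resulting losses need to be gathered before the quantile is taken.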

New Feature! Preview Intermediary and Final Results.

The “Task Preview” feature allowed you to access the results of individual tasks as soon as the task ended. It worked fine for common types but could be cumbersome for more complex cases. This is why we decided to improve it.

Goodbye Task Preview, long live Task Result!

Now (as of ProActive Workflows&Scheduling 7.20), each task comes with a resultMetadata map, related to the “result”, which can contain the following information:

  1. file.extension,
  2. file.name,
  3. content.type specifying how the browser should open the result.

After the task’s execution, the result can then be opened in a browser or downloaded from the Preview tab in Scheduler.
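
To make the three keys concrete, here is a small Python sketch that builds such a map for a given file. It uses a plain dictionary as a stand-in for the resultMetadata map; the helper name build_result_metadata is hypothetical, and whether file.extension is expected with a leading dot is an assumption, so check the documentation before relying on it.

```python
import mimetypes
from pathlib import PurePosixPath


def build_result_metadata(file_path, default_type="application/octet-stream"):
    """Build the three metadata entries for a result file.

    Hypothetical helper: the returned dict only mimics the resultMetadata
    map. The keys come from the list above; the value formats are assumed.
    """
    path = PurePosixPath(file_path)
    # Guess the MIME type from the file name; fall back to a generic binary type.
    content_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "file.name": path.name,
        "file.extension": path.suffix,  # assumption: leading dot kept
        "content.type": content_type or default_type,
    }


metadata = build_result_metadata("reports/chart.png")
print(metadata)
```

With a content.type such as image/png the browser can display the result directly, while a generic binary type would typically lead to a download.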

See also the documentation



You can see this feature in action by using this workflow on our online platform: try.activeeon.com.

Friday, November 11, 2016

Optimized Algorithm Distribution

With the growing number of devices and sensors, analyzing collected data becomes more complex. Data scientists and analysts have to deal with heterogeneous sources and leverage distributed environments to improve analysis time. To achieve this, the ability to split a complex algorithm into smaller units is fundamental. This article details ActiveEon’s approach.

To distribute efficiently with ProActive, ActiveEon has defined a task as the smallest unit of work that can be executed. A composition of tasks is called a workflow and includes additional features such as dependencies, variable propagation and shared resources. Because dependencies are explicit, distribution is taken into consideration by design. The workflow designer is therefore in control of the distribution and can optimize resource utilization and reduce execution time.

Terminology

A task is the smallest unit of work.
A workflow is a composition of tasks which includes additional features such as dependencies, variable propagation and shared resources.
A job is a workflow that has been submitted to the scheduler. (Multiple jobs can be created from the same workflow.)

It is important to distinguish Workflows and Jobs. A Workflow is a directed graph of Tasks, where Tasks represent the smallest units of work that can be distributed and executed on nodes. A Workflow can be seen as a model or a template. A Job is an instance of a Workflow which has been submitted to the Scheduler and is managed by it. Jobs differ from Workflows in many ways: Jobs may have variable values updated at runtime, controls such as loops which are expanded at runtime, etc.
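
The terminology above can be sketched as a small data model in Python. This is an illustration only, not ProActive's implementation: the class names, the submit function and the example task names are hypothetical, and tasks are reduced to plain names. It shows how explicit dependencies reveal which tasks can run in parallel, and how several jobs can be created from one workflow.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import count


@dataclass
class Workflow:
    """A template: task names mapped to the task names they depend on."""
    name: str
    dependencies: dict
    variables: dict = field(default_factory=dict)

    def parallel_stages(self):
        """Group tasks into stages; tasks within a stage have no mutual dependency."""
        sorter = TopologicalSorter(self.dependencies)
        sorter.prepare()
        stages = []
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            stages.append(ready)
            sorter.done(*ready)
        return stages


@dataclass
class Job:
    """An instance of a workflow, created at submission time."""
    job_id: int
    workflow: Workflow
    variables: dict


_job_ids = count(1)


def submit(workflow, **overrides):
    """Create a job from a workflow; variables may differ per job."""
    return Job(next(_job_ids), workflow, {**workflow.variables, **overrides})


analysis = Workflow(
    name="sensor-analysis",
    dependencies={
        "split": set(),
        "analyze_a": {"split"},
        "analyze_b": {"split"},
        "merge": {"analyze_a", "analyze_b"},
    },
    variables={"window": "1h"},
)

stages = analysis.parallel_stages()
job_1 = submit(analysis)
job_2 = submit(analysis, window="24h")
print(stages)
```

Here analyze_a and analyze_b land in the same stage, so they could run on two different nodes at the same time, while merge has to wait for both.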

Monday, November 7, 2016

Automated Workflow Translation to modern and Open Source Schedulers

Digital Transformation and Migration to the Cloud with Open Solutions

In the past couple of years, new trends have emerged in IT. More precisely, companies are investigating new ways to leverage their Big Data, hybrid cloud infrastructure and IoT devices. This is leading to new business requirements which are redefining IT architecture with a greater focus on flexibility and automation.

Legacy software struggles to answer those needs, which leads companies to investigate open solutions. The value is shifting from infrastructure optimization alone to also include platform connectors and open REST APIs. Consequently, open source solutions have an edge since they offer a comprehensive communication system which allows complete integration with existing tools. In conclusion, a migration towards open solutions will ease IT digital transformation, migration to the Cloud, etc. and support future business needs.

To support the migration from one scheduler to another, ActiveEon offers a service to automatically translate workflows between Control-M (BMC), Autosys (CA), Dollar Universe (Automic), One Automation (Automic), ProActive (ActiveEon), etc.

This service will particularly benefit large organizations with thousands of workflows, such as banks, insurance companies, financial institutions, telecoms, government agencies, etc. At this scale, automation is advised for a quick migration process.

Migration Tool

Monday, October 31, 2016

Leverage Hybrid Infrastructure with SPOT instances or Preemptible VMs

Cloud computing allows companies to accelerate their business processes, optimize infrastructure costs and scale more quickly. However, integrating these new services with an existing infrastructure can be complex, and a poor integration will not fully leverage this new opportunity. This article focuses on unstable instances such as Spot instances (AWS) or Preemptible VMs (GCP), which offer cheaper computational power.

What is a SPOT instance and a Preemptible VM?

AWS offers a service called Spot instances, which lets customers bid on unused EC2 resources in any availability zone. GCP offers a similar service called Preemptible VMs. In both cases, customers use compute capacity with no upfront commitment at an hourly rate lower than the on-demand rate. The main drawback is that AWS and GCP can withdraw an instance at any time with little advance warning, for instance when the market price of the resource rises above the bidding price.

Workloads Requirements

As explained in those two descriptions, instances can be withdrawn from customers at any time. Workloads leveraging this computing capacity require an advanced error management tool to cope with the uncertain lifecycle; otherwise any previous work is lost, which defeats the overall purpose. These workloads also need to be split into smaller tasks so that each computation can be performed in a short period of time. This ensures that completed work is saved and available to dependent tasks.
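
The two requirements above, error management and small tasks, can be sketched in Python as follows. This is a simulation under stated assumptions, not how ProActive or any cloud provider handles preemption: the Preempted exception, the run_on_spot function and the preemption probability are all hypothetical, and squaring a number stands in for real work.

```python
import random


class Preempted(Exception):
    """Raised when the simulated instance is withdrawn mid-task."""


def run_on_spot(task, rng, preemption_probability):
    """Simulate running one small task on an unstable instance."""
    if rng.random() < preemption_probability:
        raise Preempted(task)
    return task * task  # stand-in for the real computation


def run_workload(tasks, preemption_probability=0.3, max_attempts=20, seed=7):
    """Run tasks one by one, keeping finished results and retrying only the lost task."""
    rng = random.Random(seed)
    checkpoint = {}  # results saved as soon as each task completes
    attempts = 0
    for task in tasks:
        for _ in range(max_attempts):
            attempts += 1
            try:
                checkpoint[task] = run_on_spot(task, rng, preemption_probability)
                break
            except Preempted:
                continue  # only this small task is lost; earlier results are kept
        else:
            raise RuntimeError(f"task {task} failed after {max_attempts} attempts")
    return checkpoint, attempts


results, attempts = run_workload(range(5))
print(results, attempts)
```

Because each task is small and its result is stored immediately, a withdrawal costs at most one task's worth of work. With one large monolithic task, the same withdrawal would throw away everything computed so far.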

When to leverage those?

One of the common use cases for these services is when time is less of a constraint than cost. For instance, a workload running at night can take as much time as required as long as it is completed by 8 am.

Another common use case is when demand spikes on the current infrastructure generate a long queue. This is often seen in R&D environments using a single HPC cluster or a limited infrastructure. In that case, time constraints have to be balanced against price. Spot instances make it possible to unload the queue with instances that are cheaper than on-demand ones.

Some business applications require a stable environment to perform efficiently, but the stable resources they need may already be in use when they are needed. Running other workloads on unstable resources frees up stable resources ahead of time.

Many other use cases can be found on the GCP and AWS websites or on the Netflix blog.

What does ProActive offer in these situations?