Tuesday, April 4, 2017

Legal & General Use Case

Resource consumption optimization and cloud leverage

Financial institutions are heavy consumers of computing resources for calculating risks, pricing opportunities and more. They rely on schedulers to distribute workloads efficiently across their existing infrastructure and minimize computing needs. Today let’s focus on the Legal & General (L&G) case study and their transition to a new generation of open source scheduler.

Background and Specifications

Legal & General Group plc is a British multinational financial services company headquartered in London (UK). Its products include life insurance, general insurance, pensions and investments. It has operations in the United Kingdom, Egypt, France, Germany, the Gulf, India, the Netherlands and the United States. Its market capitalisation is £13.5bn and it has £746bn of assets under management.

Technologically, L&G used to base its Economic Capital and Solvency II simulations on IBM AlgoBatch. Their objective was to migrate from a private datacenter running Tibco DataSynapse to Azure Cloud with a hybrid scheduling solution. As part of this migration, the specifications were to handle Solvency II analysis on 2.5 million Monte Carlo scenarios, dynamically define and prioritize workloads, and minimize time to delivery of results.

The algorithm principles are described in Fig 1.

  • The RiskWatch simulations need a total of 18,000 CPU hours and very little I/O.
  • Each Mark-to-Future aggregation and reporting task has modest CPU requirements but is I/O-bound.
On every host, a CPU-intensive task and an I/O-intensive one can be run in parallel to maximise throughput. Moreover, each task requires 10 to 60 GB RAM, and the combined demand must not exceed host capacity.
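The co-location rule above can be sketched in a few lines. This is an illustrative model only (the `Task` shape and function name are assumptions, not L&G's or ProActive's actual scheduler logic): a CPU-bound task and an I/O-bound task may share a host as long as their combined RAM demand fits the host.

```python
# Illustrative sketch of the placement constraint described above.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str      # "cpu" (simulation) or "io" (aggregation/reporting)
    ram_gb: int    # each task needs 10 to 60 GB of RAM

def can_colocate(cpu_task: Task, io_task: Task, host_ram_gb: int) -> bool:
    """One CPU-intensive and one I/O-intensive task may run in parallel
    on a host, provided their combined RAM demand fits the host."""
    return (cpu_task.kind == "cpu"
            and io_task.kind == "io"
            and cpu_task.ram_gb + io_task.ram_gb <= host_ram_gb)

# Example: a 40 GB simulation and a 20 GB aggregation fit a 64 GB host,
# but not a 48 GB one.
sim = Task("riskwatch-sim", "cpu", 40)
agg = Task("mtf-aggregation", "io", 20)
print(can_colocate(sim, agg, 64))   # True
print(can_colocate(sim, agg, 48))   # False
```

Pairing one compute-heavy and one I/O-heavy task per host keeps both the CPU and the storage channel busy, which is where the throughput gain comes from.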


Fig 1: Algorithm principles

Implementation

After a few discussions and a successful PoC, L&G selected ProActive from ActiveEon as their main scheduler.

The first step, now completed, was to replace Tibco DataSynapse on their existing infrastructure. This included adapting the workflows and processes to leverage ProActive capabilities, and configuring the solution to optimize the algorithm through better resource consumption and prioritization. For more information on the capabilities leveraged, please visit this other article on risk management.

The second step, now in progress, is to leverage Azure Cloud’s “infinite” capacity and answer one of the requirements: minimize time to results. ActiveEon and L&G are working closely with the Azure team to build a fully integrated solution and optimize the outcome.

Figure 2 represents the planned architecture. The system relies on a set of static nodes plus a more elastic part that contracts and expands according to configured policies and actual load. This hybrid system is handled by the resource manager included in ProActive Workflows and Scheduling. As can be seen, multiple features have been highlighted, such as error handling, task monitoring, ProActive Cloud Automation and hybrid resource handling, which will all play a role in the final outcome of this phase.
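An elastic policy of the kind described can be sketched as a simple sizing function. This is a hypothetical illustration (the function name, thresholds and parameters are assumptions, not ProActive's actual API): the pool expands toward the cloud when pending work outstrips capacity, and contracts back to the static pool when load drops.

```python
# Hypothetical sketch of a hybrid elastic scaling policy: size the node
# pool from the pending load, never shrinking below the static on-premise
# pool and never bursting past the cloud cap.

def desired_node_count(pending_tasks: int, tasks_per_node: int,
                       static_nodes: int, max_cloud_nodes: int) -> int:
    """Return how many nodes the pool should hold for the current load."""
    # Ceiling division: nodes needed to absorb the whole queue.
    needed = -(-pending_tasks // tasks_per_node)
    # Clamp between the static pool and static + cloud capacity.
    return max(static_nodes, min(needed, static_nodes + max_cloud_nodes))

print(desired_node_count(0,   4, static_nodes=10, max_cloud_nodes=90))  # 10
print(desired_node_count(200, 4, static_nodes=10, max_cloud_nodes=90))  # 50
print(desired_node_count(800, 4, static_nodes=10, max_cloud_nodes=90))  # 100
```

The clamp captures the hybrid idea: the static nodes are always there, while the cloud portion only exists while the backlog justifies paying for it.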


Fig 2: Hybrid and elastic architecture for optimized resource consumption and fast results

A few results

As part of the first step, a quantitative analysis was performed.

In Figure 3, two main points are worth noting. First, scheduling order has been imposed with ProActive: the scheduler allocates the required resources for each task and prioritizes appropriately. The second observation relates to the granularity of the workflows and an efficient distribution of load. The overall risk simulation now consumes around 2 hours less of CPU time on the same infrastructure. Dependencies between tasks are better handled, ensuring that every available resource is used.
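The prioritization idea is easy to illustrate with a toy priority queue (this is not ProActive's internals, just the general technique): among the tasks that are ready to run, the most critical one is always dispatched first, so important reports finish early.

```python
# Toy illustration of priority scheduling: lower number = more critical.
import heapq

ready = [(2, "full-report"), (1, "critical-report"), (3, "archival")]
heapq.heapify(ready)  # turn the list into a min-heap in place

# Dispatch order: pop the most critical ready task each time.
order = [heapq.heappop(ready)[1] for _ in range(3)]
print(order)  # ['critical-report', 'full-report', 'archival']
```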


Fig 3: ProActive improvement with prioritization and granular workflows

In Figure 4, the same information is represented, but the additional value highlighted here is prioritization and organization. The most important information is received after 5 hours, whereas it used to be necessary to wait for all simulations to complete (16 hours) before receiving any results. For those critical reports the gain is around 70% ((16 − 5) / 16 ≈ 69%).


Fig 4: Phase 1 quantitative results

The previous results were achieved on the same infrastructure, simply by changing the scheduler. The current objective is to leverage the cloud’s “infinite” capacity to distribute the workload as widely as possible, receive critical results as soon as possible, and deliver the full report within a few hours. The final timings will be added to this article as soon as the second phase is completed.

Do not hesitate to try these features by yourself on our Try platform and contact us for customized demos.
