Friday, March 24, 2017

Orchestration and Meta-Scheduling for Big Data Architecture

MAPREDUCE, HADOOP, ETL, ELT, INFRASTRUCTURE, CLOUD, WORKFLOW, META-SCHEDULER AND COST OPTIMISATION

In today’s world, the adoption of Big Data is critical for most company survival. Storing, processing and extracting value from the data are becoming IT department's’ main focus. The huge amount of data, or as it is called Big Data, have four properties: Volume, Variety, Value and Velocity. Systems such as Hadoop, Spark, Storm, etc. are de facto the main building blocks for Big Data architectures (e.g. data lakes), but are fulfilling only part of the requirements. Moreover, in addition to this mix of features which represents a challenge for businesses, new opportunities will add even more complexity. Companies are now looking at integrating even more sources of data, at breaking silos (variety is increasing with structured and unstructured data), and at real-time and actionable data. All those are becoming key for decision makers.

Multiple solutions in the market have been supporting Big Data strategies, but none of them fits every company’s use cases. Consequently, each of these solutions will be responsible for extracting some meaning from the data. Although this mix of solutions adds complexity to infrastructure management, it also leverages the full information that can be extracted from the data. New questions are then raised like: How do I break company silos? How to make sense of this pool of unrelated and unstructured data? How to leverage supervised and unsupervised machine learning? How do I synchronize and orchestrate the different Big Data solutions? How can I allocate relevant resources? How do I ensure critical reports get prioritized? How do I enforce data locality rules and spread of the information? How do I monitor the whole data journey?

This article highlights two points on technical, operational and economic challenges around orchestration solutions. To leverage Big Data, companies will address those in order to optimize its infrastructure, extract faster and deeper insight into the data, and thus get a competitive edge. For a more detail and complete document, do not hesitate to download the full white paper.