Friday, March 24, 2017

Orchestration and Meta-Scheduling for Big Data Architecture

MAPREDUCE, HADOOP, ETL, ELT, INFRASTRUCTURE, CLOUD, WORKFLOW, META-SCHEDULER AND COST OPTIMISATION

In today’s world, adopting Big Data is critical to the survival of most companies. Storing, processing and extracting value from data are becoming the main focus of IT departments. This huge amount of data, known as Big Data, has four properties: Volume, Variety, Value and Velocity. Systems such as Hadoop, Spark and Storm are the de facto building blocks of Big Data architectures (e.g. data lakes), but they fulfill only part of the requirements. Moreover, beyond this mix of properties, which is already a challenge for businesses, new opportunities will add even more complexity. Companies are now looking at integrating even more data sources, breaking silos (variety is increasing with structured and unstructured data), and obtaining real-time, actionable data. All of these are becoming key for decision makers.

Multiple solutions on the market support Big Data strategies, but none of them fits every company’s use cases. Consequently, each of these solutions is responsible for extracting only part of the meaning from the data. Although this mix of solutions adds complexity to infrastructure management, it also makes it possible to exploit all the information that can be extracted from the data. New questions then arise: How do I break company silos? How do I make sense of this pool of unrelated and unstructured data? How do I leverage supervised and unsupervised machine learning? How do I synchronize and orchestrate the different Big Data solutions? How can I allocate relevant resources? How do I ensure critical reports get prioritized? How do I enforce data locality rules and control the spread of information? How do I monitor the whole data journey?

This article highlights two points on the technical, operational and economic challenges around orchestration solutions. To leverage Big Data, companies will need to address them in order to optimize their infrastructure, extract faster and deeper insight from the data, and thus gain a competitive edge. For a more detailed and complete document, do not hesitate to download the full white paper.

Wednesday, January 18, 2017

Build Your Employee Self-Service Portal With ProActive

In the past few years, people have been consuming more and more services to perform their jobs properly. Managing these services is the role of the IT department. However, some employees might bypass this department to move faster, which creates what is called “shadow IT”. Others might be frustrated at being queued while additional checks are performed.

A solution to this situation needs to balance IT and non-IT needs by providing a user-friendly interface that is fast for non-IT users (those from other departments) and provides governance for IT users. More precisely, this can be achieved by bringing all applications and services together into a single platform. This way, application and service lifecycles can be easily managed, and custom templates can be made available for all to use (and create). This allows for faster and more agile deployment, and improves governance by giving visibility over the current services and by using a common standard.

ProActive Cloud Automation offers a solution through a self-service portal to monitor and manage application and service lifecycles. The IT department can easily create templates that follow business policies and make them available to user groups.

Wednesday, January 11, 2017

Network State Selection for Data Transfer

Many applications, such as online meetings and video streaming, need to transmit large amounts of data as quickly as possible. A good signal is therefore required for those transmissions; otherwise, you might experience web conferences cutting out or videos pausing at unexpected times. Other applications need to adapt the quality of the content they send to the signal strength.

Resource selection based on network properties

However, the quality of this signal can vary due to external parameters. Fortunately, networks are redundant, which allows multiple paths with different properties to lead to the same device. It is therefore useful to route flows through carefully chosen resources. This choice may be made according to values such as ping and/or bandwidth.
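
As an illustration of choosing a path from measured network properties, here is a minimal Python sketch. It is not the ProActive implementation: the `PathStats` type, the `select_path` function, the thresholds and the sample measurements are all hypothetical, and the ranking rule (highest bandwidth first, lowest ping as a tie-breaker) is only one possible policy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PathStats:
    """Measured properties of one candidate network path (hypothetical type)."""
    name: str
    ping_ms: float
    bandwidth_mbps: float


def select_path(paths: Iterable[PathStats],
                max_ping_ms: float,
                min_bandwidth_mbps: float) -> Optional[PathStats]:
    """Return the best path that meets both thresholds, or None if none does."""
    eligible = [
        p for p in paths
        if p.ping_ms <= max_ping_ms and p.bandwidth_mbps >= min_bandwidth_mbps
    ]
    if not eligible:
        return None
    # Assumed policy: prefer the highest bandwidth, break ties with the lowest ping.
    return min(eligible, key=lambda p: (-p.bandwidth_mbps, p.ping_ms))


# Made-up measurements, for illustration only.
candidates = [
    PathStats("path-a", ping_ms=20.0, bandwidth_mbps=50.0),
    PathStats("path-b", ping_ms=80.0, bandwidth_mbps=200.0),
    PathStats("path-c", ping_ms=35.0, bandwidth_mbps=120.0),
]

best = select_path(candidates, max_ping_ms=50.0, min_bandwidth_mbps=40.0)
print(best.name if best else "no suitable path")
```

A latency-sensitive application such as a web conference would use a strict ping threshold, whereas a bulk transfer could relax it and let bandwidth dominate the choice.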


Friday, January 6, 2017

Cloud Wastes, 7 Recommendations

Accessing new resources has never been easier than it is today, with public cloud providers offering compute power on an hourly basis with varying specs such as GPU, RAM and CPU. However, recent articles across the web have come to the same conclusion: companies are wasting money in the cloud.

“RightScale reveals that cloud users could reduce their spend by an average of 35 percent by making better use of resources.” BetaNews
“The benefits of shifting business applications to Web-friendly cloud services is proving far more complex than lining up a partner and flipping a switch” Wall Street Journal
“More and more companies are migrating to the cloud for computing power, but many are actually wasting too much money on unused services.” CFO.com

To summarize, the main causes of cloud waste cited across these articles are oversized resources, resources that are not required at all times, and algorithms that are not resource-aware.

7 Recommendations and Best Practices

Tuesday, December 20, 2016

Resource Reservation



Each IT infrastructure includes heterogeneous resources that serve different purposes. Some resources might be better suited to RAM-consuming tasks, others might host a secured database, others might offer low latency to the customer, and others might have higher bandwidth.

This article takes as an example a task that needs to access a database or any other specific resource. A selection script ensures that this task is executed on the server hosting this resource, but other requirements have to be considered.
What if there are other tasks waiting and yours has higher priority? What if your task requires the whole capacity of the machine?
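
To make these two questions concrete, here is a minimal Python sketch of one possible reservation policy. It does not use the ProActive API: the `Task` and `Host` types, the `next_task` function and the slot counts are hypothetical. The assumed rule is that the highest-priority task able to run on a host is served first, and that the host is held back from lower-priority tasks until enough slots are free for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """A pending task (hypothetical type); a higher priority value runs first."""
    name: str
    priority: int
    slots_needed: int
    needs_database: bool = False


@dataclass
class Host:
    """A machine with a fixed number of execution slots (hypothetical type)."""
    name: str
    total_slots: int
    free_slots: int
    has_database: bool = False


def next_task(pending: List[Task], host: Host) -> Optional[Task]:
    """Pick the task to start on this host, or None if the host must wait."""
    # sorted() is stable, so tasks of equal priority keep their submission order.
    for task in sorted(pending, key=lambda t: -t.priority):
        if task.needs_database and not host.has_database:
            continue  # the selection rule excludes this host for this task
        if task.slots_needed > host.total_slots:
            continue  # the task can never fit on this host
        if task.slots_needed <= host.free_slots:
            return task
        # The top eligible task does not fit yet: reserve the host for it
        # instead of letting lower-priority tasks take the remaining slots.
        return None
    return None


db_host = Host("db-server", total_slots=4, free_slots=2, has_database=True)
pending = [
    Task("small-report", priority=1, slots_needed=1),
    Task("db-import", priority=5, slots_needed=4, needs_database=True),
]

print(next_task(pending, db_host))   # host is held for db-import
db_host.free_slots = 4               # running tasks have finished
print(next_task(pending, db_host).name)
```

Holding the host prevents a stream of small, low-priority tasks from starving a task that needs the whole machine, at the cost of leaving some slots idle while the reservation is pending.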


Thursday, December 8, 2016

Big Data, Languages and Software solutions

Inspired by an article from ThinkR

Big Data is a common term that everyone uses without really having a clear definition of what it is. It is used everywhere and in every industry, from transportation to human resources and healthcare, which makes it harder to pin down. For this article, let’s agree that data is “big” when it cannot be handled by a single computer, even though computers are now available with 1 TB of RAM. In a growing number of cases, the data for a computation might not even be in one place, which requires advanced distributed algorithms and software.

Big Data for Hadley Wickham

Hadley Wickham, a well-known developer for the R language who developed multiple libraries used in most projects, defined 3 main categories for Big Data:

Monday, December 5, 2016

ProActive Calendar Step-by-Step




Calendars offer a user-friendly interface that lets you plan ahead, manipulate events, duplicate them and much more.

By extending traditional applications, more than just events can be handled.




Introducing our calendar service. With it, ProActive Workflows & Scheduling offers more flexibility and a way to automatically synchronize the scheduler with your favorite calendar manager (Google Calendar, Thunderbird, Outlook, Apple Calendar, etc.). This offers a new UI adapted to repetitive tasks, enabling more intuitive control through better visualization and simplified job handling (drag and drop). To use our calendar service, navigate to the scheduler, click on “calendar”, retrieve the URL (generate it if needed) and use it to create a new calendar in your calendar manager. It is possible to test this through our free online test platform.