Apple and Samsung have merged. Cloudera + Horton?
It seems like an incredible headline, right? Well, something similar just happened in the world of Big Data platforms.
On the night of October 3rd, at 4:00 in the morning on the Asian exchanges, big news broke: Cloudera and Hortonworks were coming together to become a single company, in a ratio of 60/40, respectively. Don’t worry, I’m not going to overwhelm you with data, statistics or market projections. But it is noteworthy that the combination of these two forces could easily take 90% of the market. Why? Against what or whom? Let’s see if we can understand it a little better.
Cloudera started in 2009 with a clear purpose: to be the first company to offer a complete, secure and solid Hadoop ecosystem bundle with the business features that make it viable for large companies. Until then, the Hadoop ecosystem was nothing more than a cluster of Apache projects, each with a life of its own and, for example, almost no security functionality. Taking the plunge and investing in this technology in that context was an immense, highly technical and risky undertaking. Cloudera promised to solve all of this with a proprietary software layer to help manage, operate and secure more than 10 Apache projects, ensuring that they worked in an integrated way.
Cloudera was the first to reach the market with a solid message on security, compliance and PCI, and on how easily this new ecosystem could be brought into the applications arena. It offered great processing and storage capacity, flexibility and support for unstructured data, at a lower price than the large players of the time, such as Teradata, Exadata and Netezza.
As an open ecosystem, they began with HDFS, ZooKeeper, MapReduce, etc., and then evolved toward Spark, Mesos, Solr and Impala. Although many of these lived in the Apache world, they clearly had a blue background.
Their last big bets were Cloudera Data Science Workbench, a module the market needed to make data science environments viable (although not yet 100%), and Kudu, a new storage engine built to meet the great demand for real-time analytics, which the old batch-oriented HDFS could not support.
Horton was born two years later, in 2011. Those who know about finance say that a mature market has two dominant players, and Horton understood which role it would play. Its strategy was very different: little proprietary software, 100% alignment with the community, lower prices and, perhaps its best feature, greater flexibility to include and adapt to the community’s evolution. Cloudera always took more time to include changes, precisely because it had to integrate them with its extra management layer. Its great benefit was also its Achilles heel.
At first, Cloudera had the dominant position. Its security message went straight to the big banks, and the last thing they wanted were mistakes in their new investments. But the community began to evolve its security and operations features until they almost reached the same level as Cloudera’s. Although it wasn’t as solid and sturdy, the Horton bundle worked well and it was cheaper. Even though Horton at first bet on Storm rather than Spark, it managed to react and recover the market. Despite not having Impala, it saw Hive evolve to be competitive for interactive queries. In its latest version, it even removed Solr from the bundle, becoming closer to Cloudera.
But Horton had a great hit: HDF (Hortonworks DataFlow). Instead of betting on analytics, it saw that the market demanded a tool to orchestrate, organize and accelerate the ingest and real-time analytics processes. In Cloudera this is feasible, but only by coding, and all of us who lived through the DWH era, before ETL tools, know how all that coding ends up. Horton created HDF as an independent solution built on Apache NiFi to orchestrate ingestion, with a visual IDE and with operation and monitoring features, together with Kafka and SAM, a visual environment for developing processes on Storm. Being a separate module, it could even be used to ingest data into Cloudera. So much so that its competitor had to react by announcing future integrations with StreamSets.
At this point, it seemed to be a fair fight between two great vendors, with different bets and strategies, which shared the market with almost no competition. Until the large cloud vendors appeared with their SaaS platforms. Amazon, Google and Microsoft bet on Big Data SaaS components that were equivalent, complementary and additional to those of Cloudera and Horton, with a very powerful message that didn’t need to be explained: sign up for what you want to use, try it, cancel it. In other words, the cloud tells you that “with us, you can make mistakes without compromising your budget”. Curiously, IBM has not managed to become a cloud Big Data player. It got it wrong from the beginning with BigInsights and its proprietary Hadoop, so much so that in the end it allied with Horton and bet on investing in the Spark community.
The nature of analytical initiatives has a lot to do with experimenting, trying, being wrong, and trying again. Additionally, the response times and time to market of hardware providers are long, and the process is complicated, slow and not very flexible. In cost simulations, the cloud starts out much cheaper and there is a “break-even” point at two or three years. But let’s face it, who in this day and age makes two- or three-year plans for a new initiative? How do you know what size of subscription to buy for a new project, in batch and real time, without knowing the use cases, the sizing, how it really works or the return it will have?
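To make the break-even idea concrete, here is a minimal sketch of that kind of cost simulation. All the figures are hypothetical and chosen only for illustration (they are not real vendor prices), and the function names are made up for this example. It assumes the cloud has no upfront cost and a higher monthly fee, while on-premises has a large upfront investment and a lower monthly running cost.

```python
def cumulative_cost(upfront: float, monthly: float, months: int) -> float:
    """Total spend after `months` months: one-off cost plus recurring cost."""
    return upfront + monthly * months


def break_even_month(cloud_monthly, onprem_upfront, onprem_monthly,
                     horizon_months=120):
    """Return the first month in which cumulative cloud spend reaches or
    exceeds cumulative on-premises spend, or None if that never happens
    within the horizon."""
    for month in range(1, horizon_months + 1):
        cloud = cumulative_cost(0, cloud_monthly, month)
        onprem = cumulative_cost(onprem_upfront, onprem_monthly, month)
        if cloud >= onprem:
            return month
    return None


# Hypothetical figures, for illustration only.
CLOUD_MONTHLY = 30_000      # pay-as-you-go subscription
ONPREM_UPFRONT = 600_000    # hardware and licences bought on day one
ONPREM_MONTHLY = 10_000     # operations, power, support

month = break_even_month(CLOUD_MONTHLY, ONPREM_UPFRONT, ONPREM_MONTHLY)
print(f"Break-even after {month} months (~{month / 12:.1f} years)")
```

With these assumed numbers the cloud is cheaper for the first 30 months, about two and a half years, which is exactly the window in which most new analytical initiatives are still finding out whether they work.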
Exactly. Horton’s and Cloudera’s big competitors are not one another, but the big cloud players, Amazon, Microsoft and Google (remember where the papers that inspired Hadoop came from years ago), which have been taking large shares of the Big Data platform market since the end of 2017 and the beginning of this year. Their dominant position was clearly at risk, and it’s possible that individually they would not be able to win in this market.
And so we reach the point of the merger. We do not know the details, or what it really means for Cloudera to have 60% and Horton 40%. What we can analyze are the messages in the press release:
- Establishes the next generation data platform leader with increased scale and resources to deliver the industry’s first enterprise data cloud, providing the ease of use and elasticity of the public cloud from the data center, to the Edge and everywhere in between.
The first message is that they are looking to create the enterprise leader for data cloud platforms, which would provide elasticity between the cloud, the data center and the edge (IoT devices). This message attacks the Achilles heel of cloud solutions: portability, or vendor lock-in. If you build on any of the cloud vendors’ SaaS modules, you cannot leave their products or their cloud; you have eternal vendor lock-in. But what would happen if the new company managed to provide a single platform, offered in the cloud as SaaS in a managed mode, whose code could be migrated directly on-premises without cost?
Kubernetes, the world of containers, is there. The argument could be very powerful if it proposes a hybrid platform that offers both benefits. On one hand, SaaS would allow you to start rapidly and make mistakes; on the other, if the use case works and you will have it for three years, you simply move it to your data center, since in the medium term it will be cheaper. You won’t “marry” any cloud, and although you would be using their new platform, it belongs to Apache in some way, so there would always be theoretical ways out.
- Creates a superior unified platform and clear industry standard from the Edge to AI, substantially benefiting customers, partners and the community
- Expands market opportunity with complementary offerings, including Hortonworks DataFlow and Cloudera Data Science Workbench
We can consider these two messages together. By joining the differentiating HDF modules for ingest, edge capacity with NiFi, integration, etc. with the analytical capabilities, here called AI, they achieve a solid and complete end-to-end platform.
- Accelerates market development and fuels innovation in IoT, streaming, data warehouse, hybrid cloud, machine learning/AI
The message here is that the solution is complete, solid, and works for all the data use cases that companies require, in a hybrid model that does not currently exist.
- Enhances partnerships with public cloud vendors and systems integrators
Joining forces allows the new company to negotiate with customers and partners, especially cloud vendors, from a stronger position, and gives it more strength to control the community (if the community lets it, of course).
Pending approval of the merger by all the relevant parties, we are facing a great earthquake in the world of Big Data. It seems that there were certain overtures years ago, when Horton was not going through a good period, but the balance of forces is now clearly very different. This is a move that seems logical in hindsight and in view of the facts, but perhaps one week ago nobody had it on their radar.
- It will be necessary to see how the community reacts, and whether the new company wants to have too much control, because it has been demonstrated that the open source world distrusts such control.
- It will be necessary to see how clients react, because they always want to have several options for comparison, and perhaps this is the opportunity for others, like MapR or Stratio, to make a strong appearance on the scene. We must also see what happens with pricing, as the two companies’ prices are clearly different.
- We must also see how the cloud vendors react. Until now they were somehow ally-rivals, and they will have to decide how to carry on in the future.
What is certain is that the data market will evolve, and the tools and platforms that companies have at their disposal will be more complete and better able to keep strengthening their business processes, permeating every corner of the company with analytics.
Now more than ever: “Put Intelligence to Work”.