On June 15, IBM announced significant backing for Apache Spark, the open source framework for Hadoop-based analytics. Apache Spark facilitates the development of Hadoop-based applications that specialize in interactive analytics, real-time analytics, machine learning and stream processing. IBM intends to integrate Spark into its analytics and commerce platforms as well as the IBM Watson Health Cloud and its IBM SystemML machine learning technology. Moreover, Big Blue plans to offer Spark as a Service as part of its IBM Bluemix Platform as a Service and to commit 3,500 developers to Spark-related projects. IBM also announced plans to open a Spark Technology Center in San Francisco to facilitate the development of innovative, data-centric, intelligent applications.
IBM’s support of Apache Spark represents a huge coup for Spark and for the startups that rely heavily on its framework to build analytics applications. That said, IBM’s backing of Spark also bolsters the broader industry of analytics frameworks built for Hadoop, such as the recently open sourced DataTorrent platform, which offers a production-grade alternative to Apache Spark and Apache Storm.
IBM’s announcement comes in tandem with the general availability of the Databricks cloud platform for Apache Spark, which simplifies the application of Spark to Big Data use cases. Revealed roughly a year ago, the Databricks platform supports the automation of job processes and pipelines that leverage Spark as well as the use of the popular programming language R on Spark clusters. While IBM Bluemix’s Spark offering may well compete directly with the Databricks cloud, the larger momentum in Big Data analytics has swung hugely in Apache Spark’s direction and promises to continue doing so, assuming IBM can capitalize on its early investment in integrating Spark into its array of platforms and use cases.
IBM’s support of Spark also serves to set its cloud platform apart from those of Amazon Web Services and Microsoft as the race for differentiation in the IaaS space intensifies.
Databricks, the company founded by the team that developed Apache Spark, recently announced the finalization of $33M in Series B funding in a round led by New Enterprise Associates with participation from existing investor Andreessen Horowitz. The company also revealed plans to commercialize Apache Spark by means of the newly launched Databricks Cloud, which simplifies the data pipeline for storing data, performing ETL processing and then running analytics and data visualizations on cloud-based Big Data. Powered by Apache Spark, the Databricks Cloud leverages Spark’s array of capabilities for operating on Big Data, such as its ability to process streaming data, perform graph processing and offer SQL on Hadoop, as well as its machine learning functionality. The platform aims to deliver a streamlined data pipeline for ingesting, analyzing and visualizing Hadoop-based data in a way that removes the need to combine heterogeneous technologies. Databricks will initially offer the Databricks Cloud on Amazon Web Services but plans to expand its availability to other clouds in subsequent months.