Cloudera Announces Enterprise-Grade Support For Apache Spark

Cloudera recently announced the general availability of Apache Spark for Cloudera Enterprise. First developed at UC Berkeley, Apache Spark is a parallel data processing framework that supplements Apache Hadoop by facilitating the development of big data applications for machine learning, interactive analytics and real-time analytics. Spark allows users to write parallel applications in Java, Scala and Python that run on Hadoop clusters at speeds up to 100 times faster than MapReduce. Moreover, applications developed in Spark tend to require 2 to 10 times less code than a corresponding MapReduce application. Spark Streaming, an add-on to Spark, enables analytics to be run on streaming datasets so that developers can derive analytic insights within seconds of data ingestion. In partnership with Databricks, the primary sponsor of the open source Apache Spark project, Cloudera will offer enterprise-grade support for Spark through its Data Hub Edition and Cloudera Enterprise Flex Edition. This release supports Spark 0.9.0 with CDH 4. Support for Cloudera Enterprise 5, with CDH 5 and YARN, will follow in subsequent releases. Spark contributes to the Cloudera platform as illustrated by the blocks highlighted in orange below:

Image Source: “Apache Spark — Welcome To The CDH Family”
