On Thursday, Hortonworks announced that Apache Spark is “YARN Ready,” meaning it is compatible with multiple concurrent workloads and with the additional CPU processing demands specific to Spark applications. Because Spark is now compatible with YARN, Hadoop users can use one Hadoop cluster with a single repository of data for a variety of purposes, rather than segmenting workloads so that some data is dedicated to Apache Spark. More specifically, Hadoop users can now expect YARN-based applications to work alongside applications that use Spark for real-time analytics, interactive analytics, machine learning and stream processing.

Hortonworks introduced Apache Spark to the Hortonworks Data Platform as a technology preview download in May. Thursday’s announcement goes further: it covers the integration of Spark with YARN; with XA Secure, Hortonworks’ recent acquisition, for authentication and data security; and with Ambari, toward the larger goal of delivering an integrated, turnkey, enterprise-grade Hadoop platform. The announcement responds to similar statements by competitor MapR regarding the integration of Spark into its Hadoop distribution, and to Cloudera’s announcement of enterprise-grade support for Apache Spark.
The following graphic, which illustrates the integration of Spark into YARN, is taken from the Hortonworks blog post “Making Apache Spark YARN Ready.”