MapR Announces Support For All Five Components Of Apache Spark In Its Hadoop Distribution

On Thursday, MapR Technologies announced that it will add Apache Spark to its Hadoop distribution through a partnership with Databricks, the principal steward of Apache Spark. Apache Spark facilitates the development of big data applications for interactive analytics, real-time analytics, machine learning and stream processing. In contrast to MapReduce, Spark provides a broader range of data operators, such as “mappers, reducers, joins, group-bys, and filters,” that let developers model more complex data flows than map and reduce operations alone allow. Moreover, because Spark can keep the results of these operators in memory, it enables low-latency computation and makes iterative calculations, which repeatedly operate on in-memory intermediate results, considerably more efficient.
Spark is also known for automatically parallelizing jobs and tasks in ways that optimize performance, which relieves developers of the responsibility of sequencing job execution. Apache Spark can improve application performance by a factor of 5 to 100, while its programming abstraction, based on immutable, distributed collections of data known as Resilient Distributed Datasets (RDDs), reduces the amount of code required by 80%.
MapR will support all five components of the Spark stack: Shark, Spark Streaming, MLlib, GraphX and SparkR. Together, these components illustrate Spark’s versatility, supporting applications that work with SQL, streaming data, machine learning, graph processing and R. MapR’s decision to support the entire Spark stack differs from the approach of its competitor Cloudera, which, as reported by GigaOM, does not support Shark, the SQL-on-Hadoop component of Apache Spark that competes with Cloudera’s own Impala product.
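To make the contrast with plain map and reduce a little more concrete, here is a minimal sketch in ordinary Python, not Spark’s API, of the kind of multi-operator data flow described above: a filter, a join, a map and a group-by aggregation chained into one pipeline, with the joined intermediate result held in memory and reused for two aggregations. The sample data, the `join` and `sum_by_key` helpers, and all variable names are hypothetical. In Spark itself, the same steps would roughly correspond to RDD methods such as `filter`, `join`, `map` and `reduceByKey`, with `cache()` keeping the intermediate dataset in memory across computations.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample data: (customer_id, order_amount) and (customer_id, region)
orders: List[Tuple[int, float]] = [(1, 120.0), (2, 35.0), (1, 80.0), (3, 250.0), (2, 60.0)]
customers: List[Tuple[int, str]] = [(1, "east"), (2, "west"), (3, "east")]


def join(left: Iterable[Tuple[int, float]],
         right: Iterable[Tuple[int, str]]) -> List[Tuple[int, Tuple[float, str]]]:
    """Inner join on keys, producing (key, (left_value, right_value)) pairs,
    the same shape a Spark pair-RDD join returns."""
    index: Dict[int, List[str]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, (lvalue, rvalue))
            for key, lvalue in left
            for rvalue in index.get(key, [])]


def sum_by_key(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group values by key and sum them (a local stand-in for reduceByKey)."""
    totals: Dict[str, float] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


# filter: keep only orders of at least 50
large_orders = [(cid, amount) for cid, amount in orders if amount >= 50.0]

# join: attach each customer's region to their orders
joined = join(large_orders, customers)

# map: re-key by region; materialized once as a list and reused below,
# loosely analogous to caching an intermediate RDD in memory
by_region = [(region, amount) for _, (amount, region) in joined]

# group-by + aggregate, run twice over the same in-memory intermediate
revenue_by_region = sum_by_key(by_region)
orders_by_region = sum_by_key((region, 1) for region, _ in by_region)

print(revenue_by_region)  # {'east': 450.0, 'west': 60.0}
print(orders_by_region)   # {'east': 3, 'west': 1}
```

Expressing this pipeline with only map and reduce would typically require chaining several separate MapReduce jobs, each writing its output to disk; the point of the richer operator set and in-memory intermediates is that it can be written and executed as one flow.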
All told, today’s announcement represents a small but significant attempt by MapR to reassert the relevance of its Hadoop distribution in the wake of Cloudera’s $900M funding announcement and the $100M recently raised by Hortonworks. Even so, we should expect MapR to follow suit with a similar capital raise soon, even though its CMO, Jack Norris, claims that “with 500 paid customers the company is profitable and able to continue being successful from its current position.”