MapR has been granted a patent by the USPTO for a converged data architecture that brings together “open source, enterprise storage, NoSQL, and event streams” with enterprise-grade security and disaster recovery functionality. MapR’s converged data architecture supports open standards and APIs such as POSIX, NFS, LDAP, ODBC, REST, and Kerberos while enabling real-time analytics on both data in motion and data at rest. The platform combines the power of Hadoop and Spark with read-write and update functionality, so it can deliver analytics at scale for mission-critical applications and computationally intensive workloads. The MapR Converged Data Platform helps customers avoid data silos by running multiple analytics workloads on data housed within a single cluster. Meanwhile, the platform’s enterprise-grade reliability allows customers to ingest, process and analyze big data from a multitude of sources while benefiting from production-grade data protection and disaster recovery. The platform’s innovation lies in its ability to support storage and analytics across many data formats and acquisition modes, from batch uploads to streaming data. Wednesday’s patent announcement affirms the novelty of the architecture behind MapR’s converged big data infrastructure. Expect to hear more about the MapR Converged Data Platform as use cases proliferate and illustrate its ability to support big data analytics in mission-critical environments for data from relational databases, NoSQL stores, Hadoop and streaming sources alike.
After careful consideration, MapR has declined the invitation to participate in the Open Data Platform (ODP), as noted in a recent blog post by John Schroeder, the company’s CEO and co-founder. Schroeder argues that the Open Data Platform duplicates the governance already provided by the Apache Software Foundation, that it purports to “solve” Hadoop-related problems that do not need solving, and that it fails to accurately define the core of Hadoop. On software governance, Schroeder notes that the Apache Software Foundation has stewarded the development of Apache Hadoop well:
The Apache Software Foundation has done a wonderful job governing Hadoop, resulting in the Hadoop standard in which applications are interoperable among Hadoop distributions. Apache governance is based on a meritocracy that doesn’t require payment to participate or for voting rights. The Apache community is vibrant and has resulted in Hadoop becoming ubiquitous in the market in only a few short years.
Here, Schroeder credits the Apache Software Foundation with creating a Hadoop ecosystem in which applications interoperate across Hadoop distributions and in which governance is based on a meritocracy that does not require payment in exchange for participation or voting rights. In addition, the blog post observes that whereas the Open Data Platform defines the core of Apache Hadoop as MapReduce, YARN, Ambari and HDFS, other frameworks such as “Spark and Mesos, are gaining market share” and stand to complicate ODP’s definition of the Hadoop core. Meanwhile, Cloudera’s Chief Strategy Officer Mike Olson explained why Cloudera also declined to join the Open Data Platform, noting that Hadoop “won because it’s open source” and that the partnership between Pivotal and Hortonworks was “antithetical to the open source model and the Apache way.” Given that 75% of Hadoop implementations use either MapR or Cloudera, ODP looks set to face serious challenges despite support from IBM, Pivotal and Hortonworks, although the precise impact of the schism over the Open Data Platform on the Hadoop community remains to be seen.
MapR recently announced that MediaHub Australia has deployed MapR to support its digital archive, which serves 170+ broadcasters in Australia. MediaHub delivers digital content for broadcasters throughout Australia together with its strategic partner Contexti. Broadcasters provide MediaHub with program segments, live feeds and a schedule that specifies when each program should be delivered to its audience. In addition to scheduled broadcasts, MediaHub offers streaming and video-on-demand services for a variety of devices. MediaHub’s digital archive automates the delivery of playout services for broadcasters, thereby minimizing the need for manual intervention from archival specialists. MapR currently manages over 1 petabyte of content for the 170+ channels that MediaHub serves, and the archive is expected to grow dramatically within the next two years. MapR’s Hadoop-based storage platform also enables analytics on content consumption that help broadcasters make data-driven decisions about what content to air in the future and how best to complement existing content. MediaHub’s deployment illustrates a prominent use case for MapR: using Hadoop to store, deliver and run analytics on digital media. According to Simon Scott, Head of Technology at MediaHub, a key reason MediaHub selected MapR as the big data platform for its digital archive was its ability to run on commodity hardware.
On Monday, MapR Technologies announced that it had closed $110M in funding, comprising $80M in equity financing and $30M in debt financing. Google Capital led the equity round with participation from Qualcomm Incorporated, Lightspeed Venture Partners, Mayfield Fund, NEA and Redpoint Ventures, while Silicon Valley Bank provided the debt financing. The funding will be used to fuel MapR’s explosive growth in the Hadoop distribution and analytics space, illustrated by a threefold increase in bookings in Q1 2014 compared to Q1 2013. Gene Frantz, General Partner at Google Capital, commented on Google Capital’s participation in the June 30 funding round as follows:
MapR helps companies around the world deploy Hadoop rapidly and reliably, generating significant business results. We led this round of funding because we believe MapR has a great solution for enterprise customers, and they’ve built a strong and growing business.
Monday’s announcement comes soon after MapR’s news of its support for Apache Hadoop 2.x and YARN, as well as for all five components of Apache Spark, the open source technology used for big data applications that specialize in interactive analytics, real-time analytics, machine learning and stream processing. The additional $110M strengthens MapR’s position relative to competitors Cloudera and Hortonworks, which recently raised $900M and $100M respectively. The news of MapR’s funding also coincides with a recent statement from Hortonworks certifying the compatibility of YARN with Apache Spark, part of a larger announcement about integrating Spark into the Hortonworks Data Platform (HDP) alongside XA Secure, its Hadoop security acquisition, and Apache Ambari for provisioning and managing Hadoop clusters. With a fresh round of capital in the bank and backing from Google, the company that created MapReduce, MapR signals that the battle for Hadoop market share is a three-horse race that is almost certain to intensify as vendors compete to streamline and simplify the operationalization of big data. In the meantime, big data venture capital continues to flow like water from an open fire hydrant as the space tackles problems related to big data analytics, streaming big data and Hadoop security.
Hadoop vendor MapR recently reported record growth, marked by a threefold increase in Q1 2014 bookings compared to Q1 2013. MapR’s announcement comes alongside recent news of its integration with the HP Vertica platform, its support for Apache Hadoop 2.x and YARN, and its support for all five components of Apache Spark in its Hadoop distribution. MapR also noted that it now counts customers in the financial services, networking/computers, software, online/web, ad media, telecom and market research verticals that have spent more than $1M on MapR products, as well as one customer that has generated over a billion dollars in revenue attributable to its use of MapR’s technology. MapR’s Q1 growth is particularly notable given the hefty capital raises of $100M and $900M finalized by competitors Hortonworks and Cloudera, respectively, within the last two months. As the battle for Hadoop market share shakes out, MapR will also need to contend with the implications of the nascent partnership between Cloudera and NoSQL market leader MongoDB.
On Thursday, MapR Technologies announced that it will add Apache Spark to its Hadoop distribution through a partnership with Databricks, the principal steward of Apache Spark. Apache Spark facilitates the development of big data applications that specialize in interactive analytics, real-time analytics, machine learning and stream processing. In contrast to MapReduce, Apache Spark provides a broader range of data operators, such as “mappers, reducers, joins, group-bys, and filters,” that can model more complex data flows than map and reduce operations alone. Moreover, because Spark can keep intermediate results in memory, it enables low-latency computation and more efficient iterative calculations that repeatedly reuse those in-memory results. Spark also automates the parallelization of jobs and tasks in ways that optimize performance, relieving developers of the responsibility of sequencing job execution. Apache Spark can reportedly improve application performance by a factor of 5 to 100, while its programming abstraction, based on immutable, distributed collections of data known as Resilient Distributed Datasets (RDDs), reduces the amount of code required by as much as 80%. MapR will support all five components of the Spark stack, namely Shark, Spark Streaming, MLlib, GraphX and SparkR. Together, these components illustrate Spark’s versatility: they support applications that work with streaming data, machine learning, graph processing, R and SQL. MapR’s decision to support the entire Spark stack sets it apart from competitor Cloudera, which, as reported in GigaOM, does not support Shark, the SQL-on-Hadoop component of Apache Spark that competes with Cloudera’s Impala product.
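To make the contrast with plain map and reduce a little more concrete, here is a toy, single-process Python sketch of the ideas described above: immutable datasets derived from one another by operators such as filter, group-by and join, lazy evaluation, and caching an intermediate result in memory so that repeated (iterative) passes do not recompute it. `MiniRDD`, its method names and the sample data are hypothetical illustrations, not Spark’s API; real Spark partitions data across a cluster and rebuilds lost partitions from lineage, which this sketch omits.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable


class MiniRDD:
    """Hypothetical toy stand-in for an RDD (single process, not Spark's API).

    Each transformation returns a new MiniRDD that remembers how to derive
    its rows from its parent, so existing datasets are never modified.
    Nothing is computed until collect() is called (lazy evaluation).
    """

    def __init__(self, compute: Callable[[], list]) -> None:
        self._compute = compute
        self._cache: list | None = None
        self._persist = False
        self.compute_count = 0  # times this dataset was actually materialized

    @staticmethod
    def parallelize(data: Iterable[Any]) -> MiniRDD:
        frozen = tuple(data)  # snapshot the input so later edits can't leak in
        return MiniRDD(lambda: list(frozen))

    def _derive(self, fn: Callable[[list], list]) -> MiniRDD:
        # The child pulls its parent's rows on demand and applies fn to them.
        return MiniRDD(lambda: fn(self.collect()))

    # Transformations: each returns a new, not-yet-evaluated dataset.
    def map(self, f: Callable[[Any], Any]) -> MiniRDD:
        return self._derive(lambda rows: [f(r) for r in rows])

    def filter(self, pred: Callable[[Any], bool]) -> MiniRDD:
        return self._derive(lambda rows: [r for r in rows if pred(r)])

    def group_by_key(self) -> MiniRDD:
        def group(rows: list) -> list:
            groups: dict[Hashable, list] = defaultdict(list)
            for k, v in rows:
                groups[k].append(v)
            return sorted(groups.items())
        return self._derive(group)

    def join(self, other: MiniRDD) -> MiniRDD:
        def inner_join(rows: list) -> list:
            right: dict[Hashable, list] = defaultdict(list)
            for k, v in other.collect():
                right[k].append(v)
            return [(k, (lv, rv)) for k, lv in rows for rv in right.get(k, [])]
        return self._derive(inner_join)

    # Persistence and actions.
    def cache(self) -> MiniRDD:
        self._persist = True  # keep rows in memory after the first collect()
        return self

    def collect(self) -> list:
        if self._cache is not None:
            return list(self._cache)  # reuse the in-memory result
        self.compute_count += 1
        rows = self._compute()
        if self._persist:
            self._cache = rows
        return list(rows)


sales = MiniRDD.parallelize([("east", 120), ("west", 80), ("east", 40), ("north", 15)])
regions = MiniRDD.parallelize([("east", "Sydney"), ("west", "Perth")])

# One cached intermediate result feeds several downstream operators.
large = sales.filter(lambda kv: kv[1] >= 40).cache()
print(large.group_by_key().collect())  # [('east', [120, 40]), ('west', [80])]
print(large.join(regions).collect())
# [('east', (120, 'Sydney')), ('west', (80, 'Perth')), ('east', (40, 'Sydney'))]

# An iterative loop over the cached dataset does not recompute its lineage...
for _ in range(3):
    total = sum(v for _, v in large.collect())
print(total, large.compute_count)  # 240 1

# ...whereas the same loop over an uncached dataset recomputes it every pass.
uncached = sales.filter(lambda kv: kv[1] >= 40)
for _ in range(3):
    uncached.collect()
print(uncached.compute_count)  # 3
```

The design choice worth noting is that `cache()` only marks a dataset for reuse; the first `collect()` still pays the cost, and every later consumer reads the stored rows instead of replaying the filter.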
All told, today’s announcement represents a small but significant attempt by MapR to reassert the relevance of its Hadoop distribution in the wake of Cloudera’s $900M funding announcement and the $100M recently secured by Hortonworks. We should nonetheless expect MapR to follow suit with a similar capital raise soon, despite CMO Jack Norris’s claim that “with 500 paid customers the company is profitable and able to continue being successful from its current position.”