Cloudera recently acquired big data security vendor Gazzang to strengthen the security of its Hadoop distribution and related offerings. The acquisition will provide “enterprise-grade data encryption and key management.” In addition, the Gazzang team will constitute the foundation of the Cloudera Center for Security Excellence, dedicated to the development of comprehensive Hadoop security solutions. Cloudera’s acquisition of Gazzang comes weeks after Hortonworks announced its acquisition of XA Secure to obtain access to a comprehensive security solution for Hadoop that addresses issues such as user authentication, authorization, and audit and control. That Cloudera and Hortonworks acquired dedicated Hadoop security companies in the space of a month illustrates the intensity of the need in the Big Data space to package proven Hadoop security technologies in conjunction with Hadoop deployments and third-party tools for optimizing Hadoop analytics and data management. Cloudera, for example, actively contributes to the open source initiative Project Rhino, which seeks to augment the data protection functionality of Hadoop and contribute the resulting code back to the Apache Software Foundation. The bottom line is that Hadoop security has suddenly emerged as an urgent vertical within the Big Data space, one that testifies to the increasing prevalence and scale of Hadoop deployments in the enterprise.
Trifacta today announced that its Trifacta Data Transformation Platform has been certified for use with Hortonworks Data Platform 2.1 (HDP) by means of the Hortonworks Certified Technology Program. The certification ensures the compatibility of the Trifacta Data Transformation Platform with the latest Hortonworks Data Platform and thereby positions Trifacta’s technology to integrate with enterprise-grade deployments of the Hortonworks Hadoop distribution. Today’s announcement further validates the value of the Trifacta Data Transformation Platform as a technology platform that facilitates the derivation of actionable business intelligence from Hadoop by rendering it easier for analysts to visualize and engage with Hadoop-based data in conjunction with machine learning-based suggestions regarding data transformations and analytics. Trifacta’s partnership with Hortonworks builds upon recent news of its $25M Series C raise and the finalization of an analogous collaboration with Hadoop vendor Cloudera. In March, Trifacta announced a partnership with Cloudera that ensures the compatibility of Trifacta’s Data Transformation Platform with the Cloudera Hadoop ecosystem.
Now that Trifacta has inked deals to certify its Data Transformation Platform with the two Hadoop market share leaders, Cloudera and Hortonworks, the Big Data space should expect enterprise deployments of its platform to accelerate as Trifacta solidifies its branding as the de facto platform for the transformation, cleansing and guided exploration of Hadoop-based data. The platform’s value proposition consists in the reduction of time to insight with respect to actionable business intelligence derived from Hadoop-based data, its ability to enhance analyst productivity and to iteratively deliver more nuanced guidance regarding data transformations of interest by means of its machine learning-based technology. Expect Trifacta to continue expanding its range of strategic partnerships in the forthcoming months as it leverages its recent funding to position itself at the forefront of enterprise technologies regarding the effective operationalization of Big Data.
Trifacta, the data transformation company, today announced the finalization of $25M in Series C funding. The funding round was led by a new investor, Ignition Partners, with additional participation from existing investors Greylock Partners and Accel Partners. As a result of the investment, Frank Artale, Managing Director of Ignition Partners, will join the Trifacta board of directors. The Trifacta Data Transformation Platform enhances the productivity of data analysts and scientists by transforming Big Data into a structure that renders it easier to analyze, visualize and manipulate. The Trifacta platform’s predictive interaction technology allows users to visualize Big Data, interact with different data visualizations and take advantage of machine learning-based predictions regarding data transformations and analytics of interest. The platform aims to deliver transparency regarding the data in question, agility with respect to the user’s ability to interact with Big Data, predictive intelligence based on machine learning about the efficacy of user interactions with data, and scalability marked by the ability to interact with large, heterogeneous datasets. As told to Cloud Computing Today by Trifacta CEO Joe Hellerstein, Trifacta customers can take advantage of the platform’s ability to cleanse and organize Big Data in conjunction with other enterprise software platforms such as SAS. That said, Trifacta itself offers its own universe of tools for facilitating insights with respect to Big Data and, unlike many business intelligence or analytics platforms, is designed specifically for the transformation, discovery and visualization of massive datasets. Today’s announcement about Trifacta’s Series C funding comes hot on the heels of a March 2014 partnership with Cloudera to jointly deliver the Trifacta Data Transformation Platform in conjunction with Cloudera’s Hadoop distribution.
To date, Trifacta has raised a total of approximately $45M in funding. With partnerships such as the one with Cloudera in hand, and $25M in Series C funding arriving roughly six months after its $12M Series B raise, the industry should expect Trifacta’s traction amongst Big Data customers to skyrocket as word spreads about its ability to transform Big Data into a usable form that accelerates the development of actionable business intelligence.
Cloudera and MongoDB recently announced a strategic partnership designed to allow customers to take advantage of Cloudera’s Hadoop distribution and MongoDB’s NoSQL platform. Details of the partnership remain scant, although we do know that both companies are working on enhancing the current version of the MongoDB Connector for Hadoop, which is certified to run on Cloudera Enterprise 5. The MongoDB Connector for Hadoop “is a plugin for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.” In other words, the connector enables Hadoop users to write data out to MongoDB and, conversely, to read MongoDB data into a Hadoop environment. Cloudera’s Chief Strategy Officer Mike Olson commented on the partnership by noting:
Volume, variety and velocity all strain traditional operational databases, calling for a fundamental reconsideration of how companies store and process data. A Hadoop-powered enterprise data hub is an alternative center for data storage and analytics, and together with MongoDB, we empower companies to keep all of their data in full fidelity and at minimal cost, in order to power the data needs of all connected applications and IT infrastructure.
One direction for the partnership consists of the delivery of a turnkey Big Data solution with the analytic capabilities to mine both structured and unstructured data. From a product development standpoint, the obvious question concerns how much both vendors will invest in querying, analytic and predictive modeling capabilities that span both Hadoop and NoSQL. That said, the Big Data and cloud landscape has witnessed a proliferation of partnerships that lead to amalgamations of heterogeneous technology components within a larger institutional framework, but rarely result in genuine innovation and breakthrough technologies, as noted in “IBM’s Acquisition of Cloudant and The Walmart Effect In Tech.” All this is to say that the Cloudera-MongoDB partnership holds tremendous, even disruptive promise for the Big Data industry. Partnerships nevertheless represent a markedly prevalent fashion in contemporary tech, one based on the principles of collage and montage. They sometimes result in innovation and disruptive technology platforms, but all too often deliver varied combinations of elemental technologies that disappoint in proportion to the capital and human talent brought together by the collaboration in question. Cloudera’s Mike Olson will present further details regarding the partnership in his keynote address at MongoDB World in NYC on June 24.
In a stunning move that is likely to shape the Big Data space for years, Intel recently decided to partner with Cloudera and support its Hadoop distribution rather than continue enhancing Intel’s own. Cloudera will optimize its Hadoop distribution (CDH) to work with Intel’s hardware technology and Intel, conversely, will promote CDH as the Hadoop distribution of choice for enterprise Big Data analytics and the internet of things. Meanwhile, Intel will contribute insights from its own Hadoop distribution to CDH, and the resulting integration will be made available as part of Cloudera’s open source Hadoop initiatives. The partnership also featured an equity investment by Intel of between $740M and $760M that translates into an 18% ownership stake in Cloudera. The $740M invested by Intel, together with the $160M Cloudera raised in mid-March, brings Cloudera’s recent funding raises to roughly $900M. Intel will join Cloudera’s board of directors and become “Cloudera’s largest strategic shareholder.” According to its press release, Intel’s investment in Cloudera represents Intel’s “single largest data center technology investment in its history.” Intel’s strong presence in countries such as India and China, where Cloudera has thus far failed to gain traction, means that the partnership stands to dramatically expand Cloudera’s global market share. More importantly, however, Intel’s deep integration with the technologies in almost every datacenter worldwide renders it a formidable ally for Cloudera in its aspiration to become the leading Hadoop distribution in the world, in ways that promise to transform computing hardware as well as the Hadoop distributions that integrate with Intel’s Xeon technology.
Cloudera recently announced the general availability of Apache Spark for Cloudera Enterprise. First developed at UC Berkeley, Apache Spark is a parallel data processing framework that supplements Apache Hadoop by facilitating the development of big data applications related to machine learning, interactive analytics and real-time analytics. Spark allows users to write parallel code in Java, Scala and Python that runs on Hadoop clusters at speeds up to 100 times faster than MapReduce. Moreover, applications developed in Spark tend to require 2 to 10 times less code than a corresponding MapReduce application. Spark Streaming, an add-on to Spark, enables analytics to be run on streaming datasets such that developers can derive analytic insights within seconds of data ingestion. Cloudera will offer enterprise-grade support for Spark, in partnership with Databricks, the primary sponsor of the open source Apache Spark project, via Cloudera’s Data Hub Edition and Cloudera Enterprise Flex Edition. This release features support for Spark 0.9.0 with CDH 4. Support for Cloudera Enterprise 5, with CDH 5 and YARN, will be forthcoming in subsequent releases. Spark contributes to the Cloudera platform as illustrated by the highlighted blocks in orange below:
Image Source: “Apache Spark — Welcome To The CDH Family”
Digital music service Spotify recently announced that it will migrate its Hadoop cluster from Cloudera’s Hadoop distribution to the Hortonworks Data Platform because of Hortonworks’ commitment to open source development and technologies. Spotify also noted that the migration was partly due to the impressive contribution made by Hortonworks to the Apache Hive project for querying Hadoop data. Spotify began its use of Hadoop on the Amazon Web Services EMR platform with a cluster of approximately 30 nodes. The company subsequently decided to bring its Hadoop cluster in house, starting with a 60-node cluster. Spotify’s Hadoop cluster now comprises 690 nodes and stores data for its 24 million users and 6 million subscribers. The 690-node cluster is widely regarded as one of the largest implementations of Hadoop in Europe. In addition to providing Spotify with a production-grade Hadoop distribution, Hortonworks will perform bi-annual health assessments of its Hadoop infrastructure.