Trifacta today announced that its Trifacta Data Transformation Platform has been certified for use with Hortonworks Data Platform (HDP) 2.1 through the Hortonworks Certified Technology Program. The certification ensures the compatibility of the Trifacta Data Transformation Platform with the latest Hortonworks Data Platform and thereby positions Trifacta’s technology to integrate with enterprise-grade deployments of the Hortonworks Hadoop distribution. Today’s announcement further validates the Trifacta Data Transformation Platform as a technology that helps derive actionable business intelligence from Hadoop: it makes it easier for analysts to visualize and engage with Hadoop-based data, aided by machine learning-based suggestions regarding data transformations and analytics. Trifacta’s partnership with Hortonworks builds upon recent news of its $25M Series C raise and the finalization of an analogous collaboration with Hadoop vendor Cloudera. In March, Trifacta announced a partnership with Cloudera that ensures the compatibility of Trifacta’s Data Transformation Platform with the Cloudera Hadoop ecosystem.
Now that Trifacta has inked deals to certify its Data Transformation Platform with the two Hadoop market share leaders, Cloudera and Hortonworks, the Big Data space should expect enterprise deployments of its platform to accelerate as Trifacta solidifies its branding as the de facto platform for the transformation, cleansing and guided exploration of Hadoop-based data. The platform’s value proposition rests on three things: reducing the time to insight for actionable business intelligence derived from Hadoop-based data, enhancing analyst productivity, and using its machine learning-based technology to iteratively deliver more nuanced guidance regarding data transformations of interest. Expect Trifacta to continue expanding its range of strategic partnerships in the forthcoming months as it leverages its recent funding to position itself at the forefront of enterprise technologies for the effective operationalization of Big Data.
On Thursday, Hortonworks announced the acquisition of XA Secure, a company that delivers security solutions for Hadoop. XA Secure provides an integrated suite of Hadoop security solutions that addresses security concerns involving Administration, Authentication, Authorization, Audit and Data Protection as illustrated below:
Source: Hortonworks Blog and InformationWeek Hortonworks’ Security Buy
To date, Hortonworks has worked on security issues related to access control in HDFS, DBA-compatible grant and revoke functionality for Apache Hive, and a single point of secure access to Hadoop clusters by means of Apache Knox. XA Secure provides an integrated security solution for Hadoop that centralizes security management, authenticates users, integrates access control policies, provides a comprehensive audit trail and protects data in motion and at rest. The XA Secure solution works with batch Hadoop workloads as well as real-time Hadoop workloads and interactive SQL. True to its open-source roots, Hortonworks plans to enhance XA Secure and then return the results to the open source community by incubating a new project within the Apache Software Foundation in the latter half of 2014. Hortonworks will make XA Secure available to Hortonworks customers under the branding HDP Security in late June 2014. The decision by Hortonworks to purchase XA Secure with the intent of open-sourcing the technology once it has matured represents an astute play to carve out a leadership position in the rapidly evolving Hadoop security space by delivering one of the first turnkey Hadoop security solutions available today, Cloudera’s Apache Sentry notwithstanding. In the meantime, Hortonworks customers stand to enjoy enhanced security functionality as XA Secure is progressively integrated into the Hortonworks Data Platform and receives a slew of enhancements on an aggressive product development timeline.
Today, Concurrent, Inc. announced the release of Cascading 3.0, the latest version of the popular open source framework for developing and managing Big Data applications. Widely recognized as the de facto framework for the development of Big Data applications on platforms such as Apache Hadoop, Cascading simplifies application development by means of an abstraction framework that facilitates the execution and orchestration of jobs and processes. Compatible with all major Hadoop distributions, Cascading sits squarely at the heart of the Big Data revolution by streamlining the operationalization of Big Data applications in conjunction with Driven, a commercial product from Concurrent that provides visibility into application performance within a Hadoop cluster.
Today’s announcement extends Cascading to platforms and computational frameworks such as local in-memory, Apache MapReduce and Apache Tez. Going forward, Concurrent plans for Cascading 3.0 to ship with support for Apache Spark, Apache Storm and other computational frameworks by means of its customizable query planner, which allows customers to extend the operation of Cascading to compatible computational fabrics.
The breakthrough in today’s announcement is that it makes Cascading extensible to a variety of computational frameworks and data fabrics and thereby expands the range of use cases and environments in which Cascading can be used. Moreover, the customizable query planner featured in today’s release allows customers to configure their Cascading deployment to operate in conjunction with emerging technologies and data fabrics as they appear.
Used by companies such as Twitter, eBay, Foursquare, Etsy and The Climate Corporation, Cascading boasts over 150,000 downloads a month, more than 7,000 deployments and 10% month over month growth in downloads. The release of Cascading 3.0 builds on Concurrent’s recent partnership with Hortonworks whereby Cascading will be integrated into the Hortonworks Data Platform and Hortonworks will certify and support the delivery of Cascading in conjunction with its Hadoop distribution. Concurrent, Inc. also recently revealed details of a strategic partnership with Databricks, the principal steward behind the Apache Spark project, that allows it to “operate over Spark…[the] next generation Big Data processing engine that supports batch, interactive and streaming workloads at scale.” In an interview with Cloud Computing Today, Concurrent CEO Gary Nakamura confirmed that Concurrent plans to negotiate partnerships analogous to the agreement with Hortonworks with other Hadoop distribution vendors in order to ensure that Cascading consolidates its positioning as the framework of choice for the development of Big Data applications. Overall, the release of Cascading 3.0 represents a critical product enhancement that positions Cascading to operate over a broader range of computational frameworks and consequently assert its relevance for Big Data application development across them. More importantly, the product enhancement in Cascading 3.0, in conjunction with the partnership with Databricks regarding Apache Spark, suggests that Cascading is well on its way to becoming the universal framework of choice for developing and managing applications in a Big Data environment, particularly given its compatibility with a wide range of Hadoop distributions and computational frameworks.
Concurrent and Hortonworks recently revealed a deepening of their strategic relationship whereby the Cascading SDK will now be integrated into the Hortonworks Data Platform. Moreover, Hortonworks will certify, deliver and support Cascading, the application framework for developing Hadoop-based applications. A Java-based, open source alternative to hand-written MapReduce, Cascading provides developers with a framework for constructing complex, repeatable data processing tasks within a Hadoop cluster. Cascading features an abstraction layer that uses plumbing metaphors such as taps, pipes, data flows, cascades and sinks to allow developers to design, visualize and execute jobs and processes on Hadoop-based data without having to master the intricacies of MapReduce. Forthcoming releases of Cascading will support Apache Tez, an initiative that builds on the addition of YARN to Hadoop and allows Hadoop-based data processing to “meet demands for fast response times and extreme throughput at petabyte scale.” The partnership between Concurrent, the developer of Cascading, and Hortonworks represents a huge coup for Concurrent given that the collaboration stands to rapidly accelerate Cascading’s adoption in enterprise environments. Hortonworks, meanwhile, benefits from packaging its Hadoop distribution with Cascading, one of the industry’s most respected frameworks for Big Data management and application development, which boasts enterprise users such as Twitter, LinkedIn, eBay and Nokia. The obvious question now is whether Concurrent will finalize similar partnerships with other Hadoop vendors such as Cloudera and MapR or whether Concurrent’s partnership with Hortonworks enables the latter to improve its positioning in the battle for Hadoop market share, particularly in light of Cloudera’s remarkable $900M capital raise and partnership with Intel.
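The plumbing metaphor described above can be made concrete with a toy sketch. The Python below is not Cascading's Java API; the names `ListTap`, `Pipe`, `each`, `group_count` and `run_flow` are hypothetical and exist only for illustration. It assumes a source tap that yields records, a pipe that chains operations, and a sink tap that receives the result, and uses them for a small word count.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, List, Optional


class ListTap:
    """Hypothetical tap: reads records from, or writes records to, a list."""

    def __init__(self, records: Optional[Iterable[Any]] = None) -> None:
        self.records: List[Any] = list(records or [])

    def read(self) -> Iterator[Any]:
        return iter(self.records)

    def write(self, rows: Iterable[Any]) -> None:
        self.records = list(rows)


class Pipe:
    """Hypothetical pipe: an ordered chain of steps applied to a record stream."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Iterable[Any]], Iterable[Any]]] = []

    def each(self, fn: Callable[[Any], Iterable[Any]]) -> "Pipe":
        # fn maps one input record to zero or more output records.
        self.steps.append(lambda rows: (out for row in rows for out in fn(row)))
        return self

    def group_count(self) -> "Pipe":
        # Group identical records and emit (record, count) pairs, sorted by record.
        self.steps.append(lambda rows: sorted(Counter(rows).items()))
        return self


def run_flow(source: ListTap, pipe: Pipe, sink: ListTap) -> ListTap:
    """Connect source -> pipe -> sink and run the steps in order."""
    rows: Iterable[Any] = source.read()
    for step in pipe.steps:
        rows = step(rows)
    sink.write(rows)
    return sink


if __name__ == "__main__":
    source = ListTap(["to be or", "not to be"])
    word_count = Pipe().each(str.split).group_count()
    sink = run_flow(source, word_count, ListTap())
    print(sink.records)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

The developer describes what flows where, and the framework decides how to execute it. In this sketch execution is a simple in-memory loop, whereas a real framework would plan the same description onto a cluster.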
On Tuesday, Hortonworks announced the general availability of version 2.0 of the Hortonworks Data Platform for Windows. Hortonworks Data Platform 2.0 for Windows is the first distribution of Apache Hadoop 2.0 certified for Windows Server 2008 R2 and Windows Server 2012. The announcement means that YARN (Yet Another Resource Negotiator), a key feature of Hadoop 2.0, is now available to Windows-based development environments. With HDP 2.0, developers in Windows shops can take advantage of YARN’s transformation of Hadoop from an infrastructure for batch processing into one for both batch and real-time data processing. Moreover, HDP 2.0 features NameNode High Availability functionality that automates failovers and ensures the availability of the full HDP stack. Hortonworks collaborated closely with Microsoft in order to ensure the HDP 2.0 release achieved production-grade status within Windows environments. The release of HDP 2.0 marks yet another milestone in the story of the democratization of Apache Hadoop, the Big Data platform that is being made available to ever wider circles of users by means of initiatives such as Stinger (Hortonworks), Lingual (Concurrent) and Impala (Cloudera) that allow users to access and manipulate data stored in a Hadoop cluster using SQL.
Digital music service Spotify recently announced that it will migrate its Hadoop cluster from Cloudera’s Hadoop distribution to the Hortonworks Data Platform because of Hortonworks’ commitment to open source development and technologies. Spotify also noted that the migration was partly due to the impressive contribution made by Hortonworks to the Apache Hive project for querying Hadoop data. Spotify began its use of Hadoop on the Amazon Web Services EMR platform with a cluster of approximately 30 nodes. The company subsequently decided to bring its Hadoop cluster in house, starting with a 60 node cluster. Spotify’s Hadoop cluster now comprises 690 nodes and stores data for its 24 million users and 6 million subscribers. The cluster is widely regarded as one of the largest implementations of Hadoop in Europe. In addition to providing Spotify with a production-grade Hadoop distribution, Hortonworks will perform bi-annual health assessments of its Hadoop infrastructure.