Concurrent and Hortonworks recently announced a deepening of their strategic relationship whereby the Cascading SDK will be integrated into the Hortonworks Data Platform. In addition, Hortonworks will certify, deliver and support Cascading, the application framework for developing Hadoop-based applications. An open source, Java-based alternative to writing MapReduce code directly, Cascading provides developers with a framework for constructing complex, repeatable data processing tasks within a Hadoop cluster. Cascading features an abstraction layer that uses plumbing metaphors such as taps, pipes, data flows, cascades and sinks to allow developers to design, visualize and execute jobs and processes on Hadoop-based data without having to master the intricacies of MapReduce. Forthcoming releases of Cascading will support Apache Tez, the next step after the addition of YARN to Hadoop, which aims to enable Hadoop-based applications to “meet demands for fast response times and extreme throughput at petabyte scale.” The partnership between Concurrent, the developer of Cascading, and Hortonworks represents a major coup for Concurrent, given that the collaboration stands to rapidly accelerate Cascading’s adoption in enterprise environments. Hortonworks, meanwhile, benefits from packaging its Hadoop distribution with Cascading, one of the industry’s most widely respected frameworks for Big Data management and application development, whose enterprise users include Twitter, LinkedIn, eBay and Nokia. The obvious question now is whether Concurrent will finalize similar partnerships with other Hadoop vendors such as Cloudera and MapR, or whether its partnership with Hortonworks will strengthen the latter’s position in the battle for Hadoop market share, particularly in light of Cloudera’s remarkable $900 million capital raise and partnership with Intel.
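To make the plumbing metaphor more concrete, here is a rough sketch in plain Java of a source tap, a chain of pipes and a sink. It is a hypothetical illustration only. The names `PipeMetaphorSketch`, `Pipe`, `sourceTap` and `runFlow` are invented for this example and are not Cascading’s API, and an in-memory list stands in for data stored on HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, plain-Java illustration of the tap -> pipe -> sink metaphor.
// None of these classes belong to Cascading's actual API.
class PipeMetaphorSketch {

    // A "pipe": a named transformation from one list of records to another.
    record Pipe(String name, Function<List<String>, List<String>> op) {}

    // Source "tap": where data enters the flow (an in-memory stand-in for HDFS files).
    static List<String> sourceTap() {
        return List.of("the quick fox", "the lazy dog");
    }

    // Pipe that splits each line into words.
    static final Pipe TOKENIZE = new Pipe("tokenize", lines -> {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            words.addAll(Arrays.asList(line.split("\\s+")));
        }
        return words;
    });

    // Pipe that drops a (hypothetical) stop word.
    static final Pipe FILTER = new Pipe("filter", words ->
        words.stream().filter(w -> !w.equals("the")).toList());

    // Pushes records through each pipe in order, then groups and counts them into the sink.
    static Map<String, Integer> runFlow(List<String> input, List<Pipe> assembly) {
        List<String> records = input;
        for (Pipe pipe : assembly) {
            records = pipe.op().apply(records);
        }
        // Sink "tap": grouped word counts, sorted by key for deterministic output.
        Map<String, Integer> sink = new TreeMap<>();
        for (String r : records) {
            sink.merge(r, 1, Integer::sum);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(runFlow(sourceTap(), List.of(TOKENIZE, FILTER)));
    }
}
```

In Cascading itself, a planner translates the pipe assembly into one or more MapReduce jobs. This toy version simply applies each step in memory, which is enough to show how data moves from tap to pipes to sink.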
Concurrent Inc., the primary sponsor behind Cascading, today announced the release of Driven, an application performance management solution for Big Data applications. Driven enables developers to quickly identify and remediate failures and performance issues in applications built on Hadoop. Available as a plug-in for the Cascading framework, Driven addresses a key problem in the Hadoop industry: the management of Hadoop-based applications. With Driven, developers can confirm the successful execution of application jobs and data processing algorithms and optimize application performance. Developers can also monitor application metrics such as runtime parallelization and track their trends over time, for both operational and R&D purposes. Moreover, because Driven is part of the Java-based Cascading framework for building analytics and data management applications on Apache Hadoop, Driven users can take advantage of Cascading’s collaboration functionality to communicate with Driven communities around the world.
Chris Wensel, founder and CTO, Concurrent, Inc., remarked on the significance of Driven as follows:
Driven is a powerful step forward in delivering on the full promise of connecting business with Big Data. Gone are the days when developers must dig through log files for clues to slow performance or failures of their data processing applications. The release of Driven further enables enterprise users to develop data oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come – this is only the beginning.
Here, Wensel describes how Driven responds to the opacity of Hadoop. Instead of slogging through volumes of log files to understand how their applications perform, developers get a direct view of that performance. Concurrent CEO Gary Nakamura elaborated on Wensel’s remarks by noting that “One of the big problems in Hadoop today is it’s just a black box,” and that Driven provides a way to quickly navigate to the lines of code responsible for an application failure. Because it is part of the Cascading framework, Driven stands to significantly enhance the value of Cascading. It gives developers an extra layer of insight into application performance that complements Cascading’s native framework for Big Data analytics and data management. Expect Driven to raise the status of Cascading within the Big Data industry even further and strengthen its claim to be the go-to framework for Hadoop analytics, data and application management. Driven is currently available in public beta, whereas its commercial variant, Driven Enterprise, will be available in Q2 via an annual subscription.
Concurrent Announces Release Of Cascading 2.5 and Lingual 1.0 To Simplify Application Development Using Hadoop
Today, Concurrent announced the release of Cascading 2.5, the open source framework for facilitating the development of applications on Apache Hadoop. Cascading 2.5 supports the recently released Hadoop 2.0, including YARN and its other features. Cascading users who are interested in upgrading to Hadoop 2.0 can do so by means of Cascading 2.5. Similarly, applications written in the Scalding, Cascalog and PyCascading languages can migrate to Hadoop 2.0 through the Cascading 2.5 framework. The latest release of Cascading also features “complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS,” according to Concurrent’s press release. Finally, the release deepens Cascading’s compatibility with other Hadoop distributions and Hadoop-as-a-Service vendors such as Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR.
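One way to picture the join support mentioned in the press release is a classic in-memory hash join. The plain-Java sketch below is hypothetical. It assumes the smaller input (users) fits in memory and has unique keys, and the record and method names `User`, `Order`, `Joined` and `innerJoin` are invented for the example. Cascading provides joins through its own pipe types, such as `CoGroup` and `HashJoin`, whose APIs are not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of an inner hash join; not Cascading code.
class HashJoinSketch {

    record User(int id, String country) {}
    record Order(int userId, double amount) {}
    record Joined(int userId, String country, double amount) {}

    static List<Joined> innerJoin(List<User> users, List<Order> orders) {
        // Build phase: index the smaller side by join key.
        // Assumes unique user ids; a duplicate id would overwrite the earlier entry.
        Map<Integer, User> byId = new HashMap<>();
        for (User u : users) {
            byId.put(u.id(), u);
        }
        // Probe phase: stream the larger side and emit a row for each matching key.
        List<Joined> out = new ArrayList<>();
        for (Order o : orders) {
            User u = byId.get(o.userId());
            if (u != null) {            // inner join: unmatched orders are dropped
                out.add(new Joined(o.userId(), u.country(), o.amount()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, "US"), new User(2, "FI"));
        List<Order> orders = List.of(new Order(1, 9.5), new Order(3, 4.0), new Order(2, 12.0));
        innerJoin(users, orders).forEach(System.out::println);
    }
}
```

Keeping only the smaller side in memory is what makes this kind of join cheap. When both inputs are large, a grouping-based join that sorts and shuffles both sides is the usual alternative.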
Cascading 2.5 is one of the few products, commercial or open source, that simplify the development of Hadoop applications while integrating with a rich and varied ecosystem of products, as illustrated below:
The graphic shows how Cascading 2.5 supports all major Hadoop distributions in addition to an impressive list of development languages, database platforms and cloud platforms. In an interview with Cloud Computing Today, Concurrent CEO Gary Nakamura and CTO Chris Wensel noted the uniqueness of Cascading in the Big Data landscape, particularly given its iterative refinement in collaboration with the likes of Twitter, eBay and The Climate Corporation over a period of more than five years.
Today’s announcement of the general availability of Cascading 2.5 is accompanied by news of the general availability of Lingual, an ANSI-compliant SQL interface that allows developers to use SQL to query data stored in Hadoop clusters. Unlike Apache Hive, whose query language is only SQL-like, Lingual’s ANSI-standard interface lets developers run standard SQL. Cascading Lingual also supports three further capabilities:

- migrating legacy SQL workloads onto Hadoop clusters;
- exporting Hadoop data into BI tools such as Jaspersoft, Pentaho and Talend;
- combining Cascading with SQL to orchestrate multiple SQL queries as one coordinated workflow rather than as several discrete, disparate queries.

The Big Data space should expect more from Concurrent as it continues to build out tools for simplifying application development on Hadoop, particularly as more and more Hadoop developers come to appreciate Cascading’s advantages over MapReduce.
Today, Concurrent Inc. announced the release of Pattern, an open source tool designed to enable developers to build machine-learning applications on Hadoop by leveraging the Predictive Model Markup Language (PMML), the standard export format for popular predictive modeling tools such as R, MicroStrategy and SAS. Data scientists can use Pattern to deploy models built in these tools to Hadoop clusters and run them against massive data sets. Pattern simplifies the process of building predictive models that operate on Hadoop clusters and lowers the barrier to adopting Apache Hadoop for advanced data mining and modeling use cases.
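Because PMML describes a trained model declaratively, scoring a record amounts to reading the model’s parameters and applying them. The sketch below does not use Pattern’s API. It parses a minimal, hypothetical PMML `RegressionModel` fragment with the JDK’s built-in DOM parser and computes the intercept plus the sum of coefficient × value^exponent over the numeric predictors. The class name, field names and coefficients are invented for illustration. Real PMML exports also carry a header, a data dictionary and a mining schema, which this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch of scoring a PMML linear regression model; not Pattern's API.
class PmmlRegressionSketch {

    // Minimal, made-up PMML fragment with two numeric predictors.
    static final String PMML = """
        <PMML version="4.1">
          <RegressionModel functionName="regression">
            <RegressionTable intercept="1.5">
              <NumericPredictor name="visits" exponent="1" coefficient="0.2"/>
              <NumericPredictor name="spend" exponent="1" coefficient="0.01"/>
            </RegressionTable>
          </RegressionModel>
        </PMML>
        """;

    // score = intercept + sum(coefficient * value^exponent) over all NumericPredictors.
    // Assumes every predictor name is present in the row (otherwise unboxing throws).
    static double score(String pmml, Map<String, Double> row) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse PMML", e);
        }
        Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
        double result = Double.parseDouble(table.getAttribute("intercept"));
        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            double x = row.get(p.getAttribute("name"));
            String exp = p.getAttribute("exponent");   // empty string when absent; PMML default is 1
            int exponent = exp.isEmpty() ? 1 : Integer.parseInt(exp);
            result += Double.parseDouble(p.getAttribute("coefficient")) * Math.pow(x, exponent);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(score(PMML, Map.of("visits", 10.0, "spend", 50.0)));
    }
}
```

For the record `visits=10, spend=50`, this model yields 1.5 + 0.2 × 10 + 0.01 × 50 = 4.0. On a cluster, the same per-record computation would run in parallel across the whole data set.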
One use case for Pattern is evaluating the efficacy of models for a “predictive marketing intelligence solution,” as described below by Antony Arokiasamy, Senior Software Architect at AgilOne:
Pattern facilitates AgilOne to deploy a variety of advanced machine-learning algorithms for our cloud-based predictive marketing intelligence solution. As a self-service SaaS offering, Pattern allows us to evaluate multiple models and push the clients’ best models into our high performance scoring system. The PMML interface allows our advanced clients to deploy custom models.
Here, Arokiasamy describes how Pattern scores multiple predictive models so that the best-performing model can be selected. AgilOne uses Pattern to run multiple predictive models in parallel against large data sets. Its deployment also shows that Pattern works effectively on a Hadoop cluster running in a cloud-based environment.
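Choosing the best of several candidate models, as AgilOne describes, typically means scoring each candidate against held-out data and keeping the one with the lowest error. The sketch below assumes root-mean-square error (RMSE) as the selection metric and uses three hypothetical one-variable models. It is not AgilOne’s or Pattern’s actual selection logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of picking the best model by holdout RMSE.
class ModelSelectionSketch {

    // Root-mean-square error of a model over holdout pairs (xs[i], ys[i]).
    static double rmse(DoubleUnaryOperator model, double[] xs, double[] ys) {
        double sumSq = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double err = model.applyAsDouble(xs[i]) - ys[i];
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / xs.length);
    }

    // Returns the name of the candidate with the lowest holdout RMSE.
    static String selectBest(Map<String, DoubleUnaryOperator> candidates, double[] xs, double[] ys) {
        String best = null;
        double bestErr = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, DoubleUnaryOperator> e : candidates.entrySet()) {
            double err = rmse(e.getValue(), xs, ys);
            if (err < bestErr) {
                bestErr = err;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {2.1, 3.9, 6.2, 7.8};        // roughly y = 2x
        Map<String, DoubleUnaryOperator> candidates = new LinkedHashMap<>();
        candidates.put("constant", x -> 5.0);
        candidates.put("linear", x -> 2.0 * x);
        candidates.put("quadratic", x -> x * x);
        System.out.println(selectBest(candidates, xs, ys));
    }
}
```

With holdout data that roughly follows y = 2x, the “linear” candidate has the lowest RMSE (about 0.16) and is selected. In a production setting, each candidate would be scored against far larger holdout sets, which is where running the evaluation on Hadoop pays off.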
Pattern runs on the popular Cascading framework for simplifying the development and management of Hadoop applications, which is used by the likes of Twitter, eBay, Etsy and Razorfish. A free, open source application, Pattern is yet another pillar in Concurrent’s array of tools for streamlining the use of Apache Hadoop. It sits alongside Cascading and Lingual, the ANSI-standard SQL interface that enables developers to query Hadoop clusters with SQL without having to learn MapReduce. The release of Pattern consolidates Concurrent’s position as a pioneer in the Big Data management space, given its thought leadership in designing applications that facilitate enterprise adoption of Hadoop. Enterprises can now use Concurrent’s Cascading framework to operate on Hadoop clusters using Java APIs, SQL, and predictive models exported as PMML from compatible analytics applications.
Today, Concurrent Inc. announced the close of $4 million in Series A funding led by True Ventures and Rembrandt Venture Partners. The investment is intended to accelerate product development and expand the core team as part of the company’s broader mission of simplifying application development on Hadoop. In conjunction with the funding news, Concurrent also announced the appointment of Gary Nakamura as CEO. Nakamura joins Concurrent after an illustrious tenure at Terracotta, where he served as Senior Vice President and General Manager and as VP of World Wide Sales & Field Operations. Chris Wensel, Concurrent’s founder and former CEO, will assume the role of CTO. The $4 million Series A round builds on an initial seed investment of $900,000 in August 2011, which was also financed by True Ventures and Rembrandt Venture Partners. The Series A funding points to the success of Concurrent’s Cascading 2.1 platform for simplifying application development and management on Hadoop clusters.
Cascading delivers a framework that empowers developers to use Java and other JVM-based languages to build applications that run on Hadoop without writing MapReduce code directly. Cascading is used by the likes of Twitter, eBay and The Climate Corporation. It works alongside Lingual, Concurrent’s SQL interface for Hadoop, as part of a concerted initiative to democratize developer access to Hadoop. In an interview with Cloud Computing Today, CEO Gary Nakamura noted that Concurrent intends to build on its initial momentum by delivering platforms that simplify and streamline application development on Hadoop. It does not plan to release its own Hadoop distribution in the vein of Intel, EMC and others.
Concurrent already boasts partnerships with the likes of Amazon Web Services and Microsoft Azure for application development and management within Hadoop infrastructures. Its Cascading framework is compatible with all Apache Hadoop distributions and claims more than 75,000 downloads per month. Concurrent has achieved notable results with modest funding to date. Its new funding and CEO Gary Nakamura’s deep experience with enterprise software therefore make it likely that the company will expand its footprint in Hadoop application development. As Hadoop distributions proliferate, expect demand for simplified Hadoop development and management products to skyrocket within the enterprise. Enterprise concerns about data security and consistent application lifecycle management are also likely to fuel demand for Hadoop management platforms, particularly given the increasing convergence between Big Data and cloud-based infrastructures.
This week, Concurrent Inc. announced details of Lingual, a project designed to facilitate adoption of Apache Hadoop. Lingual lets SQL users apply their existing skills to create applications that run on Hadoop without training in MapReduce. It presents developers with an ANSI-standard SQL interface that is compatible with all major Hadoop distributions. Using Lingual, developers can run SQL against data stored within Hadoop clusters. Moreover, developers and data scientists can use Lingual to export data directly into BI tools. Developers can also use Lingual to create new Hadoop-based applications through the platform’s JDBC interface or through Cascading APIs and languages such as Scalding and Cascalog. Lingual runs on Concurrent’s Cascading platform, which simplifies Hadoop development for Java developers by letting them create processes and applications within a Hadoop cluster without learning the intricacies of MapReduce. Lingual is a fitting extension of Cascading’s mission to facilitate the development of applications that run against Hadoop clusters, broadening the supported developer skill set beyond Java to include SQL.
Etsy, Airbnb And The Climate Corporation Use Concurrent’s Cascading Big Data Application For Hadoop Programming
Concurrent Inc. recently announced that enterprise customers such as Airbnb, Etsy and The Climate Corporation are using Concurrent’s Big Data application framework Cascading in combination with Amazon Elastic MapReduce to manage Big Data processing in Hadoop. Cascading allows developers to use an API to construct data processing and analytic operations on Apache Hadoop clusters without relying on higher-level languages such as Pig and Hive. Compared with Pig and Hive, Cascading lets programmers write Hadoop code with comparable granularity and superior job orchestration and management capabilities. Because Cascading is a Java framework, it can be used both in a private data center and in a cloud-based development ecosystem. Airbnb uses Cascading to “determine factors driving room bookings as well as user drop-off,” whereas Etsy’s Cascading deployment “powers all A/B analysis, a variety of analytics and dashboards, behavioral inputs to our search index.”
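Analyses such as Etsy’s A/B testing largely reduce to grouping events by a key and aggregating each group. The plain-Java sketch below computes a conversion rate per test variant from hypothetical `Visit` events. The class, record and method names are invented for illustration. A Cascading application would typically express the same step with a `GroupBy` pipe followed by aggregators and run it across the cluster rather than in memory. This sketch does not use those APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical group-and-aggregate sketch for A/B conversion rates; not Cascading code.
class AbConversionSketch {

    // One event: which variant a visitor saw and whether the visit converted.
    record Visit(String variant, boolean converted) {}

    // Groups visits by variant and returns conversions / visits for each group.
    static Map<String, Double> conversionRates(List<Visit> visits) {
        Map<String, int[]> counts = new TreeMap<>();   // variant -> {visits, conversions}
        for (Visit v : visits) {
            int[] c = counts.computeIfAbsent(v.variant(), k -> new int[2]);
            c[0]++;
            if (v.converted()) {
                c[1]++;
            }
        }
        Map<String, Double> rates = new TreeMap<>();
        counts.forEach((variant, c) -> rates.put(variant, (double) c[1] / c[0]));
        return rates;
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
            new Visit("A", true), new Visit("A", false), new Visit("A", false), new Visit("A", true),
            new Visit("B", true), new Visit("B", true), new Visit("B", false), new Visit("B", true));
        System.out.println(conversionRates(visits));
    }
}
```

With the sample events in `main`, variant A converts 2 of 4 visits (0.5) and variant B converts 3 of 4 (0.75).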
Cascading’s use across a number of industry verticals for Apache Hadoop programming and analytics points to a quiet revolution in the Big Data world. That revolution is marked by the growing currency of programming frameworks that simplify and streamline the construction of data processing tasks within a Hadoop cluster. Speaking of the milestone represented by Cascading’s adoption at customers such as Etsy and Airbnb, Concurrent CEO Chris Wensel noted that Cascading “has been battle tested in rigorous production environments for many years. Developers rely on Cascading and the growing ecosystem of community sponsored projects to build complex data intensive applications that drive their business.” Expect more and more enterprises to leverage Cascading to simplify Hadoop programming in both cloud environments and traditional data center infrastructures as the demand for Big Data analytics intensifies in both scope and business urgency.