Today, Concurrent, Inc. announces the release of Cascading 3.0, the latest version of the popular open source framework for developing and managing Big Data applications. Widely recognized as the de facto framework for building Big Data applications on platforms such as Apache Hadoop, Cascading simplifies application development by providing an abstraction layer that handles the execution and orchestration of jobs and processes. Compatible with all major Hadoop distributions, Cascading streamlines the operationalization of Big Data applications in conjunction with Driven, a commercial product from Concurrent that provides visibility into application performance within a Hadoop cluster.
Today’s announcement extends Cascading to computational frameworks such as local in-memory execution, Apache Hadoop MapReduce and Apache Tez. Going forward, Concurrent plans for Cascading 3.0 to ship with support for Apache Spark, Apache Storm and other computational frameworks by means of its customizable query planner, which allows customers to extend Cascading to compatible computational fabrics.
The breakthrough in today’s announcement is that it renders Cascading extensible to a variety of computational frameworks and data fabrics, thereby expanding the range of use cases and environments in which Cascading can be used effectively. Moreover, the customizable query planner featured in this release allows customers to configure a Cascading deployment to operate with emerging technologies and data fabrics as they mature.
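The core idea behind a customizable query planner is that one logical description of a data flow can be translated onto different execution fabrics. The following is a minimal Python sketch of that pattern; all names are hypothetical and illustrative only, not Cascading's actual Java API.

```python
# Hypothetical sketch of a pluggable query planner: a single logical
# pipeline that different planners can execute on different fabrics.
# Names are illustrative only, not Cascading's API.

class Pipeline:
    """A logical description of a data flow, independent of any fabric."""
    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, pred):
        self.steps.append(("filter", pred))
        return self


class LocalPlanner:
    """Plans and runs the pipeline as local in-memory execution."""
    def run(self, pipeline, data):
        for kind, fn in pipeline.steps:
            if kind == "map":
                data = [fn(x) for x in data]
            elif kind == "filter":
                data = [x for x in data if fn(x)]
        return data


class TezPlanner:
    """Placeholder: a real planner would translate the same logical steps
    into a DAG for the target fabric (e.g. Apache Tez) instead."""
    def run(self, pipeline, data):
        raise NotImplementedError("would submit a DAG to the cluster here")


pipeline = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
result = LocalPlanner().run(pipeline, [1, 2, 3, 4])
print(result)  # [6, 8]
```

The point of the design is that the application author writes the `Pipeline` once; swapping `LocalPlanner` for a Tez- or Spark-oriented planner changes where the work runs without changing the application logic.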
Used by companies such as Twitter, eBay, Foursquare, Etsy and The Climate Corporation, Cascading sees over 150,000 applications a month, more than 7,000 deployments and 10% month-over-month growth in downloads. The release of Cascading 3.0 builds on Concurrent’s recent partnership with Hortonworks, whereby Cascading will be integrated into the Hortonworks Data Platform and Hortonworks will certify and support the delivery of Cascading in conjunction with its Hadoop distribution. Concurrent, Inc. also recently revealed details of a strategic partnership with Databricks, the principal steward behind the Apache Spark project, that allows it to “operate over Spark…[the] next generation Big Data processing engine that supports batch, interactive and streaming workloads at scale.” In an interview with Cloud Computing Today, Concurrent CEO Gary Nakamura confirmed that Concurrent plans to negotiate partnerships analogous to the Hortonworks agreement with other Hadoop distribution vendors in order to consolidate Cascading’s positioning as the framework of choice for the development of Big Data applications. Overall, the release of Cascading 3.0 represents a critical product enhancement that positions Cascading to operate over a broader range of computational frameworks and thereby assert its relevance for Big Data application development across a variety of environments. More importantly, the enhancements in Cascading 3.0, in conjunction with the Databricks partnership regarding Apache Spark, suggest that Cascading is well on its way to becoming the universal framework of choice for developing and managing applications in a Big Data environment, particularly given its compatibility with a wide range of Hadoop distributions and data and computational frameworks.
Not to be outdone by the slew of product and price announcements from Google, Amazon Web Services and Microsoft over the past week, EMC-VMware spinoff Pivotal announced a new product offering branded the Pivotal Big Data Suite on Wednesday. The platform delivers Pivotal Greenplum Database, Pivotal GemFire, Pivotal SQLFire, Pivotal GemFire XD and Pivotal HAWQ, in addition to unlimited use of Pivotal’s Hadoop distribution, Pivotal HD. Because the Pivotal Big Data Suite is priced on the basis of an annual subscription for all software and services, in addition to per-core pricing for computing resources, customers need not fear additional fees related to software licensing or customer support over and beyond the subscription price. Moreover, customers essentially have access to a commercial-grade Hadoop distribution for free as part of the subscription price. Pivotal compares the Big Data Suite to a “swiss army knife for Big Data” that enables customers to “use whatever tool is right for your problem, for the same price.” Customers have access to products such as Greenplum’s massively parallel processing (MPP) architecture-based data warehouse, GemFire XD’s in-memory distributed Big Data store for real-time analytics with a low-latency SQL interface and HAWQ’s SQL-querying ability for Hadoop. Taken together, the Pivotal Big Data Suite edges towards the realization of Pivotal One, an integrated solution that performs Big Data management and analytics for ecosystems of applications, real-time data feeds and devices that can serve the data needs of the internet of things, amongst other use cases. More importantly, the Pivotal Big Data Suite represents the most systematic attempt to productize Big Data solutions in the industry at large, even if it is composed of an assemblage of heterogeneous products under one roof.
The combination of a commercial-grade Hadoop distribution (Pivotal HD), a data warehouse designed to store petabytes of data (Pivotal Greenplum) and closed-loop, real-time analytics (Pivotal GemFire XD) within a unified product offering, available via an annual subscription with per-core pricing, constitutes an offer not easy to refuse for anyone seriously interested in exploring the capabilities of Big Data. The bottom line is that Pivotal continues to push the envelope with respect to Big Data technologies, although it now stands to face the challenge posed by cash-flush Cloudera, which recently finalized $900M in funding and a strategic and financial partnership with Intel.
Concurrent Inc., the primary sponsor behind Cascading, today announces the release of Driven, an application performance management solution for Big Data applications. Driven enables developers to quickly identify and remediate application failures and performance issues specific to applications built using Hadoop. Available as a plug-in for the Cascading infrastructure, Driven addresses a key gap in the Hadoop ecosystem: the management of Hadoop-based applications. Driven allows developers to confirm the successful execution of application jobs and data processing algorithms, in addition to facilitating the optimization of application performance. Developers can monitor and trend application metrics such as runtime parallelization for both operational and R&D purposes. Moreover, because Driven is part of the Java-based Cascading framework for building analytics and data management applications on Apache Hadoop, Driven users can take advantage of Cascading’s collaboration functionality to communicate with Driven communities all over the world.
Chris Wensel, founder and CTO, Concurrent, Inc., remarked on the significance of Driven as follows:
Driven is a powerful step forward in delivering on the full promise of connecting business with Big Data. Gone are the days when developers must dig through log files for clues to slow performance or failures of their data processing applications. The release of Driven further enables enterprise users to develop data oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come – this is only the beginning.
Here, Wensel notes the way in which Driven responds to the opacity of Hadoop by providing developers with an alternative to slogging through volumes of log files to understand the performance of their applications. Concurrent CEO Gary Nakamura elaborated on Wensel’s remarks by noting that “One of the big problems in Hadoop today is it’s just a black box,” and that Driven provides a way to expeditiously navigate to the lines of code responsible for an application failure. Because of its positioning as part of the Cascading infrastructure, Driven stands to significantly enhance the value of Cascading by providing developers with an extra layer of insight into application performance that complements Cascading’s native framework for Big Data analytics and data management. Expect Driven to vault the status of Cascading within the Big Data industry even further and ultimately confirm its place as the go-to application for Hadoop analytics, data and application management. Driven is currently available in public Beta, while its commercial variant, Driven Enterprise, will be available in Q2 via an annual subscription.
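The kind of visibility described above boils down to attaching telemetry to each step of a data flow, so that a failure or slowdown points at a named step rather than at a pile of log files. The following Python sketch illustrates that general idea; it is a hypothetical illustration of per-step instrumentation, not Driven's actual implementation or API.

```python
# Hypothetical sketch of per-step telemetry for a data pipeline, the kind
# of signal an APM tool like Driven surfaces. Illustrative only.
import time

def run_instrumented(steps, data):
    """Run named steps in order, recording duration and status per step."""
    metrics = []
    for name, fn in steps:
        start = time.perf_counter()
        try:
            data = fn(data)
        except Exception as exc:
            # Record which step failed before re-raising, so the failure
            # is attributed to a named step rather than buried in logs.
            metrics.append({"step": name, "status": f"failed: {exc}",
                            "seconds": time.perf_counter() - start})
            raise
        metrics.append({"step": name, "status": "ok",
                        "seconds": time.perf_counter() - start})
    return data, metrics

steps = [
    ("parse", lambda rows: [int(r) for r in rows]),
    ("total", lambda nums: sum(nums)),
]
result, metrics = run_instrumented(steps, ["1", "2", "3"])
print(result)                        # 6
print([m["step"] for m in metrics])  # ['parse', 'total']
```

A real APM layer would ship these metrics to a dashboard and trend them over time; the essential move is the same, attributing time and failures to named units of work.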
On Tuesday, Oracle announced plans to acquire Big Data player Endeca just weeks after unveiling its Big Data appliance featuring Apache Hadoop and an Oracle NoSQL Database. Endeca’s proprietary MDEX technology powers two products: (1) Endeca InFront; and (2) Endeca Latitude. Endeca InFront enables users to understand customer trends and histories by examining online pages viewed, search terms and conversion rates. Endeca Latitude delivers a business intelligence platform for running analytics on structured and unstructured data.
According to Endeca’s website, Endeca Latitude claims the following differentiators from traditional BI solutions:
• No Data Left Behind
Endeca Latitude incorporates a range of data sources, from unstructured data such as web searches and Facebook and Twitter feeds to traditional, relationally structured data.
• Consumer Ease of Use
Whereas traditional BI is based around reports and dashboards, Latitude’s analytics are delivered through interactive, web-based applications that provide a greater range of user drill-down and customization options.
• Agile Delivery
Endeca Latitude claims faster customization to enterprise requirements than its competitors Autonomy and Attivio. Moreover, Endeca Latitude allows for iterative refinement of its analytics through an interactive application that enables enhanced collaboration between technology and business stakeholders.
One of the distinctive features of Endeca’s MDEX analytics engine is its lack of a data schema. Instead of a predefined data model, MDEX leverages a Faceted Data Model in which the schema emerges and morphs in relation to the characteristics of the data. Endeca InFront will be integrated with the Oracle ATG commerce engine to deliver analytics that improve online customer experience and conversion rates. Endeca Latitude will take its place alongside Oracle’s suite of BI tools and its forthcoming Big Data Appliance to analyze massive amounts of structured and unstructured data.
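The notion of a schema that emerges from the data can be sketched concretely: facets and their values are derived from whatever attributes the records actually carry, so a record with a new attribute simply grows the model. The following Python sketch is a hypothetical illustration of that idea, not Endeca's MDEX engine.

```python
# Hypothetical sketch of a faceted data model: no predefined schema;
# facets are derived from the attributes present in the records.
# Illustrative only, not Endeca's MDEX implementation.
from collections import defaultdict

def build_facets(records):
    """Map each attribute seen in any record to the set of its values."""
    facets = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            facets[attribute].add(value)
    return dict(facets)

records = [
    {"type": "sedan", "color": "red"},
    {"type": "truck", "color": "red", "towing": "yes"},  # new attribute
]
facets = build_facets(records)
print(sorted(facets))           # ['color', 'towing', 'type']
print(sorted(facets["color"]))  # ['red']
```

Note that the `towing` facet appears only because one record carried it; nothing had to be declared up front, which is the contrast with a fixed relational schema.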
According to a June press release, Endeca “was used to power one of the largest eDiscovery clusters in the world exceeding 20 billion objects for interactive discovery – comparable in size to leading web search indexes of a few years ago.” The company counts roughly 600 customers, including IBM, IEEE, Toyota, Ford, Walmart, The Home Depot and the U.S. Department of Justice. Oracle did not disclose the terms of the acquisition of the Cambridge, Massachusetts-based company, although GigaOm reports that Endeca took in $65 million in venture capital over the course of four capital raises. The acquisition is expected to be completed by the end of 2011.