Cascading 2.0 Streamlines Hadoop-based Big Data Analysis And Development

This week, Concurrent Inc. announced the release of Cascading 2.0, an application framework that streamlines the process of creating Hadoop applications for Java developers. An open source alternative to writing MapReduce jobs by hand, the product provides an API and framework for constructing complex data processing tasks that run on a Hadoop cluster. Cascading offers an abstraction layer in which data read from raw data sources is channeled through “pipes” that perform data analysis and processing. Pipes, together with the data sources and data sinks they connect, make up a “flow.” Multiple flows can be combined into a “cascade,” which can be managed and scheduled using Cascading 2.0’s scheduling system. Cascading 2.0 also lets developers run applications without a Hadoop cluster by executing them in memory, which makes it easy to test them on smaller data sets.
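To make the source, pipe, and sink vocabulary concrete, here is a minimal, stdlib-only Java sketch. It is not Cascading’s API: `FlowSketch`, `Pipe`, `Flow`, `TOKENIZE`, and `COUNT` are hypothetical names that only model the idea of records moving from a source through an ordered chain of pipes into a sink, executed in memory on a small sample data set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical model of Cascading's vocabulary (NOT the real Cascading API):
 * a source, a chain of pipes, and a sink together form a flow that runs in memory.
 */
public class FlowSketch {

    // In this sketch a "pipe" is simply a function from records to records.
    interface Pipe extends Function<List<String>, List<String>> {}

    // A "flow" binds a source (input records), an ordered chain of pipes, and a sink.
    static final class Flow {
        private final List<String> source;
        private final List<Pipe> pipes;
        private final List<String> sink = new ArrayList<>();

        Flow(List<String> source, List<Pipe> pipes) {
            this.source = source;
            this.pipes = pipes;
        }

        // Apply every pipe in order, entirely in memory, and collect the result in the sink.
        List<String> complete() {
            List<String> records = source;
            for (Pipe pipe : pipes) {
                records = pipe.apply(records);
            }
            sink.clear();
            sink.addAll(records);
            return sink;
        }
    }

    // Example pipe: split each line into lowercase words, dropping empty tokens.
    static final Pipe TOKENIZE = lines -> lines.stream()
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());

    // Example pipe: count occurrences of each word, emitted as "word<TAB>count" in sorted order.
    static final Pipe COUNT = words -> {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        counts.forEach((word, n) -> out.add(word + "\t" + n));
        return out;
    };

    public static void main(String[] args) {
        // A tiny sample data set stands in for a large input, as in local testing.
        List<String> sample = List.of("Hadoop pipes", "pipes and flows");
        Flow wordCount = new Flow(sample, List.of(TOKENIZE, COUNT));
        wordCount.complete().forEach(System.out::println);
    }
}
```

In real Cascading code, sources and sinks are taps bound to a data scheme, and a flow connector plans the pipe assembly into jobs for the cluster; this sketch only mirrors the data path through a single flow.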

The Cascading 2.0 API also empowers developers to:

• Model and explore structured and unstructured data
• Transfer applications from development to production environments
• Use Java and other familiar JVM languages to develop applications and processes within a Hadoop cluster without learning MapReduce

Cascading is licensed under version 2.0 of the Apache License. Concurrent’s CEO Chris Wensel elaborated on Cascading 2.0’s value proposition for organizations building Hadoop-based applications as follows:

Building applications on Hadoop, despite its growing adoption in the enterprise, is notoriously difficult. We are driving the future of application development and management on Hadoop, by allowing enterprises to quickly extract meaningful information from large amounts of distributed data and better understand the business implications. We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce.

As Wensel suggests, Cascading is poised to play a pivotal role in the big data revolution by transforming the way developers create and manage Hadoop-based applications. With enterprises such as Etsy, Razorfish, Trulia and Twitter using Cascading for data analysis and discovery, Cascading has gained an early foothold in the market for software that streamlines Hadoop development. Expect more enterprises to deploy software such as Cascading 2.0 as developers and data scientists gravitate toward simpler ways of managing data processing in a Hadoop cluster.