This week, Concurrent Inc. announced details of Lingual, a project designed to ease adoption of Apache Hadoop by letting SQL users apply their existing skills to create applications that run on Hadoop without training in MapReduce. Lingual presents developers with an ANSI-standard SQL interface that is compatible with all major Hadoop distributions. Using Lingual, developers can run SQL queries against data stored within Hadoop clusters. Moreover, developers and data scientists can use Lingual to export data directly into BI tools. Developers can also use Lingual to create new Hadoop-based applications using the platform’s JDBC interface or the Cascading APIs and the languages built on them, such as Scalding and Cascalog. Lingual runs on Cascading, Concurrent’s platform for simplifying Hadoop development for Java developers. Cascading allows developers to use Java and other JVM languages to create processes and applications within a Hadoop cluster without learning the intricacies of MapReduce. Lingual is a fitting extension of Cascading’s mission to facilitate the development of applications that run against Hadoop clusters: it broadens the developer skill set that can be put to work on Hadoop from Java to include SQL.
Concurrent Inc. has recently announced that enterprise customers such as Airbnb, Etsy and The Climate Corporation are using Concurrent’s Big Data application framework Cascading in combination with Amazon Elastic MapReduce to manage Big Data processing in Hadoop. Cascading allows developers to use an API to construct data processing and analytic operations on Apache Hadoop clusters without resorting to higher-level data-flow and query languages such as Pig and Hive. In comparison to Pig and Hive, Cascading enables programmers to write Hadoop-related code with comparable granularity and superior job orchestration and management capabilities. A Java application, Cascading can be used both within a private data center environment and within a cloud-based development ecosystem. Airbnb uses Cascading to “determine factors driving room bookings as well as user drop-off” whereas Etsy’s Cascading deployment “powers all A/B analysis, a variety of analytics and dashboards, behavioral inputs to our search index.”
Cascading’s use across a number of industry verticals for Apache Hadoop programming and analytics points to a quiet revolution in the Big Data world, marked by the growing popularity of programming frameworks that simplify and streamline the construction of data processing tasks within a Hadoop cluster. Speaking of the milestone constituted by Cascading’s usage by customers such as Etsy and Airbnb, Concurrent CEO Chris Wensel noted that Cascading “has been battle tested in rigorous production environments for many years. Developers rely on Cascading and the growing ecosystem of community sponsored projects to build complex data intensive applications that drive their business.” Expect more and more enterprises to leverage Cascading to simplify Hadoop programming both within cloud environments and traditional data center infrastructures as the demand for big data analytics intensifies in both scope and business urgency.
This week, Concurrent Inc. announced the release of Cascading 2.0, an application framework that streamlines the process of creating Hadoop applications for Java developers. An open source alternative to programming directly against MapReduce, the product provides an API and framework for constructing complex data processing tasks within a Hadoop cluster. Cascading provides an abstraction layer in which data captured from raw data sources is channeled into “pipes” that execute data analysis jobs and processes. In combination, data sources, “pipes” and data sinks are referred to as a “data flow.” Flows can converge into a “cascade” that can be managed and scheduled using Cascading 2.0’s scheduling system. Cascading 2.0 also allows developers to test applications away from the cluster by running them in memory on smaller data sets.
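The source, pipe, sink and cascade vocabulary above can be illustrated with a minimal in-memory sketch in plain Java. It does not use the Cascading API: the class `FlowSketch` and the methods `runFlow`, `tokenize`, `count` and `runCascade` are hypothetical names invented for this illustration, and ordinary lists stand in for the data sources and sinks that a real deployment would read from and write to a Hadoop cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Conceptual sketch only: NOT the Cascading API. All names are hypothetical.
class FlowSketch {

    // A "flow" in this sketch: read all records from a source, push them
    // through one pipe (a function over the record list), write to a sink.
    static <I, O> void runFlow(List<I> source,
                               Function<List<I>, List<O>> pipe,
                               List<O> sink) {
        sink.addAll(pipe.apply(source));
    }

    // Pipe 1: split each line into lower-case words, skipping empty tokens.
    static List<String> tokenize(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Pipe 2: group identical words and emit one "word<TAB>count" record
    // per word, in alphabetical order (TreeMap keeps keys sorted).
    static List<String> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> records = new ArrayList<>();
        counts.forEach((word, n) -> records.add(word + "\t" + n));
        return records;
    }

    // A "cascade" in this sketch: two flows run in dependency order, with
    // the first flow's sink serving as the second flow's source.
    static List<String> runCascade(List<String> lines) {
        List<String> words = new ArrayList<>();
        List<String> result = new ArrayList<>();
        runFlow(lines, FlowSketch::tokenize, words);
        runFlow(words, FlowSketch::count, result);
        return result;
    }

    public static void main(String[] args) {
        // Small in-memory data set, in the spirit of testing off-cluster.
        List<String> lines = Arrays.asList("big data", "Big pipes");
        runCascade(lines).forEach(System.out::println);
    }
}
```

Because everything runs in memory on a handful of records, the sketch also mirrors the idea of exercising an application on a small data set before moving it to a cluster.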
The Cascading 2.0 API also empowers developers to:
• Model and explore structured and unstructured data
• Transfer applications from development to production environments
• Use familiar Java and JVM languages to develop applications and processes within a Hadoop cluster without learning MapReduce
Cascading is licensed under version 2.0 of the Apache License. Concurrent’s CEO Chris Wensel elaborated on Cascading 2.0’s value proposition for organizations building Hadoop-based applications as follows:
Building applications on Hadoop, despite its growing adoption in the enterprise, is notoriously difficult. We are driving the future of application development and management on Hadoop, by allowing enterprises to quickly extract meaningful information from large amounts of distributed data and better understand the business implications. We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce.
As Wensel suggests, Cascading stands poised to play a pivotal role in the big data revolution by transforming the way in which developers create and manage Hadoop-based applications. With enterprises such as Etsy, Razorfish, Trulia and Twitter using Cascading for data analysis and discovery, Cascading has gained an early foothold in the market for software that streamlines Hadoop development. Expect enterprises to deploy software such as Cascading 2.0 as developers and data scientists gravitate toward simpler ways of managing data processing in a Hadoop cluster.