HPCC Announces Availability of ETL Cluster on Amazon Web Services

HPCC (High Performance Computing Cluster) Systems from LexisNexis announced that its Thor Data Refinery Cluster, a platform designed for big data processing, is available to enterprises in the cloud via Amazon Web Services. Availability on AWS makes it easier for developers to evaluate HPCC as an alternative to Hadoop and legacy systems. The Thor Data Refinery Cluster can load, transform, link and index massive amounts of data by leveraging HPCC’s parallel processing capability dispersed across multiple nodes. Developers may elect to use Thor on AWS for proof-of-concept or trial purposes, enjoying the convenience of harnessing the extract, transform and load component of the HPCC supercomputer without having to deploy any hardware.

HPCC is an open source Big Data technology platform that specializes in the processing of massive amounts of structured and unstructured data. Branded as a Hadoop alternative, HPCC’s technology platform is composed of (1) Thor Data Refinery Cluster, its extract, transform and load component; (2) Roxie, its query engine; and (3) ECL, the Enterprise Control Language responsible for the manipulation of Big Data across both the Thor and Roxie clusters. Only the Thor Data Refinery Cluster is available on AWS at present.
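To make the Thor role concrete, here is a minimal, hypothetical sketch of the extract-transform-load pattern that a Thor-style cluster parallelizes. Real HPCC jobs are written in ECL and run across physical cluster nodes; the worker pool, `transform` function and record fields below are illustrative stand-ins, not HPCC APIs.

```python
# Hypothetical ETL sketch: partition records across workers for the
# transform step, loosely mimicking how an ETL cluster spreads work
# across nodes. Not actual HPCC/ECL code.
from multiprocessing import Pool

def transform(record):
    """Per-record transform step, e.g. normalizing a name field."""
    return {"id": record["id"], "name": record["name"].strip().title()}

def run_etl(records, workers=4):
    # Each worker process handles a slice of the input, so the
    # transform step scales with the number of workers (nodes).
    with Pool(workers) as pool:
        return pool.map(transform, records)

if __name__ == "__main__":
    raw = [{"id": 1, "name": "  ada lovelace "}, {"id": 2, "name": "ALAN TURING"}]
    print(run_etl(raw))
```

In a real Thor deployment the partitioning, linking and indexing steps are likewise distributed, which is what makes the cluster suitable for ETL at massive scale.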

HP Delivers Integrated Big Data Product To Compete With Oracle and Microsoft Big Data Appliances

At HP Discover in Vienna, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. HP acquired Vertica in February and Autonomy in August. Vertica features a data warehousing and analytics platform known as the Vertica Analytics Platform that specializes in the high speed analysis of large-scale structured data sets. The Vertica Analytics Platform boasts real-time loading and querying that minimizes the time-lag between data loading and the delivery of business intelligence insights. Moreover, the Vertica Analytics Platform features analytic optimization tools that deliver maximum performance while minimizing the need for manual adjustments from users. Vertica also claims bi-directional connectors to Hadoop and Pig for the purpose of managing “big data” in structured form.

HP’s Autonomy acquisition complements Vertica by providing a platform for the processing of unstructured data such as video, audio, social media, email and web-related content and search results. Autonomy IDOL 10 features the following attributes:

• Autonomy’s capabilities for processing unstructured data
• Vertica’s ability to rapidly process large-scale structured data sets
• A NoSQL interface for loading and analyzing structured and unstructured data
• Solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals

HP’s Autonomy IDOL 10 competes with its own more specialized Vertica and Autonomy products, in addition to Oracle’s Hadoop and NoSQL Big Data platform and Microsoft’s forthcoming Hadoop-based Big Data appliance. Hadoop represents the common thread among all three Big Data products, even as non-Hadoop-based alternatives such as HPCC from LexisNexis gained publicity this week with the announcement of the availability of its ETL platform on the Amazon Web Services EC2 infrastructure. Autonomy IDOL 10 is available worldwide as of December 1, 2011.

Karmasphere Partners To Offer Big Data Analytics on Apache Hadoop-based Hortonworks Data Platform

Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program today. Karmasphere’s partnership with Hortonworks is set to further stoke the embers of the emerging battle between Cloudera and Hortonworks for market share in the Hadoop distribution space. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform. As a result of the collaboration, Karmasphere will receive technical support, training and certification on Apache Hadoop deployment. The Hortonworks agreement represents Karmasphere’s second major business partnership announcement this month. On November 1, the company announced a relationship with Amazon Web Services whereby its Big Data analytics would be available through the Amazon Elastic MapReduce service. The Elastic MapReduce deployment allows enterprises to investigate a pay-as-you-go solution for Big Data intelligence without a long-term subscription or an initial capital investment in technical infrastructure.

Cloudera Raises $40 Million in Series D Funding; Partners with NetApp for Hadoop Distribution

On Monday, Cloudera continued its aggressive efforts to expand its distribution channels by announcing a partnership with NetApp, the storage and data management vendor. The partnership centers on the release of the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture. NetApp Open Solution for Hadoop is intended to grant enterprises seeking to implement Hadoop enhanced ease of deployment, improved scalability, superior performance and reduced costs. Part of the cost reduction derives from NetApp’s state-of-the-art backup and replication capabilities, which minimize downtime in the event of disk failure.

Speaking of the NetApp Open Solution, Rich Clifton, senior vice president and general manager, NetApp Technology Enablement and Solutions Organization, remarked:

“Customers are looking for business advantages from the wealth of their unstructured data. Today, it’s like finding a needle in a haystack. NetApp Open Solution for Hadoop will help customers get answers fast and process more, as well as provide the reliability and performance that our customers have come to expect from NetApp.”

GigaOm reports that one of the key attributes of the NetApp Open Solution is its separation of the compute and storage layers of a Cloudera Hadoop installation. Separating the two layers enables enhanced performance, improved scalability and reduced downtime in the event of a disk failure within either layer. Cloudera’s partnership with NetApp comes roughly three weeks after its announcement of a reseller deal with SGI, whereby SGI will distribute Cloudera’s Distribution including Apache Hadoop (CDH) alongside its Rackable servers and provide level 1 technical support, while Cloudera provides level 2 and level 3 technical support. Both deals seek to consolidate Cloudera’s market share within the Hadoop distribution space in relation to competitors such as MapR, its partner EMC and Yahoo spinoff Hortonworks.

In conjunction with the NetApp deal, Cloudera announced that it had successfully completed a $40 million Series D funding round led by Frank Artale of Ignition Partners, with the support of existing investors Accel Partners, Greylock Partners, Meritech Capital Partners and In-Q-Tel. The latest round brings Cloudera’s total financing to $76 million and supports the company’s explosive growth in the Big Data space, with particular emphasis on marketing, sales operations and strategic business development. Cloudera brands itself as the first company to deliver enterprise-grade deployments of Apache Hadoop, the disruptive technology framework for analyzing massive amounts of structured and unstructured data. Apache Hadoop is used by enterprises such as eBay, Yahoo, Facebook, LinkedIn, eHarmony and Twitter to make strategic business decisions on the basis of large-scale structured and unstructured data sets. Cloudera’s announcement of its Series D funding and partnership with NetApp came on the eve of Hadoop World 2011, the world’s largest conference of Hadoop practitioners.

Cloudera Founder Launches Odiago and Big Data Product WibiData

This week, Christophe Bisciglia, founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago whose flagship Big Data product is named WibiData. Bisciglia launched WibiData with the backing of Google Chairman Eric Schmidt, Cloudera CEO Mike Olson, and SV Angel, the Silicon Valley-based angel fund. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices. WibiData leverages an HBase and Hadoop technology platform that features the following attributes:

• All data specific to a single user, machine or mobile device is organized within one HBase row.
• “Produce,” an analytic operator that functions on individual rows. Produce maps data from individual rows into interactive user applications, and also performs analytic operations such as classification and weighting of different rows in conjunction with an analytic rules engine.
• “Gather,” an analytic operator that operates across all rows combined.

WibiData’s “Produce” and “Gather” components operate within a single-table database structure whose schema can evolve dynamically over time. Whereas most relational databases hold a single value in a cell, WibiData’s non-relational structure allows an entire table to be stored within a cell. That said, WibiData offers fewer data manipulation capabilities than SQL for retrieving, updating, inserting and deleting data. Curt Monash provides a terrific technical overview of WibiData in his blog DBMS2. For more about the company’s founders, see TechCrunch.
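The entity-centric model described above can be sketched in a few lines of Python. This is an illustrative stand-in, not WibiData’s or HBase’s actual API: the row layout, column names and operator signatures are assumptions made for the example.

```python
# Hypothetical sketch of an entity-centric row model with per-row
# ("produce") and cross-row ("gather") operators. Names are illustrative.
from collections import defaultdict

# One "row" per user holding all of that user's data, analogous to a
# single HBase row keyed by user ID.
rows = {
    "user-1": {"page_views": 12, "clicks": 3},
    "user-2": {"page_views": 40, "clicks": 1},
}

def produce(row):
    """Per-row operator: derive a new value from one user's data."""
    return {"click_through_rate": row["clicks"] / row["page_views"]}

def gather(all_rows):
    """Cross-row operator: aggregate a statistic over every user."""
    totals = defaultdict(int)
    for row in all_rows.values():
        for column, value in row.items():
            totals[column] += value
    return dict(totals)

per_user = {key: produce(row) for key, row in rows.items()}
site_totals = gather(rows)
```

Because every fact about a user lives in one row, per-user computations like `produce` never need a join, while `gather` handles the aggregations that do span users.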

Informatica Releases World’s First Hadoop Parser

Informatica released the world’s first Hadoop parser on Wednesday in a move that boldly signaled its entry into the hotly contested Big Data analytics space. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation. HParser enables the transformation of textual data, Facebook and Twitter feeds, web logs, emails, log files and digital interactive media into a structured or semi-structured schema, allowing businesses to more effectively mine the data for actionable business intelligence.

Key features of HParser include the following:

• A visual, integrated development environment (IDE) that streamlines development via a graphical interface.
• Support for a wide range of data formats including XML, JSON, HL7, HIPAA, ASN.1 and market data.
• Ability to parse proprietary machine generated log files.
• Use of the parallelism of MapReduce to optimize parsing performance across massive structured and unstructured data sets.

Informatica’s HParser is available in both a free and a commercial edition. The free, community edition can parse log files, Omniture Web analytics data, XML and JSON. The commercial edition additionally supports HL7, HIPAA, SWIFT, X12, NACHA, ASN.1, Bloomberg, PDF, XLS and Microsoft Word formats. Informatica’s HParser builds upon the company’s June 2011 deployment of Informatica 9.1 for Big Data, which featured “connectivity to big transaction data from traditional transaction databases, such as Oracle and IBM DB2, to the latest optimized for purpose analytic databases, such as EMC Greenplum, Teradata, Teradata Aster Data, HP Vertica and IBM Netezza,” in addition to Hadoop.
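The kind of transformation a Hadoop parser performs can be illustrated with a short sketch: a map-side function that turns raw, semi-structured web-log lines into structured records that downstream jobs can analyze. This is not HParser’s actual API; the regular expression, field names and error handling are assumptions made for the example.

```python
# Illustrative log-to-record parser in the spirit of a map-side
# transformation: one raw line in, one structured record out.
import json
import re

# Common Log Format-style pattern (assumed for this example).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse_line(line):
    """Parse one log line into a dict, or None if it is unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # a real job would route these to an error sink
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record

line = '10.0.0.1 - - [09/Nov/2011:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(json.dumps(parse_line(line)))
```

Run inside MapReduce, a function like this executes in parallel across every split of the input, which is the source of the parsing-performance gains the HParser feature list describes.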