Oracle Partners With Cloudera For Newly Available Big Data Appliance

On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced to the world at its October conference, Oracle OpenWorld. The appliance runs on Linux and features Cloudera’s distribution of Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, the Oracle NoSQL database and an open source version of R, the statistical software package. Oracle’s partnership with Cloudera in delivering its Big Data appliance goes beyond the latter’s selection as a Hadoop distributor to include assistance with customer support. Oracle plans to deliver tier one customer support while Cloudera will provide assistance with tier two and tier three customer inquiries, including those beyond the domain of Hadoop.

Oracle will run its Big Data appliance on hardware featuring 864 GB of main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors to its appliance with the following functionality:

• Oracle Loader for Hadoop to load massive amounts of data into the appliance by using the MapReduce parallel processing technology.
• Oracle Data Integrator Application Adapter for Hadoop, which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle Connector R, which provides users of R with streamlined access to the Hadoop Distributed File System (HDFS).
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which supports the integration of Oracle’s SQL database with its Hadoop Distributed File System.
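
Several of the connectors above rely on MapReduce, so a small illustration of the pattern may help. The following is a minimal, single-process sketch of the map, shuffle and reduce steps, counting words in a handful of log lines. It is not Oracle Loader for Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample data are hypothetical, and a real Hadoop job would distribute these steps across the nodes of a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Illustrative sample data, not taken from any real system.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(run_job(logs))
```

Because each map call touches only one record and each reduce call touches only one key, both steps can in principle run in parallel on separate machines, which is the property that lets the approach scale to large data volumes.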

Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. First, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Second, the partnership enhances the credibility of Oracle’s Big Data offering given that Cloudera represents the most prominent distributor of Apache Hadoop in the U.S.

In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, which features the ability to process large-scale structured data sets in addition to a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP), marked by the ability to load structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum Chorus and IBM’s BigInsights until Microsoft’s appliance officially enters the Big Data dohyō (土俵), or sumo ring, as well.

Big Data 2011: The Year in Review

If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data garnered for itself a certain status and meaning in the technology community even though few people asked about the meaning of big in “Big Data” in a landscape where the circle around the meaning of “big” with respect to “data” is constantly being redrawn. Even though yesterday’s “big” in Big Data morphed into today’s “small” as consumer personal storage transitions from gigabytes to terabytes, the term “Big Data” emerged as a term that everyone almost instantly understood. It was as if consumers and enterprises alike had been searching for years for a long lost term to describe the explosion of data as evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having been speechless, lacking the vocabulary to find the term for the data explosion, the world suddenly embraced the term Big Data with passion.

Below are some of the highlights of 2011 with respect to big data:

March
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.

July
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.

September
• Teradata announced the Teradata Aster MapReduce Platform that combines SQL with MapReduce. The Teradata Aster MapReduce Platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write scripted queries in Java, Python, Perl or C.

October
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s announcement of its plans to leverage a NoSQL database represented an abrupt about face of an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft revealed a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL and use instead a Windows based version of Hadoop for SQL Server 2012 constituted the key difference between Microsoft and Oracle’s Big Data platforms.
• IBM announced the release of IBM Infosphere BigInsights application for analyzing “Big Data.” The SmartCloud release of IBM’s BigInsights application means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise grade, cloud based Big Data analytics platform.

November
• Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago that features a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic on traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership revealed the release of the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.

December
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization that are leveraging Big Data.

The widespread adoption of Hadoop punctuated the Big Data story of the year so far. Hadoop featured in almost every Big Data story of the year, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question for Big Data hinges on the user friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology by using SQL. That said, as Hadoop becomes more and more mainstream, the tech startup and data intensive spaces are likely to witness a greater number of data analysts trained in Apache Hadoop in conjunction with efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.

Big Data Goes Social With EMC’s Greenplum Unified Analytics Platform

EMC announced the release of its Greenplum Unified Analytics Platform (UAP) on Thursday. The Greenplum Unified Analytics Platform, a unified platform for processing structured and unstructured data, represents EMC’s latest move to consolidate its positioning in the Big Data space and compete squarely with Big Data offerings recently elaborated by Oracle, Microsoft and HP. EMC’s announcement comes scarcely two weeks after HP’s disclosure of the integration of its Autonomy and Vertica offerings within a unified Next Generation Information Platform called Autonomy IDOL 10 that specializes in the processing of structured and unstructured data. EMC’s Unified Analytics Platform features integration with Hadoop, the software framework for analyzing massive amounts of structured and unstructured data.

The EMC Greenplum UAP contains the following three components:

• The EMC Greenplum platform for the analysis of structured data.
• Enterprise-grade Hadoop for analyzing structured and unstructured data.
• EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization that are leveraging Big Data.

EMC Greenplum Chorus recognizes that Big Data scientists and analysts may be geographically dispersed across different enterprise locations, even as they need to collaborate to deliver enterprise-wide analysis that integrates structured and unstructured data from different data sets. GigaOM reports that data exploration represents one of the most significant features of Chorus because it provides users with a Facebook-like user interface which enables data scientists to “launch a sandbox environment and start analyzing the data with just a few clicks.” According to EMC’s press release, Chorus facilitates collaboration amongst Big Data teams as follows:

EMC Greenplum Chorus opens data science teams up to an entirely new way to collaborate across dispersed geographies and with very large data sets. Through the Chorus interface, users get ready access to tools, data and supporting resources that enable enterprise-wide Big Data productivity. Frictionless and rapid collaboration across data science teams helps to ensure useful insights get back to the business in time to take the right actions, thus increasing agility and innovation.

Like IBM’s artificial intelligence supercomputer Watson, Chorus provides an interface for translating human questions into queries that run against petabytes of data. Chorus also allows users to share results from and refine approaches to data analysis. The social networking component of EMC’s Unified Analytics Platform ensures that diverse constituents can examine Big Data and iteratively refine their approach to data analysis as a collective. Chorus, the collaborative platform of UAP, profoundly differentiates EMC’s Big Data offering from competing products from HP, Oracle, Microsoft, Cloudera and Odiago.

EMC’s Unified Analytics Platform represents the convergence of the hottest trends in technology today: cloud computing, Big Data, virtualization and social networking. The question now is whether the combination of social networking and Big Data represents a fad that will pass, or an innovation that forever changes the landscape of products in the Big Data space.

HP Delivers Integrated Big Data Product To Compete With Oracle and Microsoft Big Data Appliances

At HP Discover in Vienna, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. HP acquired Vertica in February and Autonomy in August. Vertica features a data warehousing and analytics platform known as the Vertica Analytics Platform that specializes in the high speed analysis of large-scale structured data sets. The Vertica Analytics Platform boasts real-time loading and querying that minimizes the time-lag between data loading and the delivery of business intelligence insights. Moreover, the Vertica Analytics Platform features analytic optimization tools that deliver maximum performance while minimizing the need for manual adjustments from users. Vertica also claims bi-directional connectors to Hadoop and Pig for the purpose of managing “big data” in structured form.

HP’s acquisition Autonomy complements Vertica by providing a platform for the processing of unstructured data such as video, audio, social media, email and web-related content and search results. Autonomy IDOL 10 features the following attributes:

• Autonomy’s capabilities for processing unstructured data
• Vertica’s ability to rapidly process large-scale structured data sets
• A NoSQL interface for loading and analyzing structured and unstructured data
• Solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals

HP’s Autonomy IDOL 10 competes with its own more specialized Vertica and Autonomy products, in addition to Oracle’s Hadoop and NoSQL Big Data Platform and Microsoft’s forthcoming Hadoop-based Big Data appliance. Hadoop represents the common thread among all three Big Data products even as non-Hadoop based Big Data products such as HPCC from Lexis-Nexis gained publicity this week with the announcement of the availability of its ETL platform on the Amazon Web Services EC2 infrastructure. Autonomy IDOL 10 is available worldwide as of December 1, 2011.

Puppet Labs Secures $8.5 Million in Series C Funding With New Investors Cisco, Google Ventures and VMware

Puppet Labs announced the closure of a Series C funding round valued at $8.5 million. As a result of the funding raise, new investors Cisco, Google Ventures and VMware join existing investors Kleiner Perkins Caufield & Byers, True Ventures, and Radar Partners. Since its formation in 2005, the company has now raised a total of $15.75 million. The new round of funding is intended to support a market in which “demand for our products is outstripping our ability to satisfy it through organic growth alone,” according to Puppet’s CEO Luke Kanies. Kanies further noted that VMware, Google and Cisco represented ideal partners to accelerate the adoption of its IT automation and management software because of their deep relationships in the virtualization, cloud computing and IT vendor landscape.

Puppet Labs provides software that enables system administrators to effectively manage increasingly heterogeneous IT environments featuring legacy systems, private clouds, virtual machines and public clouds, all of which collectively serve the needs of multiple constituencies with varying application needs and role-based access privileges. Puppet Enterprise 2.0 delivers a visually intuitive graphical interface that enables system administrators to discover existing resources, benchmark resource utilization against a desired baseline, configure and deploy new resources to increase scale, and launch critical updates, all within a matter of seconds, without adding headcount. Puppet Enterprise 2.0 also features provisioning capability for Amazon EC2 and VMware instances as well as tracking of unauthorized access and configuration changes for compliance purposes.

The Series C funding raise marks the culmination of a momentous year for the company. Puppet Labs outgrew its open source roots in January with the launch of the first commercial edition of its product, Puppet Enterprise. In September, the company launched Puppet Enterprise 2.0 and now claims over 250 customers including Twitter, Zynga, Oracle/Sun, Match.com and Constant Contact.

Karim Faris, partner at Google Ventures, remarked on the promise of Puppet Labs and its recent investment as follows:

“Global companies need efficient solutions to manage their on-premise and cloud infrastructures. The Puppet Labs team has demonstrated the market traction and leadership to capitalize on this tremendous opportunity, and we’re looking forward to working with them to grow the business.”

The widespread commercial interest in Puppet Labs underscores the need for technology to manage increasingly complex IT environments that feature a combination of traditional and cloud based applications. The success of Puppet Labs over the past year suggests that, alongside cloud security and mobile device management, Puppet’s specialization in technology orchestration and management increasingly ranks as one of the auxiliary technologies likely to mushroom alongside the proliferation of virtualization and cloud computing in contemporary enterprise IT environments.

MarkLogic 5 Features Hadoop Connector For Enhanced Big Data Analytics

With the November 1 release of MarkLogic 5, MarkLogic consolidated its position in the Big Data space by announcing support for Hadoop, the Apache open source software framework for analyzing massive amounts of structured and unstructured data. For over a decade, MarkLogic has delivered analytics that enable actionable intelligence on data for organizations such as JP Morgan Chase, Lexis Nexis and the U.S. Army. MarkLogic 5 features a connector for Hadoop that integrates Hadoop’s capabilities for processing petabytes of data with MarkLogic’s proprietary applications for analyzing Big Data. In addition to a Hadoop connector, MarkLogic 5 includes enhanced capabilities to store, tag and analyze textual data and digital interactive media. The latest release of MarkLogic also features superior database replication capabilities and functionality for monitoring the performance of enterprise level Big Data installations.

The release of MarkLogic 5 testifies to the explosion of commercial interest in non-relational databases for storing and mining unstructured data. Microsoft plans for its Big Data platform to integrate Hadoop with Windows Server and Windows Azure, with connectors to SQL Server 2012. Oracle, meanwhile, recently revealed the basic components of its Big Data appliance, which features Hadoop in addition to its Oracle NoSQL database.

IBM Releases Big Data Software On SmartCloud; Cognos for iPad

On Monday, IBM announced the release of the Infosphere BigInsights application for analyzing massive volumes of structured and unstructured data on its SmartCloud environment. The SmartCloud release of IBM’s BigInsights application means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise grade, cloud based Big Data analytics platform. Over the past month, Oracle and Microsoft have revealed plans to release cloud based Big Data applications that leverage Apache Hadoop, although in the case of both companies, plans for a live release are scheduled for 2012. BigInsights was previously accessed via the IBM Smart Business Development and Test Cloud environment that served as the testing ground for IBM’s SmartCloud which was deployed in April 2011.

IBM developed its Big Data analytics platform because organizations across a number of verticals are drowning in the sea of unstructured data such as Facebook and Twitter feeds, internet searches, log files and emails. IBM’s press release quantified the size of the emerging big data space as follows:

Organizations of all sizes are struggling to keep up with the rate and pace of big data and use it in a meaningful way to improve products, services, or the customer experience. Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks; so much that 90 percent of the world’s data has been generated in the past two years. Every month people send one billion Tweets and post 30 billion messages on Facebook. Meanwhile, more than 1 trillion mobile devices are in use today and mobile commerce is expected to reach $31 billion by 2016.

IBM customers in the banking, insurance and communications verticals are currently using BigInsights to more effectively understand trends from web analytics, social media feeds, text messages and other forms of unstructured data. The availability of BigInsights via IBM’s SmartCloud is likely to accelerate enterprise adoption of the product given enterprise familiarity with the SmartCloud offering and recent publicity about its October 12 upgrade. The deployment of BigInsights on SmartCloud also gives IBM early traction in the Big Data space, with competition from Amazon Elastic MapReduce from Amazon Web Services, EMC, Teradata and HP. Granted, Oracle and Microsoft are set to join the Big Data party soon, but IBM should have at least six months to consolidate its market positioning ahead of its West Coast-based competitors. The enterprise version of BigInsights is priced at 60 cents per cluster per hour whereas the basic version is free.
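
To put the quoted enterprise price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes billing is strictly linear at 60 cents per cluster per hour, a 30-day month by default, and no additional storage, support or data transfer charges; none of those assumptions come from IBM, and the function name monthly_cost is hypothetical.

```python
# Rate quoted in the article for the enterprise version of BigInsights.
ENTERPRISE_RATE_USD = 0.60  # per cluster per hour


def monthly_cost(clusters, hours_per_day, days=30, rate=ENTERPRISE_RATE_USD):
    """Estimate spend in USD, assuming simple linear hourly billing.

    The 30-day month and the absence of any other fees are assumptions
    made for illustration only.
    """
    return round(clusters * hours_per_day * days * rate, 2)


if __name__ == "__main__":
    # One cluster running around the clock for a 30-day month.
    print(monthly_cost(clusters=1, hours_per_day=24))
    # Two clusters used eight hours a day over twenty working days.
    print(monthly_cost(clusters=2, hours_per_day=8, days=20))
```

Under these assumptions, a single always-on cluster works out to 1 × 24 × 30 × $0.60 = $432 per month.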

Key features of enterprise level IBM Infosphere BigInsights include the following:

• Advanced text analytics to mine massive amounts of textual data
• A spreadsheet-like interface called BigSheets that allows users to create and deploy analytics without writing code
• Web-based management console
• Jaql, a query language for querying structured and unstructured data through an interface that resembles SQL

In tandem with the release of BigInsights on the SmartCloud, IBM announced the availability of IBM Cognos Mobile on the iPad and iPhone. iPad users can now leverage Cognos to run analytics on data and obtain access to a suite of visually rich dashboards. The combination of Cognos on the iPad and BigInsights clearly indicates that portability of access to data analytics constitutes a key component of IBM’s big data strategy. The big question now concerns how Oracle and Microsoft will differentiate their forthcoming Big Data offerings from BigInsights.