Big Data 2011: The Year in Review

If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged in 2011 with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data acquired a recognized status and meaning in the technology community, even though few people asked what “big” means in “Big Data” when the boundary of what counts as “big” is constantly being redrawn. Yesterday’s “big” has become today’s “small” as consumer personal storage moves from gigabytes to terabytes, yet “Big Data” emerged as a term that almost everyone instantly understood. It was as if consumers and enterprises alike had been searching for years for a long-lost term to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary to name the data explosion, the world suddenly embraced the term Big Data with passion.

Below are some of the highlights of 2011 with respect to big data:

March
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.

July
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.

September
• Teradata announced the Teradata Aster MapReduce Platform, which combines SQL with MapReduce. The platform enables business analysts who know SQL to leverage the power of MapReduce without having to write scripted queries in Java, Python, Perl or C.

October
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s announcement of its plans to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft also revealed a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL and to use instead a Windows-based version of Hadoop for SQL Server 2012 constituted the key difference between Microsoft’s and Oracle’s Big Data platforms.
• IBM announced the release of its IBM InfoSphere BigInsights application for analyzing “Big Data.” With the SmartCloud release of BigInsights, IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.

November
• Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago that features a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic on traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The companies announced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution Including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.

December
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization who are leveraging Big Data.

The widespread adoption of Hadoop was the Big Data story of the year. Hadoop featured in almost every Big Data announcement, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question hinges on the user-friendliness of Big Data applications for business analysts as well as programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology by using SQL. That said, as Hadoop becomes more and more mainstream, the tech startup and data-intensive spaces are likely to witness a greater number of data analysts trained in Apache Hadoop, in conjunction with efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.

Cloud Computing 2011: The Year in Review

Whereas Time magazine selected “The Protester” as the Person of the Year, the award for Technology of the Year surely goes to Cloud Computing. 2011 marked the year that cloud computing emerged with force and gravitas onto the enterprise landscape. In the case of enterprise CIOs and IT leaders pondering the use of cloud computing infrastructures, the question of the day suddenly morphed from whether to engage the services of a cloud provider to when and how. Over the course of the year, cloud providers grew, emerged, acquired companies or were acquired, raised venture capital and announced products at a dizzying pace.

Within months, the cloud computing landscape transformed from the Amazon, Rackspace, Joyent, Terremark, Savvis show to something radically heterogeneous and complex. As more and more cloud technologies proliferated, analysts and technologists alike began to feel that the term “cloud computing” itself was losing its meaning. Meanwhile, news agencies and blogs struggled to keep up with the pace of innovation and deployment as startups and enterprises alike announced new, exciting and powerful cloud technologies day after day, week after week.

Below are some of the highlights of cloud computing in 2011, the year of the cloud:

• In January and February, Amazon Web Services burst out of the gate with the launch of Elastic Beanstalk and CloudFormation. Elastic Beanstalk automates the process of deploying an application on Amazon’s virtual servers. CloudFormation automates the provisioning of virtual resources using templates that streamline the setup of an infrastructure for deployments of new instances.

• In May, Citrix announced plans to launch Project Olympus, an IaaS platform that allows customers to leverage the OpenStack operating system code to create public or private clouds. Project Olympus marked the first commercialization of OpenStack and thereby inaugurated a series of commercial OpenStack deployments throughout the remainder of 2011.

• In May, Red Hat launched IaaS platform CloudForms and PaaS platform OpenShift. CloudForms signaled genuine innovation in the IaaS space because of its Application Lifecycle Management capabilities and hybrid infrastructure flexibility. OpenShift, meanwhile, presented direct competition to Google App Engine, Windows Azure and Amazon’s Elastic Beanstalk because of the breadth of its deployment platform and claims about increased portability.

• In June, Apple announced details of iCloud, a software framework that synchronizes files across multiple devices such as iPads, iPhones and personal computers, and pushes software updates to a constellation of devices in unison. In a keynote address at the Apple Worldwide Developers Conference (WWDC), Steve Jobs famously remarked that iCloud would “demote the PC and Mac to being a device,” because “we’re going to move the digital hub into the cloud.”

• In August, Amazon Web Services announced the launch of GovCloud, a private cloud for government agencies that complies with federal regulatory and compliance requirements such as FISMA, FIPS 140-2 compliant end points, SAS-70, ISO 27001, and PCI DSS Level 1.

• In September, OpenStack, the open source cloud computing infrastructure project that gained the backing of 144 companies including AMD, Canonical, Cisco, Dell, Intel and Citrix, released Diablo, its first new software version since the Cactus release in April 2011. Diablo, the first OpenStack release delivered on a six-month schedule, upgrades the existing Nova, Object Storage and Glance components.

• Also in September, Joshua McKenty’s startup Piston Cloud Computing launched pentOS, one of the first enterprise grade versions of OpenStack for private clouds. With the launch of pentOS, Piston joined HP, Citrix Systems, Nebula and Dell in an elite group of vendors that commercialized the OpenStack platform in the latter half of 2011.

• In October, Rackspace revealed plans to turn over the leadership of OpenStack to an independent foundation. After founding OpenStack with the collaboration of NASA in the summer of 2010, Rackspace decided to hand over trademarks and copyrights to an independent foundation to ensure that OpenStack remains vendor neutral.

The meteoric rise of OpenStack constituted the cloud computing story of the year, by far. Commercial deployments of OpenStack by Piston Cloud Computing and other vendors underscored the emerging power of OpenStack as an increasingly competitive alternative to Infrastructure as a Service (IaaS) vendors such as Amazon Web Services and Rackspace. Moreover, OpenStack promised global cloud inter-operability and standards through an open source organizational framework that earned growing respect within the developer and enterprise communities alike. Much of the story of cloud computing in 2012 will hinge on the ability of the OpenStack foundation to continue to promote the software framework’s adoption in the private sector and establish itself as a credible counterweight to first mover Amazon Web Services and other proprietary cloud vendors.

HP To Open Source webOS

HP’s announcement on Friday that it will open source its Linux-based mobile operating system webOS means that a competitor to Android and Apple iOS survives, even though it has yet to garner significant attention from the developer community. Meg Whitman’s decision to open source webOS empowers the open source developer network to enhance a product that was widely regarded as highly promising even though it failed to gain traction because of poor sales of HP smartphones and tablets. HP’s decision to open source webOS comes just months after its August announcement terminating sales of webOS products, including the HP TouchPad and webOS phones.

HP is reportedly considering an open source licensing structure through the Apache Software Foundation. In terms of governance, HP is leaning towards a structure similar to Red Hat’s Fedora Project, which would allow HP to retain tighter control over enterprise-grade Linux deployments and provide HP with final voting authority on webOS updates, thereby ensuring the product remains compatible with subsequent versions of HP’s hardware. Whitman does foresee a future for webOS powered tablets developed by HP, but noted such tablets may not materialize until 2013. HP purchased webOS as part of its $1.2 billion acquisition of Palm in April 2010.

HP Delivers Integrated Big Data Product To Compete With Oracle and Microsoft Big Data Appliances

At HP Discover in Vienna, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. HP acquired Vertica in February and Autonomy in August. Vertica features a data warehousing and analytics platform known as the Vertica Analytics Platform that specializes in the high-speed analysis of large-scale structured data sets. The Vertica Analytics Platform boasts real-time loading and querying that minimizes the time lag between data loading and the delivery of business intelligence insights. Moreover, the Vertica Analytics Platform features analytic optimization tools that deliver maximum performance while minimizing the need for manual adjustments from users. Vertica also claims bi-directional connectors to Hadoop and Pig for the purpose of managing “big data” in structured form.

HP’s acquisition of Autonomy complements Vertica by providing a platform for the processing of unstructured data such as video, audio, social media, email and web-related content and search results. Autonomy IDOL 10 features the following attributes:

• Autonomy’s capabilities for processing unstructured data
• Vertica’s ability to rapidly process large-scale structured data sets
• A NoSQL interface for loading and analyzing structured and unstructured data
• Solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals

HP’s Autonomy IDOL 10 competes with its own more specialized Vertica and Autonomy products, in addition to Oracle’s Hadoop and NoSQL Big Data Platform and Microsoft’s forthcoming Hadoop-based Big Data appliance. Hadoop represents the common thread among all three Big Data products, even as non-Hadoop Big Data products gained publicity: this week, LexisNexis announced the availability of its HPCC ETL platform on the Amazon Web Services EC2 infrastructure. Autonomy IDOL 10 is available worldwide as of December 1, 2011.

HP Selects Ubuntu As Lead Host and Guest OS For OpenStack

One of the major announcements at this week’s OpenStack Conference in Boston was HP’s decision to use Ubuntu as the “lead host and guest operating system” for its OpenStack-based public cloud. HP’s selection of Ubuntu marked a huge affirmation for Canonical, Ubuntu Linux’s parent company. As commercial-grade OpenStack deployments proliferate, HP’s decision to choose Ubuntu positions Canonical strongly to gain traction in the emerging market for commercial-grade host and guest operating systems for OpenStack.

In a blog post, Canonical commented on HP’s selection of Ubuntu by noting: “Both companies share a common commitment to open source and both embrace the OpenStack community. With over 117 member companies the momentum behind OpenStack is truly game changing and promises to position it at the center of the next wave of computing.” Canonical joined the OpenStack project in February and, in May, announced that the 11.10 version of its Ubuntu Enterprise Cloud would be based on OpenStack instead of Eucalyptus.

HP’s Support of OpenStack Affirms Open Source, Inter-Operable Cloud Solutions

On July 27, HP became the latest technology behemoth to support OpenStack, joining the likes of AMD, Canonical, Cisco, Dell, Intel and Citrix. Emil Sayegh, HP’s VP of Cloud Services, announced Hewlett Packard’s support for OpenStack in a blog post featuring the following highlights:

• Recognition of the importance of open source, inter-operable solutions for the cloud computing industry
• Active participation in the OpenStack community
• Sponsorship of the OpenStack Design Summit and OpenStack Conference in October 2011
• Belief that collaboration with OpenStack marks an “opportunity to enable customers, partners and developers with unique infrastructure and development solutions across public, private and hybrid cloud environments.”

The last bullet point indicates that HP is likely to deliver hardware pre-loaded with OpenStack software for customers seeking to build public, private and hybrid cloud computing deployments. HP’s affirmation of OpenStack arrived alongside related announcements from Nebula and Dell. Nebula announced its intent to launch an appliance pre-loaded with OpenStack software, while Dell revealed details of an OpenStack Cloud Solution that enables customers to quickly deploy OpenStack-based cloud solutions using a combination of hardware, software and professional services. OpenStack can now claim support from over 90 companies and more than 1200 contributors.