Top 3 Cloud Computing Market Trends for 2011

2011 has been an extraordinary year for cloud computing so far. Amazon Web Services (AWS) set the pace with an aggressive roll-out of products such as Elastic Beanstalk, CloudFormation, Amazon Cloud Player and Amazon Cloud Drive. Just when AWS seemed poised to consolidate its first-mover advantage with respect to cloud computing market share, the landscape exploded with a veritable feast of product offerings, business partnerships and acquisitions. Every month another Fortune 500 IT or telecommunications company throws its hat into the cloud computing ring: Dell’s vStart, Dell’s recent partnership with SAP, IBM’s SmartCloud, Apple’s iCloud and HP’s BladeSystem Matrix mark just some of the big names and brands that have entered the cloud computing dohyo, or sumo circle. The influx of new actors has made the cloud computing space painfully difficult for analysts to quantify in terms of relative market share and growth. But within this bewildering sea of change, three industry trends have emerged that deserve attention:

1. Outages across the industry signal demand outweighs supply
Demand for cloud computing services has begun to outstrip supply to the point where vendor processes for guaranteeing system uptime have become increasingly challenged. The Amazon Web Services outage of April 2011 was the most glaring example of a lack of effective, scalable processes at one of the world’s premier IaaS vendors, but 2011 has also witnessed notable outages affecting Sony PlayStation, Twitter, Gmail and Google’s Blogger. Expect more outages and service disruptions until the industry finds the time to develop processes for delivering on 99.99% SLAs as opposed to merely promising them.
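To put a 99.99% SLA into perspective, the short sketch below translates a few common availability targets into downtime budgets. This is back-of-the-envelope arithmetic only; real SLAs define measurement windows, service credits and exclusions that the calculation ignores.

```python
# Downtime budget implied by a given availability percentage.
# Illustrative arithmetic only; actual SLAs define measurement
# windows and exclusions that this ignores.

MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = 30 * 24 * 60  # assume a 30-day billing month

def downtime_budget(availability_pct):
    """Return (minutes per month, minutes per year) of allowed downtime."""
    downtime_fraction = 1 - availability_pct / 100.0
    return (MINUTES_PER_MONTH * downtime_fraction,
            MINUTES_PER_YEAR * downtime_fraction)

for sla in (99.0, 99.9, 99.95, 99.99):
    per_month, per_year = downtime_budget(sla)
    print(f"{sla}% uptime -> {per_month:.1f} min/month, {per_year:.1f} min/year")
```

By this arithmetic, a 99.99% commitment allows roughly four minutes of downtime per month, a budget that a multi-day event like the April EBS outage exhausts many times over.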

2. Early Consolidation vs. the Proliferation of New Entrants to the Market
The past five months have witnessed Verizon’s acquisition of Terremark, Time Warner Cable’s acquisition of NaviSite, CenturyLink’s acquisition of Savvis and rife speculation that Rackspace is next in line as a buyout target. In tandem with the finalization of these acquisitions, a slew of other companies such as Appistry, CA Technologies, Engine Yard, Flexiant, GigaSpaces, RightScale and ThinkGrid have emerged on the landscape and promise to collectively cobble together a non-trivial slice of the market while potentially growing into significant niche players themselves. Expect new entrants on the scene, particularly in the open source space, to increasingly complicate the IaaS market share dominance of AWS, Eucalyptus, Rackspace, GoGrid and Joyent. Consolidation will continue, but the market is unlikely to congeal into a few dominant players for quite some time.

3. The Rise of Open Source Cloud Computing Solutions
Rackspace, Dell and Equinix’s launch of a demonstration environment for OpenStack promises to change the industry by enticing customers to experiment with the open source platform for free while paying for consultative support services associated with cloud design and management. Meanwhile, Canonical’s decision to change the cloud computing infrastructure underlying its Ubuntu Enterprise Cloud (UEC) offering from Eucalyptus to OpenStack testifies to the strength of OpenStack and, conversely, underscores Eucalyptus’s challenge in defining its value proposition as an Amazon EC2-compatible open source IaaS platform. Red Hat’s open source PaaS product, OpenShift, marks another leading contender in the open source ring by virtue of its deployment flexibility across Java, Python, PHP and Ruby environments. Expect open source IaaS and PaaS offerings to become increasingly robust and scalable. If open source solutions can demonstrate reliable, high-quality portability across platforms, the market for less portable, proprietary IaaS and PaaS solutions is likely to shrink dramatically. The fortunes of OpenStack, OpenShift and the recently formed Open Virtualization Alliance merit a close watch in particular.

Cloud Computing Law and the Implication of Amazon Web Services in the Sony PlayStation Outage

Last week’s report by Bloomberg that the outage on the PlayStation Network was caused by a hacker using Amazon Web Services’s EC2 platform raises interesting questions in the newly emerging field of cloud computing law. Can Amazon Web Services be held responsible for the breach? In the event of a violation of security on one cloud infrastructure that stems from another cloud computing platform, can the originating cloud computing vendor be deemed legally responsible for the security violation? Consider the case of HIPAA legislation as it relates to the cloud, for example: as “business associates” of “covered entities” such as provider organizations, cloud computing vendors bear responsibility for the security and privacy of patient health information. A covered entity such as a hospital that stores personal health information on Amazon’s EC2 infrastructure can expect that, as a business associate, Amazon Web Services will demonstrate adherence to HIPAA’s privacy and security regulations, which require data encryption, access controls, and processes for data back-up and audit review of access.
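To make the last of those controls concrete, here is a minimal, hypothetical sketch of role-based access control with an audit trail around reads of patient data. The record store, role list and function names are invented for illustration; this is not a HIPAA-compliant implementation, only a gesture at the kind of gating and logging the regulations demand.

```python
import logging
from datetime import datetime, timezone

# Hypothetical in-memory record store; a real deployment would sit on
# encrypted storage (e.g. an encrypted EBS volume or encrypted objects).
PATIENT_RECORDS = {"patient-123": {"name": "Jane Doe", "diagnosis": "..."}}

# Roles allowed to read protected health information in this toy model.
AUTHORIZED_ROLES = {"physician", "nurse"}

audit_log = logging.getLogger("phi_audit")
logging.basicConfig(level=logging.INFO)

def read_patient_record(user, role, patient_id):
    """Return a record only for authorized roles, auditing every attempt."""
    allowed = role in AUTHORIZED_ROLES
    audit_log.info("access attempt user=%s role=%s patient=%s allowed=%s at=%s",
                   user, role, patient_id, allowed,
                   datetime.now(timezone.utc).isoformat())
    if not allowed:
        raise PermissionError(f"{role} may not read patient records")
    return PATIENT_RECORDS[patient_id]

record = read_patient_record("dr_smith", "physician", "patient-123")
```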

What is Amazon Web Services’s degree of liability for the Sony outage, if any? Sources close to the investigation revealed that hackers rented one of Amazon’s EC2 servers and then deployed the attack on Sony PlayStation’s network that compromised the security of 100 million Sony customers. Amazon Web Services is likely to be subpoenaed in the investigation in order to extract details of the method of payment and the IP addresses used for the attack. That said, one would be hard-pressed to make a legal case that Amazon bears responsibility for the attack, given that virtually any of its customers could have launched it and there currently exists no easy method of differentiating between criminal accounts and legitimate ones. Granted, one could argue that cloud computing vendors should develop the IT infrastructure to proactively identify suspicious behavior and curtail it as necessary. Given the recent proliferation of cases in which hackers use rented or hijacked servers to launch cyber-attacks, legislation mandating such monitoring may not be entirely inconceivable as the cloud computing space evolves. Right now, however, standards bodies such as NIST and officials such as U.S. CIO Vivek Kundra have their hands full grappling with interoperability and quality standards for cloud-based data storage and transmission, quite apart from formulating the legally precarious requirement that cloud computing vendors develop processes to detect hack-attacks before they happen.

Google’s Blogger tight-lipped about reasons for outage as service is restored

Google’s Blogger service experienced a major outage on Thursday, May 12 that continued until service was finally restored on Friday, May 13 at 10:30 AM PDT. Users were unable to log in to the dashboard that enables bloggers to publish and edit posts, edit widgets and alter the design templates for their blogs. The outage coincided with the impending launch of a major overhaul of Blogger’s user interface and functionality, but a Blogger tweet asserted that the outage was unrelated to the upcoming redesign. Most notable about the incident, however, was Google’s tight-lipped explanation of the technical reasons for the outage, in contradistinction to Amazon Web Services’s (AWS) exhaustively thorough explanation of its own service disruption in late April. Blogger’s Tech Lead/Manager Eddie Kessler explained the Blogger outage as follows:

Here’s what happened: during scheduled maintenance work Wednesday night, we experienced some data corruption that impacted Blogger’s behavior. Since then, bloggers and readers may have experienced a variety of anomalies including intermittent outages, disappearing posts, and arriving at unintended blogs or error pages. A small subset of Blogger users (we estimate 0.16%) may have encountered additional problems specific to their accounts. Yesterday we returned Blogger to a pre-maintenance state and placed the service in read-only mode while we worked on restoring all content: that’s why you haven’t been able to publish. We rolled back to a version of Blogger as of Wednesday May 11th, so your posts since then were temporarily removed. Those are the posts that we’re in the progress of restoring.

Routine maintenance caused “data corruption” that led to disappearing posts and the subsequent outage of the user management dashboard. But Kessler refrains from elaborating on the error that resulted from the “scheduled maintenance,” nor does he specify the form of data corruption that caused such a wide variety of errors on Blogger pages. In contrast, AWS revealed that its outage was caused by network traffic being misrouted from a high-bandwidth connection to a low-bandwidth connection on Elastic Block Store (EBS), the block storage service for Amazon EC2 instances. In its post-mortem explanation, AWS described the repercussions of the network misrouting on the architecture of EBS within the affected Region in excruciatingly impressive detail. Granted, Blogger is a free service used primarily for personal blogging, whereas AWS hosts customers with hundreds of millions of dollars in annual revenue. Nevertheless, Blogger users published half a billion posts in 2010, which were read by 400 million readers across the world. Users, readers and cloud computing savants alike would benefit from learning more about the technical issues responsible for outages such as this one, because vendor transparency will only increase public confidence in the cloud and help propel industry-wide innovation. Even if the explanation were not quite as thorough as that offered by Amazon Web Services, Google would do well to supplement its note about “data corruption” with something more substantial for Blogger users and the cloud computing community more generally.

Red Hat Enters IaaS and PaaS Space with CloudForms and OpenShift

At its May 2011 summit in Boston, Red Hat, the world’s leading provider of open source solutions, announced the launch of CloudForms and OpenShift, two products that represent the company’s boldest entrance into the cloud computing space so far. CloudForms is an IaaS offering that enables enterprises to create and manage a private or hybrid cloud computing environment. CloudForms provides customers with Application Lifecycle Management (ALM) functionality that enables management of an application deployed over a constellation of physical, virtualized and cloud-based environments. Whereas VMware’s vCloud enables customers to manage virtualized machines, Red Hat’s CloudForms delivers a more granular form of management functionality that allows users to manage applications. Moreover, CloudForms offers a resource management interface that confronts the industry problem known as virtual sprawl, wherein IT administrators are left to manage an ever-growing assortment of servers, hypervisors, virtual machines and clusters. Red Hat’s IaaS product also offers customers the ability to create integrated, hybrid cloud environments that leverage a combination of physical servers, virtual servers and public clouds such as Amazon EC2.

OpenShift is Red Hat’s PaaS product, which enables open source developers to build cloud applications within a specified range of development frameworks. OpenShift supports Java, Python, PHP and Ruby applications built on frameworks such as Spring, Seam, Weld, CDI, Rails, Rack, Symfony, Zend Framework, Twisted, Django and Java EE. In supporting Java, Python, PHP and Ruby, OpenShift offers the most flexible development environment in the industry compared to Amazon’s Elastic Beanstalk, Microsoft Azure and Google’s App Engine. For storage, OpenShift features SQL and NoSQL databases in addition to a distributed file system. Red Hat claims OpenShift delivers greater portability than other PaaS products because customers will be able to migrate their deployments to another cloud computing vendor using the Deltacloud interoperability API. The only problem with this marketing claim is that Deltacloud is by no means the most widely accepted cloud computing interoperability API in the industry. Red Hat submitted the Deltacloud API to the Distributed Management Task Force (DMTF) in August 2010, but the Red Hat API faces stiff competition from open source versions of Amazon’s EC2 APIs as well as APIs from the OpenStack project.
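To see what an interoperability API buys a customer in practice, consider the general shape of a provider-neutral driver layer: application code talks to one abstract interface and provider-specific drivers translate the calls, so moving to a new vendor means swapping drivers rather than rewriting deployment logic. The class and method names below are invented for illustration and do not correspond to Deltacloud’s actual API; this is a conceptual sketch only.

```python
from abc import ABC, abstractmethod

class CloudDriver(ABC):
    """Hypothetical provider-neutral interface; not Deltacloud's real API."""

    @abstractmethod
    def launch_instance(self, image_id: str, size: str) -> str:
        """Start a compute instance and return its provider-assigned ID."""

    @abstractmethod
    def terminate_instance(self, instance_id: str) -> None:
        """Stop and remove a compute instance."""

class FakeProviderA(CloudDriver):
    def launch_instance(self, image_id, size):
        return f"provider-a-{image_id}-{size}"

    def terminate_instance(self, instance_id):
        print(f"provider A terminated {instance_id}")

def deploy(driver: CloudDriver):
    # Application code depends only on the abstract interface, so swapping
    # providers means swapping the driver, not rewriting the deployment.
    instance = driver.launch_instance("web-app-image", "small")
    print(f"running on {instance}")
    return instance

deploy(FakeProviderA())
```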

In summary, Red Hat’s entrance into the IaaS and PaaS space promises to significantly change the cloud computing landscape. CloudForms signals genuine innovation in the IaaS space because of its Application Lifecycle Management capabilities and hybrid infrastructure flexibility. OpenShift, meanwhile, presents direct competition to Google App Engine, Microsoft Azure and Amazon’s Elastic Beanstalk because of the breadth of its deployment platform and its claims about increased portability. What makes OpenShift so intriguing is that it constitutes Red Hat’s most aggressive attempt so far to establish Deltacloud as the standard API for the cloud computing industry.

The Amazon Web Services Outage: A Brief Explanation

On Friday, April 29, 2011, Amazon Web Services issued an apology and detailed technical explanation of the outage that affected its US East Region from 1 AM PDT on April 21 to 7:30 PM PDT on April 24. A complete description of Amazon’s cloud computing architecture can be found in the full text of Amazon’s post-mortem analysis of the outage and its accompanying apology. This posting summarizes the technical issues responsible for Amazon’s outage, with the intent of giving readers a condensed understanding of Amazon’s cloud computing architecture and the kinds of problems that are likely to affect the cloud computing industry more generally. We are impressed with the candor and specificity of Amazon’s response and believe it ushers in a new age of transparency and accountability in the cloud computing space.

Guide to the April 2011 Amazon Web Services Outage:

1. Elastic Block Store Architecture
Elastic Block Store (EBS) provides the block storage volumes used by Amazon EC2 instances. EBS has two components: (1) EBS clusters, each of which is composed of a set of nodes; and (2) a Control Plane Services platform that accepts user requests and directs them to the appropriate EBS cluster. Nodes within an EBS cluster communicate with one another over a high-bandwidth primary network, with a lower-capacity network available as a backup.
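The following sketch models that two-tier layout in a few dataclasses, based only on the public description above; it is a deliberately simplified mental model, not a representation of Amazon’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A storage node inside an EBS cluster; holds volume replicas."""
    node_id: str
    capacity_gb: int
    used_gb: int = 0

@dataclass
class EBSCluster:
    """A set of nodes connected by a primary (high-bandwidth) network,
    with a lower-capacity network available as a backup path."""
    cluster_id: str
    nodes: list = field(default_factory=list)
    primary_network_up: bool = True

@dataclass
class ControlPlane:
    """Accepts user API requests and routes them to an EBS cluster."""
    clusters: list = field(default_factory=list)

    def route_request(self, request):
        # Pick any cluster that still has a healthy primary network;
        # a real control plane is far more sophisticated than this.
        for cluster in self.clusters:
            if cluster.primary_network_up:
                return cluster.cluster_id
        raise RuntimeError("no healthy cluster available for request")

cluster = EBSCluster("us-east-1a-ebs", [Node("n1", 1000), Node("n2", 1000)])
plane = ControlPlane([cluster])
print(plane.route_request({"op": "create_volume", "size_gb": 100}))
```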

2. Manual Error with Network Upgrade Procedure
The outage began when a routine procedure to upgrade the capacity of the primary network resulted in traffic being directed to EBS’s lower-capacity network instead of an alternate router on the high-capacity network. Because the high-capacity network was temporarily disengaged, and the low-capacity network could not handle the traffic that had been shunted in its direction, many nodes in the affected EBS Availability Zone were isolated.

3. Re-Mirroring of Elastic Block Store Nodes
Once Amazon engineers noticed that the network upgrade had been executed incorrectly, they restored connectivity on the high-bandwidth network. Nodes that had become isolated then began searching for other nodes onto which they could “re-mirror,” or duplicate, their data. But because so many nodes were looking for replicas at the same time, the EBS cluster’s spare capacity was quickly exhausted. Consequently, approximately 13% of nodes within the affected Availability Zone became “stuck.”
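The dynamic described above, many isolated nodes hunting for replica space at once and exhausting it, can be illustrated with a toy calculation. The numbers below are made up for illustration; only the mechanism mirrors Amazon’s account.

```python
# Toy illustration of a "re-mirroring storm": when a large fraction of
# nodes lose contact with their replicas at once, the spare capacity in
# the cluster is consumed almost immediately and later nodes get "stuck".

TOTAL_NODES = 100
SPARE_SLOTS = 15            # free replica slots available cluster-wide
ISOLATED_NODES = 40         # nodes that lost their replica in the event

re_mirrored = 0
stuck = 0
for node in range(ISOLATED_NODES):
    if re_mirrored < SPARE_SLOTS:
        re_mirrored += 1    # this node finds space for a new replica
    else:
        stuck += 1          # no space left: the node blocks, volume stuck

print(f"{re_mirrored} nodes re-mirrored, {stuck} stuck "
      f"({stuck / TOTAL_NODES:.0%} of the cluster)")
```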

4. Control Plane Service Platform Isolated
The full utilization of the EBS storage system by stuck nodes seeking to re-mirror themselves impacted the Control Plane Services platform that directs user requests from the API to EBS clusters. The exhausted capacity of the EBS cluster rendered EBS unable to accommodate requests from the Control Plane Service. Because the degraded EBS cluster began to have an adverse effect on the Control Plane Service throughout the entire Region, Amazon disabled communication between the EBS clusters and the Control Plane Service.
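Disabling communication between a degraded dependency and the rest of the system is, in spirit, a circuit-breaker decision: stop sending requests to a component that cannot serve them so the failure does not spread. The sketch below shows the general pattern; it is a generic illustration, not Amazon’s code.

```python
class CircuitBreaker:
    """Stop calling a failing dependency after a threshold of errors."""

    def __init__(self, failure_threshold=5):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.open = False   # "open" means calls are blocked

    def call(self, operation):
        if self.open:
            raise RuntimeError("circuit open: dependency isolated")
        try:
            result = operation()
            self.failures = 0          # success resets the counter
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.open = True       # isolate the degraded dependency
            raise

breaker = CircuitBreaker(failure_threshold=3)

def flaky_ebs_request():
    raise TimeoutError("degraded EBS cluster not responding")

for _ in range(4):
    try:
        breaker.call(flaky_ebs_request)
    except Exception as exc:
        print(type(exc).__name__, exc)
```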

5. Restoring EBS cluster server capacity
Amazon engineers knew that the isolated nodes had exhausted the server capacity within the EBS cluster. In order to enable the nodes to re-mirror themselves, it was necessary to add extra server capacity to the degraded EBS cluster. Once that capacity had been added and the stuck nodes were able to re-mirror, the connection between the Control Plane Service and EBS was restored.

6. Relational Database Service Fails to Replicate
Amazon’s Relational Database Service (RDS) provides managed relational databases that store their data on EBS volumes. RDS can be configured to function in one Availability Zone or several. RDS instances that have been configured to operate across multiple Availability Zones should fail over to their replica in an Availability Zone unaffected by a service disruption. Due to an unexpected bug, however, the network interruption on the degraded EBS cluster caused 2.5% of multi-AZ RDS instances to fail to find their replica.
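The intended multi-AZ behavior, promoting the replica in a healthy zone when the primary’s zone is disrupted, reduces to a simple decision rule. The sketch below uses hypothetical zone names and omits the replication and promotion machinery the real service performs (and in which, per Amazon’s report, the bug evidently lived).

```python
def failover(primary_az, replica_az, healthy_azs):
    """Return the AZ that should serve traffic after a disruption.

    Hypothetical sketch of multi-AZ failover: if the primary's zone is
    unhealthy but the replica's zone is fine, promote the replica.
    """
    if primary_az in healthy_azs:
        return primary_az                 # no failover needed
    if replica_az in healthy_azs:
        return replica_az                 # promote the standby replica
    raise RuntimeError("no healthy Availability Zone for this instance")

# Example: us-east-1a is degraded, the replica lives in us-east-1b.
print(failover("us-east-1a", "us-east-1b",
               healthy_azs={"us-east-1b", "us-east-1c"}))
```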

Amazon Web Services’s Response

In response to the set of issues that prompted the outage, Amazon proposes to take the following steps:

1. Increase automation of the network change/upgrade process that triggered the outage
2. Increase server capacity in EBS clusters to allow EBS nodes to find their replicas effectively in the event of a disruption
3. Develop more intelligent retry logic to prevent the “re-mirroring storm” that causes EBS nodes to seek and re-seek their replicas relentlessly. While EBS nodes should seek out their replicas after a service disruption, the logic behind the search for replicas should ameliorate an outage rather than exacerbate it (a sketch of such back-off logic follows this list).
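In practice, “more intelligent retry logic” usually means backing off exponentially and adding random jitter so that thousands of nodes do not retry in lockstep. The following is a minimal sketch of that idea under generic names, not Amazon’s actual implementation:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry an operation with exponential backoff and random jitter.

    Spreading retries out over time prevents every stuck node from
    hammering the cluster simultaneously, the "re-mirroring storm"
    pattern described above. Illustrative sketch only.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, plus random jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Example: an operation that fails twice before succeeding.
attempts = {"n": 0}
def find_replica_space():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("cluster full, no replica space yet")
    return "replica-slot-42"

print(retry_with_backoff(find_replica_space))
```

The jitter is the important part in a storm scenario: without it, every stuck node would wake and retry at the same instants, recreating the surge that exhausted the cluster in the first place.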

CenturyLink Acquires Savvis for $2.5 billion

Telecom giant CenturyLink has decided to acquire Savvis for $2.5 billion in a move that signals early consolidation within the cloud computing industry. CenturyLink announced a deal whereby Savvis stock would be acquired for $40 per share, roughly 11% above the April 26, 2011 closing share price. CenturyLink’s acquisition of Savvis also involves the assumption of $700 million in debt, resulting in a total deal valuation of $3.2 billion. Under the terms of the transaction, Savvis shareholders would receive $30 per share in cash and $10 per share in CenturyLink common stock. The acquisition enables CenturyLink to expand upon its existing hosting and colocation capabilities and add, alongside them, Savvis’s IaaS cloud computing platform. Together, CenturyLink and Savvis will operate a total of 48 data centers worldwide, combining Savvis’s 32 data centers with CenturyLink’s 16. CenturyLink announced plans to integrate Savvis as a distinct business unit that retains its current leadership team headed by Savvis chairman and CEO James Ousley. The telecom company’s acquisition of Savvis comes on the heels of its recent acquisition of Qwest for $10.6 billion in the increasingly consolidated telecommunications vertical.

Savvis is known for its large enterprise customer base and annual revenues close to $1 billion. In an interview with ZDNet’s Larry Dignan, Bill Fathers, President of Savvis, noted that cloud-based revenues averaged $8 – $10 million a quarter, with 350 of its 3,500 customers using its cloud platform, Symphony. The remainder of Savvis’s revenue is generated by colocation, managed services such as application hosting, and network services; cloud revenue constitutes a subset of Savvis’s managed services revenue. Rumors have swirled about an impending acquisition of Savvis since the recent acquisitions of Terremark by Verizon and NaviSite by Time Warner Cable. Leading industry analysts such as Gartner’s Lydia Leong contend that the acquisition could bode well for Savvis provided that the company is allowed to run semi-independently. Nevertheless, the larger question posed by this acquisition concerns whether acquired cloud vendors such as Terremark and Savvis can continue to deliver the level of product innovation of highly agile competitors such as Rackspace and Amazon.
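For readers who want to check the deal arithmetic, the figures quoted above fit together as follows (the implied prior closing price is derived from the stated 11% premium and is approximate):

```python
# Quick check of the CenturyLink/Savvis deal figures quoted above.
price_per_share = 40.0        # $30 cash + $10 in CenturyLink stock
cash_component = 30.0
stock_component = 10.0
prior_close = 40.0 / 1.11     # implied April 26, 2011 closing price (~$36)

equity_value_bn = 2.5         # equity portion of the deal, $ billions
assumed_debt_bn = 0.7         # debt assumed by CenturyLink, $ billions
total_value_bn = equity_value_bn + assumed_debt_bn

assert abs(cash_component + stock_component - price_per_share) < 1e-9
print(f"implied prior close: ${prior_close:.2f}")
print(f"total deal value: ${total_value_bn:.1f} billion")
```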