Amazon Web Services Follows Microsoft by Eliminating Inbound Data Charges

Amazon Web Services (AWS) promised to eliminate inbound data fees starting July 1, matching Microsoft’s recent announcement of the same change for its Windows Azure platform. AWS also cut outbound data transfer prices. The first 10 terabytes of outbound traffic per month drop from 15 cents to 12 cents per GB. The next 40 terabytes per month (up to a total of 50 terabytes) drop from 11 cents to 9 cents per GB, and the next 100 terabytes per month (up to a total of 150 terabytes) drop from 9 cents to 7 cents per GB. In a blog post, Amazon Web Services remarked: “There is no charge for inbound data transfer across all services in all regions. That means, you can upload petabytes of data without having to pay for inbound data transfer fees. On outbound transfer, you will save up to 68% depending on volume usage. For example, if you were transferring 10 TB in and 10 TB out a month, you will save 52% with the new pricing. If you were transferring 500 TB in and 500 TB out a month, you will save 68% on transfer with the new pricing.”
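The tiered outbound prices quoted above can be turned into a small calculator. What follows is a minimal sketch, not AWS's billing logic: it covers only the three tiers the article lists (up to 150 TB), it assumes 1 TB = 1,024 GB, and it assumes an old inbound rate of 10 cents per GB. The article does not state that inbound rate; it is used here only to show how a combined figure like the quoted 52% could arise. All function and constant names are illustrative.

```python
GB_PER_TB = 1024  # assumption: binary terabytes; the article does not say

# (tier size in TB, price in cents per GB), from the figures quoted above
OLD_OUTBOUND_TIERS = [(10, 15), (40, 11), (100, 9)]
NEW_OUTBOUND_TIERS = [(10, 12), (40, 9), (100, 7)]

# Assumption, NOT stated in the article: old inbound price in cents per GB.
ASSUMED_OLD_INBOUND_CENTS_PER_GB = 10


def outbound_cost_cents(gigabytes, tiers):
    """Walk the tiers in order, charging each slice at its own rate."""
    remaining = gigabytes
    total = 0
    for size_tb, cents_per_gb in tiers:
        in_tier = min(remaining, size_tb * GB_PER_TB)
        total += in_tier * cents_per_gb
        remaining -= in_tier
        if remaining == 0:
            break
    if remaining > 0:
        # Prices beyond 150 TB are not given in the article, so refuse to guess.
        raise ValueError("volume exceeds the 150 TB covered by the quoted tiers")
    return total


def savings_percent(old_cents, new_cents):
    """Percentage saved when the bill drops from old_cents to new_cents."""
    return 100 * (old_cents - new_cents) / old_cents


ten_tb = 10 * GB_PER_TB
old_out = outbound_cost_cents(ten_tb, OLD_OUTBOUND_TIERS)
new_out = outbound_cost_cents(ten_tb, NEW_OUTBOUND_TIERS)
outbound_only_savings = savings_percent(old_out, new_out)

# Inbound is free under the new pricing, so the new total is outbound only.
old_total = old_out + ten_tb * ASSUMED_OLD_INBOUND_CENTS_PER_GB
combined_savings = savings_percent(old_total, new_out)
```

For 10 TB in and 10 TB out, the outbound-only saving is 20%, and under the assumed 10-cent inbound rate the combined saving comes to 52%, in line with the figure in Amazon's post.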

Microsoft announced last week that it would eliminate inbound data transfer fees, citing the case of Press Association Sport, a partner of the Press Association, the national news agency of the UK. Given that Press Association Sport planned to upload “large amounts of text, data and multimedia content every month” into Windows Azure, the CTO of the Press Association remarked on the benefits of free inbound data transfers as follows: “Estimating the amount of data we will upload every month is a challenge for us due to the sheer volume of data we generate, the fluctuations of volume month on month and the fact that it grows over time. Eliminating the cost of inbound data transfer made the project easier to estimate and removes a barrier to uploading as much data as we think we may need.” Amazon followed suit a week after Microsoft’s June 22 announcement. In a June 29 blog post, AWS CTO Werner Vogels indicated that further price decreases from AWS would follow as the company scaled and made its operations more efficient.

Top 3 Cloud Computing Market Trends for 2011

2011 has been an extraordinary year for cloud computing so far. Amazon Web Services (AWS) set the pace with an aggressive roll-out of products such as Elastic Beanstalk, CloudFormation, Amazon Cloud Player and Amazon Cloud Drive. Just when AWS seemed poised to consolidate its first mover advantage with respect to cloud computing market share, the landscape exploded with a veritable feast of product offerings, business partnerships and acquisitions. Every month another Fortune 500 IT or telecommunications company throws its hat into the cloud computing ring: Dell’s vStart, Dell’s recent partnership with SAP, IBM’s SmartCloud, Apple’s iCloud and HP’s BladeSystem Matrix mark just some of the big names and brands that have entered the cloud computing dohyo, or sumo circle. The cast of new actors has rendered the cloud computing space painfully difficult for analysts to quantify for the purpose of understanding relative market share and growth within the industry. But within this bewildering sea of change, three industry trends have emerged that deserve attention:

1. Outages Across the Industry Signal That Demand Outweighs Supply
Demand for cloud computing services has begun to outstrip supply to the point where vendor processes for guaranteeing system uptime are under increasing strain. The Amazon Web Services outage of April 2011 was the most glaring example of a lack of effective, scalable processes at one of the world’s premier IaaS vendors, but 2011 has also witnessed notable outages at Sony PlayStation, Twitter, Gmail and Google’s Blogger. Expect more outages and service disruptions until the industry takes the time to develop processes for delivering on 99.99% SLAs as opposed to merely promising them.

2. Early Consolidation vs. the Proliferation of New Entrants to the Market
The past five months have witnessed Verizon’s acquisition of Terremark, Time Warner Cable’s acquisition of NaviSite, CenturyLink’s acquisition of Savvis and rife speculation that Rackspace is next in line for a buyout. In tandem with the finalization of these acquisitions, a slew of other companies such as Appistry, CA Technologies, Engine Yard, Flexiant, GigaSpaces, RightScale and ThinkGrid have emerged on the landscape and promise to collectively cobble together a non-trivial slice of the market while potentially transforming into significant niche players themselves. Expect new entrants on the scene, particularly in the open source space, that will increasingly complicate the IaaS market share dominance of AWS, Eucalyptus, Rackspace, GoGrid and Joyent. Consolidation will continue, but the market is unlikely to congeal into a few dominant players for quite some time.

3. The Rise of Open Source Cloud Computing Solutions
Rackspace, Dell and Equinix’s launch of a demonstration environment of OpenStack promises to change the industry by enticing customers to try its open source platform for free while paying for consultative support services associated with cloud design and management. Meanwhile, Canonical’s decision to change the cloud computing provider for its Ubuntu Enterprise Cloud (UEC) offering from Eucalyptus to OpenStack testifies to the strength of OpenStack and, conversely, underscores Eucalyptus’s challenge in defining its value proposition as an Amazon EC2 compatible open source IaaS platform. Red Hat’s open source PaaS product, OpenShift, marks another leading contender in the open source ring by virtue of its deployment flexibility across the Java, Python, PHP and Ruby environments. Expect open source IaaS and PaaS offerings to become increasingly robust and scalable. If open source solutions can demonstrate reliable, high quality portability across platforms, the market for less portable, proprietary IaaS and PaaS solutions is likely to shrink dramatically. The fortunes of OpenStack, OpenShift and the recently formed Open Virtualization Alliance, in particular, merit a close watch.

Cloud Computing Law and the Co-Implication of Amazon Web Services with the Sony PlayStation Outage

Last week’s report by Bloomberg that the outage on the PlayStation Network was caused by a hacker using Amazon Web Services’s EC2 platform raises interesting questions in the newly emerging field of cloud computing law. Can Amazon Web Services be held responsible for the breach? In the event of a violation of security on one cloud infrastructure that stems from another cloud computing platform, can the originating cloud computing vendor be deemed legally responsible for the security violation? Consider the case of HIPAA legislation as it relates to the cloud, for example: as “business associates” of “covered entities” such as provider organizations, cloud computing vendors bear responsibility for the security and privacy of patient health information. A covered entity such as a hospital that stores personal health information on Amazon’s EC2 infrastructure can expect that, as a business associate, Amazon Web Services will demonstrate adherence to HIPAA’s privacy and security regulations, which require data encryption, access controls, and processes for data back-up and audit review of access.

What is Amazon Web Services’s degree of liability for the Sony outage, if any? Sources close to the investigation revealed that hackers rented one of Amazon’s EC2 servers and then used it to launch the attack on Sony PlayStation’s network that compromised the security of 100 million Sony customers. Amazon Web Services is likely to be subpoenaed in the investigation in order to extract details of the method of payment and the IP addresses used for the attack. That said, one would be hard pressed to make a legal case that Amazon bears responsibility for the attack, given that virtually any of its customers could have launched it and there is currently no easy method of differentiating between criminal accounts and legitimate ones. Granted, one could argue that cloud computing vendors should develop the IT infrastructure to proactively identify suspicious behavior and curtail it as necessary. Given the recent proliferation of cases where hackers use rented or hijacked servers to launch cyber-attacks, legislation to that effect may not be entirely inconceivable as the cloud computing space evolves. Right now, however, standards bodies such as NIST and officials such as U.S. CIO Vivek Kundra have their hands full grappling with interoperability and quality standards for cloud based data storage and transmission, let alone formulating a legally precarious requirement that cloud computing vendors develop processes to detect attacks before they happen.

The Amazon Web Services Outage: A Brief Explanation

On Friday, April 29, 2011, Amazon Web Services issued an apology and a detailed technical explanation of the outage that affected its US East Region from April 21, 1 AM PDT to April 24, 7:30 PM PDT. Amazon’s cloud computing architecture is described in more detail in the full text of its post-mortem analysis of the outage and the accompanying apology. This posting summarizes the technical issues responsible for the outage, with the intent of giving readers a condensed understanding of Amazon’s cloud computing architecture and the kinds of problems that are likely to affect the cloud computing industry more generally. We are impressed with the candor and specificity of Amazon’s response and believe it ushers in a new age of transparency and accountability in the cloud computing space.

Guide to the April 2011 Amazon Web Services Outage:

1. Elastic Block Store Architecture
Elastic Block Store (EBS) provides block-level storage volumes for Amazon’s EC2 instances. EBS has two components: (1) EBS clusters, each of which is composed of a set of nodes; and (2) a Control Plane Services platform that accepts user requests and directs them to the appropriate EBS clusters. Nodes within EBS clusters communicate with one another over a high bandwidth primary network, with a lower capacity network available as a back-up.

2. Manual Error with Network Upgrade Procedure
The outage began when a routine procedure to upgrade the capacity of the primary network resulted in traffic being directed to EBS’s lower capacity network instead of an alternate router on the high capacity network. Because the high capacity network was temporarily disengaged, and the low capacity network could not handle the traffic that had been shunted in its direction, many nodes in the affected EBS availability zone were isolated.

3. Re-Mirroring of Elastic Block Store Nodes
Once Amazon engineers noticed that the network upgrade had been executed incorrectly, they restored proper connectivity on the high bandwidth network. Nodes that had become isolated then began searching for other nodes onto which they could “mirror,” or duplicate, their data. But since so many nodes were looking for a replica at the same time, the EBS cluster’s free space was quickly exhausted. Consequently, approximately 13% of nodes within the affected Availability Zone became “stuck”.

4. Control Plane Service Platform Isolated
The full utilization of the EBS storage system by stuck nodes seeking to re-mirror themselves impacted the Control Plane Services platform that directs user requests from an API to EBS clusters. The exhausted capacity of the EBS cluster rendered EBS unable to accommodate requests from the Control Plane Service. Because the degraded EBS cluster began to have an adverse effect on the Control Plane Service through the entire Region, Amazon disabled communication between the EBS clusters and the Control Plane Service.

5. Restoring EBS Cluster Server Capacity
Amazon engineers knew that the isolated nodes had exhausted server capacity within the EBS cluster. In order to enable the nodes to re-mirror themselves, it was necessary to add extra server capacity to the degraded EBS cluster. Finally, the connection between the Control Plane Service and EBS was restored.

6. Relational Database Service Fails to Replicate
Amazon’s Relational Database Service (RDS) is a managed database service whose instances rely on EBS for storage. RDS can be configured to function in one Availability Zone or several. RDS instances that have been configured to operate across multiple Availability Zones should switch to their replica in an Availability Zone unaffected by a service disruption. Because of an unexpected bug, the network interruption on the degraded EBS cluster caused 2.5% of multi-AZ RDS instances to fail to find their replica.
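The capacity problem described in steps 3 through 5 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: every class, method and number below is hypothetical, each node is assumed to need exactly one free slot to re-mirror, and nothing here reflects Amazon's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical stand-in for an EBS cluster."""
    name: str
    free_slots: int       # spare capacity available for new replicas
    stuck_nodes: int = 0  # nodes still waiting for somewhere to re-mirror

    def remirror(self, seeking_nodes):
        """Each seeking node claims one free slot; the rest stay stuck."""
        placed = min(seeking_nodes, self.free_slots)
        self.free_slots -= placed
        self.stuck_nodes = seeking_nodes - placed
        return placed

    def add_capacity(self, slots):
        """Add spare capacity (step 5), then let the stuck nodes retry."""
        self.free_slots += slots
        return self.remirror(self.stuck_nodes)


class ControlPlane:
    """Hypothetical router that directs requests to named clusters."""

    def __init__(self, clusters):
        self.clusters = {cluster.name: cluster for cluster in clusters}
        self.fenced = set()  # clusters cut off from the control plane

    def fence(self, name):
        """Stop routing to a degraded cluster (step 4)."""
        self.fenced.add(name)

    def unfence(self, name):
        """Restore routing once the cluster has recovered."""
        self.fenced.discard(name)

    def route(self, name):
        if name in self.fenced:
            raise RuntimeError(f"cluster {name} is fenced off")
        return self.clusters[name]


# Illustrative numbers only: 100 nodes look for replicas, 80 slots are free.
cluster = Cluster("zone-a", free_slots=80)
plane = ControlPlane([cluster])
placed_first = cluster.remirror(100)       # 80 succeed, 20 become stuck
plane.fence("zone-a")                      # protect the rest of the Region
placed_later = cluster.add_capacity(30)    # extra servers free the stuck nodes
plane.unfence("zone-a")
```

The point of the model is the ordering: the stuck nodes cannot recover until capacity is added, and only then is the link to the control plane restored.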

Amazon Web Services’s Response

In response to the set of issues that prompted the outage, Amazon proposes to take the following steps:

1. Increase automation of the network change/upgrade process that triggered the outage
2. Increase server capacity in EBS clusters to allow EBS nodes to find their replicas effectively in the event of a disruption
3. Develop more intelligent re-try logic to prevent the “re-mirroring storm” that causes EBS nodes to seek and re-seek their replicas relentlessly. While EBS nodes should seek out their replicas after a service disruption, the logic behind the search for replicas should lead to amelioration of an outage rather than its exacerbation.
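One common way to make retry logic less aggressive is capped exponential backoff with random jitter, so that many clients retrying at once spread out rather than hammering a degraded service in lockstep. The sketch below illustrates that general technique; it is not Amazon's fix, and the function names, parameters and default values are all assumptions chosen for the example.

```python
import random
import time


def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=60.0, rng=None):
    """Yield one wait time per attempt: capped exponential backoff, full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        # The ceiling doubles with each attempt but never exceeds the cap.
        ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
        # Picking a random point below the ceiling de-synchronizes retriers.
        yield rng.uniform(0, ceiling)


def find_replica(try_once, max_attempts=6, sleep=time.sleep, rng=None):
    """Call try_once() until it returns a replica or the attempts run out.

    try_once is a hypothetical callable that returns a replica identifier
    on success and None on failure.
    """
    for delay in backoff_delays(max_attempts, rng=rng):
        replica = try_once()
        if replica is not None:
            return replica
        sleep(delay)  # back off before searching again
    return None  # give up instead of retrying forever
```

A caller would pass its own search function, for example `find_replica(lambda: lookup_free_node())`, where `lookup_free_node` is whatever hypothetical lookup the system provides. Bounding the number of attempts matters as much as the delay: a node that gives up and reports failure adds no further load to a cluster that is already out of capacity.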

Why Amazon’s Cloud Computing Outage Didn’t Violate Its SLA

Amazon’s cloud computing outage on April 21 and April 22 can be interpreted in one of two ways: (1) either the outage constitutes a reflection on Amazon’s EC2 platform and its processes for disaster recovery situations; or (2) the outage represents a commentary on the state of the cloud computing industry as a whole. The outage began on Thursday and involved problems specific to Amazon’s Northern Virginia data center. Companies affected by the outage include HootSuite, Foursquare, Reddit, Quora and other start-ups such as BigDoor, Mass Relevance and Spanning Cloud Apps. HootSuite, a dashboard that allows users to manage content on a number of websites such as Facebook, LinkedIn, Twitter and WordPress, experienced a temporary crash on Thursday that affected a large number of sites. The social news website Reddit was unavailable until noon on Thursday, April 21. BigDoor, a 20-person start-up that provides online game and rewards applications, had restored most of its services by Friday evening even though its corporate website remained down. Netflix and Recovery.gov, meanwhile, escaped the Amazon outage either unscathed or with minimal interruption.

Amazon’s EC2 platform currently has five regions: US East (Northern Virginia), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo). Each region is composed of multiple “Availability Zones”. Customers who launch server instances in different Availability Zones can, according to Amazon Web Services’s website, “protect [their] applications from failure of a single location.” The Amazon outage underscores how EC2 customers can no longer depend on having multiple “Availability Zones” within a specific region as insurance against system downtime. Customers will need to ensure their architecture plans for duplicate copies of server instances in multiple regions.

Amazon’s EC2 SLA commits to 99.95% system uptime for customers who have deployments in more than one Availability Zone within a specific region. However, the SLA guarantees only the ability to connect to and provision instances. On Thursday and Friday, Amazon’s US East customers could still connect to and provision instances, but the outage adversely affected their deployments because of problems with Amazon’s Elastic Block Store (EBS) and Relational Database Service (RDS) platforms. EBS provides block storage for EC2 instances, and RDS is a managed database service that depends on EBS. Because Amazon’s problems were confined to EBS and RDS in the US East region, Amazon’s SLA for customers affected by the outage was not violated. The immediate consequence is that Amazon EC2 customers will need to deploy copies of the same server instance in multiple regions to approach 100% system uptime, on the assumption that the wildly unlikely scenario of multiple Amazon regions experiencing outages at the same time never transpires.
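To put uptime percentages in concrete terms, the short sketch below converts an uptime target into the hours of downtime it permits per year, and estimates the availability of several redundant deployments. It assumes a 365-day year and, for the redundancy estimate, that failures in different regions are fully independent, which is an optimistic simplification. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period by a given uptime target."""
    return period_hours * (100 - uptime_percent) / 100


def combined_availability(single_availability, copies):
    """Availability when any one of `copies` independent deployments suffices.

    The service is down only if every copy is down at the same time.
    """
    return 1 - (1 - single_availability) ** copies


# A few uptime targets and the yearly downtime each one tolerates.
budgets = {pct: allowed_downtime_hours(pct) for pct in (99.5, 99.95, 99.99)}

# Two independent copies, each available 99.95% of the time.
two_regions = combined_availability(0.9995, 2)
```

Under these assumptions, 99.5% uptime tolerates roughly 43.8 hours of downtime a year, 99.95% roughly 4.4 hours, and 99.99% under one hour, while two independent 99.95% deployments would together be unavailable for only a few seconds a year. Real failures are rarely fully independent, so the redundancy figure is best read as an upper bound.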

Anyone familiar with the cloud computing industry knows full well that Amazon, Rackspace, Microsoft and Google have all experienced glitches resulting in system downtime in the last three years. The multiple instances of system downtime across vendors point to the immaturity of the technological architecture and processes for delivering cloud computing services. Until the architecture and processes for cloud computing operational management improve, customers will need to weigh the costs of redundant data architectures that insure them against system downtime against the risk and costs of actual downtime.

For a non-technical summary of the technical issues specific to the outage, see Cloud Computing Today’s “Understanding Amazon Web Services’s 2011 Outage”.

Amazon Web Services: Elastic Beanstalk and CloudFormation Explained

Amazon Web Services has recently released Elastic Beanstalk and CloudFormation, two services that automate the process of provisioning hardware resources and deploying applications on AWS’s flexible, inexpensive development environment. Introduced on January 19, Elastic Beanstalk automates the deployment of a finished application on Amazon’s virtual servers. Currently in beta for Java applications only, Elastic Beanstalk manages the specifics of provisioning servers, load balancing and auto-scaling for unexpected spikes in traffic. Its auto-scaling functionality scales horizontally by creating clones of the original server instance, instead of scaling vertically by provisioning a larger server with more memory. Developers retain the flexibility to override Elastic Beanstalk’s auto-scaling features, in which case the application conforms to the scaling parameters indicated by the user.
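The idea of horizontal scaling with user-supplied limits can be sketched in a few lines. This is an illustration of the general approach only, not Elastic Beanstalk's actual algorithm; the function name, its parameters and the default limits are all hypothetical.

```python
import math


def desired_instance_count(requests_per_second, capacity_per_instance,
                           min_instances=1, max_instances=4):
    """Number of identical instances needed for the load, within user limits.

    Scaling horizontally means adding more copies of the same server, so the
    answer is the load divided by what one copy can handle, rounded up and
    then clamped to the bounds the developer has chosen.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With each instance assumed to handle 100 requests per second, a load of 250 requests per second calls for three instances. Raising `max_instances` plays the role of a developer overriding the default scaling parameters.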

CloudFormation fulfills a function analogous to Elastic Beanstalk’s, but a more ambitious one. Launched on February 25, CloudFormation uses templates to automate creation of an integrated hardware infrastructure for an application containing multiple components. For example, CloudFormation takes the images, storage, security and messaging components of an application, understands their dependencies, and launches them in the right order using the template. In other words, instead of requiring a developer to write discrete scripts for each individual Amazon Machine Image (AMI), CloudFormation gathers together certain parameters specified by a developer and creates one script for the requisite “stack” of server instances that collectively specifies elastic IP addresses, message queues, load balancing and auto-scaling. CloudFormation operates through JSON templates that describe an application’s configuration parameters.
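To make the template idea concrete, the sketch below builds a small CloudFormation-style JSON template in Python and derives a launch order from the references between resources. The resource types and the `Ref` syntax follow the CloudFormation template format, but the logical names, the `m1.small` instance type and the ordering function are illustrative assumptions. In particular, the topological sort is a stand-in for how dependency ordering can work in principle, not a description of CloudFormation's internals. It requires Python 3.9 or later for `graphlib`.

```python
import json
from graphlib import TopologicalSorter

# A minimal template: a queue, a server, and an elastic IP bound to the server.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "ImageId": {"Type": "String"},  # the AMI is supplied at launch time
    },
    "Resources": {
        "WorkQueue": {"Type": "AWS::SQS::Queue"},
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": "m1.small",
            },
        },
        "WebServerAddress": {
            "Type": "AWS::EC2::EIP",
            "Properties": {"InstanceId": {"Ref": "WebServer"}},
        },
    },
}


def find_refs(node):
    """Collect every logical name mentioned in a {"Ref": name} object."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= find_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= find_refs(item)
    return refs


def launch_order(tpl):
    """Order resources so each one comes after the resources it references."""
    resources = tpl["Resources"]
    # Keep only references to other resources; parameters are not launched.
    graph = {name: find_refs(body) & set(resources)
             for name, body in resources.items()}
    return list(TopologicalSorter(graph).static_order())


template_json = json.dumps(template, indent=2)  # what would be uploaded
order = launch_order(template)
```

Because `WebServerAddress` refers to `WebServer`, the server always appears before its address in `order`, while the queue has no dependencies and may appear anywhere.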

In his AWS blog post about CloudFormation, Jeff Barr uses the metaphor of cooking and baking to describe the application’s innovation and importance. While cooking allows for individual discretion and ad hoc changes to a recipe, baking requires precise combinations of ingredients that allow for cookies of the same taste and texture to emerge from the oven time and time again. In the same vein, CloudFormation enables developers to become bakers by automating the creation of complex systems. Moreover, developers may wish to create the same development environment a number of times, and instead of memorizing and repeating the execution of the same set of scripts over and over again, they can now use CloudFormation to automate and scale their development needs. Amazon released CloudFormation with templates for a number of open source applications such as Drupal, WordPress, Gollum and Joomla.

Amazon’s Jeff Barr put it as follows:

First, AWS is programmable, so it should be possible to build even complex systems (sometimes called “stacks”) using repeatable processes. Second, the dynamic nature of AWS makes people want to create multiple precise copies of their operating environment. This could be to create extra stacks for development and testing, or to replicate them across multiple AWS Regions….Today, all of you cooks get to become bakers!

Together with Elastic Beanstalk, CloudFormation goes a long way toward streamlining the process of deploying applications on Amazon’s EC2 environment. Although Amazon does not offer managed services, the first quarter 2011 release of both services should render AWS more attractive to small and enterprise customers alike.