Amazon Web Services June 2012 Outage Explained

Amazon Web Services recently released an explanation of the outage that affected a single Availability Zone in the US East Region (Northern Virginia) on June 14. The outage affected customers such as Quora, Heroku and Hipchat. The chronology below is a summary distilled from Amazon Web Services’ more complete explanation. The root cause was a power failure compounded by two faults in the back-up power infrastructure: the generator that took over the load overheated, and an incorrectly configured circuit breaker then failed in the secondary back-up power source that EC2 instances and EBS volumes had failed over to.

June 14, 2012 to June 15, 2012: US East Region (Northern Virginia). All times listed are PDT.

•8:44 PM: A single Availability Zone in the US East Region transferred to generator power after a cable fault in the power distribution system.
•8:53 PM: A generator that had been carrying the load since the power failure overheated. Affected EC2 instances and EBS volumes failed over to their secondary back-up power source.
•8:57 PM: One of the circuit breakers in this secondary back-up power source failed due to an incorrect configuration. Affected EC2 instances and EBS volumes now had no primary, secondary or tertiary power source.
•10:19 PM: The generator was repaired and restarted.
•10:50 PM: The majority of EC2 instances and EBS volumes had recovered.
•1:05 AM: 99% of the EBS volumes that had an “inflight write” in progress when power was lost had been brought back in an “impaired” state, which allowed customers to verify the consistency of each volume and then resume using it.

Concurrent with the above:
•8:57 PM until 10:40 PM: Customers’ ability to launch new EBS-backed EC2 instances was impaired.
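For readers who want the elapsed times rather than the clock times, the short sketch below works them out from the timestamps in the chronology. The variable names and the `elapsed` helper are purely illustrative, and the dates assume that the 1:05 AM entry falls on June 15.

```python
from datetime import datetime

# Timestamps copied from the chronology above (all PDT).
# Names are illustrative; 1:05 AM is assumed to fall on June 15.
generator_power = datetime(2012, 6, 14, 20, 44)     # switch to generator power
all_power_lost = datetime(2012, 6, 14, 20, 57)      # circuit breaker failure
generator_restart = datetime(2012, 6, 14, 22, 19)   # generator repaired, restarted
launches_restored = datetime(2012, 6, 14, 22, 40)   # EBS-backed launches work again
majority_recovered = datetime(2012, 6, 14, 22, 50)  # most instances and volumes back
ebs_recovered = datetime(2012, 6, 15, 1, 5)         # 99% of inflight-write volumes back


def elapsed(start, end):
    """Format the time between two datetimes as 'Hh MMm'."""
    minutes = int((end - start).total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


durations = {
    "Generator power until all power lost": elapsed(generator_power, all_power_lost),
    "All power lost until generator restart": elapsed(all_power_lost, generator_restart),
    "EBS-backed launches impaired": elapsed(all_power_lost, launches_restored),
    "Cable fault until majority recovered": elapsed(generator_power, majority_recovered),
    "Cable fault until 99% of EBS volumes back": elapsed(generator_power, ebs_recovered),
}

for label, value in durations.items():
    print(f"{label}: {value}")
```

By this arithmetic, affected instances and volumes were without any power for a little under an hour and a half, and the full EBS recovery took a little over four hours from the initial cable fault.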

Kudos once again to Amazon for its transparency, although its failure to test and correctly configure its back-up power infrastructure is disappointing.
