Amazon has finally apologized for last week’s Amazon Web Services (AWS) outage that left scores of popular websites inaccessible or only partially operational. Some sites, such as Reddit, were crippled for several days after the initial outage. Others appeared to have been restored in a matter of hours.
“We want to apologize,” a statement from Amazon read. “We know how critical our services are to our customers’ businesses and we will do everything we can to learn from this event and use it to drive improvement across our services.”
In addition to the apology, Amazon offered a long and detailed explanation of what exactly went wrong. Unless you’re a systems engineer or cloud computing maven, you’re not likely to make complete sense of the document (if you’re up for the challenge you’ll find the whole thing here). Suffice it to say that problems began at 12:47 AM PDT on April 21 in Amazon’s Elastic Block Store (EBS), which facilitates some of its cloud computing services. The problem was confined to a single Availability Zone on the East Coast (an Availability Zone being a physical location that houses Amazon’s cloud servers) and appears to be the result of human error.
Amazon plans to implement some changes to prevent another cloud computing outage. One change involves making it easier for customers to take advantage of multiple Availability Zones. Amazon also says it is putting several measures in place to ensure that any future recoveries are performed more quickly.
While an explanation was nice and an apology was deserved, Amazon’s customers are likely to be more interested in compensation. As we noted yesterday, some customers may never fully recover from the outage. Amazon is offering those customers and any others who used the affected service (even if their sites weren’t interrupted) a 10-day credit “equal to 100% of their usage of EBS Volumes, EC2 Instances and RDS database.” The credit will automatically be applied to those customers’ next AWS bill.