On Tuesday, February 28, 2017, we were all reminded that technology is not infallible when the Amazon S3 service went down for about four hours. The effects were felt throughout the internet: 54 of the top 100 internet retailers were affected by the outage, including three sites that went down completely (Express, Lululemon and One Kings Lane), according to Apica, a website testing, optimization, and monitoring provider.
S3 is Amazon’s largest service, according to Apica. More than half of AWS’s million-plus customers use it, and it holds an estimated 3 to 4 trillion pieces of data.
AWS reported that the S3 service disruption in the Northern Virginia Region began while an S3 team member was debugging an issue that was causing the S3 billing system to progress more slowly than expected. As part of that work, the team member ran a command intended to remove a small number of servers. “Unfortunately, one of the inputs to the command was entered incorrectly, and a larger set of servers was removed than intended,” said AWS.
As a result of the outage, AWS said, it has “added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level. This will prevent an incorrect input from triggering a similar event in the future.”
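For readers who want to see what such a safeguard might look like in practice, here is a minimal sketch in Python. It is not AWS's tooling, which has not been published; the function name `remove_servers`, the `CapacityError` exception, and the server counts are all hypothetical and chosen only to illustrate the idea of refusing a removal that would drop a subsystem below its minimum.

```python
class CapacityError(Exception):
    """Raised when a removal would leave a subsystem below its minimum."""


def remove_servers(active: int, to_remove: int, minimum: int) -> int:
    """Return the new active server count, refusing unsafe removals.

    All three arguments are hypothetical inputs for illustration:
    active    - servers currently in service
    to_remove - servers the operator asked to take out
    minimum   - smallest count the subsystem needs to keep running
    """
    if to_remove < 0:
        raise ValueError("to_remove must be non-negative")

    remaining = active - to_remove
    if remaining < minimum:
        # Reject the request outright instead of removing capacity.
        raise CapacityError(
            f"removing {to_remove} servers would leave {remaining}, "
            f"below the minimum of {minimum}"
        )
    return remaining
```

With a check like this, a mistyped input (say, 50 servers instead of 5) is rejected before any capacity is removed, instead of being carried out as typed.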
What should we take away from the outage?
- Remember that failures like this happen in every system. Amazon does not promise that its systems won’t fail; instead, it offers service credits when S3 does fail, in accordance with its Service Level Agreement.
- Since failures will happen, it’s important to plan for them. Backup systems, redundancy, call it what you will: you need to have a plan B.
“Cloud does not replace the need for good strong consulting and vision on how to actually architect for continuous delivery of business applications,” said Jamie Shepard, senior vice president at Lumenate, a security and storage consulting firm.
Another consultant said that he sees public cloud providers as the “Pied Piper,” with far too many customers taking a “lemmings” approach and showing an almost “mindless drive” to public cloud. “Customers aren’t thinking about the risks associated with putting all their eggs in one basket,” he said. “You always need a plan B, a path to recovery in the event there is a failure.”
- Talk to your software consultants to find out if you need any additional protections or diversification. Even if they are not experts in this particular issue, they have access to resources and can research the issue for you.