Last week, Amazon’s S3 storage service suffered stability and availability issues that affected tens of thousands of companies, from small eCommerce merchants and publishers to some of the largest sites on the web. S3 is used to store static assets like images and scripts. When S3 suffers availability issues, those assets may become unavailable.
Amazon itself provided a pointed example of what happens when a service so many rely on becomes unavailable: static assets for AWS’s status pages were stored in S3. The service designed to tell users about problems didn’t function properly because it was hit by the very problems it was supposed to report.
Hours-long availability issues can have serious consequences for busy websites and eCommerce stores. For as long as a service isn’t available, its users lose money, customers, and reputation. This isn’t an S3 problem, an AWS problem, or even a cloud problem. Complex systems experience failures from time to time. That’s a universal truth, and it’s something everyone doing business on the web should understand and account for.
The best way to avoid being bitten by a platform failure is to plan ahead. Proactively consider how to deal with failures, rather than reacting when they happen. Does your business’s disaster recovery and business continuity plan include contingencies to handle failures in the services it relies on? If not, here are a few things you should be thinking about.
- Design for redundancy: don’t put all your eggs in one basket. The availability of your business’s online presence should not depend entirely on one service. Ensure that your data exists in more than one place. Design server clusters so that a failure in any one server doesn’t bring down your site. Avoid single points of failure.
- Use managed services with responsive support. For CEOs and CIOs, one of the most frustrating aspects of downtime on services like S3 is that they have no insight into the problem and there’s nothing they can do except wait for it to be fixed. With a managed hosting solution that includes great support, there will be a trusted advisor you can call when something goes wrong. A managed hosting provider will build redundancy into their systems and help you to create a resilient platform that can better weather failures.
- Cloud isn’t the only option. Cloud storage solutions offer many benefits, but cloud isn’t the only way to go. Consider resilient high-performance alternatives like redundant dedicated servers.
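To make the first point concrete, here is a minimal Python sketch of one way an application might read a static asset from more than one storage location, falling back to a second copy when the first is unreachable. It is an illustration under stated assumptions, not a description of how any particular provider works: the function names `http_fetch` and `fetch_with_fallback` and the `example.com` hostnames are hypothetical, and the sketch assumes the same asset has already been copied to every origin in the list.

```python
from typing import Callable, Iterable, Tuple
from urllib.request import urlopen


def http_fetch(url: str, timeout: float = 3.0) -> bytes:
    """Download a URL and return its body. Raises OSError subclasses on failure."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    path: str,
    origins: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[str, bytes]:
    """Try each origin in order and return (origin, body) from the first that works.

    `origins` is a hypothetical list of base URLs that all hold a copy of the
    asset, for example a primary bucket and a replica hosted elsewhere.
    """
    errors = []
    for origin in origins:
        url = origin.rstrip("/") + "/" + path.lstrip("/")
        try:
            return origin, fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError, and socket timeouts, are all
            # OSError subclasses, so one handler covers an unreachable origin.
            errors.append(f"{origin}: {exc}")
    raise RuntimeError("all origins failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical hostnames; replace with storage locations you control.
    asset_origins = [
        "https://assets-primary.example.com",
        "https://assets-backup.example.com",
    ]
    try:
        origin, body = fetch_with_fallback("/img/logo.png", asset_origins)
        print(f"served {len(body)} bytes from {origin}")
    except RuntimeError as exc:
        print(exc)
```

The `fetch` parameter is injectable so the fallback logic can be exercised without touching the network. The design only helps if the copies live on genuinely independent infrastructure; two origins that share the same underlying service are still a single point of failure.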
The key lesson from this most recent outage isn’t about the reliability of any particular platform or hosting modality.
Instead, eCommerce merchants, publishers, and business site owners should consider the negative consequences of relying too much on the resilience of any one platform. An infrastructure monoculture isn’t good for the web. Infrastructure and vendor diversity are essential to building available, reliable, and stable online services.