An Amazon server outage caused problems for Alexa, Ring, Disney Plus, and deliveries

Robert Cunningham

4 years ago

Problems with some Amazon Web Services cloud servers are causing slow loading or failures for significant chunks of the internet. Amazon’s widespread network of data centers powers many of the things you interact with online, including this website, so as we’ve seen in previous AWS outage incidents, any problem has massive rippling effects. People started noticing problems at around 10:45 AM ET, and just after 6 PM ET the AWS Status showed “Many services have already recovered, however, we are working towards full recovery across services.”

AMAZON’S INTERNAL APPS WENT DOWN, TOO, CUTTING OFF SOME DELIVERY DRIVERS AND STALLING WAREHOUSE ROBOTS

While some affected services that rely on AWS have been restored, the internet is still a bit slower and more unsteady than usual. The most important apps impacted by the outage might be the ones that Amazon employees use. CNBC points out Reddit posts from Amazon Flex, warehouse, and delivery workers who say the apps that keep track of packages, tell them where to go, and generally keep your items on time went down, too.

There have been reports of outages for Disney Plus and Netflix streaming, as well as games like PUBG, League of Legends, and Valorant. We also noticed some problems accessing Amazon.com and other Amazon products like the Alexa AI assistant, Kindle ebooks, Amazon Music, and security cameras from Ring or Wyze. The DownDetector list of services with simultaneous spikes in their outage reports rattles off nearly every recognizable name: Tinder, Roku, Coinbase, both Cash App and Venmo, and the list goes on.

There were reports from network admins everywhere about errors connecting to Amazon’s instances and the AWS Management Console that controls their access to the servers. After about an hour of problems, Amazon’s official status page added an update with messages confirming the outage.

[11:26 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. Services impacted include: EC2, Connect, DynamoDB, Glue, Athena, Timestream, and Chime and other AWS Services in US-EAST-1.
The root cause of this issue is an impairment of several network devices in the US-EAST-1 Region. We are pursuing multiple mitigation paths in parallel, and have seen some signs of recovery, but we do not have an ETA for full recovery at this time. Root logins for consoles in all AWS regions are affected by this issue, however customers can login to consoles other than US-EAST-1 by using an IAM role for authentication.

With the problems coming from the US-EAST-1 AWS region in Virginia, users elsewhere may not have seen as many issues, and even if you were affected, the outage may have shown up only as slightly slower loading while the network rerouted your requests somewhere else.