Without Any Human Authorization, AI Changed Code In AWS, Leading To A 13-Hour Global Outage


Mohul Ghosh


Feb 22, 2026


In a startling incident that has raised fresh concerns about the reliability of AI infrastructure, Amazon Web Services (AWS), the world’s largest cloud provider, recently experienced a widespread service outage that lasted approximately 13 hours. According to industry reports, the outage was triggered by an internal AWS system that used AI-powered automation to manage operations, and that system ended up being part of the problem rather than the solution.

The incident affected multiple AWS services, causing disruptions for businesses, developers, and end users that rely on cloud computing for everything from hosting websites to running enterprise software. Given AWS’s dominant position in global cloud infrastructure, even a single outage of this magnitude reverberated across digital ecosystems.

What Went Wrong

At the heart of the disruption was an AI-powered automation tool that AWS used to optimise internal processes. Instead of improving operational efficiency, a misconfiguration or error in the AI system caused it to inadvertently trigger a cascade of failures in AWS’s infrastructure.

Rather than immediately switching to manual control, the automated system continued to make adjustments that compounded the disruption. Engineers eventually had to intervene directly to halt the faulty automation and restore services — a process that took more than half a day to complete.

This sequence of events has sparked debate over the risks of over-reliance on autonomous AI systems, especially in managing mission-critical infrastructure.

Impact on Businesses and Users

AWS supports millions of applications and workloads globally. During the outage, online services ranging from corporate applications to consumer apps experienced degraded performance or complete unavailability. Companies that host e-commerce platforms, financial services, streaming services, and even government systems on AWS were forced to deal with downtime, lost revenue, and frustrated users.

Because AWS operates in multiple availability zones and regions, outages can have far-reaching effects that extend beyond localised disruptions. Businesses that lacked multi-cloud or failover strategies were particularly impacted, underscoring the importance of contingency planning when relying on a single cloud provider.

Questions About AI Oversight

One of the key takeaways from this outage is the question of AI governance and oversight in operations where reliability is paramount. AI tools are increasingly used to automate configuration adjustments, predict failures, and optimise performance. However, when an AI system operates without proper manual safeguards, errors can escalate before human engineers can intervene.
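
One common form of manual safeguard is a circuit breaker that stops an automated tool once its changes are followed by repeated failed health checks. The sketch below is purely illustrative and does not describe how AWS's internal tooling works; `AutomationBreaker`, `run_automation`, and the failure threshold are hypothetical names and values chosen for the example.

```python
from typing import Callable, Iterable


class AutomationBreaker:
    """Stops automated changes after repeated failed health checks (hypothetical)."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.tripped = False

    def record(self, healthy: bool) -> None:
        # A healthy check clears the streak; an unhealthy one extends it.
        if healthy:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.tripped = True

    def reset(self, operator: str) -> None:
        # Only a named human operator can re-enable the automation.
        if not operator:
            raise PermissionError("a named operator must reset the breaker")
        self.consecutive_failures = 0
        self.tripped = False


def run_automation(
    actions: Iterable[Callable[[], None]],
    health_check: Callable[[], bool],
    breaker: AutomationBreaker,
) -> int:
    """Apply actions one at a time and stop as soon as the breaker trips."""
    applied = 0
    for action in actions:
        if breaker.tripped:
            break
        action()
        applied += 1
        breaker.record(health_check())
    return applied
```

With a breaker like this, the automation halts itself after a small number of bad outcomes and stays halted until an engineer resets it, rather than continuing to make adjustments.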

Experts are now debating how to balance the efficiency gains of AI automation with robust controls that prevent autonomous systems from making catastrophic decisions. Some argue for AI systems that operate only in advisory modes unless explicitly authorised by human operators.
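
The advisory-mode idea can be sketched as a gate in which the AI system may only propose changes, and nothing is applied until a named human operator approves it. This is a minimal sketch under stated assumptions, not a description of any real AWS mechanism; `ProposedChange` and `AdvisoryGate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProposedChange:
    """A change suggested by the automation (hypothetical structure)."""
    description: str
    apply: Callable[[], None]
    approved_by: Optional[str] = None


class AdvisoryGate:
    """Holds proposed changes until a human operator authorises them."""

    def __init__(self) -> None:
        self.pending: List[ProposedChange] = []
        self.audit_log: List[str] = []

    def propose(self, change: ProposedChange) -> None:
        # The automation can only queue a change; it cannot run it.
        self.pending.append(change)
        self.audit_log.append(f"proposed: {change.description}")

    def approve_and_apply(self, index: int, operator: str) -> ProposedChange:
        # Applying requires an explicit, named human approval.
        if not operator:
            raise PermissionError("approval requires a named operator")
        change = self.pending.pop(index)
        change.approved_by = operator
        change.apply()
        self.audit_log.append(f"applied: {change.description} (by {operator})")
        return change
```

The audit log in this sketch also gives engineers a record of what the automation suggested and who authorised each change.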

Lessons for Cloud Infrastructure

The AWS outage serves as a critical reminder that automation systems — including those powered by AI — must incorporate careful fail-safe mechanisms and transparent decision pathways. For cloud customers, it highlights the need for robust redundancy plans, such as multi-region deployments or hybrid cloud strategies, to mitigate the effects of provider outages.
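
For customers, a multi-region deployment usually pairs with client-side failover: if the primary region does not respond, the request is retried against the next one. The sketch below shows the idea in its simplest form. `call_with_failover` is a hypothetical helper, the region names in the usage note are only illustrative, and the `request` callable stands in for whatever client call an application actually makes.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(
    regions: Sequence[str],
    request: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each region in order and return the first successful response."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return region, request(region)
        except ConnectionError as exc:
            # Remember why this region failed, then move on to the next one.
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {sorted(errors)}")
```

A caller might pass a list such as `["us-east-1", "eu-west-1"]` together with a function that sends the request to the given region. Real deployments also need data replication and health-checked routing, which this sketch leaves out.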

As AI continues to integrate deeper into operational workflows, organisations will need to refine how they trust, monitor, and contain these tools so that automation complements human oversight rather than undermines it.

