NextFin News - In a revealing disclosure that underscores the unprecedented strain on global artificial intelligence infrastructure, Amazon Web Services (AWS) CEO Matt Garman confirmed that the cloud giant has yet to decommission a single Nvidia A100 server. Speaking at the Cisco AI Summit earlier this week, Garman noted that despite the A100 architecture being nearly six years old, the relentless appetite for compute capacity has rendered even legacy hardware indispensable. According to Data Center Dynamics, Garman stated that AWS is "completely sold out" of A100 capacity, as the industry continues to grapple with a market where demand for Graphics Processing Units (GPUs) consistently outstrips available supply.
The A100, first introduced by Nvidia in 2020, was once the gold standard for large language model (LLM) training. In the fast-moving world of silicon, a six-year lifespan typically marks the end of a server's economic and operational utility. However, Garman explained that current market dynamics have turned the traditional hardware lifecycle on its head. The phenomenon is not unique to AWS: last year, Google executives reported similar trends, noting that seven-year-old Tensor Processing Units (TPUs) were still operating at 100% utilization. The persistence of these legacy systems at AWS highlights a critical bottleneck in the AI revolution: even the world's largest hyperscalers cannot build out new capacity fast enough to retire the old.
Beyond the simple scarcity of chips, Garman pointed to specific technical requirements that keep the A100 relevant. While newer architectures like Nvidia's Blackwell lean heavily on low-precision floating-point formats to accelerate AI inference, certain High-Performance Computing (HPC) workloads still depend on the double-precision (FP64) arithmetic that older chips handle well. Garman noted that some customers specifically request older hardware because the precision levels in the latest generation of AI-optimized chips do not meet the rigorous demands of traditional scientific calculations. The result is a bifurcated market: the newest chips handle massive LLM training, while the "workhorse" A100s support a long tail of specialized engineering and research applications.
From a financial perspective, the extended lifespan of the A100 is a boon for AWS's margins. Cloud providers typically depreciate server hardware over a three- to five-year period. By keeping A100s in service for six years or more, AWS has moved these assets into a high-margin phase in which the hardware is fully depreciated yet still generating significant rental income. To sustain demand, AWS implemented a strategic price cut in June 2025, reducing the cost of A100 instances by up to 33%. The move positioned the A100 as a cost-effective alternative for startups and researchers who do not need the bleeding-edge performance of the H100 or Blackwell series but require reliable, scalable GPU access.
The broader implications for the industry suggest that the "replacement cycle" for data centers is being fundamentally redefined. Under the administration of U.S. President Trump, there has been an increased focus on domestic technological sovereignty and the rapid expansion of American data center footprints. However, the physical constraints of power delivery and cooling infrastructure mean that hyperscalers cannot simply swap old hardware for new at will. Keeping older, less power-efficient servers like the A100 running alongside newer units puts additional pressure on the power grid, a challenge the administration's energy policies have sought to address through deregulation and support for nuclear power expansion near tech hubs.
Looking forward, the "never retired" status of the A100 serves as a leading indicator for the secondary and tertiary markets of AI compute. As long as the supply-demand gap remains wide, the industry will likely see a "tiered" cloud ecosystem. In this model, the newest Blackwell chips will command premium pricing for frontier model training, while the A100 and its contemporaries become the foundational layer for everyday AI inference and mid-tier enterprise applications. Garman’s comments suggest that until the industry reaches a point of compute saturation—a milestone that remains years away—the concept of "obsolete" hardware has been effectively suspended in the cloud.
Explore more exclusive insights at nextfin.ai.
