NextFin

When Tokens Outstrip Hardware: Dylan Patel on the Supply and Demand of AI Tokens

Summarized by NextFin AI
  • Dylan Patel, CEO of SemiAnalysis, discusses the rapid growth in AI token demand and the challenges in hardware supply during his podcast interview.
  • His firm's AI/cloud spending has grown from tens of thousands of dollars a year to a $7 million annualized rate, driven by non-technical users leveraging large models.
  • Patel highlights the concentration of value among firms that can afford early access to frontier models and tokens, warning of economic disparities.
  • He predicts significant public backlash against AI's disruptive effects, urging the industry to improve communication and showcase positive use cases.

NextFin News - On April 23, 2026, Dylan Patel, founder and CEO of SemiAnalysis, joined Patrick O'Shaughnessy on the Invest Like the Best podcast to discuss the accelerating supply and demand dynamics for AI tokens, the bottlenecks in the hardware and infrastructure stack, and what those trends mean for businesses and society.

(podwise.ai)

Patel speaks from the front lines of the semiconductor and AI-infrastructure market. His remarks describe both concrete customer use cases his firm has built and a broader macroeconomic view about where token-driven value is concentrating and how supply constraints will shape the coming years.

Surging AI spend at SemiAnalysis and a new class of users

Patel opens with a description of a dramatic change in his own firm's usage: what had been tens of thousands of dollars of AI/cloud spend last year has scaled to a multi-million-dollar annualized rate. "We signed an enterprise contract with Anthropic and it's gone to the point where now... we're spending $7 million a year on Claude Code at the current rate," he says, noting that this is already a very large fraction of the company's payroll expense. He attributes the acceleration to a new class of non-technical users and small teams applying large models to tasks that used to require entire specialized groups.

"One person on the team... with a couple thousand dollars of cloud tokens... created this application that is GPU accelerated... and anytime we send it an image, it's able to... overlay where every single material is."

Concrete use cases: reverse engineering, economic benchmarking, and grid mapping

Throughout the interview Patel lays out worked examples of how token-backed compute has replaced entire teams. He describes a reverse-engineering lab that used microscopes and manual analysis; a single engineer, leveraging GPU-accelerated inference, now extracts layer materials and runs finite-element analysis through a dashboard. He recounts an economist who pulled public and paid APIs, ran thousands of regression-style evaluations across economic task rubrics, and created a new benchmark of language models and a metric he called "phantom GDP."

"Phantom GDP... output can go up, but because cost falls so much, actually GDP theoretically shrinks," Patel explains, describing a new benchmark and a 2,000-eval rubric built with language models.

He also describes an energy-data use case: in a few weeks a team scraped power-plant and transmission data and built a dashboard that maps micro-regional deficits and surpluses—work that, in incumbent firms, had taken large teams and years.

Demand: frontier models, model hoarding, and token economics

Patel repeatedly returns to a single core intuition: access to the most capable models and to inference tokens is increasingly the limiting economic factor. He argues that customers prefer the frontier model even when cheaper alternatives suffice, and that enterprise contracts and rate-limit increases are decisive. "What really matters is having an Anthropic rep and having an enterprise contract... because otherwise tokens are ultimately super, super in demand," he says.

He warns of a concentration effect: those who can pay for early access and large token allotments will capture outsized value. He offers a hypothetical where a well‑connected firm purchases the first large tranche of tokens for each new model release, effectively capturing early economic advantages.

Implementation has become easy; idea selection is the bottleneck

Patel frames the transformation as an inversion of historic priorities. Execution used to be the hard part; modern AI has made implementation fast and cheap, so the scarce skill becomes choosing the right idea to implement and then monetizing its output. "Now ideas are cheap and plentiful but execution is very easy... so really only the good ideas are the ones that can justify the spend on super cheap implementation," he summarizes.

Supply constraints across the stack: GPUs, memory, logic, and fab equipment

On the supply side Patel walks through concrete bottlenecks: GPUs and their renewal cycles, DRAM and NAND capacity limits, wafer-fab lead times, and specialized upstream equipment. He says the useful lives of GPU clusters are lengthening, prices are rising, and memory capacity cannot be scaled quickly, which points to higher prices and expanding margins across the supply chain.

"Memory can only grow capacity low double-digit percentages a year... the incremental supply doesn't come till '28," he warns, adding that ASML, Carl Zeiss and other critical suppliers are sold out and that TSMC's capex plans will whip the downstream supply chain.

CPUs, RL environments, and the continuing appetite for tokens

Patel emphasizes that token demand is not only for GPUs running inference. Reinforcement-learning pipelines and complex simulation environments place heavy demand on CPUs. He explains that environments which generate trajectories for grading and reinforcement learning run on CPU infrastructure, and that deployed applications and surrounding services also consume large numbers of CPU cycles.

"Those environments run on CPUs... and once you have these great models... that useful output... runs on CPUs," he says.

Robotics and the bridge from software to physical scale

Looking beyond software, Patel expects robotics to follow as models become more sample-efficient. He predicts breakthroughs that enable few-shot robot adaptation, with pretrained robot models that can be customized quickly, creating another durable demand curve for tokens and compute on the physical side.

Value capture, phantom GDP, and the risk of a permanent underclass

Patel raises three linked challenges for workers and firms: using more tokens, generating economic value from them, and capturing that value. If individuals or organizations fail to do all three, he argues, they risk permanent under‑employment or marginalization as value concentrates among those who both deploy tokens and convert outputs into durable revenue. At the same time he notes that much of the value created by token-backed analysis is not easily measured by conventional GDP statistics—hence his reference to "phantom GDP."

Public backlash and the industry's communications problem

On social and political risk, Patel predicts an intensifying backlash as AI's disruptive effects spread. He suggests the industry needs to focus on concrete, uplifting present‑day use cases and improve how leaders communicate; otherwise fear and anger will fuel protests and political reactions. He warns that concentrated access to frontier models will intensify perceptions of inequality and accelerate social tension.

Closing prediction

When asked what will happen in the next three months, Patel offers a stark, short forecast: large‑scale protests and rising public anger targeted at major model labs and the companies building the new infrastructure. He calls for clearer public narratives about how AI is used today and for more visible, positive deployments that connect with broader communities.

References

Episode page and show notes: Podwise — The Supply and Demand of AI Tokens | Dylan Patel Interview (Invest Like the Best). (podwise.ai)

Podcast listing: Apple Podcasts — Dylan Patel: The Infinite Demand for Tokens, Claude Mythos, and Supply Constraints. (podcasts.apple.com)

Video and transcript references aggregated online: HackMD summary and original YouTube link. (hackmd.io)

Original episode video: YouTube — The Supply and Demand of AI Tokens | Dylan Patel Interview (Invest Like the Best).

