NextFin News - On April 23, 2026, Dylan Patel, founder and CEO of SemiAnalysis, joined Patrick O'Shaughnessy on the Invest Like the Best podcast to discuss the accelerating supply and demand dynamics for AI tokens, the bottlenecks in the hardware and infrastructure stack, and what those trends mean for businesses and society. The episode was published the same day.
(podwise.ai) Patel speaks from the front lines of the semiconductor and AI-infrastructure market. His remarks describe both concrete customer use cases his firm has built and a broader macroeconomic view about where token-driven value is concentrating and how supply constraints will shape the coming years.
Surging AI spend at SemiAnalysis and new class of users
Patel opens with a description of a dramatic change in his own firm's usage: what had been tens of thousands of dollars of AI/cloud spend last year has scaled into a multi-million-dollar annualized rate. "We signed an enterprise contract with Anthropic and it's gone to the point where now... we're spending $7 million a year on Claude Code at the current rate," he says, noting that this already represents a very large fraction of the company's payroll expense. He attributes the acceleration to a new class of non-technical users and small teams applying large models to tasks that used to require entire specialized groups.
"One person on the team... with a couple thousand dollars of Claude tokens... created this application that is GPU accelerated... and anytime we send it an image, it's able to... overlay where every single material is."
Concrete use cases: reverse engineering, economic benchmarking, and grid mapping
Throughout the interview Patel lays out worked examples of how token-backed compute has replaced entire teams. He describes a reverse-engineering lab that used microscopes and manual analysis; a single engineer, leveraging GPU-accelerated inference, now extracts layer materials and runs finite-element analysis through a dashboard. He recounts an economist who pulled public and paid APIs, ran thousands of regression-style evaluations across economic task rubrics, and created a new benchmark of language models and a metric he called "phantom GDP."
"Phantom GDP... output can go up, but because cost falls so much, actually GDP theoretically shrinks," Patel explains, describing a new benchmark and a 2,000-eval rubric built with language models.
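The mechanism behind "phantom GDP" can be illustrated with simple arithmetic (a hypothetical sketch with invented numbers, not figures from the episode): when the unit cost of a task falls faster than real output rises, the measured dollar value of the work shrinks even as more work gets done.

```python
# Hypothetical illustration of the "phantom GDP" effect:
# real output rises 10x, but price per unit falls 100x,
# so measured (nominal) value declines. All numbers are invented.

def nominal_value(units: float, price_per_unit: float) -> float:
    """Measured dollar value of the work performed."""
    return units * price_per_unit

# Before: a team produces 100 analyses at $10,000 each.
before = nominal_value(100, 10_000)   # $1,000,000

# After: token-backed tooling produces 10x the analyses
# at 1/100th the unit cost.
after = nominal_value(1_000, 100)     # $100,000

print(f"real output: 10x up; measured value: {after / before:.0%} of before")
# prints "real output: 10x up; measured value: 10% of before"
```

The point of the sketch is that conventional GDP accounting records the shrinking dollar figure, not the tenfold increase in useful output.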
He also describes an energy-data use case: in a few weeks a team scraped power-plant and transmission data and built a dashboard that maps micro-regional deficits and surpluses—work that, in incumbent firms, had taken large teams and years.
Demand: frontier models, model hoarding, and token economics
Patel repeatedly returns to a single core intuition: access to the most capable models and to inference tokens is increasingly the limiting economic factor. He argues that customers prefer the frontier model even when cheaper alternatives suffice, and that enterprise contracts and rate-limit increases are decisive. "What really matters is having an Anthropic rep and having an enterprise contract... because otherwise tokens are ultimately super, super in demand," he says.
He warns of a concentration effect: those who can pay for early access and large token allotments will capture outsized value. He offers a hypothetical where a well‑connected firm purchases the first large tranche of tokens for each new model release, effectively capturing early economic advantages.
Implementation has become easy; idea selection is the bottleneck
Patel frames the transformation as an inversion of historic priorities. Where execution used to be difficult and ideas were cheap, modern AI has made implementation fast and relatively straightforward, so the scarce skill becomes choosing the right idea to implement and then monetizing its output. "Now ideas are cheap and plentiful but execution is very easy... so really only the good ideas are the ones that can justify the spend on super cheap implementation," he summarizes.
Supply constraints across the stack: GPUs, memory, logic, and fab equipment
On the supply side Patel walks through concrete bottlenecks: GPUs and their renewal cycles, DRAM and NAND capacity limits, wafer‑fab lead times, and specialized upstream equipment. He says useful lives for GPU clusters are extending, prices are rising, and memory capacity cannot be scaled quickly—meaning price rises and margin expansions across the supply chain.
"Memory can only grow capacity low double-digit percentages a year... the incremental supply doesn't come till '28," he warns, adding that ASML, Carl Zeiss and other critical suppliers are sold out and that TSMC's capex plans will whipsaw the downstream supply chain.
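Taken at face value, low double-digit growth compounds slowly. A quick sketch shows why the supply gap persists until 2028 (the 12% rate and indexed baseline are assumptions for illustration, not SemiAnalysis figures):

```python
# Hypothetical compounding of memory capacity at a low double-digit
# annual growth rate. The 12% rate and the 2025 baseline index are
# assumptions for illustration, not figures quoted in the episode.

def capacity_after(base: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding `annual_growth` for `years` years."""
    return base * (1 + annual_growth) ** years

base_2025 = 100.0  # index 2025 capacity to 100
cap_2028 = capacity_after(base_2025, 0.12, 3)

print(f"2028 capacity index: {cap_2028:.1f}")
# prints "2028 capacity index: 140.5" -- roughly 40% more supply
# after three years, far short of demand that is growing multiples faster
```

Even at the top of the "low double-digit" range, three years of compounding adds well under half of current capacity, which is the arithmetic behind Patel's warning that incremental supply does not arrive until 2028.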
CPUs, RL environments, and the continuing appetite for tokens
Patel emphasizes that token demand is not only for GPUs running inference. Reinforcement-learning pipelines and complex simulation environments place heavy demand on CPUs. He explains that environments which generate trajectories for grading and reinforcement learning run on CPU infrastructure, and that deployed applications and surrounding services also consume large numbers of CPU cycles.
"Those environments run on CPUs... and once you have these great models... that useful output... runs on CPUs," he says.
Robotics and the bridge from software to physical scale
Looking beyond software, Patel expects robotics to follow as models become more sample-efficient. He predicts breakthroughs that enable few‑shot or few‑example robot adaptation—pretrained robot models that can be customized quickly—which will create another durable demand curve for tokens and compute on the physical side.
Value capture, phantom GDP, and the risk of a permanent underclass
Patel raises three linked challenges for workers and firms: using more tokens, generating economic value from them, and capturing that value. If individuals or organizations fail to do all three, he argues, they risk permanent under‑employment or marginalization as value concentrates among those who both deploy tokens and convert outputs into durable revenue. At the same time he notes that much of the value created by token-backed analysis is not easily measured by conventional GDP statistics—hence his reference to "phantom GDP."
Public backlash and the industry's communications problem
On social and political risk, Patel predicts an intensifying backlash as AI's disruptive effects spread. He suggests the industry needs to focus on concrete, uplifting present‑day use cases and improve how leaders communicate; otherwise fear and anger will fuel protests and political reactions. He warns that concentrated access to frontier models will intensify perceptions of inequality and accelerate social tension.
Closing prediction
When asked what will happen in the next three months, Patel offers a stark, short forecast: large‑scale protests and rising public anger targeted at major model labs and the companies building the new infrastructure. He calls for clearer public narratives about how AI is used today and for more visible, positive deployments that connect with broader communities.
References
Episode page and show notes: Podwise — The Supply and Demand of AI Tokens | Dylan Patel Interview (Invest Like the Best). (podwise.ai)
Podcast listing: Apple Podcasts — Dylan Patel: The Infinite Demand for Tokens, Claude Mythos, and Supply Constraints. (podcasts.apple.com)
Video and transcript references aggregated online: HackMD summary and original YouTube link. (hackmd.io)
Original episode video: YouTube — The Supply and Demand of AI Tokens | Dylan Patel Interview (Invest Like the Best).

