Fei‑Fei Li: From ImageNet to World Models — Why Spatial Intelligence Is AI’s Next Frontier

NextFin News - Fei‑Fei Li spoke at the Stanford Institute for Economic Policy Research (SIEPR) Policy Forum "AI & the Economy" on November 20, 2025, at the John A. and Cynthia Fry Gunn Building on the Stanford campus. The session, titled "Beyond Words: Building AI for the Physical World," was moderated by Neale Mahoney of SIEPR and ran in the 5:35–6:05pm PST slot on the forum's schedule.

In a wide‑ranging conversation that moved from personal history to technical priorities and policy concerns, Li mapped how recent advances in AI led to a new frontier she calls spatial intelligence — the ability for machines to understand, simulate and act in three‑dimensional, time‑varying space. The excerpts below focus on the core statements she made during the session.

Entering AI and the arc of progress

Li recalled entering the field "precisely 25 years ago" during the AI winter and described that early curiosity as a scientific pursuit rather than a commercial one. She framed the long arc of the field this way: "I did enter the field of AI actually precisely 25 years ago and that was smack in the middle of AI winter... as a student, I was so fascinated by asking the most audacious questions... what is intelligence, can machines think."

On recent advances, she said some developments were expected while others arrived faster than anticipated. She cited work on image captioning around 2015 with her former student Andrej Karpathy as an early taste of unexpectedly rapid progress: "we did this series of work called image captioning... the ability to look at any image and be able to tell not only there's a chair, there's a cat, but be able to tell a story out of it... I thought that by the time I retire... it would take a hundred years. But 2015... pretty much solved that problem."

Surprises and the impact of large language models

Li acknowledged that the technical building blocks for large language models (LLMs) had been in development for some time, but stressed her surprise at seeing those models become a societal and industrial force: "we know the transformer papers were written, we know the scaling law... but seeing large language model in the hands of the global population and seeing how fast that turned into industrial changing forces was a major epiphany for all of us."

Disappointments: hyperbole and human‑centered responsibilities

Li expressed disappointment with the extreme rhetoric surrounding AI on both ends of the spectrum. She described herself as "the most boring speaker in AI these days" because of her concern about exaggerated claims. She warned against both apocalyptic narratives and utopian promises, saying we should treat AI with sobriety: "This is a tool. It's a very powerful tool. It is a double‑edged sword. It is very much ours—humans’—responsibility to take it for good, use it for good, govern it for good. But also to prevent the misuses or the unintended consequences."

Defining spatial intelligence

Li defined spatial intelligence as a core, non‑linguistic strand of cognition: "So much of our human intelligence... rests upon our ability to perceive, to reason, and to do things, to move around and interact." She emphasized the evolutionary depth of perception versus the relatively recent emergence of language: "perception started about 540 million years ago... language is evolutionarily about a few hundred thousand years old." According to Li, unlocking spatial intelligence links seeing to doing and imagination to creation.

Concrete examples and early impact areas

Li identified industries and applications likely to benefit earliest from spatial intelligence. She highlighted creativity and storytelling—film, gaming, immersive experiences—as immediate beneficiaries because creating 3D worlds is currently time‑intensive: "creativity and storytelling is perhaps one of the first industry that's going to benefit... it's so painful to create 3D worlds for storytelling." She also named simulation, robotic learning and education as areas where spatial models could remove current data bottlenecks and offer new experiential learning opportunities.

Technical difficulty and data challenges

Li acknowledged that spatial intelligence is a harder problem than language because of the dimensional and multimodal nature of space. Invoking Rich Sutton's essay "The Bitter Lesson," she pointed to the role of large datasets in the success of language models and noted the additional complexity facing spatial models: "The Bitter Lesson... wherever there is large amount of data we unlock major progress in AI... spatial intelligence is profoundly 3D—it's 4D if you add time to it." She described a hybrid approach to data for building practical world models and said the technical journey will be gradual.

World Labs and Marble — early steps toward world models

Referencing her startup work, Li described World Labs' product efforts as initial, public steps toward spatially capable world models. She said World Labs created a spatial intelligence model called Marble and made it available to the public as a way to demonstrate 3D world generation and editing capabilities: "I'm running a startup called World Labs. We are the first company who has created a spatial intelligence model called Marble that is in the hands of everybody in the public that you can play with it." She framed Marble as part of a hybrid data and product approach to move the field forward.

Collaboration with economists and interdisciplinary work

Li urged deeper collaboration between technologists and economists. She praised the ties between SIEPR and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and called for joint work across disciplines: "these two areas, disciplines, are just coming closer and closer... The economic impact, the technological impact of every single industry... is just so profound." She also noted a practical challenge: economists often work with lagging data. She urged them to engage directly with cutting‑edge technical developments to stay current.

Public education, AI literacy, and policy worries

On public education and equity she stressed an institutional responsibility: "public education is one of the most important thing, an underappreciated critical task for this moment in history... we have an outsized responsibility in public education." Li described a range of HAI and SIEPR activities—policy briefs, congressional boot camps, testimony and engagement with governments—as part of that effort.

She closed the moderated conversation with policy concerns about the concentration of resources and weakening public investment: "I worry about monopoly. I worry about the lack of entrepreneurship. I worry about... lack of resourcing to public sector... AI has this tendency to concentrate resource... and we are seeing a really shrinking of resourcing in public sector which will have a profound negative impact."

Audience questions and closing

During the audience Q&A, Li answered questions on AI literacy, on the comparative advantage of firms that hold mobility data, and on specific policy needs for spatial intelligence. On literacy, she reiterated the urgency of public education and Stanford's role: "We really need to use whatever way we can to partner and educate... this is the beginning of the change." On policy, she repeated her concerns about a healthy ecosystem: "America is built on a healthy ecosystem... and right now AI has this tendency to concentrate resource... we are seeing a really shrinking of resourcing in public sector."


