NextFin

Spatial Intelligence: Fei‑Fei Li on Marble and the Next Frontier of AI

Summarized by NextFin AI
  • Dr. Fei-Fei Li emphasized the importance of spatial intelligence as a fundamental extension of AI systems, arguing that it is as foundational as language intelligence.
  • Marble, World Labs’ first-generation spatial intelligence model, turns multimodal inputs into a fully navigable 3D environment with geometric structure, which sets it apart from current video models.
  • Early applications of Marble span various sectors, including gaming, robotics, healthcare, and interior design, showcasing its versatility in creating immersive environments.
  • Li highlighted the need for a hybrid data strategy in training world models like Marble, due to the scarcity of 3D/4D data compared to text data.
NextFin News -

On February 3, 2026, at the Cisco AI Summit session "3D & AI," Dr. Fei‑Fei Li, CEO and co‑founder of World Labs, spoke with Jeetu Patel, President and Chief Product Officer of Cisco, about the company’s work on spatial intelligence and its first-generation model, Marble. The conversation was part of a program focused on how AI will move from capability to trusted real‑world impact.

Li used the stage to frame spatial intelligence as a fundamental extension of current AI systems and to introduce the design intent and early applications of Marble, World Labs’ multimodal world model.

Why spatial intelligence matters

Li began by placing spatial intelligence in an evolutionary context, arguing that perception and physical interaction are the original drivers of intelligence. As she put it, "I wake up every day and think about just one thing and one thing only, which is spatial intelligence." She explained that seeing and touching the environment preceded language over hundreds of millions of years, and therefore the ability to "understand, to reason, to interact with and to navigate the real 3D, 4D physical world" is, in her view, as foundational as language intelligence.

What Marble is and how it works

When asked to describe Marble, Li said it is World Labs’ first‑generation spatial intelligence model. She emphasized Marble’s multimodal inputs and persistent 3D output: "Whether it's just a sentence or it's a picture or it's a video or it's a few pictures, or it's a simple 3D input, it doesn't matter. It's multimodal. And then it turns that prompt into a fully navigable interactable world that is 3D, which is permanently consistent." She contrasted Marble with current video models, noting that Marble provides geometric structure so environments can be used for simulation, robotics training, or game creation.

Early use cases and surprise applications

Li listed multiple sectors already experimenting with Marble. She said users are developing games and using Marble in commercial virtual production and VFX workflows; robotics and simulation partners are using Marble as training environments; and architects and designers are applying it for interior design and other planning tasks. Li also described unexpected interest from clinical researchers: "It turned out a lot of psychiatric and mental health research, as well as intervention, requires immersive environments that's personalized in the particular situation… Marble is just within every time you prompt within minutes, you get many different kind of environments that you can use." She further noted fitness and personalized training scenarios as additional early applications.

Data, scale, and how world models differ from language models

Li explained that world models face different data and engineering constraints than large language models. Text data is abundant and relatively clean, she said, whereas 3D/4D data (voxels, geometry, temporally consistent scenes) is messier and rarer. For this reason, World Labs adopts a hybrid data strategy: layering internet‑scale text, images and video with simulated data and captured real‑world 3D/4D datasets. As she summarized, "We don't have large amount of 3D, 4D data to train our model. So what we have to do is to take the hybrid approach of layering multi modality of data." Li also acknowledged that Marble is currently smaller than the largest language models and that the field of world models is younger, but she expects a rapid advance as architectures, data and compute mature.
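
For readers who want a concrete picture of what "layering" data sources can mean, the short Python sketch below draws training examples from several sources in fixed proportions. It is only an illustration under stated assumptions: the source names, the mixing weights, and the helper functions are hypothetical and were not described in the session, and nothing here reflects how World Labs actually trains Marble.

```python
import random
from collections import Counter

# Hypothetical sources and mixing weights, chosen for illustration only.
# They are NOT World Labs' actual data recipe.
MIX_WEIGHTS = {
    "internet_text_image_video": 0.60,  # abundant web-scale data
    "simulated_3d": 0.25,               # synthetic scenes from a simulator
    "captured_3d_4d": 0.15,             # scarce real-world 3D/4D captures
}


def sample_sources(weights, n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


def mixture_counts(weights, n, seed=0):
    """Count how many of the n draws came from each source."""
    return Counter(sample_sources(weights, n, seed))


if __name__ == "__main__":
    for name, count in mixture_counts(MIX_WEIGHTS, 10_000).most_common():
        print(f"{name}: {count}")
```

In a setup like this, the scarce captured 3D/4D data still appears in every stretch of training, while the abundant web data fills most of the volume; the proportions are a tuning choice.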

Robotics and the challenge of general-purpose embodied agents

On robotics, Li highlighted the complexity of moving from two-dimensional, wheeled vehicles to three‑dimensional manipulators and humanoid forms. She contrasted the relative simplicity of cars with the high dimensionality of robots that must touch and manipulate objects without breaking them. She emphasized that simulation, particularly realistic hand and dexterity simulation, remains hard and that world models like Marble can serve as a rich simulation substrate for training and testing robotic policies. Still, she cautioned that the journey is long: "Just because the North Star is clear doesn't mean the journey is short."

Reflections on AI’s pace and public conversation

Li reflected on the rapid pace of AI development and the mixed emotions it creates. She described both excitement and humility: the speed of progress is breathtaking, yet it heightens awareness of how much remains unknown. She warned against polarized rhetoric and urged a more nuanced, responsible discourse: "Let's be more nuanced. Let's be benevolent. Let's have the optimism of using technology for good, but the sense of responsibility of using that responsibly."

What success looks like

Asked what success for AI would look like in the coming years, Li offered a civilizational framing rather than narrow metrics. She compared imagining AI's benefits to imagining the effects of electrification: better-lit schools, warmer homes, longer life expectancy, and broader access to learning. For Li, success means technology that helps individuals pursue prosperity and dignity: "Success looks like when civilization is better and civilization is made by every single individual pursuing happiness, pursuing prosperity, and pursuing with a sense of dignity."

Guidance for enterprises and invitations to partners

Li positioned spatial intelligence and world models as horizontal, enterprise‑facing technologies that touch many industries: robotics, simulation, VFX and game production, healthcare, education, field services, finance, agriculture, manufacturing, inspection, warehousing, and urban planning. She encouraged enterprises to engage with World Labs and with the technology more broadly as it moves from early prototypes to practical workflows.

