Spatial Intelligence: The Next Frontier Beyond Language AI

Latent Space: The AI Engineer Podcast · November 25, 2025

Original Title: The PhD Student & Professor Reinventing AI: Fei-Fei Li & Justin Johnson on Spatial Intelligence

Related Episodes

Spatial Intelligence: AI's Next Frontier Beyond Language Models

Dec 05, 2025 The a16z Show

AI is shifting from language to spatial intelligence, enabling AI to understand and interact with the 3D world, unlocking new possibilities in gaming, VFX, and robotics.


Spatial Intelligence: Beyond LLMs to Generative 3D Worlds

Nov 25, 2025 Latent Space: The AI Engineer Podcast

Unlock AI's next frontier: spatial intelligence. Discover how generative world models like Marble move beyond LLMs to create and interact with rich 3D environments, powered by massive compute.


Category Theory: A Principled Framework for AI Computation

Dec 22, 2025 Machine Learning Street Talk (MLST)

AI fundamentally fails at arithmetic due to pattern matching, not understanding. Category theory provides a principled, scientific framework for AI, moving beyond trial-and-error to true computational understanding.


Mathematical Foundations and Conceptual Leaps Drive AI Advancement

Nov 24, 2025 Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas

Scaling AI with more data and compute won't achieve generalized intelligence; conceptual breakthroughs beyond pattern recognition are needed.


Crafting a Meaningful Life Through Iterative Refinement and Disciplined Focus

Dec 01, 2025 Deep Questions with Cal Newport

Iterate on life decisions like a writer revises drafts, gathering real-world feedback to refine your vision and achieve profound subjective experiences.


AI's Scaling Plateau: The Urgent Return to Foundational Research

Nov 25, 2025 Dwarkesh Podcast

AI is transitioning from scaling to research, facing a generalization gap between benchmarks and real-world utility, demanding novel training for true intelligence.



Resources & Recommendations

Books

Fei-Fei Li's book (Title not specified) - Fei-Fei Li mentioned writing about the simultaneous discovery of image captioning by her lab and Google.

People Mentioned

Yann LeCun - Mentioned as a prominent proponent of world models.
John Markoff (reporter for The New York Times) - Broke the story about the independent development of image captioning by Google and Fei-Fei Li's lab.
Andrej Karpathy (Fei-Fei Li's PhD student, referred to as "Andrej" in the episode) - Collaborated with Fei-Fei Li and Justin Johnson on early image captioning and dense captioning research.
Howard Gardner (Psychologist) - Mentioned for his theory of multiple intelligences, which includes linguistic and spatial intelligence.
Francis Crick - Co-discoverer of the DNA double helix, mentioned in the context of spatial reasoning for understanding 3D molecular structures.
James Watson - Co-discoverer of the DNA double helix, mentioned in the context of spatial reasoning for understanding 3D molecular structures.
Sir Isaac Newton - Referenced for his laws of physics, particularly gravity, and the interplay between empirical spatial understanding and formal language.
Dario (Likely Dario Amodei) - Mentioned for his analogy of a "data center full of Einsteins" in the context of traditional intelligence.

Organizations & Institutions

University of Michigan, Ann Arbor - Justin Johnson was a professor there after his PhD.
Meta - Justin Johnson worked there after his PhD.
Stanford's Institute for Human-Centered AI (HAI) - Fei-Fei Li is a founding co-director and advocates for public-sector and academic AI work.
Google - Simultaneously developed image captioning technology with Fei-Fei Li's lab.
Harvard - Mentioned for a research paper on inductive bias in world models.

Research & Studies

BEHAVIOR - An open dataset and benchmark for robotic learning in simulated environments, developed by Fei-Fei Li's Stanford lab.
Image Captioning Paper (CVPR 2015) - The first paper by Andrej Karpathy (Fei-Fei Li's student) on generating sentences for images by combining convolutional neural networks and LSTMs.
Language Modeling Paper (ICLR 2015) - A paper by Justin Johnson and Andrej Karpathy on training RNN language models and analyzing their internal representations, including models trained on the Linux source code.
Dense Captioning Paper (CVPR 2016) - A paper by Justin Johnson, Andrej Karpathy, and Fei-Fei Li on a system that draws bounding boxes around objects in an image and generates a text snippet for each.
Using Inductive Bias to Probe for World Models (Harvard paper) - A paper that explored how LLMs, when fed orbital patterns, could predict orbits but failed to generate correct force vectors, highlighting their limited grasp of the underlying physics.
RTFM model (World Labs) - An internal model at World Labs that generates frames one at a time as a user interacts with the system.
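
For context on the Harvard paper entry above: the "correct force vectors" it refers to are those given by Newton's law of gravitation, F = -G·M·m·r / |r|³. The sketch below only illustrates that ground-truth calculation in two dimensions. It does not reproduce the paper's method, and the function name, the rounded constants, and the Sun-Earth example are assumptions chosen for illustration.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def gravity_force(pos, m_body, m_central):
    """Newtonian force (in newtons) on a body at `pos`, in metres,
    with the central mass fixed at the origin (a simplifying assumption)."""
    x, y = pos
    r = math.hypot(x, y)
    # F = -G * M * m * r_vec / |r|^3 points from the body toward the origin.
    scale = -G * m_central * m_body / r**3
    return (scale * x, scale * y)

# Illustrative, rounded values: Sun and Earth masses in kg, 1 AU in metres.
M_SUN = 1.989e30
M_EARTH = 5.972e24
AU = 1.496e11

fx, fy = gravity_force((AU, 0.0), M_EARTH, M_SUN)
print(f"force on Earth: ({fx:.3e}, {fy:.3e}) N")
```

A model that has learned the underlying physics, rather than only the shape of past trajectories, should recover a vector like this one: pointing at the central mass and falling off with the square of the distance.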

Tools & Software

AlexNet - A convolutional neural network that demonstrated the power of deep learning for image recognition, mentioned as a historical turning point.
ImageNet - A large visual database that was crucial for training AlexNet and advancing computer vision.
GPUs (Graphics Processing Units) - Essential hardware for the rise of deep learning, enabling the scaling of compute.
Convolutional Neural Networks (ConvNets) - Used for representing images in the image captioning work.
LSTM (Long Short-Term Memory) - An early sequential model used in language processing and image captioning.
PyTorch - A deep learning framework, mentioned in the context of building neural networks.
Transformers - A neural network architecture, discussed as a model of sets rather than strictly sequences.
Gaussian Splats - The native output format for 3D worlds generated by Marble, described as tiny, semi-transparent particles with position and orientation.
Sora - Mentioned as a model speculated to use physics engines for video generation.
Genie 3 - Mentioned by name as a video-game-like system, speculated to use physics engines for video generation.
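
The Transformers entry above describes the architecture as a model of sets rather than strictly sequences. A minimal sketch of that idea follows, assuming a single attention head with identity projections and no positional encoding (simplifications chosen here for illustration, not details from the episode). Under those assumptions, shuffling the input tokens shuffles the outputs in exactly the same way, so the layer itself carries no notion of order.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections
    and no positional encoding (illustrative simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 2.0], [1.5, -0.5]]
perm = [2, 0, 1]
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Permutation equivariance: row i of the shuffled output matches row perm[i]
# of the original output, up to floating-point rounding.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for i, p in enumerate(perm)
    for a, b in zip(out_shuffled[i], out[p])
)
print(equivariant)
```

In practice, sequence order enters a Transformer only through added positional information, which is why the same architecture can be applied to unordered inputs.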

Websites & Online Resources

World Labs Homepage - Mentioned as the place to find more information about World Labs' work, including the "Marble Labs" page.
Marble Labs - A specific page on the World Labs website that showcases different use cases for Marble, including visual effects, gaming, and simulation.