Spatial Intelligence: The Next Frontier Beyond Language AI
Resources & Recommendations
Books
- Fei-Fei Li's book (title not specified) - Li mentioned writing in it about the simultaneous development of image captioning by her lab and Google.
People Mentioned
- Yann LeCun - Mentioned as a prominent proponent of world models.
- John Markoff (reporter for The New York Times) - Broke the story about the independent development of image captioning by Google and Fei-Fei Li's lab.
- Andrej Karpathy (Fei-Fei Li's PhD student) - Collaborated with Fei-Fei Li and Justin Johnson on early image captioning and dense captioning research.
- Howard Gardner (Psychologist) - Mentioned for his theory of multiple intelligences, which includes linguistic and spatial intelligence.
- Francis Crick - Co-discoverer of the DNA double helix, mentioned in the context of spatial reasoning for understanding 3D molecular structures.
- James Watson - Co-discoverer of the DNA double helix, mentioned in the context of spatial reasoning for understanding 3D molecular structures.
- Sir Isaac Newton - Referenced for his laws of physics, particularly gravity, and the interplay between empirical spatial understanding and formal language.
- Dario (likely Dario Amodei) - Mentioned for his analogy of a "data center full of Einsteins" in the context of traditional intelligence.
Organizations & Institutions
- University of Michigan, Ann Arbor - Justin Johnson was a professor there after his PhD.
- Meta - Justin Johnson worked there after his PhD.
- Stanford Institute for Human-Centered AI (HAI) - Fei-Fei Li is a founding co-director; mentioned in connection with her advocacy for public-sector and academic AI work.
- Google - Simultaneously developed image captioning technology with Fei-Fei Li's lab.
- Harvard - Mentioned for a research paper on inductive bias in world models.
Research & Studies
- BEHAVIOR (Stanford lab's open dataset and benchmark) - An open dataset and benchmark for robotic learning in simulated environments, developed by Fei-Fei Li's Stanford lab.
- Image Captioning Paper (CVPR 2015) - The first paper by Andrej Karpathy (Fei-Fei Li's student) on generating sentences for images by combining convolutional neural networks and LSTMs.
- Language Modeling Paper (ICLR 2015) - A paper by Justin Johnson and Andrej Karpathy on training RNN language models and analyzing their internal representations, including models trained on the Linux source code.
- Dense Captioning Paper (CVPR 2016) - A paper by Justin Johnson, Andrej Karpathy, and Fei-Fei Li on a system that draws bounding boxes around objects in an image and generates a text snippet for each.
- Inductive Bias to Probe for World Models (Harvard paper) - A paper showing that LLMs fed orbital patterns could predict orbits accurately yet failed to produce correct force vectors, which suggests they had not learned the underlying physics.
- RTFM model (World Labs) - An internal model at World Labs that generates frames one at a time as a user interacts with the system.
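To make the orbit-versus-force distinction from the Harvard paper concrete, here is a minimal sketch of what "correct force vectors" means for a simple case. It is not the paper's method. It assumes an idealized circular orbit, a gravitational parameter `GM = 1.0` in arbitrary units, and hypothetical helper names chosen for illustration. The trajectory alone implies an acceleration (via finite differences), and a model that has captured the physics should agree with Newton's inverse-square law at every point.

```python
import math

GM = 1.0  # gravitational parameter in arbitrary units (assumed for illustration)

def circular_orbit(radius, t):
    """Position on a circular orbit of the given radius at time t."""
    omega = math.sqrt(GM / radius**3)  # angular speed of a circular Kepler orbit
    return (radius * math.cos(omega * t), radius * math.sin(omega * t))

def newtonian_acceleration(pos):
    """Acceleration from Newton's inverse-square law: a = -GM * r / |r|^3."""
    x, y = pos
    r3 = math.hypot(x, y) ** 3
    return (-GM * x / r3, -GM * y / r3)

def implied_acceleration(radius, t, h=1e-3):
    """Acceleration implied by the trajectory alone, via central finite differences."""
    x0, y0 = circular_orbit(radius, t - h)
    x1, y1 = circular_orbit(radius, t)
    x2, y2 = circular_orbit(radius, t + h)
    return ((x0 - 2 * x1 + x2) / h**2, (y0 - 2 * y1 + y2) / h**2)

def force_error(radius, t):
    """Distance between the trajectory-implied and the Newtonian acceleration."""
    ax, ay = implied_acceleration(radius, t)
    bx, by = newtonian_acceleration(circular_orbit(radius, t))
    return math.hypot(ax - bx, ay - by)
```

A predictor can match the positions returned by `circular_orbit` without its internal notion of force matching `newtonian_acceleration`. A check like `force_error` is one simple way to expose that gap.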
Tools & Software
- AlexNet - A convolutional neural network that demonstrated the power of deep learning for image recognition, mentioned as a historical turning point.
- ImageNet - A large visual database that was crucial for training AlexNet and advancing computer vision.
- GPUs (Graphics Processing Units) - Essential hardware for the rise of deep learning, enabling the scaling of compute.
- Convolutional Neural Networks (ConvNets) - Used for representing images in the image captioning work.
- LSTM (Long Short-Term Memory) - An early sequential model used in language processing and image captioning.
- PyTorch - A deep learning framework, mentioned in the context of building neural networks.
- Transformers - A neural network architecture, discussed as a model of sets rather than strictly sequences.
- Gaussian Splats - The native output format for 3D worlds generated by Marble, described as tiny, semi-transparent particles with position and orientation.
- Sora - A video generation model, mentioned amid speculation that it uses physics engines.
- Genie 3 - Described as a video game-like system, also speculated to use physics engines for video generation.
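The description of Transformers as a model of sets rather than sequences can be illustrated with a small sketch. It assumes a single attention head with identity query, key, and value projections and no positional encoding. Those are simplifications for illustration, not how any particular model mentioned here is built. Under these assumptions, shuffling the input tokens simply shuffles the output in the same way, so the layer has no built-in notion of order.

```python
import math

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Output is the softmax-weighted average of the token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / total
                    for j in range(d)])
    return out

def permute(rows, order):
    """Reorder rows according to a list of indices."""
    return [rows[i] for i in order]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of vectors."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
order = [2, 0, 3, 1]

# Attending over shuffled tokens matches shuffling the original attention output.
shuffled_then_attended = self_attention(permute(tokens, order))
attended_then_shuffled = permute(self_attention(tokens), order)
equivariance_gap = max_abs_diff(shuffled_then_attended, attended_then_shuffled)
```

In this sketch `equivariance_gap` is zero up to floating-point rounding. Sequence order only enters a Transformer through additions such as positional encodings or attention masks, which are deliberately left out here.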
Websites & Online Resources
- World Labs Homepage - Mentioned as a place to find more information about their work and a specific page called "Marble Labs" showcasing use cases.
- Marble Labs - A specific page on the World Labs website that showcases different use cases for Marble, including visual effects, gaming, and simulation.