Spatial Intelligence: AI's Next 3D Frontier Unlocked
The Frontier of Spatial Intelligence with Fei-Fei Li
Resources
Books
- "The Bitter Lesson" (essay by Rich Sutton) - Argues that general methods which scale with available compute tend to win out over approaches built on hand-crafted algorithmic cleverness.
Research & Studies
- "AlexNet" (2012 paper) - Credited as the breakthrough moment for deep learning in computer vision, demonstrating a deep neural network's success on the ImageNet challenge.
- "Neural Radiance Fields" (NeRF) (Ben Mildenhall) - This paper presented a clear method for reconstructing 3D structure from 2D observations, significantly impacting the field of 3D computer vision.
- "A Neural Algorithm of Artistic Style" (Leon Gatys) - This 2015 paper demonstrated the ability to transfer artistic styles to real-world photographs using neural networks, a precursor to generative AI.
People Mentioned
- Fei-Fei Li (Co-founder of World Labs) - A prominent researcher in AI and computer vision, known for her work on ImageNet and her current focus on spatial intelligence.
- Justin Johnson (Co-founder of World Labs) - A researcher who made significant contributions to generative AI and spatial intelligence.
- Martin Casado (a16z General Partner) - Co-host of the discussion, providing insights from the venture capital perspective.
- Honglak Lee (Google Brain) - Mentioned in relation to an early influential paper on deep learning.
- Andrew Ng (Google Brain) - Mentioned in relation to an early influential paper on deep learning and for teaching machine learning.
- Pietro Perona (Caltech) - The undergraduate advisor for Justin Johnson and the PhD advisor for Fei-Fei Li.
- Daphne Koller - Mentioned as an instructor of a complicated Bayesian modeling course.
- Geoffrey Hinton - Mentioned for his papers on generative models.
- Leon Gatys - Lead author of the paper on artistic style transfer.
- Christoph Lassner (Co-founder of World Labs) - Recognized for his work in computer graphics, including a precursor to Gaussian splat representations.
- Ben Mildenhall (Co-founder of World Labs) - Known for his seminal work on NeRF.
Organizations & Institutions
- World Labs - A company focused on developing spatial intelligence for machines.
- Caltech - Where Justin Johnson completed his undergraduate studies and Fei-Fei Li completed her PhD.
- Stanford - Where Fei-Fei Li was a professor and where the ImageNet project was significantly developed.
- Google Brain - Where early influential papers in deep learning were published.
- OpenAI - Mentioned in the context of large language models and multimodal models.
- Nvidia - Mentioned for its high-performance GPUs.
- FAIR (Meta AI) - Where Justin Johnson worked on 3D computer vision.
Websites & Online Resources
- ImageNet - A large-scale dataset of images used for computer vision research, instrumental in the development of modern computer vision.
- arXiv - A preprint server where research papers are often first posted.
- X (formerly Twitter) - Mentioned as a platform for following a16z.
- a16z.com - The website for a16z, including disclosures.
- a16z.substack.com - The a16z newsletter on Substack.
Other Resources
- The Cat Paper - An early, famous paper on deep learning from Google Brain.
- "The Bitter Lesson" - A concept in AI that highlights the importance of scaling compute over specific algorithmic cleverness.
- Transformer paper ("Attention Is All You Need") - An algorithmic unlock that has been foundational for modern AI, particularly language models.
- Stable Diffusion - A generative AI model mentioned as a key unlock in the current wave of AI.
- CLIP - A model mentioned in the context of learning image understanding from internet data, where human-written text such as alt tags serves as the labeling.
- GANs (Generative Adversarial Networks) - A type of generative model described as difficult to work with; early work paired them with structured input such as scene graphs.
- LSTM (Long Short-Term Memory) - A type of recurrent neural network architecture used before transformers.
- RNN (Recurrent Neural Network) - A type of neural network architecture.
- GRU (Gated Recurrent Unit) - A type of recurrent neural network architecture.
- GPT-2 - A large language model mentioned as requiring significant resources to train.
- Scene Graphs - A structured way of representing objects and their relationships, used as input for early generative models.
- Gaussian Splat Representation - A 3D scene representation technique that has recently gained traction.
- VR Headset - Mentioned as a transformative technology experience.
- Apple Vision Pro - A spatial computing device released by Apple.