The AI models we deploy today are like the protagonist of Memento: stuck in a perpetual present, unable to truly learn from their experiences. Current methods such as long context windows and retrieval-augmented generation (RAG) offer sophisticated workarounds, but they mask a fundamental limitation: these systems are largely frozen after their initial training. This conversation with Malika Abakirova, a partner on the AI infrastructure team at a16z, explores the less obvious implications of that paradigm. Her argument is that the "ultimate test" for AI lies not in reasoning or generation but in the capacity for genuine, on-the-job learning and improvement, much as humans learn. Anyone building or relying on AI, from researchers to product managers, benefits from understanding where current approaches fall short and where the emerging field of continual learning is headed.
The Frozen Present: Why In-Context Learning Isn't Enough
The current AI landscape is dominated by what's termed "in-context learning," a paradigm where models can process and respond to information provided within their immediate context window. This approach, exemplified by tools that leverage file systems or extensive prompts, has undeniably delivered impressive results. However, as Malika Abakirova explains, it is akin to the sticky notes and tattoos that the amnesiac protagonist of Memento relies on. The core model remains static; it doesn't fundamentally update its knowledge or capabilities based on new interactions.
This limitation becomes starkly apparent in scenarios that require real adaptation. Consider adversarial security, where a new "jailbreak" attack emerges. Simply updating the system prompt (the AI equivalent of a note) won't suffice if the model's underlying parameters are already tuned to be helpful in ways the attack can exploit. Knowledge of the new attack vector needs to be embedded more deeply, in the model's weights, which the attacker cannot access. Similarly, when a software library like React ships a breaking change, a model trained on an older version will struggle: context can point it to the new API, but context alone has a hard time overriding the model's deeply ingrained "knowledge" of an old function that no longer exists.
"The model is basically frozen, but new experiences and knowledge still persist. Humans are not AI, but we still learn on the job; we learn from experience, and that's what makes humans unique."
The problem isn't that these systems fail to retrieve information, but that they fail to learn from it in a way that permanently alters their behavior or knowledge. We've built elaborate scaffolding (agent harnesses, RAG systems, system prompts) to compensate for this inherent lack of adaptability. These workarounds are effective, but they raise a critical question: are we merely papering over a fundamental limitation, or have we hit the ceiling of what the frozen paradigm can achieve? The implication is that relying solely on external mechanisms may be a temporary fix, one that yields fragile systems when challenges are novel or evolving. The real opportunity lies in moving beyond these external layers to models that can genuinely learn and adapt.
The Compaction Spectrum: From Non-Parametric to Parametric Evolution
The conversation delves into a framework for understanding where "learning" or, more precisely, "compaction" occurs within AI systems, categorizing it into three buckets: context, modules, and weights. This spectrum reveals the trade-offs and limitations inherent in each approach, offering a systems-level view of AI development.
Context (Non-Parametric Learning): This is the realm of in-context learning, where models draw on external data stores or large context windows. Companies such as Pinecone, which builds infrastructure for RAG systems, and Latta or Mantis, which build agent harnesses, fall here. The primary constraint is finite context length, so the challenge is not whether the approach works but how to use that limited context most efficiently. This approach is powerful for accessing vast amounts of information but cannot permanently integrate new knowledge into the model's core. It’s like a student who can look up answers in a textbook but never internalizes the concepts.
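To make the context bucket concrete, here is a minimal Python sketch of non-parametric "learning": new knowledge lives in an external store and is packed into a fixed budget at query time, while the model itself never changes. Everything here is an assumption for illustration: `build_prompt` and its helpers are hypothetical, bag-of-words cosine similarity stands in for a learned embedding, and whitespace word counts stand in for a real tokenizer. It is not how Pinecone or any particular product works.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens; a crude stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(query: str, notes: list[str], budget_tokens: int) -> str:
    """Rank stored notes by similarity to the query and pack as many as fit in the budget."""
    q = bag_of_words(query)
    ranked = sorted(notes, key=lambda n: cosine(q, bag_of_words(n)), reverse=True)
    packed, used = [], 0
    for note in ranked:
        cost = len(note.split())  # hypothetical token count: whitespace-separated words
        if used + cost > budget_tokens:
            continue  # skip any note that would overflow the context budget
        packed.append(note)
        used += cost
    return "\n".join(packed + [f"Question: {query}"])


notes = [
    "v2 renamed fetch_data to load_data",
    "the cafeteria closes at noon",
    "load_data in v2 returns a dict",
]
# With an 11-word budget, both load_data notes fit and the unrelated note is dropped.
print(build_prompt("how do I call load_data in v2", notes, budget_tokens=11))
```

Shrinking the budget to 8 keeps only the single most relevant note. Deciding what earns a place in the window, rather than retrieval itself, is the constraint this bucket lives with.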
Modules (Hybrid Approaches): This middle ground explores updating parts of the model without retraining the entire thing. The mention of a Stanford paper on updating KV caches hints at techniques that modify specific components of the model's memory or processing. This offers a potential balance, allowing for more dynamic adaptation than pure context-based methods, but without the full complexity of weight updates. It’s akin to a student who can update specific chapters in their notes but not rewrite the entire textbook.
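The module bucket can be illustrated with a frozen base model plus a small, separately trained component. The sketch below is a generic stand-in under stated assumptions, not the Stanford KV-cache technique: `Linear` and `AdaptedModel` are hypothetical toy classes, and the "module" is a zero-initialized linear adapter whose output is added to the base model's. Only the adapter's parameters receive gradient updates.

```python
from dataclasses import dataclass


@dataclass
class Linear:
    weights: list[float]
    bias: float = 0.0

    def __call__(self, x: list[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


@dataclass
class AdaptedModel:
    base: Linear     # frozen: never modified below
    adapter: Linear  # small trainable module; a no-op while its parameters are zero

    def __call__(self, x: list[float]) -> float:
        return self.base(x) + self.adapter(x)

    def train_adapter(self, data, lr: float = 0.1, epochs: int = 200) -> None:
        """Per-example SGD on 0.5 * (prediction - y)^2, updating adapter parameters only."""
        for _ in range(epochs):
            for x, y in data:
                err = self(x) - y  # gradient of the loss w.r.t. the prediction
                for i, xi in enumerate(x):
                    self.adapter.weights[i] -= lr * err * xi
                self.adapter.bias -= lr * err


base = Linear(weights=[2.0], bias=0.0)  # hypothetical "pretrained" model: y = 2x
model = AdaptedModel(base=base, adapter=Linear(weights=[0.0], bias=0.0))
new_data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]  # the world shifted: y = 2x + 1
model.train_adapter(new_data)
print(model([3.0]), base.weights, base.bias)
```

Because the base is untouched, resetting the adapter's parameters to zero restores the original behavior exactly, which is part of the appeal of middle-ground approaches.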
Weights (Parametric Learning): This is the frontier of true continual learning, where the model's core parameters are updated through experience. This is where the most profound, yet nascent, progress is being made. Abakirova notes that this field is still in its early stages, with various teams exploring different paradigms. Some focus on reinforcement learning data and systems, while others question the fundamental transformer architecture itself, suggesting that novel architectures are needed for genuine, continuous learning. This is the ideal state: a student who not only looks up information but truly understands, integrates, and builds upon it, becoming demonstrably smarter over time.
The critical insight here is that while all these mechanisms involve learning, the depth and permanence of that learning vary dramatically. The current reliance on context is a workaround, not a solution for continuous improvement. The real advantage lies in developing systems that can genuinely update their weights, creating a learning loop where use directly translates to improvement, mirroring human cognitive development.
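The learning loop of the weights bucket, where use translates directly into improvement, can be sketched as online gradient descent on a toy linear model. This is an assumption-laden illustration, not any lab's continual-learning method: `run_stream` is a hypothetical helper, the model is two-parameter linear regression, and each interaction contributes one SGD step on squared error. A frozen copy of the same model is scored on the same stream for contrast.

```python
def predict(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w: list[float], x: list[float], y: float, lr: float) -> list[float]:
    """One gradient step on 0.5 * (prediction - y)^2 with respect to every weight."""
    err = predict(w, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


def run_stream(stream, w0, lr: float = 0.1, learn: bool = True):
    """Score each example before seeing its label, then (optionally) learn from it."""
    w = list(w0)
    errors = []
    for x, y in stream:
        errors.append((predict(w, x) - y) ** 2)
        if learn:
            w = sgd_step(w, x, y, lr)
    return w, errors


# Features are [x, 1.0], so w = [slope, intercept]. The "pretrained" weights fit y = 2x,
# but after deployment the world behaves like y = 2x + 1.
stream = [([x, 1.0], 2 * x + 1) for x in (0.0, 1.0, 2.0)] * 200
_, frozen_errors = run_stream(stream, [2.0, 0.0], learn=False)
w_learned, learned_errors = run_stream(stream, [2.0, 0.0], learn=True)
print(frozen_errors[-1], learned_errors[-1], w_learned)
```

The frozen copy's error stays at 1.0 on every example, while the learning copy's error shrinks toward zero as its weights absorb the shift.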
The Memento Metaphor and the Future of AI
The recurring Memento metaphor is not just a stylistic choice; it's a potent encapsulation of the current AI paradigm's core limitation: a frozen present. The protagonist, Leonard Shelby, cannot form new memories, forcing him to rely on external aids to navigate his life. Similarly, today's AI models, after their initial training cutoff, operate without the ability to genuinely integrate new experiences into their core understanding.
Malika Abakirova posits that the "ultimate test" for AI is its capacity for continual learning: the ability to learn on the job and improve through use, just as humans do. This isn't about achieving Artificial General Intelligence (AGI) in a brute-force sense, but about replicating the more nuanced, adaptive learning process that defines human intelligence. Benchmarks designed specifically to measure continual learning, which researchers at Berkeley and other labs are now building, are crucial for defining and advancing this capability.
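As one example of what such benchmarks measure, much of the continual-learning literature trains a model on a sequence of tasks and records an accuracy matrix `R`, where `R[i][j]` is accuracy on task `j` after training through task `i`. The sketch below computes two widely used summaries: average final accuracy, and backward transfer (how later training changed performance on earlier tasks; negative values indicate forgetting). It is a generic illustration, not the Berkeley benchmark's methodology, and the numbers in the usage example are made up for demonstration.

```python
def continual_metrics(R: list[list[float]]) -> tuple[float, float]:
    """Return (average final accuracy, backward transfer) for a T x T accuracy matrix.

    ACC = (1/T) * sum_j R[T-1][j]
    BWT = (1/(T-1)) * sum_{j < T-1} (R[T-1][j] - R[j][j])
    """
    T = len(R)
    if T < 2 or any(len(row) != T for row in R):
        raise ValueError("R must be a square matrix covering at least two tasks")
    final = R[T - 1]
    avg_acc = sum(final) / T
    # Compare each earlier task's final accuracy with its accuracy right after it was learned.
    bwt = sum(final[j] - R[j][j] for j in range(T - 1)) / (T - 1)
    return avg_acc, bwt


# Illustrative (invented) results for three tasks learned in order.
R = [
    [0.90, 0.10, 0.10],
    [0.70, 0.85, 0.20],
    [0.60, 0.75, 0.80],
]
acc, bwt = continual_metrics(R)
print(f"average accuracy={acc:.3f}, backward transfer={bwt:.3f}")
```

A static leaderboard score corresponds roughly to a single entry of `R`; a continual-learning benchmark cares about the whole matrix, including whether learning something new erodes what came before.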
The implication is that the very definition of an AI "model" may need to evolve. Instead of static entities trained once, we might need to think of them as dynamic systems that are constantly refining themselves. This shift has profound consequences for how we build, deploy, and interact with AI. It suggests that the most advanced AI won't just be the most powerful in a single moment, but the one that demonstrably gets better and more capable over time, adapting to new information and user interactions.
"Is there a system that is able to learn on the job and get better through use, just like humans? I think that would be kind of the question."
The emergence of early examples of "on-the-job learning," like test-time training that allows models to adapt to out-of-distribution data, offers a glimpse into this future. These aren't just incremental improvements; they represent a fundamental departure from the frozen model paradigm. The competitive advantage will accrue to those who can build or leverage systems that don't just respond to the world but actively learn from it, creating a virtuous cycle of improvement and adaptation.
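Test-time training proper takes gradient steps on a self-supervised objective for each test input. A much simpler relative, sketched below purely as an illustration, adapts only a model's input-normalization statistics to an unlabeled, shifted test batch (in the spirit of re-estimating batch-normalization statistics). `StandardizedThreshold` and `accuracy` are hypothetical helpers, the data are invented, and no labels are used during adaptation.

```python
import statistics


class StandardizedThreshold:
    """Hypothetical one-feature classifier: predicts 1 if the standardized input exceeds a threshold."""

    def __init__(self, mean: float, std: float, threshold: float = 0.0):
        self.mean, self.std, self.threshold = mean, std, threshold

    def predict(self, xs: list[float]) -> list[int]:
        return [int((x - self.mean) / self.std > self.threshold) for x in xs]

    def adapt(self, unlabeled_xs: list[float]) -> None:
        """Re-estimate normalization statistics from a test batch; labels are never seen."""
        self.mean = statistics.fmean(unlabeled_xs)
        self.std = statistics.pstdev(unlabeled_xs) or 1.0  # guard against a zero spread


def accuracy(preds: list[int], labels: list[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Training data were centered at 0 with unit spread; the test distribution has shifted by +5.
model = StandardizedThreshold(mean=0.0, std=1.0)
test_xs = [4.0, 4.2, 3.8, 6.0, 5.8, 6.2]
test_labels = [0, 0, 0, 1, 1, 1]  # used only for scoring, never for adaptation
before = model.predict(test_xs)
model.adapt(test_xs)
after = model.predict(test_xs)
print(accuracy(before, test_labels), accuracy(after, test_labels))
```

Before adaptation every shifted input lands above the old threshold, so half the batch is misclassified; after re-centering on the test batch, the two groups separate cleanly.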
- Embrace the Memento Analogy: Recognize that current AI models are largely "frozen" after training. Understand the limitations this imposes on their ability to adapt to new information or evolving circumstances.
- Prioritize Weight Updates over Context Augmentation: While RAG and large context windows are useful, focus on research and development that aims to update the model's core parameters (weights) for true, lasting learning. This is a longer-term investment but offers greater durability.
- Explore Hybrid Learning Mechanisms: Investigate and experiment with "module" level updates or other middle-ground approaches that offer more dynamic adaptation than purely non-parametric methods, but with less complexity than full weight retraining.
- Develop Continual Learning Benchmarks: Support or contribute to the creation of robust benchmarks that accurately measure an AI's ability to learn on the job and improve through use, moving beyond static performance metrics.
- Redefine "Model" as a Dynamic System: Shift the mindset from viewing AI models as static artifacts to dynamic systems capable of continuous improvement and adaptation. This requires a strategic re-evaluation of deployment and update cycles.
- Invest in "On-the-Job" Learning Capabilities: Seek out or build AI systems that demonstrate early signs of test-time adaptation and out-of-distribution learning. This capability will be a key differentiator for systems facing unpredictable real-world environments.
- Foster Cross-Disciplinary Research: Encourage collaboration between AI researchers, engineers, and cognitive scientists to bridge the gap between current AI limitations and the nuanced learning processes observed in humans. This is a multi-year endeavor.