LLMs Excel at Correlation, Not Causation; AGI Requires Plasticity

Original Title: What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado

The current generation of Large Language Models (LLMs) is a family of powerful pattern-matching machines, not nascent intelligences. While these models excel at predicting the next token with remarkable mathematical precision, that capability is fundamentally different from understanding cause and effect or possessing true learning plasticity. This conversation examines what goes wrong when correlation is mistaken for causation, and sets out the critical, often overlooked requirements for achieving Artificial General Intelligence (AGI). Those who grasp this distinction gain an advantage by focusing on the architectural and conceptual leaps needed for genuine intelligence, rather than solely on scaling existing models.

The Illusion of Understanding: Why LLMs Aren't Thinking

The dazzling capabilities of LLMs, from writing code to translating natural language into previously unseen domain-specific languages, have led many to believe we are on the cusp of AGI. However, as Professor Vishal Misra explains, this impressive performance is rooted in sophisticated pattern matching, not genuine comprehension or learning. The core of LLMs, Misra argues, is the transformer architecture, which, when mathematically modeled, behaves like a colossal matrix. Each row corresponds to a prompt, and the entries across that row's columns form a probability distribution over the next token. When you feed in a prompt, the LLM constructs a distribution of likely next tokens, samples from it, appends the sampled token to the prompt, and repeats. This process, while incredibly effective at generating coherent text, is fundamentally about identifying correlations within vast datasets.
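The matrix picture above can be made concrete with a toy sketch. Everything here is an illustrative assumption: the table `NEXT_TOKEN_TABLE`, its tiny vocabulary, and the function `generate` are hypothetical names, and a real transformer computes each row on the fly rather than storing it.

```python
import random

# Hypothetical toy "matrix": each key is a prompt (a row), and each value is
# that row's probability distribution over the next token.
NEXT_TOKEN_TABLE = {
    "the": {"cat": 0.6, "dog": 0.4},
    "the cat": {"sat": 0.7, "ran": 0.3},
    "the dog": {"ran": 0.8, "sat": 0.2},
    "the cat sat": {"<eos>": 1.0},
    "the cat ran": {"<eos>": 1.0},
    "the dog ran": {"<eos>": 1.0},
    "the dog sat": {"<eos>": 1.0},
}


def generate(prompt, rng, max_tokens=10):
    """Look up the row for the current prompt, sample a token, append, repeat."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        row = NEXT_TOKEN_TABLE.get(" ".join(tokens))
        if row is None:
            break  # this toy table has no row for the prompt
        candidates = list(row.keys())
        weights = list(row.values())
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return " ".join(tokens)


print(generate("the", random.Random(0)))
```

Nothing in the loop represents why one token follows another; it only reads off conditional frequencies, which is the sense in which the process is correlational.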

Misra's research, detailed in a series of papers, uses "Bayesian wind tunnels" to rigorously test these architectures. In controlled environments where memorization is impossible and the correct probabilistic outcome can be calculated analytically, transformers perform with astonishing accuracy, matching the theoretically correct Bayesian posterior. This demonstrates they are performing Bayesian updating--adjusting their beliefs based on new evidence--but only within the confines of their training data and architecture.

"The whole idea behind the bayesian wind tunnel was unlike these production llms where you don't know what they've been trained on so you cannot mathematically compute the posterior so again how do you prove it i mean it looks bayesian you know from the first paper from the first paper it looks bayesian but you know so the wind tunnel sort of solved that problem for us let's start with a blank architecture give it a task where we know what the answer is it cannot memorize it let's see what it does"

-- Vishal Misra
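To give a feel for what "the answer can be calculated analytically" means, here is a minimal sketch of the reference side of such a test: an exact Bayesian posterior over a small set of hypotheses. The coin-flip task, the hypothesis names, and the functions `posterior` and `predictive_heads` are assumptions chosen for illustration; they are not the tasks or code from Misra's papers.

```python
from fractions import Fraction


def posterior(prior, p_heads, observations):
    """Exact Bayesian update over discrete hypotheses.

    prior:        dict mapping hypothesis -> P(hypothesis)
    p_heads:      dict mapping hypothesis -> P(heads | hypothesis)
    observations: iterable of "H" or "T"
    """
    post = dict(prior)
    for obs in observations:
        for h in post:
            likelihood = p_heads[h] if obs == "H" else 1 - p_heads[h]
            post[h] *= likelihood
        total = sum(post.values())
        post = {h: p / total for h, p in post.items()}  # renormalize
    return post


def predictive_heads(post, p_heads):
    """Posterior predictive probability that the next flip is heads."""
    return sum(post[h] * p_heads[h] for h in post)


# Hypothetical setup: the coin is either fair or biased towards heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

post = posterior(prior, p_heads, "HH")
print(post)                             # fair: 4/13, biased: 9/13
print(predictive_heads(post, p_heads))  # 35/52
```

In a wind-tunnel style experiment, a model trained from scratch on such a task would have its next-token probabilities compared against numbers like these, which is possible only because the ground truth is known exactly.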

This precision, however, is a double-edged sword. It highlights the critical difference between correlation and causation. Humans, Misra points out, don't just update beliefs; we build causal models. When a pen is thrown toward us, we don't merely correlate its trajectory with a likely impact; we simulate the outcome and act to avoid it. This ability to understand cause and effect, to perform interventions and counterfactuals, is what current LLMs lack. They operate in the realm of "Shannon entropy"--learning correlations--but have not yet crossed over to "Kolmogorov complexity"--understanding the underlying generative program or causal structure.

This distinction has profound implications for the future of AI. The common belief that simply scaling up LLMs will lead to AGI rests on a flawed premise. Scale might improve pattern matching, but it won't imbue models with the capacity for true learning or causal reasoning. The "data gravity" of the training data can even act as a constraint, preventing models from discovering novel representations or theories. A thought experiment makes the point: train an LLM only on pre-1916 physics and ask whether it could derive relativity.

The Two Pillars of AGI: Plasticity and Causation

Achieving AGI, according to Misra, hinges on overcoming two fundamental limitations of current LLMs: the lack of plasticity and the absence of causal modeling.

1. Plasticity: The Frozen Weights Problem

LLMs, once trained, have their weights frozen. While they can perform impressive feats of "in-context learning" during a single session--like translating natural language into a domain-specific language without prior exposure--this learning is ephemeral. Each new conversation or query starts from a blank slate; the model doesn't retain knowledge or adapt its underlying structure from previous interactions. This is a stark contrast to human learning, where our brains remain plastic throughout our lives, constantly updating and integrating new information.

Misra's early work with GPT-3, translating natural language into a DSL for querying a cricket database, exemplifies this. The model could perform the task through few-shot learning, guided by examples provided in the prompt. However, it didn't "learn" the DSL in a way that persisted. This lack of continuous learning means LLMs are fundamentally limited in their ability to adapt and evolve beyond their initial training. The challenge of "continual learning" in AI research is precisely about addressing this, balancing the acquisition of new knowledge against the risk of "catastrophic forgetting"--where learning new things erases previous knowledge.

"what happens with llms is once the training is done those weights are frozen when you're doing an inference for instance in context learning or anything during that conversation okay you're doing bayesian inference but then you forget the next time a new conversation starts with zero context you don't retain any learning that happened in the previous instance"

-- Vishal Misra
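The frozen-weights behaviour in the quote can be sketched as an analogy. In this toy, a fixed Beta-style prior stands in for the trained weights and a per-conversation counter stands in for the context window; the classes `FrozenModel` and `Session` are hypothetical and say nothing about how a transformer is implemented internally.

```python
class Session:
    """One conversation: starts from the frozen prior and updates only itself."""

    def __init__(self, prior):
        self.heads, self.tails = prior  # private copy for this conversation

    def observe(self, token):
        # In-context evidence changes the session state, never the model.
        if token == "H":
            self.heads += 1
        else:
            self.tails += 1

    def predict_heads(self):
        return self.heads / (self.heads + self.tails)


class FrozenModel:
    """Toy stand-in for a trained LLM: the prior plays the role of frozen weights."""

    def __init__(self, prior_heads=1, prior_tails=1):
        self._prior = (prior_heads, prior_tails)  # never modified after "training"

    def new_session(self):
        return Session(self._prior)


model = FrozenModel()

first = model.new_session()
for token in "HHH":
    first.observe(token)
print(first.predict_heads())   # belief has shifted within this conversation

second = model.new_session()
print(second.predict_heads())  # back to the prior: nothing was retained
```

A plastic system would instead write some of what `first` learned back into the model itself, which is exactly where the risk of catastrophic forgetting arises.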

2. Causation: Moving Beyond Correlation

The second critical missing piece is the move from correlation to causation. LLMs are masters of association--identifying patterns and correlations in data. However, they do not build models of cause and effect. Such models are essential for true intelligence, enabling capabilities like simulation, intervention, and counterfactual reasoning. Judea Pearl's causal hierarchy clearly delineates these levels: association (what LLMs do), intervention (understanding the effect of changing a variable), and counterfactuals (what would have happened if something had been different). Current LLM architectures are largely confined to the first level.
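The gap between the first two levels of the hierarchy can be shown with a small worked example. The sketch below assumes a hypothetical three-variable model in which a hidden factor Z drives both X and Y, while X has no causal effect on Y at all; the probabilities and the function names `observe` and `intervene` are invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical structural model: Z -> X and Z -> Y, with no arrow X -> Y.
P_Z = {1: F(1, 2), 0: F(1, 2)}            # P(Z = z)
P_X1_GIVEN_Z = {1: F(9, 10), 0: F(1, 10)}  # P(X = 1 | Z = z)


def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def p_y1_given(x, z):
    # Y depends only on Z; x is accepted but deliberately ignored.
    return F(4, 5) if z == 1 else F(1, 5)


def observe(x):
    """Level 1, association: P(Y = 1 | X = x), by conditioning."""
    joint = sum(P_Z[z] * p_x_given_z(x, z) * p_y1_given(x, z) for z in P_Z)
    evidence = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return joint / evidence


def intervene(x):
    """Level 2, intervention: P(Y = 1 | do(X = x)).

    Setting X by hand cuts the Z -> X edge, so Z keeps its own distribution.
    """
    return sum(P_Z[z] * p_y1_given(x, z) for z in P_Z)


print(observe(1), observe(0))      # 37/50 and 13/50: X looks highly predictive
print(intervene(1), intervene(0))  # 1/2 and 1/2: forcing X changes nothing
```

A learner that sees only observational data recovers the first pair of numbers; answering the second question requires knowing the causal structure. The third level, counterfactuals, is not covered by this sketch.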

The hypothetical "Einstein test"--training an LLM on pre-1916 physics and seeing if it can derive relativity--underscores this point. An LLM might identify correlations in the data, but it would not make the conceptual leap required to reformulate spacetime. That leap requires a new representation, a new "manifold," as Misra puts it, that explains the observed phenomena causally. Donald Knuth's recent experiments showcase LLMs' ability to solve complex mathematical problems such as finding Hamiltonian cycles, yet they highlight the same limitation: the LLMs could find solutions with guidance and updated context, but it was Knuth's human intellect that synthesized these findings into a novel proof, creating a new manifold of understanding.

The path to AGI, therefore, requires not just more data or compute, but a fundamental shift in architecture and approach--one that enables continuous learning and the development of causal models.

Navigating the Path to True Intelligence

Based on this analysis, here are actionable takeaways for individuals and organizations navigating the evolving landscape of AI:

  • Immediate Actions (Next 1-3 Months):

    • Reframe LLM capabilities: Shift internal understanding from "intelligence" to "advanced pattern matching." This prevents over-reliance on current LLMs for tasks requiring true reasoning or causal understanding.
    • Identify "correlation-dependent" workflows: Audit processes that currently rely on LLMs. Flag areas where a misunderstanding of causation could lead to significant downstream errors or missed opportunities.
    • Prioritize data quality over quantity for training: For any custom model development, focus on curated, causally relevant datasets rather than simply massive, undifferentiated dumps.
    • Experiment with DSLs for specific tasks: Explore creating domain-specific languages to bridge natural language queries with structured data, as demonstrated by Misra's cricket database example. This can improve LLM utility for well-defined problems.
  • Longer-Term Investments (6-18+ Months):

    • Invest in research on causal AI: Allocate resources to exploring and experimenting with architectures and methodologies that move beyond correlation to causation. This is where future breakthroughs will likely occur.
    • Develop frameworks for continual learning: Support or develop systems that allow models to learn and adapt over time without catastrophic forgetting. This is crucial for creating AI that truly evolves.
    • Foster interdisciplinary AI teams: Build teams that include not just ML engineers but also domain experts, mathematicians, and philosophers of science. This diverse perspective is vital for tackling the complex conceptual challenges of AGI.
    • Define and pursue an "Einstein Test" equivalent: For your specific domain, establish a high-bar challenge that requires true causal understanding and theoretical innovation, not just pattern recognition, to overcome. This will focus R&D efforts on the right problems.
    • Prepare for a paradigm shift: Recognize that the next major leap in AI may not be evolutionary (bigger LLMs) but revolutionary (new architectures focused on causality and plasticity). Invest in understanding and potentially building these next-generation systems.

---
This content is a personally curated review and synopsis derived from the original podcast episode.