Human-Like Learning, Not RL, Drives Future AI Progress

Original Title: An audio version of my blog post, Thoughts on AI progress (Dec 2025)

The AI Progress Paradox: Why "Almost AGI" Isn't Enough for Economic Revolution

This conversation reveals a fundamental tension in AI development: the gap between impressive benchmark performance and true economic utility. While frontier models demonstrate remarkable capabilities, the reliance on extensive, human-directed training for specific tasks highlights a critical limitation. The core implication is that "human-like learning" -- the ability to generalize, learn on the job, and adapt contextually -- remains the elusive key to widespread AI adoption and transformative economic impact. This analysis is crucial for investors, technologists, and business leaders who are betting on AI's near-term revolution; it suggests that current approaches, while advancing rapidly, may not deliver the expected economic diffusion for years to come, and that true AGI is a more complex problem than simply scaling current methods. Understanding this distinction offers a significant advantage in navigating the AI landscape, avoiding costly misallocations of resources based on premature bullishness.

The Illusion of Progress: Why Pre-Baking Skills Misses the Point

The narrative surrounding AI progress often focuses on impressive benchmark scores and the sheer scale of compute being deployed. However, this focus obscures a more fundamental challenge: the nature of learning itself. While current models are being trained with vast amounts of data and sophisticated reinforcement learning techniques, this often involves "pre-baking" specific skills -- teaching a model to navigate a web browser, use Excel, or even perform complex diagnostic tasks like identifying macrophages on a slide. This approach assumes that AI will continue to struggle with generalization and on-the-job learning, necessitating a laborious process of embedding every potential skill.

This reliance on pre-baked skills creates a paradox. If AI models are indeed approaching human-like learning capabilities, then laboriously baking in each skill becomes largely redundant. Conversely, if they cannot learn to adapt and generalize effectively, then achieving true Artificial General Intelligence (AGI) remains a distant prospect. The current methods, therefore, seem to implicitly acknowledge a limitation that directly contradicts the optimistic timelines for a widespread AI-driven economic takeoff.

"The labs are trying to bake in a bunch of skills into these models through mid training there's an entire supply chain of companies that are building rl environments which teach the model how to navigate a web browser or use excel to build financial models."

This tension is particularly evident in robotics. With a human-like learner, getting current hardware to perform useful tasks would be a largely solved problem. The fact that robots still need millions of practice runs to learn to pick up dishes or fold laundry underscores the absence of this generalized learning ability. The argument that we will first build a "superhuman AI researcher" who then solves learning from experience feels akin to the old joke about losing money on every sale but making it up in volume: it asks a system that lacks the basic learning capabilities of a child to crack a problem researchers have grappled with for decades.

The Economic Slowdown: A Capability Gap, Not Just Diffusion Lag

A common counterargument for the slow diffusion of AI's economic benefits is the concept of "economic diffusion lag" -- the idea that it simply takes time for new technologies to be adopted. However, this perspective is critiqued as potentially being "cope," a way to mask the fact that current models lack the necessary capabilities for broad economic value. If AI were truly akin to a human on a server, its integration would be far faster and more seamless than hiring a human employee. An AI could absorb an entire company's knowledge base in minutes, instantly distilling existing human and AI expertise.

The hiring market for humans is notoriously inefficient, often described as a "lemons market" where distinguishing skilled individuals from less capable ones is difficult and costly. A true AGI, in contrast, would bypass this entirely. Companies would simply spin up instances of a vetted, highly capable AI. The current disparity between the trillions of dollars earned by human knowledge workers and the significantly lower revenues of AI labs starkly illustrates this capability gap.

"People are really under rating how much company and context specific skills are required to do most jobs and there just isn't currently a robust efficient way for ais to pick up these skills."

The observation that AI models still struggle with generalization, even after achieving impressive results on benchmarks, suggests that our definitions of AGI have been too narrow. While models now possess general understanding, few-shot learning, and reasoning abilities that would have once been considered AGI, the economic impact is not yet commensurate. This implies that intelligence and labor involve more than these components, and that the path to true AGI will likely involve further evolution of our definitions and capabilities.

Continual Learning: The Next Frontier, Not an Immediate Revolution

The discussion around scaling often highlights the predictable improvements seen in pre-training, which follow a clear trend. However, the same cannot be said for reinforcement learning from verifiable reward, where trends are less clear and public data points suggest a need for massive increases in compute. This leads to a focus on "continual learning" -- the ability of AI to learn and adapt from experience over time, much like humans do.
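
For a sense of why the pre-training side looks predictable, here is a small sketch assuming a standard power-law form for loss versus training compute; the constants are made up for illustration and are not measurements from any lab. No comparable snippet is shown for RL from verifiable reward because, as noted above, the public trend there is much less clear.

```python
# Why pre-training progress looks "predictable": loss versus compute is commonly
# modeled as a smooth power law, so smaller runs can be extrapolated to larger ones.
# The functional form follows the scaling-law literature; the constants below are
# illustrative assumptions only.

def pretraining_loss(compute_flop: float, a: float = 4e2, b: float = 0.1, c: float = 1.7) -> float:
    """L(C) = a * C**(-b) + c: loss falls smoothly and predictably as compute grows."""
    return a * compute_flop ** (-b) + c

for flop in (1e22, 1e23, 1e24, 1e25):
    print(f"{flop:.0e} FLOP -> predicted loss {pretraining_loss(flop):.2f}")
```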

The current approach of pre-baking skills is inefficient; it's not productive to build custom training pipelines for every specific task in every lab or company. What's needed is an AI that can learn from semantic feedback and self-directed experience, generalizing as humans do. While labs may soon release features they label as "continual learning," achieving human-level, on-the-job learning is likely years away, perhaps five to ten. This suggests that even when breakthroughs occur, the impact will be gradual rather than explosive.
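
As a rough illustration of what "learning from semantic feedback" could minimally mean, here is a hedged sketch in which the agent accumulates free-text corrections and replays them in-context on later tasks. Everything here (ContinualAgent, call_model) is hypothetical scaffolding; real continual learning would presumably go far beyond stuffing notes into a prompt.

```python
# A deliberately naive sketch of "learning from semantic feedback": the agent keeps
# plain-text lessons from past attempts and feeds them back into later prompts.
# This is one simple interpretation for illustration only, not how any lab has said
# they will implement continual learning. `call_model` is a stand-in for a real LLM API.

from dataclasses import dataclass, field


def call_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"(model response to a {len(prompt)}-character prompt)"


@dataclass
class ContinualAgent:
    lessons: list = field(default_factory=list)

    def attempt(self, task: str) -> str:
        # Carry earlier corrections forward in-context; a real system would
        # retrieve selectively or consolidate them into weights over time.
        notes = "\n".join(self.lessons)
        return call_model(f"Lessons from earlier work:\n{notes}\n\nTask: {task}")

    def give_feedback(self, note: str) -> None:
        # "Semantic feedback": a free-text correction the agent can reuse later,
        # rather than a scalar reward or a purpose-built RL environment.
        self.lessons.append(note)


agent = ContinualAgent()
agent.attempt("Draft the quarterly revenue summary.")
agent.give_feedback("Reconcile revenue against the finance ledger, not the CRM export.")
agent.attempt("Draft the quarterly revenue summary.")  # second attempt carries the correction
```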

"The pattern repeats everywhere Chen looked: distributed architectures create more work than teams expect. And it's not linear--every new service makes every other service harder to understand."

This gradual rollout is further reinforced by the competitive landscape. Previous perceived "flywheels" like user engagement or synthetic data have done little to diminish the intense competition between model companies. Talent poaching, industry rumors, and reverse engineering seem to neutralize any sustainable advantage held by a single lab. Therefore, even a breakthrough in continual learning is unlikely to result in a sudden, dominant market position. Instead, it will likely be replicated and improved upon by competitors, leading to a more distributed and evolutionary adoption of advanced AI capabilities.

Key Action Items

  • Immediate Action: Re-evaluate current AI investment strategies, shifting focus from benchmark performance to demonstrable on-the-job learning and generalization capabilities.
  • Immediate Action: Prioritize the development and integration of AI systems that can learn from semantic feedback and self-directed experience, rather than relying solely on pre-trained skills.
  • Immediate Action: Challenge internal assumptions about AI's immediate economic impact, recognizing the significant gap between current AI capabilities and human-level adaptability.
  • Next 6-12 Months: Invest in research and development focused on solving the challenges of continual learning and context-specific skill acquisition in AI models.
  • Next 12-18 Months: Pilot AI solutions in areas where generalization and adaptation are critical, even if initial performance is lower than pre-trained, task-specific models.
  • 1-2 Years: Develop internal frameworks for evaluating AI "labor" not just on task completion, but on its capacity for learning, adaptation, and long-term value creation.
  • Longer-Term Investment: Foster a culture that embraces the iterative nature of AI development, understanding that true AGI and its economic benefits will unfold over time, not as an overnight revolution.

---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.