AI Accelerates Automation by Quantifying Economic Value and Reshaping Expertise

Original Title: Brendan Foody on Teaching AI and the Future of Knowledge Work

The AI Training Revolution: Beyond Raw Text to Rubrics and Real-World Value

This conversation with Brendan Foody, co-founder of Mercor, reveals a critical and often overlooked shift in the AI landscape: the move from simply feeding models vast amounts of text to meticulously defining and measuring success through expert-crafted rubrics. The non-obvious implication is that the true bottleneck in AI advancement is not computational power or raw data but the human expertise required to teach models what good looks like, especially in economically valuable domains. This insight matters to anyone in tech, AI development, or business strategy who wants to understand the next frontier of AI utility and competitive advantage. By focusing on the quality of AI evaluation, the discussion offers a roadmap for building more impactful AI, and a significant edge for those who grasp its strategic importance.

The Hidden Cost of "Good Enough" AI: Why Rubrics Trump Raw Text

The prevailing narrative in AI development often centers on the sheer volume of data and the increasing sophistication of models. However, Brendan Foody, the remarkably young founder of Mercor, a company that sources experts to train frontier AI models, highlights a more nuanced and arguably more critical factor: the quality of evaluation. Mercor's business model, which includes paying top poets $150 an hour, underscores a fundamental truth: teaching AI to perform economically valuable tasks requires more than just data; it demands expert-defined standards of success. This is where the concept of "rubrics" becomes paramount, shifting the focus from mere output generation to the precise measurement of desired outcomes.

Foody argues that much of the AI research community has been fixated on academic benchmarks--like graduate-level reasoning tests or math olympiads--which are disconnected from real-world applications. The true challenge lies in evaluating AI's ability to, for instance, automate medical diagnoses or draft legal documents. Mercor’s approach, working with luminaries like Larry Summers for finance, Cass Sunstein for law, and Eric Topol for medicine, aims to bridge this gap. These experts are not just academics; they possess a broad, industry-wide vantage point crucial for designing effective evaluation frameworks.

The core insight here is that the rate of model improvement on economically valuable tasks is staggering, estimated at 25-30% per year. However, this progress is directly tied to the quality of the evaluations. Foody elaborates on the methodology: surveying hundreds of experts in fields like consulting to understand how they spend their time, then translating that into prompts and rubrics. This process quantifies the economic value of tasks, providing a tangible metric for AI progress.

"The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals, like GPQA for PhD-level reasoning or IMO for Olympiad math, which were wholly disconnected from the outcomes that customers actually care about: how do we get the model to automate a medical diagnosis, or a legal draft, or preparing a certain financial analysis of a company."

-- Brendan Foody

This focus on rubrics reveals a systemic consequence: without them, AI development risks optimizing for the wrong objectives. A model might be excellent at generating grammatically correct poetry, but if it doesn't capture the nuanced aesthetic or emotional resonance that human readers value, its utility is limited. Foody emphasizes that while models are rapidly improving, the "last 25%"--the truly complex, nuanced, and taste-driven aspects of human expertise--remains a significant bottleneck. This is precisely where human experts, guided by well-defined rubrics, become indispensable. The implication for businesses is clear: investing in the development of robust evaluation frameworks, rather than solely in model scaling, will be a key differentiator.
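The episode doesn't describe Mercor's internal rubric format, but the idea of expert-weighted evaluation criteria can be made concrete. Below is a minimal sketch, assuming a rubric is a list of criteria with expert-assigned weights (reflecting relative economic importance) and per-output scores; the criterion names and weights are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float  # relative economic importance, set by a domain expert
    score: float   # expert-assigned score for one model output, in [0, 1]

def rubric_score(criteria):
    """Collapse a rubric into one number: the weighted average of
    expert scores, which can be tracked across model versions."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight == 0:
        raise ValueError("rubric must have at least one weighted criterion")
    return sum(c.weight * c.score for c in criteria) / total_weight

# Hypothetical rubric for a model-drafted legal memo.
memo_rubric = [
    Criterion("cites controlling authority", weight=3.0, score=0.9),
    Criterion("issue spotting",              weight=2.0, score=0.6),
    Criterion("clarity of prose",            weight=1.0, score=0.8),
]
print(round(rubric_score(memo_rubric), 3))  # → 0.783
```

The point of a scheme like this is that "the model got better" becomes a measurable claim: re-score the same rubric against each model release and the 25-30% annual improvement Foody cites becomes something a team can verify in its own domain.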

The Long Horizon Payoff: From Task Automation to Agent Training

The conversation pivots to a more profound implication of AI advancement: the shift from automating individual tasks to enabling the training of sophisticated AI agents capable of complex, long-horizon work. Foody predicts that within six to twelve months, we will see models capable of extensive tool use and multi-day tasks. This capability fundamentally alters the nature of knowledge work.

Instead of performing repetitive analysis themselves, knowledge workers will increasingly transition to training AI agents and building reinforcement learning (RL) environments. This represents a significant departure from conventional wisdom, which often focuses on AI replacing jobs. Foody's perspective is that AI will create new job categories centered on AI supervision and development.

"I think that a huge portion of the economy will become an RL environment machine."

-- Brendan Foody

This transition has profound implications for competitive advantage. Companies that can effectively train and deploy these agents will gain a significant edge. The analogy to software development is apt: initial investment in building robust agents and RL environments yields scalable, repeatable value. This is a delayed payoff, requiring upfront investment in expertise and infrastructure, but one that promises substantial long-term returns. The "discomfort now, advantage later" dynamic is evident here; the effort to define and build these training systems is demanding, but it unlocks unprecedented productivity. Conventional wisdom, focused on immediate task automation, fails to capture the strategic value of this longer-horizon investment.
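To make "building an RL environment" less abstract: an environment is just a reset/step loop that hands an agent observations and rewards. The toy sketch below uses the common gym-style interface; the task (an agent revising a draft toward a quality threshold), its dynamics, and all names are illustrative assumptions, not anything described in the episode:

```python
import random

class DraftReviewEnv:
    """Toy gym-style environment: an agent spends revision effort
    each step and is rewarded by a rubric-like quality score.
    Purely illustrative; the dynamics are invented for the sketch."""

    def __init__(self, target_quality=0.9, max_steps=10, seed=0):
        self.target = target_quality
        self.max_steps = max_steps
        self.rng = random.Random(seed)  # fixed seed for reproducibility

    def reset(self):
        self.quality = 0.2   # the first draft starts rough
        self.steps = 0
        return self.quality  # observation

    def step(self, action):
        """action: revision effort in [0.0, 1.0]."""
        self.steps += 1
        # Revision improves quality with diminishing returns, plus noise.
        self.quality = min(1.0, self.quality
                           + 0.3 * action * (1 - self.quality)
                           + self.rng.uniform(-0.02, 0.02))
        done = self.quality >= self.target or self.steps >= self.max_steps
        reward = self.quality - 0.05 * action  # quality net of effort cost
        return self.quality, reward, done

# A trivial "always revise hard" policy, run to completion.
env = DraftReviewEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done = env.step(action=1.0)
```

This is the shape of the upfront investment Foody describes: once an environment encodes what "done well" means, any number of agents can be trained and evaluated against it, which is where the software-like scalability comes from.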

The Taste Dilemma: Enshrining Preferences in a Changing World

A particularly fascinating thread in the discussion is the challenge of "taste" in AI. Foody acknowledges that taste, particularly in subjective domains like poetry or law, is difficult to capture in a rubric. Immanuel Kant's assertion that taste cannot be codified highlights this inherent tension. While RLHF (Reinforcement Learning from Human Feedback) offers a way to capture preferences, it raises questions about whose preferences should be enshrined. Should AI model the taste of historical masters like Milton or Wordsworth, or contemporary experts?

Foody suggests that in the long run, AI will likely be able to personalize taste, drawing on various historical and contemporary knowledge bases. However, this raises a critical question for businesses and developers: what taste are you optimizing for now? The choice of evaluators and the criteria used to define "good" will shape the AI's output and, consequently, its market impact.

This isn't just an academic debate; it has direct economic consequences. If AI is trained on a narrow definition of taste, it may fail to capture broader market appeal or alienate segments of users. Conversely, a model that can adapt to diverse preferences, guided by well-crafted rubrics and expert feedback, will possess a significant competitive advantage. The implication is that companies need to be deliberate about the "taste" they imbue in their AI, understanding that this choice has long-term strategic ramifications, even if it requires uncomfortable upfront decisions about whose expertise to prioritize.

Actionable Takeaways for Navigating the AI Frontier

Based on this conversation, here are key actions to consider:

  • Prioritize Rubric Development: For any AI initiative, invest heavily in defining clear, measurable rubrics for success, especially for economically valuable tasks. This is not just about data collection; it's about expert evaluation.
    • Immediate Action: Audit current AI projects for clear evaluation criteria.
  • Focus on Long-Horizon Capabilities: Shift strategic thinking beyond immediate task automation to developing AI agents capable of complex, multi-day projects.
    • This pays off in 12-18 months: Begin R&D into agent training and RL environment development.
  • Cultivate Domain Expertise: Recognize that the "last 25%" of AI performance relies on deep human expertise. Actively seek and integrate domain experts into your AI development and evaluation processes.
    • Over the next quarter: Identify and engage with key domain experts relevant to your industry.
  • Embrace the "Taste" Challenge: Be intentional about the aesthetic and qualitative standards you want your AI to embody. Understand that taste is subjective and evolving, and your choices will shape your AI's market reception.
    • This pays off in 18-24 months: Develop strategies for incorporating diverse and evolving taste preferences into AI training.
  • Invest in AI Training Roles: Anticipate the emergence of new job categories focused on training AI agents and building RL environments.
    • Immediate Action: Begin upskilling existing talent or hiring for roles focused on AI supervision and training.
  • Leverage AI for Hiring: Implement data-driven, skills-based assessments rather than relying on subjective "vibes" or traditional interview heuristics.
    • Over the next 6 months: Pilot project-based assessments for key hires.
  • Consider the "Discomfort Now, Advantage Later" Principle: Embrace initiatives that require upfront effort and may not show immediate results but build durable competitive moats. Developing robust AI evaluation and training systems falls squarely into this category.
    • Ongoing Investment: Allocate resources to long-term AI capabilities that competitors may shy away from due to their difficulty.

---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.