Notion's Agentic Work: System Design Trumps Raw AI Power

Original Title: Notion’s Token Town: 5 Rebuilds, 100+ Tools, MCP vs CLIs and the Software Factory Future — Simon Last & Sarah Sachs of Notion

The Hidden Architecture of Agentic Work: Beyond the Hype with Notion's Simon Last and Sarah Sachs

Notion's journey to building Custom Agents illustrates a central point: the biggest breakthroughs in AI products come not only from raw model capability but from the systems and cultural foundations that harness it. This conversation unpacks the less obvious implications of integrating AI into a deeply entrenched productivity tool, highlighting the years of iterative rebuilding and strategic foresight required to move beyond simple model wrappers. It exposes the hidden costs of premature AI adoption and the competitive advantage gained by patiently architecting for future model advancements. The analysis is aimed at product leaders, engineering managers, and AI practitioners who want to build durable, impactful AI products, and it offers a strategic playbook for navigating the interplay of technology, user experience, and organizational culture.

The Long Game: Why "Too Early" Was the Right Time

The launch of Notion's Custom Agents wasn't an overnight success; it was the culmination of years of persistent effort and multiple rebuilds. Simon Last and Sarah Sachs candidly discuss the early challenges: attempts to build AI assistants in late 2022 and early 2023 were stymied by fundamental limitations, namely the absence of a tool-calling standard, short context windows, and unreliable models. This wasn't a failure of ambition but an illustration of the "Agent Lab" thesis: building around frontier capabilities requires more than wrapping a model. It requires a deep understanding of how users collaborate and the right product system around the model.

"We ship things slowly. So it had been in Alpha for a little bit and at the point at which it's an alpha, there's a group of people that are making sure it's ready for prod, and then there's a group of people working on the next thing. So sometimes some of these launches are a bit delayed satisfaction."

-- Sarah Sachs

The critical insight here is Notion's strategic approach to roadmap timing. Instead of simply waiting for models to mature, the team invested in product infrastructure ahead of the capabilities it anticipated. This philosophy of "building for where models are going, not just where they are" is a powerful differentiator. It acknowledges that while model advancements are crucial, the product system (robust evals, flexible harnesses, and intuitive user interfaces) is what actually unlocks value. The repeated rebuilds weren't wasted effort; they were essential steps in de-risking the technology and aligning it with Notion's core role as a system of record.

The "Simon Vortex" and the Culture of Deletion

A recurring theme is Notion's organizational culture, particularly within the AI team. Sarah Sachs emphasizes setting clear objectives over owning ideas, fostering "low-ego teams comfortable deleting their own work." This is exemplified by the "Simon Vortex," a rapid prototyping environment in which engineers iterate quickly and accept change, including the obsolescence of their own creations. This culture matters because the AI landscape shifts so rapidly that what is cutting-edge today can be obsolete tomorrow.

"My job was not to be the ideas person or the technical expert. My job was to make it so that everybody understood the objective, had a resource to help prioritize what they should work on, and had an avenue to prioritize what they thought was important."

-- Sarah Sachs

This approach contrasts sharply with traditional product development. A "demos over memos" philosophy, coupled with a willingness to pursue prototypes that may fundamentally alter existing workflows, lets Notion adapt faster than a conventional roadmap would allow. Empowering individuals and teams to swarm on problems, even when that means re-architecting core components, creates an environment that can keep pace with AI's evolution. This cultural agility, combined with rigorous internal dogfooding (Notion employees use Notion more heavily than almost any other company uses its own product), provides a powerful feedback loop.

The Unseen Complexity: Tooling, Evals, and the "Software Factory"

The conversation delves into the technical underpinnings of Notion's agent system, revealing layers of complexity usually hidden from the end user. The agent harness evolved from early JavaScript agents to XML representations and finally to Markdown and SQL-like abstractions, reflecting a continuous effort to simplify the interaction for both the model and the developer. A key lesson has been to "give the models what they want": align interfaces with LLM strengths, such as using SQL for database queries.
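To make "give the models what they want" concrete, here is a minimal Python sketch of a SQL-style database tool, assuming a harness that exposes structured data to the model as a table it can query and returns results as Markdown. The `tasks` schema, the sample rows, and the `query_database_tool` and `rows_to_markdown` names are hypothetical illustrations, not Notion's actual interfaces; SQLite simply stands in for whatever query engine a real system would use.

```python
import sqlite3

# Hypothetical rows from a Notion-style "Tasks" database (illustrative data only).
SAMPLE_TASKS = [
    ("Write launch post", "In progress", "Ana", 3),
    ("Fix flaky eval", "Done", "Ben", 1),
    ("Refactor agent harness", "In progress", "Ben", 5),
]


def build_connection(rows):
    """Load database rows into an in-memory SQLite table the model can query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (title TEXT, status TEXT, owner TEXT, points INTEGER)"
    )
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    # SQLite's query_only pragma rejects any later attempt to change data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def query_database_tool(conn, sql):
    """Hypothetical tool handler: run one model-written SELECT statement."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    cursor = conn.execute(statement)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_to_markdown(rows):
    """Render query results as a Markdown table for the model's context."""
    if not rows:
        return "(no rows)"
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = build_connection(SAMPLE_TASKS)
    rows = query_database_tool(
        conn,
        "SELECT owner, SUM(points) AS points FROM tasks "
        "WHERE status = 'In progress' GROUP BY owner ORDER BY owner;",
    )
    print(rows_to_markdown(rows))
```

The design choice the sketch highlights is that the model writes familiar SQL instead of learning a bespoke query API, while the harness keeps the tool read-only and hands results back in a plain-text format models read well.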

This focus on developer experience extends to their sophisticated eval system. Notion employs a multi-tiered approach, including regression tests for stability, launch-quality evals for product readiness, and "frontier/headroom" evals designed to pass only ~30% of the time. This deliberate strategy lets the team proactively spot emerging model capabilities and potential regressions, acting as an early warning system for future product development. Treating the eval system itself as an "agent harness" further underscores their commitment to automating and refining the AI development lifecycle. The vision of a "software factory", a system in which agents collaboratively spec, code, test, debug, review, and maintain codebases, is already an active area of development, aiming to minimize human intervention while preserving critical system invariants.
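The tiered eval strategy can be sketched as a small triage function. This is a minimal Python illustration, assuming each eval case is tagged with one of three hypothetical tier labels (`regression`, `launch`, `headroom`); the launch bar and the tolerance band around the ~30% headroom target are made-up thresholds, not Notion's numbers.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions for this sketch, not Notion's values).
REGRESSION_MIN = 1.0       # every regression case is expected to pass
LAUNCH_MIN = 0.9           # hypothetical launch-quality bar
HEADROOM_TARGET = 0.30     # headroom evals are designed to pass ~30% of the time
HEADROOM_TOLERANCE = 0.15  # hypothetical band around the headroom target


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    tier: str  # "regression", "launch", or "headroom" (hypothetical labels)
    passed: bool


def pass_rates(results):
    """Compute the fraction of passing cases for each tier."""
    totals, passes = {}, {}
    for r in results:
        totals[r.tier] = totals.get(r.tier, 0) + 1
        passes[r.tier] = passes.get(r.tier, 0) + int(r.passed)
    return {tier: passes[tier] / totals[tier] for tier in totals}


def triage(results):
    """Turn per-tier pass rates into release and capability signals."""
    rates = pass_rates(results)
    signals = []
    if rates.get("regression", 1.0) < REGRESSION_MIN:
        signals.append("BLOCK: regression suite has failures")
    if rates.get("launch", 1.0) < LAUNCH_MIN:
        signals.append("NOT READY: launch-quality bar not met")
    headroom = rates.get("headroom")
    if headroom is not None:
        if headroom > HEADROOM_TARGET + HEADROOM_TOLERANCE:
            # A jump suggests a new model capability; the suite may need harder cases.
            signals.append("SIGNAL: headroom evals passing well above target")
        elif headroom < HEADROOM_TARGET - HEADROOM_TOLERANCE:
            # A drop can be an early warning of a model regression.
            signals.append("WATCH: headroom evals passing well below target")
    return rates, signals


if __name__ == "__main__":
    results = [
        EvalResult("r1", "regression", True),
        EvalResult("l1", "launch", True),
        EvalResult("h1", "headroom", True),
        EvalResult("h2", "headroom", True),
        EvalResult("h3", "headroom", False),
        EvalResult("h4", "headroom", True),
    ]
    rates, signals = triage(results)
    print(rates)
    print(signals)
```

Because the headroom tier is deliberately hard, its pass rate carries information in both directions: a sharp rise hints at a new capability worth building on, and a sharp fall is a cue to look for a regression before it reaches the launch tier.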

Actionable Takeaways for Building with AI

  • Embrace Iterative Rebuilding: Do not fear multiple rebuilds. Each iteration is a learning opportunity that de-risks future development and aligns your product with evolving capabilities.
  • Cultivate a Culture of Deletion: Foster an environment where teams are comfortable discarding their own work if a better approach emerges. This is crucial for rapid adaptation in the AI space.
  • Prioritize System Design Over Raw Model Power: Recognize that the infrastructure, tooling, and user experience surrounding an AI model are as critical as the model itself. Invest in robust harnesses, evaluation frameworks, and intuitive interfaces.
  • Build for the Future, Not Just the Present: Anticipate how model capabilities will evolve and architect your systems to accommodate these advancements. This proactive approach creates a significant competitive advantage.
  • Develop Sophisticated Eval Systems: Implement multi-layered evaluation strategies, including "headroom" evals, to monitor model performance, identify regressions, and guide future development.
  • Empower Operators with Transparency: Make tools and system prompts visible to power users. This transparency allows for deeper interrogation of agent behavior and more effective utilization.
  • Focus on Data Capture as a System of Record: Treat features like meeting notes not just as conveniences, but as critical data capture mechanisms that fuel agentic workflows and enhance the overall value proposition of your platform. This builds a powerful data flywheel.
  • Align Pricing with Value and Efficiency: Implement usage-based pricing that reflects the actual cost and capability of AI tasks. Guide users toward cost-effective solutions and offer a range of models that covers the "triangle" of intelligence, price, and latency.
  • Invest in "Auto" for Task Optimization: Develop intelligent systems that can select the most appropriate model for a given task, rather than defaulting to the most powerful (and expensive) option. This optimizes cost and performance.
  • Build for Composability: Design agents and tools that can interact seamlessly, leveraging shared data primitives and enabling agents to invoke one another. This creates a more powerful and flexible system.
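The "Auto" and pricing takeaways can be illustrated together: pick the cheapest model that still meets a task's capability and latency needs, then charge for the tokens actually used. The minimal Python sketch below assumes an entirely hypothetical three-model catalog; the model names, capability scores, prices, and latencies are invented to show the intelligence/price/latency trade-off and do not describe any real provider or Notion's routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int           # hypothetical 1-5 "intelligence" score
    usd_per_1k_tokens: float  # hypothetical price
    p50_latency_s: float      # hypothetical median latency in seconds


# Made-up catalog spanning the intelligence/price/latency triangle.
CATALOG = [
    ModelOption("small-fast", capability=2, usd_per_1k_tokens=0.0005, p50_latency_s=0.4),
    ModelOption("mid", capability=3, usd_per_1k_tokens=0.003, p50_latency_s=1.2),
    ModelOption("frontier", capability=5, usd_per_1k_tokens=0.015, p50_latency_s=4.0),
]


def auto_select(required_capability, max_latency_s, catalog=CATALOG):
    """Pick the cheapest model that meets the task's capability and latency needs."""
    eligible = [
        m for m in catalog
        if m.capability >= required_capability and m.p50_latency_s <= max_latency_s
    ]
    if not eligible:
        raise LookupError("no model satisfies these constraints")
    return min(eligible, key=lambda m: m.usd_per_1k_tokens)


def usage_cost(model, tokens):
    """Usage-based charge: the price scales with the tokens actually consumed."""
    return model.usd_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    quick = auto_select(required_capability=2, max_latency_s=1.0)
    hard = auto_select(required_capability=4, max_latency_s=10.0)
    print(quick.name, usage_cost(quick, tokens=2_000))
    print(hard.name, usage_cost(hard, tokens=2_000))
```

A real router would also need a way to estimate how much capability a task requires; this sketch simply takes that as an input, which is where most of the difficulty in an "Auto" mode would lie.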

---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.