Turbopuffer's Cloud Primitive Database Design for AI
The Unseen Architecture: How Turbopuffer's Database Design Unlocks AI's True Potential
The conventional wisdom around building for the future often leads teams down paths of premature optimization, sacrificing long-term viability for immediate gains. In a recent conversation on the Latent Space podcast, Simon Hørup Eskildsen of Turbopuffer makes a more profound point: true innovation in data infrastructure doesn't just solve today's problems; it anticipates fundamental shifts in how we interact with information. Eskildsen's journey, from scaling infrastructure at Shopify to founding Turbopuffer, shows how a deep understanding of underlying technological primitives--object storage and NVMe--combined with a relentless focus on cost efficiency can unlock capabilities previously thought impossible. The conversation matters for engineers, architects, and founders grappling with the escalating costs and complexity of managing data in the age of AI, offering a blueprint for systems that are not only performant but fundamentally more robust and cost-effective. It exposes the hidden consequences of choosing familiar but ultimately limiting architectural patterns.
The explosion of AI, particularly Large Language Models (LLMs), has fundamentally altered the landscape of data management. Suddenly, the ability to access and reason over vast quantities of unstructured data isn't a niche requirement; it's becoming a core competency for businesses across the board. Yet, as Simon Hørup Eskildsen of Turbopuffer explains, the infrastructure to support this seismic shift is still in its nascent stages, often burdened by legacy thinking and prohibitive costs. Eskildsen’s insights into Turbopuffer’s architectural bet--a radical embrace of cloud primitives like object storage and NVMe--offer a compelling counterpoint to traditional database design, revealing how a focus on fundamental cost structures and performance characteristics can create significant downstream advantages.
The Cost of "Good Enough": The Readwise Revelation
Eskildsen’s path to founding Turbopuffer was paved with the pragmatic pain points of scaling large systems. His decade at Shopify, a period marked by relentless growth and the constant struggle to keep critical services online, instilled a deep appreciation for the operational realities of infrastructure. This experience, particularly his “aggravating” encounters with Elasticsearch, fueled a desire for simpler, more performant solutions. The true catalyst, however, emerged during his consulting work with Readwise.
The Readwise team, a bootstrapped company spending around $5,000 a month on infrastructure, explored adding article recommendation and semantic search features. The initial napkin math for embedding articles and running them through a vector index pointed to a staggering $30,000 monthly cost -- a non-starter for a company prioritizing fiscal prudence. This stark cost disparity, where a valuable feature was rendered economically infeasible by the underlying technology, became the seed of Turbopuffer.
"This was a company that was spending maybe five grand a month in total on all their infrastructure and when I did the napkin math on running the embeddings of all the articles, putting them into a vector index, putting it in prod, it's gonna be like 30 grand a month. That just wasn't tenable."
This revelation wasn't just about a specific feature; it was about a systemic issue. The cost structure of existing solutions was fundamentally misaligned with the burgeoning demand for AI-powered data access. The immediate benefit of semantic search was overshadowed by its prohibitive downstream cost, a classic example of a first-order solution creating a second-order problem.
The Architecture Bet: Embracing Cloud Primitives
Eskildsen’s core insight was that the cloud had evolved, offering new building blocks that traditional database architectures hadn't fully leveraged. He identified three critical shifts:
- The Rise of NVMe SSDs: Available in the cloud around 2017, NVMe offered significantly higher I/O performance than previous generations of SSDs. This enabled architectures that could aggressively utilize local storage for hot data.
- The Maturation of Object Storage (S3): Crucially, S3 achieved strong consistency in December 2020. This removed the need for complex, stateful consensus layers (like Zookeeper) that were historically required to manage data integrity across distributed systems.
- The Emergence of Compare-and-Swap on Object Storage: While not available on S3 until late 2024, this primitive (available on GCP earlier) allowed for more efficient atomic updates to metadata, further simplifying distributed system design.
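Compare-and-swap is what lets a database update shared metadata (say, the manifest listing an index's segments) without a separate consensus service: read the object, modify it, and write it back only if nobody else wrote in between. A minimal sketch of that retry loop, using an in-memory stand-in for the object store (a real system would use S3 conditional writes via `If-Match`/`If-None-Match` ETags, or GCS generation preconditions; `FakeObjectStore` and `append_segment` are illustrative names, not Turbopuffer's actual implementation):

```python
import json

class FakeObjectStore:
    """In-memory stand-in for an object store that supports conditional writes."""
    def __init__(self):
        self._blobs = {}   # key -> (etag, body bytes)
        self._counter = 0

    def get(self, key):
        """Return (etag, body); (None, None) if the object does not exist yet."""
        return self._blobs.get(key, (None, None))

    def put_if_match(self, key, body, expected_etag):
        """Write only if the current etag matches; return the new etag, or None on a lost race."""
        current_etag, _ = self._blobs.get(key, (None, None))
        if current_etag != expected_etag:
            return None  # another writer got in first; caller must re-read and retry
        self._counter += 1
        new_etag = f"etag-{self._counter}"
        self._blobs[key] = (new_etag, body)
        return new_etag

def append_segment(store, manifest_key, segment_name, max_retries=10):
    """Atomically append a segment name to a JSON manifest via the CAS loop."""
    for _ in range(max_retries):
        etag, body = store.get(manifest_key)
        manifest = json.loads(body) if body else {"segments": []}
        manifest["segments"].append(segment_name)
        if store.put_if_match(manifest_key, json.dumps(manifest).encode(), etag):
            return manifest
    raise RuntimeError("CAS retries exhausted")
```

The point of the primitive is that the object store itself arbitrates the race, so no Zookeeper-style coordination layer is needed for correctness.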
Turbopuffer’s architectural bet was to build from these primitives, rather than retrofitting them onto existing designs. By going all-in on object storage for durability and NVMe for fast access, and by eschewing a traditional consensus layer, Turbopuffer achieved a dramatically simplified architecture. This design choice, while requiring deep conviction and even unconventional solutions like dark fiber to bridge cloud provider latencies for early customers like Notion, directly addressed the cost and complexity issues that plagued previous generations of databases. The immediate discomfort of dealing with cross-cloud latency or tuning TCP windows was a necessary price for a system that avoided the compounded technical debt of stateful, consensus-driven architectures.
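The "object storage for durability, NVMe for fast access" split can be pictured as a read-through cache: serve a block from local disk when it is hot, and pay the object-storage round trip only on a miss. A hedged sketch of that shape (the dict stands in for an NVMe-resident cache, and `fetch_from_object_store` is a hypothetical callable, e.g. an S3 GET; this is the general pattern, not Turbopuffer's actual code):

```python
class ReadThroughCache:
    """Object storage is the source of truth; hot blocks live on local 'NVMe' (a dict here)."""
    def __init__(self, fetch_from_object_store):
        self._fetch = fetch_from_object_store  # cold path: an object-storage GET in a real system
        self._local = {}                       # hot path: stand-in for NVMe-resident blocks
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._local:
            self.hits += 1        # served from fast local storage
            return self._local[key]
        self.misses += 1          # cold read: object-storage round trip
        value = self._fetch(key)
        self._local[key] = value  # keep it local for subsequent reads
        return value
```

Because durability lives entirely in the object store, the local tier can be lost and rebuilt at any time, which is what makes the architecture stateless enough to skip a consensus layer.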
The Agentic Shift: Concurrency as the New Frontier
The conversation then pivoted to how agentic workloads are reshaping search. The traditional RAG (Retrieval-Augmented Generation) pattern involved a single retrieval call to provide context for an LLM. However, the emergence of agents that can fire off multiple, parallel queries fundamentally changes the dynamics. This isn’t just about retrieving information; it’s about orchestrating a symphony of concurrent searches.
This shift has direct implications for infrastructure. Instead of optimizing for a small number of carefully crafted queries, systems must now handle massive bursts of concurrent requests. This is where Turbopuffer’s design, built to maximize concurrency and minimize round trips, finds a natural advantage. The ability to drive high throughput on object storage and NVMe, processing many requests in parallel, aligns perfectly with the demands of agentic systems.
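The fan-out pattern an agent uses can be sketched with `asyncio`: issue many retrieval queries at once and gather the results, so total latency tracks the slowest query rather than the sum. `run_query` below is a hypothetical stand-in for a real search call:

```python
import asyncio

async def run_query(q):
    """Hypothetical stand-in for one retrieval call (vector or full-text search)."""
    await asyncio.sleep(0)  # a real call would await a network round trip here
    return {"query": q, "hits": [f"doc-for-{q}"]}

async def agent_search(queries):
    """Fire all queries concurrently; results come back in input order."""
    return await asyncio.gather(*(run_query(q) for q in queries))

results = asyncio.run(agent_search(["auth flow", "rate limits", "billing api"]))
```

From the database's side, this turns one carefully crafted query into a burst of parallel requests, which is exactly the workload a high-concurrency design is built to absorb.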
"What we're seeing more demand from our customers now is to do a lot of concurrency, right? Like Notion does a ridiculous amount of queries in every round trip just because they can. And now, when I use the Cursor agent, I also see them doing more concurrency than I've ever seen before."
This is a prime example of a delayed payoff. While traditional systems might struggle to adapt to this new workload, Turbopuffer’s foundational design choices position it to benefit. The consequence of prioritizing simplicity and raw cloud primitive utilization is a system that can scale cost-effectively with the most demanding AI applications.
The "P99 Engineer" and the Future of Database Development
Eskildsen’s philosophy extends to team building, encapsulated in the concept of the "P99 engineer." This isn't just about hiring top talent; it’s about cultivating a team that can bend software to its will, pushing the boundaries of what’s seemingly possible. The example of Nathan, Turbopuffer’s chief architect, achieving remarkable performance with ANN V3 by deeply understanding and manipulating the underlying systems, illustrates this ethos. This drive to achieve extreme performance, even when it requires unconventional solutions, is precisely what’s needed to navigate the evolving landscape of AI infrastructure.
The future of Turbopuffer, as Eskildsen outlines, involves expanding its capabilities beyond pure search. The goal is to become the ultimate database for connecting AI to vast datasets, gradually incorporating more query plans -- OLAP, time-series, and potentially graph queries -- as dictated by customer needs. However, the core principle remains: maintain focus and leverage fundamental architectural advantages. This strategic approach ensures that as AI workloads evolve, Turbopuffer’s foundational design will continue to provide a cost-effective and performant solution, a testament to the power of building for the long game.
Key Action Items
- Re-evaluate Cost Structures: Analyze the cost of current data infrastructure, specifically for AI workloads. Identify where vector databases or advanced search capabilities are becoming prohibitively expensive, mirroring the Readwise scenario.
- Investigate Cloud Primitive Architectures: Explore databases or storage solutions that are built on top of cloud primitives like S3 and NVMe, rather than retrofitting them. Understand the trade-offs, particularly around write latency and consistency models.
- Benchmark for Concurrency: If using search for AI applications, test current systems under high concurrency loads. This will reveal potential bottlenecks as agentic workloads become more prevalent.
- Prioritize Simplicity in Design: When building or selecting new data systems, favor architectures that minimize state and avoid complex consensus layers. This reduces operational overhead and potential failure points.
- Embrace Delayed Payoffs: Recognize that solutions requiring upfront investment in architectural purity (like Turbopuffer's initial cross-cloud efforts) often yield significant long-term competitive advantages in cost and scalability.
- Develop a "P99" Mindset: Foster a culture that encourages deep technical understanding and the drive to push system limits, not just to meet current requirements but to anticipate future ones.
- Explore Hybrid Search: Consider integrating semantic search with traditional text, regex, or SQL-style queries to create more robust and nuanced retrieval systems, especially for code-based or structured data.
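For the concurrency-benchmarking item above, a minimal sketch that measures throughput at increasing concurrency levels with a thread pool (`issue_query` is a hypothetical stand-in for a real client call to the system under test; the sleep simulates a network round trip):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def issue_query(i):
    """Hypothetical stand-in for one search request to the system under test."""
    time.sleep(0.001)  # simulate a ~1 ms network round trip
    return i

def benchmark(concurrency, total_requests=100):
    """Return requests/second achieved at the given concurrency level."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(issue_query, range(total_requests)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed

for c in (1, 8, 32):
    print(f"concurrency={c:3d}: {benchmark(c):8.0f} req/s")
```

If throughput stops scaling well before the server saturates, that plateau is the bottleneck an agentic workload will hit first.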