Centralizing Data Infrastructure to Enable Effective AI Agents

Original Title: AI Agents and the Fight for Customer Data

The a16z Show · June 05, 2026

The Data Foundation Paradox: Why AI Agents Need More Centralization, Not Less

The rise of AI agents will not kill enterprise software. Instead, it will force companies to centralize their data infrastructure. Some worry that AI will bypass existing SaaS platforms, but agents need context to function: to be useful, they require a unified, clean, and accessible data foundation. That is the opposite of the fragmented, gated silos many vendors are currently building. For leaders, the best strategy is to avoid the walled garden trap and focus on data portability. Companies that own their data today will be the ones that successfully deploy agentic workflows tomorrow, while those that hand control to vendor-managed AI tools will end up trapped in limited, opaque systems.

The Illusion of Data Gravity and the High Cost of Walled Gardens

The industry is in a reactionary phase where SaaS vendors, afraid of a SaaS apocalypse, are locking down APIs to keep AI agents from accessing data. George Fraser of Fivetran argues this is a major strategic error. These vendors are effectively fighting their customers by treating data as a proprietary asset rather than a utility.

The reason why they get away with blocking people is simply because people don't fight. So pick fights and write it into your... If you have big contracts with vendors and you're redlining your MSAs, write language guaranteeing your own data access into those MSAs.

-- George Fraser

The data gravity argument, which claims data is too large to move and must stay where it was created, is a myth kept alive by inefficient, slow data pipelines that copy entire datasets every day. In reality, modern change data capture (CDC) allows for lightweight, continuous synchronization. When vendors use egress fees or API blocks to stop this, they are not protecting their business. They are forcing customers to build expensive, complex workarounds. The long-term winner will be the vendor that provides an open way to replicate data, knowing they cannot build every tool a customer needs within their own walled garden.
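To make the contrast with full daily copies concrete, here is a minimal sketch of how a CDC consumer might apply row-level changes to a replica. It is an illustration under stated assumptions, not any vendor's implementation: the ChangeEvent shape, the apply_changes function, and the in-memory dict standing in for a warehouse table are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change read from a (hypothetical) source change log."""
    seq: int                               # monotonically increasing log position
    op: str                                # "insert", "update", or "delete"
    key: str                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new row image; None for deletes


def apply_changes(
    replica: Dict[str, Dict[str, Any]],
    events: List[ChangeEvent],
    last_seq: int,
) -> int:
    """Apply only events newer than last_seq and return the new checkpoint.

    Only changed rows are touched, so the work is proportional to the
    number of changes rather than to the size of the dataset.
    """
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= last_seq:
            continue  # already applied; replaying a batch is harmless
        if event.op == "delete":
            replica.pop(event.key, None)
        elif event.op in ("insert", "update"):
            replica[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_seq = event.seq
    return last_seq


# Assumed sample data: a two-row replica and three changes from the source.
replica = {
    "c1": {"name": "Acme", "plan": "free"},
    "c2": {"name": "Globex", "plan": "pro"},
}
events = [
    ChangeEvent(1, "update", "c1", {"name": "Acme", "plan": "pro"}),
    ChangeEvent(2, "delete", "c2"),
    ChangeEvent(3, "insert", "c3", {"name": "Initech", "plan": "free"}),
]
checkpoint = apply_changes(replica, events, last_seq=0)
```

The checkpoint is what keeps the sync lightweight and continuous: the next run passes it back in as last_seq and processes only what changed since, instead of recopying every row.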

Why AI Agents Are More Seats, Not Fewer Seats

Conventional wisdom suggests that if an AI agent can do a human's job, the need for software licenses will drop. Fraser disagrees, noting that enterprise software spend is a small fraction of total headcount cost. Businesses are not using AI to save a tiny percentage on their software budget; they are using it to improve operations.

Furthermore, the agent-as-a-human model is gaining ground. By treating agents as distinct entities with their own roles, identities, and permissions, companies can integrate them into existing workflows without a massive overhaul. This does not shrink the software footprint; it increases the consumption of it. As Fraser notes, the most effective agents fit into current systems, using the same interfaces and APIs humans use, because those interfaces have already solved the long tail of integration challenges.
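One way to picture the agent-as-a-human model is a permission check that treats an agent as just another principal with a role. The sketch below is illustrative only: the role names, permission strings, and the Principal and AccessController classes are assumptions made for this example, not an API from the episode or from any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical role-to-permission mapping shared by humans and agents.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "support_agent": frozenset({"tickets:read", "tickets:write"}),
    "billing_analyst": frozenset({"invoices:read"}),
}


@dataclass(frozen=True)
class Principal:
    """A distinct identity, whether a person or an AI agent."""
    principal_id: str
    kind: str   # "human" or "agent"
    role: str


@dataclass
class AccessController:
    """Checks permissions by role and records every decision."""
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def authorize(self, principal: Principal, permission: str) -> bool:
        allowed = permission in ROLE_PERMISSIONS.get(principal.role, frozenset())
        # Each decision is attributed to the principal's own identity,
        # so agent activity stays separable from human activity.
        self.audit_log.append(
            (principal.principal_id, principal.kind, permission, allowed)
        )
        return allowed


controller = AccessController()
triage_bot = Principal("agent-triage-01", "agent", "support_agent")
can_write_tickets = controller.authorize(triage_bot, "tickets:write")
can_read_invoices = controller.authorize(triage_bot, "invoices:read")
```

Because the agent goes through the same check as a human in the same role, nothing about the existing permission model has to change; the agent simply shows up as one more seat with its own entries in the audit log.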

The Durable Infrastructure Advantage

While AI labs push the limits of what models can do, building reliable, operational systems remains a human-led, high-effort task. There is a temptation to believe AI will commoditize infrastructure, allowing models to build their own tools on the fly. However, the complexity of maintaining reliable data pipelines, what Fraser calls the long tail of complexity, is not easily solved by AI alone.

The right data foundation for AI is probably the one you already have. If you have a reasonably modern data platform... that is a great foundation for your context for AI as well.

-- George Fraser

The most successful companies, including the AI labs themselves, are not building exotic, AI-only infrastructure. They rely on proven data foundations like Snowflake, Databricks, or BigQuery to feed context to their agents. The strategic payoff is delayed but significant. By investing in a clean, centralized data lake now, companies create a moat that allows them to pivot as agent capabilities evolve, rather than being beholden to the proprietary AI tools of their software vendors.

Key Action Items

Audit your MSA language: Over the next quarter, review your contracts with major SaaS vendors. If you are signing high-value agreements, insist on explicit language that guarantees access to your own data.
Centralize for context: Treat your data lake as the primary brain for your AI agents. If your data is fragmented across systems, your agents will be limited by the same knowledge cut-off issues that plagued early LLMs.
Invest in CDC: Move away from midnight batch data copying. Implement change data capture (CDC) to keep your data foundation current without the overhead of massive, redundant data movement.
Standardize on roles, not users: As you onboard AI agents, assign them distinct roles and identities within your systems. This allows for cleaner audit trails and better security than sharing human credentials.
Ignore the walled garden siren song: In the next 12 to 18 months, resist the temptation to move your data workflows into a vendor's proprietary AI suite. The short-term convenience is not worth the long-term loss of control over your business logic.
Build for the long tail: Do not chase headless or browser-automation hacks if a robust, documented API exists. The interfaces that have survived for decades are usually the most reliable way to interact with your systems of record.
