The database as Git: How event sourcing unlocks hidden insights and durable systems, even with AI.
This conversation with Chris May argues that traditional CRUD-based databases, which mutate state in place and discard history, hide crucial details about how a system evolved. Event sourcing, by contrast, captures every change as an immutable event, much like a Git commit. The shift has non-obvious implications: the database becomes a rich audit log, enabling precise historical analysis and debugging that a CRUD system cannot replicate. Developers, especially those building complex applications or working with AI tooling, will find that this approach helps them understand system behavior, supports more sophisticated data science, and leads to more resilient, auditable software. It also demands a different mindset, one that accepts more complexity up front in exchange for long-term clarity and a competitive edge.
The Unseen Power of Immutable History
The core of event sourcing lies in its departure from the "mutate-in-place" philosophy of traditional CRUD (Create, Read, Update, Delete) databases. Instead of a single row representing the current state of a shopping cart, event sourcing records each action--item added, item removed, item checked out--as a distinct, immutable event. This creates a detailed, chronological ledger of every interaction.
Chris May highlights this difference: "event sourcing on the other hand captures every change that happens within the system so you would have an event of like cart created item added to cart item added to cart you'd have five item added to cart events and two removals and so you could have the whole history of how each user interacts with your system in the database." This isn't just about having a record; it's about unlocking the ability to answer questions that are impossible with a mutable state. Imagine trying to understand why a customer abandoned their cart. A CRUD system shows only the final state, offering no clues to the journey. Event sourcing, however, provides the complete narrative, revealing the sequence of events that led to that outcome.
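The idea of rebuilding a cart from its history can be sketched in a few lines. This is a minimal illustration, not code from the conversation: the event classes (`CartCreated`, `ItemAdded`, `ItemRemoved`) and the `current_items` helper are hypothetical names chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical event types; each one is immutable once recorded.
@dataclass(frozen=True)
class CartCreated:
    cart_id: str


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str


@dataclass(frozen=True)
class ItemRemoved:
    cart_id: str
    sku: str


def current_items(events):
    """Replay events in order and return the cart contents as a Counter."""
    items = Counter()
    for event in events:
        if isinstance(event, ItemAdded):
            items[event.sku] += 1
        elif isinstance(event, ItemRemoved) and items[event.sku] > 0:
            items[event.sku] -= 1
    return +items  # unary plus drops SKUs whose count fell to zero


# The stored history is an append-only list; nothing is updated in place.
history = [
    CartCreated("cart-1"),
    ItemAdded("cart-1", "mug"),
    ItemAdded("cart-1", "tea"),
    ItemAdded("cart-1", "mug"),
    ItemRemoved("cart-1", "tea"),
]

state = current_items(history)
print(dict(state))
```

A CRUD table would hold only the final contents. Here the same contents are derived on demand, and the fact that tea was added and then removed stays available for later questions.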
This historical fidelity has significant downstream effects. For data scientists, it changes what analysis is possible. Instead of querying only the current state, they can replay historical events to understand trends, customer behavior over time, and the impact of specific changes. Chris describes how his service sent months of historical event data to a BigQuery instance, giving data scientists information they previously lacked; they were immediately delighted, and new insights followed. This capability matters all the more given the growing reliance on AI for analysis and development. AI tools fed this granular, historical data can uncover patterns and generate code with more context about how the system evolved than a current-state snapshot provides.
"not just show me all the customers from california who bought this month but like show me all the californias who and abandoned the cart but then came back and then did the you know what i mean like you can just answer way more interesting questions you got time series on the other hand maybe i would just want to load up a pandas data frame with the answers of what's the average cart size during checkout and that becomes like a big computation out of an event source based database if you don't"
The immediate benefit of a fast UI is often the initial lure of event sourcing, as Chris experienced early in his career. However, the true, lasting advantage lies in the system's ability to answer complex historical questions and its inherent auditability. This is where conventional wisdom--optimizing solely for current state--fails. It leaves a blind spot, a historical vacuum that prevents deeper understanding and adaptation.
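The two questions in the quote above (who abandoned a cart and later came back, and what the average cart size is at checkout) can both be answered by a single pass over the event log. The sketch below uses plain Python rather than pandas to stay dependency-free. The event names, the tuple layout, and the assumption that abandonment is recorded as its own event are all illustrative, not details from the conversation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened event log: (cart_id, event_type), already in time order.
events = [
    ("a", "item_added"), ("a", "item_added"), ("a", "cart_abandoned"),
    ("a", "cart_resumed"), ("a", "item_added"), ("a", "checked_out"),
    ("b", "item_added"), ("b", "item_removed"), ("b", "item_added"),
    ("b", "checked_out"),
    ("c", "item_added"), ("c", "cart_abandoned"),
]


def summarize(events):
    """Return (carts that were abandoned and later checked out, mean size at checkout)."""
    size = defaultdict(int)   # running item count per cart
    abandoned = set()         # carts that were abandoned at some point
    returned = set()          # abandoned carts that still reached checkout
    checkout_sizes = []
    for cart_id, kind in events:
        if kind == "item_added":
            size[cart_id] += 1
        elif kind == "item_removed":
            size[cart_id] -= 1
        elif kind == "cart_abandoned":
            abandoned.add(cart_id)
        elif kind == "checked_out":
            checkout_sizes.append(size[cart_id])
            if cart_id in abandoned:
                returned.add(cart_id)
    return returned, mean(checkout_sizes)


returned, average_size = summarize(events)
print(returned, average_size)
```

As the quote suggests, the aggregate is a computation over the whole log rather than a lookup, which is one reason to keep a precomputed read model for questions that are asked often.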
Navigating the Complexity of Event Streams
While event sourcing offers immense power, it introduces complexity, particularly around performance and data management. The idea of replaying events to reconstruct state can seem computationally expensive. However, as Chris points out, "computers are fast." The real challenge lies in optimizing read operations.
Several strategies emerge to mitigate potential performance bottlenecks. One approach is to maintain a separate "read model," which is incrementally updated as new events occur. This read model can be a database cache, a Redis instance, or even a simple disk cache. Chris mentions using NATS for caching, and the concept of a "disk cache" as a viable alternative to external caching services. This effectively creates a materialized view of the current state, optimized for quick retrieval, while the event store preserves the full history.
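A read model of this kind can be sketched as a small projection that applies one event at a time. The class name `CartTotalsReadModel`, the dictionary-shaped events, and the position-based duplicate check are assumptions made for illustration; in practice the state would live in whichever cache or database is chosen.

```python
class CartTotalsReadModel:
    """Keeps per-cart item counts current by applying events one at a time."""

    def __init__(self):
        self.item_counts = {}
        self.last_position = -1  # position of the most recent event applied

    def apply(self, position, event):
        if position <= self.last_position:
            return  # already applied, so redelivered events are harmless
        cart_id, kind = event["cart_id"], event["type"]
        if kind == "item_added":
            self.item_counts[cart_id] = self.item_counts.get(cart_id, 0) + 1
        elif kind == "item_removed":
            self.item_counts[cart_id] = self.item_counts.get(cart_id, 0) - 1
        self.last_position = position


# Hypothetical event store contents, in append order.
event_store = [
    {"cart_id": "cart-1", "type": "item_added"},
    {"cart_id": "cart-1", "type": "item_added"},
    {"cart_id": "cart-2", "type": "item_added"},
    {"cart_id": "cart-1", "type": "item_removed"},
]

read_model = CartTotalsReadModel()
for position, event in enumerate(event_store):
    read_model.apply(position, event)

# Reads hit the precomputed counts instead of replaying the stream.
print(read_model.item_counts)
```

Because the read model is derived entirely from the events, it can be discarded and rebuilt by replaying the store from position zero, for example after changing what it tracks.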
Another consideration is the sheer volume of events. Event streams can grow indefinitely, but practices like "closing the books" help manage this. Chris explains that a store might reconcile its revenue daily and record the result as a summary event, so that an analysis of yearly revenue can read the daily summaries instead of replaying every individual sale. This selective aggregation preserves granular historical data while keeping stream lengths manageable.
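A rough sketch of closing the books for the daily-revenue case follows. The event shapes (`sale_recorded`, `daily_revenue_closed`) and field names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict


def close_the_books(sales_events):
    """Collapse individual sale events into one summary event per day."""
    totals = defaultdict(int)
    for event in sales_events:
        if event["type"] == "sale_recorded":
            totals[event["day"]] += event["amount_cents"]
    return [
        {"type": "daily_revenue_closed", "day": day, "total_cents": total}
        for day, total in sorted(totals.items())
    ]


# Hypothetical detailed stream: one event per sale.
sales = [
    {"type": "sale_recorded", "day": "2024-03-01", "amount_cents": 1250},
    {"type": "sale_recorded", "day": "2024-03-01", "amount_cents": 800},
    {"type": "sale_recorded", "day": "2024-03-02", "amount_cents": 4300},
]

summaries = close_the_books(sales)

# A yearly report can now sum the short summary stream.
yearly_total = sum(s["total_cents"] for s in summaries)
print(summaries, yearly_total)
```

The detailed sale events are not deleted in this sketch. The summaries form a shorter stream that long-range reports can read, while the originals remain available for fine-grained questions.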
Versioning events also presents a challenge. As systems evolve, event schemas may change. Chris recounts a personal experience where renaming an attribute in an event led to production errors. The solution involved code that could handle older event versions, demonstrating the need for robust upcasting mechanisms or, in more drastic cases, transforming the entire event store. This highlights a critical aspect: while event sourcing provides a robust historical record, managing that history requires deliberate architectural choices.
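One common shape for such version-handling code is an upcaster: a function that converts an old event into the current schema as it is read, leaving the stored event untouched. The sketch below assumes a hypothetical rename from `qty` to `quantity` and a `version` field on each event; the attribute involved in Chris's incident is not stated in the source.

```python
def upcast(event):
    """Return the event in the latest schema without mutating the stored copy."""
    if event.get("version", 1) == 1:
        data = dict(event["data"])           # copy so the original stays as written
        data["quantity"] = data.pop("qty")   # hypothetical rename: qty -> quantity
        return {**event, "version": 2, "data": data}
    return event


# Hypothetical stream containing both schema versions.
stored_events = [
    {"type": "item_added", "version": 1, "data": {"sku": "mug", "qty": 2}},
    {"type": "item_added", "version": 2, "data": {"sku": "tea", "quantity": 1}},
]

# Handlers downstream only ever see the version 2 shape.
normalized = [upcast(e) for e in stored_events]
total_quantity = sum(e["data"]["quantity"] for e in normalized)
print(total_quantity)
```

Upcasting on read keeps the event store immutable. Rewriting the whole store is the heavier alternative mentioned above, typically kept for cases where the number of old versions becomes hard to maintain.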
"you know like i one of the things i really find fascinating about this is this is such a flexible pattern that people i mean they've done so many different ways of optimizing for their event store or anything like this so i think that's a very much a valid approach"
The flexibility of event sourcing allows developers to integrate different database technologies--document databases, graph databases, or specialized event stores like KurrentDB--depending on the specific needs of a read model or a particular feature. This adaptability is a key differentiator, enabling teams to build systems that are not only auditable but also performant and scalable.
The AI Catalyst: Event Sourcing's New Frontier
The conversation takes a fascinating turn when discussing the intersection of event sourcing and AI. Chris's company mandates the use of AI for code generation, a mandate that has yielded mixed but often productive results. He notes that AI tools, particularly when working with well-defined "vertical slices" of code, can be incredibly efficient. This is because vertical slices, by design, encapsulate a specific feature and its associated logic, fitting neatly within an AI's context window.
The true synergy, however, emerges when AI can leverage the historical data provided by event sourcing. Chris describes how AI tools can analyze event streams to understand system behavior, debug complex issues, and even generate code based on event modeling diagrams. Martin Dilger and Adam Dymitruk's work, where AI generated code from event diagrams, reducing development time from months to weeks, exemplifies this potential.
"the fact that you can essentially say like here's the diagram can you implement the slice and it can get you from well let me take a step back martin and adam have both had successful uh uh research spikes where they took an event modeling diagram actually no they even did what they did even worse was they started with a conversation with the client trans and recorded it created the trans generated the diagram generated the diagram and then gener got uh generated code from the diagram that didn't solve everything but got it i think 80 or 85 of the way there in hours"
This partnership between event sourcing's historical depth and AI's analytical power creates a potent feedback loop. AI can help design and implement event-driven architectures, and the resulting event streams provide AI with richer data for analysis and further development. This is not about AI replacing developers, but about augmenting their capabilities so they can focus on higher-level design and problem-solving, much like Chris's own experience using AI to manage multiple Git worktrees and plan features.
Actionable Takeaways
- Embrace the "Why": Before adopting event sourcing, understand why you need it. Is it for auditability, complex historical analysis, or enabling AI-driven insights?
- Start Small: You don't need to event-source your entire application. Begin with a single feature or a critical domain where historical data is paramount.
- Invest in Read Models: Plan for optimized read models early. Explore caching strategies (Redis, Valkey, disk cache) or materialized views to ensure fast query performance.
- Master Event Versioning: Develop a strategy for handling event schema changes. Consider upcasters, default fallbacks, or transforming the event store when necessary.
- Leverage AI with Event Data: Explore how AI tools can analyze your event streams for debugging, trend analysis, and code generation. This is where significant competitive advantage lies.
- Prioritize Communication: Use techniques like Event Modeling diagrams to ensure team alignment and understanding, especially when working with AI agents or complex event flows.
- Consider the "Status" Field: If your database has a "status" column indicating multiple states, it's a strong signal that event sourcing might be a better fit than traditional CRUD. This pays off in clarity and reduced debugging time over the next 6-12 months.
- Plan for Storage (but don't over-optimize initially): While event sourcing requires more storage than CRUD, storage is cheap. Focus on the insights first; optimize storage later if it becomes a genuine issue, perhaps through cold storage or periodic pruning. This is a long-term investment, paying off in 18-24 months as historical data becomes invaluable.