Crafting Dynamic AI Memory Beyond Simple Data Storage

Original Title: From Data Models to Mind Models: Designing AI Memory at Scale

AI memory is not about storing more data, but about storing it more intelligently. This conversation with Vasilije Merkevic argues that the real challenge in building agentic AI is not raw storage capacity but the architecture that governs how agents recall, process, and learn from information. Poorly designed memory systems are not merely inefficient; they produce brittle, unreliable agents that fail to adapt or deliver real value. Data engineers, product managers, and AI architects will find here a case for moving beyond simplistic RAG and vector stores toward dynamic, context-aware systems. Ignoring these memory dynamics means building systems that underperform no matter how powerful the underlying LLM is.

The Ghost in the Machine: Why Your AI Needs More Than Just a Big Brain

The explosion of Large Language Models (LLMs) has given us powerful tools, but building truly intelligent agents requires more than a massive knowledge base. As Vasilije Merkevic explains, the core challenge lies in agentic memory: how stateless LLMs can retain context, learn from interactions, and draw on past experience to make better decisions. This isn't just about storing data; it's about creating a dynamic, evolving memory that mirrors how humans learn and adapt. The conventional approach of simply dumping data into vector stores is proving insufficient: it produces agents that have access to plenty of data yet are not accurate enough, or worse, agents that are overwhelmed by their own data.

The Illusion of Intelligence: When More Data Means Less Insight

The initial wave of AI memory solutions focused on Retrieval-Augmented Generation (RAG) and vector stores. The idea was simple: retrieve the data most relevant to a query, place it in the LLM's context, and the model will answer better. However, as Merkevic points out, this approach has a critical flaw. LLMs are trained on vast, general datasets. When you inject custom, up-to-date information, the model can struggle to retrieve and use it accurately, especially if that information deviates from its training data. The result is a disconnect: the agent has access to the data, but it doesn't understand or remember it in a way that improves its decision-making.

"The data that they need to access was, in the, let's say, earlier days, some company data or some data that they couldn't ingest through their context window because the context window of the agent was relatively small... Now, as we move forward with the implementations and the agentic use cases, we've seen agents having extended context windows, so the data storage became less of a problem. But what became a problem was that the agents were simply not accurate enough."

-- Vasilije Merkevic

This highlights a crucial shift. The problem is no longer just access to data; it is the quality and structure of that access. The challenge has moved from fitting data into limited context windows to ensuring that agents retrieve and act on the right information, even when that information is novel or customized. That calls for memory architectures more sophisticated than a simple vector store.
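For reference, the baseline retrieval step this section describes can be sketched in a few lines. This is a minimal illustration, not anything shown in the episode: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the sample documents are hypothetical names and data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Place the retrieved documents in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample data for illustration only.
DOCS = [
    "refund requests are approved within 14 days",
    "the summer collection launches in december in brazil",
    "support tickets are triaged by severity",
]
```

Nothing in this loop records what the agent did with the retrieved text or whether the answer was any good, which is the gap the layered designs discussed next try to close.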

Beyond the Static: Crafting Dynamic Memory Architectures

Merkevic introduces the concept of a multi-layered memory system, distinguishing between "permanent memory" and "session memory." Permanent memory, often built on a combination of graph and vector stores, holds core business rules, ontologies, and long-term contextual information. Session memory captures the immediate, ephemeral side of an interaction: reasoning traces, tool calls, and transient data generated during a specific agent task. Session memory is what lets agents pick up where they left off, share context, and distill learnings into permanent memory over time.

The implications of this layered approach are profound. It allows for a more nuanced understanding of an agent's state and history. Imagine an agent that, after a series of failed attempts to complete a task, can review its session memory, identify the flawed reasoning steps, and adjust its strategy. This is a significant leap from stateless agents that must restart from scratch with every new query. Furthermore, the ability to "distill" session memory into permanent memory creates a feedback loop for continuous improvement, allowing the AI to learn and adapt based on its operational history.

"What we built is this concept of a session memory. People often call this short-term memory in the agentic space, referring wrongly to these kimball models of short-term memory in humans. But the problem with that is that our short-term memory for humans is like, what, six seconds for a number of items that then get distilled into the permanent. In this context, for session memory, what we did is we created the typical concept, in the analytics web landscape, of a session, which starts when a user defines it and ends when the user defines it. And there we store all the reasoning traces, all the tool calls, and then these can be used and distilled into the permanent memory..."

-- Vasilije Merkevic
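The two layers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation discussed in the episode: the class names `Session` and `PermanentMemory` are hypothetical, and the distillation rule (promote failed tool calls as lessons) is a deliberately trivial placeholder for whatever summarization a real system would apply.

```python
from dataclasses import dataclass, field


@dataclass
class SessionEvent:
    kind: str       # "reasoning" or "tool_call"
    content: str
    ok: bool = True


@dataclass
class Session:
    """Explicitly bounded session: the caller decides when it starts and ends."""
    session_id: str
    events: list[SessionEvent] = field(default_factory=list)
    closed: bool = False

    def record(self, kind: str, content: str, ok: bool = True) -> None:
        if self.closed:
            raise RuntimeError("session already ended")
        self.events.append(SessionEvent(kind, content, ok))

    def end(self) -> None:
        self.closed = True


@dataclass
class PermanentMemory:
    facts: list[str] = field(default_factory=list)

    def distill(self, session: Session) -> int:
        """Promote lessons from a finished session; returns how many were added.

        Assumed rule for illustration: each failed tool call becomes an
        "avoid" lesson, skipping lessons that are already stored.
        """
        if not session.closed:
            raise ValueError("distill only after the session has ended")
        added = 0
        for event in session.events:
            if event.kind == "tool_call" and not event.ok:
                lesson = f"avoid: {event.content}"
                if lesson not in self.facts:
                    self.facts.append(lesson)
                    added += 1
        return added
```

A caller would open a `Session`, record reasoning steps and tool calls as the agent works, call `end()`, and then pass the session to `distill`. Keeping the raw trace separate from the distilled facts means the permanent layer stays small even when sessions are long.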

This distinction matters competitively. Companies that invest in robust session memory let their agents learn from both failures and successes, so operations become more efficient over time. The payoff is delayed: building these memory systems looks costly up front, but the long-term gains in agent performance and adaptability are what separate one product from another.

The Hidden Cost of Simplicity: When Markdown Isn't Enough

While simple solutions like markdown files or basic prompt templating might suffice for rudimentary tasks, they break down quickly under complexity. Merkevic notes that a developer might use markdown files to track SDK improvements, but without versioning, deduplication, and robust indexing the result is redundancy and lost information, especially when sessions terminate unexpectedly. The conventional advice to "just use a file" stops working once it is stretched to more demanding use cases.
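To make the missing pieces concrete, here is a minimal sketch of what versioning, deduplication, and rollback add on top of a plain file. `NoteStore` and its methods are hypothetical names for illustration; a real system would also persist the history to disk or a database.

```python
import hashlib


def _digest(text: str) -> str:
    """Content hash used to detect duplicate writes."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


class NoteStore:
    """In-memory notes with per-key version history."""

    def __init__(self) -> None:
        # key -> list of (digest, content), oldest first
        self._history: dict[str, list[tuple[str, str]]] = {}

    def write(self, key: str, content: str) -> int:
        """Store a new version unless it matches the latest one.

        Returns the current version number (1-based).
        """
        history = self._history.setdefault(key, [])
        digest = _digest(content)
        if history and history[-1][0] == digest:
            return len(history)  # duplicate of the latest version: skip
        history.append((digest, content))
        return len(history)

    def latest(self, key: str) -> str:
        return self._history[key][-1][1]

    def rollback(self, key: str) -> str:
        """Drop the latest version and return the one before it."""
        history = self._history[key]
        if len(history) < 2:
            raise ValueError("nothing to roll back to")
        history.pop()
        return history[-1][1]
```

Even this small amount of structure answers two questions a folder of markdown files cannot: whether a note actually changed, and what it said before the last update.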

The real value emerges when memory systems can handle complex relationships and provide structured retrieval. Merkevic highlights graph databases (such as Neo4j and Kuzu) and specialized vector stores (such as Pinecone and Milvus) as essential components of sophisticated memory. These tools store not just raw data but also metadata, relationships, and temporal information, so agents can retrieve contextually relevant information with high precision. The ability to filter by node sets, timestamps, and custom attributes, as described for the cognee SDK, turns memory from a passive data store into an active part of the AI's reasoning process.
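The kind of filtering described above can be sketched with plain Python data structures. This is an assumption-laden illustration and does not reproduce the API of any SDK or database named in the text: `Node`, `MemoryGraph`, `search`, and `neighbors` are hypothetical names, and the sample relation is invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Node:
    node_id: str
    text: str
    node_set: str                      # logical grouping, e.g. "business_rules"
    created_at: datetime
    attrs: dict[str, str] = field(default_factory=dict)


@dataclass
class MemoryGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def search(self, node_set=None, since=None, **attrs) -> list[Node]:
        """Filter nodes by node set, minimum timestamp, and exact attribute matches."""
        hits = []
        for node in self.nodes.values():
            if node_set is not None and node.node_set != node_set:
                continue
            if since is not None and node.created_at < since:
                continue
            if any(node.attrs.get(k) != v for k, v in attrs.items()):
                continue
            hits.append(node)
        return hits

    def neighbors(self, node_id: str, relation: str) -> list[Node]:
        """Follow outgoing edges of one relation type from a node."""
        return [self.nodes[t] for s, r, t in self.edges if s == node_id and r == relation]
```

In a production system the filtered candidates would typically be ranked by vector similarity as well; the point of the sketch is that explicit edges and metadata let a rule the model would not infer on its own be looked up directly.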

"The problem is you're going to end up with seven unversioned files like I did. You're going to end up with a lot of redundancy, and the updates might be hard to manage or, let's say, roll back, and to know where you left off after the session is done. So that's kind of the limit. But for a lot of use cases it's going to work, and you probably don't need to do anything more than that, and you probably don't need to move towards more complex systems. But in cases where you have to reconcile information that is not intuitive, so if we're talking about cases like, you know, summer fashion in Brazil is winter fashion in Europe, and these types of business rules that the LLM might not intuitively connect the dots on, then you can start, for example, with the vector store..."

-- Vasilije Merkevic

This points to a critical insight: the "obvious" solution is fine until the use case outgrows it, and then it creates downstream complexity. File-based storage works for many tasks, but as the basis for dynamic AI memory it leads to unmanageable data and lost context, and the agent's intelligence cannot scale. Once business rules are non-obvious or state must survive across sessions, the advantage goes to teams that accept the complexity of structured memory systems that adapt and evolve with the AI's needs.

The Edge Case: Memory Where You Least Expect It

The discussion also touches on the emerging frontier of edge computing and memory. A memory layer on a mobile phone, communicating with on-device vector stores and eventually with other edge devices, opens up new possibilities. This distributed memory model could enable highly personalized and responsive AI experiences, even in environments with limited connectivity. Memory then extends beyond centralized data centers into everyday devices, with potential for new product categories and user experiences.

Actionable Takeaways for Building Smarter AI Memory:

  • Embrace Layered Memory: Implement distinct layers for permanent (long-term, structured knowledge) and session (explicitly bounded, interaction-based context) memory. This provides a more robust foundation for agent learning and state management.
  • Beyond Simple Vector Stores: For complex use cases, integrate graph databases and specialized vector stores to manage relationships, metadata, and temporal data. This allows for richer context and more precise retrieval.
  • Invest in Session Distillation: Develop mechanisms to distill valuable insights from session memory into permanent memory. This creates a continuous learning loop, improving agent performance over time.
  • Prioritize Data Contract Evolution: Plan for schema evolution and data versioning. While challenging, it's crucial for maintaining data integrity and enabling smooth transitions as memory requirements change.
  • Consider Edge Memory: Explore the potential of on-device and distributed memory for mobile and IoT applications, enabling new forms of personalized and responsive AI.
  • Focus on Tool Calling Abstraction: Abstract tool calling for memory operations (storage, retrieval, feedback) to simplify agent development and ensure consistency across different frameworks.
  • Don't Over-Optimize for Simplicity: While simplicity is appealing, recognize its limitations. For complex agentic tasks, invest in the necessary architectural complexity for memory management.
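The tool-calling abstraction in the list above can be sketched as a small interface that any agent framework could call into. This is a hedged illustration, not a description of an existing library: `MemoryBackend`, `InMemoryBackend`, and `call_memory_tool` are hypothetical names, and the keyword-overlap retrieval is a placeholder for a real search.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """The three memory operations named in the takeaway."""

    def store(self, text: str) -> str: ...
    def retrieve(self, query: str) -> list[str]: ...
    def feedback(self, item_id: str, helpful: bool) -> None: ...


class InMemoryBackend:
    """Toy backend: keyword overlap for retrieval, feedback adjusts ranking."""

    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.scores: dict[str, int] = {}

    def store(self, text: str) -> str:
        item_id = f"m{len(self.items) + 1}"
        self.items[item_id] = text
        self.scores[item_id] = 0
        return item_id

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        matches = [
            (self.scores[item_id], text)
            for item_id, text in self.items.items()
            if words & set(text.lower().split())
        ]
        # Highest feedback score first; ties keep insertion order.
        return [text for _, text in sorted(matches, key=lambda m: m[0], reverse=True)]

    def feedback(self, item_id: str, helpful: bool) -> None:
        self.scores[item_id] += 1 if helpful else -1


def call_memory_tool(backend: MemoryBackend, name: str, **kwargs):
    """Single dispatch point an agent framework can expose as tools."""
    tools = {
        "store": backend.store,
        "retrieve": backend.retrieve,
        "feedback": backend.feedback,
    }
    if name not in tools:
        raise KeyError(f"unknown memory tool: {name}")
    return tools[name](**kwargs)
```

Because agents only see the three tool names, the backend behind them can be swapped, for example from this toy store to a graph and vector combination, without changing agent code.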

By understanding and implementing these principles, organizations can move beyond basic LLM integration and build AI systems that are not just intelligent, but truly adaptive and capable of delivering sustained value.

---
This content is a personally curated review and synopsis derived from the original podcast episode.