How to Think About Agentic Memory Job-to-Job
Andrew Park, Editorial Lead, Heavybit
Moving Memory From Individual Apps to Centralized Infra
Despite all the enthusiasm for AI agents, recent reports suggest that only a fraction of enterprises have put them into production. The reasons vary: most orgs’ data is locked up in legacy systems; data architecture and flows impose constraints; and the usual enterprise-level governance and compliance concerns apply.
But agents themselves still struggle with a variety of performance issues that keep them from being the durable, reliable workers of the future they’re supposed to be. One glaring example is agents’ memory, or lack thereof, between jobs. Even after successful runs, agents by default begin their next jobs with a severe case of amnesia.
While some orgs have taken the approach of building agentic memory into specific apps, there’s a growing need for a wider, infrastructure-based approach for agents that can remember how to do things correctly. Open-source builder Heinrich Krupp tackled this challenge with his project MCP Memory Service, and explains his approach below.

MCP Memory Service provides persistent semantic memory for agents. Image courtesy mcp-memory-service
MCP: Generally Useful, But Where’s the Memory?
Krupp, an infrastructure and data management veteran, explains that his day-to-day work with enterprises on cloud infrastructure, delivery, and provisioning resources led him to start experimenting with MCP in December of 2024.
“My immediate reaction was: ‘This is genuinely useful!’ I had been working with AI tools for a while and had noticed a consistent gap. There was no memory layer. So I sat down with Claude, brainstormed the scope, and started building.”
The builder’s design thinking prioritized a privacy-first approach and open-source tooling, which led him to disqualify ChromaDB as a vector store for being too heavyweight and tack toward sqlite-vec for performance.
“I juggle a large number of projects with constant context switching, and I needed a reliable way to store important facts, decisions, and discoveries so they’d be accessible in the next session. When you’re managing Terraform configurations, security compliance, and customer-specific architecture decisions across 50-plus enterprise accounts, losing that context at the start of every AI session is a real tax on your work.”
After experimenting with different versions of Anthropic Claude, the builder found a fit with Claude Code’s hooks: user-defined commands that run at set points in a Claude Code session’s lifecycle. Using this feature, Krupp built “Memory Awareness Hooks” to automatically inject context at the start of new sessions and keep learnings persistent. “This effectively gives Claude Code a persistent brain. That was the breakthrough that made the whole thing feel complete.”
Building Memory that Doesn’t Start from Zero
Krupp notes that his agentic sessions always seemed to start from nothing: No matter how many explanations he’d enter into the prompt, after a few dozen tool calls, the context window would fill up, he’d restart, and begin all over again.
“The tools that existed at the time either treated memory as a cloud subscription (proprietary, per-call pricing, data leaving your infrastructure) or as a simple markdown file you maintained manually, which is just organized forgetting with extra steps. Nobody had built memory as actual infrastructure that was self-hosted, semantically searchable, accessible via a standard API, with a lifecycle that includes consolidation and forgetting.”
The builder suggests that AI memory had mostly been treated as a personalization feature for chatbots, not as a shared data layer for multiple agents, tools, and sessions. “The decision that shaped everything else was treating memory as infrastructure rather than a feature.” Krupp set about designing a REST API first (with 76 endpoints) so any agent, client, or language could interact with it without the need for specialized SDKs. “MCP support came on top of that, not instead of it.”
“On the storage side, the core tension is retrieval precision against latency and cost. We use local ONNX embeddings (sentence-transformers’ all-MiniLM-L6-v2, running fully on your hardware). No API calls per query, no per-token cost, no data leaving your infrastructure. The trade-off is embedding quality versus a frontier API model, but for most memory retrieval tasks the local model is more than sufficient, and you get 5ms read times against sqlite-vec.”
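To make the store-then-retrieve loop concrete, here is a minimal sketch in Python. It is not the project's code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the local ONNX model, and a brute-force scan over the standard-library `sqlite3` module stands in for sqlite-vec. The table layout and function names are assumptions made for illustration.

```python
import hashlib
import json
import math
import sqlite3

DIM = 256  # toy dimensionality; all-MiniLM-L6-v2 produces 384-dimensional vectors


def toy_embed(text):
    """Hypothetical stand-in for a local embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def store(db, content):
    """Embed once at write time so reads never need to call a model for stored text."""
    db.execute(
        "INSERT INTO memories (content, embedding) VALUES (?, ?)",
        (content, json.dumps(toy_embed(content))),
    )


def search(db, query, k=3):
    """Brute-force cosine similarity; vectors are unit length, so a dot product suffices."""
    q = toy_embed(query)
    scored = []
    for content, emb in db.execute("SELECT content, embedding FROM memories"):
        score = sum(a * b for a, b in zip(q, json.loads(emb)))
        scored.append((score, content))
    return sorted(scored, reverse=True)[:k]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (content TEXT, embedding TEXT)")
store(db, "terraform state bucket lives in the shared account")
store(db, "customer prefers weekly compliance reports")
top = search(db, "where is the terraform state bucket", k=1)
```

Everything here runs in-process with no network calls, which is the property the quote emphasizes. A real deployment would swap in a proper embedding model and a vector index rather than a full table scan.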
Krupp also designed a hybrid backend that bridges the gap between local and cross-device. “You want local speed for reads but you want cross-device sync. So the architecture separates concerns. sqlite-vec handles all reads at 5ms, a background process syncs to Cloudflare for durability and multi-device access.” By separating the MCP server and the network service at runtime, the builder avoided ‘database locked’ failures.
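The local-read, background-sync split can be sketched roughly as follows. This is an illustration of the general pattern, not the project's implementation: `HybridStore` is a hypothetical class, a plain dictionary stands in for the Cloudflare backend, and a second dictionary stands in for sqlite-vec.

```python
import queue
import threading


class HybridStore:
    """Local-first store: reads and writes hit local state; a worker replays writes remotely."""

    def __init__(self, remote):
        self.local = {}
        self.remote = remote                 # stand-in for a cloud backend
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._sync_loop, daemon=True)
        self._worker.start()

    def write(self, key, value):
        self.local[key] = value              # visible to local readers immediately
        self._pending.put((key, value))      # replicated later, off the hot path

    def read(self, key):
        return self.local.get(key)           # never waits on the network

    def _sync_loop(self):
        while True:
            key, value = self._pending.get()
            self.remote[key] = value         # a real backend would make a network call here
            self._pending.task_done()

    def flush(self):
        self._pending.join()                 # block until every queued write is replicated


cloud = {}
hybrid = HybridStore(cloud)
hybrid.write("decision:vpc", "use a hub-and-spoke layout")
local_value = hybrid.read("decision:vpc")
hybrid.flush()
```

The design choice worth noting is that the read path touches only local state, so remote latency or outages delay durability but not retrieval.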
“The toughest trade-off was memory granularity. Storing at the turn level gives fine-grained retrieval but spreads a session’s semantic signal across many entries, which hurts benchmark scores on things like LongMemEval. Session-level storage scores higher on those benchmarks but gives you less precision when you want to know exactly what was said about a specific topic.”
Krupp argues that for memory granularity, chasing benchmark scores misses the point. “In production, fine-grained retrieval with a good quality scoring system outperforms coarse session storage on the tasks that actually matter.”
“The knowledge graph layer was an unexpected evolution. It started as a way to link related memories, but once you have typed edges (causes, fixes, contradicts, supports) you can query causal chains, not just similar facts. That’s a qualitatively different capability.”
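A small sketch shows why typed edges enable a different kind of query than similarity search. The four edge types come from the quote; the memory IDs, the triple representation, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# (source, edge_type, target) triples; the memory IDs are made up for illustration.
EDGES = [
    ("expired-cert", "causes", "tls-handshake-failure"),
    ("tls-handshake-failure", "causes", "api-gateway-502"),
    ("rotate-cert-runbook", "fixes", "expired-cert"),
    ("api-gateway-502", "supports", "add-cert-expiry-alert"),
]


def build_index(edges):
    """Index incoming edges by (target, edge_type) so chains can be walked backwards."""
    index = defaultdict(list)
    for source, edge_type, target in edges:
        index[(target, edge_type)].append(source)
    return index


def root_causes(index, symptom):
    """Follow 'causes' edges backwards from a symptom to memories with no known cause."""
    roots, stack, seen = [], [symptom], {symptom}
    while stack:
        node = stack.pop()
        parents = index.get((node, "causes"), [])
        if not parents and node != symptom:
            roots.append(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return roots


def fixes_for(index, symptom):
    """Collect memories that 'fix' any root cause of the symptom."""
    return [fix for root in root_causes(index, symptom)
            for fix in index.get((root, "fixes"), [])]


index = build_index(EDGES)
causes = root_causes(index, "api-gateway-502")
fixes = fixes_for(index, "api-gateway-502")
```

Here a 502 error is traced back two hops to an expired certificate and then to the runbook that fixes it, a link that text similarity between "502" and "rotate cert" would be unlikely to surface.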
Job-to-Job Memory vs. Context Rot
The holy grail in agentic engineering appears to be performant, reliable long-horizon agents that are immune to the context rot and hallucinations that set in with prompts that are too long and complicated. Krupp clarifies that MCP Memory Service’s agentic memory isn’t the solution to this challenge, but to a different one.
“Memory at the MCP level doesn’t solve in-session context rot. That’s a different problem involving attention mechanisms and context window management. What it does solve is the inter-session problem: Why does every new session or new agent start without knowing what the previous one learned?”
“The practical mechanism is what we call context providers, automatically injecting relevant memories at session start based on the current project, the current files, the recent activity. In our setup, this reduces context token usage at session start by 65% or more. The agent already knows the architecture, the recent decisions, the known failure modes. You don’t re-explain; you continue.”
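One way to picture a context provider is as a function that selects memories for the current project and packs them into a preamble under a token budget. The sketch below assumes a hypothetical `Memory` record, project tags, and a crude four-characters-per-token estimate; none of these come from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    tags: frozenset
    age_days: float


def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic: roughly four characters per token


def build_session_context(memories, project, token_budget):
    """Pick the newest memories tagged with the project until the budget is spent."""
    relevant = sorted(
        (m for m in memories if project in m.tags),
        key=lambda m: m.age_days,
    )
    lines, used = [], 0
    for memory in relevant:
        cost = estimate_tokens(memory.content)
        if used + cost > token_budget:
            break
        lines.append(f"- {memory.content}")
        used += cost
    return "Known context from earlier sessions:\n" + "\n".join(lines)


memories = [
    Memory("Billing service uses Postgres 15 with logical replication.", frozenset({"billing"}), 2),
    Memory("Retry storms traced to missing jitter in the queue client.", frozenset({"billing"}), 0.5),
    Memory("Marketing site is deployed from the main branch.", frozenset({"website"}), 1),
]
preamble = build_session_context(memories, project="billing", token_budget=40)
```

A hook that runs at session start could print a preamble like this into the agent's context. Ranking by recency alone is a simplification; the quote also mentions current files and recent activity as signals.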
“For multi-agent pipelines, the knowledge graph becomes the coordination layer. One agent stores a learning with a specific tag. Another agent retrieves it without any direct inter-agent communication protocol. There’s a user in our community running a five-agent cluster where they use the memory service as both shared state and an inter-agent messaging bus. Cluster agents write to a sentinel tag, the local agent polls that tag. That pattern emerged naturally from the tagging system; nobody designed it explicitly.”
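The sentinel-tag pattern described here can be approximated in a few lines. In this sketch `TaggedStore` is a hypothetical in-memory stand-in for the shared memory service, and the tag name `msg:local-agent` and the cursor-based `Poller` are assumptions rather than anything the community setup is known to use.

```python
import itertools


class TaggedStore:
    """In-memory stand-in for a shared memory service with tag-based retrieval."""

    def __init__(self):
        self._entries = []
        self._ids = itertools.count(1)

    def store(self, content, tags):
        entry_id = next(self._ids)
        self._entries.append((entry_id, content, set(tags)))
        return entry_id

    def by_tag(self, tag, after_id=0):
        return [(i, c) for i, c, t in self._entries if tag in t and i > after_id]


class Poller:
    """Remembers the last entry it saw so each poll returns only new messages."""

    def __init__(self, store, sentinel):
        self.store, self.sentinel, self.cursor = store, sentinel, 0

    def poll(self):
        fresh = self.store.by_tag(self.sentinel, after_id=self.cursor)
        if fresh:
            self.cursor = fresh[-1][0]
        return [content for _, content in fresh]


shared = TaggedStore()
inbox = Poller(shared, sentinel="msg:local-agent")
shared.store("build finished on node-3", {"msg:local-agent", "cluster"})
shared.store("node-3 disk usage at 71%", {"metrics"})
first = inbox.poll()
second = inbox.poll()
```

The writer and the reader never reference each other; they only agree on a tag. That loose coupling is what lets the pattern emerge from a tagging system without a dedicated messaging protocol.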
The builder clarifies that memory isn’t the only missing piece for consistent agentic performance. “You also need good task decomposition, clear agent handoff protocols, and checkpointing within long runs. Memory is the persistence layer. It doesn’t replace the orchestration logic. But without it, every long-horizon task is fighting against statelessness at the infrastructure level, which makes everything else harder.”
[PULL QUOTE] “Memory...doesn’t replace the orchestration logic. But without it, every long-horizon task is fighting against statelessness at the infrastructure level, which makes everything else harder.” -Heinrich Krupp, Creator / MCP-Memory-Service [/PULL QUOTE]
The Role of MCP in Agentic Memory (and General Agentic Ops)
“MCP matters because it gives memory a standard interface that AI clients can discover and use without custom integration work. Before MCP, connecting a memory service to an AI tool meant writing glue code for every client. Now Claude Desktop, VS Code with Copilot, Cursor, and a dozen others can connect to the same server with a config entry.”
Krupp cautions that it may not make sense for MCP itself to be the linchpin of a software architecture, for a number of reasons. “The REST API is equally important and more universal. If MCP evolves, is superseded, or simply isn’t supported by a particular client, the memory infrastructure still works via HTTP. Twelve MCP tools sitting on top of 76 REST endpoints is the right architectural layer separation.”
“MCP does some things well, like discoverability, tool schema negotiation, and integration with AI client ecosystems. But MCP doesn’t replace a stable HTTP API for programmatic access, agent frameworks that operate outside MCP-aware clients, and enterprise environments that need REST for their existing auth and proxy infrastructure.”
“In the long term, I think MCP will become a standard the way REST became a standard, ubiquitous but not special. The interesting competition will be at the memory content and quality layer, not at the protocol layer.”
How Startups Should Think About Agent Memory Economics
Krupp suggests that the right price tag to stand up an early, no-frills agent memory system is $0. “The default setup I use of an SQLite-vec backend with local ONNX embeddings costs essentially nothing beyond the server it runs on. No per-call API charges, no vector database subscription, no data egress. For a startup in early development, that’s the right starting point.”
“The economics change when you need team sync and multi-device access. The Cloudflare backend adds real costs, but at typical startup scale, it’s in the range of single-digit dollars per month. That’s worth it when you have multiple developers or agents sharing a memory layer.”
“The comparison that matters for startups is against the alternative: calling a frontier LLM to regenerate context on every session. Memory retrieval at 5ms with zero API cost replaces a non-trivial number of LLM tokens per session. If you’re doing active AI development with many sessions per day, the token savings pay back the infrastructure cost quickly.”
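The payback claim is easy to check with back-of-the-envelope arithmetic. Every number in the sketch below is an illustrative assumption (session counts, tokens saved, token price, infrastructure cost), not a figure from Krupp or the project; substitute your own.

```python
def monthly_token_savings_usd(sessions_per_day, tokens_saved_per_session,
                              usd_per_million_tokens, working_days=22):
    """Dollar value of the context tokens that no longer need to be resent each month."""
    tokens = sessions_per_day * tokens_saved_per_session * working_days
    return tokens / 1_000_000 * usd_per_million_tokens


def pays_for_itself(savings_usd, infra_usd):
    return savings_usd >= infra_usd


# All inputs below are illustrative assumptions, not measurements.
savings = monthly_token_savings_usd(
    sessions_per_day=20,
    tokens_saved_per_session=5_000,
    usd_per_million_tokens=3.0,
)
worth_it = pays_for_itself(savings, infra_usd=5.0)
```

With these inputs, 20 sessions a day over 22 working days saves 2.2 million input tokens a month, which at the assumed price is worth a few dollars, roughly the same order as the single-digit monthly cost quoted for the cloud backend.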
“My practical advice is that teams shouldn’t over-engineer the memory layer early. Start with semantic search and basic tagging. Don’t build a full knowledge graph before you have 1,000 memories worth organizing. The consolidation and quality-scoring systems are powerful but they’re optimization layers; the core value is simple, persistent, semantically searchable storage. Get that working first, understand your retrieval patterns, then add complexity where it actually helps.”
The builder adds: “Self-hosting changes the unit economics entirely. Proprietary memory APIs charge per storage operation and per retrieval. At scale, that adds up. Open-source, self-hosted infrastructure with local embeddings is a fundamentally different cost structure that startups should think about seriously, especially if memory is a core part of the product.”
How Builders Should Approach Agent Memory Economics
Krupp reiterates that the biggest mistake that builders make today is treating agent memory as an individual app feature, rather than as centralized infrastructure. “When memory is a feature, it belongs to a user, lives inside an application, and gets designed around a single interaction model. When memory is infrastructure, it’s a shared layer that multiple agents, tools, and workflows can read from and write to. The architectural implications are completely different.”
The second biggest mistake, says the builder, is storing everything without a quality model. “Raw storage without scoring creates retrieval pollution. You end up with a database full of session noise, test outputs, and transient observations that contaminate your search results. Memory needs a lifecycle: Store, quality-score, consolidate, and eventually forget. The ‘forgetting’ part is uncomfortable for developers, but it’s essential. Stale memories, outdated facts, and superseded decisions actively harm retrieval quality.”
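The store, score, consolidate, forget lifecycle can be sketched as three small functions. The exponential decay with a 30-day half-life, the exact-text deduplication, and the 0.2 threshold are all assumptions picked for illustration; the source does not describe how the project scores or consolidates memories.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    base_quality: float   # 0..1, assigned when the memory is stored
    age_days: float
    superseded: bool = False


def current_score(memory, half_life_days=30.0):
    """Quality decays exponentially with age; superseded memories score zero."""
    if memory.superseded:
        return 0.0
    return memory.base_quality * math.pow(0.5, memory.age_days / half_life_days)


def consolidate(memories):
    """Keep only the highest-scoring copy of memories with identical normalized text."""
    best = {}
    for memory in memories:
        key = " ".join(memory.content.lower().split())
        if key not in best or current_score(memory) > current_score(best[key]):
            best[key] = memory
    return list(best.values())


def forget(memories, threshold=0.2):
    """Drop memories whose current score has fallen below the threshold."""
    return [m for m in memories if current_score(m) >= threshold]


memories = [
    Memory("Use Terraform 1.6 for the platform repo", 0.9, age_days=10),
    Memory("use terraform 1.6 for the platform repo ", 0.6, age_days=40),
    Memory("pytest run 4412 passed", 0.3, age_days=60),
    Memory("Deploys go through Jenkins", 0.9, age_days=5, superseded=True),
]
kept = forget(consolidate(memories))
```

Of the four entries, only one survives: the duplicate is merged, the stale test output decays below the threshold, and the superseded decision is dropped regardless of how recent it is.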
“The third biggest mistake is conflating vector similarity with causal knowledge. Semantic search finds related content well. It does not tell you that A caused B, or that Fix X resolves Problem Y, or that Pattern Z contradicts Pattern W. For complex technical domains, you need typed relationships, not just cosine distance. That’s why the knowledge graph layer matters.”
Krupp recommends that teams that are specifically building AI products prioritize memory isolation early. “Who can see what? If Agent A writes a memory, should Agent B be able to read it? How do you scope memories by customer, by project, by context? Getting the tagging and identity model right early saves painful migrations later.”
[PULL QUOTE] “When memory is a feature, it belongs to a user, lives inside an application, and gets designed around a single interaction model. When memory is infrastructure, it’s a shared layer that multiple agents, tools, and workflows can read from and write to. The architectural implications are completely different.” [/PULL QUOTE]
Getting Agentic Memory Enterprise Ready
As an enterprise architect himself, Krupp reviews the core requirements for enterprise software usage: “Data sovereignty, auditability, and isolation. Each one implies a specific technical answer.”
“Data sovereignty means the memory service has to run on your infrastructure, using your compute, with embeddings generated locally. No API calls to an external service for retrieval. Every sensitive architecture decision, customer interaction, or compliance note that goes into memory stays inside your perimeter. Local ONNX embeddings are non-negotiable here.”
“Auditability means knowing who stored what, when, and why. The service includes an audit log plugin. Combined with the x-agent-id header that auto-tags memories by agent identity, you have a clear lineage trail for every stored fact.”
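As a rough sketch of how identity tagging and an audit trail fit together, consider a request handler that reads the `x-agent-id` header named in the quote. The `agent:<id>` tag format, the audit entry fields, and the `handle_store_request` function are hypothetical; they illustrate the idea rather than document the service's plugin.

```python
from datetime import datetime, timezone

AUDIT_LOG = []
MEMORIES = []


def handle_store_request(headers, body):
    """Tag the memory with the caller's identity and record an audit entry."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    agent_id = normalized.get("x-agent-id", "anonymous")

    memory = {
        "content": body["content"],
        "tags": sorted(set(body.get("tags", [])) | {f"agent:{agent_id}"}),
    }
    MEMORIES.append(memory)
    AUDIT_LOG.append({
        "action": "store",                                # what happened
        "agent_id": agent_id,                             # who
        "at": datetime.now(timezone.utc).isoformat(),     # when
        "reason": body.get("reason", ""),                 # why, if the caller said
    })
    return memory


stored = handle_store_request(
    {"X-Agent-ID": "deploy-bot"},
    {"content": "Rolled back release 2.4 after failed health checks",
     "tags": ["release"], "reason": "post-incident note"},
)
```

Because the identity tag is added by the server from the request header, callers cannot forget to attribute a memory, and the audit entry answers who, what, when, and why in one record.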
“Isolation means multi-tenant memory where one team, customer, or agent cannot access another’s memories. This is solved via tagging and scoped retrieval, but it requires deliberate design at the application layer. OAuth 2.1 with Dynamic Client Registration (RFC 7591) handles authentication and client provisioning, the same auth standard enterprises expect.”
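Scoped retrieval at the application layer might look like the sketch below, in which every query must name a scope and nothing outside it is returned. `ScopedMemory` and the `customer:<name>` scope format are assumptions for illustration; authentication (the OAuth layer mentioned above) is out of scope here and would decide which scopes a caller is allowed to name.

```python
class ScopedMemory:
    """Every read must name a scope; entries outside that scope are never returned."""

    def __init__(self):
        self._entries = []

    def store(self, scope, content, tags=()):
        self._entries.append({"scope": scope, "content": content, "tags": set(tags)})

    def search(self, scope, tag):
        if not scope:
            # Refuse unscoped queries rather than silently searching everything.
            raise ValueError("a scope is required for every query")
        return [e["content"] for e in self._entries
                if e["scope"] == scope and tag in e["tags"]]


memory = ScopedMemory()
memory.store("customer:acme", "Acme requires EU-only data residency", tags={"compliance"})
memory.store("customer:globex", "Globex audit is scheduled for Q3", tags={"compliance"})
acme_view = memory.search("customer:acme", "compliance")
```

Making the scope a required argument, rather than an optional filter, is the deliberate design step: a forgotten parameter fails loudly instead of leaking another tenant's memories.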
Krupp notes that agentic memory is still missing a few key pieces of the puzzle for enterprises. “What’s genuinely missing at this point is formal certification and compliance documentation. SOC 2, ISO 27001 alignment, GDPR data residency statements exist as architectural properties but not as third-party certified artifacts, or at least, not yet.”
“For any enterprise that needs checkbox compliance before adoption, that’s the current gap. Nobody has demanded it yet; the project’s users have either been comfortable with self-assessment or have been developers who can evaluate the architecture themselves. That will change as the project matures.”
Evolving an Open-Source Agentic Memory Project
Krupp concedes that there are many approaches to solving the agent amnesia problem, and that there isn’t necessarily a clear-cut leader at the moment. “I periodically compare the project against other memory layers as they become popular. When something interesting appears, I analyze it, usually with Claude, and ask: What can we learn from this? What should we adapt?”
“That process has generated a number of RFCs and design discussions in the project. One issue proposed transitive closure and abductive inference for the knowledge graph. Capable contributors pick those up and implement them. The project has become genuinely community-driven in that sense.”
“I think that what made this community possible was investing in infrastructure that has nothing to do with the code itself. Things like thorough documentation, a comprehensive wiki, deployment automations, periodic housekeeping, and clear contribution paths. Projects that don’t invest in those things don’t attract serious contributors.”
“If I could, I’d like to mention one last thing about the project. Being honest, I think it deserves more support from sponsors. The maintainer workload remains significant even with extensive agent automation, including steering the project, reviewing contributions, maintaining quality standards, and responding to the community. If the companies and developers who depend on this infrastructure valued it at the level they’d pay for a proprietary alternative, the project would be in a very different position. Open-source sustainability is a real problem in this space, and memory infrastructure is no exception.”
What’s Next for Agentic Memory
The builder is realistic about the place of memory within the future AI stack. “Basic vector storage will be commoditized. It’s already happening. The differentiation will move to the quality layer (how memories are organized, scored, and consolidated) and to domain-specific memory schemas that encode deep knowledge about particular fields.”
“A healthcare AI’s memory is not interchangeable with a software developer’s memory. The schema, the relationship types, and the forgetting policies are all different. But durable memory will become table stakes for serious AI products. The competitive advantage will be in the quality of what you remember and how intelligently you use it, rather than simply having the ability to remember anything at all.”
“Also, a less-obvious learning I discovered while building this project is that the most useful patterns for how to use memory in AI systems have come from the community, not from me. A user running a multi-agent cluster discovered that the tagging system could double as an inter-agent messaging bus, no custom protocol, just sentinel tags.”
“Another user put the service behind a Cloudflare tunnel with a self-built auth proxy and made it remotely accessible across all their devices. These weren’t designed features; they emerged from people solving real problems with a flexible primitive. That suggests something broader about how to think about AI memory. We may be able to accomplish more if we don’t design for a specific use case, but instead design for composability. A memory service that exposes a clean, flexible API will find use cases you never imagined. The ones you do imagine will probably be the least interesting ones.”