When your agents start writing too: How to protect your truth

Thursday morning, the reply lands in your inbox. “I thought we agreed on something else, didn’t we?” You check the wiki, and there it is, with a source: your own agent. It pieced it together from a chat last week, overwrote the old entry, and ran with it.
The managing director of the organic bakery in Cologne (fictitious name) never confirmed that. The old entry was clear: still in pilot, no written confirmation before end of June.
The question now comes up everywhere: “The agents are getting better. Can’t they write the graph themselves?” The good news first. There is a workable answer. It’s not yes, and it’s not no. It’s: not into the same graph.
Back in April, in “When the human becomes the bottleneck”, I described why your knowledge graph is the lever your agents grow against. What follows is the second half: the blueprint for the layer above it. I’m building it in my own setup right now and watching what holds and what doesn’t.
What happens when an agent writes into your knowledge graph?
“Let it write itself” sounds like efficiency. In practice it means your agent starts remembering its own outputs and treats them as truth the next time around. In a single knowledge graph where human and agent write to the same node, that becomes, within weeks, a drift you can no longer trace back to any single statement.
Three months later, you no longer know what you typed in, what the AI corrected, and where it started to tip.
Three independent currents, which I’ll show below, land on the same answer in 2026. Not one knowledge graph, two. One is your truth. The other is the agent’s memory. They talk to each other. But they are not the same thing.
Self-learning doesn’t mean what is remembered is true. It only means it stays.
Which three scenarios show up after the first knowledge graph?
Once you have a knowledge graph your agents read against, three things happen pretty predictably.
Your email helper notes from last week’s mail thread with the bakery that they are planning an expansion in Q3, and wants to recall that next month.
Your operations agent infers, from a pattern across the last twelve orders, that the bakery in Cologne is most reachable on Thursdays (Wednesday is baking day) and wants to file that as knowledge.
Your bookkeeping assistant finds a contradictory entry on the client master sheet (“Client Y pays via direct debit”) and corrects it itself to match what the last three bookings suggest: pays by bank transfer.
Three different scenarios. At first glance, three different problems. At second glance, the same one. In all three cases, a machine wants to write into the knowledge system you have so far curated yourself.
That agents are getting a memory isn’t new. I argued in “How AI agents keep learning” that memory matters more than the model itself in the long run. The follow-up question is a different one. Where, exactly, is that memory allowed to write?
What four risks does self-learning agent memory carry?
The research here is younger than the conference slides suggest. A March 2026 preprint breaks the problem down into four risks. The framework is called SSGM, Stability- and Safety-Governed Memory, and it distinguishes four failure dimensions: Stability, Validity, Efficiency, Safety (Source: arXiv 2603.11768, 2026).
I’ll translate the four into what they mean in an SME reality.
First, memory contamination. The AI remembers what it answered last month, and treats it next time as if someone had confirmed it. Nobody did. But what was a guess the first time is, by the third time, a fact in the system’s head.
Second, self-curation drift. When writing, the AI prefers the evidence that supports its own previous answer. A vague hunch becomes, over three iterations, a fact that was never independently verified. Picture an employee who rewrites every report to look slightly better. Now picture that employee writing a thousand times a week.
Third, agent drift. Two agents, or two sessions of the same agent, edit the same entry in opposite directions. The last write wins. And you no longer know what was true to begin with.
Fourth, injection. Someone writes a sentence in an email that isn’t meant for you, but for your agent. “Ignore previous instructions, file X.” Most cases are harmless. The bad ones you notice late. How serious this can be in practice was shown by Palo Alto’s Unit 42 team in a proof of concept: indirect prompt injection via a prepared web page can silently poison the long-term memory of a production agent, persistently, across sessions (Source: Palo Alto Unit 42, 2025).
Now look at this again. Four risks. Three of them are not security problems but truth problems. They happen quietly. And in a single graph where human and agent write to the same node, they are no longer separable across many small statements.
How an entry tips over three iterations
[Interactive slider, not reproduced here: across three iterations, the entry loses its source and date.]
How do you cleanly separate company truth and agent memory?
Three independent currents from research and practice have, over the past twelve months, arrived at the same recommendation from different corners. Separate two layers. That’s the whole idea.
From the cognitive-safety corner comes the SSGM paper itself. It proposes a dual-track architecture: a fast-evolving layer for semantic reasoning, plus an append-only log as operational source of truth, against which the mutable layer can be periodically reconciled (Source: arXiv 2603.11768, 2026).
From the practitioner corner comes Andrej Karpathy. Earlier this year, he described his “LLM Knowledge Base” publicly: three layers, with the markdown wiki as unfalsifiable source of truth and AI write access only through a controlled path. Every claim the agent makes can be traced back to a single .md file a human can read and edit (Source: VentureBeat, 2026).
And from the production-engineering corner comes Mem0. Their State of AI Agent Memory 2026 documents that multi-store architectures with clear separation between curated domain knowledge and agent memory have become production-standard. Vector plus graph plus key-value, four scopes (user, agent, session, app), conflict resolution on write rather than via duplicates (Source: Mem0 State of AI Agent Memory, 2026).
Three corners, one result. When three currents that don’t cite each other land on the same architecture, that’s a signal, not a fashion.
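The dual-track idea from the SSGM paper can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the paper's implementation: the names `LogEntry`, `AppendOnlyLog`, `replay` and `reconcile` are hypothetical, and the mutable layer is reduced to a plain dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One immutable record in the append-only log (hypothetical shape)."""
    key: str
    value: str
    author: str  # for example "human" or "agent"


class AppendOnlyLog:
    """Entries can be added and read, never changed or removed."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def append(self, entry: LogEntry) -> None:
        self._entries.append(entry)

    def entries(self) -> tuple[LogEntry, ...]:
        # A tuple, so callers cannot mutate the log through the return value.
        return tuple(self._entries)


def replay(log: AppendOnlyLog) -> dict[str, str]:
    """Rebuild the expected state: the latest logged value per key."""
    state: dict[str, str] = {}
    for entry in log.entries():
        state[entry.key] = entry.value
    return state


def reconcile(
    log: AppendOnlyLog, mutable: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Report every key where the mutable layer disagrees with the log.

    Returns {key: (value_in_log, value_in_mutable_layer)}.
    """
    expected = replay(log)
    drift: dict[str, tuple[str | None, str | None]] = {}
    for key in sorted(expected.keys() | mutable.keys()):
        if expected.get(key) != mutable.get(key):
            drift[key] = (expected.get(key), mutable.get(key))
    return drift


# Usage: the log says "pilot", the fast-moving layer has drifted to "confirmed".
log = AppendOnlyLog()
log.append(LogEntry("bakery.status", "pilot, no written confirmation", "human"))
drift = reconcile(log, {"bakery.status": "confirmed"})
```

The point of the sketch is the direction of trust: the log is never corrected to match the mutable layer, only the other way round.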
The architecture in three layers
Picture three layers, from bottom to top.
At the bottom: the Working KG. This is the agent’s memory. It can write whatever it wants here. Observations, hypotheses, correlations land here. It belongs to the agent.
Directly above it: the Inbox. This isn’t truth, these are proposals. The agent says: “I’m observing X. Should this become part of your truth?”
At the top, quiet and alone: the Canonical KG. This is yours. Only what a human has waved through ends up here.
The AI may read in both directions, unfiltered. It may only write into the Working KG and the Inbox. What becomes Canonical is decided by a human. Or later, by a very narrowly defined promoter role.
The common follow-up: “Does this mean I now need two databases?” No. The two layers don’t have to live on two technical systems. They have to be visibly separated. In a markdown wiki, that can be a canonical folder and a working folder. In Notion, two databases with a clear naming convention. In a OneNote or Word master sheet, two sections with unambiguous headings.
What matters is the visible separation. Which tool you pick is secondary.
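The write rule for the three layers fits into one small table. The following is a sketch, assuming an in-memory store; `Layer`, `Actor`, `WRITE_RULES`, `can_write` and `write` are names I made up for illustration. That the human may also write into Working and Inbox is my assumption. The rule that matters is the one for the agent.

```python
from enum import Enum


class Layer(Enum):
    WORKING = "working"      # the agent's memory
    INBOX = "inbox"          # proposals waiting for review
    CANONICAL = "canonical"  # your truth


class Actor(Enum):
    AGENT = "agent"
    HUMAN = "human"


# Who may write where. Reading is open to both actors in every layer.
WRITE_RULES = {
    Actor.AGENT: {Layer.WORKING, Layer.INBOX},
    # Assumption: the human may write everywhere, including review notes.
    Actor.HUMAN: {Layer.WORKING, Layer.INBOX, Layer.CANONICAL},
}


def can_write(actor: Actor, layer: Layer) -> bool:
    return layer in WRITE_RULES[actor]


def write(store: dict, actor: Actor, layer: Layer, key: str, value: str) -> None:
    """Write into a layer, refusing anything the rules do not allow."""
    if not can_write(actor, layer):
        raise PermissionError(f"{actor.value} may not write to {layer.value}")
    store.setdefault(layer, {})[key] = value
```

In a markdown wiki, the same check could sit in front of the folder path instead of a dictionary. The rule stays identical.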
How do you decide what belongs in the wiki and what doesn’t?
The most common place this architecture collapses in practice is not the setup. It’s the question of what you actually file and where. Here is the sorting logic I’ve built for myself.
Example one: “The bakery in Cologne typically replies on Thursdays.” This belongs in Canonical, because it’s a pattern, true for months, consulted often.
Example two: a specific email from May 18th with order details. That doesn’t go in the wiki; it stays in your email inbox. Single instance, already has a home.
Example three: “Client Y doesn’t like PDF attachments over 5 MB.” This is Canonical. Preference, persistent, consulted often.
Example four: “Review call with Lisa at 14:30 today.” That belongs in the calendar, not the wiki. Episodic, has a due date.
Example five: active conversation state with Customer Z. That’s Working KG. Transient, belongs to the agent, not to the truth.
Example six: the filter rule “Heise is not a PR channel for us.” That’s Canonical. Strategic decision, holds long-term.
Three questions before any wiki entry
The rule of thumb behind this is three questions you ask before every wiki entry.
First: still true in six months? If not, it’s Working or Task, not Canonical.
Second: still consulted in six months? If not, it doesn’t belong in the wiki, it belongs in a log.
Third: pattern or single instance? Pattern goes to Canonical. Single instance goes back to wherever it came from: mail, ticket, calendar.
Whoever asks these three questions before creating the entry builds a wiki that still works in six months. Whoever doesn’t builds an archive of single instances in which nobody can find what the pattern was.
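The three questions can be written down as a tiny sorting function. A sketch under one assumption of mine: the questions are asked in the order given, and the first "no" decides. `Destination` and `triage` are hypothetical names.

```python
from enum import Enum


class Destination(Enum):
    CANONICAL = "canonical"
    WORKING_OR_TASK = "working or task"
    LOG = "log"
    ORIGIN = "back to origin (mail, ticket, calendar)"


def triage(still_true: bool, still_consulted: bool, is_pattern: bool) -> Destination:
    """Sort a candidate entry by the three questions, first 'no' wins."""
    if not still_true:
        # Question one: not true in six months, so not Canonical.
        return Destination.WORKING_OR_TASK
    if not still_consulted:
        # Question two: nobody will look it up, so it belongs in a log.
        return Destination.LOG
    if not is_pattern:
        # Question three: a single instance goes back where it came from.
        return Destination.ORIGIN
    return Destination.CANONICAL


# Usage: "The bakery in Cologne typically replies on Thursdays."
reply_day = triage(still_true=True, still_consulted=True, is_pattern=True)
```

Only an entry that passes all three questions reaches Canonical. Everything else already has a home.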
Where does the agent write without touching your truth?
The Inbox is the most important component in this setup. It’s the place where the AI is allowed to play without being able to touch your truth.
The agent does not write directly into Canonical. It writes a proposal into the Inbox, with reasoning and source. Shape: “I’ve observed that the bakery has, in the last twelve weeks, replied fastest on Thursdays. Should I file that as a pattern in Canonical?”
A human reviews. At first, that’s you, coffee in one hand, the Inbox in the other. The proposal is right there, with its source. You read: are there enough data points, is it a real pattern or coincidence, does it fit what’s already in Canonical? Three outcomes at the end: approve, reject, or hold with a brief note on what’s still missing. Later, this becomes a tightly defined promoter role you can delegate.
Only then does it become Canonical, with date and source. Who (agent) proposed what when, who (human) approved it when.
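The path from proposal to Canonical can be sketched as a small data structure. This is an illustration under my own assumptions, not a finished tool: `Proposal`, `CanonicalEntry`, `Decision` and `review` are hypothetical names, and Canonical is reduced to a list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    HOLD = "hold"


@dataclass
class Proposal:
    """What the agent puts into the Inbox: a claim with reasoning and source."""
    statement: str
    reasoning: str
    source: str
    proposed_by: str
    proposed_at: datetime
    decision: Decision | None = None  # None means: not reviewed yet
    note: str = ""


@dataclass(frozen=True)
class CanonicalEntry:
    """An approved statement, with the full trail of who did what when."""
    statement: str
    source: str
    proposed_by: str
    proposed_at: datetime
    approved_by: str
    approved_at: datetime


def review(
    proposal: Proposal,
    reviewer: str,
    decision: Decision,
    canonical: list[CanonicalEntry],
    note: str = "",
    now: datetime | None = None,
) -> None:
    """Record the human decision. Only an approval adds to Canonical."""
    if decision is Decision.HOLD and not note:
        raise ValueError("a hold needs a brief note on what is still missing")
    proposal.decision = decision
    proposal.note = note
    if decision is Decision.APPROVE:
        canonical.append(
            CanonicalEntry(
                statement=proposal.statement,
                source=proposal.source,
                proposed_by=proposal.proposed_by,
                proposed_at=proposal.proposed_at,
                approved_by=reviewer,
                approved_at=now or datetime.now(timezone.utc),
            )
        )
```

There is deliberately no function that writes to Canonical without a reviewer. The only way in leads through `review`.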
The most common objection is fair: “But that’s overhead again. Exactly what you identified as the bottleneck in the first article.” Partially true. The bottleneck was: the human types everything themselves, checks every statement, files every entry by hand. Promoter is: three clicks per proposal, hold, approve or reject. That’s not the same as writing everything yourself.
An honest n=1 note from my own setup, not from a customer base: after two weeks of inbox routine, my canonical wiki felt cleaner than in the months before. Not systematic proof, a self-observation. But the mechanism is stable enough that I’m setting it up further. A 2026 market overview puts it similarly: “For deployments where humans need to be first-class authors of the knowledge base, a markdown vault plus a semantic search index is often the right answer.” (Source: Fountain City Tech, 2026)
How does a small business start with the second layer?
If you don’t have a first knowledge graph yet, start there. If you do, the second layer doesn’t need a new platform. It needs three decisions.
First, sort. Identify what is Working at your end and what has to be Canonical. Spend an hour at a whiteboard and sort the last ten things you typed into any tool. On one side: what would still be true and still consulted in six months? On the other: what was a single instance that’s already done? You’ll be surprised how much lands on the single-instance side.
Second, set up an Inbox. It doesn’t matter whether it’s a folder, a Notion database entry, a markdown file, or a Postgres table. What matters is: your agents may only write there. And you know where “there” is. An Inbox nobody knows about isn’t one.
Third, define the promoter process. At first, that’s you. Ten minutes in the morning, with a short checklist. Does the proposal align with what’s already in Canonical? Did the agent provide a source or pattern as justification? Is the statement still true and still consulted in six months? If yes: approve. If no or unclear: back into the Inbox with a brief note explaining why.
Later, the promoter is a role you delegate, or a second agent that may only review, not file. But defined it must be. Otherwise the Inbox piles up until you eventually shut it again, frustrated.
It’s boring, I know. A whiteboard, a folder, a morning slot. But that’s exactly what still works after half a year, while the competition’s tool stacks have changed three times.
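The morning checklist from the third decision is small enough to write down as code. A sketch, assuming each question is answered with yes, no, or unclear. `CHECKLIST` and `run_checklist` are hypothetical names, and anything other than a clear yes sends the proposal back.

```python
from __future__ import annotations

# The three checklist questions, keyed by a short hypothetical identifier.
CHECKLIST = (
    ("aligns_with_canonical", "Does the proposal align with what is already in Canonical?"),
    ("has_justification", "Did the agent provide a source or pattern as justification?"),
    ("holds_six_months", "Is the statement still true and still consulted in six months?"),
)


def run_checklist(answers: dict[str, bool | None]) -> tuple[str, str]:
    """Return (decision, note).

    True means yes, False means no, None or a missing key means unclear.
    Only three clear yeses lead to approval.
    """
    open_points = [
        question for key, question in CHECKLIST if answers.get(key) is not True
    ]
    if not open_points:
        return ("approve", "")
    # The note tells the agent, or your later self, what is still missing.
    return ("back_to_inbox", "Open: " + " ".join(open_points))


# Usage: the source is missing, so the proposal goes back with a note.
decision, note = run_checklist(
    {"aligns_with_canonical": True, "has_justification": None, "holds_six_months": True}
)
```

Treating "unclear" like "no" is the design choice here. It keeps Canonical slow on purpose.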
What remains
The first knowledge graph makes your agents useful. The second makes them reliable. Whoever throws them into the same pot will eventually hold a memory of something that never happened.
At your bakery, baking happens on Wednesdays. Replies come on Thursdays. That belongs in Canonical, because it will still be true next month. What your agents do with it in between belongs in the Working KG. Two piles, one truth. Yours.
All names of individuals and companies used in this article are fictitious. Any resemblance to real persons or businesses is purely coincidental and unintentional. The examples are provided solely for illustrative purposes.