Teaching Agents to Remember
Every Paperclip agent wakes up with amnesia.
They have their instructions (SOUL.md, SKILLS.md, AGENTS.md), they have the codebase, they have the project docs. But they don’t remember what they learned last time. The shortcut they found. The gotcha that cost them twenty minutes. The file that wasn’t where they expected it to be.
Watch an agent long enough and you’ll see it. The Infrastructure Monitor figures out that fail2ban-client needs sudo, works around it, finishes its run. Next heartbeat? Figures it out again. The Security Analyst discovers that Traefik access logs don’t have a Server: header on 404s – useful for distinguishing Traefik responses from backend responses. Gone by the next run.
It’s like a team where everyone has perfect skills but no institutional memory.
The pattern that worked
The Ancestry company stumbled onto a fix. Their genealogy agents run research routines – structured sessions where they search FamilySearch, pull records, add people to the tree. The routines had a simple addition:
- At the start: read research-expansion-skills.md for tips on running this efficiently.
- At the end: update the file with anything new you learned.
That’s it. A shared scratch pad. The agent reads its own prior notes before starting, and writes new ones before finishing.
After a dozen runs, the file was good. Not generic advice – specific, earned knowledge. “Use -L for redirect when fetching source ARK links.” “The page field on citations sometimes contains the person’s only known identifier.” “Pages 1-7 of the people list are already audited from prior sessions – start from page 8.”
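A sketch of what such a file might look like after those runs, reconstructed from the examples above rather than copied from the actual file:

```
# research-expansion-skills.md

- Use -L to follow redirects when fetching source ARK links.
- The page field on citations sometimes contains the person's only known identifier.
- Pages 1-7 of the people list are already audited from prior sessions; start from page 8.
```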
The kind of things a human would tell a new team member on their first day. Except the agent was telling itself.
Two tiers
The Ancestry pattern worked for routines – repeating tasks with a fixed structure. But agents do more than routines. They respond to issues, investigate anomalies, handle ad-hoc requests. The routine skills file doesn’t help with any of that.
So we split it into two tiers.
Tier 1: Agent skills. A folder of learned knowledge that the agent accumulates across all work. Not routine-specific – just things the agent has discovered about its environment, its tools, its domain. Each file covers one topic: git-workflow.md, mailcow-cert-paths.md, file-layout.md. An index file lists what’s available so the agent can scan it quickly and pull in what’s relevant.
Tier 2: Routine skills. The original pattern. Narrow, tactical tips for a specific repeating task. Read at the start of the routine, updated at the end.
Tier 1 is “who I am and how I work here.” Tier 2 is “how to do this specific job well.”
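For Tier 1, the index is just a list of topics with one-line summaries, so the agent can decide what to load without reading everything. A hypothetical skills-index.md, using the file names from the example above (the descriptions are invented for illustration):

```
# Skills index

- git-workflow.md – branch naming and merge conventions for this repo
- mailcow-cert-paths.md – where Mailcow actually keeps its TLS certificates
- file-layout.md – where configs, logs, and compose files live on this host
```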
The quality bar
The obvious failure mode is bloat. Without guardrails, agents will dump every observation into their skills files. “Docker containers run on port 3000.” “JSON uses curly braces.” Noise that drowns the signal.
The instruction we landed on: “Only add things that fit ‘I wish I had known that’ or ‘this would have saved me real time.’”
That framing does two things. It sets a high bar – most of what an agent encounters in a run doesn’t clear it. And it biases toward the practical. Not “here’s how Docker networking works” but “containers on the coolify network can reach each other by name, but containers on different networks need the service’s internal IP.”
We added two more constraints:
- One topic per file. Check the index before creating a new one – update the existing file if it already covers the topic.
- Keep the index under 50 lines. Consolidate as it grows.
The goal is a tight, high-signal reference that stays useful even after fifty runs.
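To make the bar concrete, here is what a single-topic skill file might look like, reusing the Docker networking example above (the filename and exact wording are illustrative):

```
# docker-networking.md

- Containers on the coolify network can reach each other by container name.
- Containers on different networks need the service's internal IP.
```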
What goes where
Not everything belongs in the skills files. The rule: if you can derive it from the current state of the project, don’t save it.
Code patterns and architecture? Read the code. Git history? Run git log. Project structure? It’s right there in the filesystem. Things already documented in CLAUDE.md or the library guides? Already covered.
Skills files are for the gap between what’s documented and what you learn by doing. The things that aren’t written down anywhere because nobody thought to write them down until they tripped over them.
Implementation
The setup for each agent is minimal:
docs/library/agents/<agent-name>/
├── skills-index.md # What skills are available
└── (files added by the agent over time)
Two additions to the agent’s heartbeat instructions. At startup:
Read skills-index.md. Load any skill files relevant to your current task.
At wrap-up:
If you discovered something this run that fits “I wish I had known that,” add or update a skill file. One topic per file. Update the index.
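Put together, the two additions might read something like this in the heartbeat instructions (an illustrative snippet, not the exact AGENTS.md wording):

```
## Skills

- On startup: read docs/library/agents/<agent-name>/skills-index.md and load any
  skill files relevant to the current task.
- On wrap-up: if something this run fits "I wish I had known that," add or update
  a skill file (one topic per file) and update the index.
```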
That’s the entire mechanism. No infrastructure. No database. No custom tooling. Markdown files in a folder, read and written by the same agent that uses them.
The compounding effect
The interesting thing about this pattern is that it compounds. Early runs are rough – the agent is learning the basics, figuring out where things are, making the mistakes that every new team member makes. The skills files are thin.
After ten or twenty runs, the files stabilize. The agent stops rediscovering the same things. It starts faster, makes fewer wrong turns, produces better output. Not because the model got smarter – because the context got richer.
Eventually the files plateau. That’s fine. A mature skills folder means the agent has learned what it needs to know about its environment. New entries slow to a trickle: an occasional gotcha from a config change, a new shortcut after infrastructure evolves. The system is self-maintaining.
What this isn’t
It’s not fine-tuning. The model doesn’t change. It’s not RAG – there’s no vector store, no embedding, no retrieval pipeline. It’s not even particularly clever.
It’s a text file that the agent reads before starting work and updates before finishing. The same thing a human does with a personal wiki, a runbook, a sticky note on their monitor.
The difference is that AI agents don’t have sticky notes. They have context windows that reset every run. This gives them something that persists.
Rolling it out
We started with the VPS Operations team – four agents running frequent heartbeats. The skills folders are seeded with empty indexes. The agents will start populating them on their next runs. Sewers, Investments, and Ancestry are next.
There’s a setup guide in the VPS docs library with the full implementation details: folder structure, AGENTS.md snippets, quality bar examples, rollout recommendations. It’s designed to be dropped into any Paperclip company in about five minutes per agent.
We’ll see how the skills files look after a week. If the Ancestry experience is any guide, they’ll be one of those things that seems obvious in retrospect – the kind of feature you can’t believe you ran without.