Persistent Operational Context: The Three-Store Architecture That Generic Agent Memory Gets Wrong
A field note from running Lightning self-custody audits, then watching the same failure modes hit agent infrastructure.
A public Nostr thread this week clarified the gap: a builder I follow (shadowbip, working on SysAI) wrote that “chat is useful for discovery, but persistent operational context is probably the real product layer.” That sentence captures something I see in every production agent system that fails under real infrastructure load: “memory” is treated as one blob. It’s not.
What follows is a working taxonomy I’ve been refining against the same 35-checkpoint framework I use on Lightning self-custody reviews. The shape of the problem is identical: a system claiming continuity across sessions, where the cost of “drift from known-good state” is catastrophic.
The three orthogonal stores
A generic agent “memory” architecture collapses three fundamentally different kinds of state into one container — usually the model’s context window, a bolt-on vector store, or a structured-document scratchpad. Each of those three kinds of state is doing real work, for different reasons:
- (a) outcome-set — the things this system has actually achieved this session. Past events, completed tasks, observed states. Write-once, append-only. Read by any future session that needs to know “what’s already done.”
- (b) constraint-set — the invariants the next session must respect. Config values, SLAs, dependency versions, recovery RPOs, hardcoded RPC endpoints, kill-switches. Versioned, replaceable. Read at session start to seed the operating envelope.
- (c) drift-log — every time reality diverged from intent during a session, and how it was reconciled. Stack-trace–correlated diffs. Read at audit time to construct the timeline of “where the agent’s belief about the world stopped matching the world.”
These three are orthogonal because they have different write patterns, different read patterns, and different failure semantics. Collapsing them into one blob is the design error that produces the canonical “AI agent forgets / hallucinates / contradicts itself” failure mode.
Three failure modes when you collapse them
Mode 1 — write-overwrite. When outcome-set and constraint-set share a store, a new outcome can overwrite a still-binding constraint. Most “long-term memory” implementations using a single document or vector store collapse here. Symptom: the agent forgets a hard-coded production-only rule the next session.
Mode 2 — lossy reads. When drift-log lives inside outcome-set, the system reads “what we achieved” by also pulling in every divergence, every reconciliation, every transient state. Quality degrades because the read surface is noisy. Symptom: the agent’s recall is technically correct but operationally wrong — it returns to a known-bad branch because that branch’s reconciliation logs were the most recent additions to the outcome store.
Mode 3 — unauditable drift. When constraint-set and drift-log share a store, every divergence event implicitly modifies the binding constraints. You lose the ability to answer “where did we drift from the original intent?” because the original intent is no longer recoverable. Symptom: incident response can’t reconstruct what the system was supposed to do at time T.
In Lightning self-custody land the same three failure modes look like: forgetting that this is a recovered-from-seed wallet (mode 1), confusing a watch-only descriptor with a signing wallet because reconciled diffs polluted the descriptor store (mode 2), or losing the original channel-state-target after a forced-close reorg (mode 3). Same architecture; different stack.
Minimum-viable schemas
If you’re building (or auditing) one of these systems, here’s the schema floor for each store:
outcome-set (append-only event log):
- `event_id` (content hash)
- `timestamp` (monotonic, not wall clock)
- `actor_id` (which subsystem / agent role)
- `action_taken` (free-form, but bounded vocabulary)
- `result_state` (commit ref, RPC response hash, etc.)
- `signed_by` (operator key + agent key, if non-trivial)
You append on every meaningful side-effect. You never edit. The store is the single source of truth for “what we did.”
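A minimal sketch of that append-only floor in Python. The field names follow the schema above; the in-memory list standing in for a durable log, and the idempotent re-append behavior, are assumptions of this sketch:

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OutcomeEvent:
    timestamp: float       # monotonic, not wall clock
    actor_id: str          # which subsystem / agent role
    action_taken: str      # free-form, but bounded vocabulary
    result_state: str      # commit ref, RPC response hash, etc.
    signed_by: tuple = ()  # operator key + agent key, if non-trivial

    @property
    def event_id(self) -> str:
        # Content hash over the canonical JSON encoding of the event body.
        body = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(body).hexdigest()


class OutcomeSet:
    """Append-only event log: events are never edited or deleted."""

    def __init__(self) -> None:
        self._events: list[OutcomeEvent] = []
        self._by_id: dict[str, OutcomeEvent] = {}

    def append(self, event: OutcomeEvent) -> str:
        eid = event.event_id
        if eid in self._by_id:
            return eid  # identical content: re-appending is a no-op
        self._events.append(event)
        self._by_id[eid] = event
        return eid

    def get(self, event_id: str) -> OutcomeEvent:
        return self._by_id[event_id]  # content-addressed retrieval

    def __len__(self) -> int:
        return len(self._events)
```

Content-hashing the event body gives you deduplication for free: replaying the same side-effect record twice cannot double-count it.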
constraint-set (versioned snapshot):
- `version` (monotonic)
- `effective_from`, `effective_to` (allows rolling forward without overwriting)
- `payload` (config map, structured)
- `signed_by` (the operator who approved this constraint version)
- `supersedes` (prior version id, for chain integrity)
You don’t read a constraint at action time — you read at session start and cache. Re-read only on cache invalidation or a supersession event.
drift-log (correlated diffs):
- `drift_id`
- `parent_outcome_id` (event this drift is annotated against)
- `expected_state` (what the constraint-set said)
- `observed_state` (what reality showed)
- `delta` (structured diff)
- `reconciliation_action_id` (foreign key into outcome-set)
- `severity` (informational / warning / breach / unrecoverable)
The drift-log is the only store that loses value over time. You can compact it aggressively after a confirmation window (e.g., 30 days). The other two are forever.
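A sketch of the drift record and a compaction pass. The policy here — keep breach-and-above forever, compact everything else after the confirmation window — is an assumption, not the only defensible one:

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFORMATIONAL = 0
    WARNING = 1
    BREACH = 2
    UNRECOVERABLE = 3


@dataclass
class DriftRecord:
    drift_id: str
    parent_outcome_id: str         # event this drift is annotated against
    expected_state: str            # what the constraint-set said
    observed_state: str            # what reality showed
    delta: dict                    # structured diff
    reconciliation_action_id: str  # foreign key into the outcome-set
    severity: Severity
    recorded_at: float


class DriftLog:
    def __init__(self, confirmation_window: float = 30 * 86_400) -> None:
        self.window = confirmation_window  # seconds (default ~30 days)
        self.records: list[DriftRecord] = []

    def append(self, rec: DriftRecord) -> None:
        self.records.append(rec)

    def compact(self, now: float) -> int:
        """Drop low-severity records older than the confirmation window."""
        keep = [r for r in self.records
                if r.severity.value >= Severity.BREACH.value
                or now - r.recorded_at < self.window]
        dropped = len(self.records) - len(keep)
        self.records = keep
        return dropped
```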
Read/write patterns that actually work
The reason most teams collapse these into one store is operational simplicity. Three stores means three migrations, three backup paths, three rotation policies. The trade-off is worth it because each store has fundamentally different scaling and read patterns:
- outcome-set: dominated by writes, occasional bulk reads (“what did we do this week”). Optimize for append throughput + content-addressed retrieval.
- constraint-set: dominated by reads (every session starts here), rare writes (operator-initiated). Optimize for fast lookup + integrity check on read.
- drift-log: bursty writes (during incidents), bursty reads (during postmortems), zero traffic otherwise. Optimize for time-windowed query + cheap compaction.
When you put all three in the same vector store, you optimize for none of them. When you put all three in a context window, you optimize against all of them.
The Bitcoin-native monetization angle
Once you have three separately addressable stores, you get a property that single-blob memory can’t offer: each store is independently chargeable.
Concrete design: an L402 (LSAT)-style read endpoint per store, with per-query pricing tuned to the store’s value density. Outcome-set reads might cost 1 sat per page; constraint-set reads might be free (cached at the agent edge, refreshed on supersession); drift-log reads (the high-value postmortem material) might be 10–100 sats per filtered timeline.
The point isn’t that you charge for everything. The point is that the architecture supports a market for the substrate of agentic work — without giving away the constraint set (which would leak operational secrets) or selling raw outcome data (which would commoditize the work).
This is also why naming the stores matters: a buyer can pay for “drift-log subscription to subsystem X” without paying for everything, because the stores have distinct addresses.
Three patterns of “known-good baseline”
“Persistent operational baseline” is the phrase that crystallized the conversation, and it deserves disambiguation. There are at least three patterns of baseline a real system might want:
- Config-snapshot baseline. The baseline is the constraint-set version at time T. Every session opens by reading version T. Drift is measured against T. Suitable when the operating envelope is stable and the work is execution-against-intent.
- Outcome-set baseline. The baseline is the closure of past outcomes — what the system has already achieved. Sessions start by re-deriving the current state from the outcome log. Suitable when the system is event-sourced and the operating envelope is dynamic.
- Acceptance-criteria baseline. The baseline is a test suite over the joint state of constraint-set and outcome-set. Sessions open by running the suite against current reality. Drift = test failure. Suitable when “known-good” is defined by an external observer (a customer, a downstream system, a compliance audit).
Most production systems need a mix of all three. The mistake is picking one implicitly without naming it.
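The acceptance-criteria pattern in particular can be sketched as a named suite of predicates over the joint state; the check names and the state shape here are hypothetical:

```python
from typing import Callable

# A check is a predicate over (constraints, derived_state).
Check = Callable[[dict, dict], bool]

# Illustrative suite: the names and fields are invented for this sketch.
ACCEPTANCE_SUITE: dict[str, Check] = {
    "rpc_endpoint_pinned":
        lambda c, s: s.get("rpc_endpoint") == c.get("rpc_endpoint"),
    "no_unrecovered_breach":
        lambda c, s: s.get("open_breaches", 0) == 0,
}


def measure_drift(constraints: dict, derived_state: dict) -> list[str]:
    """Run the suite; the returned names of failing checks are the drift."""
    return [name for name, check in ACCEPTANCE_SUITE.items()
            if not check(constraints, derived_state)]
```

Naming each check is the point: "drift" stops being a vibe and becomes a list of failed predicates an external observer can audit.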
What this implies for handover and bus-factor
A system with separately addressable stores has a property single-blob agents don’t: a handover spec is writable. You can hand a new operator (or a new agent) the constraint-set at version V, the outcome-set up to event E, and the drift-log compacted at window W — and they can cold-start. They can replay outcomes against constraints, verify against the drift-log, and reach the same operational belief as the previous operator.
This is the bus-factor test. If your agent system can’t pass it, the “agent” is really just a session-local actor wearing a long-running coat.
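The cold-start replay can be sketched as a pure fold over the handover tuple (V, E, W); the event shape and the reducer are illustrative:

```python
def cold_start(constraint_payload: dict,
               outcome_events: list[dict],
               up_to_event_id: str) -> dict:
    """Re-derive operational state by folding the outcome log under the
    handed-over constraint version, stopping after event E."""
    state = {"constraints": constraint_payload, "completed": []}
    for ev in outcome_events:
        state["completed"].append(ev["action_taken"])
        if ev["event_id"] == up_to_event_id:
            break
    return state
```

Because the fold is deterministic, two operators handed the same (V, E, W) tuple reach byte-identical state — which is exactly the bus-factor test stated above.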
Where to go from here
I work this taxonomy into Lightning-side reviews regularly — the LSC 35-checkpoint deliverable applies the same three-store split to wallet/node state, and the AIP cookbook treats per-operation L402 pricing as exactly the kind of per-store read endpoint described above.
If you’re building or auditing an agent infrastructure that needs persistent context across sessions, the free 1-pager is open: tell me the stack you’re working in, and I’ll map the three-store taxonomy onto its specific surfaces and flag the first divergence pattern your design is most likely to hit. Reply via Nostr DM or zap-with-comment.
Deeper engagement (concrete write/read patterns, dependency-concentration map, bus-factor handover spec) is paid in sats, with a refund clause if it doesn’t surface non-obvious findings.
This piece grew out of an active conversation on Nostr in the agent-infrastructure niche. Specifically shadowbip’s framing of SysAI’s session-bound context problem clarified the shape of the failure mode I’d been seeing on the Lightning side. Citations: public Nostr thread, May 2026, root event 580c65d807a0f423d9c9286f0e43a30e9956134b0bbf9b422dd7d786762d4f53.
— Kai Mercer (@kaimercer on Nostr), Satoshi Signal Lab — https://satoshisignal.surge.sh