Your diagnosis is spot-on. The solution is good, but not yet best-in-class.

Your root-cause analysis is exactly right: the single-slot SESSION-STATE.md is a classic last-writer-wins race condition, and answering orientation questions from a single state file without cross-referencing is a fidelity problem. OpenClaw issue #24832 frames this identically: session isolation is by design, but from the human's perspective it is one assistant, while from the architecture's perspective the sessions are completely isolated (GitHub).

Here's where your three layers stand against what has actually shipped in production:

**Layer 1 (per-channel state files): solid.** This is table stakes. Channel-scoped state files cleanly eliminate the cross-contamination vector. Letta's Conversations API does exactly this at the infrastructure level: each conversation has its own context window and message history, processed independently (Letta). Your file-based approach is the manual analog. No complaints here.

One refinement worth stealing from Letta: messages from all conversations are pooled in a searchable database, so the agent can use `conversation_search` to recall context from any past conversation, not just the current one (Letta). Your per-channel files isolate correctly, but you lose the ability to search across channels. Consider an append-only daily log that all channels write to (read-only for cross-reference, never used as primary state).

**Layer 2 (orientation cross-check rule): necessary but fragile.** Putting a hard rule in AGENTS.md that says "cross-check PROJECTS.md + daily log before answering orientation questions" is the right instinct. But you've identified the core problem in your own research questions: this relies on the LLM remembering to follow the rule, every time, under all conditions. That's an enforcement problem, not a design problem.
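A minimal sketch of the append-only daily log suggested under Layer 1. The directory layout and file names here are my assumptions for illustration, not OpenClaw's actual conventions:

```python
# Sketch: shared append-only daily log that every channel writes to.
# Assumed layout (illustrative): memory/logs/YYYY-MM-DD.log for the shared
# log, alongside per-channel SESSION-STATE-<channel>.md primary state files.
from datetime import datetime, timezone
from pathlib import Path

LOG_DIR = Path("memory/logs")

def append_daily_log(channel: str, event: str) -> Path:
    """Append one channel-tagged line to today's shared log.

    Append-only by construction: existing lines are never rewritten, so the
    log is safe to read from any channel for cross-reference.
    """
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    log_file = LOG_DIR / f"{now:%Y-%m-%d}.log"
    line = f"{now:%H:%M:%SZ} [{channel}] {event}\n"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(line)
    return log_file
```

Because every channel appends to the same dated file, "what happened today across all work streams" is a single read, without making the shared log a primary state store.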
Two patterns from the community make this more robust.

First, Letta's Context Repositories use git-based versioning for memory, with background "sleep-time" processes that periodically review recent conversation history and persist important information with commit messages (Letta). The key insight is that memory consolidation happens asynchronously: a background agent reviews what happened and writes structured state, rather than the primary agent being responsible for its own state hygiene. You could implement this as a cron-triggered consolidation step in your OpenClaw heartbeat system: every N minutes, a lightweight agent reads the daily logs across all channels and updates PROJECTS.md. The primary agent is never responsible for its own orientation; it always reads from a consolidated view maintained by something else.

Second, the orientation question itself should trigger a tool call, not a prompt-reading behavior. If "where were we?" always invokes a `get_session_context(channel)` function that reads the channel state file + PROJECTS.md + today's log and returns a structured summary, you've removed the LLM's ability to skip the cross-check. The rule moves from "remember to do this" to "this is the only way to get the answer."

**Layer 3 (shared/ACTIVE.md bulletin board): this is where the design gets interesting**, and where you have the most to gain from the prior art. Your shared context file maps directly to Letta's shared memory blocks: you can attach the same block ID to multiple agents, and when one agent writes to it, the other agents can read the updates immediately (Letta). Your manual file-based approach is the right primitive.

But here's the critical gap you flagged in your research questions: the prompt-cache cost. The OpenClaw issue notes that any change to a bootstrap file like MEMORY.md busts the prompt cache for all sessions, invalidating the ~90% token savings from Anthropic/OpenAI prompt caching.
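Backing up to the Layer 2 fix for a moment, the orientation-as-tool-call idea could be sketched like this. The file layout and the `get_session_context` name are hypothetical, matching the discussion rather than any shipped OpenClaw API:

```python
# Sketch: orientation is a tool call that always reads all three sources,
# so the cross-check cannot be skipped. Paths are illustrative assumptions:
# per-channel state under memory/state/, shared PROJECTS.md, daily logs
# under memory/logs/.
from datetime import datetime, timezone
from pathlib import Path

def get_session_context(channel: str, root: str = "memory") -> dict:
    """Return a structured orientation summary for one channel."""
    base = Path(root)
    today = f"{datetime.now(timezone.utc):%Y-%m-%d}"

    def read(p: Path) -> str:
        # Missing files degrade to empty strings instead of raising,
        # so a fresh channel still gets a well-formed answer.
        return p.read_text(encoding="utf-8") if p.exists() else ""

    return {
        "channel_state": read(base / "state" / f"SESSION-STATE-{channel}.md"),
        "projects": read(base / "PROJECTS.md"),
        "daily_log": read(base / "logs" / f"{today}.log"),
    }
```

Because the function bundles all three reads, answering "where were we?" any other way simply isn't available to the model.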
With multiple active sessions, frequent writes make this prohibitively expensive (GitHub). This is a real and serious cost concern, and OpenClaw issue #19534 confirmed it in practice: dynamic content in the system prompt changed every turn, invalidating the cache and producing 10x higher costs (GitHub). Anthropic's cache follows the hierarchy tools → system → messages, and a change at any level invalidates that level and all subsequent levels (Claude API docs).

Issue #24832 proposes the right architectural fix: give the shared context block its own cache breakpoint in the assembled prompt, placed after the static content, so that when it changes only that block re-tokenizes, not the entire system prompt (GitHub). This is the design you should target. Concretely:

1. Static system prompt → cache breakpoint 1
2. Shared ACTIVE.md content → cache breakpoint 2 (placed last in the system prompt)
3. Messages

When ACTIVE.md changes, you invalidate only the small shared block plus the messages, not the ~100k+ tokens of static system prompt. Anthropic supports up to 4 cache breakpoints (Claude API docs), so you have room for this architecture.

Practical controls for your ACTIVE.md:

- Hard cap at 20 lines / ~500 tokens (you already proposed this; correct instinct)
- Write-frequency cap: no more than once per 5 minutes (aligns with Anthropic's default cache TTL)
- Structured format: each entry has a channel, a timestamp, and a one-line status. No prose.

Now for your deeper research questions.

**Is B+C+Shared the right pattern?** Yes, with the refinements above. Letta's Conversations API, shipped January 2026, does exactly what you're building: multiple parallel conversations with separate context windows but shared memory blocks and searchable message history (Letta). You're manually implementing their architecture. The question is whether to keep building on OpenClaw's file-based primitives, or to consider whether Letta's API could serve as the memory/state layer underneath your OpenClaw sessions.
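The two-breakpoint layout can be sketched with Anthropic's `cache_control` blocks (the system prompt may be passed as a list of text blocks, each individually cacheable). `STATIC_PROMPT` and the ACTIVE.md heading are placeholders for your real content:

```python
# Sketch: assemble the system prompt as two cacheable blocks so an
# ACTIVE.md edit only invalidates the small trailing block.
STATIC_PROMPT = "...static AGENTS.md / persona / tool docs (large, rarely changes)..."

def build_system_blocks(active_md: str) -> list[dict]:
    """Static prefix gets breakpoint 1; the small shared block gets
    breakpoint 2 and sits last, per the #24832 proposal."""
    return [
        {
            "type": "text",
            "text": STATIC_PROMPT,
            "cache_control": {"type": "ephemeral"},  # breakpoint 1: stable prefix
        },
        {
            "type": "text",
            "text": f"## Shared ACTIVE.md\n{active_md}",
            "cache_control": {"type": "ephemeral"},  # breakpoint 2: volatile tail
        },
    ]
```

The returned list is what you'd pass as the `system` parameter of a Messages API call; because caching is prefix-based, the large static block stays cached across ACTIVE.md edits.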
Letta's newest system, Context Repositories, goes even further: it uses git worktrees so multiple subagents can process and write to memory concurrently, then merge their changes through git-based conflict resolution (Letta). This directly addresses the concurrent-write problem that a shared ACTIVE.md has. If two channels write simultaneously, git gives you a merge conflict; with flat files, the last writer wins silently. Consider making your shared state file a git-tracked artifact, even if it's just `git add && git commit` on each write. Then you get a WAL for free: the git log is your write-ahead log.

**Is Discord the right surface?** Discord is fine for the interaction layer but has real limitations for state management. The community pattern for OpenClaw concurrent sessions is "one agent + multiple dedicated chat lanes," where each lane becomes its own isolated session (Openclaw-setup). Discord channels map naturally to this. But Discord threads get buried, search is weak, and you can't programmatically manage channel state. OpenClaw's Channel Layer adapts different platform formats to a common internal structure (DEV Community), so the platform doesn't actually matter much at the architecture level. Telegram groups offer slightly better bot APIs and inline keyboards for state management; Slack has better threading and enterprise search. But honestly, for your use case (private server, one human, N work streams), Discord is adequate. The real state management happens in your file system, not in the chat platform.

**WAL discipline for AI agents.** This is the hardest problem on your list. The WAL pattern is well understood in database systems: log the intent to change first, apply the change afterward, and on recovery replay the log to restore consistent state (Architecture Weekly). But LLM agents don't have transactions, don't have commit semantics, and can fail silently mid-thought. The practical adaptation: make state writes a tool, not a behavior.
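The commit-on-each-write idea above is small enough to sketch directly. Paths, the commit-message convention, and the `write_active` name are all illustrative assumptions:

```python
# Sketch: commit-on-write wrapper for the shared state file. After this,
# `git log -p shared/ACTIVE.md` is a replayable write-ahead log of every
# state change, and concurrent writers surface as merge conflicts instead
# of silent last-writer-wins.
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def write_active(repo: str, channel: str, status: str) -> None:
    """Append one status line to shared/ACTIVE.md, then git add && commit."""
    state = Path(repo) / "shared" / "ACTIVE.md"
    state.parent.mkdir(parents=True, exist_ok=True)
    stamp = f"{datetime.now(timezone.utc):%H:%MZ}"
    with state.open("a", encoding="utf-8") as f:
        f.write(f"{stamp} [{channel}] {status}\n")
    subprocess.run(["git", "add", "shared/ACTIVE.md"], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "-q", "-m", f"state({channel}): {status}"],
        cwd=repo, check=True,
    )
```

In production you'd want the append-and-commit to happen under a lock (or via a single writer process) so two channels can't interleave between the write and the commit.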
If the agent has a `write_session_state(channel, state)` tool, you can wrap it with pre/post hooks at the OpenClaw level: the hook writes to an append-only log before applying the state-file update. If the agent crashes mid-run, you have the log. If the state file is corrupted, you replay the log. This is exactly the pattern Letta uses with MemFS: agents edit memory through bash tools, and the memory is versioned with git, enabling easy rollbacks (Letta).

The harder problem is ensuring the agent calls the tool before responding. OpenClaw's `before_prompt_build` hook (which you're already using for temporal awareness) is the right enforcement point. You could inject a hard rule: "If this is the first message in a session or follows a /new, you MUST call `get_session_context` before any other action." Make it a tool rule, not a prompt instruction.

**Session routing by intent.** Yes, this is possible and worth doing, but as a fallback, not a primary mechanism. If someone messages #personal-projects with a bookkeeping question, the agent could detect "bookkeeping" in the message content and load SESSION-STATE-bookkeeping.md instead. But intent classification is fuzzy, and the failure mode (confidently loading the wrong state) is exactly the problem you're trying to solve. Better to use the channel as the primary routing key and intent detection as a secondary signal that triggers a clarifying question: "This looks like it might be about the bookkeeping reconciliation. Want me to pull that context?"

**Bottom line.** Your B+C+Shared architecture is in the right family of solutions. The production system that most closely matches what you're building is Letta's Conversations API plus shared memory blocks, and they've shipped answers to several problems you're still solving manually.
Three upgrades would make your design best-in-class:

1. Make the orientation cross-check a mandatory tool call, not a prompt rule.
2. Give ACTIVE.md its own cache breakpoint, placed after all static content.
3. Git-track your state files for free WAL semantics and concurrent-write safety.

You're not far off. The architecture is sound; the gap is in enforcement and cache management, not in the conceptual model.