System Architecture Explainer
A deep technical walkthrough of a containerized AI assistant. Not the what — the why and how of every design decision, with exercises you can run locally.
01 — The Problem
The core question: how do you give an AI agent real capabilities — running code, browsing the web, reading your files — without trusting it with your entire system?
Most AI assistants (ChatGPT, Claude.ai) run in a browser sandbox. They can produce text but cannot act on your computer. When systems try to bridge this gap — running code, accessing files, sending emails — security becomes the central problem.
OpenClaw (the predecessor project) tried to solve security with application-level controls: allowlists, permission checks, pairing codes, role-based access. This produced 52+ modules, 45+ dependencies, and 8 config files. The result? A system too complex to audit, where security depends on no developer ever forgetting a permission check in any code path.
NanoClaw uses OS-level isolation instead. Each agent runs in a Docker container — a lightweight Linux VM. The agent can execute arbitrary code inside its container freely. Security comes from what's mounted into the container, not from what the code is allowed to do.
Application-level security (the OpenClaw approach): the agent runs in your process. You add checks: "don't access /etc/passwd", "don't run rm -rf". Every new capability needs a new permission rule. One missed check = full compromise.

OS-level isolation (the NanoClaw approach): the agent runs in a container. It physically cannot see files you didn't mount. No permission checks needed — the filesystem boundary is the security boundary.
02 — Design Philosophy
These aren't abstract values — each one directly explains a specific code decision. Understanding them lets you predict how the system handles any scenario.
One process. Seventeen source files. No microservices, no message queues, no abstraction layers. The entire codebase is designed to fit into a single Claude Code context window (~35K tokens). If you can't understand it, you can't trust it — and you shouldn't run an AI agent you can't trust.
Agents run in containers. Bash access is safe because commands execute in the container, not on your host. The mount allowlist lives outside the project root (~/.config/nanoclaw/) so agents can never modify their own security config. The host stores credentials; the container receives them via stdin and deletes them from disk immediately.
No multi-tenancy, no user management, no auth system. This isn't a platform — it's working software for one person. The simplification this enables is enormous: a single SQLite database, global mutable state that's safe because there's one process, no need for distributed consensus.
No config files to learn (trigger word is the only config). Want different behavior? Change the code. This works because the codebase is small enough to modify safely. A 50-module system needs config files because developers can't safely change code they don't understand. A 17-file system doesn't.
No installation wizard — Claude Code guides setup. No monitoring dashboard — ask Claude what's happening. No debugging tools — describe the problem, Claude reads the logs. The codebase assumes you have an AI coding agent as your primary interface.
Don't add Telegram to the codebase. Add a skill that teaches Claude Code how to transform a NanoClaw installation to use Telegram. Users fork → run skill → get clean code that does exactly what they need. No bloated system supporting every channel simultaneously.
NanoClaw runs Claude Code directly via the Agent SDK — not a custom wrapper, not a simplified API call. The "harness" (the infrastructure that runs an LLM) determines how smart the model appears. Claude Code includes tool use, file operations, web access, browser automation, agent teams. Running it inside a container gives it maximum capability with maximum safety.
03 — The Big Picture
The system has three layers: a messaging channel (WhatsApp), a host orchestrator (Node.js), and sandboxed agents (Docker containers running Claude Agent SDK). They communicate through SQLite and filesystem-based IPC.
You might notice: everything polls. The message loop polls SQLite every 2 seconds. The IPC watcher polls the filesystem every second. The scheduler polls every 60 seconds. This feels wasteful — why not use SQLite change notifications or fs.watch?
Three reasons: (1) reliability — fs.watch is notoriously inconsistent across platforms, (2) simplicity — a polling loop is four lines of code with no edge cases, (3) adequacy — for a single-user system processing a few messages per minute, 2-second latency is imperceptible. The system optimizes for correctness and readability, not throughput.
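The "four lines of code" claim is easy to check. Here is a minimal polling loop sketch in Python (the project's loops are TypeScript; the function names here are illustrative, not NanoClaw's API):

```python
import time

def poll_loop(fetch, handle, interval_s=2.0, max_iters=None):
    """Poll a source and handle whatever it returns. No edge cases:
    no watchers to re-arm, no events to coalesce, no platform quirks."""
    i = 0
    while max_iters is None or i < max_iters:
        for item in fetch():      # e.g. a SELECT newer-than-cursor against SQLite
            handle(item)
        time.sleep(interval_s)
        i += 1
```

The `max_iters` parameter exists only so the sketch can terminate in a test; a real loop runs forever.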
| File | Lines | Purpose |
|---|---|---|
| src/index.ts | 488 | Orchestrator: message loop, state, agent dispatch, startup/shutdown |
| src/container-runner.ts | 646 | Docker spawn, volume mounts, streaming output parsing |
| src/db.ts | 636 | SQLite schema, CRUD, migrations, all parameterized queries |
| src/mount-security.ts | 420 | Allowlist validation, symlink resolution, blocked patterns |
| src/ipc.ts | 380 | Filesystem IPC watcher, authorization, task processing |
| src/channels/whatsapp.ts | 330 | WhatsApp connection, QR auth, message send/receive |
| src/group-queue.ts | 340 | Per-group message queue with global concurrency limit |
| src/task-scheduler.ts | 223 | Cron/interval/once task execution |
| src/router.ts | 45 | Message formatting, outbound routing |
| src/config.ts | 70 | Constants: trigger pattern, paths, timeouts |
| Container management | | |
| src/container-runtime.ts | 77 | Docker abstraction: runtime binary, mount args, cleanup orphans |
| Supporting modules | | |
| src/types.ts | 105 | Interface contracts: Channel, RegisteredGroup, ContainerOutput |
| src/env.ts | 41 | Secret reader: loads .env without polluting process.env |
| src/whatsapp-auth.ts | 158 | QR auth, credential storage, reconnect backoff |
| src/logger.ts | 17 | Pino logger configuration |
| In-container (agent-runner) | | |
| agent-runner/src/index.ts | 588 | Query loop, MessageStream, PreCompact hook, session management |
| agent-runner/src/ipc-mcp-stdio.ts | 280 | MCP server: send_message, schedule_task, list/pause/cancel tasks |
04 — The Critical Path
When you send "@Andy what's the weather?" in WhatsApp, here is exactly what happens, step by step. This is the single most important flow to understand.
The baileys library maintains a persistent WebSocket connection to WhatsApp's servers. When a message arrives, the messages.upsert event fires. The channel extracts text content, sender name, timestamp, and chat JID (WhatsApp's unique identifier for each chat).
The onMessage callback calls storeMessage() which inserts into the messages table. Parameterized query — no SQL injection possible. The message is now durable. If the process crashes here, the message survives.
Every 2 seconds, startMessageLoop() calls getNewMessages() with all registered group JIDs and the lastTimestamp cursor. Only messages newer than the cursor and not from the bot itself are returned. Messages from unregistered groups are silently ignored.
For non-main groups, the system checks if any message matches the trigger pattern (/^@Andy\b/i). If no trigger is found, the messages accumulate silently — they'll be included as context when a trigger does arrive. The main group (your self-chat) skips this check entirely: every message triggers the agent.
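The dispatch decision can be sketched in a few lines of Python (the regex is the one from src/config.ts; the function itself is illustrative):

```python
import re

TRIGGER = re.compile(r"^@Andy\b", re.IGNORECASE)  # trigger pattern from src/config.ts

def should_dispatch(messages, is_main_group):
    """Main group: every message triggers. Other groups: dispatch only if
    some accumulated message starts with the trigger word."""
    return is_main_group or any(TRIGGER.search(m) for m in messages)
```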
The GroupQueue tracks active containers per group. If one exists and is idle-waiting for input, the message is piped to it via IPC file (data/ipc/{group}/input/{timestamp}.json). If no container is running, a new one is spawned. Global concurrency is capped (default: 5 containers).
formatMessages() wraps messages in XML with sender names and timestamps escaped. This becomes the prompt to Claude:
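For concreteness, the prompt has roughly this shape. The tag and attribute names below are illustrative assumptions, not copied from the project; what matters is that sender, timestamp, and body are escaped XML content:

```xml
<messages>
  <message sender="Alice" timestamp="2025-01-15T09:14:02Z">@Andy what's the weather?</message>
  <message sender="Bob" timestamp="2025-01-15T09:14:40Z">and is it going to rain?</message>
</messages>
```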
runContainerAgent() builds volume mounts, constructs the Docker command, and spawns the container. The input JSON (prompt + secrets) is written to the container's stdin. Secrets are deleted from memory immediately after writing. The container mounts only: the group folder, the IPC directory, session data, and any explicitly allowed extra directories.
The agent runner reads stdin, authenticates with the Anthropic API using credentials from the input, and calls query() with the full prompt. Claude Code processes the query — it may search the web, run Bash commands, read/write files, all inside the container sandbox. The agent has bypassPermissions mode because the container IS the permission boundary.
The agent runner wraps each result in ---NANOCLAW_OUTPUT_START--- / ---NANOCLAW_OUTPUT_END--- markers on stdout. The host process parses these in real-time from the container's stdout stream. This handles mixed output (debug logs + results + stderr) robustly.
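The marker strings are the ones above; the extraction logic below is an illustrative Python sketch, not the project's TypeScript parser:

```python
START = "---NANOCLAW_OUTPUT_START---"
END = "---NANOCLAW_OUTPUT_END---"

def extract_results(stream_text):
    """Pull out everything between marker pairs, ignoring interleaved noise.
    An unterminated final block is left for the caller to retry with more data."""
    results = []
    rest = stream_text
    while True:
        s = rest.find(START)
        if s == -1:
            return results
        e = rest.find(END, s)
        if e == -1:
            return results
        results.append(rest[s + len(START):e].strip())
        rest = rest[e + len(END):]
```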
Each parsed result is stripped of <internal>...</internal> tags (agent's private reasoning) and sent to the chat via channel.sendMessage(). A promise chain (outputChain = outputChain.then(() => onOutput(parsed))) serializes concurrent streaming outputs so messages arrive in order even though stdout.on('data') fires synchronously while sendMessage is async.
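The promise-chain trick has a direct asyncio analogue: each send awaits the previous one. This is a sketch of the idea (the project's version is the TypeScript outputChain; class and method names here are invented):

```python
import asyncio

class OrderedOutput:
    """Serialize async sends so results leave in arrival order,
    mirroring outputChain = outputChain.then(() => onOutput(parsed))."""
    def __init__(self, send):
        self._send = send
        self._chain = None            # tail of the chain of pending sends

    def emit(self, item):
        prev = self._chain
        async def link():
            if prev is not None:
                await prev            # wait for everything queued before us
            await self._send(item)
        self._chain = asyncio.ensure_future(link())
        return self._chain
```

Even if a later send finishes faster than an earlier one, the chain forces delivery order to match emission order.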
The message loop maintains two cursors: lastTimestamp (the latest message seen by the loop) and lastAgentTimestamp[chatJid] (the latest message processed by an agent for each group). This two-cursor design handles the case where messages arrive faster than agents process them — non-trigger messages accumulate as context until a trigger arrives.
05 — The Security Boundary
The container is the single most important architectural element. Every security property flows from what does and doesn't get mounted into it.
The buildVolumeMounts() function in container-runner.ts constructs the mount list. The logic branches on whether this is the main group or a regular group:
Main group: gets the entire project root (rw) + its own group folder. Can see all code, all group folders, all config. This is the "trusted" context.

Non-main groups: get only their own folder (rw) + global memory (ro). Cannot see other groups, cannot see project code, cannot see credentials.
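The branch can be sketched as follows. The host-side directory layout and container paths here are assumptions for illustration; the real logic lives in buildVolumeMounts() in container-runner.ts:

```python
def build_volume_mounts(group, project_root, is_main):
    """Return docker -v arguments. Main group is trusted; others are confined
    to their own folder plus read-only global memory."""
    if is_main:
        mounts = [(project_root, "/workspace/project", "rw")]
    else:
        mounts = [
            (f"{project_root}/groups/{group}", "/workspace/group", "rw"),
            (f"{project_root}/groups/main/memory", "/workspace/global-memory", "ro"),
        ]
    args = []
    for host, guest, mode in mounts:
        args += ["-v", f"{host}:{guest}:{mode}"]
    return args
```

The security property falls out of the data: for a non-main group, no argument ever names the project root, so nothing outside the group folder exists inside the container.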
API keys take a careful path from disk to the agent:
1. Secrets live in .env at the project root. They are never loaded into process.env (to avoid leaking to child processes).
2. readSecrets() reads only CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY from .env.
3. They travel over stdin into /tmp/input.json, which is read and deleted on the first line of agent runner code.
4. They reach query() via the env parameter. A PreToolUse hook prepends unset commands to Bash invocations to scrub them from shell environments.

The remaining gap: the keys stay readable via /proc/self/environ inside the container. The project acknowledges this openly. Full credential isolation would require the Claude Agent SDK to support proxy-based auth — separating the auth layer from the execution layer. This is an unsolved problem across all agent frameworks.
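The unset-prepend step is a one-line transform. This sketch shows the idea only, not the SDK's actual hook API:

```python
SECRET_VARS = ["ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"]

def scrub_bash_command(cmd):
    """Prepend an unset so the spawned shell never sees the keys,
    even though the agent-runner process itself holds them."""
    return f"unset {' '.join(SECRET_VARS)}; {cmd}"
```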
| Event | What happens |
|---|---|
| Spawn | Docker run with -i --rm. Named nanoclaw-{group}-{timestamp}. Stdio piped. |
| Input | JSON written to stdin, then stdin closed (for initial message). |
| Follow-up | New messages arrive via IPC files in /workspace/ipc/input/, polled every 500ms. |
| Idle | After responding, container waits for IPC input. If no messages for 30 min, a _close sentinel is written. |
| Close | Agent detects _close, exits cleanly. Container is removed (--rm). |
| Timeout | If container exceeds hard timeout (30 min default), host sends docker stop, then SIGKILL. |
| Crash recovery | On restart, cleanupOrphans() stops any stale nanoclaw-* containers. |
06 — Inter-Process Communication
Containers can't call host functions directly. They communicate by writing JSON files to a shared directory. The host polls this directory and processes the files. This is the simplest possible IPC mechanism — and that simplicity is the point.
The obvious alternatives (an HTTP server, WebSockets, gRPC) all require networking between host and container, authentication, serialization libraries, and error handling for network failures. Filesystem IPC needs: (1) write a file, (2) read a file, (3) delete a file. It's debuggable with ls and cat. It's atomic on rename. And the mount boundary means one group's IPC directory is invisible to other groups.
This is the cleverest security pattern in NanoClaw. The host determines a message's source group by which IPC directory it came from — not by reading a groupFolder field from the JSON. Since each container can only write to its own mounted IPC directory, a compromised agent in Group A physically cannot write files that appear to come from Group B.
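The idea reduces to a few lines: derive identity from the path, never from the payload. A Python sketch (directory layout matches the data/ipc/{group}/ convention above; function name is invented):

```python
from pathlib import Path

IPC_ROOT = Path("data/ipc")

def source_group(ipc_file):
    """Trust the mount layout, not the payload: the first path component
    under the IPC root names the group, because only that group's
    container has that directory mounted."""
    rel = Path(ipc_file).resolve().relative_to(IPC_ROOT.resolve())
    return rel.parts[0]          # data/ipc/{group}/input/...
```

A compromised agent can put any groupFolder field it likes inside the JSON; the host never reads it, so the forgery has no effect.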
Both the MCP server and the host use the write-to-temp-then-rename pattern to prevent reading half-written files:
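Here is the pattern in Python (the project's implementation is TypeScript; the idea is identical):

```python
import json, os, tempfile

def atomic_write_json(path, payload):
    """Write to a temp file in the same directory, then rename.
    On POSIX the rename is atomic, so a polling reader sees either
    the old file or the complete new one, never a half-written file."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)     # atomic when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The temp file must live in the same directory as the target: rename is only atomic within one filesystem.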
07 — Threat Model
NanoClaw's security model has four layers. Understanding them helps you reason about what an attacker (or a misbehaving agent) can and cannot do.
Docker containers provide process isolation (separate PID namespace), filesystem isolation (only mounted paths visible), and user isolation (runs as uid 1000, not root). This is the primary security boundary. An agent cannot see files you didn't mount, cannot kill host processes, and cannot access host services directly.
Additional directories (beyond the group folder) require explicit allowlisting in ~/.config/nanoclaw/mount-allowlist.json. This file is stored outside the project root and is never mounted into any container, making it tamper-proof. The validation resolves symlinks before checking paths (preventing traversal attacks) and blocks a hardcoded list of sensitive patterns (.ssh, .gnupg, .aws, .env, etc.).
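The validation order matters: resolve symlinks first, then check blocked patterns and containment. A sketch (the blocked list is a subset of the one named above; the function and its signature are illustrative, not mount-security.ts):

```python
from pathlib import Path

BLOCKED = (".ssh", ".gnupg", ".aws", ".env")   # subset of the hardcoded list

def validate_mount(requested, allowed_roots):
    """Resolve symlinks before any check, so a link pointing at ~/.ssh
    is judged by where it leads, not by its innocent-looking name."""
    real = Path(requested).resolve()
    if any(part in BLOCKED for part in real.parts):
        raise PermissionError(f"blocked pattern in {real}")
    for root in allowed_roots:
        root_real = Path(root).resolve()
        if real == root_real or root_real in real.parents:
            return real
    raise PermissionError(f"{real} is not under an allowed root")
```

Checking the unresolved path would pass ~/projects/backdoor even if it is a symlink into ~/.ssh; resolving first closes that hole.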
Group identity derived from filesystem path, not self-reported. Non-main groups can only send messages to their own chat and manage their own tasks. The main group has full access — admin privileges.
API keys are passed through carefully and scrubbed from Bash environments, but remain accessible via /proc/self/environ inside the container. Combined with unrestricted container network access, this means a prompt injection could exfiltrate API keys.
| Threat | Mitigated? | How |
|---|---|---|
| Agent reads host files | Yes | Container mount boundary |
| Agent kills host processes | Yes | PID namespace isolation |
| Agent reads other group's data | Yes | Per-group mount isolation |
| Agent impersonates another group | Yes | Identity by directory path |
| Agent modifies security config | Yes | Allowlist outside project root |
| Agent sends messages as another group | Yes | IPC authorization check |
| Agent reads API keys | No | /proc/self/environ readable |
| Agent exfiltrates data via network | No | No network restriction on containers |
| Prompt injection in messages | No | XML escaping prevents markup, not semantic injection |
| Agent floods WhatsApp (DoS) | No | No IPC rate limiting |
08 — Inside the Container
The host orchestrator is well-understood — it polls, dispatches, and routes. But the most architecturally sophisticated code lives inside the container, in agent-runner/src/index.ts (588 lines). This is where the Claude Agent SDK actually runs.
The agent runner doesn't make a single SDK call and exit. It runs a query loop: call query(), wait for IPC follow-up messages, then call query() again with resumeAt pointing to the last assistant message. This keeps the conversation going across multiple user messages without spawning a new container each time.
Here's the problem: the Claude Agent SDK's query() accepts a prompt parameter that can be an AsyncIterable. If you pass a single string, the SDK treats it as a one-shot query. But NanoClaw needs to inject follow-up messages while a query is still running — for example, when a user sends a correction while the agent is mid-search.
The MessageStream class solves this with a push-callback queue pattern:
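A Python rendition of the pattern (the real MessageStream is TypeScript inside agent-runner; this sketch mirrors its behavior, and the names are assumed):

```python
import asyncio

class MessageStream:
    """Async iterable fed by push(); the consumer blocks on an empty
    queue until push() or end() resolves the stored waiter."""
    def __init__(self):
        self._queue = []
        self._waiter = None       # resolver stored while the consumer waits
        self._done = False

    def push(self, msg):
        self._queue.append(msg)
        self._wake()

    def end(self):
        self._done = True
        self._wake()

    def _wake(self):
        if self._waiter and not self._waiter.done():
            self._waiter.set_result(None)

    async def __aiter__(self):
        while True:
            while not self._queue and not self._done:
                self._waiter = asyncio.get_running_loop().create_future()
                await self._waiter
            if self._queue:
                yield self._queue.pop(0)
            else:
                return
```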
The SDK's for await (const message of query({ prompt: stream ... })) consumes this iterable. When the queue is empty, the consumer blocks on the Promise. When push() is called (from the IPC poller), the Promise resolves and the consumer yields the next message. When end() is called, the generator returns and the query completes.
This requires nothing beyond JavaScript's standard async iterator protocol (the machinery behind for await loops). The resolve-callback queue pattern — store a resolver, call it when data arrives — is the minimal implementation. This same pattern works in Python with asyncio.Queue and async for.
Claude Code compacts conversation history when it gets too long, discarding older messages to stay within context limits. NanoClaw hooks into this with a PreCompact callback that archives the full transcript to /workspace/group/conversations/ as a markdown file before compaction happens.
The env.ts module on the host deliberately does NOT use dotenv or load secrets into process.env. This is intentional: any child process inherits process.env, so loading secrets there would leak them to every Bash command. Instead, secrets flow through a narrow pipeline: .env file → readEnvFile() → stdin JSON → sdkEnv object → SDK query({ env: sdkEnv }). A PreToolUse hook injects unset ANTHROPIC_API_KEY CLAUDE_CODE_OAUTH_TOKEN before every Bash command.
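The same discipline is easy to replicate in Python: parse the file into a plain dict and let the caller decide what each value touches. A sketch of the idea (env.ts itself is TypeScript; the parsing rules here are simplified):

```python
def read_env_file(path):
    """Parse KEY=VALUE lines into a dict. Nothing touches os.environ,
    so subprocesses inherit none of it unless explicitly passed."""
    secrets = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            secrets[key.strip()] = value.strip().strip('"').strip("'")
    return secrets
```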
The conventional Node.js approach is require('dotenv').config(), which loads everything into process.env globally. This is fine for web servers but dangerous for agent systems that spawn subprocesses. NanoClaw's custom readEnvFile() returns a plain object — the caller decides what to do with each value. A small design choice with large security implications.
09 — Error Handling
A system that spawns containers, manages WhatsApp connections, and coordinates async tasks has many failure modes. NanoClaw handles them with three patterns worth studying.
When a container fails (crash, timeout, SDK error), the GroupQueue retries with exponential backoff:
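The retry shape looks like this. The retry count and delays below are placeholders, not NanoClaw's actual parameters:

```python
import time

def run_with_backoff(attempt_fn, max_retries=3, base_delay_s=1.0, sleep=time.sleep):
    """Retry a failing operation, doubling the delay after each failure.
    Delays: base, 2*base, 4*base, ... The final failure is re-raised."""
    for attempt in range(max_retries + 1):
        try:
            return attempt_fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay_s * (2 ** attempt))
```

Injecting `sleep` keeps the function testable; production code passes nothing and gets real delays.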
This is the subtlest correctness pattern in the codebase. When an agent fails, should the system retry those messages? It depends on whether the user already saw output:
The key insight: an outputSentToUser boolean tracks whether any response was already delivered. If yes — even if the agent later crashes — the cursor advances to prevent sending the same response twice. The system tolerates a partially-answered query over a duplicated one.
If the process crashes between polling messages and starting an agent, those messages would be "seen" (cursor advanced) but never processed. On startup, recoverPendingMessages() re-scans every registered group for unprocessed messages:
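The recovery scan reduces to comparing each group's messages against the agent cursor, not the loop cursor. A sketch with invented names (the real function is recoverPendingMessages in the TypeScript orchestrator):

```python
def recover_pending(groups, get_messages_since, last_agent_ts, dispatch):
    """On startup, re-dispatch anything newer than each group's agent cursor.
    The loop cursor may be ahead: messages 'seen' but never processed."""
    for group in groups:
        pending = get_messages_since(group, last_agent_ts.get(group, 0))
        if pending:
            dispatch(group, pending)
```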
On SIGTERM or SIGINT, the system drains active containers (10-second grace period via queue.shutdown(10000)), disconnects all channels cleanly, then exits. Active containers receive _close sentinels so agents can archive conversations before the container is stopped.
10 — Autonomous Execution
Scheduled tasks are what make this more than a chat interface. A task is a prompt that runs on a schedule — cron expression, interval, or one-time — without any user trigger. The task agent has full access to all tools and can message the user with results.
Tasks can run in two modes, and the distinction matters:
Resume mode: the task resumes the group's existing session. It has full conversation history and knows what was discussed. Use for: "follow up on my request", "continue where we left off".

Fresh mode: the task runs in a fresh session with no history. All context must be in the prompt itself. Use for: "check the weather", "generate a report", "scan for updates".
Unlike message-triggered containers that wait for follow-up IPC messages, task containers close promptly. A 10-second delay after the first result allows any final MCP calls to complete, then a _close sentinel is written. This prevents task containers from idling for the full 30-minute timeout.
11 — Extensibility Model
Most projects add features by writing code that handles every case and configuration flags to pick between them. NanoClaw takes a radically different approach: instead of adding a feature flag for Telegram, you contribute a skill that teaches the AI code editor how to transform NanoClaw into a Telegram-based system.
A skill is a SKILL.md file in .claude/skills/{name}/. When the user runs /add-telegram in Claude Code, the AI reads the skill file and follows its instructions to modify the codebase — creating new files, editing existing ones, updating dependencies.
The result is that every NanoClaw installation is a clean fork that does exactly what the user needs. There's no dead code for channels you don't use, no config for features you didn't enable, no runtime cost for integration points you don't care about. The codebase stays small because features aren't accumulated — they're applied selectively.
12 — Hands-On Learning
These exercises are designed to run locally without deploying NanoClaw. They isolate individual concepts from the codebase so you can understand them independently. No WhatsApp account or Anthropic API key needed.
Replicate NanoClaw's IPC pattern in Python. Create two scripts: a producer that writes JSON files to a directory, and a consumer that polls, reads, and deletes them. Add atomic writes (write to .tmp, then rename).
Run both in separate terminals. Type messages in the producer, watch them appear in the consumer. Then try: what happens if you rename the ipc/messages directory while both are running?
NanoClaw's biggest stdout challenge: the container emits debug logs, stderr, AND structured JSON results on the same stream. Build a parser that extracts JSON between sentinel markers from a noisy stream.
Challenge: modify the parser to work incrementally — process chunks as they arrive (like reading from a pipe), handling the case where a marker is split across chunks.
Implement the core of NanoClaw's mount security: validate that a requested path is under an allowed root, doesn't match blocked patterns, and resolve symlinks before checking.
Advanced: create a symlink from ~/projects/backdoor pointing to ~/.ssh and verify it gets caught by the symlink resolution step.
NanoClaw limits concurrent containers (default: 5). Build a queue that processes tasks per-group (FIFO within each group, parallel across groups, with a global concurrency limit).
Observe: "family" tasks never run concurrently (per-group lock), but "family" and "work" do run in parallel (up to the global limit). What happens if you set max_concurrent=1?
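One possible starting point for this exercise, using asyncio (NanoClaw's GroupQueue is TypeScript and more involved; this is only the core invariant):

```python
import asyncio
from collections import defaultdict

class GroupQueue:
    """FIFO within each group, parallel across groups, bounded globally."""
    def __init__(self, max_concurrent=5):
        self._global = asyncio.Semaphore(max_concurrent)
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, group, task):
        async with self._locks[group]:      # serializes tasks within a group
            async with self._global:        # caps total active tasks
                return await task()
```

With max_concurrent=1 the global semaphore admits one task at a time, so even tasks from different groups run strictly sequentially.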
Replicate the core agent pattern using a local model (no API costs). Use Ollama or llama.cpp to run a small model, then build a loop that receives a prompt, sends it to the model, and executes any tool calls.
This is the same pattern as NanoClaw's agent runner: prompt → model → check for tool use → execute → feed result back → repeat. The difference is scale: NanoClaw uses Claude Code (with 50+ built-in tools) in a container, while this uses a 1.5B model with 2 hand-rolled tools. The architecture is identical.
Verify that Docker volume mounts actually prevent access to non-mounted directories. This is what makes NanoClaw's security model work.
This is the core guarantee NanoClaw relies on. The container physically cannot see /tmp/nanoclaw-test/group-work/ because it was never mounted. No permission checks, no ACLs — the path simply does not exist inside the container's filesystem.
The trickiest state management problem in the codebase: two cursors track different things. lastTimestamp tracks what the polling loop has seen. lastAgentTimestamp[group] tracks what each agent has processed. Build a simulation.
Key insight: when @Andy is triggered at t=3, the agent sees messages at t=1, t=2, AND t=3 (all accumulated context). The "work" group trigger at t=4 only sees its own message. Run it and trace the cursors.
Replicate NanoClaw's MessageStream pattern: a producer pushes items, a consumer reads them with async for. The consumer blocks when the queue is empty and unblocks when a new item is pushed.
This is exactly how NanoClaw injects follow-up WhatsApp messages into an active Claude query. The SDK consumes the async iterable; the IPC poller pushes messages into it. The consumer never knows whether the next message will arrive in 1 second or 10 minutes — it just awaits.
Further Reading
Start with src/index.ts top-to-bottom (488 lines, the orchestrator). Then read container/agent-runner/src/index.ts (588 lines, the agent loop). These two files contain 80% of the system's logic. Everything else is infrastructure supporting these two files.