System Architecture Explainer
A deep technical walkthrough of a containerized AI assistant. Not the what — the why and how of every design decision, with exercises you can run locally.
01 — The Problem
The core question: how do you give an AI agent real capabilities — running code, browsing the web, reading your files — without trusting it with your entire system?
Most AI assistants (ChatGPT, Claude.ai) run in a browser sandbox. They can produce text but cannot act on your computer. When systems try to bridge this gap — running code, accessing files, sending emails — security becomes the central problem.
OpenClaw (the predecessor project) tried to solve security with application-level controls: allowlists, permission checks, pairing codes, role-based access. This produced 52+ modules, 45+ dependencies, and 8 config files. The result? A system too complex to audit, where security depends on no developer ever forgetting a permission check in any code path.
NanoClaw uses OS-level isolation instead. Each agent runs in a Docker container — a lightweight Linux VM. The agent can execute arbitrary code inside its container freely. Security comes from what's mounted into the container, not from what the code is allowed to do.
Application-level security (the OpenClaw approach): the agent runs in your process. You add checks: "don't access /etc/passwd", "don't run rm -rf". Every new capability needs a new permission rule. One missed check = full compromise.

OS-level isolation (the NanoClaw approach): the agent runs in a container. It physically cannot see files you didn't mount. No permission checks needed — the filesystem boundary is the security boundary.
02 — Design Philosophy
These aren't abstract values — each one directly explains a specific code decision. Understanding them lets you predict how the system handles any scenario.
One process. Seventeen source files. No microservices, no message queues, no abstraction layers. The entire codebase is designed to fit into a single Claude Code context window (~35K tokens). If you can't understand it, you can't trust it — and you shouldn't run an AI agent you can't trust.
Agents run in containers. Bash access is safe because commands execute in the container, not on your host. The mount allowlist lives outside the project root (~/.config/nanoclaw/) so agents can never modify their own security config. The host stores credentials; the container receives them via stdin and deletes them from disk immediately.
No multi-tenancy, no user management, no auth system. This isn't a platform — it's working software for one person. The simplification this enables is enormous: a single SQLite database, global mutable state that's safe because there's one process, no need for distributed consensus.
No config files to learn (trigger word is the only config). Want different behavior? Change the code. This works because the codebase is small enough to modify safely. A 50-module system needs config files because developers can't safely change code they don't understand. A 17-file system doesn't.
No installation wizard — Claude Code guides setup. No monitoring dashboard — ask Claude what's happening. No debugging tools — describe the problem, Claude reads the logs. The codebase assumes you have an AI coding agent as your primary interface.
Don't add Telegram to the codebase. Add a skill that teaches Claude Code how to transform a NanoClaw installation to use Telegram. Users fork → run skill → get clean code that does exactly what they need. No bloated system supporting every channel simultaneously.
NanoClaw runs Claude Code directly via the Agent SDK — not a custom wrapper, not a simplified API call. The "harness" (the infrastructure that runs an LLM) determines how smart the model appears. Claude Code includes tool use, file operations, web access, browser automation, agent teams. Running it inside a container gives it maximum capability with maximum safety.
03 — The Big Picture
The system has three layers: a messaging channel (WhatsApp), a host orchestrator (Node.js), and sandboxed agents (Docker containers running Claude Agent SDK). They communicate through SQLite and filesystem-based IPC.
You might notice: everything polls. The message loop polls SQLite every 2 seconds. The IPC watcher polls the filesystem every second. The scheduler polls every 60 seconds. This feels wasteful — why not use SQLite change notifications or fs.watch?
Three reasons: (1) reliability — fs.watch is notoriously inconsistent across platforms, (2) simplicity — a polling loop is four lines of code with no edge cases, (3) adequacy — for a single-user system processing a few messages per minute, 2-second latency is imperceptible. The system optimizes for correctness and readability, not throughput.
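The "four lines of code" claim is easy to check. Here is a minimal polling loop sketch in Python (the project's loops are TypeScript; the function names here are illustrative, not NanoClaw's API):

```python
import time

def poll_loop(fetch, handle, interval_s=2.0, max_iters=None):
    """Poll a source and handle whatever it returns. No edge cases:
    no watchers to re-arm, no events to coalesce, no platform quirks."""
    i = 0
    while max_iters is None or i < max_iters:
        for item in fetch():      # e.g. a SELECT newer-than-cursor against SQLite
            handle(item)
        time.sleep(interval_s)
        i += 1
```

The `max_iters` parameter exists only so the sketch can terminate in a test; a real loop runs forever.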
| File | Lines | Purpose |
|---|---|---|
| src/index.ts | 488 | Orchestrator: message loop, state, agent dispatch, startup/shutdown |
| src/container-runner.ts | 646 | Docker spawn, volume mounts, streaming output parsing |
| src/db.ts | 636 | SQLite schema, CRUD, migrations, all parameterized queries |
| src/mount-security.ts | 420 | Allowlist validation, symlink resolution, blocked patterns |
| src/ipc.ts | 380 | Filesystem IPC watcher, authorization, task processing |
| src/channels/whatsapp.ts | 330 | WhatsApp connection, QR auth, message send/receive |
| src/group-queue.ts | 340 | Per-group message queue with global concurrency limit |
| src/task-scheduler.ts | 223 | Cron/interval/once task execution |
| src/router.ts | 45 | Message formatting, outbound routing |
| src/config.ts | 70 | Constants: trigger pattern, paths, timeouts |
| Container management | | |
| src/container-runtime.ts | 77 | Docker abstraction: runtime binary, mount args, cleanup orphans |
| Supporting modules | | |
| src/types.ts | 105 | Interface contracts: Channel, RegisteredGroup, ContainerOutput |
| src/env.ts | 41 | Secret reader: loads .env without polluting process.env |
| src/whatsapp-auth.ts | 158 | QR auth, credential storage, reconnect backoff |
| src/logger.ts | 17 | Pino logger configuration |
| In-container (agent-runner) | | |
| agent-runner/src/index.ts | 588 | Query loop, MessageStream, PreCompact hook, session management |
| agent-runner/src/ipc-mcp-stdio.ts | 280 | MCP server: send_message, schedule_task, list/pause/cancel tasks |
04 — The Critical Path
When you send "@Andy what's the weather?" in WhatsApp, here is exactly what happens, step by step. This is the single most important flow to understand.
The baileys library maintains a persistent WebSocket connection to WhatsApp's servers. When a message arrives, the messages.upsert event fires. The channel extracts text content, sender name, timestamp, and chat JID (WhatsApp's unique identifier for each chat).
The onMessage callback calls storeMessage() which inserts into the messages table. Parameterized query — no SQL injection possible. The message is now durable. If the process crashes here, the message survives.
Every 2 seconds, startMessageLoop() calls getNewMessages() with all registered group JIDs and the lastTimestamp cursor. Only messages newer than the cursor and not from the bot itself are returned. Messages from unregistered groups are silently ignored.
For non-main groups, the system checks if any message matches the trigger pattern (/^@Andy\b/i). If no trigger is found, the messages accumulate silently — they'll be included as context when a trigger does arrive. The main group (your self-chat) skips this check entirely: every message triggers the agent.
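The dispatch decision can be sketched in a few lines of Python (the regex is the one from src/config.ts; the function itself is illustrative):

```python
import re

TRIGGER = re.compile(r"^@Andy\b", re.IGNORECASE)  # trigger pattern from src/config.ts

def should_dispatch(messages, is_main_group):
    """Main group: every message triggers. Other groups: dispatch only if
    some accumulated message starts with the trigger word."""
    return is_main_group or any(TRIGGER.search(m) for m in messages)
```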
The GroupQueue tracks active containers per group. If one exists and is idle-waiting for input, the message is piped to it via IPC file (data/ipc/{group}/input/{timestamp}.json). If no container is running, a new one is spawned. Global concurrency is capped (default: 5 containers).
formatMessages() wraps messages in XML with sender names and timestamps escaped. This becomes the prompt to Claude:
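For concreteness, the prompt has roughly this shape. The tag and attribute names below are illustrative assumptions, not copied from the project; what matters is that sender, timestamp, and body are escaped XML content:

```xml
<messages>
  <message sender="Alice" timestamp="2025-01-15T09:14:02Z">@Andy what's the weather?</message>
  <message sender="Bob" timestamp="2025-01-15T09:14:40Z">and is it going to rain?</message>
</messages>
```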
runContainerAgent() builds volume mounts, constructs the Docker command, and spawns the container. The input JSON (prompt + secrets) is written to the container's stdin. Secrets are deleted from memory immediately after writing. The container mounts only: the group folder, the IPC directory, session data, and any explicitly allowed extra directories.
The agent runner reads stdin, authenticates with the Anthropic API using credentials from the input, and calls query() with the full prompt. Claude Code processes the query — it may search the web, run Bash commands, read/write files, all inside the container sandbox. The agent has bypassPermissions mode because the container IS the permission boundary.
The agent runner wraps each result in ---NANOCLAW_OUTPUT_START--- / ---NANOCLAW_OUTPUT_END--- markers on stdout. The host process parses these in real-time from the container's stdout stream. This handles mixed output (debug logs + results + stderr) robustly.
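The marker strings are the ones above; the extraction logic below is an illustrative Python sketch, not the project's TypeScript parser:

```python
START = "---NANOCLAW_OUTPUT_START---"
END = "---NANOCLAW_OUTPUT_END---"

def extract_results(stream_text):
    """Pull out everything between marker pairs, ignoring interleaved noise.
    An unterminated final block is left for the caller to retry with more data."""
    results = []
    rest = stream_text
    while True:
        s = rest.find(START)
        if s == -1:
            return results
        e = rest.find(END, s)
        if e == -1:
            return results
        results.append(rest[s + len(START):e].strip())
        rest = rest[e + len(END):]
```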
Each parsed result is stripped of <internal>...</internal> tags (agent's private reasoning) and sent to the chat via channel.sendMessage(). A promise chain (outputChain = outputChain.then(() => onOutput(parsed))) serializes concurrent streaming outputs so messages arrive in order even though stdout.on('data') fires synchronously while sendMessage is async.
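The promise-chain trick has a direct asyncio analogue: each send awaits the previous one. This is a sketch of the idea (the project's version is the TypeScript outputChain; class and method names here are invented):

```python
import asyncio

class OrderedOutput:
    """Serialize async sends so results leave in arrival order,
    mirroring outputChain = outputChain.then(() => onOutput(parsed))."""
    def __init__(self, send):
        self._send = send
        self._chain = None            # tail of the chain of pending sends

    def emit(self, item):
        prev = self._chain
        async def link():
            if prev is not None:
                await prev            # wait for everything queued before us
            await self._send(item)
        self._chain = asyncio.ensure_future(link())
        return self._chain
```

Even if a later send finishes faster than an earlier one, the chain forces delivery order to match emission order.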
The message loop maintains two cursors: lastTimestamp (the latest message seen by the loop) and lastAgentTimestamp[chatJid] (the latest message processed by an agent for each group). This two-cursor design handles the case where messages arrive faster than agents process them — non-trigger messages accumulate as context until a trigger arrives.
05 — The Security Boundary
The container is the single most important architectural element. Every security property flows from what does and doesn't get mounted into it.
The buildVolumeMounts() function in container-runner.ts constructs the mount list. The logic branches on whether this is the main group or a regular group:
Main group: gets the entire project root (rw) + its own group folder. Can see all code, all group folders, all config. This is the "trusted" context.

Non-main groups: get only their own folder (rw) + global memory (ro). Cannot see other groups, cannot see project code, cannot see credentials.
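The branch can be sketched as follows. The host-side directory layout and container paths here are assumptions for illustration; the real logic lives in buildVolumeMounts() in container-runner.ts:

```python
def build_volume_mounts(group, project_root, is_main):
    """Return docker -v arguments. Main group is trusted; others are confined
    to their own folder plus read-only global memory."""
    if is_main:
        mounts = [(project_root, "/workspace/project", "rw")]
    else:
        mounts = [
            (f"{project_root}/groups/{group}", "/workspace/group", "rw"),
            (f"{project_root}/groups/main/memory", "/workspace/global-memory", "ro"),
        ]
    args = []
    for host, guest, mode in mounts:
        args += ["-v", f"{host}:{guest}:{mode}"]
    return args
```

The security property falls out of the data: for a non-main group, no argument ever names the project root, so nothing outside the group folder exists inside the container.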
API keys take a careful path from disk to the agent:
1. Secrets live in .env at the project root. They are never loaded into process.env (to avoid leaking to child processes).
2. readSecrets() reads only CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY from .env.
3. They travel over stdin into /tmp/input.json, which is read and deleted on the first line of agent runner code.
4. They reach query() via the env parameter. A PreToolUse hook prepends unset commands to Bash invocations to scrub them from shell environments.

The remaining gap: the keys stay readable via /proc/self/environ inside the container. The project acknowledges this openly. Full credential isolation would require the Claude Agent SDK to support proxy-based auth — separating the auth layer from the execution layer. This is an unsolved problem across all agent frameworks.
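The unset-prepend step is a one-line transform. This sketch shows the idea only, not the SDK's actual hook API:

```python
SECRET_VARS = ["ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"]

def scrub_bash_command(cmd):
    """Prepend an unset so the spawned shell never sees the keys,
    even though the agent-runner process itself holds them."""
    return f"unset {' '.join(SECRET_VARS)}; {cmd}"
```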
| Event | What happens |
|---|---|
| Spawn | Docker run with -i --rm. Named nanoclaw-{group}-{timestamp}. Stdio piped. |
| Input | JSON written to stdin, then stdin closed (for initial message). |
| Follow-up | New messages arrive via IPC files in /workspace/ipc/input/, polled every 500ms. |
| Idle | After responding, container waits for IPC input. If no messages for 30 min, a _close sentinel is written. |
| Close | Agent detects _close, exits cleanly. Container is removed (--rm). |
| Timeout | If container exceeds hard timeout (30 min default), host sends docker stop, then SIGKILL. |
| Crash recovery | On restart, cleanupOrphans() stops any stale nanoclaw-* containers. |
06 — Inter-Process Communication
Containers can't call host functions directly. They communicate by writing JSON files to a shared directory. The host polls this directory and processes the files. This is the simplest possible IPC mechanism — and that simplicity is the point.
The obvious alternatives (an HTTP server, WebSockets, gRPC) all require networking between host and container, authentication, serialization libraries, and error handling for network failures. Filesystem IPC needs: (1) write a file, (2) read a file, (3) delete a file. It's debuggable with ls and cat. It's atomic on rename. And the mount boundary means one group's IPC directory is invisible to other groups.
This is the cleverest security pattern in NanoClaw. The host determines a message's source group by which IPC directory it came from — not by reading a groupFolder field from the JSON. Since each container can only write to its own mounted IPC directory, a compromised agent in Group A physically cannot write files that appear to come from Group B.
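The idea reduces to a few lines: derive identity from the path, never from the payload. A Python sketch (directory layout matches the data/ipc/{group}/ convention above; function name is invented):

```python
from pathlib import Path

IPC_ROOT = Path("data/ipc")

def source_group(ipc_file):
    """Trust the mount layout, not the payload: the first path component
    under the IPC root names the group, because only that group's
    container has that directory mounted."""
    rel = Path(ipc_file).resolve().relative_to(IPC_ROOT.resolve())
    return rel.parts[0]          # data/ipc/{group}/input/...
```

A compromised agent can put any groupFolder field it likes inside the JSON; the host never reads it, so the forgery has no effect.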
Both the MCP server and the host use the write-to-temp-then-rename pattern to prevent reading half-written files:
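Here is the pattern in Python (the project's implementation is TypeScript; the idea is identical):

```python
import json, os, tempfile

def atomic_write_json(path, payload):
    """Write to a temp file in the same directory, then rename.
    On POSIX the rename is atomic, so a polling reader sees either
    the old file or the complete new one, never a half-written file."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)     # atomic when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The temp file must live in the same directory as the target: rename is only atomic within one filesystem.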
07 — Threat Model
NanoClaw's security model has four layers. Understanding them helps you reason about what an attacker (or a misbehaving agent) can and cannot do.
Docker containers provide process isolation (separate PID namespace), filesystem isolation (only mounted paths visible), and user isolation (runs as uid 1000, not root). This is the primary security boundary. An agent cannot see files you didn't mount, cannot kill host processes, and cannot access host services directly.
Additional directories (beyond the group folder) require explicit allowlisting in ~/.config/nanoclaw/mount-allowlist.json. This file is stored outside the project root and is never mounted into any container, making it tamper-proof. The validation resolves symlinks before checking paths (preventing traversal attacks) and blocks a hardcoded list of sensitive patterns (.ssh, .gnupg, .aws, .env, etc.).
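The validation order matters: resolve symlinks first, then check blocked patterns and containment. A sketch (the blocked list is a subset of the one named above; the function and its signature are illustrative, not mount-security.ts):

```python
from pathlib import Path

BLOCKED = (".ssh", ".gnupg", ".aws", ".env")   # subset of the hardcoded list

def validate_mount(requested, allowed_roots):
    """Resolve symlinks before any check, so a link pointing at ~/.ssh
    is judged by where it leads, not by its innocent-looking name."""
    real = Path(requested).resolve()
    if any(part in BLOCKED for part in real.parts):
        raise PermissionError(f"blocked pattern in {real}")
    for root in allowed_roots:
        root_real = Path(root).resolve()
        if real == root_real or root_real in real.parents:
            return real
    raise PermissionError(f"{real} is not under an allowed root")
```

Checking the unresolved path would pass ~/projects/backdoor even if it is a symlink into ~/.ssh; resolving first closes that hole.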
Group identity derived from filesystem path, not self-reported. Non-main groups can only send messages to their own chat and manage their own tasks. The main group has full access — admin privileges.
API keys are passed through carefully and scrubbed from Bash environments, but remain accessible via /proc/self/environ inside the container. Combined with unrestricted container network access, this means a prompt injection could exfiltrate API keys.
| Threat | Mitigated? | How |
|---|---|---|
| Agent reads host files | Yes | Container mount boundary |
| Agent kills host processes | Yes | PID namespace isolation |
| Agent reads other group's data | Yes | Per-group mount isolation |
| Agent impersonates another group | Yes | Identity by directory path |
| Agent modifies security config | Yes | Allowlist outside project root |
| Agent sends messages as another group | Yes | IPC authorization check |
| Agent reads API keys | No | /proc/self/environ readable |
| Agent exfiltrates data via network | No | No network restriction on containers |
| Prompt injection in messages | No | XML escaping prevents markup, not semantic injection |
| Agent floods WhatsApp (DoS) | No | No IPC rate limiting |
08 — Inside the Container
The host orchestrator is well-understood — it polls, dispatches, and routes. But the most architecturally sophisticated code lives inside the container, in agent-runner/src/index.ts (588 lines). This is where the Claude Agent SDK actually runs.
The agent runner doesn't make a single SDK call and exit. It runs a query loop: call query(), wait for IPC follow-up messages, then call query() again with resumeAt pointing to the last assistant message. This keeps the conversation going across multiple user messages without spawning a new container each time.
Here's the problem: the Claude Agent SDK's query() accepts a prompt parameter that can be an AsyncIterable. If you pass a single string, the SDK treats it as a one-shot query. But NanoClaw needs to inject follow-up messages while a query is still running — for example, when a user sends a correction while the agent is mid-search.
The MessageStream class solves this with a push-callback queue pattern:
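A Python rendition of the pattern (the real MessageStream is TypeScript inside agent-runner; this sketch mirrors its behavior, and the names are assumed):

```python
import asyncio

class MessageStream:
    """Async iterable fed by push(); the consumer blocks on an empty
    queue until push() or end() resolves the stored waiter."""
    def __init__(self):
        self._queue = []
        self._waiter = None       # resolver stored while the consumer waits
        self._done = False

    def push(self, msg):
        self._queue.append(msg)
        self._wake()

    def end(self):
        self._done = True
        self._wake()

    def _wake(self):
        if self._waiter and not self._waiter.done():
            self._waiter.set_result(None)

    async def __aiter__(self):
        while True:
            while not self._queue and not self._done:
                self._waiter = asyncio.get_running_loop().create_future()
                await self._waiter
            if self._queue:
                yield self._queue.pop(0)
            else:
                return
```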
The SDK's for await (const message of query({ prompt: stream ... })) consumes this iterable. When the queue is empty, the consumer blocks on the Promise. When push() is called (from the IPC poller), the Promise resolves and the consumer yields the next message. When end() is called, the generator returns and the query completes.
This requires nothing beyond JavaScript's standard async iterator protocol (the machinery behind for await loops). The resolve-callback queue pattern — store a resolver, call it when data arrives — is the minimal implementation. This same pattern works in Python with asyncio.Queue and async for.
Claude Code compacts conversation history when it gets too long, discarding older messages to stay within context limits. NanoClaw hooks into this with a PreCompact callback that archives the full transcript to /workspace/group/conversations/ as a markdown file before compaction happens.
The env.ts module on the host deliberately does NOT use dotenv or load secrets into process.env. This is intentional: any child process inherits process.env, so loading secrets there would leak them to every Bash command. Instead, secrets flow through a narrow pipeline: .env file → readEnvFile() → stdin JSON → sdkEnv object → SDK query({ env: sdkEnv }). A PreToolUse hook injects unset ANTHROPIC_API_KEY CLAUDE_CODE_OAUTH_TOKEN before every Bash command.
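The same discipline is easy to replicate in Python: parse the file into a plain dict and let the caller decide what each value touches. A sketch of the idea (env.ts itself is TypeScript; the parsing rules here are simplified):

```python
def read_env_file(path):
    """Parse KEY=VALUE lines into a dict. Nothing touches os.environ,
    so subprocesses inherit none of it unless explicitly passed."""
    secrets = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            secrets[key.strip()] = value.strip().strip('"').strip("'")
    return secrets
```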
The conventional Node.js approach is require('dotenv').config(), which loads everything into process.env globally. This is fine for web servers but dangerous for agent systems that spawn subprocesses. NanoClaw's custom readEnvFile() returns a plain object — the caller decides what to do with each value. A small design choice with large security implications.
09 — Error Handling
A system that spawns containers, manages WhatsApp connections, and coordinates async tasks has many failure modes. NanoClaw handles them with three patterns worth studying.
When a container fails (crash, timeout, SDK error), the GroupQueue retries with exponential backoff:
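The retry shape looks like this. The retry count and delays below are placeholders, not NanoClaw's actual parameters:

```python
import time

def run_with_backoff(attempt_fn, max_retries=3, base_delay_s=1.0, sleep=time.sleep):
    """Retry a failing operation, doubling the delay after each failure.
    Delays: base, 2*base, 4*base, ... The final failure is re-raised."""
    for attempt in range(max_retries + 1):
        try:
            return attempt_fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay_s * (2 ** attempt))
```

Injecting `sleep` keeps the function testable; production code passes nothing and gets real delays.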
This is the subtlest correctness pattern in the codebase. When an agent fails, should the system retry those messages? It depends on whether the user already saw output:
The key insight: an outputSentToUser boolean tracks whether any response was already delivered. If yes — even if the agent later crashes — the cursor advances to prevent sending the same response twice. The system tolerates a partially-answered query over a duplicated one.
If the process crashes between polling messages and starting an agent, those messages would be "seen" (cursor advanced) but never processed. On startup, recoverPendingMessages() re-scans every registered group for unprocessed messages:
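The recovery scan reduces to comparing each group's messages against the agent cursor, not the loop cursor. A sketch with invented names (the real function is recoverPendingMessages in the TypeScript orchestrator):

```python
def recover_pending(groups, get_messages_since, last_agent_ts, dispatch):
    """On startup, re-dispatch anything newer than each group's agent cursor.
    The loop cursor may be ahead: messages 'seen' but never processed."""
    for group in groups:
        pending = get_messages_since(group, last_agent_ts.get(group, 0))
        if pending:
            dispatch(group, pending)
```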
On SIGTERM or SIGINT, the system drains active containers (10-second grace period via queue.shutdown(10000)), disconnects all channels cleanly, then exits. Active containers receive _close sentinels so agents can archive conversations before the container is stopped.
10 — Autonomous Execution
Scheduled tasks are what make this more than a chat interface. A task is a prompt that runs on a schedule — cron expression, interval, or one-time — without any user trigger. The task agent has full access to all tools and can message the user with results.
Tasks can run in two modes, and the distinction matters:
Resume mode: the task resumes the group's existing session. It has full conversation history and knows what was discussed. Use for: "follow up on my request", "continue where we left off".

Fresh mode: the task runs in a fresh session with no history. All context must be in the prompt itself. Use for: "check the weather", "generate a report", "scan for updates".
Unlike message-triggered containers that wait for follow-up IPC messages, task containers close promptly. A 10-second delay after the first result allows any final MCP calls to complete, then a _close sentinel is written. This prevents task containers from idling for the full 30-minute timeout.
11 — Extensibility Model
Most projects add features by writing code that handles every case and configuration flags to pick between them. NanoClaw takes a radically different approach: instead of adding a feature flag for Telegram, you contribute a skill that teaches the AI code editor how to transform NanoClaw into a Telegram-based system.
A skill is a SKILL.md file in .claude/skills/{name}/. When the user runs /add-telegram in Claude Code, the AI reads the skill file and follows its instructions to modify the codebase — creating new files, editing existing ones, updating dependencies.
The result is that every NanoClaw installation is a clean fork that does exactly what the user needs. There's no dead code for channels you don't use, no config for features you didn't enable, no runtime cost for integration points you don't care about. The codebase stays small because features aren't accumulated — they're applied selectively.
12 — Hands-On Learning
These exercises are designed to run locally without deploying NanoClaw. They isolate individual concepts from the codebase so you can understand them independently. No WhatsApp account or Anthropic API key needed.
Replicate NanoClaw's IPC pattern in Python. Create two scripts: a producer that writes JSON files to a directory, and a consumer that polls, reads, and deletes them. Add atomic writes (write to .tmp, then rename).
Run both in separate terminals. Type messages in the producer, watch them appear in the consumer. Then try: what happens if you rename the ipc/messages directory while both are running?
NanoClaw's biggest stdout challenge: the container emits debug logs, stderr, AND structured JSON results on the same stream. Build a parser that extracts JSON between sentinel markers from a noisy stream.
Challenge: modify the parser to work incrementally — process chunks as they arrive (like reading from a pipe), handling the case where a marker is split across chunks.
Implement the core of NanoClaw's mount security: validate that a requested path is under an allowed root, doesn't match blocked patterns, and resolve symlinks before checking.
Advanced: create a symlink from ~/projects/backdoor pointing to ~/.ssh and verify it gets caught by the symlink resolution step.
NanoClaw limits concurrent containers (default: 5). Build a queue that processes tasks per-group (FIFO within each group, parallel across groups, with a global concurrency limit).
Observe: "family" tasks never run concurrently (per-group lock), but "family" and "work" do run in parallel (up to the global limit). What happens if you set max_concurrent=1?
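One possible starting point for this exercise, using asyncio (NanoClaw's GroupQueue is TypeScript and more involved; this is only the core invariant):

```python
import asyncio
from collections import defaultdict

class GroupQueue:
    """FIFO within each group, parallel across groups, bounded globally."""
    def __init__(self, max_concurrent=5):
        self._global = asyncio.Semaphore(max_concurrent)
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, group, task):
        async with self._locks[group]:      # serializes tasks within a group
            async with self._global:        # caps total active tasks
                return await task()
```

With max_concurrent=1 the global semaphore admits one task at a time, so even tasks from different groups run strictly sequentially.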
Replicate the core agent pattern using a local model (no API costs). Use Ollama or llama.cpp to run a small model, then build a loop that receives a prompt, sends it to the model, and executes any tool calls.
This is the same pattern as NanoClaw's agent runner: prompt → model → check for tool use → execute → feed result back → repeat. The difference is scale: NanoClaw uses Claude Code (with 50+ built-in tools) in a container, while this uses a 1.5B model with 2 hand-rolled tools. The architecture is identical.
Verify that Docker volume mounts actually prevent access to non-mounted directories. This is what makes NanoClaw's security model work.
This is the core guarantee NanoClaw relies on. The container physically cannot see /tmp/nanoclaw-test/group-work/ because it was never mounted. No permission checks, no ACLs — the path simply does not exist inside the container's filesystem.
The trickiest state management problem in the codebase: two cursors track different things. lastTimestamp tracks what the polling loop has seen. lastAgentTimestamp[group] tracks what each agent has processed. Build a simulation.
Key insight: when @Andy is triggered at t=3, the agent sees messages at t=1, t=2, AND t=3 (all accumulated context). The "work" group trigger at t=4 only sees its own message. Run it and trace the cursors.
Replicate NanoClaw's MessageStream pattern: a producer pushes items, a consumer reads them with async for. The consumer blocks when the queue is empty and unblocks when a new item is pushed.
This is exactly how NanoClaw injects follow-up WhatsApp messages into an active Claude query. The SDK consumes the async iterable; the IPC poller pushes messages into it. The consumer never knows whether the next message will arrive in 1 second or 10 minutes — it just awaits.
Further Reading
Start with src/index.ts top-to-bottom (488 lines, the orchestrator). Then read container/agent-runner/src/index.ts (588 lines, the agent loop). These two files contain 80% of the system's logic. Everything else is infrastructure supporting these two files.