We run 12 NanoClaw agents in production. They handle outreach, content publishing, security monitoring, customer communication, research, and revenue operations. They run 24/7 in containers, talk to each other through structured handoffs, and maintain persistent memory across sessions.
Getting here took weeks of debugging problems that no documentation covers. This is the guide we wish we had when we started — written from actual deployment experience, not theory.
If you're planning a multi-agent NanoClaw deployment, read this first. It will save you significant time and money.
1. Identity coherence is the #1 challenge
When you run a single agent, identity is simple: one system prompt, one personality, one set of instructions. When you run 12 agents on shared infrastructure, they start bleeding into each other.
We saw this within the first week. Agent A (our outreach specialist) started responding with the tone of Agent B (our security analyst). The technical vocabulary leaked across boundaries. System prompts alone were not enough to maintain distinct identities.
Why this happens
NanoClaw agents share the same base container image and runtime environment. If agents read from shared directories, share context windows, or inherit configuration from a common source, their behavioral boundaries erode. The LLM does not inherently "know" which agent it is — it infers identity from the prompt and surrounding context. When that context is polluted, identity breaks down.
The fix: identity documents + strict isolation
Every agent gets a dedicated CLAUDE.md file that defines who it is, what it does, and what it does not do. This is not a system prompt — it is a persistent identity document that the agent re-reads at the start of every session.
# Example: CLAUDE.md for a security analyst agent
# /agents/scout/CLAUDE.md
## Soul
- Security-first mindset. Every recommendation must be actionable.
- Verify before reporting. Never speculate on vulnerabilities.
## Identity
You are Scout, the security analyst. You audit OpenClaw
configurations, scan for exposed instances, and produce
security reports.
## Boundaries
- ALWAYS: Verify findings with live scans before reporting
- NEVER: Make recommendations outside your security domain
- NEVER: Respond to marketing, sales, or outreach requests
## Voice
- Technical, precise, evidence-based
- No marketing language, no sales pitches
- Cite specific CVEs, config keys, and line numbers
The critical detail: each agent's CLAUDE.md lives in its own isolated directory. No agent can read another agent's identity file. This is enforced at the filesystem level, not by convention.
# Directory structure — each agent is fully isolated
/agents/
scout/ # Security analyst
CLAUDE.md
memory/
workspace/
milo/ # Revenue operations
CLAUDE.md
memory/
workspace/
kit/ # Content publishing
CLAUDE.md
memory/
workspace/
Identity refresh mechanism: We discovered that agents lose identity coherence during long sessions. The solution is a periodic "identity refresh" — every N interactions, the agent re-reads its CLAUDE.md file. This reanchors its behavior. Without this, agents drift after roughly 30-40 tool calls in a single session.
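A minimal sketch of such a refresh wrapper in Python (the counter hook and in-memory injection list are assumptions for illustration; NanoClaw does not ship this mechanism):

```python
from pathlib import Path

REFRESH_INTERVAL = 30  # re-anchor identity every N tool calls (tune per agent)

class IdentityRefresher:
    """Re-reads the agent's CLAUDE.md every N tool calls.

    In a real deployment, the refreshed text would be pushed back into
    the agent's context; here it is collected in a list so the sketch
    stays self-contained.
    """
    def __init__(self, identity_path, interval=REFRESH_INTERVAL):
        self.identity_path = Path(identity_path)
        self.interval = interval
        self.calls = 0
        self.injected = []

    def on_tool_call(self):
        self.calls += 1
        if self.calls % self.interval == 0:
            # Re-read from disk so edits to CLAUDE.md take effect mid-session
            self.injected.append(self.identity_path.read_text())
```

Wiring this into whatever loop drives tool calls is deployment-specific; the point is only that the identity document is re-read from disk, not cached, so human edits to CLAUDE.md also take effect mid-session.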
2. Memory coherence across sessions
NanoClaw does not have OpenClaw's hybrid memory system. There is no QMD search, no auto-pruning, no built-in persistence mode. Out of the box, every session starts fresh.
We needed agents that remember what they did yesterday, what tasks are in progress, and what decisions were made. So we built a file-based memory system with three tiers.
The hot/warm/cold memory architecture
# Memory tier structure per agent
/agents/milo/memory/
active-tasks.md # HOT — updated every session
context.md # WARM — business state, decisions, what's live
comms-log.md # WARM — outbound communication log
research/ # COLD — completed research, archived data
archive/ # COLD — items older than 7 days
- HOT (active-tasks.md): Updated after every session. Contains current tasks, blockers, and next steps. The agent reads this first on every startup. Maximum file size: 50KB.
- WARM (context.md): Business state that changes weekly. Revenue numbers, deployment status, key decisions. Agents reference this when they need background context.
- COLD (archive/): Completed items moved here after 7 days. Never automatically loaded into context — agents only read from cold storage when explicitly searching for historical data.
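The tier discipline can be sketched as a startup loader: hot always, warm only on request, cold never. File names follow the layout above; the function itself is illustrative glue, not a NanoClaw API.

```python
from pathlib import Path

def load_startup_context(memory_dir, include_warm=False):
    """Build the context an agent loads at session start.

    HOT files are always loaded; WARM files only when the task needs
    background; COLD storage (archive/, research/) is never auto-loaded.
    """
    memory = Path(memory_dir)
    parts = [(memory / "active-tasks.md").read_text()]       # HOT: always
    if include_warm:
        for name in ("context.md", "comms-log.md"):          # WARM: on demand
            f = memory / name
            if f.exists():
                parts.append(f.read_text())
    # COLD is deliberately absent: agents search it explicitly, never here
    return "\n\n".join(parts)
```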
Why this works better than OpenClaw's built-in memory
Counterintuitive finding: our file-based system is more reliable than OpenClaw's sophisticated hybrid memory for our use case. Three reasons:
- Transparency. Memory is plain markdown files. You can cat them, diff them, version them with git. When an agent behaves unexpectedly, you can read exactly what it "remembers." OpenClaw's memory is a binary store — debugging it requires specialized tooling.
- Debuggability. When Agent A makes a bad decision, you can trace it to the exact line in context.md that informed that decision. Try doing that with a QMD vector store.
- Human editability. Sometimes you need to correct an agent's memory. With files, you open a text editor. With OpenClaw's memory, you need API calls and specialized commands.
Memory hygiene rules
File-based memory requires discipline. Without rules, memory files grow unbounded and agents load stale data into their context windows, burning tokens on irrelevant information.
# Memory hygiene rules we enforce:
# 1. Always include ISO 8601 timestamps
# 2. Append new entries — never silently delete
# 3. Keep active-tasks.md current every session
# 4. Archive completed items after 7 days
# 5. Max file size: 50KB — archive older entries if approaching
# Example active-tasks.md entry format:
## 2026-02-27T14:30:00Z — Deploy blog post
- Status: in_progress
- Blocker: none
- Next: verify deployment on Vercel, check SEO meta tags
The 50KB rule matters. When memory files exceed 50KB, the agent's context window fills with historical data instead of the current task. This causes two problems: increased token costs and degraded reasoning quality. The agent gets "distracted" by old information. Enforce the limit aggressively.
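One way to enforce the limit, assuming entries are appended chronologically under "## " headings as in the example format above (so the oldest entries come first); this helper is a sketch, not part of NanoClaw:

```python
from datetime import datetime, timezone
from pathlib import Path

MAX_BYTES = 50 * 1024  # the 50KB ceiling from the hygiene rules

def enforce_size_limit(path, archive_dir):
    """If a memory file exceeds 50KB, move the oldest half of its
    '## '-delimited entries into a timestamped archive file.

    Returns True if anything was archived, False otherwise.
    """
    path, archive_dir = Path(path), Path(archive_dir)
    if path.stat().st_size <= MAX_BYTES:
        return False
    header, *entries = path.read_text().split("\n## ")
    mid = len(entries) // 2
    old, recent = entries[:mid], entries[mid:]
    if not old:
        return False  # single oversized entry; needs manual attention
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archived = "## " + "\n## ".join(old)
    (archive_dir / f"{path.stem}-{stamp}.md").write_text(archived)
    path.write_text(header + "\n## " + "\n## ".join(recent))
    return True
```

Running this from the weekly memory-maintenance job keeps the rule enforced without relying on the agent to police itself.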
3. Inter-agent communication is fragile
Agents need to coordinate. The security analyst finds a vulnerability, and the content agent needs to write about it. The outreach agent identifies a prospect, and the revenue agent needs to follow up. But NanoClaw agents cannot share context windows. Each agent runs in its own process with its own conversation history.
The message-passing pattern
We solved this with a message-passing system using Discord channels and structured handoffs. The pattern is simple: one agent writes a summary, the other reads it. No shared state.
# Inter-agent communication flow:
# 1. Agent A completes a task and needs to hand off
# It writes a structured message to a shared channel
# 2. The message follows a strict format:
## Handoff: [source-agent] -> [target-agent]
## Task: [what needs to happen]
## Context: [minimal context the target agent needs]
## Priority: [high/medium/low]
## Deadline: [if applicable]
# 3. Agent B picks up the message on its next cycle
# and reads ONLY the handoff — not Agent A's full history
The critical constraint: the receiving agent only gets the handoff message, not the sending agent's full conversation history. This is what prevents identity bleed. If Agent B could read Agent A's entire context, it would start adopting Agent A's patterns.
What goes wrong without structure
We tried unstructured communication first. Agents dumped full paragraphs of context into shared channels. The receiving agent would load all of it, and within two handoffs, you could not tell which agent was which. Their responses converged on a generic "helpful AI assistant" voice that belonged to neither agent.
Structured handoffs with minimal context solved this completely. The handoff message is a contract: here is what you need to know, nothing more.
# Good handoff — minimal, structured
## Handoff: scout -> kit
## Task: Write blog post about CVE-2026-1847
## Context: Affects OpenClaw 4.2.x gateway module.
## Severity: High. Patch available in 4.2.7.
## Our scan found 12,400 instances still vulnerable.
## Priority: high
## Deadline: 2026-02-28
# Bad handoff — too much context, will cause identity bleed
"Hey, so I was scanning the Shodan results and I noticed
this really interesting pattern where a bunch of OpenClaw
instances are running outdated gateway configs and I think
we should probably write something about it because..."
4. Cron jobs and scheduled tasks
NanoClaw's scheduling system is straightforward. You define a prompt, a schedule (cron expression, interval, or one-time), and a context mode. The agent runs on schedule with access to all its tools.
The context mode gotcha
Scheduled tasks can run in two modes: group (with access to chat history) or isolated (fresh session, no history). This choice has significant implications.
# nanoclaw scheduled task examples:
# Isolated mode — self-contained, no conversation history
# Use for: monitoring, data collection, routine operations
schedule:
type: cron
value: "0 9 * * *" # Daily at 9am
context: isolated
prompt: |
Check Stripe API for yesterday's revenue.
Compare against the $50/day target.
If revenue exceeded target, post a summary to the team channel.
If revenue is below target, include which products underperformed.
Use environment variable $STRIPE_SECRET_KEY for authentication.
# Group mode — has access to conversation history
# Use for: follow-ups, tasks that reference recent discussions
schedule:
type: cron
value: "0 */6 * * *" # Every 6 hours
context: group
prompt: |
Check active-tasks.md for any tasks marked as blocked.
If a task has been blocked for more than 24 hours, escalate.
The gotcha: isolated tasks have no memory of previous runs. If your daily revenue check needs to compare today vs. yesterday, the prompt must include instructions to read yesterday's data from a file — because the agent has no recollection of running yesterday. Every isolated task must be entirely self-contained.
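The "read yesterday's data from a file" pattern can be sketched like this; the snapshot file layout and function name are assumptions of this sketch:

```python
import json
from datetime import date, timedelta
from pathlib import Path

def record_and_compare(memory_dir, todays_revenue, today=None):
    """Self-contained pattern for isolated cron tasks: persist today's
    value to a dated file, then read yesterday's file for comparison.

    The agent has no memory of prior runs, so the filesystem carries
    state between them. Returns today-minus-yesterday, or None when
    there is nothing to compare against.
    """
    today = today or date.today()
    snapshots = Path(memory_dir) / "snapshots"
    snapshots.mkdir(parents=True, exist_ok=True)
    (snapshots / f"revenue-{today.isoformat()}.json").write_text(
        json.dumps({"revenue": todays_revenue}))
    yesterday = today - timedelta(days=1)
    yesterday_file = snapshots / f"revenue-{yesterday.isoformat()}.json"
    if not yesterday_file.exists():
        return None  # first run, or yesterday's job failed
    return todays_revenue - json.loads(yesterday_file.read_text())["revenue"]
```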
Pipelines we run on cron
- Revenue reporting (daily, 9am): Query Stripe and ClawMart APIs, update context.md with latest numbers
- Security monitoring (every 6 hours): Run Shodan scans for exposed instances, log findings
- Content publishing (daily, 10am): Check for drafted posts pending review, deploy approved content
- Outreach pipeline (3x/week): Identify prospects, draft outreach, queue for human approval
- Memory maintenance (weekly, Sunday 2am): Archive completed tasks, compact memory files, enforce 50KB limits
Debugging scheduled tasks: When a cron job fails silently, the agent's output goes nowhere. We added a rule to every scheduled prompt: "If you encounter an error, write it to memory/errors.md with a timestamp." This gives us a persistent error log to check when something seems wrong.
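The error-log rule is easy to implement in the glue code around scheduled runs. A sketch, with the entry format mirroring the active-tasks.md example above:

```python
from datetime import datetime, timezone
from pathlib import Path

def log_error(memory_dir, task_name, error):
    """Append a timestamped entry to memory/errors.md so silent cron
    failures leave a trace. Append-only, per the hygiene rules."""
    path = Path(memory_dir) / "errors.md"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with path.open("a") as f:
        f.write(f"\n## {stamp} — {task_name}\n{error}\n")
```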
5. Credential management
This is where teams consistently get themselves into trouble. NanoClaw does not have a built-in secrets manager. There is no vault, no encrypted config store, no secret injection system. If you need credentials, you handle it yourself.
Our approach: environment variables at the container level
Every credential is an environment variable injected when the container starts. No credential is ever written to a file, committed to a repo, or hardcoded in a config.
# docker-compose.yml — credentials as env vars
services:
agent-milo:
image: nanoclaw-agent:latest
environment:
- STRIPE_SECRET_KEY=${STRIPE_SECRET_KEY}
- VERCEL_TOKEN=${VERCEL_TOKEN}
- CLAWMART_API_KEY=${CLAWMART_API_KEY}
volumes:
- ./agents/milo:/workspace
agent-scout:
image: nanoclaw-agent:latest
environment:
- SHODAN_API_KEY=${SHODAN_API_KEY}
volumes:
- ./agents/scout:/workspace
The rules that prevent leaks
- Never write credential values to files. Reference by $VAR_NAME only. If an agent writes sk_live_abc123 to a markdown file, that file gets committed, pushed, and exposed.
- Each agent gets only the credentials it needs. The security analyst does not need Stripe keys. The revenue agent does not need Shodan access. Minimize blast radius.
- Document the exact variable names. We maintain a table in each agent's CLAUDE.md listing every env var it has access to, the service it connects to, and how to use it. Agents that guess at variable names waste cycles on authentication failures.
# In the agent's CLAUDE.md — credential reference table
## Credentials (Environment Variables)
| Variable | Service | Usage |
|--------------------|----------|-------------------------------------------------|
| STRIPE_SECRET_KEY | Stripe | curl -u $STRIPE_SECRET_KEY: https://api.stripe.com/v1/... |
| VERCEL_TOKEN | Vercel | npx vercel --prod --yes --token $VERCEL_TOKEN |
| CLAWMART_API_KEY | ClawMart | Header: Authorization: Bearer $CLAWMART_API_KEY |
# CRITICAL: These are the EXACT variable names.
# Do NOT invent other names. They do not exist.
Common failure mode: An agent reports "credential missing" because it tested the wrong variable name. It tries $STRIPE_KEY instead of $STRIPE_SECRET_KEY, gets an empty result, and concludes the credential is not configured. The fix is to document exact names in the identity file and instruct the agent to trust that documentation.
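A preflight check at container start catches this class of failure before the agent ever runs. A sketch, with hypothetical per-agent variable lists mirroring the credential tables:

```python
import os

# Mirror of each agent's CLAUDE.md credential table (example names)
REQUIRED_VARS = {
    "milo": ["STRIPE_SECRET_KEY", "VERCEL_TOKEN", "CLAWMART_API_KEY"],
    "scout": ["SHODAN_API_KEY"],
}

def preflight_check(agent, environ=os.environ):
    """Fail fast if a documented variable is unset or empty, so the
    agent never 'discovers' a missing credential mid-task by probing
    wrong names."""
    missing = [v for v in REQUIRED_VARS[agent] if not environ.get(v)]
    if missing:
        raise RuntimeError(f"{agent}: missing credentials: {', '.join(missing)}")
```

Run it as the container entrypoint's first step; a crash-loop at startup is far easier to diagnose than an agent mid-session concluding a credential "is not configured."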
6. The deployment architecture
Our deployment uses containers for agent processes and Vercel for static sites. Each agent runs in its own container with persistent volumes for memory files.
Container setup
# Simplified deployment architecture
# Agent containers (persistent, long-running)
/containers/
agent-milo/ # Revenue operations — 24/7
agent-scout/ # Security monitoring — 24/7
agent-kit/ # Content publishing — 24/7
agent-lux/ # Design and UX — on-demand
agent-vox/ # Customer communication — 24/7
...
# Each container mounts:
# 1. Agent-specific workspace (/workspace)
# 2. Agent-specific memory (/workspace/memory)
# 3. Shared output directory (write-only, for handoffs)
# Static sites (Vercel)
# - getmilo.dev (marketing, blog, tools)
# - Deployed by agent-kit via Vercel CLI
Persistent volumes for memory
Memory files must survive container restarts. We use Docker volumes mapped to host directories. This is critical — without persistent volumes, every container restart wipes the agent's memory.
# docker-compose.yml — persistent memory volumes
services:
agent-milo:
image: nanoclaw-agent:latest
volumes:
- milo-workspace:/workspace
- milo-memory:/workspace/memory
restart: unless-stopped
volumes:
milo-workspace:
driver: local
driver_opts:
type: none
o: bind
device: /data/agents/milo/workspace
milo-memory:
driver: local
driver_opts:
type: none
o: bind
device: /data/agents/milo/memory
NanoClaw vs OpenClaw resource footprint
The resource savings are real. NanoClaw's 85MB idle footprint versus OpenClaw's 320MB means we can run more agents on the same hardware. For 12 agents:
- OpenClaw: 12 x 320MB = 3.84GB idle memory
- NanoClaw: 12 x 85MB = 1.02GB idle memory
That is a 73% reduction in baseline memory usage. On a $20/month VPS with 4GB RAM, the difference between "fits" and "doesn't fit" is real.
7. Cost management
Twelve agents making API calls around the clock adds up fast. The hosting cost (containers, Vercel) is the minor expense. The real cost is LLM inference — every agent interaction means tokens in and tokens out.
Where the money actually goes
- LLM inference: ~85% of total cost. Each agent call consumes tokens. Complex reasoning tasks with large context windows are expensive. A single agent handling 50 tasks/day at ~4K tokens per task costs roughly $3-6/day depending on the model.
- Hosting: ~10%. Containers, persistent storage, networking. NanoClaw's smaller footprint helps here, but it is a small fraction of total spend.
- Third-party APIs: ~5%. Stripe, Shodan, Vercel, etc. These have their own pricing but are generally cheap relative to inference.
How we control costs
# Cost control strategies:
# 1. Context window discipline
# Never load full memory files into context.
# Read only the sections the current task needs.
# BAD: "Read all of context.md and active-tasks.md"
# GOOD: "Read the 'Revenue' section of context.md"
# 2. Task batching
# Instead of 10 separate agent calls for 10 small tasks,
# batch them into a single call with a structured prompt.
# 3. Model selection per task
# Complex reasoning: use the best model available
# Simple file operations: use a smaller, cheaper model
# Scheduled monitoring: use the cheapest model that works
# 4. Token budgets per agent
# Track daily token usage per agent
# Set alerts at 80% of budget
# Auto-pause non-critical agents if budget exceeded
The single biggest cost optimization: keep context windows small. An agent that loads 20KB of memory files into every interaction uses 5x more tokens than one that loads 4KB of relevant context. The hot/warm/cold memory architecture directly reduces inference cost by ensuring agents only load what they need.
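Strategy 4 above (per-agent token budgets) can be sketched as a small tracker. The 80% alert and auto-pause thresholds mirror the rules listed; persistence and the daily reset are left out of this sketch:

```python
class TokenBudget:
    """Per-agent daily token budget: alert at 80% of the limit,
    auto-pause non-critical agents at 100%. In-memory sketch only;
    a real deployment would persist counts and reset them daily.
    """
    def __init__(self, daily_limit, critical=False):
        self.daily_limit = daily_limit
        self.critical = critical   # critical agents are never auto-paused
        self.used = 0
        self.alerted = False
        self.paused = False

    def record(self, tokens):
        """Record usage for one call; returns False once the agent
        should stop running for the day."""
        self.used += tokens
        if not self.alerted and self.used >= 0.8 * self.daily_limit:
            self.alerted = True   # e.g. post a warning to an ops channel
        if self.used >= self.daily_limit and not self.critical:
            self.paused = True    # skip non-critical runs until reset
        return not self.paused
```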
Real numbers: Our 12-agent team costs roughly $45-70/day in LLM inference during active periods. That drops to $15-20/day during quiet periods when most agents are idle. NanoClaw's hosting savings (~$15/month vs ~$40/month for equivalent OpenClaw setup) are meaningful but dwarfed by inference costs.
8. What we'd do differently
If we were starting from zero, here is what we would change.
Start with 3 agents, not 12
We deployed all 12 agents in the first week. The identity bleed, memory conflicts, and communication chaos that followed were entirely predictable in hindsight. Start with 2-3 agents. Get them stable. Add agents one at a time. Each new agent introduces cross-cutting complexity — it is not linear.
Write identity documents before deploying anything
We wrote CLAUDE.md files reactively, after agents started misbehaving. This meant we were debugging identity problems while simultaneously trying to define identities. Write the identity documents first. Define each agent's domain, voice, boundaries, and explicit exclusions before the agent ever runs.
Build the memory system on day one
We started with no persistent memory and added it incrementally. This meant agents made decisions based on stale or missing context for weeks. The memory architecture (hot/warm/cold, file structure, hygiene rules) should be designed and deployed before any agent goes live.
Implement structured handoffs from the start
Unstructured inter-agent communication was the source of most of our early bugs. The message-passing pattern with minimal, structured handoffs should be the default from day one — not something you retrofit after identity bleed becomes a problem.
Set cost alerts immediately
We did not track per-agent token usage for the first two weeks. When we finally looked, one agent had consumed 3x its expected budget because it was loading full memory files into every interaction. Set monitoring and alerts before deploying to production.
NanoClaw Setup Service — $199
Skip the trial-and-error. We'll configure your NanoClaw deployment with the identity system, memory architecture, and communication patterns described in this guide.
The bottom line
NanoClaw is a capable foundation for multi-agent deployments. Its small footprint, simple configuration, and fast startup make it practical for running many agents on modest hardware. But the framework gives you the runtime — you build everything else.
Identity coherence, persistent memory, inter-agent communication, credential management, and cost control are all your responsibility. The patterns described in this guide are battle-tested solutions to problems you will encounter. Use them as a starting point and adapt to your specific deployment.
The key takeaway: the technical challenges of multi-agent systems are not about the framework. They are about maintaining coherent identities, reliable memory, and structured communication across agents that cannot see each other. Solve those three problems and everything else follows.