How Narwhal works under the hood — for engineers who want to understand, extend, or contribute.
Narwhal is a native macOS Electron app with a Python backend. The Electron renderer communicates with a local FastAPI server (bridge.py) over WebSocket. The bridge runs the NarwhalLLM SDK — a custom orchestration framework that manages conversations, tools, agents, and streaming against AWS Bedrock.
**Custom SDK.** Built from scratch; not a wrapper around LangChain, Strands, or any other framework.
Existing SDKs (Strands, LangChain) manipulate the message list behind the scenes — injecting system messages, rewriting tool results, merging content in ways that break Bedrock's strict alternating-role requirement. NarwhalLLM gives you explicit control over every message sent to the model. What you build is exactly what gets sent.
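To make the consecutive-role problem concrete, here is a minimal sketch of the kind of auto-merge `ConversationManager` performs. The function name and message shape are illustrative, not the real SDK API; the message format loosely follows Bedrock Converse (a `role` plus a list of content blocks).

```python
# Sketch: Bedrock rejects two consecutive messages with the same role, so
# adjacent same-role messages are merged into one. Illustrative only.
def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    """Collapse adjacent messages with the same role into a single message."""
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(msg["content"])  # fold into previous
        else:
            merged.append({"role": msg["role"], "content": list(msg["content"])})
    return merged

history = [
    {"role": "user", "content": [{"text": "read main.py"}]},
    {"role": "user", "content": [{"text": "and summarize it"}]},  # same role twice
    {"role": "assistant", "content": [{"text": "Here is a summary."}]},
]
clean = merge_consecutive_roles(history)  # roles now strictly alternate
```

Because the merge is explicit and local, you can inspect exactly what will be sent; nothing is rewritten behind your back.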
| Module | Purpose |
|---|---|
| `sdk.py` | `NarwhalLLM` — main loop: stream → parse → execute tools → repeat. Model-call limits, usage tracking, cache points. |
| `conversation.py` | `ConversationManager` — message history with auto-merge (eliminates consecutive-role bugs), two-phase context compaction, content replacement. |
| `tools.py` | `ToolRouter` — contract-driven tools with operational metadata. Parallel batch execution with sibling abort on failure. |
| `agents.py` | `AgentRunner` — scoped subagent spawning with worker contracts, recursive-delegation prevention, per-agent usage tracking. |
| `streaming.py` | Bedrock ConverseStream handler with a thread-to-queue pattern and heartbeat keepalive. |
| `providers/` | `BedrockProvider` with cache points, exponential retry, throttle handling. |
**Unique:** Every tool is a typed contract with operational metadata — not just a function with a docstring.
```python
ToolDefinition(
    name="read_file",
    description="Read a file from the filesystem",
    input_schema={...},
    execute=read_file_fn,
    is_read_only=True,           # Safe to run without approval
    is_concurrency_safe=True,    # Can run in parallel with other tools
    is_destructive=False,        # Won't modify state
    interrupt_behavior="cancel"  # Safe to abort mid-execution
)
```
The tool router uses these contracts for:

- **Approval gating:** read-only, non-destructive tools can run without user approval.
- **Parallel scheduling:** concurrency-safe tools in a batch execute in parallel; the rest run serially.
- **Failure handling:** if one tool in a parallel batch fails, its siblings are aborted.
- **Interruption:** `interrupt_behavior` decides whether a running tool can be cancelled mid-execution.
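A minimal sketch of contract-driven batch planning, under the assumption that the router partitions calls by the flags shown above. The trimmed `ToolDefinition` and the `plan_batch` function are illustrative, not the real `ToolRouter` API.

```python
# Sketch: partition a batch of tool calls by their contract flags.
# Field names mirror the contract above; plan_batch is a hypothetical helper.
from dataclasses import dataclass

@dataclass
class ToolDefinition:
    name: str
    is_read_only: bool
    is_concurrency_safe: bool

def plan_batch(calls: list[ToolDefinition]):
    """Split a batch into (auto-approved, needs-approval, parallel, serial)."""
    auto = [c for c in calls if c.is_read_only]          # run without asking
    gated = [c for c in calls if not c.is_read_only]     # require approval
    parallel = [c for c in calls if c.is_concurrency_safe]
    serial = [c for c in calls if not c.is_concurrency_safe]
    return auto, gated, parallel, serial

read = ToolDefinition("read_file", is_read_only=True, is_concurrency_safe=True)
write = ToolDefinition("write_file", is_read_only=False, is_concurrency_safe=False)
auto, gated, parallel, serial = plan_batch([read, write])
```

The point of the contract is that these decisions are data-driven: adding a new tool means declaring its flags, not editing scheduler logic.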
**Unique:** Four orchestration patterns, all with hard safety boundaries.
| Pattern | How it works | Use case |
|---|---|---|
| Supervisor | Main agent delegates subtasks to scoped workers | Complex multi-step tasks |
| Pipeline | Sequential chain — agent A output → agent B input | Research → summarize → write |
| Swarm | Parallel independent agents with shared cancellation | Search Slack + Jira + email simultaneously |
| Context Brief | Domain agents (Slack/Jira/Email/Confluence) → synthesizer | "What happened this week?" |
Every pattern enforces a hard boundary: orchestration tools (`spawn_agent`, `context_brief`) are stripped from worker toolsets at runtime, so no recursive delegation is possible.

**Unique:** Two-phase compaction keeps conversations running indefinitely without losing critical context.
**Phase 1: tool-result compression.** When a tool result exceeds a threshold, it is replaced with a compressed summary. The original content is stored in a side-channel and can be re-expanded if the model needs it. This happens transparently: the model sees a summary, but the full data is one tool call away.

**Phase 2: conversation collapse.** When the conversation approaches the context-window limit, older turns are collapsed into a structured summary. The collapse preserves key decisions, file paths mentioned, tool results that were acted on, and any user preferences expressed. Collapsed content is never silently dropped; it is summarized with attribution.
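Phase 1 can be sketched as a threshold check plus a side-channel store. The threshold value, the `SIDE_CHANNEL` dict, and the `expand_result` tool name are all assumptions for illustration, not the real SDK internals.

```python
# Sketch of phase-1 compaction: oversized tool results are swapped for a
# summary stub, and the original is parked in a side-channel for re-expansion.
SIDE_CHANNEL: dict[str, str] = {}
THRESHOLD = 2000  # characters; assumed value

def compact(result_id: str, content: str) -> str:
    """Return the content unchanged, or a stub pointing at the side-channel."""
    if len(content) <= THRESHOLD:
        return content
    SIDE_CHANNEL[result_id] = content  # original is never lost
    return f"[compacted {len(content)} chars; call expand_result('{result_id}')]"

def expand_result(result_id: str) -> str:
    """Tool the model can call to retrieve a compacted result in full."""
    return SIDE_CHANNEL[result_id]
```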
**Unique:** Narwhal observes its own behavior and proposes improvements — with your approval.
| Layer | What it does |
|---|---|
| Observation | Tracks tool usage patterns, friction events (retries, errors, user corrections), and capability gaps (questions it couldn't answer). |
| Learning | Converts observations into confidence-scored learnings: "When user says 'deploy', they mean run ./scripts/deploy-release.sh" (confidence: 0.85). |
| Evolution | Proposes concrete changes: new skills, auto-approval rules, prompt adjustments. All changes require explicit user approval before taking effect. |
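One plausible shape for a confidence-scored learning and its approval gate, matching the example in the table. The field names and the confidence floor are assumptions, not the real schema; only the "nothing takes effect without explicit approval" rule is from the source.

```python
# Illustrative shape of a confidence-scored learning. A learning is only
# surfaced for review above a confidence floor, and never activates until
# the user explicitly approves it.
from dataclasses import dataclass

@dataclass
class Learning:
    trigger: str       # e.g. the user saying "deploy"
    meaning: str       # what the system believes the trigger means
    confidence: float  # 0.0 - 1.0
    approved: bool = False

def should_surface(learning: Learning, min_confidence: float = 0.8) -> bool:
    """Surface for user approval only if confident and not yet approved."""
    return learning.confidence >= min_confidence and not learning.approved

deploy = Learning("deploy", "run ./scripts/deploy-release.sh", confidence=0.85)
```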
Persistent memory is stored as editable Markdown files at `~/.orcha/memory/`:

- `YYYY-MM-DD.md` — what you worked on, decisions made, context established
- `NARWHAL.md` — your projects, preferences, team context, coding style
- `skills/*.md` — reusable instruction sets activated by keyword triggers

A lightweight side-model (Haiku) routes each query to determine which memory files are relevant, injecting only what's needed — not the entire memory corpus.
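The routing step can be sketched with a simple keyword match standing in for the Haiku side-model. The trigger table and file names below are hypothetical; in Narwhal the side-model makes this relevance call, not a string lookup.

```python
# Sketch: route a query to the memory files worth injecting.
# A keyword match stands in for the Haiku side-model's relevance judgment.
MEMORY_TRIGGERS = {  # file -> keywords that make it relevant (assumed)
    "NARWHAL.md": ["project", "preference", "team"],
    "skills/deploy.md": ["deploy", "release"],
}

def select_memory(query: str) -> list[str]:
    """Return only the memory files relevant to this query."""
    q = query.lower()
    return [f for f, words in MEMORY_TRIGGERS.items() if any(w in q for w in words)]

select_memory("deploy the release branch")  # only the deploy skill is injected
```

Injecting a handful of files instead of the whole corpus keeps the prompt small and the cache-point layout stable.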
NarwhalCloud is a separate service: an AI-native wiki and project-management platform that replaces Confluence and Jira for structured product documentation.
The desktop app is a client. NarwhalCloud is the authority. One SQLite database, one source of truth. AI agents query and propose changes through typed MCP tools — they never generate markup or touch the database directly.
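The "typed proposals, never direct writes" rule can be sketched as follows. The `PageEdit` shape and `propose_page_edit` name are hypothetical; the real system exposes such operations as MCP tools backed by NarwhalCloud's validation.

```python
# Sketch: agents submit typed change proposals; the service validates and
# applies them. Agents never emit markup or touch the database directly.
from dataclasses import dataclass

@dataclass(frozen=True)
class PageEdit:
    page_id: str
    field: str
    new_value: str

def propose_page_edit(edit: PageEdit) -> dict:
    """Validate a typed proposal and queue it; the service owns the write."""
    if not edit.page_id or not edit.field:
        raise ValueError("page_id and field are required")
    return {"status": "pending_review", "edit": edit}
```

Because the input is a typed value rather than free-form markup, malformed agent output fails validation instead of corrupting the single source of truth.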
```
User types message
  → WebSocket to bridge.py
  → ConversationManager adds user message
  → ATR selects relevant tools (Haiku side-model)
  → NarwhalLLM.run() streams to Bedrock
  → Claude responds with text + tool_use blocks
  → ToolRouter executes tools (parallel if safe)
  → Results added to conversation
  → Loop until Claude responds with text only
  → Stream text tokens back via WebSocket
  → React renders in real-time
```
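The loop at the heart of this flow can be condensed into a sketch. `stream_once` and `run_tools` are illustrative stand-ins for the streaming handler and `ToolRouter`, not the real `NarwhalLLM.run()` signature.

```python
# Sketch of the agent loop: stream -> parse -> execute tools -> repeat,
# stopping when the model replies with text only or the call limit is hit.
def agent_loop(conversation: list, stream_once, run_tools, max_calls: int = 10):
    for _ in range(max_calls):                # model-call limit (see sdk.py)
        reply = stream_once(conversation)     # one streamed model turn
        conversation.append({"role": "assistant", "content": reply["content"]})
        tool_uses = [b for b in reply["content"] if "tool_use" in b]
        if not tool_uses:                     # text only: we're done
            return reply
        results = run_tools(tool_uses)        # parallel where contracts allow
        conversation.append({"role": "user", "content": results})
    raise RuntimeError("model call limit reached")
```

Tool results re-enter the conversation as a user-role message, which is why the auto-merge and alternating-role handling described earlier matter here.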
| Layer | Technology |
|---|---|
| Desktop shell | Electron 33 + electron-vite |
| Frontend | React 19, TypeScript, Vite |
| Backend | Python 3.11, FastAPI, WebSocket |
| LLM | AWS Bedrock (Claude Sonnet 4) |
| Auth | Cognito + Amazon Federate (Midway) |
| Storage | SQLite (sessions, NarwhalCloud), Markdown (memory, skills) |
| Tool protocol | MCP (Model Context Protocol) |
| Hosting | CloudFront + S3 (desktop releases), EC2 (NarwhalCloud) |