How Narwhal works under the hood — for engineers who want to understand, extend, or contribute.
Narwhal is a native macOS Electron app with a Python backend. The Electron renderer communicates with a local FastAPI server (bridge.py) over WebSocket. The bridge runs the NarwhalLLM SDK — a custom orchestration framework that manages conversations, tools, agents, and streaming against AWS Bedrock.
**Custom SDK.** Built from scratch; not a wrapper around LangChain, Strands, or any other framework.
Existing SDKs (Strands, LangChain) manipulate the message list behind the scenes — injecting system messages, rewriting tool results, merging content in ways that break Bedrock's strict alternating-role requirement. NarwhalLLM gives you explicit control over every message sent to the model. What you build is exactly what gets sent.
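To make the consecutive-role problem concrete, here is a minimal sketch of the kind of auto-merge `ConversationManager` performs. The function name and message shape are illustrative, not the real SDK API; the message format loosely follows Bedrock Converse (a `role` plus a list of content blocks).

```python
# Sketch: Bedrock rejects two consecutive messages with the same role, so
# adjacent same-role messages are merged into one. Illustrative only.
def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    """Collapse adjacent messages with the same role into a single message."""
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(msg["content"])  # fold into previous
        else:
            merged.append({"role": msg["role"], "content": list(msg["content"])})
    return merged

history = [
    {"role": "user", "content": [{"text": "read main.py"}]},
    {"role": "user", "content": [{"text": "and summarize it"}]},  # same role twice
    {"role": "assistant", "content": [{"text": "Here is a summary."}]},
]
clean = merge_consecutive_roles(history)  # roles now strictly alternate
```

Because the merge is explicit and local, you can inspect exactly what will be sent; nothing is rewritten behind your back.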
| Module | Purpose |
|---|---|
| `sdk.py` | `NarwhalLLM` — main loop: stream → parse → execute tools → repeat. Model-call limits, usage tracking, cache points. |
| `conversation.py` | `ConversationManager` — message history with auto-merge (eliminates consecutive-role bugs), two-phase context compaction, content replacement. |
| `tools.py` | `ToolRouter` — contract-driven tools with operational metadata. Parallel batch execution with sibling abort on failure. |
| `agents.py` | `AgentRunner` — scoped subagent spawning with worker contracts, recursive-delegation prevention, per-agent usage tracking. |
| `streaming.py` | Bedrock ConverseStream handler with a thread-to-queue pattern and heartbeat keepalive. |
| `providers/` | `BedrockProvider` with cache points, exponential retry, throttle handling. |
**Unique:** Every tool is a typed contract with operational metadata — not just a function with a docstring.
```python
ToolDefinition(
    name="read_file",
    description="Read a file from the filesystem",
    input_schema={...},
    execute=read_file_fn,
    is_read_only=True,           # Safe to run without approval
    is_concurrency_safe=True,    # Can run in parallel with other tools
    is_destructive=False,        # Won't modify state
    interrupt_behavior="cancel"  # Safe to abort mid-execution
)
```
The tool router uses these contracts for:

- **Approval gating:** read-only, non-destructive tools can run without user approval.
- **Parallel scheduling:** concurrency-safe tools in a batch execute in parallel; the rest run serially.
- **Failure handling:** if one tool in a parallel batch fails, its siblings are aborted.
- **Interruption:** `interrupt_behavior` decides whether a running tool can be cancelled mid-execution.
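A minimal sketch of contract-driven batch planning, under the assumption that the router partitions calls by the flags shown above. The trimmed `ToolDefinition` and the `plan_batch` function are illustrative, not the real `ToolRouter` API.

```python
# Sketch: partition a batch of tool calls by their contract flags.
# Field names mirror the contract above; plan_batch is a hypothetical helper.
from dataclasses import dataclass

@dataclass
class ToolDefinition:
    name: str
    is_read_only: bool
    is_concurrency_safe: bool

def plan_batch(calls: list[ToolDefinition]):
    """Split a batch into (auto-approved, needs-approval, parallel, serial)."""
    auto = [c for c in calls if c.is_read_only]          # run without asking
    gated = [c for c in calls if not c.is_read_only]     # require approval
    parallel = [c for c in calls if c.is_concurrency_safe]
    serial = [c for c in calls if not c.is_concurrency_safe]
    return auto, gated, parallel, serial

read = ToolDefinition("read_file", is_read_only=True, is_concurrency_safe=True)
write = ToolDefinition("write_file", is_read_only=False, is_concurrency_safe=False)
auto, gated, parallel, serial = plan_batch([read, write])
```

The point of the contract is that these decisions are data-driven: adding a new tool means declaring its flags, not editing scheduler logic.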
**Unique:** Four orchestration patterns, all with hard safety boundaries.
| Pattern | How it works | Use case |
|---|---|---|
| Supervisor | Main agent delegates subtasks to scoped workers | Complex multi-step tasks |
| Pipeline | Sequential chain — agent A output → agent B input | Research → summarize → write |
| Swarm | Parallel independent agents with shared cancellation | Search Slack + Jira + email simultaneously |
| Context Brief | Domain agents (Slack/Jira/Email/Confluence) → synthesizer | "What happened this week?" |
Every pattern enforces a hard boundary: orchestration tools (`spawn_agent`, `context_brief`) are stripped from worker toolsets at runtime, so no recursive delegation is possible.

**Unique:** Two-phase compaction keeps conversations running indefinitely without losing critical context.
**Phase 1: tool-result compression.** When a tool result exceeds a threshold, it is replaced with a compressed summary. The original content is stored in a side-channel and can be re-expanded if the model needs it. This happens transparently: the model sees a summary, but the full data is one tool call away.

**Phase 2: conversation collapse.** When the conversation approaches the context-window limit, older turns are collapsed into a structured summary. The collapse preserves key decisions, file paths mentioned, tool results that were acted on, and any user preferences expressed. Collapsed content is never silently dropped; it is summarized with attribution.
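Phase 1 can be sketched as a threshold check plus a side-channel store. The threshold value, the `SIDE_CHANNEL` dict, and the `expand_result` tool name are all assumptions for illustration, not the real SDK internals.

```python
# Sketch of phase-1 compaction: oversized tool results are swapped for a
# summary stub, and the original is parked in a side-channel for re-expansion.
SIDE_CHANNEL: dict[str, str] = {}
THRESHOLD = 2000  # characters; assumed value

def compact(result_id: str, content: str) -> str:
    """Return the content unchanged, or a stub pointing at the side-channel."""
    if len(content) <= THRESHOLD:
        return content
    SIDE_CHANNEL[result_id] = content  # original is never lost
    return f"[compacted {len(content)} chars; call expand_result('{result_id}')]"

def expand_result(result_id: str) -> str:
    """Tool the model can call to retrieve a compacted result in full."""
    return SIDE_CHANNEL[result_id]
```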
**Unique:** Narwhal observes its own behavior and proposes improvements — with your approval.
| Layer | What it does |
|---|---|
| Observation | Tracks tool usage patterns, friction events (retries, errors, user corrections), and capability gaps (questions it couldn't answer). |
| Learning | Converts observations into confidence-scored learnings: "When user says 'deploy', they mean run ./scripts/deploy-release.sh" (confidence: 0.85). |
| Evolution | Proposes concrete changes: new skills, auto-approval rules, prompt adjustments. All changes require explicit user approval before taking effect. |
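One plausible shape for a confidence-scored learning and its approval gate, matching the example in the table. The field names and the confidence floor are assumptions, not the real schema; only the "nothing takes effect without explicit approval" rule is from the source.

```python
# Illustrative shape of a confidence-scored learning. A learning is only
# surfaced for review above a confidence floor, and never activates until
# the user explicitly approves it.
from dataclasses import dataclass

@dataclass
class Learning:
    trigger: str       # e.g. the user saying "deploy"
    meaning: str       # what the system believes the trigger means
    confidence: float  # 0.0 - 1.0
    approved: bool = False

def should_surface(learning: Learning, min_confidence: float = 0.8) -> bool:
    """Surface for user approval only if confident and not yet approved."""
    return learning.confidence >= min_confidence and not learning.approved

deploy = Learning("deploy", "run ./scripts/deploy-release.sh", confidence=0.85)
```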
Persistent memory is stored as editable Markdown files at `~/.orcha/memory/`:

- `YYYY-MM-DD.md` — what you worked on, decisions made, context established
- `NARWHAL.md` — your projects, preferences, team context, coding style
- `skills/*.md` — reusable instruction sets activated by keyword triggers

A lightweight side-model (Haiku) routes each query to determine which memory files are relevant, injecting only what's needed — not the entire memory corpus.
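The routing step can be sketched with a simple keyword match standing in for the Haiku side-model. The trigger table and file names below are hypothetical; in Narwhal the side-model makes this relevance call, not a string lookup.

```python
# Sketch: route a query to the memory files worth injecting.
# A keyword match stands in for the Haiku side-model's relevance judgment.
MEMORY_TRIGGERS = {  # file -> keywords that make it relevant (assumed)
    "NARWHAL.md": ["project", "preference", "team"],
    "skills/deploy.md": ["deploy", "release"],
}

def select_memory(query: str) -> list[str]:
    """Return only the memory files relevant to this query."""
    q = query.lower()
    return [f for f, words in MEMORY_TRIGGERS.items() if any(w in q for w in words)]

select_memory("deploy the release branch")  # only the deploy skill is injected
```

Injecting a handful of files instead of the whole corpus keeps the prompt small and the cache-point layout stable.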
NarwhalCloud is a separate service: an AI-native wiki and project-management platform that replaces Confluence and Jira for structured product documentation.
The desktop app is a client. NarwhalCloud is the authority. One SQLite database, one source of truth. AI agents query and propose changes through typed MCP tools — they never generate markup or touch the database directly.
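The "typed proposals, never direct writes" rule can be sketched as follows. The `PageEdit` shape and `propose_page_edit` name are hypothetical; the real system exposes such operations as MCP tools backed by NarwhalCloud's validation.

```python
# Sketch: agents submit typed change proposals; the service validates and
# applies them. Agents never emit markup or touch the database directly.
from dataclasses import dataclass

@dataclass(frozen=True)
class PageEdit:
    page_id: str
    field: str
    new_value: str

def propose_page_edit(edit: PageEdit) -> dict:
    """Validate a typed proposal and queue it; the service owns the write."""
    if not edit.page_id or not edit.field:
        raise ValueError("page_id and field are required")
    return {"status": "pending_review", "edit": edit}
```

Because the input is a typed value rather than free-form markup, malformed agent output fails validation instead of corrupting the single source of truth.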
```
User types message
  → WebSocket to bridge.py
  → ConversationManager adds user message
  → ATR selects relevant tools (Haiku side-model)
  → NarwhalLLM.run() streams to Bedrock
  → Claude responds with text + tool_use blocks
  → ToolRouter executes tools (parallel if safe)
  → Results added to conversation
  → Loop until Claude responds with text only
  → Stream text tokens back via WebSocket
  → React renders in real-time
```
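The loop at the heart of this flow can be condensed into a sketch. `stream_once` and `run_tools` are illustrative stand-ins for the streaming handler and `ToolRouter`, not the real `NarwhalLLM.run()` signature.

```python
# Sketch of the agent loop: stream -> parse -> execute tools -> repeat,
# stopping when the model replies with text only or the call limit is hit.
def agent_loop(conversation: list, stream_once, run_tools, max_calls: int = 10):
    for _ in range(max_calls):                # model-call limit (see sdk.py)
        reply = stream_once(conversation)     # one streamed model turn
        conversation.append({"role": "assistant", "content": reply["content"]})
        tool_uses = [b for b in reply["content"] if "tool_use" in b]
        if not tool_uses:                     # text only: we're done
            return reply
        results = run_tools(tool_uses)        # parallel where contracts allow
        conversation.append({"role": "user", "content": results})
    raise RuntimeError("model call limit reached")
```

Tool results re-enter the conversation as a user-role message, which is why the auto-merge and alternating-role handling described earlier matter here.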
| Layer | Technology |
|---|---|
| Desktop shell | Electron 33 + electron-vite |
| Frontend | React 19, TypeScript, Vite |
| Backend | Python 3.11, FastAPI, WebSocket |
| LLM | AWS Bedrock (Claude Sonnet 4) |
| Auth | Cognito + Amazon Federate (Midway) |
| Storage | SQLite (sessions, NarwhalCloud), Markdown (memory, skills) |
| Tool protocol | MCP (Model Context Protocol) |
| Hosting | CloudFront + S3 (desktop releases), EC2 (NarwhalCloud) |