Architecture

How Narwhal works under the hood — for engineers who want to understand, extend, or contribute.

System Overview

┌─────────────────┐ WebSocket ┌──────────────────────────┐ Bedrock API ┌──────────┐ │ Electron App │ ◄──────────────► │ Bridge Server │ ◄──────────────► │ Claude │ │ (React + Vite) │ port 7777 │ (FastAPI + NarwhalLLM) │ │ Sonnet │ │ │ │ │ └──────────┘ │ • Chat UI │ │ • NarwhalLLM SDK │ │ • Sidebar │ │ • Tool Router │ MCP Protocol ┌──────────┐ │ • Preview │ │ • Agent Runner │ ◄──────────────► │ MCP │ │ • Workspace │ │ • Memory System │ │ Servers │ │ • Settings │ │ • Self-Evolution │ └──────────┘ └─────────────────┘ └──────────┬───────────────┘ │ REST API ┌──────────┐ └──────────────────────────────────► │ Narwhal │ │ Cloud │ └──────────┘

Narwhal is a native macOS Electron app with a Python backend. The Electron renderer communicates with a local FastAPI server (bridge.py) over WebSocket. The bridge runs the NarwhalLLM SDK — a custom orchestration framework that manages conversations, tools, agents, and streaming against AWS Bedrock.

NarwhalLLM SDK

Custom SDK Built from scratch. Not a wrapper around LangChain, Strands, or any other framework.

Why a custom SDK?

Existing SDKs (Strands, LangChain) manipulate the message list behind the scenes — injecting system messages, rewriting tool results, merging content in ways that break Bedrock's strict alternating-role requirement. NarwhalLLM gives you explicit control over every message sent to the model. What you build is exactly what gets sent.

ModulePurpose
sdk.pyNarwhalLLM — main loop: stream → parse → execute tools → repeat. Model call limits, usage tracking, cache points.
conversation.pyConversationManager — message history with auto-merge (eliminates consecutive-role bugs), two-phase context compaction, content replacement.
tools.pyToolRouter — contract-driven tools with operational metadata. Parallel batch execution with sibling abort on failure.
agents.pyAgentRunner — scoped subagent spawning with worker contracts, recursive delegation prevention, per-agent usage tracking.
streaming.pyBedrock ConverseStream handler with thread-to-queue pattern + heartbeat keepalive.
providers/BedrockProvider with cache points, exponential retry, throttle handling.

Tool System

Unique Every tool is a typed contract with operational metadata — not just a function with a docstring.

ToolDefinition(
    name="read_file",
    description="Read a file from the filesystem",
    input_schema={...},
    execute=read_file_fn,
    is_read_only=True,        # Safe to run without approval
    is_concurrency_safe=True,  # Can run in parallel with other tools
    is_destructive=False,      # Won't modify state
    interrupt_behavior="cancel" # Safe to abort mid-execution
)

The tool router uses these contracts for:

Agent System

Unique Four orchestration patterns, all with hard safety boundaries.

Patterns

PatternHow it worksUse case
SupervisorMain agent delegates subtasks to scoped workersComplex multi-step tasks
PipelineSequential chain — agent A output → agent B inputResearch → summarize → write
SwarmParallel independent agents with shared cancellationSearch Slack + Jira + email simultaneously
Context BriefDomain agents (Slack/Jira/Email/Confluence) → synthesizer"What happened this week?"

Safety Properties

Context Management

Unique Two-phase compaction keeps conversations running indefinitely without losing critical context.

Phase 1: Micro-compaction

When a tool result exceeds a threshold, it's replaced with a compressed summary. The original content is stored in a side-channel and can be re-expanded if the model needs it. This happens transparently — the model sees a summary, but the full data is one tool call away.

Phase 2: Context Collapse

When the conversation approaches the context window limit, older turns are collapsed into a structured summary. The collapse preserves: key decisions, file paths mentioned, tool results that were acted on, and any user preferences expressed. Collapsed content is never silently dropped — it's summarized with attribution.

Self-Designing Agent

Unique Narwhal observes its own behavior and proposes improvements — with your approval.

LayerWhat it does
ObservationTracks tool usage patterns, friction events (retries, errors, user corrections), and capability gaps (questions it couldn't answer).
LearningConverts observations into confidence-scored learnings: "When user says 'deploy', they mean run ./scripts/deploy-release.sh" (confidence: 0.85).
EvolutionProposes concrete changes: new skills, auto-approval rules, prompt adjustments. All changes require explicit user approval before taking effect.

Memory System

Persistent memory stored as editable Markdown files at ~/.orcha/memory/:

A lightweight side-model (Haiku) routes each query to determine which memory files are relevant, injecting only what's needed — not the entire memory corpus.

NarwhalCloud

A separate service — an AI-native wiki and project management platform that replaces Confluence and Jira for structured product documentation.

Desktop App ──► REST API ──► NarwhalCloud (EC2) │ ├── Requirements DB (1,399 URF requirements) ├── Projects, Milestones, Items ├── Documents (wiki pages, specs) ├── Full-text search (FTS5) └── Activity log + audit trail

The desktop app is a client. NarwhalCloud is the authority. One SQLite database, one source of truth. AI agents query and propose changes through typed MCP tools — they never generate markup or touch the database directly.

Data Flow

User types message
  → WebSocket to bridge.py
    → ConversationManager adds user message
      → ATR selects relevant tools (Haiku side-model)
        → NarwhalLLM.run() streams to Bedrock
          → Claude responds with text + tool_use blocks
            → ToolRouter executes tools (parallel if safe)
              → Results added to conversation
                → Loop until Claude responds with text only
                  → Stream text tokens back via WebSocket
                    → React renders in real-time

Tech Stack

LayerTechnology
Desktop shellElectron 33 + electron-vite
FrontendReact 19, TypeScript, Vite
BackendPython 3.11, FastAPI, WebSocket
LLMAWS Bedrock (Claude Sonnet 4)
AuthCognito + Amazon Federate (Midway)
StorageSQLite (sessions, NarwhalCloud), Markdown (memory, skills)
Tool protocolMCP (Model Context Protocol)
HostingCloudFront + S3 (desktop releases), EC2 (NarwhalCloud)