Architecture

Overwatch inverts the typical "LLM-as-orchestrator" pattern. Instead of stuffing engagement state into a prompt, the orchestrator is a persistent MCP server that the LLM calls into.

System Diagram

[Figure: System Architecture]

Data Flow Example

Here's a concrete walkthrough of how data flows through the system during a typical engagement step:

[Figure: Data Flow Lifecycle]

Every step is traceable: action_id links validate_action → log_action_event → parse_output → report_finding. The activity log records the causal chain with tiered truncation that preserves milestone and causal-linkage events (validations, parse results, warnings, session states, errors) while trimming ephemeral events to stay within budget. Full evidence payloads are stored durably in the evidence store and referenced by evidence_id.
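The tiered truncation policy can be sketched as a single pass over the log. This is a minimal sketch under stated assumptions: the `ActivityEvent` shape and the two ephemeral kinds (`progress`, `heartbeat`) are hypothetical, and the budget is counted in events rather than bytes or tokens, which the real log may use instead.

```typescript
// Hypothetical event kinds; the real activity log schema may differ.
type EventKind =
  | "validation" | "parse_result" | "warning" | "session_state" | "error" // milestone
  | "progress" | "heartbeat";                                              // ephemeral

interface ActivityEvent {
  event_id: string;
  action_id?: string; // causal link back to the validate_action call
  kind: EventKind;
  ts: string;
}

const MILESTONE_KINDS: ReadonlySet<EventKind> = new Set<EventKind>([
  "validation", "parse_result", "warning", "session_state", "error",
]);

/**
 * Keep every milestone event and drop the oldest ephemeral events until the
 * log fits in `budget` entries. Relative order is preserved. If milestones
 * alone exceed the budget, they are all kept anyway.
 */
function truncateTiered(events: ActivityEvent[], budget: number): ActivityEvent[] {
  const milestoneCount = events.filter((e) => MILESTONE_KINDS.has(e.kind)).length;
  const ephemeralSlots = Math.max(0, budget - milestoneCount);
  let toDrop = Math.max(0, events.length - milestoneCount - ephemeralSlots);

  const kept: ActivityEvent[] = [];
  for (const e of events) {
    if (!MILESTONE_KINDS.has(e.kind) && toDrop > 0) {
      toDrop--; // oldest ephemeral events are dropped first
      continue;
    }
    kept.push(e);
  }
  return kept;
}
```

Dropping oldest-first keeps the most recent ephemeral context while never breaking an `action_id` chain, because every link in that chain is a milestone event.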

Design Decisions

Graph, Not Database

Engagements are directed property graphs — hosts, services, credentials, and the relationships between them. The graph structure means "credential X is valid on service Y which runs on host Z" is a traversable path, not three rows in a table.

The graph is powered by graphology, a robust JavaScript graph library, with shortest-path analysis via graphology-shortest-path and community detection via graphology-communities-louvain.
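To show why "credential X is valid on service Y which runs on host Z" is a path rather than three rows, here is a dependency-free sketch of a tiny directed property graph with a breadth-first unweighted shortest path. It swaps in plain BFS for illustration and is not graphology's API; the node IDs and the `VALID_ON` / `RUNS_ON` edge types are hypothetical.

```typescript
// Hypothetical edge/node labels for illustration only.
interface Edge { to: string; type: string }

class PropertyGraph {
  private adj = new Map<string, Edge[]>();
  readonly attrs = new Map<string, Record<string, unknown>>();

  addNode(id: string, attributes: Record<string, unknown> = {}): void {
    this.attrs.set(id, attributes);
    if (!this.adj.has(id)) this.adj.set(id, []);
  }

  addEdge(from: string, to: string, type: string): void {
    if (!this.adj.has(from)) this.adj.set(from, []);
    this.adj.get(from)!.push({ to, type });
  }

  /** Unweighted shortest path via BFS; returns node IDs, or null if unreachable. */
  shortestPath(source: string, target: string): string[] | null {
    const prev = new Map<string, string | null>([[source, null]]);
    const queue: string[] = [source];
    while (queue.length > 0) {
      const node = queue.shift()!;
      if (node === target) {
        // Walk predecessor links back to the source.
        const path: string[] = [];
        for (let cur: string | null = target; cur !== null; cur = prev.get(cur) ?? null) {
          path.unshift(cur);
        }
        return path;
      }
      for (const { to } of this.adj.get(node) ?? []) {
        if (!prev.has(to)) {
          prev.set(to, node);
          queue.push(to);
        }
      }
    }
    return null;
  }
}
```

Because edges are directed, the reverse query (host back to credential) returns `null` unless an inbound relationship has been recorded.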

Community detection runs Louvain modularity optimization on an undirected projection of the graph. Each node gets a community_id attribute, materialized lazily and cached until the next topology change. Communities feed two consumers:

Frontier — each FrontierItem carries community_id and community_unexplored_count, letting the LLM reason about cluster coverage
Dashboard — convex hull overlays color-code communities in the graph visualization
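The "materialized lazily and cached until the next topology change" behavior can be sketched with a topology version counter. This is a minimal sketch under assumptions: `detect` is a hypothetical hook standing in for the Louvain pass (graphology-communities-louvain is not reproduced here), and "unexplored" is modeled as a plain set of node IDs.

```typescript
type CommunityMap = Map<string, number>; // node id -> community_id

class CommunityCache {
  private version = 0;        // bumped on every topology change
  private cachedVersion = -1; // version the cached result belongs to
  private cache: CommunityMap = new Map();
  private readonly detect: () => CommunityMap;

  constructor(detect: () => CommunityMap) {
    // Hypothetical hook: in Overwatch this would run Louvain on an
    // undirected projection of the graph.
    this.detect = detect;
  }

  /** Call whenever a node or edge is added or removed. */
  markTopologyChanged(): void {
    this.version++;
  }

  /** Recompute only if the topology changed since the last computation. */
  get(): CommunityMap {
    if (this.cachedVersion !== this.version) {
      this.cache = this.detect();
      this.cachedVersion = this.version;
    }
    return this.cache;
  }
}

/** Per-community count of unexplored nodes, i.e. community_unexplored_count. */
function unexploredCounts(communities: CommunityMap, unexplored: Set<string>): Map<number, number> {
  const counts = new Map<number, number>();
  for (const [node, cid] of communities) {
    if (unexplored.has(node)) counts.set(cid, (counts.get(cid) ?? 0) + 1);
  }
  return counts;
}
```

Attribute-only updates do not bump the version, so repeated frontier generation between topology changes reuses the same assignment.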

MCP Server, Not a Prompt

The orchestrator survives context compaction by design, because it is not in the context window. After compaction, get_state() reconstructs a complete briefing from the graph, so nothing the graph has recorded is lost.

Two transports are supported:

stdio — the default, using the Model Context Protocol over standard I/O. This is how Claude Code connects.
HTTP/SSE — streamable HTTP transport for remote deployment, web-based consumers, and multiple simultaneous clients. Enable with OVERWATCH_TRANSPORT=http or the --http CLI flag.

The core app bootstrap (src/app.ts) is transport-neutral — both transports share the same GraphEngine, skills, and services. Each HTTP session gets its own McpServer instance (SDK limitation: one connect() per server) but all sessions share the underlying graph.
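The HTTP session model (one server object per session, one shared graph) can be sketched without the MCP SDK. `SharedGraph`, `SessionServer`, and `SessionRegistry` are hypothetical stand-ins for `GraphEngine` and the SDK's `McpServer`; the only behavior borrowed from the text is that each server may be connected once.

```typescript
// Hypothetical stand-in for the shared GraphEngine.
class SharedGraph {
  nodes = new Set<string>();
}

// Hypothetical stand-in for a per-session server that allows a single connect().
class SessionServer {
  private connected = false;
  readonly graph: SharedGraph;

  constructor(graph: SharedGraph) {
    this.graph = graph;
  }

  connect(): void {
    if (this.connected) throw new Error("connect() already called on this server");
    this.connected = true;
  }
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionServer>();
  private readonly graph: SharedGraph;

  constructor(graph: SharedGraph) {
    this.graph = graph;
  }

  /** Return the session's server, creating and connecting a fresh one on first use. */
  forSession(sessionId: string): SessionServer {
    let server = this.sessions.get(sessionId);
    if (!server) {
      server = new SessionServer(this.graph); // new instance per session...
      server.connect();                        // ...because each connects only once
      this.sessions.set(sessionId, server);
    }
    return server;
  }

  close(sessionId: string): void {
    this.sessions.delete(sessionId);
  }
}
```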

Hybrid Scoring

The deterministic layer handles hard constraints:

Scope enforcement — targets outside CIDRs/domains are rejected
Deduplication — already-tested edges don't re-enter the frontier
OPSEC vetoes — techniques exceeding the noise ceiling are filtered
Dead host pruning — unreachable hosts are deprioritized

The LLM handles nuanced reasoning:

Attack chain spotting — connecting discoveries across multiple hops
Sequencing — determining what should happen before what
Risk assessment — weighing reward against defensive posture
Creative path discovery — finding non-obvious routes through the graph
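The deterministic half of this split can be sketched as a filter chain over candidate frontier items. The `Candidate` fields, the 0..1 noise scale, the IPv4-only CIDR check, and the 0.1 deprioritization factor are all hypothetical; the ordering of the four checks follows the list above, and dead hosts are demoted rather than dropped.

```typescript
interface Candidate {
  id: string;
  target_ip: string;
  edge_key: string;   // identifies the edge/technique being tested
  noise: number;      // hypothetical 0..1 OPSEC noise estimate
  host_alive: boolean;
  score: number;
}

interface Constraints {
  cidrs: string[];          // e.g. "10.0.0.0/24"
  testedEdges: Set<string>;
  noiseCeiling: number;
}

function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0; // JS shifts are mod 32
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

function applyHardConstraints(items: Candidate[], c: Constraints): Candidate[] {
  return items
    .filter((i) => c.cidrs.some((cidr) => inCidr(i.target_ip, cidr)))  // scope enforcement
    .filter((i) => !c.testedEdges.has(i.edge_key))                    // deduplication
    .filter((i) => i.noise <= c.noiseCeiling)                         // OPSEC veto
    .map((i) => (i.host_alive ? i : { ...i, score: i.score * 0.1 }))  // dead host: demote
    .sort((a, b) => b.score - a.score);
}
```

Everything that survives this chain is what the LLM layer then reasons over for sequencing and risk.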

Inference Rules

When findings are reported, deterministic rules fire automatically to generate hypothesis edges. Fifty-five built-in rules span AD, Linux privilege escalation, web application, MSSQL, and cloud domains:

Domain	Rules	Examples
AD & Service	21	Kerberos → Domain, SMB Relay, Credential Fanout, ADCS ESC1–ESC8+, Delegation, Roasting, LAPS/gMSA, RBCD, DACL escalation, Shadow Credentials, GPO abuse, DCSync
ADCS	14	ESC1–ESC8, ESC9/10/11, ESC13, EDITF_ATTRIBUTESUBJECTALTNAME2
Linux Privesc	7	SUID root, SSH key reuse, Docker escape, NFS no_root_squash, sudo NOPASSWD, dangerous capabilities, writable cron/systemd
Web	8	Webapp login spray, web login form, token→webapp auth, auth bypass escalation, default CMS creds, IMDSv1 SSRF
MSSQL	2	Linked server → REACHABLE, xp_cmdshell → code execution
Cloud	3	Overprivileged policy, public bucket, cross-account role

Many rules use edge-triggered inference — they require a matching inbound edge (requires_edge field) in addition to the node property match. When a new or updated edge arrives, inference also re-evaluates its endpoints. Edge-triggered rules span AD (LAPS, gMSA, RBCD, DACL escalation, Shadow Credentials, GPO abuse, DCSync), cloud (cross-account role), and MSSQL (linked server + domain).
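Edge-triggered matching can be sketched as a property match on the node plus an optional inbound-edge gate. Only the `requires_edge` name comes from the text; the `Rule` shape, the `produces` field, and the example LAPS-style rule in the usage are hypothetical, and the real rule schema is documented under Graph Model.

```typescript
interface GraphEdge { from: string; to: string; type: string }
interface GraphNode { id: string; props: Record<string, unknown> }

interface Rule {
  name: string;
  match: Record<string, unknown>; // node properties that must all be equal
  requires_edge?: string;         // inbound edge type required on the matched node
  produces: string;               // hypothesis edge type to emit
}

interface Hypothesis { rule: string; node: string; via?: string; type: string }

function evaluate(rule: Rule, node: GraphNode, edges: GraphEdge[]): Hypothesis[] {
  // 1. Property match on the node itself.
  for (const [key, value] of Object.entries(rule.match)) {
    if (node.props[key] !== value) return [];
  }
  // 2. Node-only rule: fire once.
  if (!rule.requires_edge) return [{ rule: rule.name, node: node.id, type: rule.produces }];
  // 3. Edge-triggered rule: fire once per matching inbound edge.
  return edges
    .filter((e) => e.to === node.id && e.type === rule.requires_edge)
    .map((e) => ({ rule: rule.name, node: node.id, via: e.from, type: rule.produces }));
}

/** When an edge arrives, re-evaluate every rule against both endpoints. */
function onEdge(edge: GraphEdge, nodes: Map<string, GraphNode>, edges: GraphEdge[], rules: Rule[]): Hypothesis[] {
  const out: Hypothesis[] = [];
  for (const id of [edge.from, edge.to]) {
    const node = nodes.get(id);
    if (!node) continue;
    for (const rule of rules) out.push(...evaluate(rule, node, edges));
  }
  return out;
}
```

The endpoint re-evaluation is what lets a rule fire when the node's properties were already present and only the edge is new.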

See Graph Model — Inference Rules for the full rule reference with triggers and productions. Custom rules can be added at runtime via suggest_inference_rule. See Concepts for how the rule lifecycle works.

Full Graph Access

The LLM isn't restricted to scored frontier items. query_graph gives unrestricted access to the entire graph for creative path discovery. find_paths provides shortest-path analysis between any nodes or toward objectives.

Component Overview

[Figure: Service Decomposition]

Core

Component	File	Purpose
Entrypoint	`src/index.ts`	Config loading, server init, tool registration
Config	`src/config.ts`	Engagement config parsing and validation
Types	`src/types.ts`	Shared types + Zod schemas

Services

Component	File	Purpose
Graph Engine	`src/services/graph-engine.ts`	Core graph operations, state coordination
Engine Context	`src/services/engine-context.ts`	Mutable state container, update callbacks, service container and dependency wiring
Frontier	`src/services/frontier.ts`	Frontier item generation and filtering
Inference Engine	`src/services/inference-engine.ts`	Rule matching and hypothesis edge generation
Path Analyzer	`src/services/path-analyzer.ts`	Shortest-path and objective reachability
Identity Resolution	`src/services/identity-resolution.ts`	Canonical ID generation, marker matching
Identity Reconciliation	`src/services/identity-reconciliation.ts`	Alias node merging, edge retargeting
Graph Schema	`src/services/graph-schema.ts`	Node/edge type validation
Graph Health	`src/services/graph-health.ts`	Integrity checks and diagnostics
Finding Validation	`src/services/finding-validation.ts`	Input validation and normalization
State Persistence	`src/services/state-persistence.ts`	Atomic write-rename with snapshot rotation; replays the Mutation Journal on load for engagements with a nonce
Activity Chain	`src/services/activity-chain.ts`	Tamper-evident SHA-256 chain over agent/system events; signed checkpoints for O(events_since_checkpoint) verification (default-on for new engagements)
Mutation Journal	`src/services/mutation-journal.ts`	Write-ahead log of graph mutations; replay on load + compaction after snapshot. Gated on `engagement_nonce`
Deterministic ID	`src/services/deterministic-id.ts`	sha256-derived action and event IDs for engagements with `engagement_nonce`; `uuidv4` fallback for legacy
Frontier Leases	`src/services/frontier-leases.ts`	TTL leases on frontier items so two agents can't race on the same target
Agent Watchdog	`src/services/agent-watchdog.ts`	Periodic reaper for stale heartbeats and expired frontier leases
Decision Log	`src/services/decision-log.ts`	Derived view: per-action timeline (frontier_emitted → ... → completed) over the activity log
Introspection	`src/services/introspection.ts`	"Why did the agent do X?" — frontier item, log_thought chain, alternatives, validation, outcome for an action_id
Timeline	`src/services/timeline.ts`	Per-node and per-edge "what was true at time T" derivation over graph + activity log
Golden Replay	`src/services/golden-replay.ts`	Tape-driven byte-identical replay harness; canonical graph hash for regression detection
Sub-agent IPC	`src/services/subagent-ipc.ts`, `src/services/subagent-process-runner.ts`	Typed JSON-over-stdio contract + parent-side runner for the optional `subagent_isolation: 'process'` mode (default `'in_process'`)
Skill Index	`src/services/skill-index.ts`	TF-IDF search over skill library
Output Parsers	`src/services/parsers/`	21 parsers / 36 aliases: nmap, nxc, certipy, secretsdump, kerbrute, hashcat, responder, ldapsearch, enum4linux, rubeus, web dir enum, linpeas/linenum, nuclei, nikto, testssl/sslscan, pacu, prowler, burp, zap, sqlmap, wpscan
Parser Utils	`src/services/parser-utils.ts`	Shared parsing helpers and canonical ID generation
Credential Utils	`src/services/credential-utils.ts`	Credential normalization, lifecycle, and domain inference
Provenance Utils	`src/services/provenance-utils.ts`	Source attribution tracking
BloodHound Ingest	`src/services/bloodhound-ingest.ts`	SharpHound v4/v5 (CE) JSON → graph
AzureHound Ingest	`src/services/azurehound-ingest.ts`	AzureHound / ROADtools JSON → graph
Community Detection	`src/services/community-detection.ts`	Louvain modularity for graph clustering
Dashboard Server	`src/services/dashboard-server.ts`	HTTP + WebSocket for live visualization
Delta Accumulator	`src/services/delta-accumulator.ts`	Debounced graph change tracking for broadcasts
Cold Store	`src/services/cold-store.ts`	Promotion-only compaction for large network sweeps
Agent Manager	`src/services/agent-manager.ts`	Sub-agent task lifecycle
Retrospective	`src/services/retrospective.ts`	Post-engagement analysis and RLVR traces
CIDR	`src/services/cidr.ts`	CIDR parsing, expansion, and scope matching
Tool Check	`src/services/tool-check.ts`	Offensive tool availability detection
Process Tracker	`src/services/process-tracker.ts`	PID tracking for long-running scans
Lab Preflight	`src/services/lab-preflight.ts`	Lab readiness validation
Session Manager	`src/services/session-manager.ts`	Persistent interactive sessions, RingBuffer, ownership enforcement. Session close (operator, process exit, or shutdown) downgrades `HAS_SESSION` edges to `session_live=false`.
Session Adapters	`src/services/session-adapters.ts`	LocalPty (node-pty), SSH, and Socket transport adapters
Prompt Generator	`src/services/prompt-generator.ts`	Dynamic system prompt generation for primary and sub-agent roles
Report Generator	`src/services/report-generator.ts`	Per-finding sections, evidence chains, attack narrative, auto-remediation
Report HTML	`src/services/report-html.ts`	Self-contained HTML report renderer with themes and print CSS
Campaign Planner	`src/services/campaign-planner.ts`	Campaign assembly, progress tracking, abort conditions
Chain Scorer	`src/services/chain-scorer.ts`	Multi-hop credential chain scoring
OPSEC Tracker	`src/services/opsec-tracker.ts`	Dynamic noise budget tracking per host/domain/global
Pending Action Queue	`src/services/pending-action-queue.ts`	Operator approval gates for actions
Evidence Store	`src/services/evidence-store.ts`	Durable evidence blob storage with action/finding linkage. Records carry a `content_hash` (sha256) so identical content from two runs deduplicates and lookups accept either evidence_id (UUID) or content_hash. Streaming sinks finalize the hash on close.
Finding Ingestion	`src/services/finding-ingestion.ts`	Finding validation pipeline and graph mutation
Imperative Inference	`src/services/imperative-inference.ts`	Imperative (code-driven) inference rule execution
Scope Manager	`src/services/scope-manager.ts`	Engagement scope governance and validation
Graph Query	`src/services/graph-query.ts`	Structured graph queries with filtering
Objective Manager	`src/services/objective-manager.ts`	Objective CRUD, achievement evaluation, phase tracking
Session Tracker	`src/services/session-tracker.ts`	HAS_SESSION edge lifecycle, frontier marking, startup reconciliation
Config Manager	`src/services/config-manager.ts`	Config seeding, update validation, scope/opsec merging
Tool Telemetry	`src/services/tool-telemetry.ts`	Runtime tool call counting, timing, sequence analysis

Tools

Module	File	Tools
State	`src/tools/state.ts`	`get_state`, `run_lab_preflight`, `run_graph_health`, `recompute_objectives`, `get_history`, `export_graph`
Scoring	`src/tools/scoring.ts`	`next_task`, `validate_action`
Findings	`src/tools/findings.ts`	`report_finding`, `get_evidence`
Exploration	`src/tools/exploration.ts`	`query_graph`, `find_paths`
Agents	`src/tools/agents.ts`	`register_agent`, `dispatch_agents`, `get_agent_context`, `update_agent`, `dispatch_subnet_agents`, `dispatch_campaign_agents`, `manage_campaign`, `agent_heartbeat`
Decision Log	`src/tools/decision-log.ts`	`get_decision_log`
Introspection	`src/tools/introspection.ts`	`explain_action`
Timeline	`src/tools/timeline.ts`	`get_timeline`
Skills	`src/tools/skills.ts`	`get_skill`
Logging	`src/tools/logging.ts`	`log_action_event`
Parse Output	`src/tools/parse-output.ts`	`parse_output`
Inference	`src/tools/inference.ts`	`suggest_inference_rule`
BloodHound	`src/tools/bloodhound.ts`	`ingest_bloodhound`
Tool Check	`src/tools/toolcheck.ts`	`check_tools`
Processes	`src/tools/processes.ts`	`track_process`, `check_processes`
Remediation	`src/tools/remediation.ts`	`correct_graph`
Retrospective	`src/tools/retrospective.ts`	`run_retrospective`
Sessions	`src/tools/sessions.ts`	`open_session`, `write_session`, `read_session`, `send_to_session`, `list_sessions`, `update_session`, `resize_session`, `signal_session`, `close_session`
Scope	`src/tools/scope.ts`	`update_scope`
Instructions	`src/tools/instructions.ts`	`get_system_prompt`
Reporting	`src/tools/reporting.ts`	`generate_report`
AzureHound	`src/tools/azurehound.ts`	`ingest_azurehound`
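The Frontier Leases and Agent Watchdog services, driven by the `agent_heartbeat` tool, can be sketched as an in-memory TTL map. This is a minimal sketch assuming an injectable millisecond clock and hypothetical method names `acquire`, `renew`, and `reap`; it is not the actual `frontier-leases.ts` interface.

```typescript
interface Lease { agentId: string; expiresAt: number }

class FrontierLeases {
  private readonly leases = new Map<string, Lease>(); // frontier item id -> lease
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Claim an item; fails while another agent holds an unexpired lease. */
  acquire(itemId: string, agentId: string): boolean {
    const current = this.leases.get(itemId);
    if (current && current.expiresAt > this.now() && current.agentId !== agentId) return false;
    this.leases.set(itemId, { agentId, expiresAt: this.now() + this.ttlMs });
    return true;
  }

  /** Heartbeat: extend every lease held by this agent. */
  renew(agentId: string): void {
    for (const lease of this.leases.values()) {
      if (lease.agentId === agentId) lease.expiresAt = this.now() + this.ttlMs;
    }
  }

  /** Watchdog pass: drop expired leases and return the freed item ids. */
  reap(): string[] {
    const freed: string[] = [];
    for (const [itemId, lease] of this.leases) {
      if (lease.expiresAt <= this.now()) {
        this.leases.delete(itemId); // deleting during Map iteration is safe in JS
        freed.push(itemId);
      }
    }
    return freed;
  }
}
```

Injecting the clock keeps the lease logic deterministic in tests, in the same spirit as the caller-provided clocks described under Foundations.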

Dashboard

The dashboard is a React SPA built with Vite, served from src/dashboard-next/ (build output: dist/dashboard-next/). It exposes panels for engagements, campaigns, agents, sessions (xterm-based terminal multiplexer), pending actions, frontier, activity, evidence, settings, telemetry, and a sigma.js graph explorer with attack-path overlay, focus presets, community hulls, edit mode, and minimap.

Area	Source
Entry shell	`src/dashboard-next/src/main.tsx`, `App.tsx`
Layout	`src/dashboard-next/src/components/layout/` (Toolbar, Sidebar, OperatorLayout, TapeToggle)
Panels	`src/dashboard-next/src/components/panels/`
Graph explorer	`src/dashboard-next/src/components/graph/` (sigma.js + hooks)
State	`src/dashboard-next/src/stores/` (Zustand)
API client	`src/dashboard-next/src/lib/api.ts`
WebSocket	`src/dashboard-next/src/providers/ws-provider.tsx`

State Persistence

Graph state is persisted to state-<engagement-id>.json after every finding using atomic write-rename:

1. Serialize graph + metadata to JSON
2. Write to temporary file (state-<id>.json.tmp)
3. Atomic rename over the real file
4. Previous version moved to snapshot rotation
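The write path above can be sketched with Node's synchronous `fs` API. This is a minimal sketch, not `state-persistence.ts`: the `.snap.N` naming, the default of three snapshots, and copying the current file into rotation just before the rename are assumptions.

```typescript
import * as fs from "node:fs";

/** Rotate file -> file.snap.1 -> ... -> file.snap.<keep>; the oldest is overwritten. */
function rotateSnapshots(file: string, keep: number): void {
  if (!fs.existsSync(file)) return;
  for (let i = keep - 1; i >= 1; i--) {
    const src = `${file}.snap.${i}`;
    if (fs.existsSync(src)) fs.renameSync(src, `${file}.snap.${i + 1}`);
  }
  fs.copyFileSync(file, `${file}.snap.1`);
}

/** Serialize, write to a temp file, fsync, then rename over the real file. */
function persistState(file: string, state: unknown, keep = 3): void {
  const tmp = `${file}.tmp`;
  const fd = fs.openSync(tmp, "w");
  try {
    fs.writeSync(fd, JSON.stringify(state));
    fs.fsyncSync(fd); // bytes are durable before the rename makes them visible
  } finally {
    fs.closeSync(fd);
  }
  rotateSnapshots(file, keep);
  // A crash before this line leaves the previous file untouched.
  fs.renameSync(tmp, file); // atomic on POSIX when tmp and file share a filesystem
}
```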

Features:

Snapshot rotation — keeps recent snapshots for rollback
Crash recovery — incomplete writes never corrupt state (temp file is discarded)
Resume anywhere — restart Claude Code, restart the server, come back days later
Post-engagement analysis — persisted state feeds retrospective analysis

Mutation Journal (Write-Ahead Log)

Engagements created with an engagement_nonce (the deterministic-ID family, see Foundations) layer a write-ahead log on top of the snapshot:

Every graph-affecting mutation (add_node, add_edge, merge_node_attrs, drop_edge) appends a MutationEntry { seq, ts, type, payload } to <state-file>.journal.jsonl and fsyncs before the in-memory mutation is applied.
On load, the engine reads the snapshot, then replays journal entries with seq > journalSnapshotSeq. A crash between journal append and the next snapshot rotation is recoverable.
Snapshot rotation truncates the journal up to the snapshot's seq. A crash mid-write leaves a partial trailing line; the reader stops at it, so the torn entry does not poison subsequent replays.

Legacy engagements (no nonce) keep the debounced-snapshot-only path with no behavior change.
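The replay and compaction rules can be sketched over JSONL text. The `MutationEntry` fields follow the description above; the in-memory string standing in for `<state-file>.journal.jsonl`, the `apply` callback, and treating any line that fails `JSON.parse` as the torn tail are assumptions. A real journal would append each line and `fsync` before applying the mutation.

```typescript
type MutationType = "add_node" | "add_edge" | "merge_node_attrs" | "drop_edge";

interface MutationEntry {
  seq: number;
  ts: string;
  type: MutationType;
  payload: Record<string, unknown>;
}

/** One JSONL line; the journal appends (and fsyncs) this before mutating memory. */
function encodeEntry(entry: MutationEntry): string {
  return JSON.stringify(entry) + "\n";
}

/**
 * Replay entries with seq > snapshotSeq and return the last applied seq.
 * Stops at the first unparseable line, which is how a torn write shows up.
 */
function replayJournal(journalText: string, snapshotSeq: number, apply: (e: MutationEntry) => void): number {
  let lastSeq = snapshotSeq;
  for (const line of journalText.split("\n")) {
    if (line.trim() === "") continue;
    let entry: MutationEntry;
    try {
      entry = JSON.parse(line) as MutationEntry;
    } catch {
      break; // partial trailing line: stop here
    }
    if (entry.seq <= snapshotSeq) continue; // already captured by the snapshot
    apply(entry);
    lastSeq = entry.seq;
  }
  return lastSeq;
}

/** Compaction after a snapshot: keep only well-formed entries newer than its seq. */
function compactJournal(journalText: string, snapshotSeq: number): string {
  let out = "";
  replayJournal(journalText, snapshotSeq, (e) => {
    out += encodeEntry(e);
  });
  return out;
}
```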

Foundations: Trust, Audit, Replay

Engagements created after the foundations work shipped get a coordinated set of guarantees. All of them are gated on engagement_nonce being populated, which is the strict-migration boundary for new engagements; legacy engagements keep their original UUID-based identity model.

Guarantee	What it gives you	Where it lives
Hash-chained activity log	Tamper-evident SHA-256 chain over agent/system events. Default-on for new engagements. Signed checkpoints are emitted every 500 events / 30 minutes so verifiers don't have to re-walk the chain from genesis.	`src/services/activity-chain.ts`
Content-addressed evidence	Every evidence row carries `content_hash = sha256(content)`. Two runs with identical output deduplicate; tampering changes the address. Lookups accept either UUID or hash.	`src/services/evidence-store.ts`
Deterministic action / event IDs	`act_<16hex>` / `evt_<16hex>` derived from `sha256(nonce | agent_id | timestamp | command_signature | sequence)`. Same inputs → same IDs.	`src/services/deterministic-id.ts`
Caller-provided clocks	`engine.withClock(now, fn)` pins time across a sequence of mutations so timestamps don't leak wall-clock noise into golden-master fixtures.	`src/services/engine-context.ts` (`withClock`, `nowIso`)
Write-ahead log	Crash-safe state recovery; a mutation is lost only if the journal append itself fails.	`src/services/mutation-journal.ts`
Golden-master replay	Replaying a tape against a fresh engine produces a byte-identical state hash. Real-behavior change requires explicit re-recording.	`src/services/golden-replay.ts`, fixtures in `src/__tests__/golden-master/`
Frontier leases + heartbeat	Two agents can't claim the same frontier item; silent sub-agents get reaped.	`src/services/frontier-leases.ts`, `src/services/agent-watchdog.ts`

For the threat model context — what these mitigate, what's still residual — see Threat Model.
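The hash-chained activity log row can be illustrated with `node:crypto`. In this sketch each link hashes the previous hash together with the event's JSON, and verification re-walks from a trusted starting hash. The all-zero genesis value, the `ChainedEvent` fields, and hashing plain `JSON.stringify` output (insertion key order, not a canonical encoding) are assumptions; checkpoint signing is omitted entirely.

```typescript
import { createHash } from "node:crypto";

interface ChainedEvent {
  seq: number;
  body: Record<string, unknown>;
  prev_hash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // hypothetical genesis value

function linkHash(prevHash: string, seq: number, body: Record<string, unknown>): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify({ seq, body })).digest("hex");
}

function append(chain: ChainedEvent[], body: Record<string, unknown>): ChainedEvent {
  const prev_hash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const seq = chain.length;
  const event: ChainedEvent = { seq, body, prev_hash, hash: linkHash(prev_hash, seq, body) };
  chain.push(event);
  return event;
}

/**
 * Verify links from index `from`, trusting `anchorHash` as the hash just before it.
 * A checkpoint would supply (from, anchorHash) so older events need not be re-walked.
 */
function verify(chain: ChainedEvent[], from = 0, anchorHash = GENESIS): boolean {
  let prev = anchorHash;
  for (let i = from; i < chain.length; i++) {
    const e = chain[i];
    if (e.prev_hash !== prev || e.hash !== linkHash(prev, e.seq, e.body)) return false;
    prev = e.hash;
  }
  return true;
}
```

Editing any earlier event changes its recomputed hash, so a full verification fails at that event.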

```mermaid
flowchart LR
    subgraph New["new engagement (engagement_nonce populated)"]
        ID[deterministic-id<br/>act_/evt_ from sha256]
        WAL[mutation-journal<br/>journal.jsonl]
        CHAIN[activity-chain<br/>+ signed checkpoints]
        EV[evidence-store<br/>content_hash sha256]
        LEASE[frontier-leases<br/>TTL on items]
        WATCH[agent-watchdog<br/>heartbeat reaper]
        REPLAY[golden-replay<br/>tape harness]
    end

    Action[action] --> ID
    ID --> WAL
    WAL --> Persist[snapshot.json]
    Action --> CHAIN
    Action --> EV
    Register[register_agent] --> LEASE
    Heartbeat[agent_heartbeat] --> WATCH
    WATCH -. interrupts stale .-> LEASE

    Replay[replayTape] --> ID
    Replay --> Persist
    Replay --> Hash[graph hash]

    classDef new fill:#10b981,stroke:#047857,color:#fff
    class ID,WAL,CHAIN,EV,LEASE,WATCH,REPLAY new
```

The whole substrate snaps together when the engagement carries a nonce: deterministic IDs key the journal, the journal feeds replay, replay validates the hash chain, and the chain references content-addressed evidence. Each layer is independent at the implementation level, but together they compose into a single audit story.
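The two keying schemes that hold this together, deterministic IDs and content hashes, can be sketched with `node:crypto`. The `act_`/`evt_` prefixes, the 16 hex characters, and the field order follow the Foundations table; joining with a literal `|`, the timestamp format, and what goes into `commandSignature` are assumptions, and `EvidenceIndex` is a hypothetical in-memory stand-in for the evidence store.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

interface IdInputs {
  nonce: string;            // engagement_nonce
  agentId: string;
  timestamp: string;        // pinned by a caller-provided clock in tests
  commandSignature: string; // assumption: a normalized command string
  sequence: number;
}

function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

/** act_<16hex> / evt_<16hex>; fields must not contain the "|" separator. */
function deterministicId(prefix: "act" | "evt", i: IdInputs): string {
  const material = [i.nonce, i.agentId, i.timestamp, i.commandSignature, String(i.sequence)].join("|");
  return `${prefix}_${sha256Hex(material).slice(0, 16)}`;
}

/** Content-addressed evidence: identical bytes map to one content_hash. */
class EvidenceIndex {
  private readonly byHash = new Map<string, Buffer>();

  put(content: Buffer): string {
    const hash = sha256Hex(content);
    if (!this.byHash.has(hash)) this.byHash.set(hash, content); // dedupe identical output
    return hash;
  }

  get(hash: string): Buffer | undefined {
    return this.byHash.get(hash);
  }

  get size(): number {
    return this.byHash.size;
  }
}
```

Because both keys are pure functions of their inputs, a replay with the same nonce, clock, and outputs regenerates the same IDs and evidence addresses.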

Session + Transport Architecture

[Figure: Session Transport]

Overwatch supports two MCP transports: stdio (the default) and HTTP/SSE for remote use. Persistent interactive sessions run over three adapters (LocalPty, SSH, Socket) with 128KB ring buffers, cursor-based I/O, and TTY quality tracking.
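A byte ring buffer with cursor-based reads can be sketched as follows. The 128KB default comes from the text; the absolute-offset cursor scheme, the `dropped` flag, and the byte-by-byte copy (chosen for clarity over speed) are assumptions, not the actual `RingBuffer` API in `session-manager.ts`.

```typescript
import { Buffer } from "node:buffer";

/** Fixed-capacity byte ring; readers hold an absolute stream offset (cursor). */
class RingBuffer {
  private readonly buf: Buffer;
  private written = 0; // total bytes ever written = absolute offset of the end

  constructor(capacity = 128 * 1024) {
    this.buf = Buffer.alloc(capacity);
  }

  write(data: Buffer): void {
    for (let i = 0; i < data.length; i++) {
      this.buf[(this.written + i) % this.buf.length] = data[i];
    }
    this.written += data.length;
  }

  /**
   * Return everything after `cursor` plus the new cursor. If the reader fell
   * more than `capacity` bytes behind, the oldest bytes are gone: `dropped`
   * is true and reading resumes at the oldest retained byte.
   */
  read(cursor: number): { data: Buffer; cursor: number; dropped: boolean } {
    const oldest = Math.max(0, this.written - this.buf.length);
    const start = Math.min(Math.max(cursor, oldest), this.written);
    const out = Buffer.alloc(this.written - start);
    for (let i = 0; i < out.length; i++) {
      out[i] = this.buf[(start + i) % this.buf.length];
    }
    return { data: out, cursor: this.written, dropped: cursor < oldest };
  }
}
```

Absolute cursors let several readers share one buffer without the buffer tracking who has read what.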

Broadcast Pipeline

When the graph changes, updates flow to the dashboard in real time. The dashboard also polls /api/state every 5 seconds as a fallback when the WebSocket is disconnected.
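The debounced change tracking listed for the Delta Accumulator can be sketched as a map that coalesces changes per element and flushes one batch after a quiet period. The `Delta` shape, the 100 ms default, and the `broadcast` callback (standing in for the WebSocket send) are assumptions; the 5-second polling fallback is separate and not modeled here.

```typescript
interface Delta {
  kind: "node" | "edge";
  id: string;
  attrs: Record<string, unknown>;
}

class DeltaAccumulator {
  private readonly pending = new Map<string, Delta>(); // key: "<kind>:<id>"
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly broadcast: (batch: Delta[]) => void;
  private readonly debounceMs: number;

  constructor(broadcast: (batch: Delta[]) => void, debounceMs = 100) {
    this.broadcast = broadcast;
    this.debounceMs = debounceMs;
  }

  /** Record a change; later changes to the same element merge into the earlier one. */
  record(delta: Delta): void {
    const key = `${delta.kind}:${delta.id}`;
    const prev = this.pending.get(key);
    this.pending.set(key, prev ? { ...delta, attrs: { ...prev.attrs, ...delta.attrs } } : delta);
    if (this.timer) clearTimeout(this.timer); // restart the quiet-period timer
    this.timer = setTimeout(() => this.flush(), this.debounceMs);
  }

  /** Send everything pending as one batch; also callable directly, e.g. on shutdown. */
  flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return;
    const batch = [...this.pending.values()];
    this.pending.clear();
    this.broadcast(batch);
  }
}
```

Coalescing by element keeps a burst of parser output, such as many attribute updates to one host, from turning into many WebSocket messages.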