Architecture
Overwatch inverts the typical "LLM-as-orchestrator" pattern. Instead of stuffing engagement state into a prompt, the orchestrator is a persistent MCP server that the LLM calls into.
System Diagram
Data Flow Example
Here's a concrete walkthrough of how data flows through the system during a typical engagement step:
Every step is traceable: action_id links validate_action → log_action_event → parse_output → report_finding. The activity log records the causal chain with tiered truncation that preserves milestone and causal-linkage events (validations, parse results, warnings, session states, errors) while trimming ephemeral events to stay within budget. Full evidence payloads are stored durably in the evidence store and referenced by evidence_id.
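As a rough illustration of tiered truncation, the sketch below keeps every milestone event and then fills the remaining budget with the most recent ephemeral events. The event type names, the size field, and the newest-first policy are assumptions made for the example, not the documented behavior.

```typescript
// Hypothetical event shape; only the two-tier idea comes from the text above.
interface ActivityEvent {
  seq: number;
  type: string;
  sizeBytes: number;
}

// Illustrative names for the preserved tier (validations, parse results, warnings, session states, errors).
const PRESERVED_TYPES = new Set(["validation", "parse_result", "warning", "session_state", "error"]);

function truncateLog(events: ActivityEvent[], budgetBytes: number): ActivityEvent[] {
  const keep = new Set<number>();
  let used = 0;

  // Tier 1: milestone and causal-linkage events are always kept.
  for (const e of events) {
    if (PRESERVED_TYPES.has(e.type)) {
      keep.add(e.seq);
      used += e.sizeBytes;
    }
  }

  // Tier 2: walk backwards and keep recent ephemeral events until the budget is exhausted.
  for (let i = events.length - 1; i >= 0; i--) {
    const e = events[i];
    if (keep.has(e.seq)) continue;
    if (used + e.sizeBytes > budgetBytes) break;
    keep.add(e.seq);
    used += e.sizeBytes;
  }

  // Return survivors in their original order.
  return events.filter((e) => keep.has(e.seq));
}

const trimmed = truncateLog(
  [
    { seq: 1, type: "tool_output", sizeBytes: 40 },
    { seq: 2, type: "validation", sizeBytes: 10 },
    { seq: 3, type: "tool_output", sizeBytes: 40 },
    { seq: 4, type: "error", sizeBytes: 10 },
    { seq: 5, type: "tool_output", sizeBytes: 40 },
  ],
  70,
);
```

Note that tier 1 is never trimmed in this sketch, so a log dominated by milestone events can still exceed the budget; a real implementation would need a policy for that case.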
Design Decisions
Graph, Not Database
Engagements are directed property graphs — hosts, services, credentials, and the relationships between them. The graph structure means "credential X is valid on service Y which runs on host Z" is a traversable path, not three rows in a table.
The graph is powered by graphology, a robust JavaScript graph library, with shortest-path analysis via graphology-shortest-path and community detection via graphology-communities-louvain.
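To make the "traversable path" idea concrete, here is a minimal sketch in plain TypeScript. It deliberately does not use graphology; the TinyGraph class, the node IDs, and the VALID_ON / RUNS_ON edge types are hypothetical stand-ins chosen for illustration.

```typescript
// Minimal directed property graph; a stand-in for illustration, not graphology's API.
type NodeAttrs = { type: "credential" | "service" | "host"; label: string };
type Edge = { source: string; target: string; type: string };

class TinyGraph {
  readonly nodes = new Map<string, NodeAttrs>();
  readonly edges: Edge[] = [];

  addNode(id: string, attrs: NodeAttrs): void {
    this.nodes.set(id, attrs);
  }

  addEdge(source: string, target: string, type: string): void {
    if (!this.nodes.has(source) || !this.nodes.has(target)) {
      throw new Error(`unknown endpoint: ${source} -> ${target}`);
    }
    this.edges.push({ source, target, type });
  }

  outEdges(id: string): Edge[] {
    return this.edges.filter((e) => e.source === id);
  }
}

// Breadth-first search: returns a fewest-hop path, or null if the target is unreachable.
function shortestPath(g: TinyGraph, from: string, to: string): string[] | null {
  const previous = new Map<string, string | null>([[from, null]]);
  const queue: string[] = [from];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === to) {
      // Walk the predecessor links back to the start.
      const path: string[] = [];
      let n: string | null = current;
      while (n !== null) {
        path.unshift(n);
        n = previous.get(n) ?? null;
      }
      return path;
    }
    for (const edge of g.outEdges(current)) {
      if (!previous.has(edge.target)) {
        previous.set(edge.target, current);
        queue.push(edge.target);
      }
    }
  }
  return null;
}

const g = new TinyGraph();
g.addNode("cred:svc_sql", { type: "credential", label: "svc_sql" });
g.addNode("svc:10.0.0.5:1433", { type: "service", label: "mssql" });
g.addNode("host:10.0.0.5", { type: "host", label: "db01" });
g.addEdge("cred:svc_sql", "svc:10.0.0.5:1433", "VALID_ON");
g.addEdge("svc:10.0.0.5:1433", "host:10.0.0.5", "RUNS_ON");

// "Credential X is valid on service Y which runs on host Z" as a single path query.
const credentialToHost = shortestPath(g, "cred:svc_sql", "host:10.0.0.5");
```

In a relational layout the same question would need a join across three tables; here it is one traversal, and the returned path doubles as the explanation of why the host is reachable.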
Community detection runs Louvain modularity optimization on an undirected projection of the graph. Each node gets a community_id attribute, materialized lazily and cached until the next topology change. Communities feed two consumers:
- Frontier — each FrontierItem carries community_id and community_unexplored_count, letting the LLM reason about cluster coverage
- Dashboard — convex hull overlays color-code communities in the graph visualization
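The lazy-materialize-and-cache behavior can be sketched as follows. The detector is injected so the example stays self-contained (Louvain itself is not implemented here), and CommunityCache, communityUnexploredCount, and the "explored" set are hypothetical names. The sketch also assumes the unexplored count includes the node itself when it has not been explored.

```typescript
type CommunityMap = Map<string, number>; // node id -> community_id

class CommunityCache {
  private cached: CommunityMap | null = null;
  public computeCount = 0; // exposed only so the example can show cache hits

  constructor(private readonly detect: () => CommunityMap) {}

  // Call whenever a node or edge is added or removed.
  invalidate(): void {
    this.cached = null;
  }

  // Computes on first use after a topology change, then serves the cached result.
  get(): CommunityMap {
    if (this.cached === null) {
      this.cached = this.detect();
      this.computeCount += 1;
    }
    return this.cached;
  }
}

// Counts unexplored nodes that share a community with the given node.
function communityUnexploredCount(
  communities: CommunityMap,
  explored: Set<string>,
  nodeId: string,
): number {
  const target = communities.get(nodeId);
  if (target === undefined) return 0;
  let count = 0;
  communities.forEach((community, id) => {
    if (community === target && !explored.has(id)) count += 1;
  });
  return count;
}

// Stub detector standing in for Louvain: h1 and h2 cluster together, h3 is alone.
const cache = new CommunityCache(
  () => new Map<string, number>([["h1", 0], ["h2", 0], ["h3", 1]]),
);
const unexploredNearH1 = communityUnexploredCount(cache.get(), new Set(["h1"]), "h1");
cache.get(); // second call is served from the cache
```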
MCP Server, Not a Prompt
The orchestrator survives context compaction by design — it's not in the context window. After compaction, get_state() reconstructs a complete briefing from the graph. Zero information loss.
Two transports are supported:
- stdio — the default, using the Model Context Protocol over standard I/O. This is how Claude Code connects.
- HTTP/SSE — streamable HTTP transport for remote deployment, web-based consumers, and multiple simultaneous clients. Enable with OVERWATCH_TRANSPORT=http or the --http CLI flag.
The core app bootstrap (src/app.ts) is transport-neutral — both transports share the same GraphEngine, skills, and services. Each HTTP session gets its own McpServer instance (SDK limitation: one connect() per server) but all sessions share the underlying graph.
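Transport selection could look roughly like the sketch below. The function name resolveTransport is hypothetical, and the precedence (CLI flag first, then environment variable) is an assumption; the text above only states that either switch enables HTTP.

```typescript
type Transport = "stdio" | "http";

// Assumption: the CLI flag is checked before the environment variable.
function resolveTransport(
  argv: string[],
  env: Record<string, string | undefined>,
): Transport {
  if (argv.indexOf("--http") !== -1) return "http";
  if (env["OVERWATCH_TRANSPORT"] === "http") return "http";
  return "stdio"; // the documented default
}

const defaultTransport = resolveTransport([], {});
const fromEnv = resolveTransport([], { OVERWATCH_TRANSPORT: "http" });
const fromFlag = resolveTransport(["--http"], {});
```

Because the bootstrap is transport-neutral, a function like this would only decide which listener wraps the shared engine; it would not change how the graph or services are constructed.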
Hybrid Scoring
The deterministic layer handles hard constraints:
- Scope enforcement — targets outside CIDRs/domains are rejected
- Deduplication — already-tested edges don't re-enter the frontier
- OPSEC vetoes — techniques exceeding the noise ceiling are filtered
- Dead host pruning — unreachable hosts are deprioritized
The LLM handles nuanced reasoning:
- Attack chain spotting — connecting discoveries across multiple hops
- Sequencing — determining what should happen before what
- Risk assessment — weighing reward against defensive posture
- Creative path discovery — finding non-obvious routes through the graph
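The deterministic layer can be pictured as a filter pipeline that runs before the LLM sees any candidates. This is a sketch under stated assumptions: the Candidate and Constraints shapes, the 0 to 1 noise scale, and the dead-host penalty multiplier are all hypothetical, and the CIDR check handles IPv4 only.

```typescript
interface Candidate {
  id: string;
  targetIp: string;
  edgeKey: string; // identifies the edge this action would test
  noise: number; // estimated noise of the technique, 0..1 (assumed scale)
  hostAlive: boolean;
  score: number;
}

interface Constraints {
  inScope: (ip: string) => boolean;
  testedEdges: Set<string>;
  noiseCeiling: number;
  deadHostPenalty: number; // multiplier applied to unreachable hosts
}

function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// IPv4-only CIDR membership, using arithmetic to avoid signed 32-bit bitwise pitfalls.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const size = Math.pow(2, 32 - Number(bitsText));
  const start = Math.floor(ipToInt(base) / size) * size;
  const value = ipToInt(ip);
  return value >= start && value < start + size;
}

function applyHardConstraints(items: Candidate[], c: Constraints): Candidate[] {
  return items
    .filter((i) => c.inScope(i.targetIp)) // scope enforcement: reject
    .filter((i) => !c.testedEdges.has(i.edgeKey)) // deduplication
    .filter((i) => i.noise <= c.noiseCeiling) // OPSEC veto
    .map((i) => (i.hostAlive ? i : { ...i, score: i.score * c.deadHostPenalty })) // deprioritize, not drop
    .sort((a, b) => b.score - a.score);
}

const ranked = applyHardConstraints(
  [
    { id: "a", targetIp: "10.0.0.5", edgeKey: "e1", noise: 0.2, hostAlive: true, score: 5 },
    { id: "b", targetIp: "192.168.1.1", edgeKey: "e9", noise: 0.1, hostAlive: true, score: 9 },
    { id: "c", targetIp: "10.0.0.6", edgeKey: "e2", noise: 0.1, hostAlive: true, score: 7 },
    { id: "d", targetIp: "10.0.0.7", edgeKey: "e3", noise: 0.9, hostAlive: true, score: 6 },
    { id: "e", targetIp: "10.0.0.8", edgeKey: "e4", noise: 0.1, hostAlive: false, score: 8 },
  ],
  {
    inScope: (ip) => inCidr(ip, "10.0.0.0/24"),
    testedEdges: new Set(["e2"]),
    noiseCeiling: 0.5,
    deadHostPenalty: 0.25,
  },
);
```

Only candidates a and e survive here: b is out of scope, c was already tested, d exceeds the noise ceiling, and e stays in the list with a reduced score because its host is unreachable. What remains is the set the LLM reasons over.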
Inference Rules
When findings are reported, deterministic rules fire automatically to generate hypothesis edges. Fifty-five built-in rules span AD, Linux privilege escalation, web application, MSSQL, and cloud domains:
| Domain | Rules | Examples |
|---|---|---|
| AD & Service | 21 | Kerberos → Domain, SMB Relay, Credential Fanout, ADCS ESC1–ESC8+, Delegation, Roasting, LAPS/gMSA, RBCD, DACL escalation, Shadow Credentials, GPO abuse, DCSync |
| ADCS | 14 | ESC1–ESC8, ESC9/10/11, ESC13, EDITF_ATTRIBUTESUBJECTALTNAME2 |
| Linux Privesc | 7 | SUID root, SSH key reuse, Docker escape, NFS no_root_squash, sudo NOPASSWD, dangerous capabilities, writable cron/systemd |
| Web | 8 | Webapp login spray, web login form, token→webapp auth, auth bypass escalation, default CMS creds, IMDSv1 SSRF |
| MSSQL | 2 | Linked server → REACHABLE, xp_cmdshell → code execution |
| Cloud | 3 | Overprivileged policy, public bucket, cross-account role |
Many rules use edge-triggered inference — they require a matching inbound edge (requires_edge field) in addition to the node property match. When a new or updated edge arrives, inference also re-evaluates its endpoints. Edge-triggered rules span AD (LAPS, gMSA, RBCD, DACL escalation, Shadow Credentials, GPO abuse, DCSync), cloud (cross-account role), and MSSQL (linked server + domain).
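Edge-triggered matching can be sketched like this. Only the requires_edge field name comes from the text above; the rule shape, the example LAPS rule, and the GENERIC_ALL / CAN_READ_LAPS edge types are hypothetical and are not taken from the built-in rule set.

```typescript
interface GraphNode {
  id: string;
  type: string;
  props: Record<string, unknown>;
}

interface GraphEdge {
  source: string;
  target: string;
  type: string;
}

interface InferenceRule {
  name: string;
  nodeType: string;
  match: (props: Record<string, unknown>) => boolean;
  requires_edge: string; // inbound edge type that must exist on the matched node
  produces: string; // hypothesis edge type emitted on match
}

// A rule fires only when the node properties match AND a required inbound edge exists.
function inferForNode(rules: InferenceRule[], node: GraphNode, edges: GraphEdge[]): GraphEdge[] {
  const hypotheses: GraphEdge[] = [];
  for (const rule of rules) {
    if (node.type !== rule.nodeType || !rule.match(node.props)) continue;
    const inbound = edges.filter((e) => e.target === node.id && e.type === rule.requires_edge);
    for (const e of inbound) {
      hypotheses.push({ source: e.source, target: node.id, type: rule.produces });
    }
  }
  return hypotheses;
}

// When an edge arrives, both of its endpoints are re-evaluated.
function onEdgeAdded(
  rules: InferenceRule[],
  nodes: Map<string, GraphNode>,
  edges: GraphEdge[],
  added: GraphEdge,
): GraphEdge[] {
  edges.push(added);
  const hypotheses: GraphEdge[] = [];
  for (const id of [added.source, added.target]) {
    const node = nodes.get(id);
    if (node) hypotheses.push(...inferForNode(rules, node, edges));
  }
  return hypotheses;
}

const rules: InferenceRule[] = [
  {
    name: "laps-readable",
    nodeType: "host",
    match: (p) => p["laps"] === true,
    requires_edge: "GENERIC_ALL",
    produces: "CAN_READ_LAPS",
  },
];
const nodes = new Map<string, GraphNode>([
  ["user:alice", { id: "user:alice", type: "user", props: {} }],
  ["host:ws01", { id: "host:ws01", type: "host", props: { laps: true } }],
]);
const edges: GraphEdge[] = [];

// The property matches, but without the inbound edge nothing fires.
const beforeEdge = inferForNode(rules, nodes.get("host:ws01")!, edges);
// Once the edge arrives, the endpoint is re-evaluated and the hypothesis appears.
const afterEdge = onEdgeAdded(rules, nodes, edges, {
  source: "user:alice",
  target: "host:ws01",
  type: "GENERIC_ALL",
});
```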
See Graph Model — Inference Rules for the full rule reference with triggers and productions. Custom rules can be added at runtime via suggest_inference_rule. See Concepts for how the rule lifecycle works.
Full Graph Access
The LLM isn't restricted to scored frontier items. query_graph gives unrestricted access to the entire graph for creative path discovery. find_paths provides shortest-path analysis between any nodes or toward objectives.
Component Overview
Core
| Component | File | Purpose |
|---|---|---|
| Entrypoint | src/index.ts | Config loading, server init, tool registration |
| Config | src/config.ts | Engagement config parsing and validation |
| Types | src/types.ts | Shared types + Zod schemas |
Services
| Component | File | Purpose |
|---|---|---|
| Graph Engine | src/services/graph-engine.ts | Core graph operations, state coordination |
| Engine Context | src/services/engine-context.ts | Mutable state container, update callbacks |
| Frontier | src/services/frontier.ts | Frontier item generation and filtering |
| Inference Engine | src/services/inference-engine.ts | Rule matching and hypothesis edge generation |
| Path Analyzer | src/services/path-analyzer.ts | Shortest-path and objective reachability |
| Identity Resolution | src/services/identity-resolution.ts | Canonical ID generation, marker matching |
| Identity Reconciliation | src/services/identity-reconciliation.ts | Alias node merging, edge retargeting |
| Graph Schema | src/services/graph-schema.ts | Node/edge type validation |
| Graph Health | src/services/graph-health.ts | Integrity checks and diagnostics |
| Finding Validation | src/services/finding-validation.ts | Input validation and normalization |
| State Persistence | src/services/state-persistence.ts | Atomic write-rename with snapshot rotation; replays the Mutation Journal on load for engagements with a nonce |
| Activity Chain | src/services/activity-chain.ts | Tamper-evident SHA-256 chain over agent/system events; signed checkpoints for O(events_since_checkpoint) verification (default-on for new engagements) |
| Mutation Journal | src/services/mutation-journal.ts | Write-ahead log of graph mutations; replay on load + compaction after snapshot. Gated on engagement_nonce |
| Deterministic ID | src/services/deterministic-id.ts | sha256-derived action and event IDs for engagements with engagement_nonce; uuidv4 fallback for legacy |
| Frontier Leases | src/services/frontier-leases.ts | TTL leases on frontier items so two agents can't race on the same target |
| Agent Watchdog | src/services/agent-watchdog.ts | Periodic reaper for stale heartbeats and expired frontier leases |
| Decision Log | src/services/decision-log.ts | Derived view: per-action timeline (frontier_emitted → ... → completed) over the activity log |
| Introspection | src/services/introspection.ts | "Why did the agent do X?" — frontier item, log_thought chain, alternatives, validation, outcome for an action_id |
| Timeline | src/services/timeline.ts | Per-node and per-edge "what was true at time T" derivation over graph + activity log |
| Golden Replay | src/services/golden-replay.ts | Tape-driven byte-identical replay harness; canonical graph hash for regression detection |
| Sub-agent IPC | src/services/subagent-ipc.ts, src/services/subagent-process-runner.ts | Typed JSON-over-stdio contract + parent-side runner for the optional subagent_isolation: 'process' mode (default 'in_process') |
| Skill Index | src/services/skill-index.ts | TF-IDF search over skill library |
| Output Parsers | src/services/parsers/ | 21 parsers / 36 aliases: nmap, nxc, certipy, secretsdump, kerbrute, hashcat, responder, ldapsearch, enum4linux, rubeus, web dir enum, linpeas/linenum, nuclei, nikto, testssl/sslscan, pacu, prowler, burp, zap, sqlmap, wpscan |
| Parser Utils | src/services/parser-utils.ts | Shared parsing helpers and canonical ID generation |
| Credential Utils | src/services/credential-utils.ts | Credential normalization, lifecycle, and domain inference |
| Provenance Utils | src/services/provenance-utils.ts | Source attribution tracking |
| BloodHound Ingest | src/services/bloodhound-ingest.ts | SharpHound v4/v5 (CE) JSON → graph |
| AzureHound Ingest | src/services/azurehound-ingest.ts | AzureHound / ROADtools JSON → graph |
| Community Detection | src/services/community-detection.ts | Louvain modularity for graph clustering |
| Dashboard Server | src/services/dashboard-server.ts | HTTP + WebSocket for live visualization |
| Delta Accumulator | src/services/delta-accumulator.ts | Debounced graph change tracking for broadcasts |
| Cold Store | src/services/cold-store.ts | Promotion-only compaction for large network sweeps |
| Agent Manager | src/services/agent-manager.ts | Sub-agent task lifecycle |
| Retrospective | src/services/retrospective.ts | Post-engagement analysis and RLVR traces |
| CIDR | src/services/cidr.ts | CIDR parsing, expansion, and scope matching |
| Tool Check | src/services/tool-check.ts | Offensive tool availability detection |
| Process Tracker | src/services/process-tracker.ts | PID tracking for long-running scans |
| Lab Preflight | src/services/lab-preflight.ts | Lab readiness validation |
| Session Manager | src/services/session-manager.ts | Persistent interactive sessions, RingBuffer, ownership enforcement. Session close (operator, process exit, or shutdown) downgrades HAS_SESSION edges to session_live=false. |
| Session Adapters | src/services/session-adapters.ts | LocalPty (node-pty), SSH, and Socket transport adapters |
| Prompt Generator | src/services/prompt-generator.ts | Dynamic system prompt generation for primary and sub-agent roles |
| Report Generator | src/services/report-generator.ts | Per-finding sections, evidence chains, attack narrative, auto-remediation |
| Report HTML | src/services/report-html.ts | Self-contained HTML report renderer with themes and print CSS |
| Campaign Planner | src/services/campaign-planner.ts | Campaign assembly, progress tracking, abort conditions |
| Chain Scorer | src/services/chain-scorer.ts | Multi-hop credential chain scoring |
| OPSEC Tracker | src/services/opsec-tracker.ts | Dynamic noise budget tracking per host/domain/global |
| Pending Action Queue | src/services/pending-action-queue.ts | Operator approval gates for actions |
| Evidence Store | src/services/evidence-store.ts | Durable evidence blob storage with action/finding linkage. Records carry a content_hash (sha256) so identical content from two runs deduplicates and lookups accept either evidence_id (UUID) or content_hash. Streaming sinks finalize the hash on close. |
| Finding Ingestion | src/services/finding-ingestion.ts | Finding validation pipeline and graph mutation |
| Imperative Inference | src/services/imperative-inference.ts | Imperative (code-driven) inference rule execution |
| Scope Manager | src/services/scope-manager.ts | Engagement scope governance and validation |
| Graph Query | src/services/graph-query.ts | Structured graph queries with filtering |
| Objective Manager | src/services/objective-manager.ts | Objective CRUD, achievement evaluation, phase tracking |
| Session Tracker | src/services/session-tracker.ts | HAS_SESSION edge lifecycle, frontier marking, startup reconciliation |
| Config Manager | src/services/config-manager.ts | Config seeding, update validation, scope/opsec merging |
| Tool Telemetry | src/services/tool-telemetry.ts | Runtime tool call counting, timing, sequence analysis |
| Engine Context | src/services/engine-context.ts | Service container and dependency wiring |
Tools
| Module | File | Tools |
|---|---|---|
| State | src/tools/state.ts | get_state, run_lab_preflight, run_graph_health, recompute_objectives, get_history, export_graph |
| Scoring | src/tools/scoring.ts | next_task, validate_action |
| Findings | src/tools/findings.ts | report_finding, get_evidence |
| Exploration | src/tools/exploration.ts | query_graph, find_paths |
| Agents | src/tools/agents.ts | register_agent, dispatch_agents, get_agent_context, update_agent, dispatch_subnet_agents, dispatch_campaign_agents, manage_campaign, agent_heartbeat |
| Decision Log | src/tools/decision-log.ts | get_decision_log |
| Introspection | src/tools/introspection.ts | explain_action |
| Timeline | src/tools/timeline.ts | get_timeline |
| Skills | src/tools/skills.ts | get_skill |
| Logging | src/tools/logging.ts | log_action_event |
| Parse Output | src/tools/parse-output.ts | parse_output |
| Inference | src/tools/inference.ts | suggest_inference_rule |
| BloodHound | src/tools/bloodhound.ts | ingest_bloodhound |
| Tool Check | src/tools/toolcheck.ts | check_tools |
| Processes | src/tools/processes.ts | track_process, check_processes |
| Remediation | src/tools/remediation.ts | correct_graph |
| Retrospective | src/tools/retrospective.ts | run_retrospective |
| Sessions | src/tools/sessions.ts | open_session, write_session, read_session, send_to_session, list_sessions, update_session, resize_session, signal_session, close_session |
| Scope | src/tools/scope.ts | update_scope |
| Instructions | src/tools/instructions.ts | get_system_prompt |
| Reporting | src/tools/reporting.ts | generate_report |
| AzureHound | src/tools/azurehound.ts | ingest_azurehound |
Dashboard
The dashboard is a React SPA built with Vite, served from src/dashboard-next/
(build output: dist/dashboard-next/). It exposes panels for engagements,
campaigns, agents, sessions (xterm-based terminal multiplexer), pending
actions, frontier, activity, evidence, settings, telemetry, and a sigma.js
graph explorer with attack-path overlay, focus presets, community hulls,
edit mode, and minimap.
| Area | Source |
|---|---|
| Entry shell | src/dashboard-next/src/main.tsx, App.tsx |
| Layout | src/dashboard-next/src/components/layout/ (Toolbar, Sidebar, OperatorLayout, TapeToggle) |
| Panels | src/dashboard-next/src/components/panels/ |
| Graph explorer | src/dashboard-next/src/components/graph/ (sigma.js + hooks) |
| State | src/dashboard-next/src/stores/ (Zustand) |
| API client | src/dashboard-next/src/lib/api.ts |
| WebSocket | src/dashboard-next/src/providers/ws-provider.tsx |
State Persistence
Graph state is persisted to state-<engagement-id>.json after every finding using atomic write-rename:
1. Serialize graph + metadata to JSON
2. Write to temporary file (state-<id>.json.tmp)
3. Atomic rename over the real file
4. Previous version moved to snapshot rotation
Features:
- Snapshot rotation — keeps recent snapshots for rollback
- Crash recovery — incomplete writes never corrupt state (temp file is discarded)
- Resume anywhere — restart Claude Code, restart the server, come back days later
- Post-engagement analysis — persisted state feeds retrospective analysis
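The write-rename steps above can be sketched with Node's synchronous fs API. The state-<id>.json and .tmp names follow the text; the snapshot suffix, the retention count, and the choice to copy the previous version just before the rename are assumptions made for the example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function persistState(
  dir: string,
  engagementId: string,
  state: unknown,
  keepSnapshots = 3, // assumed retention count
): string {
  const target = path.join(dir, `state-${engagementId}.json`);
  const tmp = `${target}.tmp`;

  // Serialize and write to the temporary file, flushing to disk before the rename.
  const fd = fs.openSync(tmp, "w");
  try {
    fs.writeSync(fd, JSON.stringify(state));
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }

  // Preserve the previous version as a snapshot (hypothetical ".snap-<ms>" suffix).
  if (fs.existsSync(target)) {
    fs.copyFileSync(target, `${target}.snap-${Date.now()}`);
  }

  // Atomic rename: readers see either the old file or the new one, never a partial write.
  fs.renameSync(tmp, target);

  // Rotate: drop the oldest snapshots beyond the retention count.
  const prefix = `state-${engagementId}.json.snap-`;
  const snapshots = fs.readdirSync(dir).filter((f) => f.startsWith(prefix)).sort();
  for (const old of snapshots.slice(0, Math.max(0, snapshots.length - keepSnapshots))) {
    fs.unlinkSync(path.join(dir, old));
  }
  return target;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "overwatch-"));
persistState(dir, "demo", { nodes: 1 });
const statePath = persistState(dir, "demo", { nodes: 2 });
```

If the process dies before the rename, only the .tmp file is affected and the last good state file is untouched, which is what makes the crash-recovery claim hold.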
Mutation Journal (Write-Ahead Log)
Engagements created with an engagement_nonce (the deterministic-ID family, see Foundations) layer a write-ahead log on top of the snapshot:
- Every graph-affecting mutation (add_node, add_edge, merge_node_attrs, drop_edge) appends a MutationEntry { seq, ts, type, payload } to <state-file>.journal.jsonl and fsyncs before the in-memory mutation is applied.
- On load, the engine reads the snapshot, then replays journal entries with seq > journalSnapshotSeq. A crash between journal append and the next snapshot rotation is recoverable.
- Snapshot rotation truncates the journal up to the snapshot's seq. Crash mid-write leaves a partial line; the reader stops at it without poisoning subsequent replays.
Legacy engagements (no nonce) keep the debounced-snapshot-only path with no behavior change.
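The replay-on-load rule for nonce-bearing engagements can be sketched as a pure function over the journal text. The MutationEntry fields and the journalSnapshotSeq name follow the text above; replayJournal and the apply callback are hypothetical.

```typescript
interface MutationEntry {
  seq: number;
  ts: string;
  type: string;
  payload: unknown;
}

// Replays entries newer than the snapshot and returns the last sequence number applied.
function replayJournal(
  journalText: string,
  journalSnapshotSeq: number,
  apply: (entry: MutationEntry) => void,
): number {
  let lastSeq = journalSnapshotSeq;
  for (const line of journalText.split("\n")) {
    if (line.trim() === "") continue;
    let entry: MutationEntry;
    try {
      entry = JSON.parse(line) as MutationEntry;
    } catch {
      // A partial line means the process crashed mid-append: stop here.
      break;
    }
    if (entry.seq <= journalSnapshotSeq) continue; // already reflected in the snapshot
    apply(entry);
    lastSeq = entry.seq;
  }
  return lastSeq;
}

const journal = [
  '{"seq":1,"ts":"2024-01-01T00:00:00Z","type":"add_node","payload":{"id":"h1"}}',
  '{"seq":2,"ts":"2024-01-01T00:00:01Z","type":"add_node","payload":{"id":"h2"}}',
  '{"seq":3,"ts":"2024-01-01T00:00:02Z","type":"add_edge","payload":{"from":"h1","to":"h2"}}',
  '{"seq":4,"ts":"2024-01-01T00:00:03Z","ty', // truncated by a crash
].join("\n");

const applied: string[] = [];
const lastApplied = replayJournal(journal, 1, (e) => applied.push(e.type));
```

With a snapshot taken at seq 1, only entries 2 and 3 are applied; the truncated fourth line is ignored rather than treated as corruption.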
Foundations: Trust, Audit, Replay
Engagements created after the foundations work shipped get a coordinated set of guarantees. All of them are gated on engagement_nonce being populated, which is the strict-migration boundary for new engagements; legacy engagements keep their original UUID-based identity model.
| Guarantee | What it gives you | Where it lives |
|---|---|---|
| Hash-chained activity log | Tamper-evident SHA-256 chain over agent/system events. Default-on for new engagements. Signed checkpoints emitted every 500 events / 30 minutes so verifiers don't have to re-walk genesis. | src/services/activity-chain.ts |
| Content-addressed evidence | Every evidence row carries content_hash = sha256(content). Two runs with identical output deduplicate; tampering changes the address. Lookups accept either UUID or hash. | src/services/evidence-store.ts |
| Deterministic action / event IDs | act_<16hex> / evt_<16hex> derived from sha256(nonce \| agent_id \| timestamp \| command_signature \| sequence). Same inputs → same IDs. | src/services/deterministic-id.ts |
| Caller-provided clocks | engine.withClock(now, fn) pins time across a sequence of mutations so timestamps don't leak wall-clock noise into golden-master fixtures. | src/services/engine-context.ts (withClock, nowIso) |
| Write-ahead log | Crash-safe state recovery; mutation lost only when the journal append itself fails. | src/services/mutation-journal.ts |
| Golden-master replay | Replaying a tape against a fresh engine produces a byte-identical state hash. Real-behavior change requires explicit re-recording. | src/services/golden-replay.ts, fixtures in src/__tests__/golden-master/ |
| Frontier leases + heartbeat | Two agents can't claim the same frontier item; silent sub-agents get reaped. | src/services/frontier-leases.ts, src/services/agent-watchdog.ts |
For the threat model context — what these mitigate, what's still residual — see Threat Model.
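Two of the guarantees in the table, deterministic IDs and content-addressed evidence, reduce to a few lines of hashing. This sketch assumes the fields are joined with a literal "|" in the order listed and that "16hex" means the first 16 hex characters of the digest; the actual encoding in deterministic-id.ts may differ, and the function names here are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface IdInputs {
  nonce: string;
  agentId: string;
  timestamp: string;
  commandSignature: string;
  sequence: number;
}

// Assumed encoding: fields joined with "|", digest truncated to 16 hex characters.
function deterministicId(prefix: "act" | "evt", i: IdInputs): string {
  const material = [i.nonce, i.agentId, i.timestamp, i.commandSignature, String(i.sequence)].join("|");
  const digest = createHash("sha256").update(material, "utf8").digest("hex");
  return `${prefix}_${digest.slice(0, 16)}`;
}

// Content address for an evidence blob: identical content always maps to the same hash.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

const inputs: IdInputs = {
  nonce: "example-nonce",
  agentId: "agent-1",
  timestamp: "2024-01-01T00:00:00.000Z",
  commandSignature: "nmap -sV 10.0.0.5",
  sequence: 0,
};
const firstId = deterministicId("act", inputs);
const sameId = deterministicId("act", inputs);
const nextId = deterministicId("act", { ...inputs, sequence: 1 });
```

Pinning the timestamp, as the caller-provided clock does, is what makes the ID reproducible across replays; with a wall-clock timestamp the same tape would yield different IDs on every run.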
flowchart LR
subgraph New["new engagement (engagement_nonce populated)"]
ID[deterministic-id<br/>act_/evt_ from sha256]
WAL[mutation-journal<br/>journal.jsonl]
CHAIN[activity-chain<br/>+ signed checkpoints]
EV[evidence-store<br/>content_hash sha256]
LEASE[frontier-leases<br/>TTL on items]
WATCH[agent-watchdog<br/>heartbeat reaper]
REPLAY[golden-replay<br/>tape harness]
end
Action[action] --> ID
ID --> WAL
WAL --> Persist[snapshot.json]
Action --> CHAIN
Action --> EV
Register[register_agent] --> LEASE
Heartbeat[agent_heartbeat] --> WATCH
WATCH -. interrupts stale .-> LEASE
Replay[replayTape] --> ID
Replay --> Persist
Replay --> Hash[graph hash]
classDef new fill:#10b981,stroke:#047857,color:#fff
class ID,WAL,CHAIN,EV,LEASE,WATCH,REPLAY new
The whole substrate snaps together when the engagement carries a nonce: deterministic IDs become the keys for the journal, the journal feeds replay, replay validates the hash chain, and the chain references content-addressed evidence. Each layer is independent at the implementation level, but together they compose into a single audit story.
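A hash chain with checkpoint-based verification can be sketched as follows. This is an illustration of the idea only: the record layout and function names are hypothetical, and checkpoint signing is omitted, so the checkpoint here is simply a trusted (index, hash) pair.

```typescript
import { createHash } from "node:crypto";

interface ChainedEvent {
  index: number;
  payload: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

// Each link commits to the previous hash, so editing any event breaks every later link.
function linkHash(prevHash: string, index: number, payload: string): string {
  return createHash("sha256").update(`${prevHash}|${index}|${payload}`).digest("hex");
}

function appendEvent(chain: ChainedEvent[], payload: string): ChainedEvent {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const index = chain.length;
  const event: ChainedEvent = { index, payload, prevHash, hash: linkHash(prevHash, index, payload) };
  chain.push(event);
  return event;
}

// Verifies from a trusted checkpoint when one is given, otherwise from genesis.
function verifyFrom(chain: ChainedEvent[], checkpoint?: { index: number; hash: string }): boolean {
  let start = 0;
  let prev = GENESIS;
  if (checkpoint) {
    const anchor = chain[checkpoint.index];
    if (!anchor || anchor.hash !== checkpoint.hash) return false;
    start = checkpoint.index + 1;
    prev = checkpoint.hash;
  }
  for (let i = start; i < chain.length; i++) {
    const e = chain[i];
    if (e.prevHash !== prev || e.hash !== linkHash(prev, e.index, e.payload)) return false;
    prev = e.hash;
  }
  return true;
}

const chain: ChainedEvent[] = [];
for (const payload of ["validate", "execute", "parse", "report", "complete"]) {
  appendEvent(chain, payload);
}
const checkpoint = { index: 2, hash: chain[2].hash };
const intactFull = verifyFrom(chain);
const intactFromCheckpoint = verifyFrom(chain, checkpoint);
```

Verification from the checkpoint walks only the events after index 2, which is the O(events_since_checkpoint) property; the trade-off is that everything before the checkpoint is trusted on the strength of the checkpoint itself.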
Session + Transport Architecture
Overwatch supports two MCP transports: stdio (the default) and HTTP/SSE for remote deployment. Persistent interactive sessions run over three adapters (LocalPty, SSH, Socket) and provide 128KB ring buffers, cursor-based I/O, and TTY quality tracking.
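Cursor-based reads over a bounded buffer might look like the sketch below. It is simplified: it stores a string rather than bytes (so capacity counts UTF-16 code units), it trims by slicing instead of using a true circular array, and the method names and return shape are hypothetical.

```typescript
class RingBuffer {
  private data = "";
  private startOffset = 0; // absolute offset of data[0] within the whole session stream

  constructor(private readonly capacity: number) {}

  write(chunk: string): void {
    this.data += chunk;
    const overflow = this.data.length - this.capacity;
    if (overflow > 0) {
      // Drop the oldest output and advance the absolute start offset.
      this.data = this.data.slice(overflow);
      this.startOffset += overflow;
    }
  }

  // Returns everything written since `cursor`, the cursor to pass next time,
  // and whether output older than the buffer window was lost.
  read(cursor: number): { text: string; nextCursor: number; truncated: boolean } {
    const end = this.startOffset + this.data.length;
    const from = Math.min(Math.max(cursor, this.startOffset), end);
    return {
      text: this.data.slice(from - this.startOffset),
      nextCursor: end,
      truncated: cursor < this.startOffset,
    };
  }
}

const buffer = new RingBuffer(8); // tiny capacity so the example overflows; the text above cites 128KB
buffer.write("hello");
const first = buffer.read(0);
buffer.write(" world");
const incremental = buffer.read(first.nextCursor);
const stale = buffer.read(0);
```

Because cursors are absolute offsets, a caller that keeps up only ever receives new output, while a caller that falls behind gets what is still buffered plus a truncated flag instead of silently missing data.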
Broadcast Pipeline
When the graph changes, updates flow to the dashboard in real time. The dashboard also polls /api/state every 5 seconds as a fallback when WebSocket is disconnected.
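The Delta Accumulator row in the Services table describes debounced change tracking for these broadcasts. A minimal sketch of that idea, assuming changes are tracked as node IDs and coalesced into one message per quiet period, is shown here; the class shape and the broadcast callback are hypothetical.

```typescript
class DeltaAccumulator {
  private changed = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly delayMs: number,
    private readonly broadcast: (ids: string[]) => void,
  ) {}

  // Record a change and restart the quiet-period timer.
  record(id: string): void {
    this.changed.add(id);
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Send one batch with the de-duplicated set of changed IDs, if any.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.changed.size === 0) return;
    const ids = Array.from(this.changed).sort();
    this.changed.clear();
    this.broadcast(ids);
  }
}

const sent: string[][] = [];
const accumulator = new DeltaAccumulator(50, (ids) => sent.push(ids));
accumulator.record("host:a");
accumulator.record("host:b");
accumulator.record("host:a"); // duplicate within the same window
accumulator.flush(); // forced here; normally the timer fires after the quiet period
```

Coalescing keeps a burst of findings from turning into a burst of WebSocket messages; the dashboard receives a single batch naming each changed node once.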