Overwatch — Architecture & Codebase Review

Offensive security engagement orchestrator — MCP server with graph-based state management

Executive Summary

Overwatch is an MCP (Model Context Protocol) server that acts as the persistent state layer and reasoning substrate for LLM-powered penetration testing. Rather than stuffing engagement state into prompts, the LLM calls into a persistent graph engine that tracks every discovery, relationship, and hypothesis. After context compaction, a single get_state() call reconstructs a complete operational briefing with zero information loss.

The server exposes 51 MCP tools covering the full engagement lifecycle — from initial reconnaissance through post-engagement retrospective analysis. A directed property graph (built on graphology) models the attack surface: hosts, services, credentials, users, groups, AD objects, operator infrastructure, and their relationships. An inference engine generates hypothetical edges, a frontier computer prioritizes next actions, and a path analyzer finds shortest routes to objectives.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                    MCP Orchestrator Server                        │
│                                                                  │
│  ┌──────────────┐  ┌───────────────┐  ┌────────────────────┐    │
│  │ GraphEngine   │  │ Inference     │  │ FrontierComputer   │    │
│  │ (graphology)  │  │ Engine        │  │ (next actions)     │    │
│  └──────┬───────┘  └───────┬───────┘  └────────┬───────────┘    │
│         │                  │                    │                │
│  ┌──────▼──────────────────▼────────────────────▼────────────┐  │
│  │                 EngineContext (shared state)                │  │
│  └──────┬──────────────────┬────────────────────┬────────────┘  │
│         │                  │                    │                │
│  ┌──────▼───────┐  ┌──────▼───────┐  ┌────────▼───────────┐   │
│  │ Path         │  │ Identity     │  │ State              │   │
│  │ Analyzer     │  │ Resolution   │  │ Persistence        │   │
│  └──────────────┘  └──────────────┘  └────────────────────┘   │
│                                                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              51 MCP Tools (Zod-validated)                  │  │
│  │  state · findings · scoring · exploration · agents ·       │  │
│  │  logging · parsing · bloodhound · azurehound · inference   │  │
│  │  remediation · skills · toolcheck · processes · sessions   │  │
│  │  retrospective · scope · instructions · reporting          │  │
│  └──────────────────────────┬────────────────────────────────┘  │
│                              │                                   │
│  ┌───────────────────────────▼───────────────────────────────┐  │
│  │  Dashboard Server (HTTP + WebSocket, port 8384)            │  │
│  │  sigma.js WebGL graph · real-time delta broadcast          │  │
│  └───────────────────────────────────────────────────────────┘  │
└──────────────────────────────────┬───────────────────────────────┘
                                   │ stdio
              ┌────────────────────▼────────────────────┐
              │       LLM Operator (Claude/Opus)         │
              │    Primary Session + Sub-Agents          │
              └─────────────────────────────────────────┘

Design Principles

  • Graph-as-memory — All engagement state lives in the graph. After context compaction, get_state() reconstructs everything. No information loss across sessions.
  • Deterministic guardrails, LLM reasoning — Scope checks, deduplication, and OPSEC vetoes are enforced deterministically. The LLM handles attack chain reasoning, scoring, and sequencing.
  • Report early, report often — Every report_finding() triggers inference rules → new frontier items → reactive re-planning.
  • Identity resolution — Nodes are canonicalized on ingest. BloodHound SIDs, hostname variants, and credential fingerprints are merged automatically.
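
As an illustration of what a deterministic scope guardrail can look like, here is a minimal IPv4 CIDR matcher. It is a sketch only: the function names and the `exclusions` field are hypothetical and are not taken from the Overwatch source (the real logic lives in src/services/cidr.ts and may differ).

```typescript
// Hypothetical scope shape; the `exclusions` list is an assumption for illustration.
interface Scope {
  cidrs: string[];
  exclusions: string[];
}

// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some(p => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error("invalid IPv4 address: " + ip);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True when `ip` falls inside the `cidr` block (e.g. "10.0.0.0/16").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error("invalid prefix length in: " + cidr);
  }
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// In scope = matched by at least one CIDR and by no exclusion.
function inScope(ip: string, scope: Scope): boolean {
  const included = scope.cidrs.some(c => inCidr(ip, c));
  const excluded = scope.exclusions.some(c => inCidr(ip, c));
  return included && !excluded;
}
```

Because the check is pure arithmetic on the address, the same input always yields the same verdict, which is the property the guardrail principle relies on.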

Graph Model

Node Types (21)

| Type | Description |
| --- | --- |
| host | Network host (IP, hostname, OS) |
| service | Running service (port, protocol, version) |
| domain | Active Directory domain |
| user | Domain or local user account |
| group | AD security group |
| credential | Authentication material (password, hash, ticket, cert) |
| share | Network share (SMB, NFS) |
| certificate | X.509 certificate |
| ca | Certificate authority |
| cert_template | AD CS certificate template |
| pki_store | PKI store (NTAuth, issuance policy) |
| gpo | Group Policy Object |
| ou | Organizational Unit / container |
| subnet | Network subnet |
| objective | Engagement objective (virtual node) |
| webapp | Web application (URL, technology, framework, auth type) |
| vulnerability | Discovered vulnerability (CVE, CVSS, type, exploitability) |
| cloud_identity | Cloud IAM principal (user, role, service account) |
| cloud_resource | Cloud resource (S3 bucket, EC2, Lambda, Azure VM, etc.) |
| cloud_policy | Cloud IAM policy or RBAC role assignment |
| cloud_network | Cloud network construct (VPC, security group, subnet) |

Edge Types (63)

Organized by domain:

  • Network: REACHABLE, RUNS
  • Domain membership: MEMBER_OF, MEMBER_OF_DOMAIN, TRUSTS, SAME_DOMAIN
  • Access: ADMIN_TO, HAS_SESSION, CAN_RDPINTO, CAN_PSREMOTE
  • Credentials: VALID_ON, OWNS_CRED, DERIVED_FROM, DUMPED_FROM, POTENTIAL_AUTH, TESTED_CRED, SHARED_CREDENTIAL
  • AD attack paths: CAN_DCSYNC, DELEGATES_TO, CAN_DELEGATE_TO, WRITEABLE_BY, GENERIC_ALL, GENERIC_WRITE, WRITE_OWNER, WRITE_DACL, ADD_MEMBER, FORCE_CHANGE_PASSWORD, ALLOWED_TO_ACT, CAN_READ_LAPS, CAN_READ_GMSA, RBCD_TARGET
  • ADCS: CAN_ENROLL, ESC1 through ESC13, ISSUED_BY, OPERATES_CA
  • Roasting: AS_REP_ROASTABLE, KERBEROASTABLE
  • Lateral movement: RELAY_TARGET, NULL_SESSION
  • Web application: HOSTS, AUTHENTICATED_AS, VULNERABLE_TO, EXPLOITS
  • Cloud infrastructure: ASSUMES_ROLE, HAS_POLICY, POLICY_ALLOWS, EXPOSED_TO, RUNS_ON, MANAGED_BY
  • Objective: PATH_TO_OBJECTIVE
  • Generic: RELATED

All edges carry confidence, discovered_at, discovered_by, and optional inferred flag. Edge endpoints are validated against a schema defining valid (source_type → target_type) combinations.
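
The endpoint check described above can be pictured as a lookup from edge type to allowed (source_type → target_type) pairs. The sketch below is illustrative only: the `EDGE_SCHEMA` table, its specific pairs, and `isValidEdge` are assumptions, not the contents of src/services/graph-schema.ts.

```typescript
type NodeType = "host" | "service" | "user" | "group" | "credential";

// Edge properties named in the text above.
interface EdgeProps {
  confidence: number;    // assumed to be in the range 0..1
  discovered_at: string; // assumed ISO-8601 timestamp
  discovered_by: string;
  inferred?: boolean;
}

// Assumed constraint table: the pairs listed here are plausible examples only.
const EDGE_SCHEMA: Record<string, Array<[NodeType, NodeType]>> = {
  RUNS: [["host", "service"]],
  MEMBER_OF: [["user", "group"], ["group", "group"]],
  OWNS_CRED: [["user", "credential"]],
};

// An edge is valid when its type is known and its endpoint types match an allowed pair.
function isValidEdge(type: string, source: NodeType, target: NodeType): boolean {
  const allowed = EDGE_SCHEMA[type];
  if (!allowed) return false;
  return allowed.some(([s, t]) => s === source && t === target);
}

const example: EdgeProps = {
  confidence: 1.0,
  discovered_at: "2024-01-01T00:00:00Z",
  discovered_by: "nmap",
};
```

Rejecting unknown edge types outright, rather than letting them through, keeps a typo in a parser from silently adding an unconstrained edge.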

Inference Rules

Rules fire automatically when nodes are ingested. Each rule has:

  • Trigger: node type + optional property match + optional requires_edge (for edge-triggered rules)
  • Selectors: how to find related nodes (35 selector types including trigger_node, parent_host, domain_nodes, domain_users, edge_peers, enrollable_users, session_holders_on_host, ca_for_template, manage_ca_peers, and more)
  • Produces: edge type + confidence + condition

Example: "Host has SMB service with signing disabled → create RELAY_TARGET edge to domain hosts"

Fifty-three built-in rules span AD & service (21), ADCS (14), Linux privilege escalation (7), web application (6), MSSQL (2), and cloud infrastructure (3). Rules can be added at runtime via suggest_inference_rule and backfilled against existing graph nodes. See Graph Model — Inference Rules for the full reference.
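
To make the trigger / selector / produces shape concrete, here is a minimal sketch of the SMB-signing example. Everything in it is hypothetical: the `Rule` interface, the property names (`protocol`, `signing`), the 0.6 confidence, and the edge direction are assumptions, and the real engine resolves selectors by name rather than by inline function.

```typescript
interface GNode {
  id: string;
  type: string;
  props: Record<string, unknown>;
}

interface InferredEdge {
  source: string;
  target: string;
  type: string;
  confidence: number;
  inferred: true;
}

interface Rule {
  name: string;
  trigger: { nodeType: string; match?: Record<string, unknown> };
  selector: (trigger: GNode, all: GNode[]) => GNode[];
  produces: { edgeType: string; confidence: number };
}

// Assumed encoding of "SMB service with signing disabled -> RELAY_TARGET to hosts".
const smbRelayRule: Rule = {
  name: "smb-signing-disabled",
  trigger: { nodeType: "service", match: { protocol: "smb", signing: false } },
  selector: (_trigger, all) => all.filter(n => n.type === "host"),
  produces: { edgeType: "RELAY_TARGET", confidence: 0.6 },
};

// Fire one rule against one newly ingested node; returns hypothetical edges.
function fireRule(rule: Rule, node: GNode, all: GNode[]): InferredEdge[] {
  if (node.type !== rule.trigger.nodeType) return [];
  const match = rule.trigger.match || {};
  const propsMatch = Object.keys(match).every(k => node.props[k] === match[k]);
  if (!propsMatch) return [];
  return rule.selector(node, all).map(target => ({
    source: node.id,
    target: target.id,
    type: rule.produces.edgeType,
    confidence: rule.produces.confidence,
    inferred: true as const,
  }));
}
```

The produced edges are marked `inferred`, so they stay hypotheses until something tests them.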


Core Services

GraphEngine (src/services/graph-engine.ts — ~1,415 lines)

Central orchestrator wrapping all submodules. Key capabilities:

| Area | Methods |
| --- | --- |
| Mutations | addNode, addEdge, ingestFinding, correctGraph |
| Inference | runInferenceRules, backfillRule, addInferenceRule |
| Frontier | computeFrontier, filterFrontier |
| Paths | findPaths, findPathsToObjective |
| Validation | validateAction (scope + OPSEC) |
| Queries | queryGraph (type/filter/traversal) |
| State | getState, exportGraph, getHealthReport |
| Persistence | persist, loadState, rollback, listSnapshots |

EngineContext (src/services/engine-context.ts)

Shared mutable state holder for all submodules: graph instance, config, inference rules, activity log, agent map, tracked processes, path graph cache, and onUpdate callbacks.

StatePersistence (src/services/state-persistence.ts)

Atomic write-rename persistence with snapshot rotation (max 5 snapshots, every 30 seconds). Serializes: graph + activity log + agents + tracked processes. Supports rollback to any snapshot.
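
The write-rename pattern and snapshot rotation can be sketched with Node's synchronous fs API. This is a simplified illustration, not the actual state-persistence.ts: the file names (`state.json`, `snapshot-<seq>.json`) and function names are hypothetical, the 30-second timer is omitted, and a production version would likely also fsync before renaming.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const MAX_SNAPSHOTS = 5; // rotation limit stated in the text above

// Write to a temp file in the same directory, then rename over the target.
// A rename within one filesystem replaces the target in a single step on POSIX,
// so a crash leaves either the old file or the new one, never a partial write.
function persistAtomic(dir: string, state: unknown): void {
  const target = path.join(dir, "state.json");
  const tmp = target + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, target);
}

// Write a numbered snapshot, then delete the oldest ones beyond the limit.
function writeSnapshot(dir: string, state: unknown, seq: number): void {
  const name = "snapshot-" + ("00000000" + seq).slice(-8) + ".json";
  fs.writeFileSync(path.join(dir, name), JSON.stringify(state));
  const snaps = fs.readdirSync(dir).filter(f => f.indexOf("snapshot-") === 0).sort();
  snaps
    .slice(0, Math.max(0, snaps.length - MAX_SNAPSHOTS))
    .forEach(old => fs.unlinkSync(path.join(dir, old)));
}

// Usage: seven revisions leave state.json at the latest one plus five snapshots.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "overwatch-sketch-"));
for (let rev = 1; rev <= 7; rev++) {
  persistAtomic(demoDir, { rev });
  writeSnapshot(demoDir, { rev }, rev);
}
const keptSnapshots = fs
  .readdirSync(demoDir)
  .filter(f => f.indexOf("snapshot-") === 0)
  .sort();
```

Zero-padded sequence numbers make lexical order equal chronological order, which keeps the rotation step a plain sort-and-slice.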

InferenceEngine (src/services/inference-engine.ts)

Rule-based edge production. When nodes are ingested, matching rules fire to create hypothetical edges with confidence scores. Supports 15 selector types for relating trigger nodes to targets. Includes edge-triggered rules (requires_edge) for cross-node patterns like LAPS/gMSA readability and RBCD targeting.

FrontierComputer (src/services/frontier.ts)

Generates candidate next actions from two sources:

  1. Incomplete nodes: missing key properties (e.g., host without services enumerated)
  2. Untested inferred edges: hypothetical edges from inference awaiting validation

Each item carries fan-out estimates and OPSEC noise ratings.
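
A rough sketch of the two sources follows. The types, the `REQUIRED` property table, and the `tested` flag are assumptions made for illustration; fan-out estimates and OPSEC noise ratings are left out to keep the example short.

```typescript
interface FNode {
  id: string;
  type: string;
  props: Record<string, unknown>;
}

interface FEdge {
  source: string;
  target: string;
  type: string;
  inferred?: boolean;
  tested?: boolean; // hypothetical marker for "this hypothesis was validated or refuted"
}

type FrontierItem =
  | { kind: "incomplete_node"; nodeId: string; missing: string[] }
  | { kind: "untested_edge"; source: string; target: string; edgeType: string };

// Assumed "key properties" per node type.
const REQUIRED: Record<string, string[]> = {
  host: ["services"],
  service: ["version"],
};

function computeFrontierSketch(nodes: FNode[], edges: FEdge[]): FrontierItem[] {
  const items: FrontierItem[] = [];
  // Source 1: nodes missing a key property.
  for (const n of nodes) {
    const missing = (REQUIRED[n.type] || []).filter(k => n.props[k] === undefined);
    if (missing.length > 0) items.push({ kind: "incomplete_node", nodeId: n.id, missing });
  }
  // Source 2: inferred edges nobody has tested yet.
  for (const e of edges) {
    if (e.inferred && !e.tested) {
      items.push({ kind: "untested_edge", source: e.source, target: e.target, edgeType: e.type });
    }
  }
  return items;
}
```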

PathAnalyzer (src/services/path-analyzer.ts)

BFS-based shortest path on an undirected confidence-weighted projection. Resolves objective targets from engagement config criteria. Computes per-hop and total path confidence. Cached path graph with invalidation on mutations.
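
The idea can be sketched as a plain BFS over an undirected adjacency map, with path confidence computed afterwards. This is an assumption-laden illustration: it minimizes hop count only, and it takes total confidence to be the product of per-hop confidences, which the text above does not specify.

```typescript
interface PEdge {
  source: string;
  target: string;
  confidence: number;
}

interface PathResult {
  path: string[];
  confidence: number; // assumed: product of per-hop confidences
}

function shortestPathSketch(edges: PEdge[], start: string, goal: string): PathResult | null {
  // Build the undirected projection: every edge is traversable both ways.
  const adj = new Map<string, Array<{ to: string; confidence: number }>>();
  const link = (a: string, b: string, c: number): void => {
    if (!adj.has(a)) adj.set(a, []);
    adj.get(a)!.push({ to: b, confidence: c });
  };
  for (const e of edges) {
    link(e.source, e.target, e.confidence);
    link(e.target, e.source, e.confidence);
  }

  // Standard BFS, remembering how each node was first reached.
  const prev = new Map<string, { from: string; confidence: number }>();
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const cur = queue.shift()!;
    if (cur === goal) break;
    for (const { to, confidence } of adj.get(cur) || []) {
      if (seen.has(to)) continue;
      seen.add(to);
      prev.set(to, { from: cur, confidence });
      queue.push(to);
    }
  }
  if (!seen.has(goal)) return null;

  // Walk back from the goal, multiplying hop confidences along the way.
  const path: string[] = [goal];
  let total = 1;
  for (let n = goal; n !== start; ) {
    const step = prev.get(n)!;
    total *= step.confidence;
    path.unshift(step.from);
    n = step.from;
  }
  return { path, confidence: total };
}
```

Note that BFS picks the path with the fewest hops even when a longer path has higher total confidence; a weighted search would be needed to optimize for confidence directly.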

Identity Resolution (src/services/identity-resolution.ts)

Resolves canonical IDs for nodes by type. Generates identity markers for matching (hostname variants, SIDs, domain-qualified usernames, credential fingerprints). Handles ambiguous BloodHound principals.

Identity Reconciliation (src/services/identity-reconciliation.ts)

Post-ingest merge logic. When a canonical node is added, finds alias nodes sharing identity markers and merges them — retargets edges, merges properties, logs convergence events. Supports bidirectional merge (weaker canonical merges into stronger existing node).
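
Marker generation and alias merging might look roughly like the following. The marker format (`ip:`, `host:` prefixes), the short-name variant, and the "canonical wins on conflict" policy are all assumptions for illustration; the real modules cover more node types and ambiguity cases.

```typescript
interface INode {
  id: string;
  markers: Set<string>;
  props: Record<string, unknown>;
}

interface IEdge {
  source: string;
  target: string;
  type: string;
}

// Generate identity markers for a host from whatever identifiers are known.
function hostMarkers(props: { ip?: string; hostname?: string }): Set<string> {
  const markers = new Set<string>();
  if (props.ip) markers.add("ip:" + props.ip);
  if (props.hostname) {
    const fqdn = props.hostname.toLowerCase().replace(/\.$/, "");
    markers.add("host:" + fqdn);
    // Short-name variant; in a multi-domain environment this could over-match.
    markers.add("host:" + fqdn.split(".")[0]);
  }
  return markers;
}

// Two nodes are merge candidates when they share at least one marker.
function sharesIdentity(a: Set<string>, b: Set<string>): boolean {
  return Array.from(a).some(m => b.has(m));
}

// Fold the alias into the canonical node and retarget its edges in place.
function mergeAlias(canonical: INode, alias: INode, edges: IEdge[]): void {
  alias.markers.forEach(m => canonical.markers.add(m));
  canonical.props = { ...alias.props, ...canonical.props }; // canonical wins on conflict
  for (const e of edges) {
    if (e.source === alias.id) e.source = canonical.id;
    if (e.target === alias.id) e.target = canonical.id;
  }
}
```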

Graph Health (src/services/graph-health.ts)

Eight integrity checks:

  1. Split host identities (multiple nodes claiming same IP/hostname)
  2. Dangling edge references
  3. Unresolved identity nodes
  4. Credential identity ambiguities
  5. Identity marker collisions
  6. Shared credential material across accounts
  7. Edge type constraint violations
  8. Stale inferred edges
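
Two of these checks are simple enough to sketch directly. The node and edge shapes and the function names below are hypothetical, and the split-host check only looks at IP addresses, not hostnames.

```typescript
interface HNode {
  id: string;
  type: string;
  ip?: string;
}

interface HEdge {
  source: string;
  target: string;
  type: string;
}

// Dangling edge references: an endpoint ID that no node carries.
function findDanglingEdges(nodes: HNode[], edges: HEdge[]): HEdge[] {
  const ids = new Set(nodes.map(n => n.id));
  return edges.filter(e => !ids.has(e.source) || !ids.has(e.target));
}

// Split host identities: more than one host node claiming the same IP.
function findSplitHosts(nodes: HNode[]): Record<string, string[]> {
  const byIp: Record<string, string[]> = {};
  for (const n of nodes) {
    if (n.type !== "host" || !n.ip) continue;
    if (!byIp[n.ip]) byIp[n.ip] = [];
    byIp[n.ip].push(n.id);
  }
  const split: Record<string, string[]> = {};
  Object.keys(byIp).forEach(ip => {
    if (byIp[ip].length > 1) split[ip] = byIp[ip];
  });
  return split;
}
```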

Output Parsers (src/services/parsers/)

Twenty-one deterministic parsers with 36 aliases:

| Parser | Input | Output |
| --- | --- | --- |
| nmap | Nmap XML | host + service nodes, RUNS edges, OS detection |
| nxc / netexec | NXC stdout | host + SMB services + shares + users, access edges, NULL_SESSION, linked SQL servers |
| certipy | Certipy JSON | CA + cert_template nodes, ESC vulnerability edges |
| secretsdump | SAM/NTDS dump | credential + user nodes, OWNS_CRED + DUMPED_FROM + MEMBER_OF_DOMAIN edges |
| kerbrute | User enum / spray | user + domain + credential nodes |
| hashcat | Cracked hashes | credential nodes (Kerberoast, AS-REP, NTLMv2, NTLM) |
| responder | NTLMv2 captures | host + user + credential nodes |
| ldapsearch | LDIF / ldapdomaindump JSON | user + group + host + domain nodes, UAC flags, group memberships |
| enum4linux | JSON (-oJ) or text | host + SMB service + user + group + share nodes, null session detection |
| rubeus | Kerberoast / AS-REP / monitor | user + credential nodes, OWNS_CRED edges (TGT/TGS detection) |
| gobuster / feroxbuster / ffuf | Text or JSON | service node enrichment with discovered_paths, login form detection |
| linpeas / linenum | ANSI text | host enrichment: kernel version, SUID binaries, docker socket, capabilities, cron jobs |
| nuclei | JSON, JSONL, or text | vulnerability + webapp nodes, VULNERABLE_TO edges (text: [id] [proto] [severity] url) |
| nikto | Text or JSON | per-path vulnerability + webapp nodes |
| testssl / sslscan | JSON or text | TLS vulnerability detection (Heartbleed, POODLE, DROWN, etc.) |
| pacu | JSON | cloud_identity + cloud_resource + cloud_policy nodes, IAM edges |
| prowler | OCSF JSON-lines | cloud_resource nodes, compliance findings, VULNERABLE_TO edges |
| burp | Burp Suite XML | vulnerability + webapp nodes, VULNERABLE_TO edges, CVE/CVSS |
| zap | OWASP ZAP XML | vulnerability + webapp nodes, VULNERABLE_TO edges |
| sqlmap | sqlmap text/JSON | vulnerability nodes (SQLi type, DBMS, technique), credential extraction |
| wpscan | WPScan JSON | plugin/theme vulnerabilities, user enumeration |

All parsers use canonical ID generation with SHA-1 fingerprinting for credential deduplication. Parsers accept optional ParseContext (domain, source_host, cloud_account, cloud_region, network_zone) for ambient context.
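
A minimal sketch of fingerprint-based credential IDs, using Node's built-in crypto module. Which fields feed the hash, the NUL separator, the lowercasing of the principal, and the `cred:<kind>:<digest>` ID format are assumptions here, not the scheme in parser-utils.ts.

```typescript
import { createHash } from "node:crypto";

// Derive a stable ID from credential material so the same secret reported by
// two different parsers collapses onto one node.
function credentialId(kind: string, principal: string, secret: string): string {
  // NUL-separated fields avoid ambiguity between e.g. ("ab", "c") and ("a", "bc").
  const material = [kind, principal.toLowerCase(), secret].join("\u0000");
  const digest = createHash("sha1").update(material, "utf8").digest("hex");
  return "cred:" + kind + ":" + digest.slice(0, 16);
}

// Usage: the same hash seen by two tools with different casing yields one ID.
const fromSecretsdump = credentialId("ntlm", "CORP\\alice", "31d6cfe0d16ae931b73c59d7e0c089c0");
const fromHashcat = credentialId("ntlm", "corp\\alice", "31d6cfe0d16ae931b73c59d7e0c089c0");
```

SHA-1 serves here only as a deduplication fingerprint; the ID is not relied on for any security property.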

BloodHound Ingestion (src/services/bloodhound-ingest.ts — 701 lines)

Full SharpHound v4/v5 JSON parser. Maps all BH object types (computers, users, groups, domains, OUs, GPOs, cert templates, CAs, PKI stores) to Overwatch nodes. Processes ACEs, group memberships, sessions, local admins, delegation, SPN targets. Builds cross-file SID maps for reference resolution.

Skill Index (src/services/skill-index.ts)

Local TF-IDF search over 34 markdown skill files. No external vector DB — runs entirely locally. Lightweight stemming, tag and name bonuses, ranked results with excerpts.
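
The ranking approach can be approximated in a few dozen lines. The sketch below is not the actual skill-index.ts: the suffix-stripping "stemmer", the smoothed IDF formula, and the bonus weights (0.5 for a tag hit, 1.0 for a name hit) are assumptions, and excerpt extraction is omitted.

```typescript
interface Skill {
  name: string;
  tags: string[];
  body: string;
}

// Lowercase, split on non-alphanumerics, and strip a few common suffixes.
function tokenize(text: string): string[] {
  const raw: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return raw.map(t => t.replace(/(ing|ed|s)$/, "")).filter(t => t.length > 0);
}

function searchSkills(skills: Skill[], query: string): Array<{ name: string; score: number }> {
  const docs = skills.map(s => tokenize(s.body));
  // Document frequency: in how many skill bodies each term appears.
  const df = new Map<string, number>();
  docs.forEach(doc => new Set(doc).forEach(t => df.set(t, (df.get(t) || 0) + 1)));

  const terms = tokenize(query);
  const results = skills.map((skill, i) => {
    const nameTokens = tokenize(skill.name);
    const tagTokens = tokenize(skill.tags.join(" "));
    let score = 0;
    for (const term of terms) {
      const tf = docs[i].filter(t => t === term).length / Math.max(1, docs[i].length);
      const idf = Math.log(1 + skills.length / (1 + (df.get(term) || 0)));
      score += tf * idf;
      if (tagTokens.indexOf(term) !== -1) score += 0.5; // assumed tag bonus
      if (nameTokens.indexOf(term) !== -1) score += 1.0; // assumed name bonus
    }
    return { name: skill.name, score };
  });
  return results.filter(r => r.score > 0).sort((a, b) => b.score - a.score);
}
```

Running the same tokenizer over queries and documents is what lets a crude stemmer work: "kerberoasting" and "kerberoast" reduce to the same token on both sides.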

Dashboard Server (src/services/dashboard-server.ts)

HTTP + WebSocket server on port 8384 (configurable). Serves a sigma.js WebGL dashboard SPA. Broadcasts graph deltas to connected clients with 500ms debounced batching via DeltaAccumulator. Read-only — no mutations from browser. API endpoints: /api/state, /api/graph.
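
Debounced delta batching can be sketched as follows. The class name matches the one mentioned above, but the `Delta` shape, method names, and constructor signature are hypothetical.

```typescript
interface Delta {
  addedNodes: string[];
  addedEdges: string[];
}

class DeltaAccumulator {
  private pending: Delta = { addedNodes: [], addedEdges: [] };
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly broadcast: (delta: Delta) => void;
  private readonly delayMs: number;

  constructor(broadcast: (delta: Delta) => void, delayMs: number = 500) {
    this.broadcast = broadcast;
    this.delayMs = delayMs;
  }

  // Merge a change into the pending batch and (re)start the debounce timer.
  push(delta: Partial<Delta>): void {
    this.pending.addedNodes.push(...(delta.addedNodes || []));
    this.pending.addedEdges.push(...(delta.addedEdges || []));
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Send the batch now; also called by the timer when the quiet period ends.
  flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.addedNodes.length === 0 && this.pending.addedEdges.length === 0) return;
    const out = this.pending;
    this.pending = { addedNodes: [], addedEdges: [] };
    this.broadcast(out);
  }
}
```

One trade-off of a pure debounce is that a continuous stream of updates keeps resetting the timer; a maximum-wait cap would bound that delay if it became an issue.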

Lab Preflight (src/services/lab-preflight.ts)

Aggregate readiness checks for lab workflows across all 6 profiles (goad_ad, single_host, network, web_app, cloud, hybrid). Validates config, scope, tool availability, graph health, persistence safety, dashboard status, and graph stage. Profile is inferred from scope if not explicitly set; network must be set explicitly as it is never auto-inferred.

Session Manager (src/services/session-manager.ts)

Persistent interactive sessions with three transport adapters (local PTY via node-pty, SSH via node-pty, TCP socket for reverse shells). Each session has a 128KB ring buffer with absolute monotonic cursor positions for cursor-based reads. Ownership enforcement via claimed_by — single writer, many readers, force override. Sessions are ephemeral (not persisted across restarts).
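
The cursor-based read model can be illustrated with a small ring buffer whose positions never reset. This is a sketch under assumptions: it stores a string and counts UTF-16 code units rather than bytes, and the `truncated` flag and method names are hypothetical.

```typescript
class RingBuffer {
  private data = "";
  private end = 0; // absolute position one past the last unit ever written
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.capacity = capacity;
  }

  // Absolute position of the oldest unit still retained.
  get start(): number {
    return this.end - this.data.length;
  }

  // Append output, discarding the oldest data beyond capacity.
  write(text: string): void {
    this.end += text.length;
    this.data = (this.data + text).slice(-this.capacity);
  }

  // Read everything from `cursor` onward; returns the cursor for the next read.
  read(cursor: number): { text: string; cursor: number; truncated: boolean } {
    const from = Math.min(Math.max(cursor, this.start), this.end);
    return {
      text: this.data.slice(from - this.start),
      cursor: this.end,
      truncated: cursor < this.start, // some requested data was already overwritten
    };
  }
}
```

Because cursors are absolute and monotonic, several readers can each keep their own position, and a reader that falls behind can detect the gap rather than silently missing output.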

Session Adapters (src/services/session-adapters.ts)

Three transport implementations: LocalPtyAdapter (node-pty spawn), SshAdapter (SSH via node-pty), SocketAdapter (net.createServer/connect for bind/reverse shells). Socket sessions start in pending state and transition to connected when a connection arrives.


MCP Tools (51)

All tools are wrapped in withErrorBoundary — unhandled errors return structured MCP error responses instead of crashing the server.
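
A wrapper of this kind might look like the sketch below. The result shape follows the MCP tool-result convention of a `content` array plus an `isError` flag; the function signature and the error message format are assumptions, not the contents of src/tools/error-boundary.ts.

```typescript
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

type ToolHandler<A> = (args: A) => Promise<ToolResult>;

// Wrap a handler so a thrown error becomes a structured error result
// instead of an unhandled rejection.
function withErrorBoundary<A>(name: string, handler: ToolHandler<A>): ToolHandler<A> {
  return async (args: A): Promise<ToolResult> => {
    try {
      return await handler(args);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return {
        isError: true,
        content: [{ type: "text", text: name + " failed: " + message }],
      };
    }
  };
}

// Usage: a handler that always throws still resolves to a result object.
const failingTool = withErrorBoundary("demo_tool", async (_args: { id: string }) => {
  throw new Error("node not found");
});
```

Returning the error as a normal result lets the LLM read the message and adjust its next call, which an exception at the transport layer would not allow.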

State & Lifecycle

| Tool | Purpose |
| --- | --- |
| get_state | Full engagement briefing (primary recovery after compaction) |
| get_history | Activity log with optional agent filtering |
| export_graph | Complete graph dump for reporting |
| run_lab_preflight | Lab readiness checks (profile-specific) |
| run_graph_health | Full graph integrity report |
| recompute_objectives | Re-evaluate objective achievement status |

Findings & Parsing

| Tool | Purpose |
| --- | --- |
| report_finding | Primary data ingestion: nodes + edges + evidence |
| get_evidence | Retrieve full-fidelity evidence by ID or list stored evidence records |
| parse_output | Deterministic tool output parsing (21 parsers, 36 aliases) |
| ingest_bloodhound | SharpHound/bloodhound-python JSON ingestion |
| ingest_azurehound | AzureHound / ROADtools JSON ingestion |

Scoring & Planning

| Tool | Purpose |
| --- | --- |
| next_task | Filtered frontier candidates for LLM scoring |
| validate_action | Pre-execution scope + OPSEC sanity check |
| log_action_event | Structured action lifecycle logging (plan → start → complete/fail) |

Exploration

| Tool | Purpose |
| --- | --- |
| query_graph | Structured graph queries with filtering and traversal |
| find_paths | Shortest path analysis between nodes or to objectives |

Agents

| Tool | Purpose |
| --- | --- |
| register_agent | Dispatch sub-agent with scoped subgraph |
| dispatch_agents | Batch agent dispatch from frontier items |
| dispatch_subnet_agents | One agent per scope CIDR for parallel subnet enumeration |
| get_agent_context | Scoped subgraph view for agents |
| update_agent | Mark agent task completed/failed |

Infrastructure

| Tool | Purpose |
| --- | --- |
| get_skill | RAG skill search and retrieval (34 skills) |
| check_tools | Offensive tool availability detection |
| track_process | Register long-running scan for tracking |
| check_processes | Check tracked process status |
| suggest_inference_rule | Custom inference rule creation + backfill |
| correct_graph | Transactional graph repair (drop/replace edges, patch nodes) |
| update_scope | Confirmation-gated runtime scope expansion/contraction |
| get_system_prompt | Dynamic agent instructions from engagement state |
| generate_report | Full pentest report with findings, narrative, evidence, remediation |
| run_retrospective | Post-engagement analysis (5 structured outputs) |

Sessions

| Tool | Purpose |
| --- | --- |
| open_session | Create persistent interactive session (SSH, PTY, socket) |
| write_session | Write raw bytes to a session (I/O primitive) |
| read_session | Cursor-based read from session buffer |
| send_to_session | [Experimental] Write command + wait + read |
| list_sessions | List sessions with metadata |
| update_session | Update capabilities, title, ownership |
| resize_session | Resize terminal dimensions (PTY only) |
| signal_session | Send signal (SIGINT, SIGTERM, etc.) |
| close_session | Close and destroy a session |

Validation Pipeline

Every finding passes through a multi-stage validation pipeline before entering the graph:

Raw Input (report_finding / parse_output / ingest_bloodhound)
┌─────────────────────────┐
│  Finding Validation      │  Normalize credentials, check edge constraints,
│  (finding-validation.ts) │  verify node references exist
└────────────┬────────────┘
┌─────────────────────────┐
│  Identity Resolution     │  Generate canonical IDs, match identity markers,
│  (identity-resolution.ts)│  classify ambiguous principals
└────────────┬────────────┘
┌─────────────────────────┐
│  Graph Ingestion         │  Add nodes/edges, merge properties,
│  (graph-engine.ts)       │  track provenance (first_seen, sources)
└────────────┬────────────┘
┌─────────────────────────┐
│  Identity Reconciliation │  Merge alias nodes into canonicals,
│  (identity-reconcil...)  │  retarget edges, log convergence
└────────────┬────────────┘
┌─────────────────────────┐
│  Inference Engine        │  Fire matching rules, produce hypothetical
│  (inference-engine.ts)   │  edges, update frontier
└────────────┬────────────┘
┌─────────────────────────┐
│  Objective Evaluation    │  Check if any objective criteria are now
│  (graph-engine.ts)       │  satisfied by graph state
└────────────┬────────────┘
┌─────────────────────────┐
│  Persistence + Broadcast │  Atomic write-rename to disk,
│  (state-persistence.ts)  │  WebSocket delta to dashboard
└─────────────────────────┘

Retrospective Analysis

The run_retrospective tool produces five structured outputs:

  1. Inference Rule Suggestions — Edge patterns observed 3+ times without matching rules
  2. Skill Gap Analysis — Unused skills, missing skills for encountered scenarios, failed techniques
  3. Context Improvement Report — Logging quality assessment, trace quality metrics
  4. Attack Path Report — Client-deliverable markdown summarizing attack chains
  5. RLVR Training Traces — State→action→outcome triplets with heuristic rewards for model fine-tuning
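
The first output, suggesting rules from recurring edge patterns, can be sketched as a frequency count over (source type, edge type, target type) triples. The interpretation of "without matching rules" used here, that an edge is covered when some existing rule already produces its edge type, is an assumption, as are all names in the example.

```typescript
interface TNode {
  id: string;
  type: string;
}

interface TEdge {
  source: string;
  target: string;
  type: string;
  inferred?: boolean;
}

interface PatternCount {
  pattern: string;
  count: number;
}

function suggestPatterns(
  nodes: TNode[],
  edges: TEdge[],
  ruleEdgeTypes: string[], // edge types that existing rules already produce
  minCount: number = 3,
): PatternCount[] {
  const typeOf = new Map<string, string>();
  nodes.forEach(n => typeOf.set(n.id, n.type));

  // Count confirmed edges per (source type, edge type, target type) triple.
  const counts = new Map<string, number>();
  for (const e of edges) {
    if (e.inferred || ruleEdgeTypes.indexOf(e.type) !== -1) continue;
    const s = typeOf.get(e.source);
    const t = typeOf.get(e.target);
    if (!s || !t) continue;
    const key = s + " -[" + e.type + "]-> " + t;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  const out: PatternCount[] = [];
  counts.forEach((count, pattern) => {
    if (count >= minCount) out.push({ pattern, count });
  });
  return out.sort((a, b) => b.count - a.count);
}
```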

Testing

73 test files across the codebase (see Vitest output for exact test count):

| Area | Files | Coverage |
| --- | --- | --- |
| Bootstrap | config.test.ts, app-bootstrap.test.ts | Config parsing and transport-neutral app bootstrap (51 tools) |
| Integration | mcp-server.integration.test.ts, http-transport.integration.test.ts | All 51 tools via stdio + HTTP/SSE transport |
| Core Engine | graph-engine.test.ts | Seeding, ingestion, inference, persistence, rollback, identity, cold store integration |
| Services | 24 test files | CIDR, BloodHound, parsers (21), identity resolution, identity reconciliation, health, credentials, credential lifecycle, preflight, retrospective, dashboard, delta accumulator, graph schema, session manager, community detection, prompt generator, report generator, parser utils + sprint test suites (compaction, web surface, hardening, cloud graph, Linux/network, architecture prep) |
| Tools | 10+ test files | Tool handlers: agents, findings, scoring, state, reporting, instructions, remediation, sessions, parse-output, activity logging, error boundary, processes |
| Dashboard | 5 test files | Boot, graph rendering, UI, WebSocket, main |
| CLI | lab-smoke.test.ts | End-to-end lab workflow |

The integration tests spawn the actual MCP server (stdio and HTTP) and validate tool registration, state retrieval, health checks, BloodHound ingestion, output parsing, graph queries, agent lifecycle, retrospective analysis, and concurrent sessions.


Technology Stack

| Component | Technology |
| --- | --- |
| Runtime | Node.js (ESM) |
| Language | TypeScript (strict mode) |
| Protocol | MCP via @modelcontextprotocol/sdk |
| Graph | graphology + graphology-shortest-path + graphology-traversal |
| Validation | Zod schemas (runtime + compile-time) |
| XML Parsing | fast-xml-parser |
| Dashboard | sigma.js (WebGL) + graphology-layout-forceatlas2 |
| WebSocket | ws |
| IDs | uuid v4 |
| Testing | vitest |
| Persistence | Atomic JSON write-rename with snapshot rotation |

Project Structure

overwatch/
├── src/
│   ├── app.ts                      # Core bootstrap + transport-neutral tool registration
│   ├── index.ts                    # Stdio entrypoint + graceful shutdown
│   ├── config.ts                   # Config loading + validation
│   ├── types.ts                    # Zod schemas + TypeScript types
│   ├── tools/                      # 20 MCP tool modules
│   │   ├── state.ts                # get_state, preflight, health, history, export, recompute_objectives
│   │   ├── findings.ts             # report_finding
│   │   ├── scoring.ts              # next_task, validate_action
│   │   ├── exploration.ts          # query_graph, find_paths
│   │   ├── agents.ts               # register_agent, dispatch_agents, dispatch_subnet_agents, get_agent_context, update_agent
│   │   ├── logging.ts              # log_action_event
│   │   ├── parse-output.ts         # parse_output
│   │   ├── bloodhound.ts           # ingest_bloodhound
│   │   ├── azurehound.ts           # ingest_azurehound
│   │   ├── inference.ts            # suggest_inference_rule
│   │   ├── remediation.ts          # correct_graph
│   │   ├── retrospective.ts        # run_retrospective
│   │   ├── skills.ts               # get_skill
│   │   ├── toolcheck.ts            # check_tools
│   │   ├── processes.ts            # track_process, check_processes
│   │   ├── sessions.ts             # open/write/read/send_to/list/update/resize/signal/close_session
│   │   ├── scope.ts                # update_scope
│   │   ├── instructions.ts         # get_system_prompt
│   │   ├── reporting.ts            # generate_report
│   │   └── error-boundary.ts       # withErrorBoundary wrapper
│   ├── services/                   # Core business logic (43 modules)
│   │   ├── graph-engine.ts         # Central orchestrator
│   │   ├── engine-context.ts       # Shared mutable state (graph, config, rules, cold store)
│   │   ├── state-persistence.ts    # Atomic persistence + snapshots + cold store serialization
│   │   ├── inference-engine.ts     # Rule-based edge production (53 built-in rules)
│   │   ├── frontier.ts             # Frontier computation (6 item types)
│   │   ├── path-analyzer.ts        # OPSEC-weighted shortest paths (confidence/stealth/balanced)
│   │   ├── identity-resolution.ts  # Canonical ID generation
│   │   ├── identity-reconciliation.ts # Alias node merging
│   │   ├── finding-validation.ts   # Pre-ingest validation
│   │   ├── graph-schema.ts         # Edge endpoint constraints
│   │   ├── graph-health.ts         # 8 integrity checks + contextual AD filtering
│   │   ├── credential-utils.ts     # Credential classification + lifecycle
│   │   ├── credential-coverage.ts  # Credential × target coverage matrix
│   │   ├── parsers/              # 21 deterministic parsers (36 aliases)
│   │   ├── parser-utils.ts         # Canonical ID helpers
│   │   ├── provenance-utils.ts     # Node provenance normalization
│   │   ├── bloodhound-ingest.ts    # SharpHound v4/v5 (CE) JSON parser
│   │   ├── azurehound-ingest.ts    # AzureHound / ROADtools JSON parser
│   │   ├── cold-store.ts           # Promotion-only compaction for large network sweeps
│   │   ├── community-detection.ts  # Louvain modularity for graph clustering
│   │   ├── skill-index.ts          # TF-IDF skill search
│   │   ├── dashboard-server.ts     # HTTP + WebSocket server
│   │   ├── delta-accumulator.ts    # Graph delta batching
│   │   ├── lab-preflight.ts        # Lab readiness checks (6 profiles)
│   │   ├── agent-manager.ts        # Agent CRUD
│   │   ├── retrospective.ts        # Post-engagement analysis
│   │   ├── cidr.ts                 # CIDR parsing, expansion, scope matching
│   │   ├── tool-check.ts           # Tool availability detection
│   │   ├── process-tracker.ts      # PID tracking for long-running scans
│   │   ├── session-manager.ts      # Persistent sessions, RingBuffer, ownership
│   │   ├── session-adapters.ts     # LocalPty, SSH, Socket transport adapters
│   │   ├── prompt-generator.ts     # Dynamic system prompt generation
│   │   ├── report-generator.ts     # Per-finding sections, evidence chains, narrative, remediation
│   │   └── report-html.ts          # Self-contained HTML report renderer
│   ├── dashboard/                  # Browser SPA (6 files)
│   │   ├── index.html              # Slim HTML shell loading CDN deps + local scripts
│   │   ├── styles.css              # Dark theme, animations, responsive layout
│   │   ├── graph.js                # Sigma.js, FA2, drag, hover, path/attack/credential overlays, community hulls
│   │   ├── ui.js                   # Sidebar panels, node detail, search, keyboard shortcuts
│   │   ├── ws.js                   # WebSocket + HTTP polling, reconnect
│   │   └── main.js                 # Entry point wiring modules
│   └── cli/                        # CLI tools
│       ├── lab-smoke.ts            # Lab smoke test harness
│       ├── lab-smoke-lib.ts        # Smoke test library
│       └── retrospective.ts        # CLI retrospective runner
├── skills/                         # 34 offensive methodology guides
├── fixtures/                       # Test fixtures (GOAD synth data)
├── engagement.json                 # Example engagement config
├── package.json                    # Dependencies + scripts
└── tsconfig.json                   # TypeScript config (strict)

Quality Assessment

Strengths

  • Clean separation of concerns — Each service has a single responsibility. The EngineContext pattern avoids tight coupling between submodules.
  • Robust validation pipeline — Multi-stage: finding validation → schema check → identity resolution → reconciliation → inference. Bad data is rejected before it enters the graph.
  • Sophisticated identity resolution — Canonical ID generation, marker-based matching, bidirectional merge, provenance preservation. Handles the real-world messiness of BloodHound SIDs + manual findings + parser outputs colliding.
  • Error resilience — Every tool handler wrapped in error boundary. Server never crashes on tool errors.
  • Deterministic parsing — 21 parsers (36 aliases) covering the core offensive tool chain. Reduces LLM token cost by handling structured output mechanically.
  • Action lifecycle correlation: action_id links validate → start → complete → finding. Enables meaningful retrospectives and RLVR trace generation.
  • Comprehensive health checks — 8 checks catching real graph integrity issues.
  • Atomic persistence — Write-rename with snapshot rotation prevents data corruption on crash.

Design Considerations

  • GraphEngine is the largest module (~1,415 lines). It delegates well to submodules but acts as a facade for 40+ methods. Could benefit from interface segregation if it grows further.
  • BFS path analysis is appropriate for pentest-scale graphs (hundreds to low thousands of nodes). For enterprise-scale BloodHound imports (tens of thousands), the undirected projection cache will be important.
  • Inference rule selectors are string-based — The 15 selectors are resolved at runtime. Rule definitions rely on convention rather than compile-time safety.