End-to-End Walkthrough
A narrated example taking an engagement from empty graph to Domain Admin on a GOAD-like Active Directory lab.
This is the deep dive
Read this when you want to understand what the AI actually does during an engagement — every tool call, every inference, every decision. If you just want to get Overwatch running, the Quick Start is faster. If you want to see specific lab playbooks, see GOAD AD Lab, HTB / Single Host, or HTB / Network.
Phase 0 — Configuration
Create engagement.json:
{
"id": "eng-goad-001",
"name": "GOAD Lab Assessment",
"created_at": "2026-03-20T10:00:00Z",
"scope": {
"cidrs": ["192.168.56.0/24"],
"domains": ["north.sevenkingdoms.local", "sevenkingdoms.local"],
"exclusions": [],
"hosts": []
},
"objectives": [
{
"id": "obj-da",
"description": "Achieve Domain Admin on sevenkingdoms.local",
"target_node_type": "credential",
"target_criteria": { "privileged": true, "cred_domain": "sevenkingdoms.local" },
"achieved": false
}
],
"opsec": {
"name": "ctf",
"max_noise": 1.0,
"blacklisted_techniques": [],
"notes": "Lab environment. No restrictions."
}
}
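Before starting the server it can help to sanity-check the file. The sketch below is not part of Overwatch; it is a minimal standalone check, assuming the keys shown above are required and that `max_noise` is a fraction between 0.0 and 1.0. The function names `validate_engagement` and `load_engagement` are hypothetical.

```python
import ipaddress
import json

REQUIRED_KEYS = ("id", "name", "scope", "objectives", "opsec")


def validate_engagement(config: dict) -> list:
    """Return a list of problems found in the config; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Every scope entry must parse as a network (strict=False tolerates host bits).
    for cidr in config.get("scope", {}).get("cidrs", []):
        try:
            ipaddress.ip_network(cidr, strict=False)
        except ValueError:
            problems.append(f"invalid CIDR: {cidr}")

    # An objective without criteria could never be matched.
    for objective in config.get("objectives", []):
        if not objective.get("target_criteria"):
            problems.append(f"objective {objective.get('id', '?')} has no target_criteria")

    # Assumption: max_noise is a fraction between 0.0 and 1.0.
    noise = config.get("opsec", {}).get("max_noise", 0.0)
    if not 0.0 <= noise <= 1.0:
        problems.append(f"max_noise out of range: {noise}")
    return problems


def load_engagement(path: str) -> dict:
    """Read the JSON file and raise ValueError if any check fails."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    problems = validate_engagement(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_engagement("engagement.json")` either returns the parsed config or raises with every problem listed in one message.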
Start the server and connect Claude Code:
Phase 1 — Bootstrap
Get State
The primary session starts by loading the engagement briefing:
Response includes the scope, empty graph, and zero frontier items. The graph has been seeded with CIDR scope nodes.
Lab Preflight
Verifies:
- Engagement config is valid
- Tools installed: nmap, nxc, impacket-*, bloodhound-python, certipy, hashcat
- Graph is healthy (0 nodes, 0 edges — expected for a fresh start)
- Dashboard is running on http://localhost:8384
Check Tools
Returns a list of offensive tools found on $PATH with version info. Missing tools get warnings but don't block the engagement.
Graph state: 0 nodes, 0 edges, 0 frontier items.
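The tool check boils down to a `$PATH` lookup. As a rough illustration (not the Overwatch implementation), the same lookup can be done with the standard library. The tool list is taken from the preflight section, and version detection is left out because each tool reports its version differently.

```python
import shutil

# Names from the preflight list; impacket-* scripts are omitted for brevity.
EXPECTED_TOOLS = ["nmap", "nxc", "bloodhound-python", "certipy", "hashcat"]


def check_tools(names):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tools that could not be found, in the order given."""
    return [name for name, path in check_tools(names).items() if path is None]


if __name__ == "__main__":
    for tool in missing_tools(EXPECTED_TOOLS):
        # A warning only: a missing tool does not block the engagement.
        print(f"warning: {tool} not found on PATH")
```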
Phase 2 — Discovery
Nmap Scan
The LLM validates and executes a port scan:
→ Call validate_action with:
description: "Full TCP port scan of 192.168.56.0/24"
technique: "portscan"
Returns action_id: "act-001", valid: true.
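What validation checks internally is not shown here. A plausible minimum is that the target sits inside the configured scope and the technique is not blacklisted. The sketch below illustrates that idea only. `in_scope` and `precheck_action` are hypothetical names, the real `validate_action` may apply more rules, and the code assumes all ranges are IPv4.

```python
import ipaddress


def in_scope(target: str, cidrs: list, exclusions: list) -> bool:
    """True when target (an IP or CIDR) lies inside a scoped range and touches no exclusion."""
    net = ipaddress.ip_network(target, strict=False)
    inside = any(net.subnet_of(ipaddress.ip_network(c, strict=False)) for c in cidrs)
    excluded = any(net.overlaps(ipaddress.ip_network(e, strict=False)) for e in exclusions)
    return inside and not excluded


def precheck_action(target: str, technique: str, scope: dict, opsec: dict) -> dict:
    """Hypothetical pre-flight check combining the blacklist and the scope test."""
    if technique in opsec.get("blacklisted_techniques", []):
        return {"valid": False, "reason": f"technique '{technique}' is blacklisted"}
    if not in_scope(target, scope.get("cidrs", []), scope.get("exclusions", [])):
        return {"valid": False, "reason": f"{target} is out of scope"}
    return {"valid": True, "reason": ""}
```

With the scope and OPSEC profile from Phase 0, `precheck_action("192.168.56.0/24", "portscan", scope, opsec)` returns a valid result, while any address outside `192.168.56.0/24` is rejected.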
The LLM runs nmap and parses the results:
→ Call parse_output with:
tool_name: "nmap"
output: "<?xml version='1.0'?>
<nmaprun>
<host><address addr='192.168.56.10'/><ports>
<port portid='88'><state state='open'/><service name='kerberos-sec'/></port>
<port portid='389'><state state='open'/><service name='ldap'/></port>
<port portid='445'><state state='open'/><service name='microsoft-ds'/></port>
</ports></host>
<host><address addr='192.168.56.11'/><ports>
<port portid='445'><state state='open'/><service name='microsoft-ds'/></port>
<port portid='1433'><state state='open'/><service name='ms-sql-s'/></port>
</ports></host>
<host><address addr='192.168.56.12'/><ports>
<port portid='80'><state state='open'/><service name='http'/></port>
<port portid='445'><state state='open'/><service name='microsoft-ds'/></port>
</ports></host>
</nmaprun>"
agent_id: "primary"
action_id: "act-001"
Parser output:
{
"parsed": true,
"nodes_parsed": 10,
"edges_parsed": 7,
"ingested": {
"new_nodes": ["host-192-168-56-10", "svc-192-168-56-10-88", "svc-192-168-56-10-389",
"svc-192-168-56-10-445", "host-192-168-56-11", "svc-192-168-56-11-445",
"svc-192-168-56-11-1433", "host-192-168-56-12", "svc-192-168-56-12-80",
"svc-192-168-56-12-445"],
"new_edges": 7,
"inferred_edges": 3
}
}
Inference rules fired automatically:
- Kerberos → Domain: svc-192-168-56-10-88 → MEMBER_OF_DOMAIN edge created
- MSSQL + Domain: svc-192-168-56-11-1433 → POTENTIAL_AUTH edges from future domain creds
Graph state: 10 nodes, 10 edges (7 RUNS + 3 inferred), 12+ frontier items.
Dashboard: Three host nodes appear with service clusters. Inferred edges shown as dashed lines.
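To make the parser step concrete, here is a minimal sketch of how nmap XML could be turned into host nodes, service nodes, and RUNS edges. It is an illustration, not the Overwatch parser. The node ID pattern is copied from the output above, the dictionary field names are assumptions, and only the first `address` element of each host is read.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<nmaprun>
<host><address addr='192.168.56.11'/><ports>
<port portid='445'><state state='open'/><service name='microsoft-ds'/></port>
<port portid='1433'><state state='open'/><service name='ms-sql-s'/></port>
<port portid='3389'><state state='closed'/></port>
</ports></host>
</nmaprun>"""


def parse_nmap_xml(xml_text: str):
    """Return (nodes, edges) for every host and open port in an nmap XML report."""
    nodes, edges = [], []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address").get("addr")
        slug = addr.replace(".", "-")
        host_id = f"host-{slug}"
        nodes.append({"id": host_id, "type": "host", "ip": addr})
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # closed and filtered ports produce no service node
            service = port.find("service")
            svc_id = f"svc-{slug}-{port.get('portid')}"
            nodes.append({
                "id": svc_id,
                "type": "service",
                "port": int(port.get("portid")),
                "name": service.get("name") if service is not None else None,
            })
            edges.append({"source": host_id, "target": svc_id, "type": "RUNS"})
    return nodes, edges
```

Each open port yields one service node and one RUNS edge from its host, so the sample produces three nodes and two edges.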
Phase 3 — SMB Enumeration
Check Frontier
Returns candidates including:
- "Enumerate SMB on 192.168.56.10:445" (incomplete_node)
- "Enumerate SMB on 192.168.56.11:445" (incomplete_node)
- "Enumerate HTTP on 192.168.56.12:80" (incomplete_node)
The LLM scores them — SMB on the DC (56.10) is highest priority because Kerberos indicates it's a domain controller.
NXC SMB Enumeration
→ Call validate_action with:
description: "SMB enumeration with null session on 192.168.56.10"
target_node: "svc-192-168-56-10-445"
technique: "smb_enum"
→ Call log_action_event with action_id: "act-002", event_type: "action_started"
The LLM runs nxc smb 192.168.56.10 --shares -u '' -p '' and parses:
→ Call parse_output with:
tool_name: "nxc"
output: "SMB 192.168.56.10 445 WINTERFELL [*] Windows Server 2019 Build 17763 x64 (name:WINTERFELL) (domain:north.sevenkingdoms.local) (signing:True) (SMBv1:False)
SMB 192.168.56.10 445 WINTERFELL [+] Enumerated shares
SMB 192.168.56.10 445 WINTERFELL Share Permissions Remark
SMB 192.168.56.10 445 WINTERFELL ----- ----------- ------
SMB 192.168.56.10 445 WINTERFELL SYSVOL READ Logon server share
SMB 192.168.56.10 445 WINTERFELL NETLOGON READ Logon server share"
agent_id: "primary"
action_id: "act-002"
Graph changes: the host node is updated with hostname WINTERFELL and domain north.sevenkingdoms.local, and share nodes are created for SYSVOL and NETLOGON.
Graph state: 13 nodes, 15 edges, frontier growing.
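The nxc output is line-oriented, so the interesting fields can be pulled out with two regular expressions. The sketch below is an assumption about how such a parser might work, based only on the sample output shown above. It captures shares that list READ or WRITE permissions and ignores shares with no permission column.

```python
import re

# Matches the banner fields printed on the first nxc line.
BANNER = re.compile(
    r"\(name:(?P<name>[^)]*)\)\s*\(domain:(?P<domain>[^)]*)\)\s*"
    r"\(signing:(?P<signing>\w+)\)\s*\(SMBv1:(?P<smbv1>\w+)\)"
)
# Matches share rows: protocol, IP, port, hostname, share name, permissions, remark.
SHARE = re.compile(
    r"^SMB\s+\S+\s+\d+\s+\S+\s+(?P<share>\S+)\s+"
    r"(?P<perms>READ(?:,WRITE)?|WRITE)(?=\s|$)\s*(?P<remark>.*)$"
)


def parse_nxc_smb(output: str) -> dict:
    """Extract hostname, domain, signing flag, and accessible shares from nxc smb output."""
    info = {"hostname": None, "domain": None, "signing": None, "shares": []}
    for line in output.splitlines():
        banner = BANNER.search(line)
        if banner:
            info["hostname"] = banner["name"]
            info["domain"] = banner["domain"]
            info["signing"] = banner["signing"] == "True"
            continue
        share = SHARE.match(line.strip())
        if share:
            info["shares"].append(
                {"name": share["share"], "permissions": share["perms"].split(",")}
            )
    return info
```

Header and separator rows fail the permission match and are skipped without special handling.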
Phase 4 — BloodHound Ingestion
If bloodhound-python data is available, bulk import it:
This ingests users, groups, computers, ACLs, sessions, and local admin relationships. A typical GOAD lab produces ~50-100 nodes and ~200-500 edges.
Graph state: ~80 nodes, ~300 edges. Frontier explodes with inferred edges.
Dashboard: The graph now shows clear AD structure — domain controllers as large central nodes, users clustered around groups, credential edges fanning out.
Phase 5 — Kerberoasting
Frontier Shows the Path
The LLM spots a Kerberoastable service account in the BloodHound data. Frontier item: "Test Kerberoast against user-north-sql_svc" (inferred_edge, confidence: 0.5).
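A Kerberoastable account is an enabled user with a service principal name set. The sketch below shows one way to pick such accounts out of collector output. It assumes the users file has a top-level `data` list whose entries carry a `Properties` object with `name`, `hasspn`, and `enabled`, which may differ between collector versions. The `krbtgt` account is skipped because it always carries an SPN.

```python
def kerberoastable_users(users_json: dict) -> list:
    """Return names of enabled accounts that carry an SPN, skipping krbtgt."""
    names = []
    for entry in users_json.get("data", []):
        props = entry.get("Properties", {})
        name = props.get("name", "")
        # Accounts without an SPN, or disabled accounts, are not candidates.
        if not props.get("hasspn") or not props.get("enabled", True):
            continue
        if name.upper().startswith("KRBTGT@"):
            continue
        names.append(name)
    return names


# Hypothetical, hand-written sample in the assumed layout.
SAMPLE_USERS = {
    "data": [
        {"Properties": {"name": "SQL_SVC@NORTH.SEVENKINGDOMS.LOCAL", "hasspn": True, "enabled": True}},
        {"Properties": {"name": "KRBTGT@NORTH.SEVENKINGDOMS.LOCAL", "hasspn": True, "enabled": False}},
        {"Properties": {"name": "ARYA.STARK@NORTH.SEVENKINGDOMS.LOCAL", "hasspn": False, "enabled": True}},
    ]
}
```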
Execute Kerberoast
→ Call validate_action with:
description: "Kerberoast sql_svc on north.sevenkingdoms.local"
target_node: "user-north-sql_svc"
technique: "kerberoast"
→ Call log_action_event with action_id: "act-005", event_type: "action_started"
The LLM runs impacket-GetUserSPNs to obtain a TGS hash, cracks it with hashcat, and parses the result:
→ Call parse_output with:
tool_name: "hashcat"
output: "$krb5tgs$23$*sql_svc$NORTH.SEVENKINGDOMS.LOCAL$...:Password123!"
agent_id: "primary"
action_id: "act-005"
New nodes created:
- cred-plaintext-sql_svc (type: credential, cred_type: plaintext, cred_value: Password123!)
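The cracked line has the form hash, colon, plaintext, and the account name and realm sit near the start of the hash. A minimal sketch of that extraction follows. It assumes the password contains no colon, and the `cred_user` field name is hypothetical. The other field names mirror the node shown above.

```python
import re

# $krb5tgs$<etype>$*<user>$<realm>$...
TGS = re.compile(r"^\$krb5tgs\$(?P<etype>\d+)\$\*(?P<user>[^$*]+)\$(?P<realm>[^$*]+)\$")


def parse_cracked_tgs(line: str):
    """Turn one 'hash:plaintext' line into a credential node, or None if it does not match."""
    # Split on the LAST colon because the SPN inside the hash may contain one.
    hash_part, sep, plaintext = line.strip().rpartition(":")
    match = TGS.match(hash_part)
    if not sep or match is None:
        return None
    return {
        "id": f"cred-plaintext-{match['user']}",
        "type": "credential",
        "cred_type": "plaintext",
        "cred_user": match["user"],
        "cred_domain": match["realm"].lower(),
        "cred_value": plaintext,
    }
```

Lines that are not cracked TGS hashes (status output, blank lines) return `None` and can be skipped by the caller.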
Inference rules fire:
- Credential Fanout → POTENTIAL_AUTH edges to every compatible service (SMB, MSSQL, WinRM)
Graph state: ~85 nodes, ~320 edges. New credential node with fan-out to all services.
Dashboard: Credential node pulses briefly (new node animation). Amber POTENTIAL_AUTH edges fan out to multiple services.
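The fan-out rule can be pictured as a filter over known services. The sketch below is an illustration under stated assumptions: compatibility is decided by port number alone, using the default ports for SMB, MSSQL, and WinRM over HTTP, and the 0.5 confidence is borrowed from the inferred edge mentioned earlier rather than taken from the engine.

```python
# Assumption: a service is "compatible" when it listens on one of these default ports.
AUTH_PORTS = {445: "smb", 1433: "mssql", 5985: "winrm"}


def credential_fanout(cred_id: str, services: list) -> list:
    """Create one POTENTIAL_AUTH edge from the credential to each compatible service."""
    return [
        {
            "source": cred_id,
            "target": service["id"],
            "type": "POTENTIAL_AUTH",
            "protocol": AUTH_PORTS[service["port"]],
            "confidence": 0.5,  # unverified until an authentication test runs
        }
        for service in services
        if service.get("port") in AUTH_PORTS
    ]
```

For the three hosts discovered in Phase 2, this yields edges to every SMB service and to the MSSQL service, while the Kerberos, LDAP, and HTTP services are skipped.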
Phase 6 — Lateral Movement
Agent Dispatch
The frontier now shows POTENTIAL_AUTH edges to multiple services. The LLM dispatches sub-agents for parallel testing:
→ Call register_agent with:
agent_id: "agent-lateral-smb"
frontier_item_id: "fi-potential-auth-sql-svc-445"
skill: "lateral-movement"
→ Call register_agent with:
agent_id: "agent-lateral-mssql"
frontier_item_id: "fi-potential-auth-sql-svc-1433"
skill: "lateral-movement"
Each sub-agent:
- Calls get_agent_context → scoped subgraph with the credential and target service
- Calls get_skill with query: "lateral movement" → methodology
- Validates and executes the authentication test
- Reports findings
Agent A finds: sql_svc has admin on 192.168.56.11 via SMB → reports ADMIN_TO edge (confidence: 1.0)
Agent B finds: sql_svc authenticates to MSSQL → reports VALID_ON edge (confidence: 1.0)
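Sub-agents are LLM sessions rather than threads, but the fan-out-and-collect pattern is the same as an ordinary parallel map. As a loose analogy only, the sketch below runs a caller-supplied `try_auth` function (hypothetical) against several services at once and turns each success into a confirmed edge with confidence 1.0.

```python
from concurrent.futures import ThreadPoolExecutor


def run_auth_checks(cred_id, service_ids, try_auth, max_workers=4):
    """Run try_auth(cred_id, service_id) concurrently and return the confirmed edges.

    try_auth is expected to return an edge type such as "ADMIN_TO" or "VALID_ON",
    or None when authentication fails.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outcomes line up with service_ids.
        outcomes = list(pool.map(lambda sid: try_auth(cred_id, sid), service_ids))
    return [
        {"source": cred_id, "target": sid, "type": edge_type, "confidence": 1.0}
        for sid, edge_type in zip(service_ids, outcomes)
        if edge_type is not None
    ]
```

Failed attempts produce no edge, which leaves the original POTENTIAL_AUTH edge as the only record of that path.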
Primary Monitors
The primary session sees new frontier items from agent findings:
- "DCSync via sql_svc on WINTERFELL" (if sql_svc has replication rights)
- "Dump credentials on 192.168.56.11" (has ADMIN_TO)
- "MSSQL command execution on 192.168.56.11" (has VALID_ON to MSSQL)
Phase 7 — Privilege Escalation
Secretsdump
→ Call validate_action with:
description: "Secretsdump on 192.168.56.11 with sql_svc creds"
target_node: "host-192-168-56-11"
technique: "credential_dumping"
→ Call log_action_event with action_id: "act-010", event_type: "action_started"
The LLM runs impacket-secretsdump and parses:
→ Call parse_output with:
tool_name: "secretsdump"
output: "Administrator:500:aad3b435...:8846f7eaee8fb117...:::
Guest:501:aad3b435...:31d6cfe0d16ae931...:::
sql_svc:1001:aad3b435...:a87f3a337d73085...:::"
agent_id: "primary"
action_id: "act-010"
New credentials created: Local admin NTLM hashes. Credential Fanout fires again → more POTENTIAL_AUTH edges.
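Each secretsdump line follows the pattern account, RID, LM hash, NT hash, separated by colons. A minimal sketch of turning those lines into credential nodes is shown below. The ID pattern follows the `cred-ntlm-administrator` name used in this walkthrough; `cred_user`, `rid`, and `nt_hash` are hypothetical field names.

```python
def parse_secretsdump(output: str) -> list:
    """Parse 'account:rid:lmhash:nthash:::' lines into NTLM credential nodes."""
    creds = []
    for line in output.splitlines():
        fields = line.strip().split(":")
        # Skip banners and status lines: a hash line has a numeric RID in field 2.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        account, rid, _lm_hash, nt_hash = fields[:4]
        # Domain accounts are printed as DOMAIN\user; local accounts have no prefix.
        domain, _, user = account.rpartition("\\")
        creds.append({
            "id": f"cred-ntlm-{user.lower()}",
            "type": "credential",
            "cred_type": "ntlm",
            "cred_user": user,
            "cred_domain": domain or None,
            "rid": int(rid),
            "nt_hash": nt_hash,
        })
    return creds
```

Local accounts come back with `cred_domain` set to `None`, which keeps them distinguishable from domain accounts that share a username.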
DCSync
With the right privileges (or after finding a path to Domain Admin):
→ Call parse_output with:
tool_name: "secretsdump"
output: "north.sevenkingdoms.local\\Administrator:500:aad3b435...:b8a3f328b...:::
north.sevenkingdoms.local\\krbtgt:502:aad3b435...:9d765b48...:::"
agent_id: "primary"
action_id: "act-012"
Objective achieved! The engine matches cred-ntlm-administrator against the objective criteria (privileged: true, cred_domain: sevenkingdoms.local) and marks obj-da as achieved.
Graph state: ~100 nodes, ~400 edges. Objective node turns green in the dashboard.
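Objective matching can be read as a property comparison between a node and the objective's `target_criteria`. The sketch below assumes exact equality on every criterion. The real engine may match more loosely, so treat this as an illustration of the idea rather than its implementation.

```python
def matches_objective(node: dict, objective: dict) -> bool:
    """True when the node has the target type and satisfies every criterion exactly."""
    if node.get("type") != objective["target_node_type"]:
        return False
    return all(
        node.get(key) == expected
        for key, expected in objective["target_criteria"].items()
    )


def update_objectives(node: dict, objectives: list) -> list:
    """Mark newly satisfied objectives as achieved and return their IDs."""
    achieved = []
    for objective in objectives:
        if not objective.get("achieved") and matches_objective(node, objective):
            objective["achieved"] = True
            achieved.append(objective["id"])
    return achieved
```

Running `update_objectives` on every newly ingested node is enough to flip `obj-da` the moment a privileged credential for the target domain lands in the graph.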
Phase 8 — Retrospective
Generates:
| File | Contents |
|---|---|
| report.md | Timeline of the attack path: discovery → Kerberoast → lateral → DCSync |
| inference-suggestions.json | "MSSQL service on domain host often has xp_cmdshell enabled" |
| skill-gaps.json | ADCS skills unused (no certificates found), web skills unused |
| context-improvements.json | "Consider running BloodHound earlier for faster path discovery" |
| training-traces.json | 12 state→action→outcome traces for RLVR |
Key Takeaways
- Report early, report often: each report_finding triggered inference rules that surfaced the next step
- Parsers save tokens: parse_output handled nmap, nxc, hashcat, and secretsdump without LLM interpretation
- Inference chains compound: SMB scan → credential fanout → lateral movement path appeared automatically
- Sub-agents parallelize: two services tested simultaneously instead of sequentially
- The graph tells the story: the retrospective reconstructed the full attack path from graph state