Key Concepts¶
Overwatch uses several domain-specific terms throughout its tools and documentation. This page defines each one.
Engagement Graph¶
The core data structure. A directed property graph where nodes represent discovered entities (hosts, services, credentials, users) and edges represent relationships between them (RUNS, VALID_ON, ADMIN_TO, etc.). Every tool reads from or writes to this graph.
The graph is powered by graphology and persisted to disk after every change. See Graph Model for the full schema.
Frontier Item¶
A candidate next action generated from the graph. The deterministic layer produces frontier items by scanning the graph for:
| Type | Meaning | Example |
|---|---|---|
| `incomplete_node` | A node missing expected properties or relationships | Host with no service enumeration |
| `untested_edge` | An edge that exists but hasn't been validated | `POTENTIAL_AUTH` credential → service |
| `inferred_edge` | A hypothesis edge created by an inference rule | `RELAY_TARGET` from SMB signing disabled |
| `network_discovery` | CIDR scope with undiscovered hosts | Network sweep of 10.10.10.0/24 |
| `network_pivot` | Host reachable via pivot but without a session | Host in same subnet as session-holder |
| `credential_test` | Untested credential/target pair from the coverage matrix | Test `jdoe:NTLM` against DC01 (SMB) |
Frontier items include graph metrics (hops to objective, fan-out estimate, node degree) but are not scored — scoring is the LLM's job. The deterministic layer only filters out items that are out-of-scope, duplicated, or exceed the OPSEC noise ceiling.
Access frontier items via next_task.
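A frontier item can be pictured as a plain record. A minimal sketch, assuming illustrative field names (only `type` and the graph metrics named above come from the docs; the exact schema may differ):

```typescript
// Illustrative frontier item shape. Field names beyond `type` and the listed
// graph metrics are assumptions, not the server's exact schema.
interface FrontierItem {
  id: string;
  type:
    | "incomplete_node"
    | "untested_edge"
    | "inferred_edge"
    | "network_discovery"
    | "network_pivot"
    | "credential_test";
  description: string;
  // Graph metrics attached for the LLM to score; the server does not rank these.
  metrics: {
    hopsToObjective: number;
    fanOutEstimate: number;
    nodeDegree: number;
  };
}

const item: FrontierItem = {
  id: "frontier-001",
  type: "untested_edge",
  description: "POTENTIAL_AUTH: cred-ntlm-administrator -> svc-10-10-10-5-445",
  metrics: { hopsToObjective: 2, fanOutEstimate: 4, nodeDegree: 3 },
};

console.log(item.type); // untested_edge
```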
Inference Rules¶
Deterministic rules that fire automatically when matching nodes are ingested. They generate hypothesis edges — low-confidence relationships the LLM should evaluate.
Lifecycle:
1. Agent reports a finding (new node/edge enters the graph)
2. Engine checks all registered rules against the new node
3. Matching rules produce new edges (confidence 0.3–0.7)
4. New edges become frontier items (type: inferred_edge)
5. LLM sees them via next_task, decides whether to test
6. If tested successfully, confidence is raised to 1.0
Example: When a service node with smb_signing: false is ingested, the "SMB Signing → Relay" rule fires and creates RELAY_TARGET edges from all compromised hosts to that service.
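The match-and-produce shape of that rule can be sketched as follows, assuming a simplified engine interface (the real rule and selector APIs in Overwatch may differ):

```typescript
// Minimal sketch of an inference rule, assuming a simplified engine API.
type GraphNode = { id: string; type: string; props: Record<string, unknown> };
type GraphEdge = { from: string; to: string; type: string; confidence: number };

interface InferenceRule {
  id: string;
  matches(node: GraphNode): boolean;
  produce(node: GraphNode, compromisedHosts: GraphNode[]): GraphEdge[];
}

const smbSigningRelay: InferenceRule = {
  id: "rule-smb-signing-relay",
  // Fires when a service node with SMB signing disabled is ingested.
  matches: (n) => n.type === "service" && n.props.smb_signing === false,
  // Produces hypothesis RELAY_TARGET edges from every compromised host,
  // inside the 0.3-0.7 confidence band for hypotheses.
  produce: (n, hosts) =>
    hosts.map((h) => ({ from: h.id, to: n.id, type: "RELAY_TARGET", confidence: 0.5 })),
};

const svc: GraphNode = { id: "svc-10-10-10-5-445", type: "service", props: { smb_signing: false } };
const owned: GraphNode[] = [{ id: "host-10-10-10-9", type: "host", props: {} }];
const edges = smbSigningRelay.matches(svc) ? smbSigningRelay.produce(svc, owned) : [];
console.log(edges.length, edges[0]?.type); // 1 RELAY_TARGET
```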
Fifty-five built-in rules ship with Overwatch across six domains: AD & service (21), ADCS (14), Linux privilege escalation (7), web application (8), MSSQL (2), and cloud infrastructure (3). This includes edge-triggered rules that require both a node property match and a matching inbound edge (e.g., LAPS readable requires `laps: true` + inbound `GENERIC_ALL`). When a new edge arrives, the engine re-evaluates inference on its endpoints.
Custom rules can be added at runtime via suggest_inference_rule. See Graph Model — Inference Rules for the full rule reference with triggers and productions.
Confidence¶
A 0.0 to 1.0 value on every node and edge indicating how certain the information is:
| Range | Meaning | Example |
|---|---|---|
| 0.0 – 0.3 | Hypothesis | Inferred `POTENTIAL_AUTH` edge from credential fanout |
| 0.3 – 0.7 | Likely | Service version from banner grab, unverified credential |
| 0.7 – 0.9 | Strong evidence | Successful authentication attempt |
| 1.0 | Confirmed | Verified admin access, dumped credentials |
Confidence affects frontier item prioritization — lower-confidence edges are more valuable to test because they have the most uncertainty to resolve.
OPSEC Noise¶
A 0.0 to 1.0 rating on actions and edge types indicating how likely they are to trigger detection:
| Rating | Level | Examples |
|---|---|---|
| 0.0 – 0.2 | Silent | DNS queries, passive enumeration |
| 0.2 – 0.4 | Quiet | Targeted port scans, LDAP queries |
| 0.4 – 0.6 | Moderate | SMB enumeration, Kerberoasting |
| 0.6 – 0.8 | Loud | Password spraying, brute force |
| 0.8 – 1.0 | Very loud | Mass scanning, exploit attempts |
The engagement's OPSEC profile sets a max_noise ceiling. Actions exceeding this ceiling are:
- Filtered from the frontier by the deterministic layer
- Rejected by `validate_action`
Blacklisted techniques (e.g., zerologon) are rejected regardless of noise level.
See Configuration for profile options.
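The two hard vetoes above can be sketched as a single gate, assuming a simplified profile shape (the real profile schema lives under Configuration):

```typescript
// Sketch of the deterministic OPSEC gate, assuming a simple profile shape.
interface OpsecProfile { maxNoise: number; blacklist: string[] }
interface CandidateAction { technique: string; noise: number }

function vetAction(a: CandidateAction, p: OpsecProfile): "allowed" | "noise_ceiling" | "blacklisted" {
  if (p.blacklist.includes(a.technique)) return "blacklisted"; // hard veto regardless of noise
  if (a.noise > p.maxNoise) return "noise_ceiling";            // exceeds the engagement ceiling
  return "allowed";
}

const profile: OpsecProfile = { maxNoise: 0.6, blacklist: ["zerologon"] };
console.log(vetAction({ technique: "ldap_query", noise: 0.3 }, profile));     // allowed
console.log(vetAction({ technique: "password_spray", noise: 0.7 }, profile)); // noise_ceiling
console.log(vetAction({ technique: "zerologon", noise: 0.1 }, profile));      // blacklisted
```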
Compaction¶
When the LLM's context window overflows, Claude Code compacts — summarizing the conversation history to free up space. This would normally lose engagement state.
Overwatch survives compaction because the graph lives outside the context window. After compaction:
- Claude Code starts a fresh context
- The `AGENTS.md` instructions tell it to call `get_state()` first
- `get_state()` reconstructs a complete briefing from the graph:
    - Scope and objectives
    - All discoveries and access
    - Current frontier items
    - Active agents
    - Recent activity
- The LLM resumes exactly where it left off
This also works across server restarts, session handoffs, and multi-day engagements.
Startup Reconciliation: On restart, the engine automatically reconciles runtime-dependent state:
- `HAS_SESSION` edges are downgraded from `session_live=true` to `session_live=false`, since no runtime sessions survive a restart.
- Running agents are marked as interrupted, since the sub-agent processes no longer exist.
- Tracked processes are checked for PID liveness; dead PIDs are marked as completed.
Node ID Conventions¶
Every node needs a unique, deterministic ID. Overwatch uses these conventions:
| Node Type | Pattern | Example |
|---|---|---|
| Host | `host-<ip>` | `host-10-10-10-5` |
| Service | `svc-<ip>-<port>` | `svc-10-10-10-5-445` |
| Domain | `domain-<name>` | `domain-target-local` |
| User | `user-<domain>-<username>` | `user-target-local-administrator` |
| Group | `group-<domain>-<name>` | `group-target-local-domain-admins` |
| Credential | `cred-<type>-<user>` | `cred-ntlm-administrator` |
| Share | `share-<host>-<name>` | `share-10-10-10-5-c$` |
| Certificate | `cert-<template>` | `cert-user-template` |
| Objective | `obj-<id>` | `obj-da` |
Consistent IDs enable automatic deduplication — reporting the same node twice merges properties instead of creating duplicates.
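The conventions above reduce to small pure functions. A sketch, assuming the normalization (lowercase, non-alphanumerics collapsed to hyphens) implied by the examples:

```typescript
// Sketch of deterministic ID construction following the conventions above.
// The slug normalization is an assumption consistent with the examples,
// not the exact implementation.
const slug = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, "-");

const hostId = (ip: string) => `host-${slug(ip)}`;
const serviceId = (ip: string, port: number) => `svc-${slug(ip)}-${port}`;
const userId = (domain: string, username: string) => `user-${slug(domain)}-${slug(username)}`;

console.log(hostId("10.10.10.5"));                    // host-10-10-10-5
console.log(serviceId("10.10.10.5", 445));            // svc-10-10-10-5-445
console.log(userId("target.local", "Administrator")); // user-target-local-administrator
```

Because the same entity always maps to the same ID, a second report of `host-10-10-10-5` merges into the existing node instead of creating a duplicate.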
Action Lifecycle¶
Every significant action follows a structured lifecycle for traceability:
```text
1. validate_action(description, target, technique)
   → Returns action_id + valid/invalid
2. log_action_event(action_id, event_type="action_started")
   → Records start time in activity log
3. Execute the tool/command (bash, nmap, nxc, etc.)
4. parse_output(tool_name, output, action_id, ...)
   — or —
   report_finding(nodes, edges, action_id, ...)
   → Ingests results into graph (action_id and agent_id are optional)
5. log_action_event(action_id, event_type="action_completed")
   — or —
   log_action_event(action_id, event_type="action_failed")
   → Records outcome in activity log
```
The action_id links all steps together. This enables:
- Retrospective analysis — which actions led to which discoveries
- RLVR training traces — state→action→outcome triplets
- Audit trail — every graph change is attributable to a specific action
Action ID determinism. Engagements created with an engagement_nonce (the strict-migration boundary; see Configuration) get deterministic action_ids of the form act_<16hex> derived from sha256(nonce | agent_id | timestamp | command_signature | sequence). Same inputs always produce the same action_id. Legacy engagements (no nonce) keep uuidv4 IDs.
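The derivation above can be sketched directly; the `|`-joined field order follows the docs, while the exact serialization of each field is an assumption:

```typescript
import { createHash } from "node:crypto";

// Sketch of the deterministic action_id derivation described above.
function actionId(
  nonce: string, agentId: string, timestamp: string,
  commandSignature: string, sequence: number,
): string {
  const material = [nonce, agentId, timestamp, commandSignature, String(sequence)].join("|");
  const digest = createHash("sha256").update(material).digest("hex");
  return `act_${digest.slice(0, 16)}`; // act_<16hex>
}

const a = actionId("n0nce", "agent-1", "2024-01-01T00:00:00Z", "nmap -sV 10.10.10.5", 1);
const b = actionId("n0nce", "agent-1", "2024-01-01T00:00:00Z", "nmap -sV 10.10.10.5", 1);
console.log(a === b, /^act_[0-9a-f]{16}$/.test(a)); // true true
```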
Frontier leases. When an agent claims a frontier item via register_agent, the engine takes a TTL lease (default 600s). A second agent attempting to claim the same item gets lease_conflict rather than racing. Heartbeats extend the lease; terminal status releases it; the agent watchdog reaps expired leases.
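The lease semantics (TTL, `lease_conflict`, heartbeat extension) can be sketched with an in-memory table; the method names here are illustrative, not the server's API:

```typescript
// Minimal sketch of frontier lease semantics. Method names are illustrative.
class LeaseTable {
  private leases = new Map<string, { agentId: string; expiresAt: number }>();
  constructor(private ttlMs = 600_000) {} // default TTL: 600s

  claim(itemId: string, agentId: string, now: number): "claimed" | "lease_conflict" {
    const lease = this.leases.get(itemId);
    // A live lease held by a different agent blocks the claim.
    if (lease && lease.expiresAt > now && lease.agentId !== agentId) return "lease_conflict";
    this.leases.set(itemId, { agentId, expiresAt: now + this.ttlMs });
    return "claimed";
  }

  heartbeat(itemId: string, agentId: string, now: number): boolean {
    const lease = this.leases.get(itemId);
    if (!lease || lease.agentId !== agentId) return false;
    lease.expiresAt = now + this.ttlMs; // extend the lease
    return true;
  }

  release(itemId: string): void { this.leases.delete(itemId); } // terminal status
}

const t = new LeaseTable();
console.log(t.claim("fi-1", "agent-a", 0));       // claimed
console.log(t.claim("fi-1", "agent-b", 1_000));   // lease_conflict
console.log(t.claim("fi-1", "agent-b", 700_000)); // claimed (lease expired)
```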
"Why did the agent do X?" — the explain_action tool projects an action_id's full chain (frontier item, log_thought, alternatives, validation, approval, outcome) into a single answer. The get_decision_log tool gives the same view across many decisions at once.
Agent Heartbeat and Watchdog¶
Long-running sub-agents call agent_heartbeat({task_id}) periodically (recommended every 30–60s) to signal liveness. The runtime watchdog walks running tasks on an interval and marks any whose heartbeat_at is older than heartbeat_ttl_seconds (default 120s) as interrupted, releasing their frontier leases at the same moment.
Tasks that never heartbeat are exempt from the watchdog, which preserves backward compatibility for tools that complete in a single MCP turn.
Heartbeat events are excluded from the hash chain (high-volume, low-stakes) but persist in the activity log so dashboards can show liveness.
Operator Infrastructure¶
Anything the operator stands up to receive incoming connections — Responder, ntlmrelayx, fake LDAP, socat redirector, reverse-shell catcher, HTTP/SMB capture endpoint — is a first-class graph object: a mock_service node.
Why this matters: without it, captured credentials float free of the listener that caught them, and retrospectives can't tell "we found 3 hashes" from "our Responder caught 3 hashes." With it, the capture chain is structural and queryable.
```mermaid
flowchart LR
    OP([operator user]):::user
    MS["mock_service<br/>Responder<br/>0.0.0.0:445"]:::mock
    SESS["session<br/>kind=socket, mode=listen"]:::sess
    HOST[(attacker host)]:::host
    CRED["credential<br/>CORP/victim NTLMv2"]:::cred
    USR["user<br/>victim"]:::user
    TARGET[(target host)]:::host
    SESS -- "serves_mock_service_id" --> MS
    MS -- OPERATED_BY --> OP
    MS -- RUNS_ON --> HOST
    MS -- "BAITED (auto)" --> CRED
    CRED -- OWNS_CRED --> USR
    CRED -- "VALID_ON (after test)" --> TARGET
    classDef mock fill:#d97706,stroke:#92400e,color:#fff
    classDef cred fill:#f0b54a,stroke:#92400e,color:#000
    classDef user fill:#afa9ec,stroke:#3730a3,color:#000
    classDef host fill:#6e9eff,stroke:#1e40af,color:#fff
    classDef sess fill:#94a3b8,stroke:#334155,color:#000
```
The capture chain in plain words:
- The operator opens a listening session with `open_session kind=socket mode=listen mock_service_purpose=responder`. The server auto-registers a `mock_service` node and stamps `capabilities.serves_mock_service_id` onto the session.
- The target environment fires off a poisoned NetBIOS query and authenticates to the listener.
- Whoever parses the capture (Responder log parser, manual `report_finding`, future agent) reports the credential with `via_mock_service_id` set.
- The built-in `rule-baited-credential` inference rule fires and emits a `BAITED` edge from the listener to the credential.
- When `close_session` is called, the server stamps `stopped_at` on the listener so the dashboard renders it inactive and retrospectives know the active window.
Idempotency. register_mock_service dedupes on (purpose, bind_host, bind_port, agent_id). Re-registering is safe and refreshes last_seen_at without duplicating the node.
OPSEC defaults. opsec_loud=true is the default for responder, ntlmrelayx, fake_ldap, and smb_capture. The OPSEC scorer can read this directly instead of hardcoding a noisy-tool list.
See register_mock_service and Playbook — Operator Infrastructure.
Audit Trail¶
Five independent layers make engagement evidence defensible. The first three are default-on for new engagements (i.e. those created with an engagement_nonce); the latter two are explicit opt-ins.
1. Activity Log with Causal Linkage¶
Every action emits structured events (action_planned, action_validated, action_started, action_completed/action_failed, plus parse and finding events). Each event carries action_id, frontier_item_id, and agent_id, so the entire decision chain is reconstructable as a directed graph — not as a stream of timestamps you have to correlate.
The activity log also captures mock_service_registered / mock_service_refreshed events with provenance: 'operator', plus phase_entered / phase_exited events when phase-aware policy transitions fire. The operator-infrastructure timeline interleaves cleanly with discovery and exploitation events.
2. Hash Chain with Signed Checkpoints (hash_chain_enabled, default true)¶
Every qualifying activity event (provenance ∈ {agent, system}, excluding thought and heartbeat) gets a prev_hash and event_hash, forming a tamper-evident chain. verify_activity_chain walks the chain and confirms it's intact.
Signed checkpoints are emitted every 500 events / 30 minutes (configurable). Verifiers can resume from the latest checkpoint instead of replaying from genesis — O(events_since_checkpoint) instead of O(n). Checkpoints carry an optional signing_key_id slot for Ed25519-signed audit deliverables (signing implementation is staged for follow-up).
If anyone — the AI itself, a bug, a malicious operator — modifies an old entry, the chain breaks and the system can prove it. Ingested events (from external transcripts) get chain_excluded: true so they don't pollute the live chain.
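The chain mechanics can be sketched in a few lines, assuming `event_hash = sha256(prev_hash + payload)`; the server's exact event canonicalization may differ:

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-chain append and verification.
interface ChainEvent { payload: string; prev_hash: string; event_hash: string }

const h = (s: string) => createHash("sha256").update(s).digest("hex");

function appendEvent(chain: ChainEvent[], payload: string): void {
  const prev = chain.length ? chain[chain.length - 1].event_hash : "genesis";
  chain.push({ payload, prev_hash: prev, event_hash: h(prev + payload) });
}

function verifyChain(chain: ChainEvent[]): boolean {
  return chain.every((e, i) => {
    const prev = i === 0 ? "genesis" : chain[i - 1].event_hash;
    return e.prev_hash === prev && e.event_hash === h(prev + e.payload);
  });
}

const chain: ChainEvent[] = [];
appendEvent(chain, "action_started act_1");
appendEvent(chain, "action_completed act_1");
console.log(verifyChain(chain)); // true
chain[0].payload = "tampered";   // modify an old entry...
console.log(verifyChain(chain)); // false: the break is provable
```

Checkpoints simply let verification start from a trusted intermediate `event_hash` rather than from `"genesis"`.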
3. Content-Addressed Evidence¶
Every evidence row carries content_hash = sha256(content). Two runs that produce identical output deduplicate to a single row; tampering with content on disk changes the address (the manifest's content_hash no longer resolves). Lookups via get_evidence accept either the legacy UUID evidence_id or the new content_hash.
Streaming sinks (createBlobStream for live run_bash/run_tool output) accumulate the hash incrementally and finalize it when the stream closes, so the evidence record's hash always matches the bytes actually durable on disk.
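Incremental hashing is what makes this work: hashing chunk by chunk as bytes land on disk yields the same `content_hash` as hashing the finished file in one pass. A sketch (the chunk contents are illustrative):

```typescript
import { createHash } from "node:crypto";

// Sketch of incremental content addressing for streamed evidence.
const streamHash = createHash("sha256");
const chunks = ["Starting Nmap 7.94\n", "445/tcp open microsoft-ds\n"];
let bytesWritten = 0;
for (const chunk of chunks) {
  streamHash.update(chunk);              // accumulate as the sink writes
  bytesWritten += Buffer.byteLength(chunk);
}
const contentHash = streamHash.digest("hex"); // finalized when the stream closes

// Hashing the whole file after the fact gives the identical address.
const wholeFileHash = createHash("sha256").update(chunks.join("")).digest("hex");
console.log(contentHash === wholeFileHash, bytesWritten); // true 45
```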
4. Evidence Stream Integrity¶
run_bash and run_tool stream stdout/stderr directly into the evidence store rather than buffering, with backpressure-aware writes. Every action's evidence file has a manifest record that says either:
- `bytes_written: N, capture_error: null` — the full output is on disk, and it's exactly N bytes.
- `bytes_written: N, capture_error: "<reason>"` — partial capture; the system explicitly knows how much was lost and why.
Evidence can never be silently truncated. If your report cites an nmap scan, the manifest tells you whether you have the whole thing.
5. JSON-RPC Tape Proxy¶
overwatch-mcp-tape sits between the AI client and the Overwatch MCP server, recording every wire-level frame in both directions to a JSONL tape. The tape captures things the server might never log itself (malformed requests, requests that errored before reaching a handler, batched calls).
After the engagement, register_tape_session imports the tape and emits tape_session_started events linked to the live activity log. A retrospective can then ask:
- "Did the AI actually validate every command before running it?" → join tape `tools/call validate_action` entries against tape `tools/call run_bash` entries.
- "Did the AI claim a result the server never produced?" → diff tape responses against AI transcript turns.
This is the difference between trusting the AI's narrative and being able to independently verify it.
Deterministic Layer vs LLM Layer¶
Overwatch splits decision-making into two layers:
Deterministic layer (the server) handles:
- Scope enforcement (CIDR/domain matching)
- Deduplication (already-tested edges)
- OPSEC hard vetoes (noise ceiling, blacklisted techniques)
- Dead host pruning
- Inference rule execution
- Graph persistence
- Frontier generation
LLM layer (Claude) handles:
- Attack chain spotting across multiple hops
- Sequencing (what should happen before what)
- Risk/reward assessment given defensive posture
- Creative path discovery beyond the frontier
- Tool command construction
- Output interpretation (for unsupported tools)
- Agent dispatch decisions
The deterministic layer is a guardrail, not a brain. It filters the obviously impossible. The LLM does the offensive thinking.
Credential Lifecycle¶
Credentials in Overwatch have a lifecycle tracked via the credential_status property:
| Status | Meaning |
|---|---|
| `active` | Credential is current and usable |
| `stale` | Credential may still work but hasn't been verified recently |
| `expired` | Credential has passed its `valid_until` time |
| `rotated` | Credential has been observed as changed |
The engine automatically:
- Degrades outbound `POTENTIAL_AUTH` edges from stale/expired credentials (confidence × 0.5)
- Deprioritizes frontier items sourced from stale/expired credentials (confidence × 0.1)
- Tracks derivation chains via `DERIVED_FROM` edges (e.g., hash → cracked password)
- Infers credential domains from graph topology when not explicitly provided
Credential Expiry Estimation¶
The engine estimates credential expiry automatically based on credential type and domain policy:
| Credential Type | Default Lifetime | Policy Override |
|---|---|---|
| `kerberos_tgt` | 10 hours | Domain `password_policy.maxAge` |
| `kerberos_tgs` | 10 hours | — |
| `token` | 1 hour | — |
| Password types | Domain policy `maxAge` | Requires `pwd_last_set` on the owning user |
When a domain has a password_policy with maxAge set and the associated user has pwd_last_set, the engine computes password expiry as pwd_last_set + maxAge. This feeds into graduated frontier scoring:
| Time Remaining | Frontier Score Multiplier |
|---|---|
| < 30 minutes | 0.3× (urgent — use it or lose it) |
| < 2 hours | 0.7× (expiring soon) |
| > 2 hours | 1.0× (healthy) |
| Stale/expired | 0.1× (deprioritized) |
The chain scorer similarly applies graduated quality points: healthy credentials score 3 points, expiring (<2h) score 2, near-expiry (<30min) score 1.
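The multiplier table above reduces to a small pure function; the `null` handling for credentials with no known expiry is an assumption:

```typescript
// Sketch of the graduated frontier score multiplier from the table above.
function expiryMultiplier(minutesRemaining: number | null, status: string): number {
  if (status === "stale" || status === "expired") return 0.1; // deprioritized
  if (minutesRemaining === null) return 1.0;                  // no expiry known (assumption)
  if (minutesRemaining < 30) return 0.3;                      // urgent: use it or lose it
  if (minutesRemaining < 120) return 0.7;                     // expiring soon
  return 1.0;                                                 // healthy
}

console.log(expiryMultiplier(15, "active"));  // 0.3
console.log(expiryMultiplier(90, "active"));  // 0.7
console.log(expiryMultiplier(600, "active")); // 1
console.log(expiryMultiplier(600, "stale"));  // 0.1
```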
Credential Provenance¶
The getCredentialProvenance() function traces a credential's full provenance chain by walking DERIVED_FROM and DUMPED_FROM edges. This enables:
- Identifying the original source of a cracked hash
- Tracing which host a credential was dumped from
- Understanding multi-hop derivation (e.g., NTDS dump → hash → cracked password → TGT)
The dashboard visualizes provenance chains via the /api/evidence-chains/:nodeId endpoint.
See Graph Model — Credential Lifecycle Properties for the full property reference.
Credential Coverage Matrix¶
The engine tracks which credentials have been tested against which targets, surfacing untested pairs as credential_test frontier items. This gives the LLM a systematic "spray progress" view rather than relying on ad-hoc enumeration.
How it works:
- Collect usable credentials — active, non-stale, non-expired credentials per `isCredentialUsableForAuth()`
- Collect auth targets — hosts running auth-accepting services (SMB, RDP, SSH, WinRM, MSSQL, etc.)
- Build tested set — scan `TESTED_CRED`, `VALID_ON`, `HAS_SESSION`, `ADMIN_TO` edges to identify already-tested pairs
- Rank untested pairs — priority based on credential type × service type × hops-to-objective × same-domain boost
Priority scoring:
| Credential Type | Weight | Service Type | Weight |
|---|---|---|---|
| Plaintext password | 1.0 | SMB | 0.9 |
| NTLM hash | 0.9 | RDP | 0.85 |
| AES256 key | 0.85 | SSH / WinRM | 0.8 |
| Kerberos TGT | 0.8 | MSSQL | 0.7 |
| SSH key | 0.8 | LDAP | 0.7 |
| Token / Certificate | 0.7 | HTTP(S) | 0.5 |
Coverage stats appear in get_state().credential_coverage and in the system prompt as "Credential Spray Progress." The dashboard shows a coverage progress bar with top untested pairs.
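The ranking step can be sketched with the weight tables above; the same-domain boost factor and the fallback weight for unknown types are illustrative placeholders, and the real scorer also folds in hops-to-objective:

```typescript
// Sketch of untested-pair priority using the weights above. The 1.2 boost and
// 0.5 fallback are illustrative assumptions, not the real constants.
const CRED_WEIGHT: Record<string, number> = { password: 1.0, ntlm: 0.9, aes256: 0.85 };
const SVC_WEIGHT: Record<string, number> = { smb: 0.9, rdp: 0.85, ssh: 0.8, mssql: 0.7 };

function pairPriority(credType: string, svcType: string, sameDomain: boolean): number {
  const base = (CRED_WEIGHT[credType] ?? 0.5) * (SVC_WEIGHT[svcType] ?? 0.5);
  return sameDomain ? base * 1.2 : base; // same-domain boost (illustrative)
}

console.log(pairPriority("password", "smb", false)); // 0.9
// A plaintext password against SMB outranks an NTLM hash against the same service.
console.log(pairPriority("ntlm", "smb", false) < pairPriority("password", "smb", false)); // true
```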
IAM Policy Simulation¶
Overwatch includes a cloud IAM policy simulator (evaluateIAM()) that evaluates whether an identity is permitted to perform an action on a resource. The simulator understands the permission evaluation semantics of all three major cloud providers:
| Provider | Evaluation Logic |
|---|---|
| AWS | Deny-overrides-allow — explicit deny in any policy wins. Evaluates all attached policies, then checks for matching allow. |
| Azure | RBAC scope hierarchy — permissions at broader scopes (/subscriptions/...) inherit to child resources. Role assignments are evaluated against scope prefixes. |
| GCP | Deny policy precedence — deny policies are evaluated first, then allow. Supports wildcard action matching. |
The simulator traverses the graph to collect all policies reachable from an identity (including via group memberships and role assumptions), then evaluates them against the requested action and resource.
Usage: The IAM simulator is used internally by graph analysis tools and is accessible via the evidence chain API. It helps answer questions like "Can this service account delete S3 buckets?" or "Does this Azure role assignment cover this resource?"
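The AWS row of the table above (deny-overrides-allow with wildcard actions) can be sketched as follows; this is a simplification of what `evaluateIAM()` does, with an invented statement shape:

```typescript
// Minimal sketch of AWS-style deny-overrides-allow evaluation with wildcard
// matching. The Statement shape here is illustrative.
interface Statement { effect: "Allow" | "Deny"; action: string; resource: string }

const esc = (s: string) => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
const matchesPattern = (pattern: string, value: string) =>
  new RegExp("^" + pattern.split("*").map(esc).join(".*") + "$").test(value);

function evaluateAws(statements: Statement[], action: string, resource: string): boolean {
  const applicable = statements.filter(
    (s) => matchesPattern(s.action, action) && matchesPattern(s.resource, resource),
  );
  if (applicable.some((s) => s.effect === "Deny")) return false; // explicit deny wins
  return applicable.some((s) => s.effect === "Allow");           // otherwise need an allow
}

const policy: Statement[] = [
  { effect: "Allow", action: "s3:*", resource: "arn:aws:s3:::backups*" },
  { effect: "Deny", action: "s3:DeleteBucket", resource: "*" },
];
console.log(evaluateAws(policy, "s3:GetObject", "arn:aws:s3:::backups/x"));  // true
console.log(evaluateAws(policy, "s3:DeleteBucket", "arn:aws:s3:::backups")); // false
```

This is the shape of the "Can this service account delete S3 buckets?" answer: the broad allow is overridden by the narrow deny.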
Web Attack Path Modeling¶
Overwatch models web application attack surfaces using three graph constructs:
- `api_endpoint` nodes — represent individual API endpoints with `path`, `method`, `auth_required`, and `response_type` properties. Connected to their parent webapp via `HAS_ENDPOINT` edges.
- `AUTH_BYPASS` edges — from a `vulnerability` node to a `webapp` or `api_endpoint` when the vulnerability enables authentication bypass. This is a bidirectional edge type — the engine considers both directions for path traversal.
- Inference rules — two rules automate web attack path discovery:
    - Token → Webapp Auth — when a credential with `cred_type=token` exists and a webapp has an `AUTHENTICATED_AS` edge, creates a `VALID_ON` edge (confidence 0.75)
    - Auth Bypass Escalation — when a vulnerability has an `AUTH_BYPASS` edge to a webapp, creates an `EXPLOITS` edge to the parent host (confidence 0.8)
Four inference selectors support web attack path rules: `default_credential_candidates`, `cms_credentials`, `hosted_webapps`, and `vulnerable_webapps`.
See Graph Model — Web Application Surface for edge definitions and Graph Model — Inference Rules for the full rule reference.
Identity Resolution¶
Overwatch automatically resolves node identities on ingest to prevent duplicates and merge fragmented data:
- Canonical ID generation — each node type has deterministic ID rules (e.g., `host-<ip>`, `user-<domain>-<username>`)
- Identity markers — nodes carry markers for matching: hostname variants, SIDs, domain-qualified names, credential fingerprints
- Alias merging — when a canonical node is added and an existing node shares its identity markers, the weaker node is merged into the canonical one: edges are retargeted, properties merged
- Provenance preservation — merged nodes retain `first_seen_at`, `sources`, and discovery metadata from both originals
This handles the real-world messiness of BloodHound SIDs, manual findings, and parser outputs colliding on the same entity. Nodes can be canonical (primary), unresolved (ambiguous), or superseded (merged into another).
Sessions¶
Overwatch maintains persistent interactive sessions — long-lived bidirectional I/O channels that survive across MCP tool calls. Sessions support SSH, local PTY, and TCP socket transports (for reverse shells and listeners).
I/O Model¶
The core primitives are write_session (raw bytes) and read_session (cursor-based). Each session has a 128KB ring buffer with absolute monotonic positions. Agents track end_pos from each read and pass it as from_pos on the next to get only new output.
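The cursor model can be sketched with a small ring buffer; the class and method names here are illustrative stand-ins for the server-side structure behind `write_session`/`read_session`:

```typescript
// Sketch of a ring buffer with absolute monotonic positions. Names are
// illustrative; only the 128KB capacity and cursor semantics come from the docs.
class SessionRingBuffer {
  private buf = "";
  private start = 0; // absolute position of buf[0]
  constructor(private capacity = 128 * 1024) {}

  write(data: string): void {
    this.buf += data;
    const overflow = this.buf.length - this.capacity;
    if (overflow > 0) {              // oldest bytes fall off the ring;
      this.buf = this.buf.slice(overflow);
      this.start += overflow;        // ...but positions stay absolute
    }
  }

  read(fromPos: number): { data: string; endPos: number } {
    const offset = Math.max(fromPos - this.start, 0);
    return { data: this.buf.slice(offset), endPos: this.start + this.buf.length };
  }
}

const rb = new SessionRingBuffer();
rb.write("$ whoami\n");
const first = rb.read(0);           // everything so far, plus the cursor
rb.write("root\n");
const next = rb.read(first.endPos); // pass end_pos back as from_pos
console.log(JSON.stringify(next.data)); // "root\n" — only the new output
```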
send_to_session is an experimental convenience tool that writes a command, waits for output to settle (idle timeout or regex match), and returns the captured output in one call.
TTY Quality¶
Sessions track their terminal capability via tty_quality:
| Level | Description | Example |
|---|---|---|
| `none` | No terminal | Non-interactive exec |
| `dumb` | Raw I/O only | Raw reverse shell |
| `partial` | Line editing | After `python3 -c 'import pty; ...'` |
| `full` | Full PTY | SSH, local shell, fully upgraded shell |
Quality can be upgraded at runtime via update_session after a shell upgrade.
Ownership¶
Sessions have a claimed_by field (agent ID). When set, only the claiming agent can write or control the session. Any agent can read. Use update_session to transfer ownership or force: true to override. Unclaimed sessions are open to all.
Lifecycle¶
Sessions follow this state machine:
Socket sessions (reverse shells, listeners) start in pending and transition to connected when a connection is established. PTY and SSH sessions connect immediately. Sessions are ephemeral across server restarts — PTY file descriptors cannot be serialized.
Listener Mode and Mock-Service Binding¶
`open_session` with `kind=socket mode=listen mock_service_purpose=<purpose>` does two things atomically:
- Creates the listener session.
- Auto-registers a `mock_service` graph node and stamps `capabilities.serves_mock_service_id` on the session, so anything captured through that listener can be attributed back to the operator-controlled infrastructure that caught it.
Closing the session stamps stopped_at on the bound mock_service node — the listener stays in the graph for retrospective analysis but is rendered inactive in the dashboard.
See Operator Infrastructure for the full capture chain.
See Session Tools for the full API reference.
Graph Compaction (Cold Store)¶
During large network sweeps, hundreds of hosts may respond to ping without offering any services. To keep the hot graph focused on actionable targets, Overwatch uses a cold store — an in-memory census that tracks these low-interest hosts outside the main graphology graph.
Temperature Classification¶
Every host ingested into the graph is classified as hot or cold:
| Condition | Temperature | Reason |
|---|---|---|
| Non-host node type | Hot | Always — services, users, credentials, etc. need full graph participation |
| Host with `alive !== true` | Hot | Dead or unconfirmed hosts need scope tracking |
| Host with hostname or OS | Hot | Identity-bearing — needed for reconciliation |
| Host with interesting edges | Hot | `HAS_SESSION`, `ADMIN_TO`, `RUNS`, `HOSTS`, etc. |
| Alive IP-only host, no services | Cold | Pure ping response — census only |
Promotion¶
Cold nodes are promoted to the hot graph automatically when:
- A new edge references them (edge promotion guard)
- A later finding adds services, hostname, or OS
- A pivot session makes them reachable (pivot reachability inference)
- A scope expansion brings them into scope (`update_scope`)
Promotion is one-way — hot nodes are never demoted back to cold. This avoids cache invalidation complexity.
Visibility¶
get_state() includes cold_node_count and cold_nodes_by_subnet (top 5) in the graph summary, giving the LLM awareness of the census without cluttering the frontier.
Engagement State vs Graph State¶
These are related but distinct:
- Graph state — the raw graphology graph (nodes, edges, properties). This is what gets persisted to `state-<id>.json`.
- Engagement state — the synthesized view returned by `get_state()`. It includes graph summaries, computed frontier items, objective progress, agent status, and recent activity. This is derived from the graph state plus runtime data (active agents, activity log).
Both survive compaction and restarts. The graph state is the source of truth; the engagement state is a computed view of it.
Campaigns¶
A campaign is a coordinated set of frontier items sharing a common theme (e.g., "spray all SMB services with captured DA hash"). The campaign planner groups related frontier items, assigns them to agents in parallel via dispatch_campaign_agents, and tracks collective progress with abort conditions (e.g., too many failures).
Campaigns are created via manage_campaign and have states: active, paused, completed, aborted.
Technique Priors¶
Knowledge-base statistics on how likely a technique is to succeed, derived from historical engagement data. The TechniqueStats include success rate, average noise, and sample count. These are surfaced in validate_action and frontier scoring to help the LLM prioritize.
Adaptive Prompt Token Budgeting¶
The system prompt generated by get_system_prompt uses priority-based trimming to stay within a configurable max_prompt_tokens budget (default: 8000 tokens). Sections are scored by urgency (expiring credentials > active campaigns > scope suggestions) and lower-priority sections are compressed or replaced with summaries when the budget is tight.
Finding Deduplication¶
When the model processes the same tool output multiple times (common after context compaction), the engine deduplicates findings using SHA-256 content hashing. The hash covers tool_name, sorted node signatures (stable properties only), sorted edge keys, and the first 500 characters of raw_output. Exact duplicates within a 5-minute rolling window are rejected immediately with { deduplicated: true } — no graph mutations occur. Findings with the same nodes but different properties pass through for property merging.
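The dedup check can be sketched as a content hash plus a rolling window; the canonicalization below is an assumption beyond what the docs state (sorted signatures, first 500 characters):

```typescript
import { createHash } from "node:crypto";

// Sketch of finding deduplication: hash the stable parts, reject exact
// repeats inside a 5-minute rolling window.
function findingHash(toolName: string, nodeSigs: string[], edgeKeys: string[], rawOutput: string) {
  const material = JSON.stringify([
    toolName,
    [...nodeSigs].sort(),
    [...edgeKeys].sort(),
    rawOutput.slice(0, 500), // first 500 chars of raw output
  ]);
  return createHash("sha256").update(material).digest("hex");
}

const seen = new Map<string, number>(); // hash -> last-seen timestamp (ms)
const WINDOW_MS = 5 * 60 * 1000;

function ingest(hash: string, now: number): { deduplicated: boolean } {
  const last = seen.get(hash);
  if (last !== undefined && now - last < WINDOW_MS) return { deduplicated: true };
  seen.set(hash, now);
  return { deduplicated: false };
}

const h1 = findingHash("nmap", ["host-10-10-10-5"], [], "445/tcp open");
console.log(ingest(h1, 0));       // { deduplicated: false }
console.log(ingest(h1, 60_000));  // { deduplicated: true }  (within 5 minutes)
console.log(ingest(h1, 400_000)); // { deduplicated: false } (window elapsed)
```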
Tool Call Telemetry¶
Runtime instrumentation that tracks per-tool call counts, error rates, response times, and call sequence patterns. This data is not persisted — it exists only for the current server process. The telemetry is exported in the tool_telemetry section of run_retrospective output and helps identify unused tools, slow tools, and common tool-call sequences.
Inference Rule Effectiveness¶
Each inference rule's performance is tracked via inferred_by_rule and confirmed_at properties on edges. The engine computes per-rule confirmation rates (confirmed / total inferred edges) and surfaces rules with ≥3 edges in get_state() under inference_rule_effectiveness. Rules are dynamically ordered by confirmation rate — high-performing rules run first, and rules with 0 confirmations over ≥5 attempts are deprioritized.
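The ordering logic reduces to a confirmation-rate sort with a deprioritization flag; the data shape below is illustrative, while the thresholds (≥3 edges surfaced, 0 confirmations over ≥5 attempts deprioritized) follow the text above:

```typescript
// Sketch of confirmation-rate ordering for inference rules.
interface RuleStats { ruleId: string; inferred: number; confirmed: number }

function orderRules(stats: RuleStats[]): string[] {
  const rate = (s: RuleStats) => (s.inferred === 0 ? 0 : s.confirmed / s.inferred);
  const deprioritized = (s: RuleStats) => s.inferred >= 5 && s.confirmed === 0;
  return [...stats]
    // Deprioritized rules sink to the back; the rest sort by confirmation rate.
    .sort((a, b) => Number(deprioritized(a)) - Number(deprioritized(b)) || rate(b) - rate(a))
    .map((s) => s.ruleId);
}

const ordered = orderRules([
  { ruleId: "rule-a", inferred: 10, confirmed: 1 },
  { ruleId: "rule-b", inferred: 4, confirmed: 3 },
  { ruleId: "rule-c", inferred: 6, confirmed: 0 }, // 0 confirmations over >=5 attempts
]);
console.log(ordered); // [ 'rule-b', 'rule-a', 'rule-c' ]
```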