AIT Workbench

AIT has a main operator surface, but it is not a monolithic runtime. Seam, Assay, and meshmapper stay independent peer tools. The workbench coordinates them, manages run directories, captures logs, tracks artifacts, and renders reports.

Seam has a rich CLI for a narrower purpose: operating the in-path offensive instrument. Seam manages rules, transcripts, profiles, transport modes, and robustness scenarios. ait coordinates Seam, Assay, and meshmapper.

Positioning

The workbench is a workflow supervisor:

starts lab workflows and tool subprocesses through public entrypoints
writes cases, rules, scenarios, and run manifests
records stdout/stderr logs
calls existing CLIs and local APIs, especially Seam's rule, transcript, profile, and API surfaces
collects transcripts, graphs, paths, findings, robustness bundles, and reports under one run directory
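
The supervisor role above can be sketched in a few lines of Python. This is a minimal illustration, not the workbench's actual implementation: the helper names `run_tool` and `write_manifest`, the log file naming, and the `steps` key in `run.json` are all assumptions made for the example.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_tool(run_dir: Path, name: str, argv: list) -> dict:
    """Run one tool subprocess and record its stdout/stderr under run_dir/logs/."""
    logs = run_dir / "logs"
    logs.mkdir(parents=True, exist_ok=True)
    out_path = logs / f"{name}.stdout.log"  # hypothetical naming scheme
    err_path = logs / f"{name}.stderr.log"
    with out_path.open("wb") as out, err_path.open("wb") as err:
        # The tool is reached only through its public command line.
        proc = subprocess.run(argv, stdout=out, stderr=err, check=False)
    return {
        "name": name,
        "argv": argv,
        "exit_status": proc.returncode,
        "stdout_log": out_path.relative_to(run_dir).as_posix(),
        "stderr_log": err_path.relative_to(run_dir).as_posix(),
    }


def write_manifest(run_dir: Path, records: list) -> Path:
    """Write a run manifest listing every step (assumed shape, for illustration)."""
    manifest = run_dir / "run.json"
    manifest.write_text(json.dumps({"steps": records}, indent=2) + "\n", encoding="utf-8")
    return manifest
```

For instance, `run_tool(run_dir, "echo", [sys.executable, "-c", "print('hi')"])` leaves `logs/echo.stdout.log` behind and returns a record that `write_manifest` can store. The supervisor never imports tool internals; it only records what it launched and what came back.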

It must not:

replace Seam's transport implementation
replace Assay's validation/oracle logic
replace meshmapper's graph inference
treat agent self-report as evidence
hide safety boundaries such as no transparent TLS interception

M0: Safety And Correctness Gate

Before broadening usability, fix silent failures and unsafe defaults that could make a run misleading or overexposed:

default Seam tap/proxy examples and CLI defaults to loopback; require explicit opt-in for remote data-plane listeners
require a per-process API token for Seam control endpoints
write transcripts with owner-only permissions and add default secret-header redaction for report surfaces
make in-path HTTP clients use bounded timeouts and prevent redirect escapes unless explicitly enabled
fix protocol classification and JSON-RPC id preservation for MCP traffic
make transform rule load fail on invalid regex and non-map mutation intermediates
validate Assay findings against the canonical shared schema
make confidence-interval dependencies explicit and avoid zero-width fallback intervals
add Python CI and schema-drift checks
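
Two of the items above, owner-only transcript files and default secret-header redaction, could look roughly like the following. This is a sketch under stated assumptions: the header list, the `[REDACTED]` placeholder, and the function names are illustrative and not taken from Seam, and the permission handling follows POSIX semantics.

```python
import json
import os
from pathlib import Path

# Assumed default set of secret-bearing headers; the real list is a project decision.
SECRET_HEADERS = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}


def redact_headers(headers: dict) -> dict:
    """Return a copy with secret header values replaced; names match case-insensitively."""
    return {k: ("[REDACTED]" if k.lower() in SECRET_HEADERS else v) for k, v in headers.items()}


def write_owner_only(path: Path, records: list) -> None:
    """Write JSON so that only the owning user can read or write the file."""
    # Passing the mode to os.open means a new file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # A pre-existing file keeps its old mode, so tighten it before writing.
        os.chmod(path, 0o600)
        json.dump(records, fh, indent=2)
```

Redaction here returns a copy, so the raw transcript can stay intact on disk with restricted permissions while report surfaces only ever see the redacted view.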

M1: Seam Operator CLI

Seam is operationally useful before the root workbench drives it.

seam doctor
seam rules list
seam rules test --rules rules/ --fixture examples/a2a-agent-card.json
seam rules trace --transcript out.json
seam rules explain --rules rules/ --rule a2a_prompt_laundering_replace
seam transcript inspect --transcript out.json --schema schemas/transcript.schema.json
seam session status --transcript out.json
seam session tail --transcript out.json --limit 5
seam transcript redact --transcript out.json --out out.redacted.json --schema schemas/transcript.schema.json
seam profile list
seam profile run lab --mode proxy --upstream http://127.0.0.1:8500 --rules rules/

Seam does not call Assay or meshmapper. The current helper surface is enough for ait to validate rules, inspect transcripts, launch profile-backed tap or proxy runs, summarize session state, and create report-safe redacted transcript copies.

Acceptance: an operator can start a safe local proxy, apply a rule, see whether it fired, inspect the transcript, verify hashes, and debug common failures without hand-stitching curl calls.

M2: Guided `ait` CLI

The root CLI coordinates existing tools through subprocesses and files. It uses public tool surfaces rather than duplicating Seam, Assay, or meshmapper internals. M2 adds scenario-aware lab runs and run inspection so docs and demos can point to one stable operator path.

Primary commands:

ait doctor
ait lab list
ait demo full-agent-mesh --scenario content_rewrite --trials 1
ait lab run langgraph-refund --trials 3
ait lab run content-decision --trials 3
ait lab run full-agent-mesh --trials 3
ait lab run full-agent-mesh --scenario content_rewrite --trials 1
ait run inspect --run .ait/runs/<run-id>
ait workbench serve --run .ait/runs/<run-id>
ait compare --runs .ait/runs/l6 .ait/runs/l7 --out .ait/runs/comparison.json
ait operate proxy --upstream http://127.0.0.1:8500 --rules agentic-redteam/seam/rules
ait map run --transcript .ait/runs/<run-id>/transcripts/proxy.json
ait map suggest --run .ait/runs/<map-run-id>
ait prove from-run --run .ait/runs/<run-id> --case cases/refund_tripwire.yaml
ait capture --upstream http://127.0.0.1:8500
ait assess --case cases/refund_family.yaml --seam http://127.0.0.1:8401
ait report --run .ait/runs/<run-id>

Run layout:

.ait/runs/<timestamp>-<slug>/
  run.json
  logs/
  seam/
  assay/
  meshmapper/
  transcripts/
  graphs/
  paths/
  findings/
  reports/
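
Creating that layout is mechanical. The sketch below assumes a UTC timestamp format and a slug-normalization rule, neither of which is specified above; only the directory names and `run.json` come from the layout itself. The `run_id` and `slug` keys are likewise hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Subdirectories taken from the run layout.
SUBDIRS = ["logs", "seam", "assay", "meshmapper", "transcripts",
           "graphs", "paths", "findings", "reports"]


def create_run_dir(root: Path, slug: str, now: Optional[datetime] = None) -> Path:
    """Create <root>/<timestamp>-<slug>/ with the standard subdirectories."""
    now = now or datetime.now(timezone.utc)
    # Assumed normalization: lowercase, non-alphanumerics collapsed to hyphens.
    safe_slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")
    run_dir = root / f"{now.strftime('%Y%m%dT%H%M%SZ')}-{safe_slug}"
    # exist_ok=False: refuse to reuse a directory that already holds a run.
    run_dir.mkdir(parents=True, exist_ok=False)
    for name in SUBDIRS:
        (run_dir / name).mkdir()
    manifest = {"run_id": run_dir.name, "slug": safe_slug}
    (run_dir / "run.json").write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return run_dir
```

Refusing to reuse an existing directory keeps each run's artifacts from being silently mixed with an earlier run's.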

ait doctor checks local tool readiness and the no-transparent-TLS caveat.

ait lab run is the first supported path for new users. It runs a complete lab script and captures logs, manifests, important artifact paths, checksums, lab id, scenario, trial count, and exit status.

ait demo is the polished full-chain demo path. It runs the selected lab and prints the dashboard command, report path, finding verdict, transcript count, rewrite count, graph/path files, hypothesis classes, and expectation status.

ait operate is the Seam-first field path. It runs or records a Seam tap, proxy, or stdio wrapper, stores a run manifest, and can serve the cockpit without invoking Assay.

ait map is the targeting path. It calls meshmapper over saved transcripts and discovery/config artifacts, then prints or stores ranked path suggestions.

ait prove is the optional validation path. It calls Assay only when the operator needs a finding backed by oracle evidence.

ait run inspect summarizes a run directory in text or JSON. It reports the lab scenario, exit status, report path, finding summary, transcript file count, transcript record count, graph, paths, and expectations file.
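
A reduced version of such a summary might be computed as follows. The sketch assumes each file under `transcripts/` is a JSON document holding either a list of records or a single record, and that `run.json` carries `scenario` and `exit_status` keys; the actual transcript and manifest schemas may differ.

```python
import json
from pathlib import Path


def inspect_run(run_dir: Path) -> dict:
    """Summarize a run directory from its manifest and transcript files."""
    manifest = json.loads((run_dir / "run.json").read_text(encoding="utf-8"))
    files = sorted((run_dir / "transcripts").glob("*.json"))
    record_count = 0
    for f in files:
        data = json.loads(f.read_text(encoding="utf-8"))
        # Assumed shape: a JSON array of records, else one record per file.
        record_count += len(data) if isinstance(data, list) else 1
    return {
        "scenario": manifest.get("scenario"),
        "exit_status": manifest.get("exit_status"),
        "transcript_files": len(files),
        "transcript_records": record_count,
    }


def render(summary: dict, as_json: bool = False) -> str:
    """Render the summary as JSON or as plain 'key: value' text."""
    if as_json:
        return json.dumps(summary, indent=2, sort_keys=True)
    return "\n".join(f"{k}: {v}" for k, v in summary.items())
```

Keeping the summary a plain dictionary lets the text and JSON outputs come from one code path, so they cannot drift apart.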

ait workbench serve exposes a local operator cockpit with tabs for operate traffic, selected-message inspection, Seam proxy state, meshmapper targeting hypotheses, optional Assay validation, artifacts, and logs.

ait compare summarizes completed run folders across labs so operators can compare findings, graph refs, rewrite counts, hypothesis classes, robustness summaries, and report links.

ait assess is the first path toward user-owned targets. It calls assay run with a saved case or case family and records the command and output in the run manifest.

M3-M7: Interactive Workbench And Cockpit

The workbench now has terminal and browser modes:

ait workbench
ait workbench serve --run .ait/runs/<run-id>
ait workbench lab full-agent-mesh --scenario content_rewrite --trials 1 --serve

The browser cockpit is a React app served by the Python ait process. It still reads only run artifacts through /status, /api/ui-config, and /artifact/...; it does not replace Seam, Assay, or meshmapper internals.
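
Serving only run artifacts implies that an `/artifact/` request must never resolve to a file outside the run directory. One way to enforce that is sketched below; `resolve_artifact` is a hypothetical helper, not the cockpit's actual handler, and it assumes the request path has already been URL-decoded.

```python
from pathlib import Path
from typing import Optional


def resolve_artifact(run_dir: Path, request_path: str) -> Optional[Path]:
    """Map an /artifact/ request to a file inside run_dir, or None if not allowed."""
    prefix = "/artifact/"
    if not request_path.startswith(prefix):
        return None
    root = run_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / request_path[len(prefix):]).resolve()
    if root not in candidate.parents:
        return None  # escaped the run directory, or pointed at the root itself
    return candidate if candidate.is_file() else None
```

Checking containment after resolution, rather than scanning the string for `..`, also rejects absolute paths and symlinks that point outside the run.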

Implemented views:

active processes and ports
live A2A/MCP/HTTP traffic feed and selected-message inspection
Seam proxy command, listener/upstream, rule matches, rewrite counts, and transcript tail
meshmapper graph visual, node/edge summaries, hypothesis classes, selected path highlighting, graph refs, trust gaps, and provenance warnings
optional Assay validation progress and oracle observations
report, artifact, and log links

M4: Real-Deciding Targets

Lab L5 exercises a target that is not merely prewired to confirm a route name:

deterministic heuristic target whose decision changes only when Seam rewrites traffic
baseline mode where direct and laundered routes both fail without the rewrite
proxy-mode run where a Seam mutate.replace rule fires
Assay findings based on oracle evidence, not fixture route names

Acceptance: a demo can fail for the right reason, pass for the right reason, and show the difference in transcripts and reports.
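
The baseline-versus-proxy contrast can be illustrated with a toy model. Everything here is invented for the example: the message format, the `manager_approved` marker, and the functions `decide`, `replace_rule`, and `run_trial` stand in for the real lab target and for a Seam replace rule; they are not their implementations.

```python
def decide(message: str) -> str:
    """Deterministic heuristic target: the decision depends only on message content."""
    return "approve" if "manager_approved=true" in message else "deny"


def replace_rule(message: str, old: str, new: str):
    """Toy stand-in for a replace mutation; returns (message, whether the rule fired)."""
    fired = old in message
    return (message.replace(old, new) if fired else message), fired


def run_trial(message: str, rewrite: bool) -> dict:
    """Run one trial, optionally passing the message through the rewrite first."""
    fired = False
    if rewrite:
        message, fired = replace_rule(
            message, "manager_approved=false", "manager_approved=true"
        )
    # The evidence is the target's own decision, not the name of the route.
    return {"rule_fired": fired, "decision": decide(message)}


routes = {
    "direct": "refund order=17 manager_approved=false",
    "laundered": "summary: refund order=17 manager_approved=false",
}
baseline = {name: run_trial(msg, rewrite=False) for name, msg in routes.items()}
proxied = {name: run_trial(msg, rewrite=True) for name, msg in routes.items()}
```

In this model both routes are denied in `baseline` and approved in `proxied`, and `rule_fired` records why the outcome changed, which is the difference the transcripts and reports need to show.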

M5: Payload Craft Integration

Assay originates richer attacks without sacrificing reproducibility:

assay craft materializes case-family files from a versioned technique corpus.
ait assess can run a saved case-family artifact after assay craft materializes it.
Optional live or adaptive generation remains offline and is explicitly future work; the saved case-family is the replayable validation input.

Acceptance: operators can vary framings and mutations while reviewers can rerun the exact same payload bytes.
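
One way a reviewer could confirm they are rerunning the exact same payload bytes is to compare content digests of the materialized case-family files. The sketch below assumes the case family is a directory of files and uses SHA-256; the helper name and the directory assumption are illustrative rather than part of Assay.

```python
import hashlib
from pathlib import Path


def digest_case_family(case_dir: Path) -> dict:
    """Map each file's relative path to the SHA-256 of its exact bytes."""
    return {
        p.relative_to(case_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(case_dir.rglob("*"))
        if p.is_file()
    }
```

Two runs used byte-identical payloads when their digest maps are equal; any added, removed, or edited file changes the map.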