AIT Workbench Guide
The root ait CLI coordinates tools through public CLIs and files. It is not a replacement for Seam, Assay, or meshmapper.
Run A Lab
For the canonical visible validation path, use demo:
python3 -m ait.cli demo full-agent-mesh --scenario content_rewrite --trials 1
It runs Lab L6, then prints the run directory, dashboard command, report path, finding verdict, transcript count, rewrite count, graph/path files, hypothesis classes, and expectation status.
The lower-level lab command remains available:
python3 -m ait.cli lab run full-agent-mesh --scenario content_rewrite --trials 1
List technique-oriented demo packs:
python3 -m ait.cli demo list
Run a packaged demo by technique:
python3 -m ait.cli demo run a2a-content-rewrite --trials 1
Inspect the run:
python3 -m ait.cli run inspect --run .ait/runs/<run-id>
Run Directory
.ait/runs/<run-id>/
run.json
logs/
lab/
transcripts/
graph.json
paths.json
finding.json
robustness/
report/
run.json records commands, exit code, lab id, scenario, trial count, artifact paths, and checksums.
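Because run.json is plain JSON, a script can pull out the fields it needs without going through the CLI. The following is a minimal sketch, not part of AIT: summarize_run is a hypothetical helper, and the key names it looks up (lab_id, scenario, trials, exit_code) are assumptions that should be checked against a real run.json.

```python
import json
from pathlib import Path


def summarize_run(run_dir):
    """Return a small summary dict read from <run_dir>/run.json.

    The key names below are assumptions for illustration only;
    missing keys come back as None instead of raising.
    """
    data = json.loads((Path(run_dir) / "run.json").read_text(encoding="utf-8"))
    wanted = ("lab_id", "scenario", "trials", "exit_code")  # hypothetical keys
    return {key: data.get(key) for key in wanted}
```

Calling summarize_run(".ait/runs/<run-id>") returns a dictionary with those four keys, which is enough for a quick pass/fail check in a wrapper script.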
Seam-First Operation
Use Seam directly or through ait operate when you need live in-path control:
- testing a specific rule
- tailing a transcript while traffic is flowing
- wrapping a local MCP stdio server
- debugging a rewrite that did not match
- running a standalone proxy outside a lab
Use ait when the task is a complete workflow that should produce a run directory.
python3 -m ait.cli operate proxy \
--upstream http://127.0.0.1:8500 \
--rules agentic-redteam/seam/rules \
--serve
Map a finished capture:
python3 -m ait.cli map run \
--transcript .ait/runs/<operate-run>/transcripts/proxy.json
python3 -m ait.cli map suggest --run .ait/runs/<map-run>
Validate impact only when needed:
python3 -m ait.cli prove from-run \
--run .ait/runs/<operate-run> \
--case agentic-redteam/assay/cases/refund_tripwire.yaml
Live View
python3 -m ait.cli workbench --run .ait/runs/<run-id>
python3 -m ait.cli workbench lab full-agent-mesh --scenario content_rewrite --trials 1
The terminal workbench summarizes active run metadata, logs, ports, transcript counts, latest Seam hashes, rule rewrite counts, meshmapper graph/path totals, Assay route stats, expectations, and report paths. It reads public artifacts only; it does not reimplement Seam, Assay, or meshmapper internals.
Watch mode refreshes snapshots while a run is active:
python3 -m ait.cli workbench --run .ait/runs/<run-id> --watch --interval 1
python3 -m ait.cli workbench lab full-agent-mesh --scenario content_rewrite --trials 1 --watch --record-status
--record-status appends JSON snapshots to workbench/status.jsonl inside the run directory.
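Since each line of status.jsonl is one JSON snapshot, the most recent state can be read back with a few lines of Python. This is a minimal sketch rather than an AIT feature: latest_snapshot is a hypothetical helper, and it makes no assumption about which keys a snapshot contains.

```python
import json
from pathlib import Path


def latest_snapshot(run_dir):
    """Return the last complete JSON snapshot in workbench/status.jsonl, or None."""
    path = Path(run_dir) / "workbench" / "status.jsonl"
    if not path.exists():
        return None
    last = None
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                last = json.loads(line)
            except json.JSONDecodeError:
                # The final line may be half-written while the run is still active.
                continue
    return last
```

Skipping lines that fail to parse keeps the reader usable while the workbench is still appending to the file.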
Local Cockpit
Serve the same workbench snapshot through the localhost cockpit:
python3 -m ait.cli workbench serve \
--run .ait/runs/<run-id> \
--listen 127.0.0.1:8787
The cockpit exposes /status as JSON and artifact links rooted inside the
selected run directory. It does not start targets, store data in a database, or
reimplement Seam, Assay, or meshmapper internals.
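The /status endpoint can also be polled from a script. The sketch below assumes the cockpit is listening on 127.0.0.1:8787 as in the command above; fetch_status is a hypothetical helper, and the shape of the returned JSON is not specified here, so the function returns whatever the endpoint sends.

```python
import json
from urllib.request import ProxyHandler, build_opener


def fetch_status(base_url="http://127.0.0.1:8787", timeout=5.0):
    """GET <base_url>/status and return the decoded JSON body."""
    # The cockpit is local, so bypass any HTTP proxy configured in the environment.
    opener = build_opener(ProxyHandler({}))
    url = base_url.rstrip("/") + "/status"
    with opener.open(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```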
For live viewing, launch the lab through workbench lab with --serve so the cockpit is listening before the lab process starts:
python3 -m ait.cli workbench lab full-agent-mesh \
--scenario content_rewrite \
--trials 1 \
--serve \
--listen 127.0.0.1:8788
The cockpit tabs are:
- Operate: filterable A2A/MCP/HTTP records as they appear.
- Message: selected before/after decoded values and rewrite diffs.
- Seam Ops: listener, upstream, rules, counters, recent transcript tail, and equivalent CLI commands.
- Map: interactive graph visual, node/edge/path filters, selected path details, and provenance warnings.
- Validate: optional impact board with direct/laundered route delta, oracle evidence, trial matrix, confidence fields, and robustness axes.
- Artifacts: grouped files plus log previews.
Use those drilldowns to walk from "A2A traffic crossed Seam" to "the rule fired" to "meshmapper found a likely path" to "the oracle observed the side effect when validation was required." See Operator Cockpit for the tab-by-tab guide.
In the L6 content_rewrite demo, some 403 records are expected controls.
POST /direct status=403 means the baseline route refused the privileged
refund, which is the desired control behavior. POST /laundered status=403
means an attack/control attempt was blocked before the rewritten content caused
the planner to accept. The cockpit labels those rows by outcome instead of
treating every 403 as a generic failure.
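The labeling rule described above can be sketched as a small function. This only illustrates the idea and is not the cockpit's code: label_row and its label strings are hypothetical, and only the two routes named above get special treatment.

```python
def label_row(method, path, status):
    """Label a record by outcome instead of by status code alone.

    Label wording is illustrative; the cockpit's real labels may differ.
    """
    if method == "POST" and status == 403:
        if path == "/direct":
            # Baseline route refused the privileged refund: desired control behavior.
            return "expected control: baseline refused privileged refund"
        if path == "/laundered":
            # Attempt was blocked before the rewrite led the planner to accept.
            return "blocked attempt: planner did not accept"
    if status >= 400:
        return "error"
    return "ok"
```

Any other 4xx or 5xx response still falls through to a generic error label, so unexpected failures stay visible.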
Serve a single transcript when you only ran a tap/proxy and do not have a full AIT run directory:
python3 -m ait.cli workbench serve \
--transcript /tmp/seam-tap.json \
--listen 127.0.0.1:8787
Observe A Transcript
ait observe is a terminal traffic feed for a Seam transcript:
python3 -m ait.cli observe --transcript /tmp/seam-tap.json --follow
It prints the transcript file, sequence number, flow, direction, logical source and destination when available, protocol/kind, operation, rule id, text/tool details, and status. This is the quickest way to answer "is traffic crossing my tap?" while a client is active.
Compare Runs
Compare completed run folders when preparing demos or checking whether a rewrite survives across labs:
python3 -m ait.cli compare \
--runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
--out .ait/runs/comparison.json
Render Markdown or HTML packets for demo prep:
python3 -m ait.cli compare \
--runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
--out .ait/runs/comparison.md \
--format markdown
python3 -m ait.cli compare \
--runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
--out .ait/runs/comparison.html \
--format html
The comparison summarizes exit codes, findings, rule rewrites, graph refs, hypothesis classes, robustness summaries, and report links. It is an operator artifact, not a new proof engine.
Bind A Hypothesis To Assay
meshmapper paths are hypotheses. To turn one into an Assay starting point, write an explicit binding scaffold and fill the target-specific routes and oracle:
python3 -m ait.cli bind hypothesis \
--run .ait/runs/<run-id> \
--hypothesis-id <hypothesis-id-or-index> \
--out bindings/<hypothesis-id>.yaml
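After editing every TODO in the scaffold, run Assay through AIT. Point --binding at the file you wrote; the example below assumes it lives under the run directory: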
python3 -m ait.cli assess \
--hypotheses .ait/runs/<run-id>/lab/paths.json \
--hypothesis-id <hypothesis-id> \
--binding .ait/runs/<run-id>/bindings/<hypothesis-id>.yaml \
--seam http://127.0.0.1:8401
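Before running assess, it can help to confirm that no placeholder is left in the binding file. The sketch below is a minimal check, assuming the scaffold marks unfinished fields with the literal string TODO; remaining_todos is a hypothetical helper and does not parse the YAML.

```python
from pathlib import Path


def remaining_todos(binding_path):
    """Return (line_number, text) pairs for lines that still contain 'TODO'."""
    lines = Path(binding_path).read_text(encoding="utf-8").splitlines()
    return [
        (number, text.strip())
        for number, text in enumerate(lines, start=1)
        if "TODO" in text
    ]
```

An empty list means every TODO marker has been replaced; anything else lists the lines that still need target-specific routes or oracle details.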