AIT Workbench Guide

The root ait CLI coordinates tools through public CLIs and files. It is not a replacement for Seam, Assay, or meshmapper.

Run A Lab

For the canonical validation path with visible, end-to-end output, use the demo command:

python3 -m ait.cli demo full-agent-mesh --scenario content_rewrite --trials 1

It runs Lab L6, then prints the run directory, dashboard command, report path, finding verdict, transcript count, rewrite count, graph/path files, hypothesis classes, and expectation status.

The lower-level lab command remains available:

python3 -m ait.cli lab run full-agent-mesh --scenario content_rewrite --trials 1

List technique-oriented demo packs:

python3 -m ait.cli demo list

Run a packaged demo by technique:

python3 -m ait.cli demo run a2a-content-rewrite --trials 1

Inspect the run:

python3 -m ait.cli run inspect --run .ait/runs/<run-id>

Run Directory

.ait/runs/<run-id>/
  run.json
  logs/
  lab/
    transcripts/
    graph.json
    paths.json
    finding.json
    robustness/
    report/

run.json records commands, exit code, lab id, scenario, trial count, artifact paths, and checksums.
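The layout above can be checked from a script before you inspect a run. The sketch below is a minimal illustration and not part of ait: the helper names missing_artifacts and run_json_keys are hypothetical, and because this guide does not document the key names inside run.json, the code only lists the top-level keys instead of assuming any of them.

```python
import json
from pathlib import Path

# Relative paths taken from the run directory layout above.
# A trailing slash marks an entry that should be a directory.
EXPECTED = [
    "run.json",
    "logs/",
    "lab/transcripts/",
    "lab/graph.json",
    "lab/paths.json",
    "lab/finding.json",
    "lab/robustness/",
    "lab/report/",
]


def missing_artifacts(run_dir):
    """Return the expected entries that are absent from run_dir."""
    root = Path(run_dir)
    missing = []
    for entry in EXPECTED:
        path = root / entry
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


def run_json_keys(run_dir):
    """Return the sorted top-level keys of run.json (no schema assumed)."""
    with open(Path(run_dir) / "run.json", encoding="utf-8") as fh:
        data = json.load(fh)
    return sorted(data) if isinstance(data, dict) else []
```

An empty list from missing_artifacts(".ait/runs/<run-id>") means every listed entry exists. The layout describes a lab run; the operate example later in this guide reads transcripts/ directly under the run directory, so other run types may not match this list.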

Seam-First Operation

Use Seam directly or through ait operate when you need live in-path control:

testing a specific rule
tailing a transcript while traffic is flowing
wrapping a local MCP stdio server
debugging a rewrite that did not match
running a standalone proxy outside a lab

Use ait when the task is a complete workflow that should produce a run directory.

Start a proxy through ait operate:

python3 -m ait.cli operate proxy \
  --upstream http://127.0.0.1:8500 \
  --rules agentic-redteam/seam/rules \
  --serve

Map a finished capture:

python3 -m ait.cli map run \
  --transcript .ait/runs/<operate-run>/transcripts/proxy.json

python3 -m ait.cli map suggest --run .ait/runs/<map-run>

Validate impact only when needed:

python3 -m ait.cli prove from-run \
  --run .ait/runs/<operate-run> \
  --case agentic-redteam/assay/cases/refund_tripwire.yaml

Live View

python3 -m ait.cli workbench --run .ait/runs/<run-id>
python3 -m ait.cli workbench lab full-agent-mesh --scenario content_rewrite --trials 1

The terminal workbench summarizes active run metadata, logs, ports, transcript counts, latest Seam hashes, rule rewrite counts, meshmapper graph/path totals, Assay route stats, expectations, and report paths. It reads public artifacts only; it does not reimplement Seam, Assay, or meshmapper internals.

Watch mode refreshes snapshots while a run is active:

python3 -m ait.cli workbench --run .ait/runs/<run-id> --watch --interval 1
python3 -m ait.cli workbench lab full-agent-mesh --scenario content_rewrite --trials 1 --watch --record-status

--record-status appends JSON snapshots to workbench/status.jsonl inside the run directory.
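Because the status file is append-only JSON Lines, it can be read back one line at a time. The following is a hedged sketch, not an ait API: read_status_snapshots is a hypothetical helper, and it makes no assumption about which fields each snapshot contains.

```python
import json
from pathlib import Path


def read_status_snapshots(run_dir):
    """Parse workbench/status.jsonl into a list of snapshots, oldest first."""
    path = Path(run_dir) / "workbench" / "status.jsonl"
    snapshots = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                snapshots.append(json.loads(line))
            except json.JSONDecodeError:
                # A run that is still active may leave a partially written
                # final line; skip it instead of failing.
                continue
    return snapshots
```

With a recorded run, read_status_snapshots(".ait/runs/<run-id>")[-1] gives the most recent snapshot, which is useful for diffing state between two points in a watch session.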

Local Cockpit

Serve the same workbench snapshot through the localhost cockpit:

python3 -m ait.cli workbench serve \
  --run .ait/runs/<run-id> \
  --listen 127.0.0.1:8787

The cockpit exposes /status as JSON and artifact links rooted inside the selected run directory. It does not start targets, store data in a database, or reimplement Seam, Assay, or meshmapper internals.
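Since /status returns JSON, a script can poll it with the standard library alone. This is a minimal sketch under stated assumptions: fetch_status is a hypothetical helper, the default address matches the --listen value in the command above, and the shape of the returned object is not assumed.

```python
import json
import urllib.request


def fetch_status(base_url="http://127.0.0.1:8787", timeout=5.0):
    """GET <base_url>/status and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/status"
    # Bypass any configured HTTP proxy, since the cockpit listens on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call fetch_status() while workbench serve is running. A connection error here usually means the cockpit is not listening on the address you passed.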

For live viewing, launch the lab with --serve so the cockpit is listening before the lab process starts:

python3 -m ait.cli workbench lab full-agent-mesh \
  --scenario content_rewrite \
  --trials 1 \
  --serve \
  --listen 127.0.0.1:8788

The cockpit tabs are:

Operate: filterable A2A/MCP/HTTP records as they appear.
Message: selected before/after decoded values and rewrite diffs.
Seam Ops: listener, upstream, rules, counters, recent transcript tail, and equivalent CLI commands.
Map: interactive graph visual, node/edge/path filters, selected path details, and provenance warnings.
Validate: optional impact board with direct/laundered route delta, oracle evidence, trial matrix, confidence fields, and robustness axes.
Artifacts: grouped files plus log previews.

Use those drilldowns to walk from "A2A traffic crossed Seam" to "the rule fired" to "meshmapper found a likely path" to "the oracle observed the side effect when validation was required." See Operator Cockpit for the tab-by-tab guide.

In the L6 content_rewrite demo, some 403 records are expected controls. POST /direct status=403 means the baseline route refused the privileged refund, which is the desired control behavior. POST /laundered status=403 means an attack/control attempt was blocked before the rewritten content caused the planner to accept. The cockpit labels those rows by outcome instead of treating every 403 as a generic failure.
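The labeling rule described above can be sketched as a small function. This illustrates the logic only; it is not the cockpit's code. The function name, its arguments, and the label strings are all hypothetical.

```python
def label_403(method, path, status):
    """Label a record the way the paragraph above describes.

    The returned strings are illustrative, not the cockpit's wording.
    """
    if status != 403:
        return "not-a-403"
    if method == "POST" and path == "/direct":
        # Baseline route refused the privileged refund: desired control.
        return "expected-control"
    if method == "POST" and path == "/laundered":
        # Attempt was blocked before the rewritten content made the
        # planner accept.
        return "blocked-attempt"
    # Any other 403 is not covered by the demo's expectations.
    return "unclassified-403"
```

Keeping an explicit fall-through label preserves the distinction the cockpit draws: only 403s on the two known routes count as expected outcomes, and any other 403 still stands out for review.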

Serve a single transcript when you only ran a tap/proxy and do not have a full AIT run directory:

python3 -m ait.cli workbench serve \
  --transcript /tmp/seam-tap.json \
  --listen 127.0.0.1:8787

Observe A Transcript

ait observe is a terminal traffic feed for a Seam transcript:

python3 -m ait.cli observe --transcript /tmp/seam-tap.json --follow

It prints the transcript file, sequence number, flow, direction, logical source and destination when available, protocol/kind, operation, rule id, text/tool details, and status. This is the quickest way to answer "is traffic crossing my tap?" while a client is active.

Compare Runs

Compare completed run folders when preparing demos or checking whether a rewrite survives across labs:

python3 -m ait.cli compare \
  --runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
  --out .ait/runs/comparison.json

Render Markdown or HTML packets for demo prep:

python3 -m ait.cli compare \
  --runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
  --out .ait/runs/comparison.md \
  --format markdown

python3 -m ait.cli compare \
  --runs .ait/runs/l6-content .ait/runs/l7-crewai .ait/runs/l8-autogen \
  --out .ait/runs/comparison.html \
  --format html

The comparison summarizes exit codes, findings, rule rewrites, graph refs, hypothesis classes, robustness summaries, and report links. It is an operator artifact, not a new proof engine.
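When you need all three outputs for the same set of runs, the invocations above can be generated rather than retyped. The sketch below only assembles argument lists from the flags shown in this section. compare_argv is a hypothetical helper, and omitting --format for the JSON output mirrors the first example rather than any documented default.

```python
def compare_argv(runs, out, fmt=None):
    """Build the argv for one `ait compare` invocation.

    runs: list of run directories; out: output path;
    fmt: "markdown", "html", or None to omit --format.
    """
    argv = ["python3", "-m", "ait.cli", "compare", "--runs"]
    argv.extend(runs)
    argv.extend(["--out", out])
    if fmt is not None:
        argv.extend(["--format", fmt])
    return argv


# The three outputs shown above, as (path, format) pairs.
OUTPUTS = [
    (".ait/runs/comparison.json", None),
    (".ait/runs/comparison.md", "markdown"),
    (".ait/runs/comparison.html", "html"),
]
```

Each list can be passed to subprocess.run(argv, check=True) from the repository root. Building the list explicitly avoids shell quoting problems when run directory names contain spaces.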

Bind A Hypothesis To Assay

meshmapper paths are hypotheses. To turn one into an Assay starting point, write an explicit binding scaffold and fill the target-specific routes and oracle:

python3 -m ait.cli bind hypothesis \
  --run .ait/runs/<run-id> \
  --hypothesis-id <hypothesis-id-or-index> \
  --out bindings/<hypothesis-id>.yaml

After resolving every TODO in the binding scaffold, run Assay through AIT:

python3 -m ait.cli assess \
  --hypotheses .ait/runs/<run-id>/lab/paths.json \
  --hypothesis-id <hypothesis-id> \
  --binding .ait/runs/<run-id>/bindings/<hypothesis-id>.yaml \
  --seam http://127.0.0.1:8401
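A quick scan can confirm that no placeholder was left behind before assess runs. This sketch assumes the scaffold marks unfinished fields with the literal text TODO, which the step above implies but this guide does not specify. remaining_todos is a hypothetical helper.

```python
from pathlib import Path


def remaining_todos(binding_path):
    """Return (line_number, text) for every line that still contains TODO."""
    hits = []
    text = Path(binding_path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if "TODO" in line:
            hits.append((number, line.strip()))
    return hits
```

An empty result means the scan found no placeholders. It does not validate that the routes and oracle you filled in are correct for the target.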