Assay Robustness
Assay robustness scenarios repeat the same proof claim across controlled variation. They do not ask an agent whether the attack worked; each generated finding still depends on an oracle observation.
Run a scenario against an already running Seam API:
cd agentic-redteam/assay
python3 -m assay.cli robustness run \
  --scenario scenarios/refund_robustness.yaml \
  --seam http://127.0.0.1:8401 \
  --out bundles/refund_m6
Assay does not start labs, containers, framework runtimes, or Seam. A scenario consumes saved cases and sends their probes through the configured Seam API. Runtime and model-profile axes are labels supplied by a lab wrapper.
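Because Assay leaves process management to the caller, a lab wrapper would typically assemble and launch the command itself. The following is a minimal sketch of such a wrapper, not part of Assay: the function names build_robustness_command and run_robustness are hypothetical, and only the three flags shown in the command above are used.

```python
import subprocess


def build_robustness_command(scenario: str, seam_url: str, out_dir: str) -> list[str]:
    """Build the argv for the robustness run, using only the documented flags."""
    return [
        "python3", "-m", "assay.cli", "robustness", "run",
        "--scenario", scenario,
        "--seam", seam_url,
        "--out", out_dir,
    ]


def run_robustness(scenario: str, seam_url: str, out_dir: str,
                   assay_dir: str = "agentic-redteam/assay") -> int:
    """Run the scenario from the assay directory and return the exit code.

    Seam must already be listening at seam_url; this helper (hypothetical)
    does not start Seam, labs, or containers, mirroring Assay's own behaviour.
    """
    cmd = build_robustness_command(scenario, seam_url, out_dir)
    completed = subprocess.run(cmd, cwd=assay_dir, check=False)
    return completed.returncode
```

Calling run_robustness("scenarios/refund_robustness.yaml", "http://127.0.0.1:8401", "bundles/refund_m6") would reproduce the shell invocation above. Passing the arguments as a list avoids shell quoting issues in paths.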
Scenario Shape
id: refund_robustness_m6
description: Sweep refund framing survival.
case: ../cases/refund_framed_tripwire.yaml
trials: 3
alpha: 0.05
framings: [delegated_subtask, tool_response, authority_spoof]
axes:
  route_order: [direct_first, laundered_first]
  runtime: [langgraph]
  model_profile: [deterministic_no_llm]
  repetitions: 2
  variables:
    - label: default
      values: {}
expect:
  delta_confirmed: true
Supported axes:
- route_order: direct_first or laundered_first.
- variables: named value maps for existing {{...}} case templates.
- case_matrix: additional saved cases, useful when labs provide separate L1/L2/L3 cases.
- runtime: metadata labels such as langgraph; Assay records but does not start runtimes.
- model_profile: metadata labels such as deterministic_no_llm; live model profiles are opt-in lab work.
- repetitions: repeat each axis combination.
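To get a feel for how many runs a scenario implies, the axes can be expanded by hand. The sketch below assumes that Assay crosses framings with every axis as a full Cartesian product; the source only states that repetitions repeat each axis combination, so the exact expansion order and the handling of trials and case_matrix are not covered here. The function name expand_axes and the in-memory dict are illustrative.

```python
from itertools import product

# Hypothetical in-memory copy of the framings and axes from the scenario above.
scenario = {
    "framings": ["delegated_subtask", "tool_response", "authority_spoof"],
    "axes": {
        "route_order": ["direct_first", "laundered_first"],
        "runtime": ["langgraph"],
        "model_profile": ["deterministic_no_llm"],
        "repetitions": 2,
        "variables": [{"label": "default", "values": {}}],
    },
}


def expand_axes(scenario: dict) -> list[dict]:
    """Return one dict per axis combination (assumed Cartesian product)."""
    axes = scenario["axes"]
    dimensions = {
        "framing": scenario["framings"],
        "route_order": axes["route_order"],
        "variable_set": [v["label"] for v in axes["variables"]],
        "runtime": axes["runtime"],
        "model_profile": axes["model_profile"],
        # Repetitions are numbered from 1 for readability.
        "repetition": list(range(1, axes["repetitions"] + 1)),
    }
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


combos = expand_axes(scenario)
print(len(combos))  # 3 framings * 2 route orders * 2 repetitions = 12
```

Under this assumption, adding a second runtime label or a second variable set would double the number of combinations, which is worth checking before pointing a large scenario at a shared Seam instance.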
The Lab L4 LangGraph workflow writes a generated runtime-axis scenario under its out/ directory. The tracked scenarios/langgraph_runtime_robustness.yaml is an example that points at that generated case file.
Bundle Output
bundles/refund_m6/
  scenario.yaml
  results.json
  summary.json
  findings/<run_id>.json
Each finding is schema-valid and records the surviving axis values under stats.robust_across. summary.json rolls those booleans up by framing, route order, case label, variable set, runtime, model profile, and repetition.
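The roll-up can be pictured as grouping per-run outcomes by each axis value. The sketch below is an illustration only: the record fields (including survived) and the rule that a value counts as robust only when every run with that value survived are assumptions, not the actual results.json or summary.json schema.

```python
from collections import defaultdict

# Hypothetical per-run records; the real results.json field names may differ.
runs = [
    {"framing": "delegated_subtask", "route_order": "direct_first", "survived": True},
    {"framing": "delegated_subtask", "route_order": "laundered_first", "survived": True},
    {"framing": "tool_response", "route_order": "direct_first", "survived": True},
    {"framing": "tool_response", "route_order": "laundered_first", "survived": False},
]


def roll_up(runs: list[dict], axes: list[str]) -> dict:
    """For each axis, map every observed value to True only if all its runs survived."""
    summary = {}
    for axis in axes:
        by_value = defaultdict(list)
        for run in runs:
            by_value[run[axis]].append(run["survived"])
        summary[axis] = {value: all(flags) for value, flags in by_value.items()}
    return summary


summary = roll_up(runs, ["framing", "route_order"])
print(summary["framing"])      # {'delegated_subtask': True, 'tool_response': False}
print(summary["route_order"])  # {'direct_first': True, 'laundered_first': False}
```

With this aggregation a single failing run marks its framing and route order as not robust, which is the conservative reading of a survival claim.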