
Assay Craft

Assay craft turns a saved technique corpus plus variables into a replayable case family. The generated artifact is reviewed first, then executed with assay run --case-family.

Generate A Family

python3 -m assay.cli craft \
  --intent refund \
  --techniques techniques/agentic.yaml \
  --vars vars/refund.yaml \
  --out cases/refund_family.yaml

The saved family includes stable probe_id, technique_id, and mutation_id fields for every generated probe. Assay never sends payloads to a target straight from a live model; probes are saved first so the operator can inspect and replay exactly what will run.
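Because every probe is expected to carry the three stable identifiers, a saved family can be checked mechanically before it is read by hand. The sketch below assumes the parsed family is a mapping with a top-level `probes` list. That key name, the sample ids, and the `find_incomplete_probes` helper are hypothetical, so adjust them to the real file layout.

```python
REQUIRED_IDS = ("probe_id", "technique_id", "mutation_id")


def find_incomplete_probes(family):
    """Return (index, missing_fields) for each probe lacking a stable id."""
    problems = []
    # ASSUMPTION: probes are stored under a top-level "probes" key.
    for index, probe in enumerate(family.get("probes", [])):
        missing = [name for name in REQUIRED_IDS if not probe.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical parsed family: the second probe has no mutation_id.
example_family = {
    "probes": [
        {"probe_id": "p-001", "technique_id": "t-a", "mutation_id": "m-identity"},
        {"probe_id": "p-002", "technique_id": "t-b"},
    ]
}
print(find_incomplete_probes(example_family))  # [(1, ['mutation_id'])]
```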

Inspect Before Running

python3 -m assay.cli craft inspect \
  --case-family cases/refund_family.yaml

AIT exposes the same inspection path from the root workbench:

python3 -m ait.cli assay inspect-family \
  --case-family agentic-redteam/assay/cases/refund_family.yaml \
  --json

Use JSON when another tool needs to consume the summary:

python3 -m assay.cli craft inspect \
  --case-family cases/refund_family.yaml \
  --json

Inspection prints route and probe summaries: method, path, header keys, body JSON keys, framings, techniques, mutations, and negative controls. The saved case-family file remains the replayable payload artifact.
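When another tool consumes the summary, one option is to call the inspect command with --json and parse its standard output. This is a minimal sketch that assumes the command prints a single JSON document to stdout. The helper names are hypothetical, and the shape of the parsed summary is not specified here.

```python
import json
import subprocess


def inspect_command(case_family, as_json=True):
    """Build the argv for the inspect command shown above."""
    cmd = ["python3", "-m", "assay.cli", "craft", "inspect",
           "--case-family", case_family]
    if as_json:
        cmd.append("--json")
    return cmd


def load_summary(case_family, runner=subprocess.run):
    """Run inspection and parse its output.

    ASSUMPTION: --json writes one JSON document to stdout and nothing else.
    The runner parameter exists only so the call can be swapped out in tests.
    """
    result = runner(inspect_command(case_family),
                    capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

With `check=True`, a non-zero exit from the command raises `subprocess.CalledProcessError` instead of handing malformed output to the JSON parser.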

List Techniques

python3 -m assay.cli craft list-techniques \
  --techniques techniques/agentic.yaml

From the root workbench:

python3 -m ait.cli assay list-techniques \
  --techniques agentic-redteam/assay/techniques/agentic.yaml \
  --json

The default technique corpus covers delegated subtask, tool response, authority spoof, prompt laundering, indirect instruction, value echo, and defensive negative controls. Mutations include identity, delimiter wrapping, nested JSON smuggling, Unicode/confusable text, base64 value wrapping, and markdown/spacing variants.
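To make the mutation names concrete, here is an illustrative sketch of what four of them could do to a single string value. It is not Assay's implementation: the delimiters, the JSON nesting shape, and every function name are assumptions chosen only to show the idea of one base value expanding into several variants.

```python
import base64
import json


def identity(value):
    # Baseline: the value is passed through unchanged.
    return value


def delimiter_wrap(value, opener="<<<", closer=">>>"):
    # ASSUMPTION: the delimiter strings are placeholders.
    return f"{opener}{value}{closer}"


def nested_json(value, key="note"):
    # ASSUMPTION: one possible nesting shape; the value is carried inside
    # a JSON object that is itself serialized to a string.
    return json.dumps({"data": {key: value}})


def base64_wrap(value):
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


MUTATIONS = {
    "identity": identity,
    "delimiter": delimiter_wrap,
    "nested_json": nested_json,
    "base64": base64_wrap,
}


def expand(value):
    """Apply every mutation to one base value."""
    return {name: mutate(value) for name, mutate in MUTATIONS.items()}
```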

Run The Family

python3 -m assay.cli run \
  --case-family cases/refund_family.yaml \
  --seam http://127.0.0.1:8401 \
  --out finding.json \
  --trials 10

The finding reports route totals, per-technique totals, per-mutation totals, and a technique-by-mutation matrix. Negative controls are reported separately so an unexpected control success is visible, but oracle observations remain the only proof source.
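The technique-by-mutation matrix can be thought of as a count of trials per (technique, mutation) pair, alongside how many of those trials an oracle confirmed. The sketch below rebuilds such a matrix from a list of trial rows. The `oracle_confirmed` field and the row layout are hypothetical, not the schema of finding.json.

```python
from collections import Counter


def technique_mutation_matrix(trials):
    """Map (technique_id, mutation_id) -> (confirmed, total)."""
    totals = Counter()
    confirmed = Counter()
    for trial in trials:
        cell = (trial["technique_id"], trial["mutation_id"])
        totals[cell] += 1
        # ASSUMPTION: a boolean marks trials backed by an oracle observation.
        if trial.get("oracle_confirmed"):
            confirmed[cell] += 1
    return {cell: (confirmed[cell], totals[cell]) for cell in totals}


# Hypothetical trial rows.
rows = [
    {"technique_id": "t-a", "mutation_id": "m-1", "oracle_confirmed": True},
    {"technique_id": "t-a", "mutation_id": "m-1", "oracle_confirmed": False},
    {"technique_id": "t-b", "mutation_id": "m-2", "oracle_confirmed": False},
]
print(technique_mutation_matrix(rows))
```

Counting only oracle-confirmed trials as successes mirrors the rule that oracle observations are the only proof source.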

Proof Sweep In The Cockpit

For an operator proof sweep:

  1. Craft the family and keep the generated file.
  2. Inspect the family and confirm the techniques/mutations are the ones you want.
  3. Run the case family through a Seam API target.
  4. Open the AIT cockpit and use Assay -> Techniques, Mutations, and Negative Controls to see which probes survived.
  5. Use Oracle Evidence to confirm the side effect, then Replay Artifacts to open the finding, report, transcripts, and saved case-family file.

The matrix cells link back to matching trial rows so a successful technique/mutation pair can be traced to its probe id, mutation id, transcript refs, and oracle observation.
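Steps 1 to 3 of the sweep can be scripted by building the same commands shown earlier on this page. The sketch below only assembles argument lists; the `sweep_commands` helper is hypothetical, and it uses no flags beyond the ones documented above.

```python
def sweep_commands(intent, techniques, variables, family, seam, finding, trials=10):
    """Return the craft, inspect, and run commands in execution order."""
    base = ["python3", "-m", "assay.cli"]
    craft = base + ["craft", "--intent", intent, "--techniques", techniques,
                    "--vars", variables, "--out", family]
    inspect = base + ["craft", "inspect", "--case-family", family]
    run = base + ["run", "--case-family", family, "--seam", seam,
                  "--out", finding, "--trials", str(trials)]
    return [craft, inspect, run]


commands = sweep_commands(
    intent="refund",
    techniques="techniques/agentic.yaml",
    variables="vars/refund.yaml",
    family="cases/refund_family.yaml",
    seam="http://127.0.0.1:8401",
    finding="finding.json",
)
for command in commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)`. Pausing after the inspect command keeps the operator review step in the loop before anything is run against the target.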

Saved-Artifact Live Craft

Live craft profiles are artifact generators, not proof runners. The default profile is offline/template-based and does not call a model:

python3 -m assay.cli craft live \
  --profile profiles/openai.yaml \
  --intent refund \
  --vars vars/refund.yaml \
  --out cases/refund_live_family.yaml

The output is a normal case-family file with an additional generation block: profile id, provider, model/runtime label, prompt template id, source checksum, timestamp, generated probe count, and live_model_called: false. Inspect it before running:

python3 -m assay.cli craft inspect \
  --case-family cases/refund_live_family.yaml

Reports show the generation provenance, but the verdict still depends only on oracle observations.
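A pre-run gate can refuse any live-craft artifact whose provenance does not state that no model was called. The following sketch assumes the block is stored under a top-level `generation` key in the parsed family. That key name and the helper are hypothetical; only the `live_model_called` field comes from the description above.

```python
def check_generation_block(family):
    """Return None if the artifact looks offline-generated, else a reason string."""
    # ASSUMPTION: the generation block sits under a top-level "generation" key.
    generation = family.get("generation")
    if generation is None:
        return "no generation block found"
    # Require an explicit false; a missing or truthy value fails the gate.
    if generation.get("live_model_called") is not False:
        return "live_model_called is not false"
    return None


print(check_generation_block({"generation": {"live_model_called": False}}))  # None
```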

Provider Profiles

Live craft currently supports two artifact-first providers:

  • template: expands techniques and mutations from the profile itself.
  • subprocess: runs a local generator command that must write a complete case-family artifact to --out.

A subprocess profile passes --intent, --vars, and --out to the command:

python3 -m assay.cli craft live \
  --profile profiles/subprocess.yaml \
  --intent refund \
  --vars vars/refund.yaml \
  --out cases/refund_subprocess_family.yaml

Subprocess generation is still separate from proof execution. Assay validates and annotates the saved family before it can be passed to assay run.
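A generator command for a subprocess profile has to accept the three flags and write a file to the --out path. The sketch below shows only that argument contract. The artifact it writes is a placeholder, not the real case-family schema, and it is serialized as JSON on the assumption that the family loader accepts JSON-compatible YAML.

```python
import argparse
import json


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical case-family generator")
    parser.add_argument("--intent", required=True)
    parser.add_argument("--vars", required=True, dest="vars_path")
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)

    # ASSUMPTION: placeholder structure. A real generator must emit a
    # complete case family in the schema Assay validates.
    family = {
        "intent": args.intent,
        "vars_file": args.vars_path,
        "probes": [],
    }
    with open(args.out, "w", encoding="utf-8") as handle:
        json.dump(family, handle, indent=2)
    return 0
```

To use it as a command, call `main()` under the usual `if __name__ == "__main__":` guard and point the profile at the script.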