Roadmap

Current:

  • aipostex is the planned Discover lane for MCP/A2A/agent attack-surface discovery: endpoint inventories, Agent Cards, registries, tools, resources, prompts, schemas, auth/signing posture, and candidate technique tags.
  • Seam M1-M8/T35 implemented, including rules, mutate.replace, mutate.insert, mutate.merge, local payload-file references, operator CLI helpers, live expected-rule controls, API miss diagnostics, rule-pack artifacts, robustness scenarios, version output, release packaging, and release smoke automation.
  • Assay M1-M10 implemented as optional impact validation: saved cases, framed trials, file/callback/read oracles, multi-trial findings, explicit hypothesis binding, robustness scenario sweeps, runtime/model labels, reports, deterministic case-family craft/sweep, craft inspection, technique listing, negative-control reporting, and artifact-first craft providers.
  • meshmapper M1-M6 implemented as operational targeting: multi-source ingestion, deterministic graph, structural/provenance refs, M4 rules, unvalidated hypotheses, topology robustness bundles, and workbench summaries.
  • Lab L1 local deterministic refund laundering implemented.
  • Lab L2 Docker mini-mesh implemented.
  • Lab L3 deterministic framework-style matrix implemented for LangGraph, CrewAI, AutoGen, OpenAI Agents, and Microsoft Agent Framework shapes.
  • Lab L4 real LangGraph runtime mesh implemented with deterministic graph nodes and proof report output.
  • Lab L5 content-decision mesh implemented; success depends on a Seam rewrite changing message content.
  • Lab L6 full agent mesh implemented with Dockerized services, A2A/MCP/memory flows, internal Seam intercepts, and multi-hypothesis output.
  • Root ait workbench M8/T40 implemented for doctor checks, scenario-aware lab runs, terminal workbench views, React operator cockpit serving, single-transcript dashboards, transcript observation, Seam-first ait operate aliases, ait map targeting and launch aliases, optional ait prove validation aliases and templates, range exercise wrappers, run inspection, capture manifests, assessment handoff, report lookup, Seam field console depth, meshmapper targeting suggestions, validation planning, stored Seam action comparison/notes, and offensive demo packs.
  • Lab L7 CrewAI decision mesh implemented with stub defaults and opt-in real-runtime adapter metadata.
  • Lab L8 AutoGen decision mesh implemented with stub defaults and opt-in real-runtime adapter metadata.
  • Lab L9 OpenAI Agents-style decision mesh and Lab L10 Microsoft Agent Framework-style decision mesh implemented with deterministic stub defaults and opt-in real-runtime adapters over the L5 proof chain.
  • Seam operator CLI completion implemented: session status/tail/start guidance, rules explain, transcript inspection, and the offensive operator handbook.
  • Docs D1 and onboarding reset implemented: the first path is a Lab L6 proof run, with rewrite, assessment, artifact, evidence, troubleshooting, and command-audit pages.
  • T10 demo-first readiness implemented: ait demo, richer localhost dashboard summaries, artifact/log links, and cross-run comparison.
  • T10.1/T11/T12/T13/T14/T15/T16 implemented: known-good L6 demo checklist, dashboard drilldown tables, Markdown/HTML comparison packets, deeper deterministic L7/L8 real-mode runtime traces, live transcript observation, tabbed operator cockpit, and the React meshmapper visual console.
  • T17/T18/T19/T20 implemented: Seam offensive console data in the cockpit, graph-to-case binding scaffolds, technique-oriented offensive demo packs, and deterministic framework-depth labels through L10.
  • T21 implemented: cockpit quality sweep with semantic traffic outcomes, Assay proof board, meshmapper graph interaction, and tighter Seam console views.
  • T22 implemented: Assay offensive proof UX for craft/case-family sweeps, technique/mutation matrices, negative controls, oracle evidence, and replay artifacts.
  • T23-T25 implemented: meshmapper path-to-case evidence, cockpit-triggered Seam diagnostics, and technique-oriented offensive demo packs.
  • T26-T29 implemented: docs and cockpit reframe around Operate / Map / Validate, Seam-first ait operate shortcuts, operational ait map suggestions, and optional ait prove validation workflows.
  • R1 professional range scaffold implemented: docs, Ansible role boundaries, Ludus example config, raw Proxmox/OpenTofu examples, and demo-recording strategy.
  • R2 Ludus range pack implemented: compact, standard, and full-split profiles with matching Ludus configs, Ansible inventories, and range helper scripts.
  • R3 raw Proxmox/OpenTofu pack implemented: compact, standard, and full-split tfvars profiles, profile-aware plan templates, inventory generation, and plan/apply/destroy wrapper scripts.
  • T34 implemented: stored Seam trace, test, tail, inspect, and verify actions now have parsed cockpit summaries, pass/miss/fail status, and clickable seq-level diagnostics.
  • T35 implemented: Seam injection primitives, payload-file references, A2A/MCP insertion and merge rule packs, fixtures, and complete command reference.
  • T36-T40 implemented: map-to-rule launch plans, operator-fillable Assay validation templates, range exercise descriptors/wrappers, deeper real-runtime adapters through L10, and stored Seam action comparison/notes.
  • T41-T45 implemented: recoverable map-launch lifecycles, range check/execute/reset/collect automation, richer offensive demo metadata, validation-template inspect/promote, and optional L7-L10 runtime smoke commands.
  • T46/T48 implemented: explicit map-launch execution, cockpit/API launch actions, launch action history in /status, and fixture-backed demo-pack verification with Seam rule tests and negative-control checks.
  • T54/T58/T57/T56 implemented: aipostex discovery artifact contract, ait discover import/summarize, cockpit Attack Surface tab, discovery tag-to-rule-family mappings, fixture-backed discovery-oriented demo packs, and meshmapper attack-surface graph fusion.

Strategic direction:

  • Field use now follows four lanes: Discover / Operate / Map / Validate.
  • aipostex should discover and probe MCP/A2A attack surfaces, then write saved artifacts.
  • AIT should ingest those artifacts, map them through meshmapper, launch Seam rule families, and optionally generate Assay validation templates.
  • Seam remains the live offensive in-path tool; meshmapper remains targeting; Assay remains optional impact proof.

Next recommended work:

  1. T55 aipostex attack-surface pack: MCP/A2A testing for tool poisoning, schema poisoning, prompt/resource leakage, tool shadowing, delegation pivots, registry trust, memory/session exposure, and cross-agent credential relay candidates against the published discovery contract.
  2. T59 discovery-driven lab scenarios: expand L6/range scenarios for registry trust, tool shadowing, schema poisoning, resource leakage, memory/session exposure, delegation pivots, and credential relay.
  3. T60 optional Assay validation from discovery: generate validation templates from discovery candidates and meshmapper paths without making proof mandatory.
  4. T61 Seam discovered-surface live launch polish: tighter cockpit launch controls for discovery candidates, expected-rule counters, and rule trace artifacts.
  5. T62 discovery comparison and drift: compare attack-surface artifacts across scans/ranges and highlight new, removed, or higher-risk surfaces.
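
The T62 comparison in item 5 could be sketched roughly as below. This is only an illustration under stated assumptions, not the planned implementation: the `Surface` record, its `surface_id` and `risk` fields, the example identifiers, and the `RISK_ORDER` ranking are all hypothetical and are not taken from the published discovery contract.

```python
from dataclasses import dataclass

# Hypothetical risk ranking; the real discovery contract may score risk differently.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Surface:
    """One attack-surface row from a saved discovery artifact (assumed shape)."""
    surface_id: str  # assumed stable identifier across scans
    risk: str        # assumed to be one of the RISK_ORDER keys


def diff_surfaces(old, new):
    """Compare two scans and report new, removed, and higher-risk surfaces."""
    old_by_id = {s.surface_id: s for s in old}
    new_by_id = {s.surface_id: s for s in new}

    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # A surface counts as higher risk only if it exists in both scans
    # and its rank increased between them.
    higher_risk = sorted(
        sid
        for sid in old_by_id.keys() & new_by_id.keys()
        if RISK_ORDER[new_by_id[sid].risk] > RISK_ORDER[old_by_id[sid].risk]
    )
    return {"new": added, "removed": removed, "higher_risk": higher_risk}


# Illustrative data only; these identifiers are made up for the example.
baseline = [Surface("mcp:tool:refund", "low"), Surface("a2a:card:billing", "medium")]
current = [Surface("mcp:tool:refund", "high"), Surface("mcp:resource:ledger", "low")]
drift = diff_surfaces(baseline, current)
```

Keying the comparison on a stable identifier keeps the diff deterministic across scans and ranges. If the artifact contract does not guarantee stable identifiers, the comparison would need a different matching key.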

Later integration:

  • T59 expands L6/range scenarios for registry trust, tool shadowing, schema poisoning, resource leakage, memory/session exposure, delegation pivots, and credential relay.
  • T60 adds optional Assay validation templates from discovery rows and meshmapper paths.
  • Assay synthesizes richer route plans from explicit discovery/path bindings; the current M5-lite binding is explicit and human-authored.
  • meshmapper expands topology sweeps with real-framework and discovery-derived graphs.
  • Reporting expands from single-lab proof reports to cross-lab comparison packets.
  • Optional live-model profiles and additional real-framework runtime labs follow once the hardening and workbench rails are stable.
  • Professional range exercises expand from local Docker to Ludus and raw Proxmox while preserving the same transcript, graph, finding, report, and cockpit artifact contract.