
Roadmap

Tracking progress, priorities, and planned work for aipostex.


Shipped

Completed milestones listed in reverse chronological order.

v0.1.2 depth (Track 1 + Track 5)

  • Jupyter notebooks --mine-secrets: optional parallel fetch of each listed .ipynb with cell-level credential mining (same patterns as read-notebook).
  • Credchain: MLflow run findings with run_id metadata register mlflow-run-id credentials; auto-chain suggests mlflow artifacts --run-id <id>.
  • MLflow: sensitive-parameter and artifact URI extraction in the runs subcommand is covered by tests, including Snowflake-style artifact_uri values; see docs/modules/mlflow.md.
  • File discovery: rules without content_patterns keep max_file_size: 0 at load time (filename-only); rules with content patterns still default to a 10MB read cap. New bundled rules: ML training CSV headers under dataset/fine-tune paths; Hugging Face dataset_infos.json / dataset_dict.json under cache or datasets paths.
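The load-time default in the last bullet can be sketched as follows. This is a minimal illustration, not aipostex's loader. The Rule struct, its field names, the two example rules, and the reading of "10MB" as 10 MiB are all assumptions.

```go
package main

import "fmt"

// Rule is a hypothetical, pared-down file discovery rule; the field names are
// illustrative and do not mirror aipostex's YAML schema.
type Rule struct {
	Name            string
	FilenameGlobs   []string
	ContentPatterns []string
	MaxFileSize     int64 // bytes; 0 means filename-only, contents are never read
}

// defaultContentCap is the 10MB read cap, assumed here to mean 10 MiB.
const defaultContentCap int64 = 10 * 1024 * 1024

// applyLoadDefaults sets the read cap at load time. Rules without content
// patterns are left alone, so an unset cap stays 0 (filename-only); rules
// with content patterns and no explicit cap get the 10MB default.
func applyLoadDefaults(r Rule) Rule {
	if len(r.ContentPatterns) > 0 && r.MaxFileSize == 0 {
		r.MaxFileSize = defaultContentCap
	}
	return r
}

func main() {
	rules := []Rule{
		// Hypothetical filename-only rule.
		{Name: "hf-dataset-infos", FilenameGlobs: []string{"dataset_infos.json"}},
		// Hypothetical content rule matching an ML training CSV header.
		{Name: "ml-training-csv", FilenameGlobs: []string{"*.csv"}, ContentPatterns: []string{`^prompt,completion`}},
	}
	for _, r := range rules {
		r = applyLoadDefaults(r)
		fmt.Printf("%s max_file_size=%d\n", r.Name, r.MaxFileSize)
	}
}
```

Keeping the cap at 0 for filename-only rules means the scanner never opens those files, which keeps large directory walks cheap.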

CVE Template Expansion (Multi-Service)

Expanded CVE coverage from MCP-only (7 templates) to 6 service families (17 CVE templates). New templates:

Advisory | Service | Severity (CVSS) | Capability
CVE-2025-6514 | mcp-remote | Critical (9.6) | Client-side RCE via OAuth authorization_endpoint injection
CVE-2025-53109 | Filesystem MCP | High (8.4) | Symlink bypass → sandbox escape → code execution
CVE-2025-53110 | Filesystem MCP | High (7.3) | Directory containment bypass via prefix matching
CVE-2025-63389 | Ollama | Critical (9.1) | Missing authentication on all API endpoints ≤ v0.12.3
CVE-2025-11201 | MLflow | Critical (9.8) | Directory traversal in model version creation → arbitrary file write → RCE
CVE-2025-48889 | Gradio | High (7.5) | Arbitrary file copy via flagging endpoint path manipulation
CVE-2025-51471 | Ollama | Medium (6.9) | Cross-domain token theft via malicious WWW-Authenticate realm
CVE-2025-34351 | Ray | High (8.8) | Token authentication disabled by default in v2.52.0
GHSA-8fr4-5q9j-m8gm | vLLM | Critical (9.8) | RCE via auto_map config class bypassing trust_remote_code
CVE-2025-68664 | LangChain | Critical (9.3) | Secret exposure via deserialization
GHSA-mrw7-hf4f-83pf | vLLM | High (8.6) | Deserialization DoS/RCE via sparse tensor memory corruption

OpenAI-Compat Tool Enumeration

Added openai-compat tool-enum for read-only function and tool capability probing, attacker-defined tool injection validation, and forced tool invocation checks against exposed OpenAI-compatible endpoints.

OpenAI-Compat Prompt Injection Testing

Added openai-compat prompt-test for read-only instruction-override, role-confusion, delimiter-escape, jailbreak, and refusal-bypass probing against exposed OpenAI-compatible endpoints.

Deeper Module Exploitation (Tier 2)

Extended four exploit modules with higher-impact capabilities:

Module | New Subcommands | Capability
Ollama | exfiltrate | Model weight blob download probing (HEAD + Range GET proof)
Jupyter | start-kernel, reverse-shell-proof, pip-proof | Kernel creation, outbound socket proof, pip install proof
Ray | pip-inject, cluster-info | Runtime env pip injection, cluster resource exfiltration
MLflow | tamper-proof | Experiment/run creation + parameter logging (write access proof)

MCP Stdio Transport (Tier 2)

Refactored mcp.Client to use the Transport interface internally. NewStdioClient creates a client backed by a subprocess (NDJSON over stdin/stdout). CLI flags --transport stdio, --stdio-command, and --stdio-args enable enum and poison operations against local MCP servers without HTTP.

Auto-Badge and Evidence Consistency

All template-based (vulncheck) findings now automatically receive proof metadata (proof_stage, proof_strength) and auto-generated HTTP request/response evidence. Detection templates are badged VERIFIED; exploit templates are badged EXPLOITED. The HTML report renders these badges and evidence blocks consistently.

Scan Mode Separation

Templates carry a type field (detection or exploit). The engine respects a --mode flag: detect (default, safe) runs only detection templates; full additionally runs the 46 exploit templates, which send active payloads (SSRF, RCE, path traversal, inference, terminal creation, job submission, artifact exfiltration, data extraction, prompt injection, and weak-auth compute theft validation). The scan mode is recorded in finding metadata and displayed in the HTML report engagement strip.
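The mode gate and the badge assignment described above reduce to a filter and a lookup. This is a sketch that assumes a simplified Template struct. The template ID ollama-detect-001 is hypothetical, and the real engine also applies --force-exploit gating, which is not modelled here.

```go
package main

import "fmt"

// TemplateType mirrors the template type field described above.
type TemplateType string

const (
	Detection TemplateType = "detection"
	Exploit   TemplateType = "exploit"
)

// Template is a hypothetical, minimal view of a loaded YAML template.
type Template struct {
	ID   string
	Type TemplateType
}

// selectTemplates applies --mode: "detect" (the default) keeps only detection
// templates; "full" keeps everything, detection plus exploit.
func selectTemplates(all []Template, mode string) ([]Template, error) {
	switch mode {
	case "", "detect":
		var out []Template
		for _, t := range all {
			if t.Type == Detection {
				out = append(out, t)
			}
		}
		return out, nil
	case "full":
		return all, nil
	default:
		return nil, fmt.Errorf("unknown --mode %q (want detect or full)", mode)
	}
}

// badgeFor maps a template type to the report badge.
func badgeFor(t TemplateType) string {
	if t == Exploit {
		return "EXPLOITED"
	}
	return "VERIFIED"
}

func main() {
	all := []Template{
		{ID: "ollama-detect-001", Type: Detection}, // hypothetical ID
		{ID: "gradio-exploit-001-file-read", Type: Exploit},
	}
	for _, mode := range []string{"detect", "full"} {
		sel, _ := selectTemplates(all, mode)
		for _, t := range sel {
			fmt.Printf("%s: %s [%s]\n", mode, t.ID, badgeFor(t.Type))
		}
	}
}
```

Rejecting unknown modes outright, instead of falling back to full, keeps a typo from silently enabling active payloads.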

Executive HTML Report

Self-contained HTML report with sticky sidebar TOC, severity summary cards, per-host finding cards with expand/collapse, proof badges, collapsible evidence blocks, exploitation summary table, and print-optimized CSS. Suitable for executive delivery.

Reporting Pipeline

Eight output formats: Console, JSON, JSONL, CSV, HTML, SARIF, Markdown, PDF. The report command generates consolidated reports from JSON or JSONL findings. The summary command produces executive summaries. The bundle command packages engagements into zip archives. The graph command generates finding correlation graphs (Mermaid/Graphviz).

Post-Exploitation Modules

17 dedicated exploit modules:

Module | Subcommands | Key Capabilities
Ollama | 10 | Enum, prompt extraction, inference, model copy/create/delete/poison, weight exfiltration
Vector DBs | 3 | ChromaDB/Weaviate/Qdrant enum, extract, sensitive search (27 patterns)
Jupyter | 8 | Server enum, kernel listing, notebook read, code execution, start-kernel, reverse-shell-proof, pip-proof
MCP | 3 | Config analysis, remote enum (HTTP + stdio), poison (SSRF/RCE/path-traversal variants)
OpenAI-Compatible | 9 | Auth sweep, enum, inference validation, prompt extraction, tool enumeration, prompt testing, throughput, proxy, LiteLLM
LiteLLM | 4 | Config extraction, budget/spend probe, proxy chain analysis, credential discovery
Ray | 8 | Dashboard enum, job listing/logs/artifacts, job submission, runtime-env, pip-inject, cluster-info
MLflow | 9 | Tracking enum, experiments, runs, artifacts, registry, model versions, tamper-proof
Gradio | 7 | Config enum, predict, queue probe, upload/download, file chain, serve probe
BentoML | 4 | Service enum, route listing, metrics, prediction
Triton | 7 | Model enum, detail, config, repository index, SHM probe, inference, model load/unload
TorchServe | 7 | Model enum, detail, metrics, prediction, model register/scale/unregister
HuggingFace TGI/TEI | 5 | Service auto-detect, model enum, metrics, generation, embedding
TensorFlow Serving | 5 | Model discovery, metadata/signature extraction, metrics, inference
Kubeflow | 6 | Pipeline/run/experiment enum, notebooks, pipeline run submission
W&B | 5 | Server enum, projects, runs, artifacts, secret extraction
A2A | 11 | Agent card enum, skills, tasks, streaming, push hijack, MCP pivot, tool injection

Vulnerability Templates

131 YAML templates across 22 service categories (85 detection, 46 exploit). Templates support matchers, extractors, severity, CVSS, and proof metadata.

Network Fingerprinting

30 HTTP-based service probes covering Ollama, ChromaDB, Weaviate, Qdrant, vLLM, LiteLLM, LM Studio, LocalAI, Jupyter, MLflow, Gradio, Streamlit, Ray, Open WebUI, MCP (SSE/Inspector/MCPJam), OpenAI-compatible, Triton, TorchServe (inference + management), TF Serving, HF TGI/TEI, LangServe, Kubeflow, BentoML, W&B, A2A. CIDR expansion, concurrent probing, and automatic template matching.
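CIDR expansion ahead of probing can be sketched with Go's net/netip package. The sketch assumes IPv4 targets, skips network and broadcast addresses for prefixes shorter than /31, and adds a host cap as a safety guard. None of these choices are taken from aipostex itself.

```go
package main

import (
	"fmt"
	"net/netip"
)

// expandCIDR lists the host addresses in an IPv4 prefix. For prefixes shorter
// than /31 it drops the network and broadcast addresses; maxHosts guards
// against accidentally expanding something huge such as 10.0.0.0/8.
func expandCIDR(cidr string, maxHosts int) ([]netip.Addr, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked() // normalise e.g. 10.0.0.7/30 to 10.0.0.4/30
	if !p.Addr().Is4() {
		return nil, fmt.Errorf("%s: only IPv4 is handled in this sketch", cidr)
	}
	var out []netip.Addr
	for a := p.Addr(); p.Contains(a); a = a.Next() {
		out = append(out, a)
		if len(out) > maxHosts+2 { // +2 leaves room for network/broadcast
			return nil, fmt.Errorf("%s expands past %d hosts", cidr, maxHosts)
		}
	}
	if p.Bits() < 31 && len(out) >= 2 {
		out = out[1 : len(out)-1]
	}
	return out, nil
}

func main() {
	hosts, err := expandCIDR("192.168.10.0/29", 1024)
	if err != nil {
		panic(err)
	}
	for _, h := range hosts {
		fmt.Println(h) // a scanner would hand these to concurrent probe workers
	}
}
```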

File Discovery

49 rules across 5 packs (api_keys, mcp_configs, local_llm, vectordb_rag, core_assessment). Pattern-matching for API keys, model files, MCP configs, vector DB data, fine-tuning datasets, and RAG pipelines.

Core Framework

  • 100+ operator commands across scanning, reporting, templates, and 18 exploit modules
  • Finding deduplication, merge, and collection management
  • OPSEC controls: stealth mode, proxy (HTTP/HTTPS/SOCKS5), TLS skip, concurrency caps, User-Agent rotation
  • Operator progression: workflow recommendations chain discovery into exploitation
  • Safe by default: --mode detect and --force-exploit gating
  • CI/CD with GitHub Actions (build, test, lint)

Exploit Template Expansion (Tier 1)

Added 5 exploit templates for services that previously had detection-only coverage:

Template | Service | Severity | Capability
jupyter-exploit-001-terminal-rce | Jupyter | Critical | Terminal creation = RCE proof
ray-exploit-001-job-submit-rce | Ray | Critical | Job submission = cluster RCE
mlflow-exploit-001-artifact-exfil | MLflow | High | Experiment -> run -> artifact exfiltration chain
gradio-exploit-001-file-read | Gradio | Critical | Arbitrary file read via /file=
chromadb-exploit-001-data-exfil | ChromaDB | High | Collection enumeration -> document extraction

Credential Chain-Loading

Automatic credential extraction from scan findings (internal/credchain). When discover network or assess network surfaces credentials in findings (Jupyter tokens, OpenAI API keys, HF tokens, Anthropic keys, bearer tokens, generic API keys), they are automatically injected into workflow recommendations. Console output shows a credential chain summary.
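The sketch below shows pattern-based credential extraction with de-duplication, in the spirit of internal/credchain. The regexes and kind names are illustrative assumptions that cover a few of the token types listed above; generic API keys are omitted. They are not aipostex's rule set and will both miss and over-match real secrets.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Credential is a hypothetical typed credential pulled out of finding text.
type Credential struct {
	Kind  string
	Value string
}

// credPatterns are loose, illustrative regexes, ordered most specific first.
// When a pattern has a capture group, group 1 holds the secret itself.
var credPatterns = []struct {
	kind string
	re   *regexp.Regexp
}{
	{"anthropic-key", regexp.MustCompile(`sk-ant-[A-Za-z0-9_-]{20,}`)},
	{"openai-key", regexp.MustCompile(`sk-[A-Za-z0-9_-]{20,}`)},
	{"hf-token", regexp.MustCompile(`hf_[A-Za-z0-9]{30,}`)},
	{"jupyter-token", regexp.MustCompile(`[?&]token=([0-9a-f]{32,64})`)},
	{"bearer-token", regexp.MustCompile(`(?i)bearer\s+([A-Za-z0-9._~+/-]{16,}=*)`)},
}

// extractCredentials returns each distinct secret once, tagged with the
// first (most specific) kind that matched it.
func extractCredentials(text string) []Credential {
	seen := map[string]bool{}
	var out []Credential
	for _, p := range credPatterns {
		for _, m := range p.re.FindAllStringSubmatch(text, -1) {
			val := m[0]
			if len(m) > 1 {
				val = m[1]
			}
			// RE2 has no lookahead, so keep Anthropic keys out of the
			// generic sk- pattern by hand.
			if p.kind == "openai-key" && strings.HasPrefix(val, "sk-ant-") {
				continue
			}
			if seen[val] {
				continue // e.g. the same key seen again inside a Bearer header
			}
			seen[val] = true
			out = append(out, Credential{Kind: p.kind, Value: val})
		}
	}
	return out
}

func main() {
	finding := `Authorization: Bearer sk-ant-api03-AAAAAAAAAAAAAAAAAAAAAAAA
jupyter url: http://10.0.0.9:8888/lab?token=0123456789abcdef0123456789abcdef0123456789abcdef
HF_TOKEN=hf_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef12`
	for _, c := range extractCredentials(finding) {
		fmt.Printf("%-14s %s\n", c.Kind, c.Value)
	}
}
```

Deduplicating on the secret value means one leaked key yields one chained suggestion, however many patterns happen to overlap on it.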


Development Tracks

Work is organized into five capability tracks. Each track has an offensive rationale and concrete deliverables.

Step-by-step plans and verification checklists for upcoming milestones: development/plans/README.md.


Track 1: Extraction Depth

Make existing modules find more and prove deeper impact on services we already touch.

Item | Module | What it adds | Priority
MLflow: emit artifact_uri and param connection strings as findings | mlflow | Extracts S3/GCS/Snowflake URIs already parsed in Run.ArtifactURI and Run.Params | v0.1.2
Jupyter: mine notebook cells for secrets | jupyter | Regex scan of cell source for API keys, credentials, connection strings during notebooks subcommand | v0.1.2
Ollama: model value scoring | ollama | Score models by parameter size, family, quantization for impact prioritization | v0.2.0
Vector DBs: semantic proximity search | vectordb | Query by embedding vector to find semantically similar sensitive data (not just keyword regex) | v0.8.0
LiteLLM: dedicated exploit module | litellm command | Config key extraction, budget/spend enumeration, proxy chain analysis, credential discovery | shipped v0.2.0
MCP: multi-step exploitation chains | mcp | Use path traversal results to feed credential extraction, then pivot | v0.8.0
Gradio: SSRF through prediction endpoints | gradio | Test if prediction endpoints make outbound requests based on user input | v0.8.0
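The Jupyter cell-mining item can be sketched as parsing the .ipynb JSON and running a regex over each cell's source. nbformat stores a cell's source either as one string or as a list of lines, so the sketch handles both. The single regex, the CellHit type, and the sample notebook are hypothetical simplifications; a real rule set would be broader.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// notebook captures only the parts of nbformat v4 this sketch needs.
type notebook struct {
	Cells []struct {
		CellType string          `json:"cell_type"`
		Source   json.RawMessage `json:"source"`
	} `json:"cells"`
}

// cellSource handles nbformat's two encodings: one string or a list of lines.
func cellSource(raw json.RawMessage) string {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s
	}
	var lines []string
	if err := json.Unmarshal(raw, &lines); err == nil {
		return strings.Join(lines, "") // lines already carry their own "\n"
	}
	return ""
}

// secretRe is a deliberately small, illustrative pattern: a secret-looking
// name, then = or :, then a value of at least 8 non-quote characters.
var secretRe = regexp.MustCompile(`(?i)(aws_secret_access_key|api[_-]?key|password|token)\s*[:=]\s*["']?([^"'\s]{8,})`)

// CellHit records where a secret-looking assignment appeared.
type CellHit struct {
	Cell  int
	Label string
	Value string
}

func mineNotebook(data []byte) ([]CellHit, error) {
	var nb notebook
	if err := json.Unmarshal(data, &nb); err != nil {
		return nil, err
	}
	var hits []CellHit
	for i, c := range nb.Cells {
		for _, m := range secretRe.FindAllStringSubmatch(cellSource(c.Source), -1) {
			hits = append(hits, CellHit{Cell: i, Label: strings.ToLower(m[1]), Value: m[2]})
		}
	}
	return hits, nil
}

func main() {
	ipynb := []byte(`{"cells":[
 {"cell_type":"markdown","source":"# Fine-tune run"},
 {"cell_type":"code","source":["import os\n","OPENAI_API_KEY = \"sk-example-0000000000\"\n"]},
 {"cell_type":"code","source":"db_password = 'S3cretPassw0rd!'"}
],"metadata":{},"nbformat":4,"nbformat_minor":5}`)
	hits, err := mineNotebook(ipynb)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Printf("cell %d: %s = %s\n", h.Cell, h.Label, h.Value)
	}
}
```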

Track 2: New Attack Surfaces

Cover attack surfaces the tool doesn't touch yet.

Item | Type | What it adds | Priority
Model supply chain validation | model-scan command | Detect pickle/PyTorch deserialization RCE vectors in .pkl/.pt/.bin files; validate model hashes against registry manifests | shipped v0.2.0
Kubernetes AI workload discovery | NEW fingerprint probes + templates | K8s API enumeration for GPU nodes, model serving deployments (KServe, Seldon, BentoML), training jobs | v0.8.0
Cloud AI service probing | NEW fingerprint probes + templates | SageMaker endpoints, Bedrock runtime, Vertex AI, Azure OpenAI/ML: auth validation, model enumeration | v0.8.0
Embedding injection | vectordb inject + metadata-inject | Add poisoned documents to vector stores to manipulate RAG responses (guarded by --force-exploit) | shipped v0.2.0
LLM output exfiltration testing | openai-compat exfil-test | Test for data exfiltration via LLM responses (encoding, side-channel, steganography) | v0.8.0

Track 3: Exploitation Chaining

Connect individual findings into attack paths. Complete the operator workflow.

Item | Type | What it adds | Priority
validate command | NEW command | Re-check findings against live targets, confidence scoring (confirmed/stale/remediated), delta reporting | v0.8.0
Automatic credential forwarding | credchain enhancement | When file discovery finds an API key, auto-inject it into the matching exploit module invocation (not just suggestions) | v0.8.0
Campaign reporting | report enhancement | Cross-engagement diff, temporal comparison, auth pattern summary, model value rollup in one artifact | v0.8.0
Engagement state persistence | NEW engagement command | Save/resume multi-phase engagements, track which findings led to which exploits | v0.8.0

Track 4: Detection & Probing

Expand fingerprinting and template coverage to new services.

Item | Type | What it adds | Priority
BentoML | fingerprint probe + 3 templates | Model serving framework, widely deployed, REST + gRPC | shipped v0.2.0
Seldon Core | fingerprint probe + 2-3 templates | K8s-native model serving, prediction + feedback APIs | v0.8.0
KServe | fingerprint probe + 2-3 templates | Serverless model inference on K8s, InferenceService CRD | v0.8.0
Weights & Biases | fingerprint probe + 3 templates | Experiment tracking; often exposes API keys, training data, model artifacts | shipped v0.2.0
LangSmith / Langfuse | fingerprint probe + templates | LLM observability: traces, prompts, completions, evaluation data | v0.8.0
Flowise / Dify / n8n-AI | fingerprint probes + templates | Low-code LLM app builders; common in enterprise, often unauthenticated | v0.8.0
CVE template refresh | ongoing | New CVE-specific templates as AI infrastructure CVEs are published | ongoing

Track 5: File Discovery

Expand on-disk detection to cover categories we miss today.

Item | Rules | What it adds | Priority
Model file detection | 3-4 rules | .safetensors, .gguf, .onnx, .pkl, .pt; model files indicate training/serving, and .pkl/.pt indicate deserialization risk | v0.1.2
Training data detection | 2-3 rules | .parquet, .arrow, .tfrecord, .csv with ML column patterns; datasets indicate data exposure | v0.1.2
Kubernetes AI configs | 2-3 rules | InferenceService, ServingRuntime, TrainingJob YAML manifests | v0.8.0
CI/CD pipeline configs | 2-3 rules | GitHub Actions / GitLab CI model training workflows with credential references | v0.8.0
Experiment tracking configs | 2 rules | W&B wandb/settings, MLflow mlflow.db, Neptune config files with API keys | shipped v0.2.0
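Extension-based triage for the model file detection row might look like the sketch below. The mapping covers only the five extensions listed in the table, and the format names are illustrative. An extension check alone cannot confirm a file's real format.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ModelFile describes what an on-disk model artifact suggests.
type ModelFile struct {
	Format              string
	DeserializationRisk bool // pickle-based formats can execute code on load
}

// modelExts covers only the extensions named in the rule row above.
var modelExts = map[string]ModelFile{
	".safetensors": {Format: "safetensors"},
	".gguf":        {Format: "gguf"},
	".onnx":        {Format: "onnx"},
	".pkl":         {Format: "pickle", DeserializationRisk: true},
	".pt":          {Format: "pytorch", DeserializationRisk: true},
}

// classifyModelFile matches on the lower-cased extension.
func classifyModelFile(path string) (ModelFile, bool) {
	mf, ok := modelExts[strings.ToLower(filepath.Ext(path))]
	return mf, ok
}

func main() {
	for _, p := range []string{
		"/srv/models/llama-3-8b.Q4_K_M.gguf",
		"/home/ml/checkpoints/epoch10.PT",
		"/data/notes.txt",
	} {
		if mf, ok := classifyModelFile(p); ok {
			fmt.Printf("%s: %s (deserialization risk: %v)\n", p, mf.Format, mf.DeserializationRisk)
		}
	}
}
```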

Release Plan

v0.1.2 (immediate)

  • Track 1: MLflow artifact_uri extraction, Jupyter cell secret mining
  • Track 5: Model file + training data discovery rules
  • Doc truth sync (completed)
  • Target: 95%+ lab coverage

v1.3.0 (shipped)

Kubernetes in-cluster lateral movement — deepens the k8s module from read-on-one-apiserver into proven traversal of identities and namespaces within a single cluster:

  • access-review: uses SelfSubjectRulesReview to map the current identity's real authorization. Results are reported as read-confirmed when the identity can self-review, and honestly as merely reachable when it cannot (e.g. anonymous 403).
  • sa-loot (--force-exploit) — exec-steal a pod's service-account token, re-authenticate, and report a measured privilege delta. Write capability is measured with a non-persisting dry-run create (?dryRun=All) so it works for any identity including system:anonymous; escalation is claimed only when the stolen identity can write where the foothold cannot.
  • --all-namespaces / -A on enum and secret-read — one identity reaching every namespace it can, with correct per-namespace attribution.

Validated live on an extended k3s sandbox (an over-granted pipeline-runner SA); the secure cluster (:6444) reports 401/not-weak on every verb. Out of scope (not honestly provable on single-node k3s): cross-cluster/mesh movement, kubelet :10250 pivot, cloud-metadata SSRF, hostPath escape.

v1.2.0 (shipped)

Kubernetes AI-workload module + A2A listener-confirmed probes:

  • k8s module (WS2) — rbac-probe, enum, secret-read, artifact-read, pod-exec against a kube-apiserver directly; kube-apiserver fingerprint + 3 anon-access templates; validated against a real k3s vuln/secure fixture (anon enum → secret exfil → in-pod RCE).
  • A2A out-of-band listener confirmation (WS3) — card-spoof/push-hijack take an http --callback-url with a per-run nonce; only a real inbound callback upgrades influenced → exploited. 5 new a2a-exploit templates (008–012), fire-on-vuln / quiet-on-secure.

v1.1.0 (shipped)

Post-v1.0 offensive expansion + dev/lab infrastructure:

  • A2A offensive primitives — 5 new single-node verbs (auth-probe, msg-integrity, sender-spoof, delegate-probe, card-spoof), proof-carrying via applyProofMetadata, live-validated against a real a2a-sdk agent. A2A stays single-node by design — interception, mesh-mapping, and differential-privilege proof are adjacent tooling, out of aipostex's find-and-pop-a-node scope.
  • Output honesty fixes — findings sort severity-descending (critical first) across all exploit modules; vectordb search-sensitive collapses overlapping match windows; wandb treats /healthz as a liveness probe (no bogus version=ready!).
  • Single-service sandbox (lab) — a dev-machine Docker harness (up → point the tool at one real product → confirm the module is honest → down); closed the long-pending real-W&B proof and is the realism dev loop.
  • Multi-estate (lab) — GROUP_ID-parameterized Proxmox deploy for N non-colliding, isolated estates on one host (validated live: 3 estates, ~99s parallel reset-wave).
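Collapsing overlapping match windows, as in the vectordb search-sensitive fix above, is interval merging: sort the windows by start, then extend the current window while the next one overlaps it. This sketch assumes half-open [Start, End) byte offsets. The Window type is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Window is a [Start, End) byte range around a regex match in a document.
type Window struct{ Start, End int }

// collapse merges overlapping or touching windows so one sensitive passage
// is reported once instead of once per pattern that hit it.
func collapse(ws []Window) []Window {
	if len(ws) == 0 {
		return nil
	}
	sorted := append([]Window(nil), ws...) // don't reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })
	out := []Window{sorted[0]}
	for _, w := range sorted[1:] {
		last := &out[len(out)-1]
		if w.Start <= last.End { // overlaps or touches
			if w.End > last.End {
				last.End = w.End
			}
			continue
		}
		out = append(out, w)
	}
	return out
}

func main() {
	// Two pairs of overlapping match windows from different patterns.
	hits := []Window{{120, 180}, {40, 90}, {150, 210}, {85, 100}}
	fmt.Println(collapse(hits))
}
```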

v1.0.0 (shipped)

First stable release. 17 exploit modules, proof-carrying findings (stage/strength), a single --force-exploit gate, scored benchmark lab at 100% coverage / RRR-clean, and real-instance proofs for A2A + MCP (and W&B via the sandbox). Supersedes the v0.7.0 maturity line below.

v0.7.0

Version bump from v0.2.0 reflecting cumulative maturity: 17 exploit modules, 123 templates, 30 probes, 8 output formats, model-scan, full report pipeline, and four rounds of Tier 2 review hardening. All v0.2.0 items below are shipped.

v0.2.0 (shipped)

  • Track 2: Model supply chain validation (model-scan) — shipped
  • Track 1: LiteLLM dedicated module — shipped
  • Track 4: BentoML + W&B probes and templates — shipped
  • Track 3: automatic credential forwarding — partial (suggestions generated, not auto-executed)

Future (post-1.3)

  • Track 3: validate / proof-carrying re-validation (re-probe → confidence → delta), deferred by project direction; the proof metadata already exists, and the re-validation loop is planned for a later cycle
  • Track 2: cloud AI service probing (SageMaker / Bedrock / Vertex); Kubernetes AI-workload discovery already shipped in v1.2.0–v1.3.0
  • Embedding / RAG poisoning; ANP / ACP sibling agent-protocol modules (enum-only skeleton until real targets exist)
  • Track 1: Vector DB semantic proximity search, MCP multi-step chains
  • Track 4: Seldon, KServe, LangSmith, Flowise probes
  • Track 3: campaign reporting, engagement state persistence

Recently Shipped (v0.1.1)

Exploit templates for 5 previously detection-only services:

Service | Template | Capability
Triton | triton-exploit-001-inference-abuse | Unauthenticated model inference
TorchServe | torchserve-exploit-001-model-register | Unauthenticated model registration
TF Serving | tfserving-exploit-001-predict-abuse | Unauthenticated prediction
Kubeflow | kubeflow-enum-001-pipeline-access | Pipeline/experiment enumeration
A2A | a2a-exploit-001-task-inject | JSON-RPC task injection

9 new file discovery rules (33→42): GitHub PAT, Slack Bot Token, Jira/Atlassian, Brave Search, LangChain, WandB, Azure OpenAI, database connection strings, LangChain RAG config.

Lab validation: 100% coverage (170/170), contracts 87 passed / 0 failed, RRR matrix clean (0 over-claims, 0 uncovered), 29 endpoints, 5 VMs (4 targets + attack box) — full live deploy + e2e, 2026-06-28.


Not Planned

Out of scope for the current project direction.

Feature | Rationale
GUI / web interface | aipostex is a CLI tool designed for operator workflows
Agent-based scanning | Autonomous scanning introduces uncontrolled risk; operator-driven progression is a core design principle
Cloud-hosted SaaS | The tool is designed for local/lab use by security operators