Roadmap¶

Tracking progress, priorities, and planned work for aipostex.

Shipped¶

Completed milestones listed in reverse chronological order.

v0.1.2 depth (Track 1 + Track 5)¶

Jupyter notebooks --mine-secrets: optional parallel fetch of each listed .ipynb with cell-level credential mining (same patterns as read-notebook).
Credchain: MLflow run findings with run_id metadata register mlflow-run-id credentials; auto-chain suggests mlflow artifacts --run-id <id>.
MLflow: runs sensitive-parameter and artifact URI extraction covered by tests including Snowflake-style artifact_uri values; see docs/modules/mlflow.md.
File discovery: rules without content_patterns keep max_file_size: 0 at load time (filename-only); rules with content patterns still default to a 10MB read cap. New bundled rules: ML training CSV headers under dataset/fine-tune paths; Hugging Face dataset_infos.json / dataset_dict.json under cache or datasets paths.

CVE Template Expansion (Multi-Service)¶

Expanded CVE coverage from MCP-only (7 templates) to 6 service families (17 CVE templates). New templates:

CVE	Service	Severity	Capability
CVE-2025-6514	mcp-remote	Critical (9.6)	Client-side RCE via OAuth authorization_endpoint injection
CVE-2025-53109	Filesystem MCP	High (8.4)	Symlink bypass → sandbox escape → code execution
CVE-2025-53110	Filesystem MCP	High (7.3)	Directory containment bypass via prefix matching
CVE-2025-63389	Ollama	Critical (9.1)	Missing authentication on all API endpoints ≤ v0.12.3
CVE-2025-11201	MLflow	Critical (9.8)	Directory traversal in model version creation → arbitrary file write → RCE
CVE-2025-48889	Gradio	High (7.5)	Arbitrary file copy via flagging endpoint path manipulation
CVE-2025-51471	Ollama	Medium (6.9)	Cross-domain token theft via malicious WWW-Authenticate realm
CVE-2025-34351	Ray	High (8.8)	Token authentication disabled by default in v2.52.0
GHSA-8fr4-5q9j-m8gm	vLLM	Critical (9.8)	RCE via auto_map config class bypassing trust_remote_code
CVE-2025-68664	LangChain	Critical (9.3)	Secret exposure via deserialization
GHSA-mrw7-hf4f-83pf	vLLM	High (8.6)	Deserialization DoS/RCE via sparse tensor memory corruption

OpenAI-Compat Tool Enumeration¶

Added openai-compat tool-enum for read-only function and tool capability probing, attacker-defined tool injection validation, and forced tool invocation checks against exposed OpenAI-compatible endpoints.

OpenAI-Compat Prompt Injection Testing¶

Added openai-compat prompt-test for read-only instruction-override, role-confusion, delimiter-escape, jailbreak, and refusal-bypass probing against exposed OpenAI-compatible endpoints.

Deeper Module Exploitation (Tier 2)¶

Extended four exploit modules with higher-impact capabilities:

Module	New Subcommands	Capability
Ollama	`exfiltrate`	Model weight blob download probing (HEAD + Range GET proof)
Jupyter	`start-kernel`, `reverse-shell-proof`, `pip-proof`	Kernel creation, outbound socket proof, pip install proof
Ray	`pip-inject`, `cluster-info`	Runtime env pip injection, cluster resource exfiltration
MLflow	`tamper-proof`	Experiment/run creation + parameter logging (write access proof)

MCP Stdio Transport (Tier 2)¶

Refactored mcp.Client to use the Transport interface internally. NewStdioClient creates a client backed by a subprocess (NDJSON over stdin/stdout). CLI flags --transport stdio, --stdio-command, and --stdio-args enable enum and poison operations against local MCP servers without HTTP.

Auto-Badge and Evidence Consistency¶

All template-based (vulncheck) findings now automatically receive proof metadata (proof_stage, proof_strength) and auto-generated HTTP request/response evidence. Detection templates are badged VERIFIED; exploit templates are badged EXPLOITED. The HTML report renders these badges and evidence blocks consistently.

Scan Mode Separation¶

Templates carry a type field (detection or exploit). The engine respects a --mode flag: detect (default, safe) runs only detection templates; full includes 46 exploit templates that send active payloads (SSRF, RCE, path traversal, inference, terminal creation, job submission, artifact exfiltration, data extraction, prompt injection, and weak-auth compute theft validation). The scan mode is recorded in finding metadata and displayed in the HTML report engagement strip.

Executive HTML Report¶

Self-contained HTML report with sticky sidebar TOC, severity summary cards, per-host finding cards with expand/collapse, proof badges, collapsible evidence blocks, exploitation summary table, and print-optimized CSS. Suitable for executive delivery.

Reporting Pipeline¶

Eight output formats: Console, JSON, JSONL, CSV, HTML, SARIF, Markdown, PDF. The report command generates consolidated reports from JSON or JSONL findings. The summary command produces executive summaries. The bundle command packages engagements into zip archives. The graph command generates finding correlation graphs (Mermaid/Graphviz).

Post-Exploitation Modules¶

17 dedicated exploit modules:

Module	Subcommands	Key Capabilities
Ollama	10	Enum, prompt extraction, inference, model copy/create/delete/poison, weight exfiltration
Vector DBs	3	ChromaDB/Weaviate/Qdrant enum, extract, sensitive search (27 patterns)
Jupyter	8	Server enum, kernel listing, notebook read, code execution, start-kernel, reverse-shell-proof, pip-proof
MCP	3	Config analysis, remote enum (HTTP + stdio), poison (SSRF/RCE/path-traversal variants)
OpenAI-Compatible	9	Auth sweep, enum, inference validation, prompt extraction, tool enumeration, prompt testing, throughput, proxy, LiteLLM
LiteLLM	4	Config extraction, budget/spend probe, proxy chain analysis, credential discovery
Ray	8	Dashboard enum, job listing/logs/artifacts, job submission, runtime-env, pip-inject, cluster-info
MLflow	9	Tracking enum, experiments, runs, artifacts, registry, model versions, tamper-proof
Gradio	7	Config enum, predict, queue probe, upload/download, file chain, serve probe
BentoML	4	Service enum, route listing, metrics, prediction
Triton	7	Model enum, detail, config, repository index, SHM probe, inference, model load/unload
TorchServe	7	Model enum, detail, metrics, prediction, model register/scale/unregister
HuggingFace TGI/TEI	5	Service auto-detect, model enum, metrics, generation, embedding
TensorFlow Serving	5	Model discovery, metadata/signature extraction, metrics, inference
Kubeflow	6	Pipeline/run/experiment enum, notebooks, pipeline run submission
W&B	5	Server enum, projects, runs, artifacts, secret extraction
A2A	11	Agent card enum, skills, tasks, streaming, push hijack, MCP pivot, tool injection

Vulnerability Templates¶

131 YAML templates across 22 service categories (85 detection, 46 exploit). Templates support matchers, extractors, severity, CVSS, and proof metadata.

Network Fingerprinting¶

30 HTTP-based service probes covering Ollama, ChromaDB, Weaviate, Qdrant, vLLM, LiteLLM, LM Studio, LocalAI, Jupyter, MLflow, Gradio, Streamlit, Ray, Open WebUI, MCP (SSE/Inspector/MCPJam), OpenAI-compatible, Triton, TorchServe (inference + management), TF Serving, HF TGI/TEI, LangServe, Kubeflow, BentoML, W&B, A2A. CIDR expansion, concurrent probing, and automatic template matching.

File Discovery¶

49 rules across 5 packs (api_keys, mcp_configs, local_llm, vectordb_rag, core_assessment). Pattern-matching for API keys, model files, MCP configs, vector DB data, fine-tuning datasets, and RAG pipelines.

Core Framework¶

100+ operator commands across scanning, reporting, templates, and 18 exploit modules
Finding deduplication, merge, and collection management
OPSEC controls: stealth mode, proxy (HTTP/HTTPS/SOCKS5), TLS skip, concurrency caps, User-Agent rotation
Operator progression: workflow recommendations chain discovery into exploitation
Safe by default: --mode detect and --force-exploit gating
CI/CD with GitHub Actions (build, test, lint)

Exploit Template Expansion (Tier 1)¶

Added 5 exploit templates for services that previously had detection-only coverage:

Template	Service	Severity	Capability
`jupyter-exploit-001-terminal-rce`	Jupyter	Critical	Terminal creation = RCE proof
`ray-exploit-001-job-submit-rce`	Ray	Critical	Job submission = cluster RCE
`mlflow-exploit-001-artifact-exfil`	MLflow	High	Experiment -> run -> artifact exfiltration chain
`gradio-exploit-001-file-read`	Gradio	Critical	Arbitrary file read via /file=
`chromadb-exploit-001-data-exfil`	ChromaDB	High	Collection enumeration -> document extraction

Credential Chain-Loading¶

Automatic credential extraction from scan findings (internal/credchain). When discover network or assess network discovers credentials in findings (Jupyter tokens, OpenAI API keys, HF tokens, Anthropic keys, bearer tokens, generic API keys), they are automatically injected into workflow recommendations. Console output shows a credential chain summary.

Development Tracks¶

Work is organized into five capability tracks. Each track has an offensive rationale and concrete deliverables.

Step-by-step plans and verification checklists for upcoming milestones: development/plans/README.md.

Track 1: Extraction Depth¶

Make existing modules find more and prove deeper impact on services we already touch.

Item	Module	What it adds	Priority
MLflow: emit `artifact_uri` and param connection strings as findings	mlflow	Extracts S3/GCS/Snowflake URIs already parsed in `Run.ArtifactURI` and `Run.Params`	v0.1.2
Jupyter: mine notebook cells for secrets	jupyter	Regex scan of cell source for API keys, credentials, connection strings during `notebooks` subcommand	v0.1.2
Ollama: model value scoring	ollama	Score models by parameter size, family, quantization for impact prioritization	v0.2.0
Vector DBs: semantic proximity search	vectordb	Query by embedding vector to find semantically similar sensitive data (not just keyword regex)	v0.8.0
LiteLLM: dedicated exploit module	`litellm` command	Config key extraction, budget/spend enumeration, proxy chain analysis, credential discovery	shipped v0.2.0
MCP: multi-step exploitation chains	mcp	Use path traversal results to feed credential extraction, then pivot	v0.8.0
Gradio: SSRF through prediction endpoints	gradio	Test if prediction endpoints make outbound requests based on user input	v0.8.0

Track 2: New Attack Surfaces¶

Cover attack surfaces the tool doesn't touch yet.

Item	Type	What it adds	Priority
Model supply chain validation	`model-scan` command	Detect pickle/PyTorch deserialization RCE vectors in `.pkl`/`.pt`/`.bin` files; validate model hashes against registry manifests	shipped v0.2.0
Kubernetes AI workload discovery	NEW fingerprint probes + templates	K8s API enumeration for GPU nodes, model serving deployments (KServe, Seldon, BentoML), training jobs	v0.8.0
Cloud AI service probing	NEW fingerprint probes + templates	SageMaker endpoints, Bedrock runtime, Vertex AI, Azure OpenAI/ML — auth validation, model enumeration	v0.8.0
Embedding injection	vectordb `inject` + `metadata-inject`	Add poisoned documents to vector stores to manipulate RAG responses (guarded by `--force-exploit`)	shipped v0.2.0
LLM output exfiltration testing	openai-compat `exfil-test`	Test for data exfiltration via LLM responses (encoding, side-channel, steganography)	v0.8.0

Track 3: Exploitation Chaining¶

Connect individual findings into attack paths. Complete the operator workflow.

Item	Type	What it adds	Priority
`validate` command	NEW command	Re-check findings against live targets, confidence scoring (confirmed/stale/remediated), delta reporting	v0.8.0
Automatic credential forwarding	credchain enhancement	When file discovery finds an API key, auto-inject it into the matching exploit module invocation (not just suggestions)	v0.8.0
Campaign reporting	report enhancement	Cross-engagement diff, temporal comparison, auth pattern summary, model value rollup in one artifact	v0.8.0
Engagement state persistence	NEW `engagement` command	Save/resume multi-phase engagements, track which findings led to which exploits	v0.8.0

Track 4: Detection & Probing¶

Expand fingerprinting and template coverage to new services.

Item	Type	What it adds	Priority
BentoML	fingerprint probe + 3 templates	Model serving framework, widely deployed, REST + gRPC	shipped v0.2.0
Seldon Core	fingerprint probe + 2-3 templates	K8s-native model serving, prediction + feedback APIs	v0.8.0
KServe	fingerprint probe + 2-3 templates	Serverless model inference on K8s, InferenceService CRD	v0.8.0
Weights & Biases	fingerprint probe + 3 templates	Experiment tracking — often exposes API keys, training data, model artifacts	shipped v0.2.0
LangSmith / LangFuse	fingerprint probe + templates	LLM observability — traces, prompts, completions, evaluation data	v0.8.0
Flowise / Dify / n8n-AI	fingerprint probes + templates	Low-code LLM app builders — common in enterprise, often unauthenticated	v0.8.0
CVE template refresh	ongoing	New CVE-specific templates as AI infrastructure CVEs are published	ongoing

Track 5: File Discovery¶

Expand on-disk detection to cover categories we miss today.

Item	Rules	What it adds	Priority
Model file detection	3-4 rules	`.safetensors`, `.gguf`, `.onnx`, `.pkl`, `.pt` — model files indicate training/serving, `.pkl`/`.pt` indicate deserialization risk	v0.1.2
Training data detection	2-3 rules	`.parquet`, `.arrow`, `.tfrecord`, `.csv` with ML column patterns — datasets indicate data exposure	v0.1.2
Kubernetes AI configs	2-3 rules	`InferenceService`, `ServingRuntime`, `TrainingJob` YAML manifests	v0.8.0
CI/CD pipeline configs	2-3 rules	GitHub Actions / GitLab CI model training workflows with credential references	v0.8.0
Experiment tracking configs	2 rules	W&B `wandb/settings`, MLflow `mlflow.db`, Neptune config files with API keys	shipped v0.2.0

Release Plan¶

v0.1.2 (immediate)¶

Track 1: MLflow artifact_uri extraction, Jupyter cell secret mining
Track 5: Model file + training data discovery rules
Doc truth sync (completed)
Target: 95%+ lab coverage

v1.3.0 (shipped)¶

Kubernetes in-cluster lateral movement — deepens the k8s module from read-on-one-apiserver into proven traversal of identities and namespaces within a single cluster:

access-review — SelfSubjectRulesReview maps the current identity's real authorization (read-confirmed when it self-reviews, honest reachable when it can't, e.g. anon 403).
sa-loot (--force-exploit) — exec-steal a pod's service-account token, re-authenticate, and report a measured privilege delta. Write capability is measured with a non-persisting dry-run create (?dryRun=All) so it works for any identity including system:anonymous; escalation is claimed only when the stolen identity can write where the foothold cannot.
--all-namespaces / -A on enum and secret-read — one identity reaching every namespace it can, with correct per-namespace attribution.

Validated live on an extended k3s sandbox (an over-granted pipeline-runner SA); the secure cluster (:6444) reports 401/not-weak on every verb. Out of scope (not honestly provable on single-node k3s): cross-cluster/mesh movement, kubelet :10250 pivot, cloud-metadata SSRF, hostPath escape.

v1.2.0 (shipped)¶

Kubernetes AI-workload module + A2A listener-confirmed probes:

k8s module (WS2) — rbac-probe, enum, secret-read, artifact-read, pod-exec against a kube-apiserver directly; kube-apiserver fingerprint + 3 anon-access templates; validated against a real k3s vuln/secure fixture (anon enum → secret exfil → in-pod RCE).
A2A out-of-band listener confirmation (WS3) — card-spoof/push-hijack take an http --callback-url with a per-run nonce; only a real inbound callback upgrades influenced → exploited. 5 new a2a-exploit templates (008–012), fire-on-vuln / quiet-on-secure.

v1.1.0 (shipped)¶

Post-v1.0 offensive expansion + dev/lab infrastructure:

A2A offensive primitives — 5 new single-node verbs (auth-probe, msg-integrity, sender-spoof, delegate-probe, card-spoof), proof-carrying via applyProofMetadata, live-validated against a real a2a-sdk agent. A2A stays single-node by design — interception, mesh-mapping, and differential-privilege proof are adjacent tooling, out of aipostex's find-and-pop-a-node scope.
Output honesty fixes — findings sort severity-descending (critical first) across all exploit modules; vectordb search-sensitive collapses overlapping match windows; wandb treats /healthz as a liveness probe (no bogus version=ready!).
Single-service sandbox (lab) — a dev-machine Docker harness (up → point the tool at one real product → confirm the module is honest → down); closed the long-pending real-W&B proof and is the realism dev loop.
Multi-estate (lab) — GROUP_ID-parameterized Proxmox deploy for N non-colliding, isolated estates on one host (validated live: 3 estates, ~99s parallel reset-wave).

v1.0.0 (shipped)¶

First stable release. 17 exploit modules, proof-carrying findings (stage/strength), a single --force-exploit gate, scored benchmark lab at 100% coverage / RRR-clean, and real-instance proofs for A2A + MCP (and W&B via the sandbox). Supersedes the v0.7.0 maturity line below.

v0.7.0¶

Version bump from v0.2.0 reflecting cumulative maturity: 17 exploit modules, 123 templates, 30 probes, 8 output formats, model-scan, full report pipeline, and four rounds of Tier 2 review hardening. All v0.2.0 items below are shipped.

v0.2.0 (shipped)¶

Track 2: Model supply chain validation (model-scan) — shipped
Track 1: LiteLLM dedicated module — shipped
Track 4: BentoML + W&B probes and templates — shipped
Track 3: automatic credential forwarding — partial (suggestions generated, not auto-executed)

Future (post-1.3)¶

Track 3: validate / proof-carrying re-validation (re-probe → confidence → delta) — deferred by direction; the proof metadata already exists, the re-validation loop is a later cycle
Track 2: cloud AI service probing (SageMaker / Bedrock / Vertex) — (Kubernetes AI-workload discovery shipped in v1.2.0–v1.3.0)
Embedding / RAG poisoning; ANP / ACP sibling agent-protocol modules (enum-only skeleton until real targets exist)
Track 1: Vector DB semantic proximity search, MCP multi-step chains
Track 4: Seldon, KServe, LangSmith, Flowise probes
Track 3: campaign reporting, engagement state persistence

Recently Shipped (v0.1.1)¶

Exploit templates for 5 previously detection-only services:

Service	Template	Capability
Triton	`triton-exploit-001-inference-abuse`	Unauthenticated model inference
TorchServe	`torchserve-exploit-001-model-register`	Unauthenticated model registration
TF Serving	`tfserving-exploit-001-predict-abuse`	Unauthenticated prediction
Kubeflow	`kubeflow-enum-001-pipeline-access`	Pipeline/experiment enumeration
A2A	`a2a-exploit-001-task-inject`	JSON-RPC task injection

9 new file discovery rules (33→42): GitHub PAT, Slack Bot Token, Jira/Atlassian, Brave Search, LangChain, WandB, Azure OpenAI, database connection strings, LangChain RAG config.

Lab validation: 100% coverage (170/170), contracts 87 passed / 0 failed, RRR matrix clean (0 over-claims, 0 uncovered), 29 endpoints, 5 VMs (4 targets + attack box) — full live deploy + e2e, 2026-06-28.

Not Planned¶

Out of scope for the current project direction.

Feature	Rationale
GUI / web interface	aipostex is a CLI tool designed for operator workflows
Agent-based scanning	Autonomous scanning introduces uncontrolled risk; operator-driven progression is a core design principle
Cloud-hosted SaaS	The tool is designed for local/lab use by security operators