# Roadmap
Tracking progress, priorities, and planned work for aipostex.
## Shipped
Completed milestones listed in reverse chronological order.
### v0.1.2 depth (Track 1 + Track 5)
- Jupyter `notebooks --mine-secrets`: optional parallel fetch of each listed `.ipynb` with cell-level credential mining (same patterns as `read-notebook`).
- Credchain: MLflow run findings with `run_id` metadata register `mlflow-run-id` credentials; auto-chain suggests `mlflow artifacts --run-id <id>`.
- MLflow: `runs` sensitive-parameter and artifact URI extraction is covered by tests, including Snowflake-style `artifact_uri` values; see `docs/modules/mlflow.md`.
- File discovery: rules without `content_patterns` keep `max_file_size: 0` at load time (filename-only); rules with content patterns still default to a 10MB read cap. New bundled rules: ML training CSV headers under dataset/fine-tune paths; Hugging Face `dataset_infos.json`/`dataset_dict.json` under cache or datasets paths.
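The following is a minimal sketch of cell-level mining. It assumes the standard nbformat JSON layout, in which each cell's `source` is either a string or a list of lines. The function name and both regexes are hypothetical illustrations, not the patterns `read-notebook` ships.

```python
import json
import re

# Hypothetical, illustrative patterns only; the real rule set lives in the tool.
SECRET_PATTERNS = {
    "openai_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "hf_token": re.compile(r"\bhf_[A-Za-z0-9]{30,}"),
}


def mine_notebook(raw):
    """Scan every cell's source in an .ipynb document and report secret hits."""
    nb = json.loads(raw)
    hits = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        # nbformat allows source as a single string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        for kind, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(source):
                hits.append({"cell": index, "type": kind, "value": match.group(0)})
    return hits


if __name__ == "__main__":
    demo = json.dumps({
        "cells": [
            {"cell_type": "markdown", "source": "# Setup"},
            {"cell_type": "code", "source": ["import os\n", "key = 'sk-" + "a" * 24 + "'\n"]},
        ]
    })
    print(mine_notebook(demo))
```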
### CVE Template Expansion (Multi-Service)
Expanded CVE coverage from MCP-only (7 templates) to 6 service families (17 CVE templates). New templates:
| CVE | Service | Severity | Capability |
|---|---|---|---|
| CVE-2025-6514 | mcp-remote | Critical (9.6) | Client-side RCE via OAuth authorization_endpoint injection |
| CVE-2025-53109 | Filesystem MCP | High (8.4) | Symlink bypass → sandbox escape → code execution |
| CVE-2025-53110 | Filesystem MCP | High (7.3) | Directory containment bypass via prefix matching |
| CVE-2025-63389 | Ollama | Critical (9.1) | Missing authentication on all API endpoints ≤ v0.12.3 |
| CVE-2025-11201 | MLflow | Critical (9.8) | Directory traversal in model version creation → arbitrary file write → RCE |
| CVE-2025-48889 | Gradio | High (7.5) | Arbitrary file copy via flagging endpoint path manipulation |
| CVE-2025-51471 | Ollama | Medium (6.9) | Cross-domain token theft via malicious WWW-Authenticate realm |
| CVE-2025-34351 | Ray | High (8.8) | Token authentication disabled by default in v2.52.0 |
| GHSA-8fr4-5q9j-m8gm | vLLM | Critical (9.8) | RCE via auto_map config class bypassing trust_remote_code |
| CVE-2025-68664 | LangChain | Critical (9.3) | Secret exposure via deserialization |
| GHSA-mrw7-hf4f-83pf | vLLM | High (8.6) | Deserialization DoS/RCE via sparse tensor memory corruption |
### OpenAI-Compat Tool Enumeration
Added `openai-compat tool-enum` for read-only function and tool capability probing, attacker-defined tool injection validation, and forced tool invocation checks against exposed OpenAI-compatible endpoints.
### OpenAI-Compat Prompt Injection Testing
Added `openai-compat prompt-test` for read-only instruction-override, role-confusion, delimiter-escape, jailbreak, and refusal-bypass probing against exposed OpenAI-compatible endpoints.
### Deeper Module Exploitation (Tier 2)
Extended four exploit modules with higher-impact capabilities:
| Module | New Subcommands | Capability |
|---|---|---|
| Ollama | `exfiltrate` | Model weight blob download probing (HEAD + Range GET proof) |
| Jupyter | `start-kernel`, `reverse-shell-proof`, `pip-proof` | Kernel creation, outbound socket proof, pip install proof |
| Ray | `pip-inject`, `cluster-info` | Runtime env pip injection, cluster resource exfiltration |
| MLflow | `tamper-proof` | Experiment/run creation + parameter logging (write access proof) |
### MCP Stdio Transport (Tier 2)
Refactored `mcp.Client` to use the `Transport` interface internally. `NewStdioClient` creates a client backed by a subprocess (NDJSON over stdin/stdout). The CLI flags `--transport stdio`, `--stdio-command`, and `--stdio-args` enable `enum` and `poison` operations against local MCP servers without HTTP.
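The NDJSON framing can be sketched in a few lines. This sketch assumes the MCP stdio convention of one JSON-RPC 2.0 message per newline-terminated line. `StdioTransport` and the toy echo server are hypothetical stand-ins, not `NewStdioClient`. The sketch also skips the MCP `initialize` handshake and the response-ID matching that a real client needs.

```python
import json
import subprocess
import sys


class StdioTransport:
    """Hypothetical NDJSON transport: one JSON-RPC 2.0 message per line over a child's stdio."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
        self.next_id = 1

    def request(self, method, params=None):
        msg = {"jsonrpc": "2.0", "id": self.next_id, "method": method}
        if params is not None:
            msg["params"] = params
        self.next_id += 1
        # json.dumps escapes newlines inside strings, so every message stays on one line.
        self.proc.stdin.write(json.dumps(msg) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise EOFError("server closed stdout")
        return json.loads(line)

    def close(self):
        self.proc.stdin.close()  # EOF tells the server loop to exit
        self.proc.wait(timeout=5)
        self.proc.stdout.close()


# Toy stand-in for a local MCP server: echoes each method name back as the result.
ECHO_SERVER = """
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    req = json.loads(line)
    print(json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["method"]}}), flush=True)
"""

if __name__ == "__main__":
    t = StdioTransport([sys.executable, "-c", ECHO_SERVER])
    print(t.request("tools/list"))
    t.close()
```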
### Auto-Badge and Evidence Consistency
All template-based (vulncheck) findings now automatically receive proof metadata (`proof_stage`, `proof_strength`) and auto-generated HTTP request/response evidence. Detection templates are badged VERIFIED; exploit templates are badged EXPLOITED. The HTML report renders these badges and evidence blocks consistently.
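The sketch below shows the auto-badge step under stated assumptions. The detection/exploit-to-badge mapping comes from the paragraph above. The `Finding` shape, the `badge`/`evidence` keys, and the default `proof_stage` value are hypothetical.

```python
from dataclasses import dataclass, field

# Badge mapping as described above: detection -> VERIFIED, exploit -> EXPLOITED.
BADGE_BY_TYPE = {"detection": "VERIFIED", "exploit": "EXPLOITED"}


@dataclass
class Finding:
    template_id: str
    template_type: str  # "detection" or "exploit"
    request: str        # raw HTTP request that produced the match
    response: str       # raw HTTP response that matched
    metadata: dict = field(default_factory=dict)


def apply_auto_badge(finding):
    """Attach a badge, a default proof stage, and request/response evidence."""
    badge = BADGE_BY_TYPE.get(finding.template_type)
    if badge is None:
        raise ValueError(f"unknown template type: {finding.template_type!r}")
    finding.metadata["badge"] = badge
    # Hypothetical default; a value the template already set is left untouched.
    finding.metadata.setdefault("proof_stage", badge.lower())
    finding.metadata["evidence"] = {"request": finding.request, "response": finding.response}
    return finding
```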
### Scan Mode Separation
Templates carry a `type` field (`detection` or `exploit`). The engine respects a `--mode` flag: `detect` (default, safe) runs only detection templates; `full` additionally runs the 46 exploit templates, which send active payloads (SSRF, RCE, path traversal, inference, terminal creation, job submission, artifact exfiltration, data extraction, prompt injection, and weak-auth compute theft validation). The scan mode is recorded in finding metadata and displayed in the HTML report's engagement strip.
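Mode gating reduces to a filter over template types. In this minimal sketch, `full` runs detection templates as well as exploit templates, which is an assumption based on the word "includes". The `Template` class is hypothetical.

```python
from dataclasses import dataclass

# Which template types each --mode value runs, per the description above.
TYPES_BY_MODE = {
    "detect": {"detection"},           # default, safe
    "full": {"detection", "exploit"},  # adds active-payload exploit templates
}


@dataclass(frozen=True)
class Template:
    id: str
    type: str  # "detection" or "exploit"


def select_templates(templates, mode="detect"):
    """Return the templates a scan in `mode` would run, preserving input order."""
    try:
        allowed = TYPES_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown scan mode: {mode!r}") from None
    return [t for t in templates if t.type in allowed]
```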
### Executive HTML Report
Self-contained HTML report with sticky sidebar TOC, severity summary cards, per-host finding cards with expand/collapse, proof badges, collapsible evidence blocks, exploitation summary table, and print-optimized CSS. Suitable for executive delivery.
### Reporting Pipeline
Eight output formats: Console, JSON, JSONL, CSV, HTML, SARIF, Markdown, PDF. The `report` command generates consolidated reports from JSON or JSONL findings. The `summary` command produces executive summaries. The `bundle` command packages engagements into zip archives. The `graph` command generates finding correlation graphs (Mermaid/Graphviz).
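The sketch below shows one way a correlation graph could be emitted as Mermaid. It assumes findings are dicts with `host`, `title`, and `severity` keys (a hypothetical shape). Here, correlation means only "same host"; the real `graph` command may link findings on richer criteria.

```python
def _label(text):
    # Mermaid quoted labels cannot contain a bare double quote; use its entity code.
    return text.replace('"', "#quot;")


def findings_to_mermaid(findings):
    """Render findings as a Mermaid flowchart: one node per host, one per finding."""
    lines = ["graph LR"]
    host_ids = {}
    for i, finding in enumerate(findings):
        host = finding["host"]
        if host not in host_ids:
            host_ids[host] = f"h{len(host_ids)}"
            lines.append(f'  {host_ids[host]}["{_label(host)}"]')
        node = f"f{i}"
        label = _label(f"{finding['title']} ({finding['severity']})")
        lines.append(f'  {node}["{label}"]')
        lines.append(f"  {host_ids[host]} --> {node}")
    return "\n".join(lines)
```

The output can be pasted into any Mermaid renderer. A Graphviz variant would swap the syntax but keep the same host and finding nodes.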
### Post-Exploitation Modules
17 dedicated exploit modules:
| Module | Subcommands | Key Capabilities |
|---|---|---|
| Ollama | 10 | Enum, prompt extraction, inference, model copy/create/delete/poison, weight exfiltration |
| Vector DBs | 3 | ChromaDB/Weaviate/Qdrant enum, extract, sensitive search (27 patterns) |
| Jupyter | 8 | Server enum, kernel listing, notebook read, code execution, start-kernel, reverse-shell-proof, pip-proof |
| MCP | 3 | Config analysis, remote enum (HTTP + stdio), poison (SSRF/RCE/path-traversal variants) |
| OpenAI-Compatible | 9 | Auth sweep, enum, inference validation, prompt extraction, tool enumeration, prompt testing, throughput, proxy, LiteLLM |
| LiteLLM | 4 | Config extraction, budget/spend probe, proxy chain analysis, credential discovery |
| Ray | 8 | Dashboard enum, job listing/logs/artifacts, job submission, runtime-env, pip-inject, cluster-info |
| MLflow | 9 | Tracking enum, experiments, runs, artifacts, registry, model versions, tamper-proof |
| Gradio | 7 | Config enum, predict, queue probe, upload/download, file chain, serve probe |
| BentoML | 4 | Service enum, route listing, metrics, prediction |
| Triton | 7 | Model enum, detail, config, repository index, SHM probe, inference, model load/unload |
| TorchServe | 7 | Model enum, detail, metrics, prediction, model register/scale/unregister |
| HuggingFace TGI/TEI | 5 | Service auto-detect, model enum, metrics, generation, embedding |
| TensorFlow Serving | 5 | Model discovery, metadata/signature extraction, metrics, inference |
| Kubeflow | 6 | Pipeline/run/experiment enum, notebooks, pipeline run submission |
| W&B | 5 | Server enum, projects, runs, artifacts, secret extraction |
| A2A | 11 | Agent card enum, skills, tasks, streaming, push hijack, MCP pivot, tool injection |
### Vulnerability Templates
131 YAML templates across 22 service categories (85 detection, 46 exploit). Templates support matchers, extractors, severity, CVSS, and proof metadata.
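The following evaluator sketch shows how matchers and extractors can combine. The field names (`matchers-condition`, `words`, `regex`, `group`) are assumed, Nuclei-style, and not taken from aipostex's template schema. The example template is hypothetical and is shown as the dict a YAML loader would return.

```python
import re


def match_response(template, status, body):
    """Evaluate a template's matchers against one HTTP response."""
    results = []
    for m in template["matchers"]:
        if m["type"] == "status":
            results.append(status in m["status"])
        elif m["type"] == "word":
            results.append(any(word in body for word in m["words"]))
        elif m["type"] == "regex":
            results.append(any(re.search(p, body) for p in m["regex"]))
        else:
            raise ValueError(f"unsupported matcher type: {m['type']!r}")
    if template.get("matchers-condition", "or") == "and":
        return all(results)
    return any(results)


def extract(template, body):
    """Run regex extractors and return captured values in match order."""
    values = []
    for e in template.get("extractors", []):
        group = e.get("group", 0)
        for pattern in e["regex"]:
            values.extend(m.group(group) for m in re.finditer(pattern, body))
    return values


# Hypothetical detection template for an open Ollama /api/tags listing.
OLLAMA_TAGS = {
    "id": "example-ollama-open-tags",
    "matchers-condition": "and",
    "matchers": [
        {"type": "status", "status": [200]},
        {"type": "word", "words": ['"models"']},
    ],
    "extractors": [{"type": "regex", "regex": [r'"name":\s*"([^"]+)"'], "group": 1}],
}
```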
### Network Fingerprinting
30 HTTP-based service probes covering Ollama, ChromaDB, Weaviate, Qdrant, vLLM, LiteLLM, LM Studio, LocalAI, Jupyter, MLflow, Gradio, Streamlit, Ray, Open WebUI, MCP (SSE/Inspector/MCPJam), OpenAI-compatible, Triton, TorchServe (inference + management), TF Serving, HF TGI/TEI, LangServe, Kubeflow, BentoML, W&B, A2A. CIDR expansion, concurrent probing, and automatic template matching.
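CIDR expansion with capped concurrency can be sketched using only the standard library. `probe` is a hypothetical callback that stands in for a real HTTP fingerprint request, which would need a timeout. `max_workers` plays the role of a concurrency cap.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor


def expand_targets(cidr, ports):
    """Expand a CIDR block into (host, port) pairs."""
    net = ipaddress.ip_network(cidr, strict=False)
    hosts = list(net.hosts()) or [net.network_address]  # older Pythons return no hosts for /32
    return [(str(h), port) for h in hosts for port in ports]


def probe_all(targets, probe, max_workers=16):
    """Run probe(host, port) concurrently; keep only targets that identified a service."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda t: probe(*t), targets))
    return {t: r for t, r in zip(targets, results) if r is not None}
```

The example ports in the test are the Ollama (11434) and Jupyter (8888) defaults.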
### File Discovery
49 rules across 5 packs (`api_keys`, `mcp_configs`, `local_llm`, `vectordb_rag`, `core_assessment`). Pattern matching for API keys, model files, MCP configs, vector DB data, fine-tuning datasets, and RAG pipelines.
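The load-time size defaults from the v0.1.2 notes can be sketched as follows. The `Rule` fields and the choice of 10 MiB for "10MB" are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# "10MB" read cap from the v0.1.2 notes; 10 MiB is an assumption here.
DEFAULT_CONTENT_READ_CAP = 10 * 1024 * 1024


@dataclass
class Rule:
    name: str
    filename_patterns: List[str]
    content_patterns: List[str] = field(default_factory=list)
    max_file_size: Optional[int] = None  # None means "not set in the rule file"


def apply_load_defaults(rule):
    """Fill max_file_size at load time: 0 for filename-only rules, else the read cap."""
    if rule.max_file_size is None:
        rule.max_file_size = DEFAULT_CONTENT_READ_CAP if rule.content_patterns else 0
    return rule


def should_read_content(rule, file_size):
    """A cap of 0 means the rule matches on filename alone and never opens the file."""
    return rule.max_file_size > 0 and file_size <= rule.max_file_size
```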
### Core Framework
- 100+ operator commands across scanning, reporting, templates, and 18 exploit modules
- Finding deduplication, merge, and collection management
- OPSEC controls: stealth mode, proxy (HTTP/HTTPS/SOCKS5), TLS skip, concurrency caps, User-Agent rotation
- Operator progression: workflow recommendations chain discovery into exploitation
- Safe by default: `--mode detect` and `--force-exploit` gating
- CI/CD with GitHub Actions (build, test, lint)
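Here is a minimal deduplication sketch. It assumes findings are dicts keyed by host, port, and template ID, which is a hypothetical identity key. Merged entries keep the highest severity and the union of their evidence.

```python
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def dedupe_findings(findings):
    """Merge findings sharing (host, port, template_id); keep worst severity and all evidence."""
    merged = {}
    for f in findings:
        key = (f["host"], f["port"], f["template_id"])
        if key not in merged:
            # Copy the evidence list so the caller's findings are never mutated.
            merged[key] = {**f, "evidence": list(f.get("evidence", []))}
            continue
        kept = merged[key]
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]
        for item in f.get("evidence", []):
            if item not in kept["evidence"]:
                kept["evidence"].append(item)
    return list(merged.values())
```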
### Exploit Template Expansion (Tier 1)
Added 5 exploit templates for services that previously had detection-only coverage:
| Template | Service | Severity | Capability |
|---|---|---|---|
| `jupyter-exploit-001-terminal-rce` | Jupyter | Critical | Terminal creation = RCE proof |
| `ray-exploit-001-job-submit-rce` | Ray | Critical | Job submission = cluster RCE |
| `mlflow-exploit-001-artifact-exfil` | MLflow | High | Experiment -> run -> artifact exfiltration chain |
| `gradio-exploit-001-file-read` | Gradio | Critical | Arbitrary file read via `/file=` |
| `chromadb-exploit-001-data-exfil` | ChromaDB | High | Collection enumeration -> document extraction |
### Credential Chain-Loading
Automatic credential extraction from scan findings (`internal/credchain`). When `discover network` or `assess network` surfaces credentials in findings (Jupyter tokens, OpenAI API keys, HF tokens, Anthropic keys, bearer tokens, generic API keys), they are automatically injected into workflow recommendations. Console output shows a credential chain summary.
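The run-ID chain from the v0.1.2 notes can be sketched as a harvest step followed by a suggest step. Only the `mlflow artifacts --run-id <id>` suggestion comes from the roadmap. The `Credential` type, the finding dict shape, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    kind: str
    value: str
    host: str


# The mlflow entry mirrors the suggestion named in the v0.1.2 notes; other
# credential kinds would register their own templates the same way.
SUGGESTION_TEMPLATES = {"mlflow-run-id": "mlflow artifacts --run-id {value}"}


def harvest_credentials(findings):
    """Collect unique credentials from findings (here: MLflow run IDs only)."""
    seen, creds = set(), []
    for f in findings:
        run_id = f.get("metadata", {}).get("run_id")
        if f.get("module") == "mlflow" and run_id:
            cred = Credential("mlflow-run-id", run_id, f["host"])
            if cred not in seen:
                seen.add(cred)
                creds.append(cred)
    return creds


def suggest_next_steps(creds):
    """Turn credentials into (host, command) workflow suggestions."""
    out = []
    for c in creds:
        template = SUGGESTION_TEMPLATES.get(c.kind)
        if template is not None:
            out.append((c.host, template.format(value=c.value)))
    return out
```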
## Development Tracks
Work is organized into five capability tracks. Each track has an offensive rationale and concrete deliverables.
Step-by-step plans and verification checklists for upcoming milestones: `development/plans/README.md`.
### Track 1: Extraction Depth
Make existing modules find more and prove deeper impact on services we already touch.
| Item | Module | What it adds | Priority |
|---|---|---|---|
| MLflow: emit `artifact_uri` and param connection strings as findings | mlflow | Extracts S3/GCS/Snowflake URIs already parsed in `Run.ArtifactURI` and `Run.Params` | v0.1.2 |
| Jupyter: mine notebook cells for secrets | jupyter | Regex scan of cell source for API keys, credentials, and connection strings during the `notebooks` subcommand | v0.1.2 |
| Ollama: model value scoring | ollama | Score models by parameter size, family, and quantization for impact prioritization | v0.2.0 |
| Vector DBs: semantic proximity search | vectordb | Query by embedding vector to find semantically similar sensitive data (not just keyword regex) | v0.8.0 |
| LiteLLM: dedicated exploit module | `litellm` command | Config key extraction, budget/spend enumeration, proxy chain analysis, credential discovery | shipped v0.2.0 |
| MCP: multi-step exploitation chains | mcp | Use path traversal results to feed credential extraction, then pivot | v0.8.0 |
| Gradio: SSRF through prediction endpoints | gradio | Test whether prediction endpoints make outbound requests based on user input | v0.8.0 |
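Model value scoring could take roughly the following shape. It uses the `details` object that Ollama's `/api/tags` returns (`family`, `parameter_size`, `quantization_level`). The formula approximates weight footprint in 16-bit-equivalent billions of parameters. Both the formula and the family weights are hypothetical choices, not the planned implementation.

```python
import re

# Hypothetical weights: embedding-model families count for less than chat/completion models.
FAMILY_WEIGHT = {"bert": 0.5, "nomic-bert": 0.5}


def param_billions(size):
    """Parse Ollama-style sizes such as "8.0B" or "137M" into billions of parameters."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([BM])\s*", size.upper())
    if not m:
        return 0.0
    value = float(m.group(1))
    return value if m.group(2) == "B" else value / 1000


def quant_bits(level):
    """Approximate bits per weight from a quantization label (F16, Q8_0, Q4_K_M, ...)."""
    level = level.upper()
    if level in ("F32", "FP32"):
        return 32
    if level in ("F16", "FP16", "BF16"):
        return 16
    m = re.search(r"Q(\d+)", level)
    return int(m.group(1)) if m else 4  # unknown labels: assume 4-bit


def model_value(details):
    """Score ~ weight footprint in 16-bit-equivalent billions, scaled by family."""
    score = param_billions(details.get("parameter_size", ""))
    score *= quant_bits(details.get("quantization_level", "")) / 16
    score *= FAMILY_WEIGHT.get(details.get("family", ""), 1.0)
    return round(score, 2)


def rank_models(tags):
    """Rank the models in an /api/tags response, most valuable first."""
    scored = [(m["name"], model_value(m.get("details", {}))) for m in tags.get("models", [])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```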
### Track 2: New Attack Surfaces
Cover attack surfaces the tool doesn't touch yet.
| Item | Type | What it adds | Priority |
|---|---|---|---|
| Model supply chain validation | `model-scan` command | Detect pickle/PyTorch deserialization RCE vectors in `.pkl`/`.pt`/`.bin` files; validate model hashes against registry manifests | shipped v0.2.0 |
| Kubernetes AI workload discovery | NEW fingerprint probes + templates | K8s API enumeration for GPU nodes, model serving deployments (KServe, Seldon, BentoML), training jobs | v0.8.0 |
| Cloud AI service probing | NEW fingerprint probes + templates | SageMaker endpoints, Bedrock runtime, Vertex AI, Azure OpenAI/ML: auth validation, model enumeration | v0.8.0 |
| Embedding injection | `vectordb inject` + `metadata-inject` | Add poisoned documents to vector stores to manipulate RAG responses (guarded by `--force-exploit`) | shipped v0.2.0 |
| LLM output exfiltration testing | `openai-compat exfil-test` | Test for data exfiltration via LLM responses (encoding, side-channel, steganography) | v0.8.0 |
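Deserialization risk in pickle files can be flagged statically by walking the opcode stream instead of loading the file. Scanners such as picklescan use the same idea. This is a sketch, not `model-scan`'s implementation. It handles raw pickle bytes only, so a zip-format PyTorch `.pt` file would first need its `data.pkl` member extracted with `zipfile`.

```python
import collections
import pickle
import pickletools

# Opcodes that import a global or call an object while unpickling.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ", "NEWOBJ_EX"}


def scan_pickle_bytes(data):
    """List (opcode, target) pairs for risky opcodes, without ever unpickling `data`."""
    findings = []
    recent_strings = []  # last two string constants, used to name STACK_GLOBAL targets
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in RISKY_OPCODES:
            if name in ("GLOBAL", "INST"):
                target = arg.replace(" ", ".")  # genops reports these as "module name"
            elif name == "STACK_GLOBAL" and len(recent_strings) == 2:
                target = ".".join(recent_strings)  # best effort: ignores memo lookups
            else:
                target = ""
            findings.append((name, target))
        elif isinstance(arg, str):
            recent_strings = (recent_strings + [arg])[-2:]
    return findings


if __name__ == "__main__":
    print(scan_pickle_bytes(pickle.dumps({"a": 1})))                   # plain data
    print(scan_pickle_bytes(pickle.dumps(collections.OrderedDict())))  # imports a class
```

Plain containers of primitives produce no risky opcodes. Any class reference surfaces as a `STACK_GLOBAL` (or `GLOBAL`) target that can be compared against an allowlist.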
### Track 3: Exploitation Chaining
Connect individual findings into attack paths. Complete the operator workflow.
| Item | Type | What it adds | Priority |
|---|---|---|---|
| `validate` command | NEW command | Re-check findings against live targets, confidence scoring (confirmed/stale/remediated), delta reporting | v0.8.0 |
| Automatic credential forwarding | credchain enhancement | When file discovery finds an API key, auto-inject it into the matching exploit module invocation (not just suggestions) | v0.8.0 |
| Campaign reporting | report enhancement | Cross-engagement diff, temporal comparison, auth pattern summary, and model value rollup in one artifact | v0.8.0 |
| Engagement state persistence | NEW `engagement` command | Save/resume multi-phase engagements; track which findings led to which exploits | v0.8.0 |
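The confirmed/stale/remediated split can be sketched as a classification over two inputs. The semantics here are assumptions. *Stale* is taken to mean the target no longer answers, so the finding cannot be re-checked. *Remediated* is taken to mean the target answers but the original check no longer matches.

```python
from enum import Enum


class Confidence(str, Enum):
    CONFIRMED = "confirmed"
    STALE = "stale"
    REMEDIATED = "remediated"


def classify(reachable, still_matches):
    """Map one re-probe outcome to a confidence label."""
    if not reachable:
        return Confidence.STALE
    return Confidence.CONFIRMED if still_matches else Confidence.REMEDIATED


def delta_report(reprobes):
    """Group finding IDs by confidence; `reprobes` maps id -> (reachable, still_matches)."""
    report = {c.value: [] for c in Confidence}
    for finding_id, (reachable, still_matches) in sorted(reprobes.items()):
        report[classify(reachable, still_matches).value].append(finding_id)
    return report
```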
### Track 4: Detection & Probing
Expand fingerprinting and template coverage to new services.
| Item | Type | What it adds | Priority |
|---|---|---|---|
| BentoML | fingerprint probe + 3 templates | Model serving framework, widely deployed, REST + gRPC | shipped v0.2.0 |
| Seldon Core | fingerprint probe + 2-3 templates | K8s-native model serving, prediction + feedback APIs | v0.8.0 |
| KServe | fingerprint probe + 2-3 templates | Serverless model inference on K8s, InferenceService CRD | v0.8.0 |
| Weights & Biases | fingerprint probe + 3 templates | Experiment tracking — often exposes API keys, training data, model artifacts | shipped v0.2.0 |
| LangSmith / LangFuse | fingerprint probe + templates | LLM observability — traces, prompts, completions, evaluation data | v0.8.0 |
| Flowise / Dify / n8n-AI | fingerprint probes + templates | Low-code LLM app builders — common in enterprise, often unauthenticated | v0.8.0 |
| CVE template refresh | ongoing | New CVE-specific templates as AI infrastructure CVEs are published | ongoing |
### Track 5: File Discovery
Expand on-disk detection to cover categories we miss today.
| Item | Rules | What it adds | Priority |
|---|---|---|---|
| Model file detection | 3-4 rules | `.safetensors`, `.gguf`, `.onnx`, `.pkl`, `.pt`: model files indicate training/serving; `.pkl`/`.pt` indicate deserialization risk | v0.1.2 |
| Training data detection | 2-3 rules | `.parquet`, `.arrow`, `.tfrecord`, `.csv` with ML column patterns: datasets indicate data exposure | v0.1.2 |
| Kubernetes AI configs | 2-3 rules | InferenceService, ServingRuntime, TrainingJob YAML manifests | v0.8.0 |
| CI/CD pipeline configs | 2-3 rules | GitHub Actions / GitLab CI model training workflows with credential references | v0.8.0 |
| Experiment tracking configs | 2 rules | W&B `wandb/settings`, MLflow `mlflow.db`, Neptune config files with API keys | shipped v0.2.0 |
## Release Plan
### v0.1.2 (immediate)
- Track 1: MLflow artifact_uri extraction, Jupyter cell secret mining
- Track 5: Model file + training data discovery rules
- Doc truth sync (completed)
- Target: 95%+ lab coverage
### v1.3.0 (shipped)
Kubernetes in-cluster lateral movement — deepens the k8s module from read-on-one-apiserver into proven traversal of identities and namespaces within a single cluster:
- `access-review`: `SelfSubjectRulesReview` maps the current identity's real authorization (`read-confirmed` when it can self-review; an honest `reachable` when it can't, e.g. anonymous 403).
- `sa-loot` (`--force-exploit`): exec-steal a pod's service-account token, re-authenticate, and report a measured privilege delta. Write capability is measured with a non-persisting dry-run create (`?dryRun=All`), so it works for any identity, including `system:anonymous`; escalation is claimed only when the stolen identity can write where the foothold cannot.
- `--all-namespaces`/`-A` on `enum` and `secret-read`: one identity reaching every namespace it can, with correct per-namespace attribution.
Validated live on an extended k3s sandbox (an over-granted pipeline-runner SA); the secure cluster (:6444) reports 401/not-weak on every verb. Out of scope (not honestly provable on single-node k3s): cross-cluster/mesh movement, kubelet :10250 pivot, cloud-metadata SSRF, hostPath escape.
### v1.2.0 (shipped)
Kubernetes AI-workload module + A2A listener-confirmed probes:
- `k8s` module (WS2): `rbac-probe`, `enum`, `secret-read`, `artifact-read`, `pod-exec` against a kube-apiserver directly; kube-apiserver fingerprint + 3 anon-access templates; validated against a real k3s vuln/secure fixture (anon enum → secret exfil → in-pod RCE).
- A2A out-of-band listener confirmation (WS3): `card-spoof`/`push-hijack` take an HTTP `--callback-url` with a per-run nonce; only a real inbound callback upgrades `influenced → exploited`. 5 new `a2a-exploit` templates (008–012), fire-on-vuln / quiet-on-secure.
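Nonce-gated confirmation can be sketched as follows. The `/cb` path and the query-parameter format are hypothetical. The listener itself is out of scope; the sketch assumes the inbound request paths it recorded are passed in.

```python
import secrets
from urllib.parse import parse_qs, urlparse


def new_callback(base_url):
    """Mint a per-run nonce and the callback URL that embeds it."""
    nonce = secrets.token_urlsafe(16)
    return nonce, f"{base_url.rstrip('/')}/cb?nonce={nonce}"


def upgrade_stage(stage, expected_nonce, inbound_paths):
    """Promote influenced -> exploited only if some inbound request carried this run's nonce."""
    if stage != "influenced":
        return stage
    for path in inbound_paths:
        if expected_nonce in parse_qs(urlparse(path).query).get("nonce", []):
            return "exploited"
    return stage
```

Binding the nonce to the run means a stray or replayed request from an earlier engagement cannot upgrade the current finding.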
### v1.1.0 (shipped)
Post-v1.0 offensive expansion + dev/lab infrastructure:
- A2A offensive primitives: 5 new single-node verbs (`auth-probe`, `msg-integrity`, `sender-spoof`, `delegate-probe`, `card-spoof`), proof-carrying via `applyProofMetadata`, live-validated against a real a2a-sdk agent. A2A stays single-node by design: interception, mesh-mapping, and differential-privilege proof are adjacent tooling, outside aipostex's find-and-pop-a-node scope.
- Output honesty fixes: findings sort severity-descending (critical first) across all exploit modules; vectordb `search-sensitive` collapses overlapping match windows; `wandb` treats `/healthz` as a liveness probe (no bogus `version=ready!`).
- Single-service sandbox (lab): a dev-machine Docker harness (`up` → point the tool at one real product → confirm the module is honest → `down`); it closed the long-pending real-W&B proof and serves as the realism dev loop.
- Multi-estate (lab): `GROUP_ID`-parameterized Proxmox deploy for N non-colliding, isolated estates on one host (validated live: 3 estates, ~99s parallel reset-wave).
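Collapsing overlapping match windows is interval merging. In this minimal sketch, the 40-character context width and the function names are hypothetical:

```python
import re


def merge_windows(spans, text_len, context=40):
    """Widen each (start, end) match by `context` chars and merge windows that overlap or touch."""
    windows = sorted((max(0, s - context), min(text_len, e + context)) for s, e in spans)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sensitive_snippets(text, pattern, context=40):
    """Return one snippet per merged window instead of one per regex hit."""
    spans = [m.span() for m in re.finditer(pattern, text)]
    return [text[s:e] for s, e in merge_windows(spans, len(text), context)]
```

Sorting first lets a single pass merge everything, so the cost is dominated by the O(n log n) sort.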
### v1.0.0 (shipped)
First stable release. 17 exploit modules, proof-carrying findings (stage/strength), a single --force-exploit gate, scored benchmark lab at 100% coverage / RRR-clean, and real-instance proofs for A2A + MCP (and W&B via the sandbox). Supersedes the v0.7.0 maturity line below.
### v0.7.0
Version bump from v0.2.0 reflecting cumulative maturity: 17 exploit modules, 123 templates, 30 probes, 8 output formats, model-scan, full report pipeline, and four rounds of Tier 2 review hardening. All v0.2.0 items below are shipped.
### v0.2.0 (shipped)
- Track 2: Model supply chain validation (`model-scan`): shipped
- Track 1: LiteLLM dedicated module: shipped
- Track 4: BentoML + W&B probes and templates: shipped
- Track 3: automatic credential forwarding: partial (suggestions generated, not auto-executed)
### Future (post-1.3)
- Track 3: `validate` / proof-carrying re-validation (re-probe → confidence → delta): deferred by direction; the proof metadata already exists, and the re-validation loop is a later cycle
- Track 2: cloud AI service probing (SageMaker / Bedrock / Vertex); Kubernetes AI-workload discovery already shipped in v1.2.0–v1.3.0
- Embedding / RAG poisoning; ANP / ACP sibling agent-protocol modules (`enum`-only skeleton until real targets exist)
- Track 1: Vector DB semantic proximity search, MCP multi-step chains
- Track 4: Seldon, KServe, LangSmith, Flowise probes
- Track 3: campaign reporting, engagement state persistence
### Recently Shipped (v0.1.1)
Exploit templates for 5 previously detection-only services:
| Service | Template | Capability |
|---|---|---|
| Triton | `triton-exploit-001-inference-abuse` | Unauthenticated model inference |
| TorchServe | `torchserve-exploit-001-model-register` | Unauthenticated model registration |
| TF Serving | `tfserving-exploit-001-predict-abuse` | Unauthenticated prediction |
| Kubeflow | `kubeflow-enum-001-pipeline-access` | Pipeline/experiment enumeration |
| A2A | `a2a-exploit-001-task-inject` | JSON-RPC task injection |
9 new file discovery rules (33→42): GitHub PAT, Slack Bot Token, Jira/Atlassian, Brave Search, LangChain, WandB, Azure OpenAI, database connection strings, LangChain RAG config.
Lab validation: 100% coverage (170/170), contracts 87 passed / 0 failed, RRR matrix clean (0 over-claims, 0 uncovered), 29 endpoints, 5 VMs (4 targets + attack box) — full live deploy + e2e, 2026-06-28.
## Not Planned
Out of scope for the current project direction.
| Feature | Rationale |
|---|---|
| GUI / web interface | aipostex is a CLI tool designed for operator workflows |
| Agent-based scanning | Autonomous scanning introduces uncontrolled risk; operator-driven progression is a core design principle |
| Cloud-hosted SaaS | The tool is designed for local/lab use by security operators |