Coverage & Roadmap
Current Coverage Matrix
| Service | Implemented | Next Additions | Later-Phase |
|---|---|---|---|
| Ollama | Enum, prompts (System field + Modelfile parsing), generate, show, running, copy/create/delete/poison, weight exfiltration | Better value scoring, model tamper validation | Resource exhaustion |
| OpenAI-Compatible (vLLM, LiteLLM, LocalAI, LM Studio) | auth-sweep, enum, validate-inference, prompt-extract, tool-enum, prompt-test, throughput, proxy-test, litellm-probe; scan/template coverage; explicit model-inventory + prompt-injection + weak-auth/tool-injection templates | Campaign-level reporting, provider-specific divergence | Provider-specific deep exploit paths |
| ChromaDB | Enum (tenant/database-aware fallback), extract, sensitive search (27 patterns), unauth template | Better value scoring | Mutating abuse, snapshot exports |
| Qdrant | Enum, extract, sensitive search, unauth template | Better cluster/detail enrichment | Snapshot export, destructive workflows |
| Weaviate | Enum, extract, sensitive search, unauth template | Better schema/GraphQL enrichment | Backup/export abuse, cluster ops |
| Milvus | Enum, extract, sensitive search, inject, metadata-inject, unauth template | Deeper schema introspection | Partition/index manipulation |
| pgvector | Enum (table introspection), extract, sensitive search, inject, metadata-inject | Column-level sensitive scanning | SQL injection chaining |
| BentoML | Enum, routes (OpenAPI parsing), predict, metrics; 3 vuln templates | Deeper runner/model versioning | Custom runner exploitation |
| NVIDIA Triton | Enum, models, model-config, infer, model-load, model-unload, shm-probe (CVE-2025-23319/23320/23334); 7 vuln templates | Repository manipulation | Full IPC exploitation chain |
| TorchServe | Enum, models, predict, register (ShellTorch SSRF), scale, unregister, metrics; 5 vuln templates | Deeper handler inspection | Full ShellTorch RCE chain |
| Jupyter | Enum, kernels, notebooks (--mine-secrets cell mining), read-notebook, exec, start-kernel, reverse-shell-proof, pip-proof; auth/terminal templates; kernelspec + contents-read templates | Extension/config enrichment | Broader notebook environment takeover |
| MCP | Analyze, enum, poison (9 modes: generic/ssrf-cloud/cmd-inject/path-traversal + 5 schema poison modes), env-extract (credential probing), chain (automated kill chain); HTTP + stdio transport; config variants; capability classification; remote URL correlation; 20 vuln templates covering unauth access, inspector exposure, DNS rebinding, session leakage, SSRF, RCE CVEs (Inspector, Figma, K8s, mcp-remote, Filesystem MCP), and server-specific detection (Neo4j, Vet, MS Learn) | Deeper multi-step poison automation, richer inspector-to-remote exploitation | Broader exploit automation |
| Ray | Dashboard enum, jobs (array + object formats, runtime_env/env_vars extraction), job-logs, job-artifacts, submit, runtime-env, pip-inject, cluster-info; unauth templates; cluster-status + log-exposure templates | Better cluster-state enrichment | Broader cluster takeover workflows |
| MLflow | Tracking enum (root + /health), experiments, runs, registry (GET-first), model-versions, model-artifacts, artifact-tree, artifact reads, tamper-proof (experiment/run/param creation); unauth templates; registry-surface template | Better artifact sensitivity/value inference | Deeper artifact and registry exploitation |
| Gradio | Config enum, endpoint discovery, predict, queue/upload proofs, file reads, file-chain, serve-probe; exposed-surface templates; file-capable surface template | Better route-specific exploit chaining, richer handle parsing | Deeper upload/read/serve exploit coverage |
Fingerprinting now distinguishes confirmed, suspected, and ambiguous matches. Generic probes are preserved where useful, but weak-only hits are downgraded instead of being presented as authoritative service identity. Ambiguous or proxy-like ports are no longer skipped during network assessment: aipostex expands template coverage across each plausible HTTP service identity and marks resulting findings with fingerprint_status=ambiguous, coverage_expanded=true, candidate_services, and identity_confidence=ambiguous. Clear non-HTTP identities such as PostgreSQL/pgvector are preserved for module enumeration, but HTTP templates are not sprayed at those ports; the output records an informational skip instead.
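The handling described above can be sketched roughly as follows. This is an illustrative sketch, not aipostex's actual implementation: the `Fingerprint` class, the `plan_templates` helper, the `HTTP_SERVICES` subset, and the `action` values are all hypothetical. Only the four annotation field names come from the text above, and their Python types are assumed.

```python
from dataclasses import dataclass, field

# Hypothetical subset of service identities that speak HTTP.
HTTP_SERVICES = {"ollama", "openai-compat", "mlflow", "gradio"}


@dataclass
class Fingerprint:
    """Hypothetical fingerprint result for a single port."""
    port: int
    status: str  # "confirmed", "suspected", or "ambiguous"
    candidates: list = field(default_factory=list)


def plan_templates(fp: Fingerprint) -> dict:
    """Decide template coverage for one port (hypothetical helper)."""
    http_candidates = [s for s in fp.candidates if s in HTTP_SERVICES]
    if not http_candidates:
        # Clear non-HTTP identity (e.g. PostgreSQL/pgvector): do not spray
        # HTTP templates, record an informational skip instead.
        return {"port": fp.port, "action": "skip-http-templates", "severity": "info"}

    plan = {"port": fp.port, "action": "run-templates", "services": http_candidates}
    if fp.status == "ambiguous":
        # Ambiguous ports are not skipped: coverage expands across every
        # plausible HTTP identity and the findings are marked accordingly.
        plan.update(
            fingerprint_status="ambiguous",
            coverage_expanded=True,
            candidate_services=http_candidates,
            identity_confidence="ambiguous",
        )
    return plan
```

For example, `plan_templates(Fingerprint(8000, "ambiguous", ["mlflow", "gradio"]))` would yield a plan covering both candidate services with the ambiguity markers attached, while a port identified only as `pgvector` would produce the informational skip.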
Template Coverage
Templates are classified by info.type in each YAML file: detection (the default) or exploit. Totals below are from pkg/vulncheck/templates/**/*.yaml (embedded at build time). The --mode flag controls which templates run: detect (default, safe) runs only detection templates; full runs all templates, including active exploitation.
| Category | Templates | Detection | Exploit | Severity Range |
|---|---|---|---|---|
| MCP | 23 | 16 | 7 | Critical, High, Medium |
| A2A | 14 | 2 | 12 | High, Medium |
| NVIDIA Triton | 8 | 6 | 2 | Critical, High, Medium |
| Vector DBs | 7 | 4 | 3 | High |
| LangChain / LangServe | 7 | 5 | 2 | Critical, High, Medium |
| Ray | 6 | 4 | 2 | Critical, High, Medium |
| Ollama | 6 | 5 | 1 | Critical, High, Medium, Info |
| Hugging Face | 6 | 4 | 2 | Critical, High, Medium |
| Jupyter | 6 | 4 | 2 | Critical, High, Medium |
| Gradio | 6 | 4 | 2 | Critical, High, Medium |
| TorchServe | 5 | 3 | 2 | Critical, High, Medium |
| OpenAI-Compatible | 5 | 1 | 4 | Critical, High |
| MLflow | 5 | 3 | 2 | Critical, High, Medium |
| BentoML | 4 | 4 | 0 | High, Medium, Info |
| vLLM | 4 | 4 | 0 | Critical, High |
| Weights & Biases | 3 | 3 | 0 | High, Medium |
| TensorFlow Serving | 3 | 2 | 1 | High, Medium |
| Kubeflow | 3 | 3 | 0 | High |
| Kubernetes | 3 | 2 | 1 | Critical, High, Medium |
| Campaign | 3 | 2 | 1 | High |
| LiteLLM | 2 | 2 | 0 | High |
| Streamlit | 2 | 2 | 0 | High, Medium |
| Total | 131 | 85 | 46 | |
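The classification and mode selection described at the top of this section can be sketched as below. This is a minimal sketch under stated assumptions: templates are represented as already-parsed dictionaries, and the function names and sample template IDs are hypothetical. Only the `info.type` key, its `detection` default, and the `detect`/`full` mode values come from the text.

```python
from collections import Counter


def template_type(template: dict) -> str:
    """Return info.type, falling back to "detection" when it is absent."""
    return template.get("info", {}).get("type", "detection")


def select_for_mode(templates: list, mode: str = "detect") -> list:
    """Pick the templates that would run for a given mode value."""
    if mode == "detect":
        # Safe default: detection templates only.
        return [t for t in templates if template_type(t) == "detection"]
    if mode == "full":
        # Everything, including active exploitation templates.
        return list(templates)
    raise ValueError(f"unknown mode: {mode!r}")


def tally(templates: list) -> Counter:
    """Count templates per type, as in the Detection/Exploit columns."""
    return Counter(template_type(t) for t in templates)


# Hypothetical parsed templates, for illustration only.
templates = [
    {"id": "example-detect-a", "info": {"severity": "high"}},  # no type key
    {"id": "example-detect-b", "info": {"type": "detection", "severity": "medium"}},
    {"id": "example-exploit", "info": {"type": "exploit", "severity": "critical"}},
]
```

With these three sample entries, `select_for_mode(templates)` keeps the two detection templates, and `select_for_mode(templates, "full")` keeps all three.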
Discovery Rule Coverage
| Rule Pack | Rules | Category |
|---|---|---|
| api_keys.yaml | 17 | AI credentials (OpenAI, Anthropic, HF, Google, Cohere, Replicate, Mistral, Groq, AWS, Pinecone, GitHub, Slack, Jira, Brave, LangChain, WandB, Azure OpenAI) |
| mcp_configs.yaml | 5 | MCP configuration files |
| local_llm.yaml | 9 | Local LLM artifacts (Ollama, GGUF, SafeTensors, pickle, PyTorch, ONNX, LM Studio, Docker AI, Ollama env) |
| vectordb_rag.yaml | 9 | Vector DB, RAG configs, training data, notebooks, LLMjacking indicators |
| core_assessment.yaml | 9 | Fine-tuning data (manifests, CSV, HF dataset files), Arrow/TFRecord, RAG pipelines, LLMjacking, DB connection strings, LangChain RAG config |
| Total | 49 | |
Lab Harness Parity
End-to-end checks against the live lab are implemented in the companion repo aipostex-lab, in the script lab-scripts/attack-box/verify-aipostex.sh.
Automated there (operator / active / contract layers): discover network, discover files, assess network (unless AIPOSTEX_SKIP_ASSESS=1), scan targets, mcp (analyze, enum, stdio, poison), openai-compat, ray, mlflow, gradio, ollama, vectordb (enum, extract, search-sensitive), jupyter (including notebooks --mine-secrets, read-notebook, gated proofs).
Not driven by that harness (manual, scoring-only, or optional layers): templates, report, engagement, model-scan, top-level litellm (LiteLLM is still exercised via openai-compat litellm-probe), BentoML / Triton / TorchServe modules unless you aim those CLIs at services that happen to be up in your range, and MCP env-extract / chain (no deterministic assertions in the default script).
Current Gaps
These features are still planned or intentionally deferred:
| Feature | Status | Description |
|---|---|---|
| validate | Deferred | Finding validation and confidence scoring |
| SQLite output | Deferred | Database output format for querying |
| Additional transports | Deferred | More transport coverage beyond current HTTP and MCP stdio support |
| Resumable jobs | Deferred | Long-running job orchestration and resume |
| Deeper provider-specific workflows | Planned | More service-specific exploit depth where generic coverage is not enough |
| Model supply chain validation | Partial | model-scan CLI with directory excludes, size cap, GGUF handling; extend signals as needed |
| Cloud AI service probing | Planned | SageMaker, Bedrock, Vertex AI, Azure OpenAI endpoint discovery |
Build Plan Phases
The project follows a phased build plan:
- Correctness -- assessment loop hardening, deduplication, summaries (done)
- MCP Expansion -- deeper MCP coverage, poison modes, capability classification (done)
- OpenAI-Compatible -- generic inference validation, auth sweep, tool enumeration, prompt testing, throughput (done)
- Runtime/OPSEC -- proxy, stealth, embed, signal handling, guardrails (done)
- Service Backlog -- Ray, MLflow, Gradio depth modules (done)
- Lab/Release -- CI/CD, documentation, release packaging, lab validation at 91.8% (done)
MITRE ATLAS Mapping
Templates reference MITRE ATLAS techniques where applicable:
| Technique | Coverage |
|---|---|
| AML.T0049 (Exploit Public-Facing Application) | Ollama, MCP, vector DBs, Jupyter, Ray, MLflow, Gradio, OpenAI-compat, vLLM, LangChain |
| AML.T0034 (Cost Harvesting) | OpenAI-compat throughput, Ollama generate, HF TGI inference, HF TEI embedding |
| AML.T0040 (ML Model Inference API Access) | Ollama, OpenAI-compat validate-inference, HF TGI, HF TEI, LangServe chain invoke |