Coverage & Roadmap

Current Coverage Matrix

Service	Implemented	Next Additions	Later-Phase
Ollama	Enum, prompts (System field + Modelfile parsing), generate, show, running, copy/create/delete/poison, weight exfiltration	Better value scoring, model tamper validation	Resource exhaustion
OpenAI-Compatible (vLLM, LiteLLM, LocalAI, LM Studio)	auth-sweep, enum, validate-inference, prompt-extract, tool-enum, prompt-test, throughput, proxy-test, litellm-probe; scan/template coverage; explicit model-inventory + prompt-injection + weak-auth/tool-injection templates	Campaign-level reporting, provider-specific divergence	Provider-specific deep exploit paths
ChromaDB	Enum (tenant/database-aware fallback), extract, sensitive search (27 patterns), unauth template	Better value scoring	Mutating abuse, snapshot exports
Qdrant	Enum, extract, sensitive search, unauth template	Better cluster/detail enrichment	Snapshot export, destructive workflows
Weaviate	Enum, extract, sensitive search, unauth template	Better schema/GraphQL enrichment	Backup/export abuse, cluster ops
Milvus	Enum, extract, sensitive search, inject, metadata-inject, unauth template	Deeper schema introspection	Partition/index manipulation
pgvector	Enum (table introspection), extract, sensitive search, inject, metadata-inject	Column-level sensitive scanning	SQL injection chaining
BentoML	Enum, routes (OpenAPI parsing), predict, metrics; 3 vuln templates	Deeper runner/model versioning	Custom runner exploitation
NVIDIA Triton	Enum, models, model-config, infer, model-load, model-unload, shm-probe (CVE-2025-23319/23320/23334); 7 vuln templates	Repository manipulation	Full IPC exploitation chain
TorchServe	Enum, models, predict, register (ShellTorch SSRF), scale, unregister, metrics; 5 vuln templates	Deeper handler inspection	Full ShellTorch RCE chain
Jupyter	Enum, kernels, notebooks (--mine-secrets cell mining), read-notebook, exec, start-kernel, reverse-shell-proof, pip-proof; auth/terminal templates; kernelspec + contents-read templates	Extension/config enrichment	Broader notebook environment takeover
MCP	Analyze, enum, poison (9 modes: generic/ssrf-cloud/cmd-inject/path-traversal + 5 schema poison modes), env-extract (credential probing), chain (automated kill chain); HTTP + stdio transport; config variants; capability classification; remote URL correlation; 20 vuln templates covering unauth access, inspector exposure, DNS rebinding, session leakage, SSRF, RCE CVEs (Inspector, Figma, K8s, mcp-remote, Filesystem MCP), and server-specific detection (Neo4j, Vet, MS Learn)	Deeper multi-step poison automation, richer inspector-to-remote exploitation	Broader exploit automation
Ray	Dashboard enum, jobs (array + object formats, runtime_env/env_vars extraction), job-logs, job-artifacts, submit, runtime-env, pip-inject, cluster-info; unauth templates; cluster-status + log-exposure templates	Better cluster-state enrichment	Broader cluster takeover workflows
MLflow	Tracking enum (root + /health), experiments, runs, registry (GET-first), model-versions, model-artifacts, artifact-tree, artifact reads, tamper-proof (experiment/run/param creation); unauth templates; registry-surface template	Better artifact sensitivity/value inference	Deeper artifact and registry exploitation
Gradio	Config enum, endpoint discovery, predict, queue/upload proofs, file reads, file-chain, serve-probe; exposed-surface templates; file-capable surface template	Better route-specific exploit chaining, richer handle parsing	Deeper upload/read/serve exploit coverage

Fingerprinting now distinguishes confirmed, suspected, and ambiguous matches. Generic probes are kept where useful, but a hit backed only by weak signals is downgraded rather than reported as an authoritative service identity. Ambiguous or proxy-like ports are no longer skipped during network assessment: aipostex runs templates for each plausible HTTP service identity and marks the resulting findings with fingerprint_status=ambiguous, coverage_expanded=true, candidate_services, and identity_confidence=ambiguous. Clear non-HTTP identities such as PostgreSQL/pgvector are still passed to module enumeration, but HTTP templates are not sprayed at those ports; the output records an informational skip instead.
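The expansion behavior described above can be sketched roughly as follows. This is a minimal illustration, not aipostex's implementation: the names `PortFingerprint`, `plan_template_coverage`, and `HTTP_SERVICES`, and the `action` and `severity` keys, are hypothetical. Only the four finding fields (fingerprint_status, coverage_expanded, candidate_services, identity_confidence) come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical set of identities treated as HTTP services in this sketch.
HTTP_SERVICES = {"ollama", "openai-compat", "mlflow", "gradio", "ray"}


@dataclass
class PortFingerprint:
    port: int
    status: str  # "confirmed", "suspected", or "ambiguous"
    candidates: list[str] = field(default_factory=list)


def plan_template_coverage(fp: PortFingerprint) -> list[dict]:
    """Return one planned entry per service identity to test on this port."""
    http = [c for c in fp.candidates if c in HTTP_SERVICES]
    if not http:
        # Clear non-HTTP identity: record an informational skip, no HTTP templates.
        return [{"port": fp.port, "action": "skip-http-templates", "severity": "info"}]
    if fp.status != "ambiguous":
        # Single identity: run templates for it without expansion.
        return [{"port": fp.port, "action": "run-templates", "service": http[0],
                 "fingerprint_status": fp.status}]
    # Ambiguous: expand across every plausible HTTP identity and tag the findings.
    return [
        {
            "port": fp.port,
            "action": "run-templates",
            "service": svc,
            "fingerprint_status": "ambiguous",
            "coverage_expanded": True,
            "candidate_services": list(http),
            "identity_confidence": "ambiguous",
        }
        for svc in http
    ]
```

Under these assumptions, an ambiguous port with two plausible identities yields two tagged entries, while a port identified as pgvector yields a single informational skip.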

Template Coverage

Templates are classified by info.type in each YAML file: detection (the default) or exploit. The totals below are counted from pkg/vulncheck/templates/**/*.yaml, which is embedded at build time. The --mode flag controls which templates run: detect (default, safe) runs only detection templates; full runs all templates, including active exploitation.
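As a rough illustration of the selection rule above, the sketch below filters already-parsed templates by mode. It assumes each template has been loaded into a dict with an info.type key. The function names, the template IDs, and the handling of a missing type as detection (following "detection (default)") are illustrative assumptions, not the tool's actual code.

```python
def template_type(template: dict) -> str:
    """Read info.type, falling back to "detection" when it is absent."""
    return template.get("info", {}).get("type", "detection")


def select_templates(templates: list, mode: str = "detect") -> list:
    """Pick the templates that would run for a given --mode value."""
    if mode == "detect":
        # Safe default: detection templates only.
        return [t for t in templates if template_type(t) == "detection"]
    if mode == "full":
        # Everything, including active exploitation templates.
        return list(templates)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical template IDs, for illustration only.
templates = [
    {"id": "example-unauth-detect", "info": {"type": "detection"}},
    {"id": "example-untyped", "info": {}},
    {"id": "example-active-exploit", "info": {"type": "exploit"}},
]

for mode in ("detect", "full"):
    print(mode, [t["id"] for t in select_templates(templates, mode)])
```

Rejecting unknown modes outright, rather than silently falling back to full, keeps a typo in the flag from triggering exploit templates in this sketch.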

Category	Templates	Detection	Exploit	Severity Range
MCP	23	16	7	Critical, High, Medium
A2A	14	2	12	High, Medium
NVIDIA Triton	8	6	2	Critical, High, Medium
Vector DBs	7	4	3	High
LangChain / LangServe	7	5	2	Critical, High, Medium
Ray	6	4	2	Critical, High, Medium
Ollama	6	5	1	Critical, High, Medium, Info
Hugging Face	6	4	2	Critical, High, Medium
Jupyter	6	4	2	Critical, High, Medium
Gradio	6	4	2	Critical, High, Medium
TorchServe	5	3	2	Critical, High, Medium
OpenAI-Compatible	5	1	4	Critical, High
MLflow	5	3	2	Critical, High, Medium
BentoML	4	4	0	High, Medium, Info
vLLM	4	4	0	Critical, High
Weights & Biases	3	3	0	High, Medium
TensorFlow Serving	3	2	1	High, Medium
Kubeflow	3	3	0	High
Kubernetes	3	2	1	Critical, High, Medium
Campaign	3	2	1	High
LiteLLM	2	2	0	High
Streamlit	2	2	0	High, Medium
Total	131	85	46

Discovery Rule Coverage

Rule Pack	Rules	Category
api_keys.yaml	17	AI credentials (OpenAI, Anthropic, HF, Google, Cohere, Replicate, Mistral, Groq, AWS, Pinecone, GitHub, Slack, Jira, Brave, LangChain, WandB, Azure OpenAI)
mcp_configs.yaml	5	MCP configuration files
local_llm.yaml	9	Local LLM artifacts (Ollama, GGUF, SafeTensors, pickle, PyTorch, ONNX, LM Studio, Docker AI, Ollama env)
vectordb_rag.yaml	9	Vector DB, RAG configs, training data, notebooks, LLMjacking indicators
core_assessment.yaml	9	Fine-tuning data (manifests, CSV, HF dataset files), Arrow/TFRecord, RAG pipelines, LLMjacking, DB connection strings, LangChain RAG config
Total	49

Lab Harness Parity

End-to-end checks against the live lab are implemented in the companion repo aipostex-lab, in the script lab-scripts/attack-box/verify-aipostex.sh.

Automated there (operator / active / contract layers): discover network, discover files, assess network (unless AIPOSTEX_SKIP_ASSESS=1), scan targets, mcp (analyze, enum, stdio, poison), openai-compat, ray, mlflow, gradio, ollama, vectordb (enum, extract, search-sensitive), jupyter (including notebooks --mine-secrets, read-notebook, gated proofs).

Not driven by that harness (manual, scoring-only, or optional layers): templates, report, engagement, model-scan, top-level litellm (LiteLLM is still exercised via openai-compat litellm-probe), BentoML / Triton / TorchServe modules unless you aim those CLIs at services that happen to be up in your range, and MCP env-extract / chain (no deterministic assertions in the default script).

Current Gaps

These features are planned, only partially implemented, or intentionally deferred:

Feature	Status	Description
`validate`	Deferred	Finding validation and confidence scoring
SQLite output	Deferred	Database output format for querying
Additional transports	Deferred	More transport coverage beyond current HTTP and MCP stdio support
Resumable jobs	Deferred	Long-running job orchestration and resume
Deeper provider-specific workflows	Planned	More service-specific exploit depth where generic coverage is not enough
Model supply chain validation	Partial	`model-scan` CLI with directory excludes, size cap, GGUF handling; extend signals as needed
Cloud AI service probing	Planned	SageMaker, Bedrock, Vertex AI, Azure OpenAI endpoint discovery

Build Plan Phases

The project follows a phased build plan:

1. Correctness -- assessment loop hardening, deduplication, summaries (done)
2. MCP Expansion -- deeper MCP coverage, poison modes, capability classification (done)
3. OpenAI-Compatible -- generic inference validation, auth sweep, tool enumeration, prompt testing, throughput (done)
4. Runtime/OPSEC -- proxy, stealth, embed, signal handling, guardrails (done)
5. Service Backlog -- Ray, MLflow, Gradio depth modules (done)
6. Lab/Release -- CI/CD, documentation, release packaging, lab validation at 91.8% (done)

MITRE ATLAS Mapping

Templates reference MITRE ATLAS techniques where applicable:

Technique	Coverage
AML.T0049 (Exploit Public-Facing Application)	Ollama, MCP, vector DBs, Jupyter, Ray, MLflow, Gradio, OpenAI-compat, vLLM, LangChain
AML.T0034 (Cost Harvesting)	OpenAI-compat throughput, Ollama generate, HF TGI inference, HF TEI embedding
AML.T0040 (ML Model Inference API Access)	Ollama, OpenAI-compat validate-inference, HF TGI, HF TEI, LangServe chain invoke