Coverage & Roadmap

Current Coverage Matrix

| Service | Implemented | Next Additions | Later-Phase |
| --- | --- | --- | --- |
| Ollama | Enum, prompts (System field + Modelfile parsing), generate, show, running, copy/create/delete/poison, weight exfiltration | Better value scoring, model tamper validation | Resource exhaustion |
| OpenAI-Compatible (vLLM, LiteLLM, LocalAI, LM Studio) | auth-sweep, enum, validate-inference, prompt-extract, tool-enum, prompt-test, throughput, proxy-test, litellm-probe; scan/template coverage; explicit model-inventory + prompt-injection + weak-auth/tool-injection templates | Campaign-level reporting, provider-specific divergence | Provider-specific deep exploit paths |
| ChromaDB | Enum (tenant/database-aware fallback), extract, sensitive search (27 patterns), unauth template | Better value scoring | Mutating abuse, snapshot exports |
| Qdrant | Enum, extract, sensitive search, unauth template | Better cluster/detail enrichment | Snapshot export, destructive workflows |
| Weaviate | Enum, extract, sensitive search, unauth template | Better schema/GraphQL enrichment | Backup/export abuse, cluster ops |
| Milvus | Enum, extract, sensitive search, inject, metadata-inject, unauth template | Deeper schema introspection | Partition/index manipulation |
| pgvector | Enum (table introspection), extract, sensitive search, inject, metadata-inject | Column-level sensitive scanning | SQL injection chaining |
| BentoML | Enum, routes (OpenAPI parsing), predict, metrics; 3 vuln templates | Deeper runner/model versioning | Custom runner exploitation |
| NVIDIA Triton | Enum, models, model-config, infer, model-load, model-unload, shm-probe (CVE-2025-23319/23320/23334); 7 vuln templates | Repository manipulation | Full IPC exploitation chain |
| TorchServe | Enum, models, predict, register (ShellTorch SSRF), scale, unregister, metrics; 5 vuln templates | Deeper handler inspection | Full ShellTorch RCE chain |
| Jupyter | Enum, kernels, notebooks (--mine-secrets cell mining), read-notebook, exec, start-kernel, reverse-shell-proof, pip-proof; auth/terminal templates; kernelspec + contents-read templates | Extension/config enrichment | Broader notebook environment takeover |
| MCP | Analyze, enum, poison (9 modes: generic/ssrf-cloud/cmd-inject/path-traversal + 5 schema poison modes), env-extract (credential probing), chain (automated kill chain); HTTP + stdio transport; config variants; capability classification; remote URL correlation; 20 vuln templates covering unauth access, inspector exposure, DNS rebinding, session leakage, SSRF, RCE CVEs (Inspector, Figma, K8s, mcp-remote, Filesystem MCP), and server-specific detection (Neo4j, Vet, MS Learn) | Deeper multi-step poison automation, richer inspector-to-remote exploitation | Broader exploit automation |
| Ray | Dashboard enum, jobs (array + object formats, runtime_env/env_vars extraction), job-logs, job-artifacts, submit, runtime-env, pip-inject, cluster-info; unauth templates; cluster-status + log-exposure templates | Better cluster-state enrichment | Broader cluster takeover workflows |
| MLflow | Tracking enum (root + /health), experiments, runs, registry (GET-first), model-versions, model-artifacts, artifact-tree, artifact reads, tamper-proof (experiment/run/param creation); unauth templates; registry-surface template | Better artifact sensitivity/value inference | Deeper artifact and registry exploitation |
| Gradio | Config enum, endpoint discovery, predict, queue/upload proofs, file reads, file-chain, serve-probe; exposed-surface templates; file-capable surface template | Better route-specific exploit chaining, richer handle parsing | Deeper upload/read/serve exploit coverage |

Fingerprinting now distinguishes confirmed, suspected, and ambiguous matches. Generic probes are preserved where useful, but weak-only hits are downgraded instead of being presented as authoritative service identity. Ambiguous or proxy-like ports are no longer skipped during network assessment: aipostex expands template coverage across each plausible HTTP service identity and marks resulting findings with fingerprint_status=ambiguous, coverage_expanded=true, candidate_services, and identity_confidence=ambiguous. Clear non-HTTP identities such as PostgreSQL/pgvector are preserved for module enumeration, but HTTP templates are not sprayed at those ports; the output records an informational skip instead.
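The decision flow described above can be sketched roughly as follows. This is a minimal illustration, not aipostex's actual implementation: the `Match` type, the `strong` flag, the `action` values, the `NON_HTTP_SERVICES` set, and the exact rules for picking a status are all assumptions made for the example. Only the status names and the finding fields (`fingerprint_status`, `coverage_expanded`, `candidate_services`, `identity_confidence`) come from the text.

```python
from dataclasses import dataclass

# Assumption: identities treated as clearly non-HTTP for this sketch.
NON_HTTP_SERVICES = {"postgresql", "pgvector"}


@dataclass(frozen=True)
class Match:
    service: str  # candidate service identity, e.g. "ollama"
    strong: bool  # True for a service-specific probe hit, False for a generic (weak) hit


def classify(matches):
    """Return (fingerprint_status, candidate_services) for one port."""
    if not matches:
        raise ValueError("at least one fingerprint match is required")
    strong = sorted({m.service for m in matches if m.strong})
    weak = sorted({m.service for m in matches if not m.strong})
    if len(strong) == 1:
        return "confirmed", strong
    if len(strong) > 1:
        return "ambiguous", strong
    # Weak-only evidence is never reported as an authoritative identity.
    if len(weak) == 1:
        return "suspected", weak
    return "ambiguous", weak


def plan_port(matches):
    """Decide how templates are applied to a port and which metadata to attach."""
    status, candidates = classify(matches)
    if status == "confirmed" and candidates[0] in NON_HTTP_SERVICES:
        # Keep the identity for module enumeration, but do not spray HTTP templates.
        return {"action": "skip_http_templates",
                "fingerprint_status": status,
                "candidate_services": candidates}
    if status == "ambiguous":
        # Expand template coverage across every plausible identity.
        return {"action": "expand_templates",
                "fingerprint_status": "ambiguous",
                "coverage_expanded": True,
                "candidate_services": candidates,
                "identity_confidence": "ambiguous"}
    return {"action": "run_templates",
            "fingerprint_status": status,
            "candidate_services": candidates}


print(plan_port([Match("vllm", False), Match("litellm", False)]))
```

In this sketch a port with two weak hits is expanded across both candidates and tagged as ambiguous, while a single strong pgvector hit yields an informational skip of HTTP templates.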

Template Coverage

Templates are classified by info.type in each YAML file: detection (the default) or exploit. The totals below are counted from pkg/vulncheck/templates/**/*.yaml, which is embedded at build time. The --mode flag controls which templates run: detect (the default, safe) runs only detection templates; full runs all templates, including active exploitation.

| Category | Templates | Detection | Exploit | Severity Range |
| --- | --- | --- | --- | --- |
| MCP | 23 | 16 | 7 | Critical, High, Medium |
| A2A | 14 | 2 | 12 | High, Medium |
| NVIDIA Triton | 8 | 6 | 2 | Critical, High, Medium |
| Vector DBs | 7 | 4 | 3 | High |
| LangChain / LangServe | 7 | 5 | 2 | Critical, High, Medium |
| Ray | 6 | 4 | 2 | Critical, High, Medium |
| Ollama | 6 | 5 | 1 | Critical, High, Medium, Info |
| Hugging Face | 6 | 4 | 2 | Critical, High, Medium |
| Jupyter | 6 | 4 | 2 | Critical, High, Medium |
| Gradio | 6 | 4 | 2 | Critical, High, Medium |
| TorchServe | 5 | 3 | 2 | Critical, High, Medium |
| OpenAI-Compatible | 5 | 1 | 4 | Critical, High |
| MLflow | 5 | 3 | 2 | Critical, High, Medium |
| BentoML | 4 | 4 | 0 | High, Medium, Info |
| vLLM | 4 | 4 | 0 | Critical, High |
| Weights & Biases | 3 | 3 | 0 | High, Medium |
| TensorFlow Serving | 3 | 2 | 1 | High, Medium |
| Kubeflow | 3 | 3 | 0 | High |
| Kubernetes | 3 | 2 | 1 | Critical, High, Medium |
| Campaign | 3 | 2 | 1 | High |
| LiteLLM | 2 | 2 | 0 | High |
| Streamlit | 2 | 2 | 0 | High, Medium |
| Total | 131 | 85 | 46 | |
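The mode gating described in this section can be illustrated with a short sketch. It assumes each template has already been parsed from YAML into a dictionary; the `id` key and the function names are hypothetical, and only `info.type`, its `detection` default, and the `detect`/`full` mode values come from the text above.

```python
def template_type(template: dict) -> str:
    """Return info.type for a parsed template, defaulting to 'detection'."""
    info = template.get("info") or {}
    return info.get("type", "detection")


def select_templates(templates, mode: str = "detect"):
    """Filter parsed templates according to the chosen mode."""
    if mode == "full":
        # Full mode runs everything, including active exploitation.
        return list(templates)
    if mode == "detect":
        # Detect mode keeps only templates typed (or defaulted) as detection.
        return [t for t in templates if template_type(t) == "detection"]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical parsed templates for illustration only.
templates = [
    {"id": "example-exploit", "info": {"type": "exploit"}},
    {"id": "example-detect", "info": {"type": "detection"}},
    {"id": "example-untyped", "info": {}},
]

print([t["id"] for t in select_templates(templates)])
print(len(select_templates(templates, "full")))
```

A template with no info.type is treated as detection here, so it still runs in the safe default mode.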

Discovery Rule Coverage

| Rule Pack | Rules | Category |
| --- | --- | --- |
| api_keys.yaml | 17 | AI credentials (OpenAI, Anthropic, HF, Google, Cohere, Replicate, Mistral, Groq, AWS, Pinecone, GitHub, Slack, Jira, Brave, LangChain, WandB, Azure OpenAI) |
| mcp_configs.yaml | 5 | MCP configuration files |
| local_llm.yaml | 9 | Local LLM artifacts (Ollama, GGUF, SafeTensors, pickle, PyTorch, ONNX, LM Studio, Docker AI, Ollama env) |
| vectordb_rag.yaml | 9 | Vector DB, RAG configs, training data, notebooks, LLMjacking indicators |
| core_assessment.yaml | 9 | Fine-tuning data (manifests, CSV, HF dataset files), Arrow/TFRecord, RAG pipelines, LLMjacking, DB connection strings, LangChain RAG config |
| Total | 49 | |

Lab Harness Parity

End-to-end checks against the live lab are implemented in the companion repo aipostex-lab, in the script lab-scripts/attack-box/verify-aipostex.sh.

Automated there (operator / active / contract layers): discover network, discover files, assess network (unless AIPOSTEX_SKIP_ASSESS=1), scan targets, mcp (analyze, enum, stdio, poison), openai-compat, ray, mlflow, gradio, ollama, vectordb (enum, extract, search-sensitive), jupyter (including notebooks --mine-secrets, read-notebook, gated proofs).

Not driven by that harness (manual, scoring-only, or optional layers): templates, report, engagement, model-scan, top-level litellm (LiteLLM is still exercised via openai-compat litellm-probe), BentoML / Triton / TorchServe modules unless you aim those CLIs at services that happen to be up in your range, and MCP env-extract / chain (no deterministic assertions in the default script).

Current Gaps

These features are planned, partially implemented, or intentionally deferred:

| Feature | Status | Description |
| --- | --- | --- |
| validate | Deferred | Finding validation and confidence scoring |
| SQLite output | Deferred | Database output format for querying |
| Additional transports | Deferred | More transport coverage beyond current HTTP and MCP stdio support |
| Resumable jobs | Deferred | Long-running job orchestration and resume |
| Deeper provider-specific workflows | Planned | More service-specific exploit depth where generic coverage is not enough |
| Model supply chain validation | Partial | model-scan CLI with directory excludes, size cap, GGUF handling; extend signals as needed |
| Cloud AI service probing | Planned | SageMaker, Bedrock, Vertex AI, Azure OpenAI endpoint discovery |

Build Plan Phases

The project follows a phased build plan:

  1. Correctness -- assessment loop hardening, deduplication, summaries (done)
  2. MCP Expansion -- deeper MCP coverage, poison modes, capability classification (done)
  3. OpenAI-Compatible -- generic inference validation, auth sweep, tool enumeration, prompt testing, throughput (done)
  4. Runtime/OPSEC -- proxy, stealth, embed, signal handling, guardrails (done)
  5. Service Backlog -- Ray, MLflow, Gradio depth modules (done)
  6. Lab/Release -- CI/CD, documentation, release packaging, lab validation at 91.8% (done)

MITRE ATLAS Mapping

Templates reference MITRE ATLAS techniques where applicable:

| Technique | Coverage |
| --- | --- |
| AML.T0049 (Exploit Public-Facing Application) | Ollama, MCP, vector DBs, Jupyter, Ray, MLflow, Gradio, OpenAI-compat, vLLM, LangChain |
| AML.T0034 (Cost Harvesting) | OpenAI-compat throughput, Ollama generate, HF TGI inference, HF TEI embedding |
| AML.T0040 (ML Model Inference API Access) | Ollama, OpenAI-compat validate-inference, HF TGI, HF TEI, LangServe chain invoke |
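For reporting, it can be handy to invert this mapping and ask which techniques a given service touches. The sketch below copies the technique-to-coverage entries from the table above into a dictionary; the `ATLAS_COVERAGE` name and the `techniques_for` helper are hypothetical and are not part of aipostex.

```python
# Technique IDs and coverage entries as listed in the mapping table.
ATLAS_COVERAGE = {
    "AML.T0049": ["Ollama", "MCP", "vector DBs", "Jupyter", "Ray", "MLflow",
                  "Gradio", "OpenAI-compat", "vLLM", "LangChain"],
    "AML.T0034": ["OpenAI-compat throughput", "Ollama generate",
                  "HF TGI inference", "HF TEI embedding"],
    "AML.T0040": ["Ollama", "OpenAI-compat validate-inference", "HF TGI",
                  "HF TEI", "LangServe chain invoke"],
}


def techniques_for(service: str) -> list[str]:
    """Return technique IDs whose coverage entries mention the service (case-insensitive)."""
    needle = service.lower()
    return sorted(
        technique
        for technique, entries in ATLAS_COVERAGE.items()
        if any(needle in entry.lower() for entry in entries)
    )


print(techniques_for("Ollama"))
print(techniques_for("MCP"))
```

The lookup is a plain substring match, so a query such as "HF TGI" matches both "HF TGI" and "HF TGI inference".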