OpenAI-Compatible¶
Enumerate and validate generic OpenAI-compatible inference endpoints.
Overview¶
The openai-compat module targets any API that implements the OpenAI /v1/models and /v1/chat/completions interface. This covers vLLM, LiteLLM, LocalAI, LM Studio, and other compatible implementations. It provides authentication analysis, model enumeration, inference validation, operator-supplied generation, prompt extraction, tool enumeration, prompt injection testing, throughput testing, and proxy validation.
Subcommands¶
Read-Only (no --force-exploit required)¶
| Subcommand | Description |
|---|---|
auth-sweep |
Classify weak authentication acceptance patterns |
enum |
List available models and metadata |
validate-inference |
Verify coherent inference output from a model |
prompt-extract |
Attempt to extract hidden system instructions |
tool-enum |
Enumerate function/tool calling behavior and test tool injection |
prompt-test |
Probe prompt injection, jailbreak, and refusal-bypass resistance |
litellm-probe |
Probe LiteLLM health, readiness, and model-info endpoints |
Gated (requires --force-exploit)¶
| Subcommand | Description |
|---|---|
generate |
Send an operator-supplied prompt and capture the model response |
throughput |
Measure inference throughput with concurrent requests |
proxy-test |
Prove the endpoint can proxy inference requests |
Flags¶
| Flag | Required | Description |
|---|---|---|
--target |
Yes | API base URL (e.g., http://127.0.0.1:8000) |
--header |
No | Custom HTTP headers. Repeatable. |
--api-key |
No | API key for authentication |
--model |
For most commands | Model name to target |
--prompt |
For generate |
Prompt text for inference |
--max-tokens |
For generate |
Maximum tokens to request |
--requests |
Required for throughput |
Number of requests to send (must be > 0) |
--concurrency |
Required for throughput |
Parallel request count (must be > 0) |
Auth Sweep¶
The auth-sweep command tests multiple weak authentication patterns:
- No authentication (no header)
- Empty Bearer token
- Placeholder keys (
sk-test,sk-dummy, etc.) - Development tokens
Each pattern is classified and reported as a finding.
When no --model is supplied, auth-sweep and validate-inference try the highest-value listed model first and then fall back through the remaining model list until one succeeds or all fail. Backend failures are preserved in model_attempts metadata with classes such as backend-dependency-missing, backend-config-error, and model-route-error, so a broken provider route does not hide a working local model behind the same proxy.
Model Value Scoring¶
The module scores discovered models by value to the attacker:
- Model name analysis for high-value indicators (GPT-4, Claude, large parameter counts)
- Inference coherence scoring to confirm the model produces useful output
- Rate limit signal detection
LiteLLM Probe¶
The litellm-probe subcommand targets LiteLLM-specific endpoints that are not part of the standard OpenAI API surface. Use it when the target is a LiteLLM proxy (typically on :4000). It probes three endpoints:
| Endpoint | What It Exposes | Severity |
|---|---|---|
/health/readiness |
LiteLLM version, DB connection status, cache status | Medium |
/health |
Backend topology — healthy/unhealthy endpoint counts and api_base URLs |
High |
/v1/model/info |
Full model configurations; escalates to Critical if embedded API keys or credentials are found in litellm_params |
High / Critical |
If none of the endpoints respond, the command emits an Info-level finding noting the probe returned no results.
Examples¶
# Test authentication patterns
./aipostex openai-compat --target http://127.0.0.1:8000 auth-sweep
# Enumerate models
./aipostex openai-compat --target http://127.0.0.1:8000 enum
# Validate inference on a model
./aipostex openai-compat --target http://127.0.0.1:8000 \
validate-inference --model llama3
# Validate inference using model fallback
./aipostex openai-compat --target http://127.0.0.1:4000 validate-inference
# Send an operator prompt (gated)
./aipostex openai-compat --target http://127.0.0.1:4000 \
generate --model local-smollm \
--prompt "Explain what access this proxy gives me in one sentence." \
--force-exploit
# Attempt prompt extraction
./aipostex openai-compat --target http://127.0.0.1:8000 \
prompt-extract --model llama3
# Enumerate tool support and injection behavior
./aipostex openai-compat --target http://127.0.0.1:8000 \
tool-enum --model llama3
# Probe prompt injection resistance
./aipostex openai-compat --target http://127.0.0.1:8000 \
prompt-test --model llama3
# Throughput test (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 throughput \
--model llama3 --requests 5 --concurrency 2 --force-exploit
# Proxy validation (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 proxy-test \
--model llama3 --force-exploit
# Probe LiteLLM-specific endpoints
./aipostex openai-compat --target http://127.0.0.1:4000 litellm-probe
Workflow Progression¶
discover network (discovers OpenAI-compatible on :8000/:4000/:1234)
→ openai-compat auth-sweep (classify auth posture)
→ openai-compat litellm-probe (LiteLLM targets on :4000)
→ openai-compat enum (list models, value scoring)
→ openai-compat validate-inference --model <name>
→ openai-compat generate --model <name> --prompt "..." (gated proof)
→ openai-compat prompt-extract --model <name>
→ openai-compat tool-enum --model <name>
→ openai-compat prompt-test --model <name>
→ openai-compat throughput (measure abuse potential, gated)
→ openai-compat proxy-test (validate proxying, gated)