OpenAI-Compatible¶

Enumerate and validate generic OpenAI-compatible inference endpoints.

Overview¶

The openai-compat module targets any API that implements the OpenAI /v1/models and /v1/chat/completions interface. This covers vLLM, LiteLLM, LocalAI, LM Studio, and other compatible implementations. It provides authentication analysis, model enumeration, inference validation, operator-supplied generation, prompt extraction, tool enumeration, prompt injection testing, throughput testing, and proxy validation.

Subcommands¶

Read-Only (no `--force-exploit` required)¶

Subcommand	Description
`auth-sweep`	Classify weak authentication acceptance patterns
`enum`	List available models and metadata
`validate-inference`	Verify coherent inference output from a model
`prompt-extract`	Attempt to extract hidden system instructions
`tool-enum`	Enumerate function/tool calling behavior and test tool injection
`prompt-test`	Probe prompt injection, jailbreak, and refusal-bypass resistance
`litellm-probe`	Probe LiteLLM health, readiness, and model-info endpoints

Gated (requires `--force-exploit`)¶

Subcommand	Description
`generate`	Send an operator-supplied prompt and capture the model response
`throughput`	Measure inference throughput with concurrent requests
`proxy-test`	Prove the endpoint can proxy inference requests

Flags¶

Flag	Required	Description
`--target`	Yes	API base URL (e.g., `http://127.0.0.1:8000`)
`--header`	No	Custom HTTP headers. Repeatable.
`--api-key`	No	API key for authentication
`--model`	For most commands	Model name to target
`--prompt`	For `generate`	Prompt text for inference
`--max-tokens`	For `generate`	Maximum tokens to request
`--requests`	Required for `throughput`	Number of requests to send (must be > 0)
`--concurrency`	Required for `throughput`	Parallel request count (must be > 0)

Auth Sweep¶

The auth-sweep command tests multiple weak authentication patterns:

No authentication (no header)
Empty Bearer token
Placeholder keys (sk-test, sk-dummy, etc.)
Development tokens

Each pattern is classified and reported as a finding.

When no --model is supplied, auth-sweep and validate-inference try the highest-value listed model first and then fall back through the remaining model list until one succeeds or all fail. Backend failures are preserved in model_attempts metadata with classes such as backend-dependency-missing, backend-config-error, and model-route-error, so a broken provider route does not hide a working local model behind the same proxy.

Model Value Scoring¶

The module scores discovered models by value to the attacker:

Model name analysis for high-value indicators (GPT-4, Claude, large parameter counts)
Inference coherence scoring to confirm the model produces useful output
Rate limit signal detection

LiteLLM Probe¶

The litellm-probe subcommand targets LiteLLM-specific endpoints that are not part of the standard OpenAI API surface. Use it when the target is a LiteLLM proxy (typically on :4000). It probes three endpoints:

Endpoint	What It Exposes	Severity
`/health/readiness`	LiteLLM version, DB connection status, cache status	Medium
`/health`	Backend topology — healthy/unhealthy endpoint counts and `api_base` URLs	High
`/v1/model/info`	Full model configurations; escalates to Critical if embedded API keys or credentials are found in `litellm_params`	High / Critical

If none of the endpoints respond, the command emits an Info-level finding noting the probe returned no results.

Examples¶

# Test authentication patterns
./aipostex openai-compat --target http://127.0.0.1:8000 auth-sweep

# Enumerate models
./aipostex openai-compat --target http://127.0.0.1:8000 enum

# Validate inference on a model
./aipostex openai-compat --target http://127.0.0.1:8000 \
  validate-inference --model llama3

# Validate inference using model fallback
./aipostex openai-compat --target http://127.0.0.1:4000 validate-inference

# Send an operator prompt (gated)
./aipostex openai-compat --target http://127.0.0.1:4000 \
  generate --model local-smollm \
  --prompt "Explain what access this proxy gives me in one sentence." \
  --force-exploit

# Attempt prompt extraction
./aipostex openai-compat --target http://127.0.0.1:8000 \
  prompt-extract --model llama3

# Enumerate tool support and injection behavior
./aipostex openai-compat --target http://127.0.0.1:8000 \
  tool-enum --model llama3

# Probe prompt injection resistance
./aipostex openai-compat --target http://127.0.0.1:8000 \
  prompt-test --model llama3

# Throughput test (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 throughput \
  --model llama3 --requests 5 --concurrency 2 --force-exploit

# Proxy validation (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 proxy-test \
  --model llama3 --force-exploit

# Probe LiteLLM-specific endpoints
./aipostex openai-compat --target http://127.0.0.1:4000 litellm-probe

Workflow Progression¶

discover network (discovers OpenAI-compatible on :8000/:4000/:1234)
  → openai-compat auth-sweep (classify auth posture)
  → openai-compat litellm-probe (LiteLLM targets on :4000)
    → openai-compat enum (list models, value scoring)
      → openai-compat validate-inference --model <name>
        → openai-compat generate --model <name> --prompt "..." (gated proof)
        → openai-compat prompt-extract --model <name>
          → openai-compat tool-enum --model <name>
          → openai-compat prompt-test --model <name>
          → openai-compat throughput (measure abuse potential, gated)
          → openai-compat proxy-test (validate proxying, gated)