Skip to content

OpenAI-Compatible

Enumerate and validate generic OpenAI-compatible inference endpoints.

Overview

The openai-compat module targets any API that implements the OpenAI /v1/models and /v1/chat/completions interface. This covers vLLM, LiteLLM, LocalAI, LM Studio, and other compatible implementations. It provides authentication analysis, model enumeration, inference validation, operator-supplied generation, prompt extraction, tool enumeration, prompt injection testing, throughput testing, and proxy validation.

Subcommands

Read-Only (no --force-exploit required)

Subcommand Description
auth-sweep Classify weak authentication acceptance patterns
enum List available models and metadata
validate-inference Verify coherent inference output from a model
prompt-extract Attempt to extract hidden system instructions
tool-enum Enumerate function/tool calling behavior and test tool injection
prompt-test Probe prompt injection, jailbreak, and refusal-bypass resistance
litellm-probe Probe LiteLLM health, readiness, and model-info endpoints

Gated (requires --force-exploit)

Subcommand Description
generate Send an operator-supplied prompt and capture the model response
throughput Measure inference throughput with concurrent requests
proxy-test Prove the endpoint can proxy inference requests

Flags

Flag Required Description
--target Yes API base URL (e.g., http://127.0.0.1:8000)
--header No Custom HTTP headers. Repeatable.
--api-key No API key for authentication
--model For most commands Model name to target
--prompt For generate Prompt text for inference
--max-tokens For generate Maximum tokens to request
--requests Required for throughput Number of requests to send (must be > 0)
--concurrency Required for throughput Parallel request count (must be > 0)

Auth Sweep

The auth-sweep command tests multiple weak authentication patterns:

  • No authentication (no header)
  • Empty Bearer token
  • Placeholder keys (sk-test, sk-dummy, etc.)
  • Development tokens

Each pattern is classified and reported as a finding.

When no --model is supplied, auth-sweep and validate-inference try the highest-value listed model first and then fall back through the remaining model list until one succeeds or all fail. Backend failures are preserved in model_attempts metadata with classes such as backend-dependency-missing, backend-config-error, and model-route-error, so a broken provider route does not hide a working local model behind the same proxy.

Model Value Scoring

The module scores discovered models by value to the attacker:

  • Model name analysis for high-value indicators (GPT-4, Claude, large parameter counts)
  • Inference coherence scoring to confirm the model produces useful output
  • Rate limit signal detection

LiteLLM Probe

The litellm-probe subcommand targets LiteLLM-specific endpoints that are not part of the standard OpenAI API surface. Use it when the target is a LiteLLM proxy (typically on :4000). It probes three endpoints:

Endpoint What It Exposes Severity
/health/readiness LiteLLM version, DB connection status, cache status Medium
/health Backend topology — healthy/unhealthy endpoint counts and api_base URLs High
/v1/model/info Full model configurations; escalates to Critical if embedded API keys or credentials are found in litellm_params High / Critical

If none of the endpoints respond, the command emits an Info-level finding noting the probe returned no results.

Examples

# Test authentication patterns
./aipostex openai-compat --target http://127.0.0.1:8000 auth-sweep

# Enumerate models
./aipostex openai-compat --target http://127.0.0.1:8000 enum

# Validate inference on a model
./aipostex openai-compat --target http://127.0.0.1:8000 \
  validate-inference --model llama3

# Validate inference using model fallback
./aipostex openai-compat --target http://127.0.0.1:4000 validate-inference

# Send an operator prompt (gated)
./aipostex openai-compat --target http://127.0.0.1:4000 \
  generate --model local-smollm \
  --prompt "Explain what access this proxy gives me in one sentence." \
  --force-exploit

# Attempt prompt extraction
./aipostex openai-compat --target http://127.0.0.1:8000 \
  prompt-extract --model llama3

# Enumerate tool support and injection behavior
./aipostex openai-compat --target http://127.0.0.1:8000 \
  tool-enum --model llama3

# Probe prompt injection resistance
./aipostex openai-compat --target http://127.0.0.1:8000 \
  prompt-test --model llama3

# Throughput test (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 throughput \
  --model llama3 --requests 5 --concurrency 2 --force-exploit

# Proxy validation (gated)
./aipostex openai-compat --target http://127.0.0.1:8000 proxy-test \
  --model llama3 --force-exploit

# Probe LiteLLM-specific endpoints
./aipostex openai-compat --target http://127.0.0.1:4000 litellm-probe

Workflow Progression

discover network (discovers OpenAI-compatible on :8000/:4000/:1234)
  → openai-compat auth-sweep (classify auth posture)
  → openai-compat litellm-probe (LiteLLM targets on :4000)
    → openai-compat enum (list models, value scoring)
      → openai-compat validate-inference --model <name>
        → openai-compat generate --model <name> --prompt "..." (gated proof)
        → openai-compat prompt-extract --model <name>
          → openai-compat tool-enum --model <name>
          → openai-compat prompt-test --model <name>
          → openai-compat throughput (measure abuse potential, gated)
          → openai-compat proxy-test (validate proxying, gated)