Triton Inference Server

Enumerate and exploit NVIDIA Triton Inference Server instances.

Overview

The triton module targets the Triton Inference Server HTTP/REST API, which implements the KServe (formerly KFServing) v2 inference protocol. It collects server metadata, lists loaded models and their configurations, probes the shared memory endpoints associated with CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, and tests inference and model lifecycle operations.

Subcommands

Read-Only (no --force-exploit required)

| Subcommand | Description |
|---|---|
| enum | Server metadata, health status, and extensions |
| models | List all loaded models with detailed metadata |
| model-config | Detailed model configuration (instance groups, scheduling, optimization) |
| shm-probe | Probe shared memory regions for IPC vulnerability chain (CVE-2025-23319/23320/23334) |

Gated (requires --force-exploit)

| Subcommand | Description |
|---|---|
| infer | Send inference request to a model |
| model-load | Load a model from the repository (proves model injection surface) |
| model-unload | Unload a model (proves destructive model lifecycle access) |
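
The split between read-only and gated subcommands can be pictured as a simple allow check. The sketch below is illustrative only: the subcommand names come from the tables above, but the function name `is_allowed` and the logic itself are assumptions, not the tool's actual implementation.

```python
# Hypothetical sketch: subcommand names are taken from the tables above;
# the gating logic is an assumption, not the tool's real code.
READ_ONLY = {"enum", "models", "model-config", "shm-probe"}
GATED = {"infer", "model-load", "model-unload"}


def is_allowed(subcommand: str, force_exploit: bool = False) -> bool:
    """Return True if the subcommand may run with the given flag state."""
    if subcommand in READ_ONLY:
        return True  # read-only subcommands never need --force-exploit
    if subcommand in GATED:
        return force_exploit  # gated subcommands run only when the flag is set
    raise ValueError(f"unknown subcommand: {subcommand!r}")
```

Under these assumptions, `is_allowed("infer")` is False until `force_exploit=True` is passed, which mirrors how the gated subcommands are described.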

Flags

| Flag | Required | Description |
|---|---|---|
| --target | Yes | Triton HTTP API URL (default port 8000) |
| --header | No | Custom HTTP headers. Repeatable. |
| --model | For model-config, infer, model-load, model-unload | Model name |
| --payload | For infer | JSON inference payload |
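
A value for --payload is a v2 inference request body. The sketch below shows one way such a body might be assembled for a single input tensor. The helper name `build_infer_payload` and the tensor name `INPUT0` are hypothetical; real input names, shapes, and datatypes should be read from the model metadata.

```python
import json


def build_infer_payload(input_name, shape, datatype, data):
    """Build a v2 inference request body with a single input tensor.

    `data` is the flattened (row-major) tensor content.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(
            f"shape {list(shape)} needs {expected} values, got {len(data)}"
        )
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    return json.dumps(body)


# Hypothetical tensor name; query the model metadata for the real one.
payload = build_infer_payload("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
```

The resulting string can then be passed as the --payload argument of the infer subcommand.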

Key Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /v2 | GET | Server metadata (name, version, extensions) |
| /v2/health/ready | GET | Readiness probe |
| /v2/health/live | GET | Liveness probe |
| /v2/models | GET | List all loaded models |
| /v2/models/<name> | GET | Model metadata (inputs, outputs, platform) |
| /v2/models/<name>/config | GET | Detailed model configuration |
| /v2/models/<name>/infer | POST | Model inference |
| /v2/repository/index | POST | Model repository listing |
| /v2/repository/models/<name>/load | POST | Load model from repository |
| /v2/repository/models/<name>/unload | POST | Unload model |
| /v2/systemsharedmemory/status | GET | System shared memory regions |
| /v2/cudasharedmemory/status | GET | CUDA shared memory regions |
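
To make the endpoint table concrete, the following sketch maps each operation to its method and path and builds a `urllib.request.Request` without sending it. The operation keys and the `build_request` helper are hypothetical names chosen for illustration; only the methods and paths come from the table.

```python
import json
import urllib.request
from urllib.parse import quote

# (method, path template) pairs copied from the endpoint table.
# The dictionary keys are hypothetical labels, not tool subcommands.
ENDPOINTS = {
    "server-metadata": ("GET", "/v2"),
    "ready": ("GET", "/v2/health/ready"),
    "live": ("GET", "/v2/health/live"),
    "models": ("GET", "/v2/models"),
    "model-metadata": ("GET", "/v2/models/{name}"),
    "model-config": ("GET", "/v2/models/{name}/config"),
    "infer": ("POST", "/v2/models/{name}/infer"),
    "repository-index": ("POST", "/v2/repository/index"),
    "model-load": ("POST", "/v2/repository/models/{name}/load"),
    "model-unload": ("POST", "/v2/repository/models/{name}/unload"),
    "system-shm": ("GET", "/v2/systemsharedmemory/status"),
    "cuda-shm": ("GET", "/v2/cudasharedmemory/status"),
}


def build_request(target, operation, model=None, body=None, headers=None):
    """Build (but do not send) an HTTP request for one listed endpoint."""
    method, template = ENDPOINTS[operation]
    if "{name}" in template:
        if not model:
            raise ValueError(f"{operation} requires a model name")
        template = template.replace("{name}", quote(model, safe=""))
    data = None
    if method == "POST":
        # POST endpoints take a JSON body; default to an empty object.
        data = json.dumps(body if body is not None else {}).encode("utf-8")
    req = urllib.request.Request(
        target.rstrip("/") + template, data=data, method=method
    )
    if data is not None:
        req.add_header("Content-Type", "application/json")
    for key, value in (headers or {}).items():
        req.add_header(key, value)  # e.g. values supplied via --header
    return req
```

A request built this way could be sent with `urllib.request.urlopen(req, timeout=5)`, and only against servers you are authorized to test.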

SHM Probe (IPC Vulnerability Chain)

The shm-probe subcommand checks for the Wiz-discovered IPC vulnerability chain affecting Triton:

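  • CVE-2025-23319 -- out-of-bounds write in the Python backend
  • CVE-2025-23320 -- information disclosure in the Python backend: an oversized request exceeds the shared memory limit, and the resulting error can leak the name of the backend's internal IPC shared memory region
  • CVE-2025-23334 -- out-of-bounds read in the Python backend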

If the shared memory status endpoints return region data (names, keys, offsets, byte sizes), the shared memory API is reachable and the IPC attack surface is exposed.
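
The check described above can be sketched as a small parser over a status response body. This is a minimal illustration, assuming the endpoint returns a JSON array of region objects; the function name `summarize_shm_status` and the sample region values are hypothetical, and the tool's own detection logic may differ.

```python
import json

# Fields the text above lists as indicators of exposed region data.
REGION_FIELDS = ("name", "key", "offset", "byte_size")


def summarize_shm_status(raw_body):
    """Return (exposed, regions) for a shared memory status response body.

    Assumption: a populated response is a JSON array of region objects.
    Anything else (invalid JSON, empty array, other types) is treated as
    "no region data exposed".
    """
    try:
        parsed = json.loads(raw_body)
    except (TypeError, ValueError):
        return False, []
    if not isinstance(parsed, list):
        return False, []
    regions = []
    for item in parsed:
        if isinstance(item, dict):
            # Keep only the indicator fields that are actually present.
            fields = {k: item[k] for k in REGION_FIELDS if k in item}
            if fields:
                regions.append(fields)
    return bool(regions), regions


# Hypothetical sample body, for illustration only.
sample = '[{"name": "region0", "key": "/example_shm", "offset": 0, "byte_size": 64}]'
exposed, regions = summarize_shm_status(sample)
```

An empty array means the endpoint is reachable but no regions are currently registered, which is why the sketch reports it as not exposing region data rather than as an error.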

Examples

# Enumerate server metadata
./aipostex triton --target http://127.0.0.1:8000 enum

# List loaded models
./aipostex triton --target http://127.0.0.1:8000 models

# Get detailed model config
./aipostex triton --target http://127.0.0.1:8000 model-config --model resnet50

# Probe shared memory (IPC vuln chain)
./aipostex triton --target http://127.0.0.1:8000 shm-probe

# Test inference (gated)
./aipostex triton --target http://127.0.0.1:8000 infer \
  --model resnet50 --payload '{"inputs":[]}' --force-exploit

# Load a model from repository (gated)
./aipostex triton --target http://127.0.0.1:8000 model-load \
  --model test --force-exploit

Workflow Progression

discover network (discovers Triton on :8000)
  -> triton enum (server metadata, health)
    -> triton models (loaded model inventory)
    -> triton model-config --model <name> (detailed config)
    -> triton shm-probe (IPC vulnerability assessment)
    -> triton infer --model <name> (inference test, gated)
    -> triton model-load --model <name> (model injection, gated)