Crucible

The first structure-aware fuzzer for ML model file formats and inference infrastructure

Crucible is a structure-aware fuzzer targeting the attack surface between model distribution and inference execution: file format parsers (GGUF), serving protocols (ggml-rpc), template engines (Jinja2), and API endpoints. It combines format-aware mutation with systematic input validation auditing.

60 Mutation Strategies

12 Mutation Categories

35+ Fuzz Harnesses

CVE-Ready Reports

Why Crucible?

ML model parsers and inference protocols are high-value attack surfaces

Every time someone loads a model through Ollama, LM Studio, GPT4All, llama.cpp, whisper.cpp, or stable-diffusion.cpp, they execute C/C++ parsers on untrusted data — GGUF files, RPC wire protocol messages, JSON Schema grammars, Jinja templates, and HTTP API payloads.

Known CVEs in these parsers carry CVSS scores from 8.8 to 10.0 — heap buffer overflows, integer overflows, and out-of-bounds reads discovered by teams like Cisco Talos and tracked by VulDB. These are not theoretical risks; they are shipped bugs in software that millions of users run on their machines.

No systematic structure-aware fuzzer for ML model parsers existed until now.

Generic fuzzers like AFL++ and libFuzzer are excellent tools, but when pointed at a binary format parser they spend most of their time generating inputs that fail magic-byte checks on the first 4 bytes. Crucible solves this by:

Parsing valid GGUF seed files into structured representations
Mutating at the semantic level -- header fields, metadata key-value pairs, tensor info blocks, alignment values, cross-field consistency invariants
Serializing the mutated structure back into valid-enough binary that passes initial checks and reaches deep parsing code paths
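The parse, mutate, serialize loop can be sketched on the fixed GGUF header. This is a minimal Go sketch. It assumes the version 2/3 header layout from the public GGUF specification: a 4-byte `GGUF` magic, a `uint32` version, then `uint64` tensor and metadata counts, all little-endian. `ggufHeader`, `parseHeader` and `mutateTensorCount` are hypothetical names, not Crucible's API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// ggufHeader mirrors the fixed GGUF v2/v3 header (assumed layout):
// 4-byte magic, uint32 version, uint64 tensor_count, uint64 metadata_kv_count.
type ggufHeader struct {
	Magic       [4]byte
	Version     uint32
	TensorCount uint64
	KVCount     uint64
}

const headerSize = 4 + 4 + 8 + 8

// parseHeader decodes the header at the start of a seed file.
func parseHeader(b []byte) (ggufHeader, error) {
	var h ggufHeader
	if len(b) < headerSize {
		return h, errors.New("file shorter than GGUF header")
	}
	if err := binary.Read(bytes.NewReader(b[:headerSize]), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if string(h.Magic[:]) != "GGUF" {
		return h, errors.New("bad magic")
	}
	return h, nil
}

// serialize encodes the header back to its 24-byte little-endian form.
func (h ggufHeader) serialize() []byte {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, h) // fixed-size struct into a Buffer: no error path
	return buf.Bytes()
}

// mutateTensorCount rewrites only tensor_count. Magic, version and the rest
// of the file are kept, so the result still passes the first checks.
func mutateTensorCount(seed []byte, v uint64) ([]byte, error) {
	h, err := parseHeader(seed)
	if err != nil {
		return nil, err
	}
	h.TensorCount = v
	return append(h.serialize(), seed[headerSize:]...), nil
}

func main() {
	seed := ggufHeader{Magic: [4]byte{'G', 'G', 'U', 'F'}, Version: 3}.serialize()
	out, err := mutateTensorCount(seed, math.MaxUint64)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", out)
}
```

Only `tensor_count` changes. A parser that trusts the header therefore gets past the magic and version checks and sizes its tensor loop from the corrupted count.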

The Gap

Security research in the ML ecosystem has focused on two areas: model-level attacks (adversarial examples, prompt injection, data poisoning) and web-layer vulnerabilities (SSRF in API endpoints, path traversal in model registries). The infrastructure between these layers — the binary parsers, wire protocols, and format converters that every inference stack depends on — has received far less attention.

The bugs that do get found here are severe. Cisco Talos published five GGUF parser CVEs in 2024, all CVSS 8.8, all heap overflows reachable by opening a crafted model file. 360 Vulnerability Research Institute found three RPC backend flaws including a CVSS 9.8 write-what-where that gives unauthenticated remote code execution. Wiz Research demonstrated path traversal to RCE in Ollama. These were found by skilled teams doing targeted manual audits — not by systematic, structure-aware fuzzing.

That is the gap Crucible fills. Rather than flipping random bytes and hoping to pass the GGUF magic check, Crucible:

Parses seed files into typed structures (headers, metadata KV pairs, tensor info blocks, alignment, data regions)
Mutates at the semantic level — corrupting cross-field invariants, injecting type-confused metadata, overflowing dimension products
Serializes back to binary that is structurally valid enough to reach deep parser code paths

The Known CVEs page tracks 30+ published vulnerabilities across GGUF parsers, the RPC backend, grammar engines, and downstream projects including whisper.cpp, stable-diffusion.cpp, PyTorch, TensorFlow Lite, and Apple MLX. Every one of them is in Crucible's target surface.

Quick Start

Get from zero to fuzzing in four commands:

libFuzzer:

make build              # Build crucible, crucible-gen, crucible-triage
make generate           # Generate 50 structurally varied seed files
make run-libfuzzer      # Launch libFuzzer with 8 parallel workers
make triage-libfuzzer   # Replay binary crashes and generate reports

AFL++:

make build
make generate
make run-afl            # Launch AFL++ campaign
make triage-libfuzzer

Go Native:

make build
make generate
make fuzz-go            # Go's built-in fuzzer targeting the GGUF reader
make triage

Prerequisites

You need a local clone of llama.cpp for the C harnesses. Set LLAMA_CPP=/path/to/llama.cpp when building. The Go native harness requires only the Go toolchain.

Feature Highlights

Structure-Aware Mutation Engine

Crucible does not flip random bytes. Every mutation operates on a parsed GGUF structure -- modifying header fields, injecting malformed metadata, corrupting tensor dimensions, and breaking cross-field invariants that parsers rely on.

Weighted Strategy Selection

Not all mutation categories are equal. Metadata and tensor info mutations account for 70% of selections because that is where the bug density is highest. The weight distribution:

Category	Weight	Strategy Count	Rationale
Metadata	35%	18 strategies	Most complex parsing, string handling, type confusion, model-loader targeting
Tensor Info	35%	8 strategies	Dimension overflows, offset manipulation, type fuzzing
Header	10%	5 strategies	Version, counts, magic corruption
Consistency	10%	6 strategies	Cross-field mismatches -- where the worst bugs hide
Alignment	5%	3 strategies	Padding calculation bugs (cf. Cisco Talos findings)
Data	5%	6 strategies	Truncation, overlap, size mismatches
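Weighted selection over these categories can be implemented by walking cumulative weights. This is a minimal sketch that assumes integer percentage weights copied from the table. The category identifiers and `pick` are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// category pairs a mutation category with its selection weight in percent.
type category struct {
	name   string
	weight int
}

// Weights copied from the table above.
var categories = []category{
	{"metadata", 35},
	{"tensor_info", 35},
	{"header", 10},
	{"consistency", 10},
	{"alignment", 5},
	{"data", 5},
}

// totalWeight sums all category weights.
func totalWeight(cats []category) int {
	sum := 0
	for _, c := range cats {
		sum += c.weight
	}
	return sum
}

// pick maps r in [0, totalWeight) onto a category by walking cumulative weights.
func pick(cats []category, r int) string {
	for _, c := range cats {
		if r < c.weight {
			return c.name
		}
		r -= c.weight
	}
	panic("r out of range")
}

func main() {
	rng := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	total := totalWeight(categories)
	for i := 0; i < 100000; i++ {
		counts[pick(categories, rng.Intn(total))]++
	}
	fmt.Println(counts) // expect roughly a 35/35/10/10/5/5 split
}
```

`pick` takes the random draw `r` as a parameter, so selection is deterministic for a given draw. This helps when the draw is derived from the fuzzer's own input bytes: a saved crashing input then replays the same mutation path.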

Six GGUF Mutation Categories

Each category targets a different layer of the GGUF binary format:

Header Mutations

Corrupt magic bytes, version fields, tensor counts, and metadata counts with boundary values (0, MaxUint64, off-by-one).

Metadata Mutations

Extreme key lengths, embedded null bytes, non-UTF-8 payloads, type confusion (change value type tag but keep original encoding), alignment poisoning via general.alignment, key shadowing of critical keys like general.architecture.

Tensor Info Mutations

Invalid n_dims (0, 5, MaxUint32), dimension overflow products that wrap uint64, invalid ggml_type enum values, offsets beyond file bounds, duplicate/empty tensor names.
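A dimension product that wraps `uint64` looks small to an unchecked size computation. ggml stores dimensions as signed 64-bit values, and signed overflow is undefined behaviour in C. This sketch uses `uint64` instead so the wrap is explicit. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// naiveElementCount multiplies dimensions the way an unchecked parser would.
// uint64 multiplication in Go wraps silently, like unsigned arithmetic in C.
func naiveElementCount(dims []uint64) uint64 {
	n := uint64(1)
	for _, d := range dims {
		n *= d
	}
	return n
}

// checkedElementCount reports ok=false when the true product needs more than 64 bits.
func checkedElementCount(dims []uint64) (n uint64, ok bool) {
	n = 1
	for _, d := range dims {
		hi, lo := bits.Mul64(n, d)
		if hi != 0 {
			return 0, false
		}
		n = lo
	}
	return n, true
}

func main() {
	// 3 * 0x5555555555555556 = 2^64 + 2, so the wrapped product is 2.
	dims := []uint64{3, 0x5555555555555556}
	n, ok := checkedElementCount(dims)
	fmt.Println(naiveElementCount(dims), n, ok) // 2 0 false
}
```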

Alignment Mutations

Pathological general.alignment values (0 for division-by-zero, prime numbers, MaxUint64), extra garbage padding between sections, alignment metadata that disagrees with actual file padding.
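The alignment strategies target round-up arithmetic. This sketch models two common ways C code pads an offset, plus a validator. Both are illustrative idioms, not code taken from any specific parser, and all names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// padMask rounds x up to a multiple of n with a bitmask, a common C idiom.
// It is only correct when n is a power of two.
func padMask(x, n uint64) uint64 {
	return (x + n - 1) &^ (n - 1)
}

// padMod rounds x up using a remainder. n == 0 makes Go panic with an
// integer divide by zero (undefined behaviour in C).
func padMod(x, n uint64) uint64 {
	if r := x % n; r != 0 {
		return x + (n - r)
	}
	return x
}

// validateAlignment rejects the pathological values the alignment strategies inject.
func validateAlignment(n uint64) error {
	if n == 0 {
		return errors.New("alignment is zero")
	}
	if n&(n-1) != 0 {
		return errors.New("alignment is not a power of two")
	}
	return nil
}

func main() {
	fmt.Println(padMask(100, 32), padMod(100, 32)) // 128 128
	fmt.Println(padMask(100, 24), padMod(100, 24)) // 104 120: 104 is not a multiple of 24
	fmt.Println(padMask(100, 0))                   // 0: the padded offset collapses to zero
	fmt.Println(validateAlignment(^uint64(0)))     // alignment is not a power of two
}
```

A prime alignment silently produces misaligned offsets under the bitmask idiom. Zero either collapses the offset (bitmask) or faults (modulo). Each variant therefore exposes a different class of parser behaviour.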

Data Mutations

Truncated tensor data, overlapping tensor offset ranges, zero-length data sections with non-zero tensor claims, garbage-filled data blobs.
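An overlapping-range mutation needs only the data offset and byte size of each tensor. This is a minimal sketch with a hypothetical, reduced `tensorInfo` type. It shows the mutation together with a check that confirms it produced an overlap.

```go
package main

import "fmt"

// tensorInfo is a reduced view of a tensor info block: offset is relative to
// the start of the data section, size is the byte length implied by dims and type.
type tensorInfo struct {
	name   string
	offset uint64
	size   uint64
}

// overlapPair moves tensor j so that its data starts halfway into tensor i.
func overlapPair(ts []tensorInfo, i, j int) {
	ts[j].offset = ts[i].offset + ts[i].size/2
}

// overlaps reports whether any two [offset, offset+size) ranges intersect.
// It assumes offset+size does not wrap; a real checker must test that too.
func overlaps(ts []tensorInfo) bool {
	for a := range ts {
		for b := a + 1; b < len(ts); b++ {
			if ts[a].offset < ts[b].offset+ts[b].size && ts[b].offset < ts[a].offset+ts[a].size {
				return true
			}
		}
	}
	return false
}

func main() {
	ts := []tensorInfo{{"t0", 0, 64}, {"t1", 64, 64}}
	fmt.Println(overlaps(ts)) // false
	overlapPair(ts, 0, 1)
	fmt.Println(ts[1].offset, overlaps(ts)) // 32 true
}
```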

Consistency Mutations

tensor_count header mismatch with actual tensor info blocks, metadata_kv_count mismatches, tensor offsets beyond file bounds, tensor dimensions implying sizes larger than the data section, alignment metadata disagreeing with file structure.
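The invariants these mutations break can be written down as a checker. A checker like this also confirms that a mutated file really violates the rule it targets. The sketch assumes only F32 and F16 tensors (ggml type IDs 0 and 1), since quantized types use block sizes. `checkInvariants` and `tensor` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bytes per element for GGML_TYPE_F32 (0) and GGML_TYPE_F16 (1) only.
var typeSize = map[uint32]uint64{0: 4, 1: 2}

type tensor struct {
	dims   []uint64
	typ    uint32
	offset uint64 // relative to the start of the data section
}

// checkInvariants returns the first cross-field rule the file breaks: the
// kind of rule the consistency strategies are meant to violate.
func checkInvariants(headerCount uint64, ts []tensor, dataLen uint64) error {
	if headerCount != uint64(len(ts)) {
		return fmt.Errorf("tensor_count=%d but found %d tensor info block(s)", headerCount, len(ts))
	}
	for i, t := range ts {
		size, ok := typeSize[t.typ]
		if !ok {
			return fmt.Errorf("tensor %d: type %d not handled in this sketch", i, t.typ)
		}
		for _, d := range t.dims {
			hi, lo := bits.Mul64(size, d)
			if hi != 0 {
				return fmt.Errorf("tensor %d: byte size overflows uint64", i)
			}
			size = lo
		}
		end, carry := bits.Add64(t.offset, size, 0)
		if carry != 0 || end > dataLen {
			return fmt.Errorf("tensor %d: data range ends past the data section", i)
		}
	}
	return nil
}

func main() {
	ts := []tensor{{dims: []uint64{4, 4}, typ: 0}} // 16 F32 values = 64 bytes
	fmt.Println(checkInvariants(1, ts, 64))        // <nil>
	fmt.Println(checkInvariants(2, ts, 64))        // tensor_count=2 but found 1 tensor info block(s)
	fmt.Println(checkInvariants(1, ts, 32))        // tensor 0: data range ends past the data section
}
```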

Crash Deduplication by Stack Hash

The triage engine parses ASAN/UBSAN output, extracts stack frames, and computes a stable hash. Duplicate crashes are suppressed automatically so you focus on unique bugs.

Automatic CVSS Scoring

Every triaged crash receives a CVSS 3.1 base score derived from the vulnerability class:

Crash Type	CVSS Score	Severity
Heap buffer overflow	9.8	Critical
Use-after-free	9.8	Critical
Integer overflow	8.8	High
Stack buffer overflow	7.5	High
Null dereference	5.3	Medium
Assertion failure	5.3	Medium
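Class-based scoring reduces to matching the sanitizer's bug description and looking it up in the table. The marker strings in this sketch are assumptions based on typical ASAN, UBSAN and glibc messages. `severity` applies the CVSS v3.1 qualitative scale: Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0. That scale agrees with the Severity column above.

```go
package main

import (
	"fmt"
	"strings"
)

// rule maps a marker in sanitizer output to a class and the base score
// from the table above. Rules are tried in order.
type rule struct {
	marker string
	class  string
	score  float64
}

var rules = []rule{
	{"heap-buffer-overflow", "Heap buffer overflow", 9.8},
	{"heap-use-after-free", "Use-after-free", 9.8},
	{"integer overflow", "Integer overflow", 8.8}, // e.g. UBSAN "signed integer overflow"
	{"stack-buffer-overflow", "Stack buffer overflow", 7.5},
	{"null pointer", "Null dereference", 5.3}, // e.g. UBSAN "load of null pointer"
	{"Assertion", "Assertion failure", 5.3},   // e.g. glibc "Assertion `x' failed."
}

// classify returns the first matching class and its score.
func classify(report string) (class string, score float64, ok bool) {
	for _, r := range rules {
		if strings.Contains(report, r.marker) {
			return r.class, r.score, true
		}
	}
	return "", 0, false
}

// severity applies the CVSS v3.1 qualitative severity rating scale.
func severity(score float64) string {
	switch {
	case score == 0:
		return "None"
	case score < 4.0:
		return "Low"
	case score < 7.0:
		return "Medium"
	case score < 9.0:
		return "High"
	default:
		return "Critical"
	}
}

func main() {
	class, score, _ := classify("ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000f0")
	fmt.Println(class, score, severity(score)) // Heap buffer overflow 9.8 Critical
}
```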

CVE Submission Templates

Triage output includes pre-formatted vulnerability reports with crash ID, CVSS score, affected function, source location, stack trace, and reproducer reference -- ready for responsible disclosure.

Active Research

Crucible has identified potential issues that have been reported upstream via responsible disclosure. Details will be published after the disclosure process completes.

Crucible is a Halo Forge Labs project.