Crucible
The first structure-aware fuzzer for ML model file formats and inference infrastructure
Crucible is a structure-aware fuzzer targeting the attack surface between model distribution and inference execution: file format parsers (GGUF), serving protocols (ggml-rpc), template engines (Jinja2), and API endpoints. It combines format-aware mutation with systematic input validation auditing.
- 60 Mutation Strategies
- 12 Mutation Categories
- 35+ Fuzz Harnesses
- CVE Ready Reports
Why Crucible?
ML model parsers and inference protocols are high-value attack surfaces
Every time someone loads a model through Ollama, LM Studio, GPT4All, llama.cpp, whisper.cpp, or stable-diffusion.cpp, they execute C/C++ parsers on untrusted data — GGUF files, RPC wire protocol messages, JSON Schema grammars, Jinja templates, and HTTP API payloads.
Known CVEs in these parsers carry CVSS scores from 8.8 to 10.0 — heap buffer overflows, integer overflows, and out-of-bounds reads discovered by teams like Cisco Talos and tracked by VulDB. These are not theoretical risks; they are shipped bugs in software that millions of users run on their machines.
No systematic structure-aware fuzzer for ML model parsers existed until now.
Generic fuzzers like AFL++ and libFuzzer are excellent tools, but when pointed at a binary format parser they spend most of their time generating inputs that fail magic-byte checks on the first 4 bytes. Crucible solves this by:
- Parsing valid GGUF seed files into structured representations
- Mutating at the semantic level -- header fields, metadata key-value pairs, tensor info blocks, alignment values, cross-field consistency invariants
- Serializing the mutated structure back into valid-enough binary that passes initial checks and reaches deep parsing code paths
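The parse, mutate, serialize loop above can be sketched for the fixed-size GGUF header alone. This is a minimal illustration, not Crucible's implementation: it assumes the version 3 little-endian header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count), and the names `GGUFHeader`, `parse_header`, `serialize_header`, and `mutate_tensor_count` are hypothetical.

```python
import struct
from dataclasses import dataclass

GGUF_MAGIC = b"GGUF"
# Assumed v3 layout: magic, version (u32), tensor_count (u64), metadata_kv_count (u64)
HEADER = struct.Struct("<4sIQQ")


@dataclass
class GGUFHeader:
    magic: bytes
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_header(buf: bytes) -> GGUFHeader:
    """Decode the fixed-size header into a typed structure."""
    return GGUFHeader(*HEADER.unpack_from(buf, 0))


def serialize_header(h: GGUFHeader) -> bytes:
    """Encode the structure back to its binary form."""
    return HEADER.pack(h.magic, h.version, h.tensor_count, h.metadata_kv_count)


def mutate_tensor_count(buf: bytes, value: int) -> bytes:
    """Rewrite only tensor_count; magic and the rest of the file stay intact."""
    h = parse_header(buf)
    h.tensor_count = value
    return serialize_header(h) + buf[HEADER.size:]


# Hypothetical seed: a valid header followed by placeholder bytes.
seed = serialize_header(GGUFHeader(GGUF_MAGIC, 3, 2, 5)) + b"rest-of-file"
mutated = mutate_tensor_count(seed, 2**64 - 1)
```

Because the magic bytes are preserved, an input like `mutated` should pass the initial check and push the boundary value into whatever code consumes the tensor count.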
The Gap
Security research in the ML ecosystem has focused on two areas: model-level attacks (adversarial examples, prompt injection, data poisoning) and web-layer vulnerabilities (SSRF in API endpoints, path traversal in model registries). The infrastructure between these layers — the binary parsers, wire protocols, and format converters that every inference stack depends on — has received far less attention.
The bugs that do get found here are severe. Cisco Talos published five GGUF parser CVEs in 2024, all CVSS 8.8, all heap overflows reachable by opening a crafted model file. 360 Vulnerability Research Institute found three RPC backend flaws including a CVSS 9.8 write-what-where that gives unauthenticated remote code execution. Wiz Research demonstrated path traversal to RCE in Ollama. These were found by skilled teams doing targeted manual audits — not by systematic, structure-aware fuzzing.
That is the gap Crucible fills. Rather than flipping random bytes and hoping to pass the GGUF magic check, Crucible:
- Parses seed files into typed structures (headers, metadata KV pairs, tensor info blocks, alignment, data regions)
- Mutates at the semantic level — corrupting cross-field invariants, injecting type-confused metadata, overflowing dimension products
- Serializes back to binary that is structurally valid enough to reach deep parser code paths
The Known CVEs page tracks 30+ published vulnerabilities across GGUF parsers, the RPC backend, grammar engines, and downstream projects including whisper.cpp, stable-diffusion.cpp, PyTorch, TensorFlow Lite, and Apple MLX. Every one of them is in Crucible's target surface.
Quick Start
Get from zero to fuzzing in four commands:
Prerequisites
You need a local clone of llama.cpp for the C harnesses. Set LLAMA_CPP=/path/to/llama.cpp when building. The Go native harness requires only the Go toolchain.
Feature Highlights
Structure-Aware Mutation Engine
Crucible does not flip random bytes. Every mutation operates on a parsed GGUF structure -- modifying header fields, injecting malformed metadata, corrupting tensor dimensions, and breaking cross-field invariants that parsers rely on.
Weighted Strategy Selection
Not all mutation categories are equal. Metadata and tensor info mutations account for 70% of selections because that is where the bug density is highest. The weight distribution:
| Category | Weight | Strategy Count | Rationale |
|---|---|---|---|
| Metadata | 35% | 18 strategies | Most complex parsing, string handling, type confusion, model-loader targeting |
| Tensor Info | 35% | 8 strategies | Dimension overflows, offset manipulation, type fuzzing |
| Header | 10% | 5 strategies | Version, counts, magic corruption |
| Consistency | 10% | 6 strategies | Cross-field mismatches -- where the worst bugs hide |
| Alignment | 5% | 3 strategies | Padding calculation bugs (cf. Cisco Talos findings) |
| Data | 5% | 6 strategies | Truncation, overlap, size mismatches |
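One way to realize a weight distribution like the one in the table is a weighted random draw per mutation round. The sketch below uses Python's `random.choices`; the category keys and the `pick_category` helper are illustrative names, and Crucible's actual selection logic may differ.

```python
import random
from collections import Counter

# Weights copied from the table above (percent of selections).
CATEGORY_WEIGHTS = {
    "metadata": 35,
    "tensor_info": 35,
    "header": 10,
    "consistency": 10,
    "alignment": 5,
    "data": 5,
}


def pick_category(rng: random.Random) -> str:
    """Draw one mutation category with probability proportional to its weight."""
    names = list(CATEGORY_WEIGHTS)
    weights = list(CATEGORY_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


# Seeded generator so a fuzzing run can be replayed.
rng = random.Random(0)
counts = Counter(pick_category(rng) for _ in range(10_000))
```

Over many draws, the combined share of `metadata` and `tensor_info` in `counts` should settle near 70%.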
Six Mutation Categories
Each category targets a different layer of the GGUF binary format:
Header Mutations
Corrupt magic bytes, version fields, tensor counts, and metadata counts with boundary values (0, MaxUint64, off-by-one).
Metadata Mutations
Extreme key lengths, embedded null bytes, non-UTF-8 payloads, type confusion (change value type tag but keep original encoding), alignment poisoning via general.alignment, key shadowing of critical keys like general.architecture.
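The type confusion case (change the value type tag, keep the original encoding) can be shown on a single key-value pair. This sketch assumes the GGUF KV layout of a length-prefixed key (uint64 length plus bytes), a uint32 type tag, then the value, with type IDs 4 for UINT32 and 8 for STRING. The helper names are hypothetical.

```python
import struct

# Assumed GGUF value type IDs.
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_STRING = 8


def encode_string(s: bytes) -> bytes:
    """GGUF string: uint64 length followed by the raw bytes."""
    return struct.pack("<Q", len(s)) + s


def encode_kv_uint32(key: bytes, value: int) -> bytes:
    """Key, then a UINT32 type tag, then the 4-byte value."""
    return encode_string(key) + struct.pack("<II", GGUF_TYPE_UINT32, value)


def confuse_type(kv: bytes, new_type: int) -> bytes:
    """Overwrite only the type tag; the value bytes keep their old encoding."""
    (key_len,) = struct.unpack_from("<Q", kv, 0)
    tag_off = 8 + key_len
    return kv[:tag_off] + struct.pack("<I", new_type) + kv[tag_off + 4:]


kv = encode_kv_uint32(b"general.alignment", 32)
confused = confuse_type(kv, GGUF_TYPE_STRING)
```

A parser that trusts the new tag would read an 8-byte string length starting at a 4-byte payload, so the length it sees depends on whatever bytes follow in the file.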
Tensor Info Mutations
Invalid n_dims (0, 5, MaxUint32), dimension overflow products that wrap uint64, invalid ggml_type enum values, offsets beyond file bounds, duplicate/empty tensor names.
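To see why wrapping dimension products matter, the arithmetic can be modeled directly. The sketch below emulates a C `uint64_t` multiplication by masking to 64 bits; `element_count_u64` and `wraps` are illustrative helpers, not functions from any parser.

```python
from math import prod

U64_MASK = (1 << 64) - 1


def element_count_u64(dims):
    """Multiply dimensions the way unchecked uint64_t arithmetic would (mod 2**64)."""
    n = 1
    for d in dims:
        n = (n * d) & U64_MASK
    return n


def wraps(dims):
    """True when the exact product no longer fits in 64 bits."""
    return prod(dims) > U64_MASK


# 2**32 * 2**32 = 2**64, which wraps to 0.
zero_wrap = [1 << 32, 1 << 32, 1, 1]
# (2**63 + 1) * 2 = 2**64 + 2, which wraps to 2.
small_wrap = [(1 << 63) + 1, 2, 1, 1]
```

In both cases the wrapped element count looks harmless, so a size check based on it can pass while the true dimensions describe a tensor far larger than any allocation.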
Alignment Mutations
Pathological general.alignment values (0 for division-by-zero, prime numbers, MaxUint64), extra garbage padding between sections, alignment metadata that disagrees with actual file padding.
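The effect of pathological alignment values is easiest to see against a padding formula. The `align_up` function below is a common modulo-based formulation used here as an assumption about how a parser might compute padded offsets; it is not taken from any specific codebase, and `probe` is a hypothetical helper.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (naive, unvalidated)."""
    return offset + (alignment - offset % alignment) % alignment


# Values of the kind listed above: zero, a small prime, and the uint64 maximum.
PATHOLOGICAL = [0, 7, 2**64 - 1]


def probe(offset: int) -> dict:
    """Record what the naive formula does for each pathological alignment."""
    results = {}
    for a in PATHOLOGICAL:
        try:
            results[a] = align_up(offset, a)
        except ZeroDivisionError:
            # Python raises; in C the same integer modulo is undefined behavior.
            results[a] = "division by zero"
    return results


report = probe(100)
```

With a sane alignment such as 32, `align_up(100, 32)` gives 128. The maximum value instead pushes the padded offset to the top of the 64-bit range, where any later addition can wrap.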
Data Mutations
Truncated tensor data, overlapping tensor offset ranges, zero-length data sections with non-zero tensor claims, garbage-filled data blobs.
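As an illustration of the overlapping-offset case, the sketch below represents each tensor as a hypothetical `(name, offset, size)` tuple, applies a mutation that moves one tensor's offset into another's range, and checks the result. Both helper names are assumptions for this example.

```python
def find_overlaps(tensors):
    """Return name pairs whose [offset, offset + size) byte ranges intersect."""
    ordered = sorted(tensors, key=lambda t: t[1])
    overlaps = []
    for i, (name_a, off_a, size_a) in enumerate(ordered):
        for name_b, off_b, _ in ordered[i + 1:]:
            if off_b >= off_a + size_a:
                break  # sorted by offset, so later tensors cannot overlap either
            overlaps.append((name_a, name_b))
    return overlaps


def set_offset(tensors, name, new_offset):
    """Mutation: rewrite one tensor's offset, leaving its size untouched."""
    return [(n, new_offset if n == name else off, size) for n, off, size in tensors]


# Hypothetical layout: three tensors packed back to back.
layout = [("a", 0, 64), ("b", 64, 64), ("c", 128, 32)]
mutated = set_offset(layout, "b", 32)
```

Here `find_overlaps(layout)` is empty, while `find_overlaps(mutated)` reports that `a` and `b` now claim the same bytes.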
Consistency Mutations
tensor_count header mismatch with actual tensor info blocks, metadata_kv_count mismatches, tensor offsets beyond file bounds, tensor dimensions implying sizes larger than the data section, alignment metadata disagreeing with file structure.
Crash Deduplication by Stack Hash
The triage engine parses ASAN/UBSAN output, extracts stack frames, and computes a stable hash. Duplicate crashes are suppressed automatically so you focus on unique bugs.
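A minimal sketch of stack-hash deduplication, assuming the usual sanitizer frame format (`#N 0xADDR in function ...`). It hashes only the function names of the first stack in the report, so addresses and process IDs do not affect the result. The regular expression, the `top_n` cutoff, and the sample report with its function names are illustrative assumptions, not Crucible's actual triage code.

```python
import hashlib
import re

FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def first_stack(report: str) -> list:
    """Function names of the first stack trace (frames #0, #1, ... in order)."""
    funcs = []
    for m in FRAME_RE.finditer(report):
        if int(m.group(1)) != len(funcs):
            break  # frame numbering restarted: a second stack (e.g. allocation site)
        funcs.append(m.group(2))
    return funcs


def stack_hash(report: str, top_n: int = 5) -> str:
    """Stable ID built from the top frames' function names only."""
    key = "\n".join(first_stack(report)[:top_n])
    return hashlib.sha256(key.encode()).hexdigest()[:16]


# Hypothetical ASAN-style report.
REPORT_A = """\
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014
READ of size 4 at 0x602000000014 thread T0
    #0 0x55d5c8a1 in read_tensor_info /src/parser.c:210:13
    #1 0x55d5c9b2 in load_model /src/parser.c:88:5
    #2 0x55d5ca03 in main /src/harness.c:17:3

allocated by thread T0 here:
    #0 0x7f001122 in malloc
    #1 0x55d5c777 in alloc_buf /src/parser.c:40:9
"""
# Same crash under a different PID and load address, then a different crash site.
REPORT_B = REPORT_A.replace("0x55d5c", "0x7ffe1").replace("==1234==", "==9999==")
REPORT_C = REPORT_A.replace("read_tensor_info", "read_metadata_kv")
```

`REPORT_A` and `REPORT_B` hash identically and would be treated as one bug, while `REPORT_C` gets a distinct hash. Hashing names rather than addresses is what keeps the ID stable across ASLR and rebuilds.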
Automatic CVSS Scoring
Every triaged crash receives a CVSS 3.1 base score derived from the vulnerability class:
| Crash Type | CVSS Score | Severity |
|---|---|---|
| Heap buffer overflow | 9.8 | Critical |
| Use-after-free | 9.8 | Critical |
| Integer overflow | 8.8 | High |
| Stack buffer overflow | 7.5 | High |
| Null dereference | 5.3 | Medium |
| Assertion failure | 5.3 | Medium |
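The table amounts to a lookup from crash class to score, with the severity label following from the CVSS v3.1 qualitative rating bands. A sketch under those assumptions is below; the class keys and the `score_crash` helper are hypothetical labels, not Crucible's identifiers.

```python
# Scores copied from the table above; keys are illustrative class labels.
CVSS_BY_CLASS = {
    "heap-buffer-overflow": 9.8,
    "use-after-free": 9.8,
    "integer-overflow": 8.8,
    "stack-buffer-overflow": 7.5,
    "null-dereference": 5.3,
    "assertion-failure": 5.3,
}


def severity(score: float) -> str:
    """Map a base score to the CVSS v3.1 qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def score_crash(crash_class: str):
    """Return (score, severity), or None for a class the table does not cover."""
    score = CVSS_BY_CLASS.get(crash_class)
    if score is None:
        return None
    return score, severity(score)
```

Returning `None` for unknown classes leaves those crashes for manual scoring instead of guessing a number.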
CVE Submission Templates
Triage output includes pre-formatted vulnerability reports with crash ID, CVSS score, affected function, source location, stack trace, and reproducer reference -- ready for responsible disclosure.
Active Research
Crucible has identified potential issues that have been reported upstream via responsible disclosure. Details will be published after the disclosure process completes.
Crucible is a Halo Forge Labs project.