Mutation Engine
The mutation engine is the core innovation of Crucible. Rather than treating GGUF files as opaque byte streams, it parses the binary structure and applies targeted mutations to specific fields, reaching deep code paths that generic fuzzers never touch.
Why Structure-Aware Beats Generic
Generic fuzzers like AFL and libFuzzer treat inputs as flat byte arrays. For a format like GGUF, this means the vast majority of mutations produce files rejected at the very first check — the 4-byte magic number.
Generic Fuzzer Problem
A random bit-flip that lands in the header corrupts the magic bytes, version field, or count fields. The target then rejects the file immediately, and no deeper code is ever exercised.
Structure-Aware Advantage
Crucible parses the seed into a typed Go struct, mutates specific fields within structural constraints, then re-serializes. Every generated file passes initial parsing and reaches the code paths where real bugs live.
Consider the difference:
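As a minimal sketch of that difference, the Go program below builds a GGUF-style header, then mutates it two ways: a blind bit flip on the serialized bytes, and a typed field change followed by re-serialization. The `header` type and the `accepts` and `encode` helpers are illustrative assumptions, not Crucible's actual `gguf` package; the field layout follows the fixed-size GGUF v3 header (4-byte magic, `uint32` version, two `uint64` counts).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header mirrors the fixed-size GGUF v3 header. It is an illustrative
// type, not Crucible's gguf.File.
type header struct {
	Magic           [4]byte
	Version         uint32
	TensorCount     uint64
	MetadataKVCount uint64
}

// accepts models the first check a parser performs: the magic number.
func accepts(raw []byte) bool {
	return len(raw) >= 4 && bytes.Equal(raw[:4], []byte("GGUF"))
}

// encode serializes the header in little-endian byte order.
func encode(h header) []byte {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, h); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	seed := header{Magic: [4]byte{'G', 'G', 'U', 'F'}, Version: 3, TensorCount: 2, MetadataKVCount: 5}

	// Generic approach: flip a bit with no knowledge of field boundaries.
	flat := encode(seed)
	flat[0] ^= 0x01 // this flip happens to land in the magic
	fmt.Println("byte flip accepted:", accepts(flat))

	// Structure-aware approach: change one typed field, then re-serialize.
	mutated := seed
	mutated.TensorCount = ^uint64(0) // UINT64_MAX
	fmt.Println("field mutation accepted:", accepts(encode(mutated)))
}
```

The bit flip is rejected at the magic check, while the field mutation passes it and carries a hostile `tensor_count` into the code that follows.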
Mutation Pipeline
Each fuzzing iteration follows this pipeline:
flowchart LR
A[Parse Seed] --> B[Select 1-3\nMutations]
B --> C[Weighted Category\nSelection]
C --> D[Strategy\nSelection]
D --> E[Apply to\nGGUF Struct]
E --> F[Serialize\nto Bytes]
F --> G[Write to\nTarget]

- Parse Seed: Load and deserialize a .gguf file from the corpus into a *gguf.File struct
- Select Mutation Count: Randomly choose 1 to 3 mutations to apply per iteration
- Weighted Category Selection: Pick a mutation category using the distribution below
- Strategy Selection: Uniformly select a specific strategy within that category
- Apply Mutation: Call Strategy.Mutate(*gguf.File, *rand.Rand) to modify the struct in place
- Serialize: Re-encode the mutated struct back to valid GGUF binary
- Write: Pass the bytes to the target binary via stdin or temp file
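The middle of this pipeline (choose 1 to 3 mutations, pick a strategy, apply it in place) can be sketched roughly as follows. This is an illustration under assumptions, not Crucible's implementation: `File` is a two-field stand-in for `gguf.File`, the two strategies and the `iterate` and `newRNG` helpers are hypothetical, and parsing, category weighting, and serialization are left out.

```go
package main

import (
	"fmt"
	"math/rand"
)

// File is a simplified stand-in for Crucible's gguf.File; the fields are assumptions.
type File struct {
	Version     uint32
	TensorCount uint64
}

// Strategy follows the Name/Mutate shape described on this page, retargeted at File.
type Strategy interface {
	Name() string
	Mutate(f *File, rng *rand.Rand)
}

type versionStrategy struct{}

func (versionStrategy) Name() string { return "header.version" }
func (versionStrategy) Mutate(f *File, rng *rand.Rand) {
	choices := []uint32{0, 1, 999, ^uint32(0)}
	f.Version = choices[rng.Intn(len(choices))]
}

type tensorCountStrategy struct{}

func (tensorCountStrategy) Name() string                 { return "header.tensor_count" }
func (tensorCountStrategy) Mutate(f *File, _ *rand.Rand) { f.TensorCount = 0 }

// newRNG builds a deterministic generator from a seed.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// iterate applies 1 to 3 uniformly chosen strategies to a copy of the seed
// file and returns the mutated copy plus the names of the strategies applied.
func iterate(seed File, strategies []Strategy, rng *rand.Rand) (File, []string) {
	f := seed            // value copy, so the seed stays reusable
	n := 1 + rng.Intn(3) // 1, 2, or 3 mutations
	applied := make([]string, 0, n)
	for i := 0; i < n; i++ {
		s := strategies[rng.Intn(len(strategies))]
		s.Mutate(&f, rng)
		applied = append(applied, s.Name())
	}
	return f, applied
}

func main() {
	rng := newRNG(42)
	seed := File{Version: 3, TensorCount: 2}
	out, applied := iterate(seed, []Strategy{versionStrategy{}, tensorCountStrategy{}}, rng)
	fmt.Printf("%v -> %+v\n", applied, out)
}
```

Copying the seed before mutating keeps each iteration independent, so the same corpus entry can be reused on the next pass.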
Category Weights
Mutation categories are weighted based on historical CVE density and code path coverage:
pie title Mutation Category Weights
"Metadata + Model-Loader (35%)" : 35
"TensorInfo (35%)" : 35
"Header (10%)" : 10
"Consistency (10%)" : 10
"Alignment (5%)" : 5
"Data (5%)" : 5

Why 70% Metadata + TensorInfo
The metadata and tensor info sections receive 70% of mutation budget for good reason. Analysis of historical CVEs in llama.cpp and related parsers shows these sections contain the most bug-dense code:
- Metadata parsing involves variable-length strings, nested arrays, and type dispatch: classic sources of buffer overflows and type confusion
- Tensor info parsing involves dimension arithmetic (multiplication of multiple uint64 values), offset calculations, and memory allocation sizing: classic sources of integer overflows
- Model-loader targeting (5 strategies in model_loader.go, weighted under the Metadata category) fuzzes architecture dispatch, hyperparameter handling, and tensor name schemas, exercising the llama_model_load() path that runs after GGUF parsing
CVE Evidence
The majority of GGUF-related vulnerabilities discovered by Cisco Talos, Trail of Bits, and independent researchers have been in metadata string handling, tensor dimension validation, and alignment calculation code paths.
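One common way to implement a weighted pick like the category distribution in this section is to walk the cumulative weights with a random roll. The sketch below assumes that approach. The category names and the `pick` and `totalWeight` helpers are illustrative; only the percentages come from the chart.

```go
package main

import (
	"fmt"
	"math/rand"
)

// category pairs a name with its share of the mutation budget, in percent.
type category struct {
	Name   string
	Weight int
}

// categories uses the weights from the chart; the names are illustrative.
var categories = []category{
	{"metadata", 35}, {"tensorinfo", 35}, {"header", 10},
	{"consistency", 10}, {"alignment", 5}, {"data", 5},
}

// totalWeight sums all category weights.
func totalWeight() int {
	sum := 0
	for _, c := range categories {
		sum += c.Weight
	}
	return sum
}

// pick maps a roll in [0, totalWeight()) to a category by subtracting
// each weight in turn until the roll falls inside one bucket.
func pick(roll int) string {
	for _, c := range categories {
		if roll < c.Weight {
			return c.Name
		}
		roll -= c.Weight
	}
	return categories[len(categories)-1].Name // only reached if roll is out of range
}

func main() {
	rng := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(rng.Intn(totalWeight()))]++
	}
	for _, c := range categories {
		fmt.Printf("%-12s %d\n", c.Name, counts[c.Name])
	}
}
```

Over many iterations the observed counts should approach the configured percentages.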
Strategy Categories
Crucible implements 46 strategies across 6 weighted categories (7 strategy files). Every strategy implements the same interface:
type Strategy interface {
	// Name returns a human-readable name for logging and crash reports.
	Name() string
	// Mutate modifies the GGUF file struct in place, using the provided RNG for determinism.
	Mutate(f *gguf.File, rng *rand.Rand)
}
Header Strategies
| Strategy | What It Does |
|---|---|
| header.magic_corrupt | Partial magic corruption (keep 1-2 valid bytes) |
| header.version | Set version to 0, 1, 999, or UINT32_MAX |
| header.tensor_count | Set tensor_count to 0 while tensors remain |
| header.metadata_kv_count | Set metadata_kv_count to UINT64_MAX |
| header.version_mismatch | Set version that disagrees with field sizes used |
Metadata Strategies
| Strategy | What It Does |
|---|---|
| metadata.key_length | Empty keys, 1MB+ keys, embedded null bytes |
| metadata.key_content | Non-UTF8 sequences, null sleds, path traversal strings, surrogate pairs |
| metadata.key_shadow | Duplicate keys like general.architecture with conflicting value types |
| metadata.value_type | Invalid enum values (14+, UINT32_MAX), type confusion between similar types |
| metadata.deep_array | Create arrays nested to extreme depth |
| metadata.array | Empty arrays, nested arrays, element type mismatch, large arrays (100K+ elements) |
| metadata.string_value | Empty strings, 10MB strings, embedded nulls, non-UTF8 |
| metadata.invalid_utf8 | Inject non-UTF-8 byte sequences in string values |
| metadata.alignment_poison | Set general.alignment to 0, 1, 3, 7, UINT32_MAX |
| metadata.reorder | Randomize the order of metadata key-value pairs |
| metadata.add_extra | Inject 50-250 extra KV pairs with random types and large values |
| metadata.int_overflow | UINT32_MAX, UINT64_MAX, INT64_MIN in integer fields |
| metadata.string_truncated | Declared string length exceeds actual bytes available |
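To make the last row concrete: a GGUF string is a little-endian `uint64` length followed by that many bytes, so a truncated string is a length prefix that promises more than the buffer holds. The sketch below shows how such an input could be produced and what a bounds-checked reader does with it. The `encodeString` and `readString` helpers are hypothetical and are not taken from Crucible or llama.cpp.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeString writes a GGUF-style string: a little-endian uint64 length
// followed by the bytes. The declared length is a free parameter so it can lie.
func encodeString(s string, declared uint64) []byte {
	buf := make([]byte, 8, 8+len(s))
	binary.LittleEndian.PutUint64(buf, declared)
	return append(buf, s...)
}

// readString decodes a GGUF-style string and rejects declared lengths
// that exceed the bytes actually available.
func readString(buf []byte) (string, error) {
	if len(buf) < 8 {
		return "", errors.New("buffer too short for length prefix")
	}
	n := binary.LittleEndian.Uint64(buf[:8])
	avail := uint64(len(buf) - 8)
	if n > avail {
		return "", fmt.Errorf("declared length %d exceeds %d available bytes", n, avail)
	}
	return string(buf[8 : 8+n]), nil
}

func main() {
	honest := encodeString("llama", 5)
	truncated := encodeString("llama", 1<<20) // claims 1 MiB, carries 5 bytes

	fmt.Println(readString(honest))
	fmt.Println(readString(truncated))
}
```

A parser that skips the comparison against the available bytes would read past the end of the buffer on the second input.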
TensorInfo Strategies
| Strategy | What It Does |
|---|---|
| tensorinfo.n_dims | Set n_dims to 0, 5+, UINT32_MAX (spec allows 1-4) |
| tensorinfo.dim_overflow | Set individual dimension values to 0 or UINT64_MAX |
| tensorinfo.type | Invalid ggml_type enum values (5, 15, 255, UINT32_MAX) |
| tensorinfo.offset | Set offset beyond file size, UINT64_MAX, overlapping |
| tensorinfo.name | Empty names, 1MB names, embedded nulls, non-UTF8, duplicates |
| tensorinfo.dim_product_overflow | Dimension values whose product overflows uint64 |
| tensorinfo.name_collision | Give two tensors the same name |
| tensorinfo.offset_wraparound | Offset + size wraps uint64, bypassing bounds checks |
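The offset wraparound case is worth spelling out, since the vulnerable check looks correct at a glance. The following sketch compares a naive bounds check with one that detects the carry using `math/bits.Add64`. Both function names are hypothetical. Go's unsigned addition wraps the same way C's does, so the naive version behaves like the C/C++ pattern being targeted.

```go
package main

import (
	"fmt"
	"math/bits"
)

// inBoundsNaive is the vulnerable pattern: offset+size can wrap past
// UINT64_MAX and come out smaller than fileSize.
func inBoundsNaive(offset, size, fileSize uint64) bool {
	return offset+size <= fileSize
}

// inBoundsChecked rejects the input when the addition carries out of 64 bits.
func inBoundsChecked(offset, size, fileSize uint64) bool {
	end, carry := bits.Add64(offset, size, 0)
	return carry == 0 && end <= fileSize
}

func main() {
	const fileSize = 1 << 20
	offset := ^uint64(0) - 15 // UINT64_MAX - 15
	size := uint64(32)

	fmt.Println("naive:  ", inBoundsNaive(offset, size, fileSize))   // sum wraps to 16
	fmt.Println("checked:", inBoundsChecked(offset, size, fileSize)) // carry detected
}
```

The naive check accepts the tensor because the wrapped sum is 16, well inside the 1 MiB file; the carry-aware check rejects it.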
Model-Loader Strategies
These target the model-loading path (llama_model_load) rather than raw GGUF parsing. They are registered under the Metadata category for weighting purposes.
| Strategy | What It Does |
|---|---|
| model.architecture | Set general.architecture to bogus/unknown/empty values |
| model.hyperparam | Overflow hyperparameter keys (embedding_length, head_count, etc.) with UINT32_MAX |
| model.vocab | Mutate tokenizer keys (model, bos/eos/pad token IDs) with invalid values |
| model.layer_count | Set block_count to extreme values (0, UINT32_MAX) |
| model.tensor_name_schema | Corrupt tensor names to break the name → layer mapping |
Alignment Strategies
| Strategy | What It Does |
|---|---|
| alignment.padding | Set alignment to 0, prime numbers, UINT32_MAX, OS page size |
| alignment.extra_padding | Insert random non-zero bytes before tensor data section |
| alignment.missing_padding | Metadata claims alignment but padding bytes are absent |
Data Strategies
| Strategy | What It Does |
|---|---|
| data.truncate | Truncate data section mid-tensor |
| data.overlap | Multiple tensors pointing to the same offset |
| data.zero_length | Empty data section with non-zero tensor count |
| data.shorter | Data section shorter than the sum of all tensor sizes |
| data.garbage_fill | Fill data section with random bytes |
| data.nan_inf | Inject NaN and Infinity values into tensor data |
Consistency Strategies
| Strategy | What It Does |
|---|---|
| consistency.tensor_count | Make tensor_count disagree with actual tensor entries |
| consistency.metadata_count | Make metadata_kv_count disagree with actual pairs |
| consistency.offset_beyond | Tensor offset + tensor data size > total file size |
| consistency.tensor_size | Dimensions claim X bytes but actual data region is Y bytes |
| consistency.duplicate_offset | Multiple tensors claim the same offset range |
| consistency.alignment_disagree | Metadata alignment value != actual file padding alignment |
Critical Bug Patterns
The strategies above are designed to trigger specific classes of vulnerabilities:
Integer Overflow in Dimension Products
When tensor dimensions are multiplied to compute n_elements, large values cause silent overflow in C/C++. A tensor with dimensions [2^32, 2^32] has a true element count of 2^64, which wraps to 0 in a uint64. The parser then allocates a tiny buffer but writes far more data than was allocated.
Strategies: HugeDimension, ExcessiveDims
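The wraparound is easy to check by hand. The sketch below multiplies dimensions with `math/bits.Mul64`, which returns the full 128-bit product as two 64-bit halves, and reports overflow whenever the high half is non-zero. The `product` helper is hypothetical and shows one defensive check a parser could use; it is not code from llama.cpp or Crucible.

```go
package main

import (
	"fmt"
	"math/bits"
)

// product multiplies all dimensions together. It stops at the first step
// whose 128-bit result does not fit in 64 bits and returns the wrapped
// low half along with overflow=true.
func product(dims []uint64) (n uint64, overflow bool) {
	n = 1
	for _, d := range dims {
		hi, lo := bits.Mul64(n, d)
		if hi != 0 {
			return lo, true
		}
		n = lo
	}
	return n, false
}

func main() {
	for _, dims := range [][]uint64{
		{4096, 4096},        // ordinary tensor
		{1 << 32, 1 << 32},  // true product is 2^64
		{^uint64(0), 2},     // UINT64_MAX * 2
	} {
		n, overflow := product(dims)
		fmt.Printf("%v -> wrapped=%d overflow=%v\n", dims, n, overflow)
	}
}
```

Not every overflowing product wraps to a small number: UINT64_MAX times 2 wraps to UINT64_MAX - 1, so the dangerous inputs are the ones whose product lands just past a multiple of 2^64.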
Type Confusion
Writing a value as one type but tagging it as another causes the parser to interpret raw bytes incorrectly. A 4-byte float tagged as a string causes the parser to read the float's bits, together with the 4 bytes that follow, as a 64-bit string length, leading to out-of-bounds reads.
Strategies: WrongValueType, TypeConfusion
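To see how large the bogus length gets, the sketch below computes what a parser would read if a `float32` were tagged as a string. GGUF strings begin with a little-endian `uint64` length, so the parser consumes the 4 float bytes plus the next 4 bytes in the file. The `confusedLength` helper is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// confusedLength returns the string length a parser would read if a float32
// value were tagged as a string: the float's 4 bytes followed by the next
// 4 bytes in the file, interpreted as one little-endian uint64.
func confusedLength(value float32, following [4]byte) uint64 {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint32(buf[:4], math.Float32bits(value))
	copy(buf[4:], following[:])
	return binary.LittleEndian.Uint64(buf)
}

func main() {
	// 1.0 is 0x3F800000 as an IEEE 754 float32.
	n := confusedLength(1.0, [4]byte{})
	fmt.Printf("declared string length: %d bytes\n", n)
}
```

Even the innocuous value 1.0 turns into a declared length of roughly 1 GB, and any non-zero trailing bytes push it past 4 GiB.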
Alignment Poisoning
The general.alignment metadata key controls padding calculations. Setting it to 0 causes division-by-zero in offset % alignment. Setting it to a massive value causes the serializer to attempt allocating gigabytes of padding.
Strategies: ZeroAlignment, HugeAlignment
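Both failure modes show up in a few lines of padding arithmetic. The sketch below assumes the usual round-up-to-alignment formula; the `padding` helper is hypothetical. In Go an integer modulo by zero panics at run time, while in C/C++ it is undefined behavior, so the guard matters in either language.

```go
package main

import (
	"errors"
	"fmt"
)

// padding returns how many bytes are needed to advance offset to the next
// multiple of alignment. A zero alignment is rejected up front, because
// offset % 0 would otherwise panic in Go.
func padding(offset, alignment uint64) (uint64, error) {
	if alignment == 0 {
		return 0, errors.New("alignment must be non-zero")
	}
	return (alignment - offset%alignment) % alignment, nil
}

func main() {
	for _, alignment := range []uint64{32, 0, 1<<32 - 1} {
		p, err := padding(100, alignment)
		fmt.Printf("alignment=%d padding=%d err=%v\n", alignment, p, err)
	}
}
```

With an alignment of UINT32_MAX the formula asks for about 4 GiB of padding after a 100-byte offset, which is the oversized-allocation case described above.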
Deterministic Reproduction
Every fuzzing run is seeded with a 64-bit integer from crypto/rand. This seed is:
- Logged at the start of each run
- Recorded in every crash report
- Sufficient to reproduce the exact sequence of mutations
# Reproduce a crash with a known seed
crucible generate --seed 8827361950234 --count 1 --corpus ./corpus
Reproducibility Guarantee
Given the same RNG seed, corpus, and Crucible version, the fuzzer produces byte-identical output files. This makes every crash trivially reproducible for debugging and CVE reporting.
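The pattern behind this guarantee can be sketched in Go: draw one unpredictable 64-bit seed from `crypto/rand`, log it, and feed it to a deterministic `math/rand` generator that drives every later choice. The `newSeed` and `draws` helpers are illustrative assumptions and do not describe Crucible's internals.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newSeed draws a 64-bit seed from the operating system's CSPRNG.
func newSeed() (int64, error) {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		return 0, err
	}
	return int64(binary.LittleEndian.Uint64(b[:])), nil
}

// draws returns the first n values produced by a math/rand generator
// initialized with the given seed.
func draws(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(1000)
	}
	return out
}

func main() {
	seed, err := newSeed()
	if err != nil {
		panic(err)
	}
	fmt.Println("seed:", seed) // log the seed so the run can be replayed
	fmt.Println(draws(seed, 5))
	fmt.Println(draws(seed, 5)) // same seed, same sequence
}
```

Because every random decision flows from the one logged seed, replaying a run only requires that seed plus the same corpus and tool version.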