Mutation Engine

The mutation engine is the core innovation of Crucible. Rather than treating GGUF files as opaque byte streams, it parses the binary structure and applies targeted mutations to specific fields, reaching deep code paths that generic fuzzers never touch.

Why Structure-Aware Beats Generic

Generic fuzzers like AFL and libFuzzer treat inputs as flat byte arrays. For a format like GGUF, this means the vast majority of mutations produce files rejected at the very first check — the 4-byte magic number.

Generic Fuzzer Problem

A random mutation that lands on the magic bytes, version field, or count fields in the header makes the target reject the file immediately, and no deeper code is ever exercised.

Structure-Aware Advantage

Crucible parses the seed into a typed Go struct, mutates specific fields within structural constraints, then re-serializes. Every generated file passes initial parsing and reaches the code paths where real bugs live.

Consider the difference:

Generic fuzzer:

Iteration 1: corrupt magic       → rejected at byte 0
Iteration 2: corrupt magic       → rejected at byte 0
Iteration 3: corrupt version     → rejected at byte 4
Iteration 4: corrupt magic       → rejected at byte 0
...
Iteration 10000: finally valid   → reaches metadata parser

Crucible (structure-aware):

Iteration 1: mutate metadata key → reaches metadata string handler
Iteration 2: corrupt tensor dims → reaches dimension validation
Iteration 3: poison alignment    → reaches padding calculator
Iteration 4: overflow dim product → reaches allocation logic
...
Every iteration reaches deep code paths
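
To make the early-rejection problem concrete, here is a minimal sketch of the kind of check a GGUF reader runs first. It assumes the GGUF v2+ header layout (4-byte magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata KV count, all little-endian). The names `checkHeader` and `validHeader` and the accepted version range are illustrative assumptions, not code from Crucible or llama.cpp.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// checkHeader is a hypothetical stand-in for the first checks a GGUF reader runs.
func checkHeader(b []byte) error {
	if len(b) < 24 {
		return errors.New("short header")
	}
	if !bytes.Equal(b[:4], []byte("GGUF")) {
		return errors.New("bad magic")
	}
	// Accepted versions are an assumption made for this sketch.
	if v := binary.LittleEndian.Uint32(b[4:8]); v < 2 || v > 3 {
		return fmt.Errorf("unsupported version %d", v)
	}
	return nil
}

// validHeader builds a minimal 24-byte header with zero tensors and zero KV pairs.
func validHeader() []byte {
	b := make([]byte, 24)
	copy(b, "GGUF")
	binary.LittleEndian.PutUint32(b[4:], 3)
	return b
}

func main() {
	h := validHeader()
	fmt.Println(checkHeader(h)) // <nil>

	h[0] ^= 0x01                // a single bit flip in the magic
	fmt.Println(checkHeader(h)) // bad magic
}
```

A mutation that lands anywhere in the first eight bytes is likely to end the run at one of these two checks, which is why byte-level mutation of the header wastes iterations.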

Mutation Pipeline

Each fuzzing iteration follows this pipeline:

flowchart LR
    A[Parse Seed] --> B[Select 1-3\nMutations]
    B --> C[Weighted Category\nSelection]
    C --> D[Strategy\nSelection]
    D --> E[Apply to\nGGUF Struct]
    E --> F[Serialize\nto Bytes]
    F --> G[Write to\nTarget]
  1. Parse Seed — Load and deserialize a .gguf file from the corpus into a *gguf.File struct
  2. Select Mutation Count — Randomly choose 1 to 3 mutations to apply per iteration
  3. Weighted Category Selection — Pick a mutation category using the distribution below
  4. Strategy Selection — Uniformly select a specific strategy within that category
  5. Apply Mutation — Call Strategy.Mutate(*gguf.File, *rand.Rand) to modify the struct in place
  6. Serialize — Re-encode the mutated struct back to valid GGUF binary
  7. Write — Pass the bytes to the target binary via stdin or temp file

Category Weights

Mutation categories are weighted based on historical CVE density and code path coverage:

pie title Mutation Category Weights
    "Metadata + Model-Loader (35%)" : 35
    "TensorInfo (35%)" : 35
    "Header (10%)" : 10
    "Consistency (10%)" : 10
    "Alignment (5%)" : 5
    "Data (5%)" : 5

Why 70% Metadata + TensorInfo

The metadata and tensor info sections receive 70% of mutation budget for good reason. Analysis of historical CVEs in llama.cpp and related parsers shows these sections contain the most bug-dense code:

  • Metadata parsing involves variable-length strings, nested arrays, and type dispatch — classic sources of buffer overflows and type confusion
  • Tensor info parsing involves dimension arithmetic (multiplication of multiple uint64 values), offset calculations, and memory allocation sizing — classic sources of integer overflows
  • Model-loader targeting (5 strategies in model_loader.go, weighted under the Metadata category) fuzzes architecture dispatch, hyperparameter handling, and tensor name schemas — exercising the llama_model_load() path that runs after GGUF parsing

CVE Evidence

The majority of GGUF-related vulnerabilities discovered by Cisco Talos, Trail of Bits, and independent researchers have been in metadata string handling, tensor dimension validation, and alignment calculation code paths.

Strategy Categories

Crucible implements 46 strategies across 6 weighted categories (7 strategy files). Every strategy implements the same interface:

Strategy Interface
type Strategy interface {
    // Name returns a human-readable name for logging and crash reports.
    Name() string
    // Mutate modifies the GGUF file struct in place, using the provided RNG for determinism.
    Mutate(f *gguf.File, rng *rand.Rand)
}

Header Strategies

| Strategy | What It Does |
|---|---|
| header.magic_corrupt | Partial magic corruption (keep 1-2 valid bytes) |
| header.version | Set version to 0, 1, 999, or UINT32_MAX |
| header.tensor_count | Set tensor_count to 0 while tensors remain |
| header.metadata_kv_count | Set metadata_kv_count to UINT64_MAX |
| header.version_mismatch | Set version that disagrees with field sizes used |

Metadata Strategies

| Strategy | What It Does |
|---|---|
| metadata.key_length | Empty keys, 1MB+ keys, embedded null bytes |
| metadata.key_content | Non-UTF8 sequences, null sleds, path traversal strings, surrogate pairs |
| metadata.key_shadow | Duplicate keys like general.architecture with conflicting value types |
| metadata.value_type | Invalid enum values (14+, UINT32_MAX), type confusion between similar types |
| metadata.deep_array | Create arrays nested to extreme depth |
| metadata.array | Empty arrays, nested arrays, element type mismatch, large arrays (100K+ elements) |
| metadata.string_value | Empty strings, 10MB strings, embedded nulls, non-UTF8 |
| metadata.invalid_utf8 | Inject non-UTF-8 byte sequences in string values |
| metadata.alignment_poison | Set general.alignment to 0, 1, 3, 7, UINT32_MAX |
| metadata.reorder | Randomize the order of metadata key-value pairs |
| metadata.add_extra | Inject 50-250 extra KV pairs with random types and large values |
| metadata.int_overflow | UINT32_MAX, UINT64_MAX, INT64_MIN in integer fields |
| metadata.string_truncated | Declared string length exceeds actual bytes available |

TensorInfo Strategies

| Strategy | What It Does |
|---|---|
| tensorinfo.n_dims | Set n_dims to 0, 5+, UINT32_MAX (spec allows 1-4) |
| tensorinfo.dim_overflow | Set individual dimension values to 0 or UINT64_MAX |
| tensorinfo.type | Invalid ggml_type enum values (5, 15, 255, UINT32_MAX) |
| tensorinfo.offset | Set offset beyond file size, UINT64_MAX, overlapping |
| tensorinfo.name | Empty names, 1MB names, embedded nulls, non-UTF8, duplicates |
| tensorinfo.dim_product_overflow | Dimension values whose product overflows uint64 |
| tensorinfo.name_collision | Give two tensors the same name |
| tensorinfo.offset_wraparound | Offset + size wraps uint64, bypassing bounds checks |

Model-Loader Strategies

These target the model-loading path (llama_model_load) rather than raw GGUF parsing. They are registered under the Metadata category for weighting purposes.

| Strategy | What It Does |
|---|---|
| model.architecture | Set general.architecture to bogus/unknown/empty values |
| model.hyperparam | Overflow hyperparameter keys (embedding_length, head_count, etc.) with UINT32_MAX |
| model.vocab | Mutate tokenizer keys (model, bos/eos/pad token IDs) with invalid values |
| model.layer_count | Set block_count to extreme values (0, UINT32_MAX) |
| model.tensor_name_schema | Corrupt tensor names to break the name → layer mapping |

Alignment Strategies

| Strategy | What It Does |
|---|---|
| alignment.padding | Set alignment to 0, prime numbers, UINT32_MAX, OS page size |
| alignment.extra_padding | Insert random non-zero bytes before tensor data section |
| alignment.missing_padding | Metadata claims alignment but padding bytes are absent |

Data Strategies

| Strategy | What It Does |
|---|---|
| data.truncate | Truncate data section mid-tensor |
| data.overlap | Multiple tensors pointing to the same offset |
| data.zero_length | Empty data section with non-zero tensor count |
| data.shorter | Data section shorter than the sum of all tensor sizes |
| data.garbage_fill | Fill data section with random bytes |
| data.nan_inf | Inject NaN and Infinity values into tensor data |

Consistency Strategies

| Strategy | What It Does |
|---|---|
| consistency.tensor_count | Make tensor_count disagree with actual tensor entries |
| consistency.metadata_count | Make metadata_kv_count disagree with actual pairs |
| consistency.offset_beyond | Tensor offset + tensor data size > total file size |
| consistency.tensor_size | Dimensions claim X bytes but actual data region is Y bytes |
| consistency.duplicate_offset | Multiple tensors claim the same offset range |
| consistency.alignment_disagree | Metadata alignment value != actual file padding alignment |

Critical Bug Patterns

The strategies above are designed to trigger specific classes of vulnerabilities:

Integer Overflow in Dimension Products

When tensor dimensions are multiplied to compute n_elements, large values cause silent overflow in C/C++. A tensor with dimensions [2^32, 2^32] has a true product of 2^64, which wraps to 0 in 64-bit arithmetic. The parser then sizes its allocation from the wrapped value but writes far more data than was allocated.

Strategies: HugeDimension, ExcessiveDims
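
A short sketch of the arithmetic, using dimensions [2^32, 2^32] because their true product is exactly 2^64. `product` mimics unchecked multiplication, and `checkedProduct` shows one possible guard built on `math/bits.Mul64`; both names are illustrative.

```go
package main

import (
	"fmt"
	"math/bits"
)

// product multiplies dims with wrapping uint64 arithmetic, as unchecked code would.
func product(dims []uint64) uint64 {
	p := uint64(1)
	for _, d := range dims {
		p *= d
	}
	return p
}

// checkedProduct reports ok=false if the product does not fit in uint64.
func checkedProduct(dims []uint64) (p uint64, ok bool) {
	p = 1
	for _, d := range dims {
		hi, lo := bits.Mul64(p, d)
		if hi != 0 {
			return 0, false
		}
		p = lo
	}
	return p, true
}

func main() {
	dims := []uint64{1 << 32, 1 << 32} // true product is 2^64
	fmt.Println(product(dims))         // 0

	_, ok := checkedProduct(dims)
	fmt.Println(ok) // false
}
```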

Type Confusion

Writing a value as one type but tagging it as another causes the parser to interpret raw bytes incorrectly. A 4-byte float tagged as a string causes the parser to read the float bits, together with the 4 bytes that follow, as a 64-bit string length, leading to out-of-bounds reads.

Strategies: WrongValueType, TypeConfusion
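
The effect is easy to reproduce in isolation. The sketch below assumes GGUF's string encoding (a little-endian uint64 length prefix followed by the bytes); `confusedLength` is a hypothetical helper that shows what length a reader would see if a float32 payload were tagged as a string.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// confusedLength returns the "string length" a reader would decode if a
// float32 followed by four zero bytes were tagged as a string.
func confusedLength(v float32) uint64 {
	buf := make([]byte, 8) // 4 bytes of float, then 4 unrelated zero bytes
	binary.LittleEndian.PutUint32(buf, math.Float32bits(v))
	return binary.LittleEndian.Uint64(buf)
}

func main() {
	// 1.0 has the bit pattern 0x3F800000.
	fmt.Println(confusedLength(1.0)) // 1065353216
}
```

A harmless-looking value of 1.0 becomes a claimed string length of about 1 GB, which a reader without a bounds check would try to consume.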

Alignment Poisoning

The general.alignment metadata key controls padding calculations. Setting it to 0 causes division-by-zero in offset % alignment. Setting it to a massive value causes the serializer to attempt allocating gigabytes of padding.

Strategies: ZeroAlignment, HugeAlignment
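
A minimal sketch of a padding calculation with the zero guard that this bug pattern probes for. The `padding` function is illustrative, not taken from any particular parser. In Go, `offset % 0` raises a runtime panic; in C it is undefined behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// padding returns the number of bytes needed to round offset up to a
// multiple of align, rejecting a zero alignment before the modulo.
func padding(offset, align uint64) (uint64, error) {
	if align == 0 {
		return 0, errors.New("alignment must be non-zero")
	}
	return (align - offset%align) % align, nil
}

func main() {
	fmt.Println(padding(100, 32)) // 28 <nil>
	fmt.Println(padding(100, 0))  // 0 alignment must be non-zero
}
```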

Deterministic Reproduction

Every fuzzing run is seeded with a 64-bit integer from crypto/rand. This seed is:

  1. Logged at the start of each run
  2. Recorded in every crash report
  3. Sufficient to reproduce the exact sequence of mutations
# Reproduce a crash with a known seed
crucible generate --seed 8827361950234 --count 1 --corpus ./corpus

Reproducibility Guarantee

Given the same RNG seed, corpus, and Crucible version, the fuzzer produces byte-identical output files. This makes every crash trivially reproducible for debugging and CVE reporting.
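
The seeding scheme can be sketched with the standard library alone. This assumes the seed is drawn from `crypto/rand` and fed to a `math/rand` source, as described above; `newSeed` and `draw` are hypothetical helpers, not Crucible functions.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newSeed draws a 64-bit seed from the OS entropy source.
func newSeed() (int64, error) {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		return 0, err
	}
	return int64(binary.LittleEndian.Uint64(b[:])), nil
}

// draw returns the first n values from a PRNG seeded with seed.
func draw(seed int64, n int) []uint64 {
	rng := rand.New(rand.NewSource(seed))
	out := make([]uint64, n)
	for i := range out {
		out[i] = rng.Uint64()
	}
	return out
}

func main() {
	seed, err := newSeed()
	if err != nil {
		panic(err)
	}
	fmt.Println("seed:", seed) // log this so the run can be replayed

	a, b := draw(seed, 4), draw(seed, 4)
	fmt.Println(a[0] == b[0] && a[3] == b[3]) // true
}
```

The guarantee only holds if every random decision in a run flows through the one seeded generator, which is why the strategy interface takes the RNG as an argument.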