Configuration

Makefile targets

The project Makefile is the primary build and run interface. All targets respect the environment variables listed below.

Target	Description
`build`	Compile all Go binaries (`crucible`, `crucible-gen`, `crucible-triage`)
`test`	Run Go unit tests (`./cmd/...` and `./pkg/...`)
`generate`	Run `crucible-gen` to create a seed corpus
`harness-libfuzzer`	Build the libFuzzer harness (requires `LLAMA_CPP`)
`harness-afl`	Build the AFL++ harness (requires `LLAMA_CPP`)
`run-libfuzzer`	Build and run a libFuzzer campaign
`run-afl`	Build and run an AFL++ campaign
`triage`	Triage crashes in `./crashes` and write reports
`triage-libfuzzer`	Replay binary crashes through harness and generate reports
`fuzz-go`	Run Go-native fuzz tests (`go test -fuzz`)
`fuzz-mutator`	Fuzz the mutation engine itself
`fuzz-roundtrip`	Fuzz the GGUF encode/decode round-trip path
`coverage`	Collect code coverage from corpus replay
`test-short`	Run Go unit tests with a shorter timeout
`lint`	Run `go vet` and `go fmt`
`vet`	Run `go vet` static analysis
`fmt`	Run `go fmt` code formatting
`clean`	Remove built binaries, crash artifacts, and generated corpus

Chaining targets

make generate harness-libfuzzer run-libfuzzer LLAMA_CPP=../llama.cpp FUZZ_JOBS=4

Environment variables

Set these before invoking make or the CLI tools directly.

Variable	Default	Description
`LLAMA_CPP`	`~/src/llama.cpp`	Path to a local llama.cpp checkout (required for harness targets)
`FUZZ_JOBS`	`nproc` (Linux) / `sysctl -n hw.ncpu` (macOS)	Number of parallel fuzzing workers
`FUZZ_TIMEOUT`	`0` (unlimited)	Campaign timeout in seconds (`0` = run until stopped)
`FUZZ_MAX_LEN`	`10485760` (10 MiB)	Maximum input size in bytes fed to the harness
`GO`	`go`	Path to the Go binary (useful for non-standard installs)
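
When calling the CLI tools without make, the defaults in the table can be approximated with a small shell helper. This is a sketch rather than the Makefile's actual logic: the function names `resolve_fuzz_jobs` and `resolve_max_len` are hypothetical, and falling back to a single worker when no CPU count is available is an assumption.

```shell
#!/bin/sh
# Sketch: resolve FUZZ_JOBS and FUZZ_MAX_LEN as described in the table.
# Function names are hypothetical; the real Makefile may differ.

resolve_fuzz_jobs() {
    if [ -n "${FUZZ_JOBS:-}" ]; then
        echo "$FUZZ_JOBS"                          # explicit setting wins
    elif command -v nproc >/dev/null 2>&1; then
        nproc                                      # Linux (GNU coreutils)
    elif command -v sysctl >/dev/null 2>&1; then
        sysctl -n hw.ncpu 2>/dev/null || echo 1    # macOS / BSD
    else
        echo 1                                     # assumption: one worker
    fi
}

resolve_max_len() {
    # 10 MiB = 10 * 1024 * 1024 = 10485760 bytes
    echo "${FUZZ_MAX_LEN:-$((10 * 1024 * 1024))}"
}

echo "jobs=$(resolve_fuzz_jobs) max_len=$(resolve_max_len)"
```

The resolved values can then be passed to the `--jobs` and `--max-len` flags listed in the CLI flags reference.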

Version pinning

Target versions are pinned in VERSIONS.env at the project root for CI reproducibility:

LLAMA_CPP_VERSION=66c4f9d
OLLAMA_VERSION=v0.20.2
WHISPER_CPP_VERSION=v1.8.2
SD_CPP_VERSION=master-471-7010bb4
GO_MIN_VERSION=1.23.0
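
Because VERSIONS.env consists of plain KEY=VALUE lines, a shell script can source it and use the pinned values directly. A minimal sketch, assuming the file contains only simple assignments like those shown above and that `LLAMA_CPP` points at a git checkout; `load_versions` is a hypothetical helper, not a Crucible command.

```shell
#!/bin/sh
# Sketch: export every KEY=VALUE pair from a VERSIONS.env-style file.
# load_versions is a hypothetical helper, not part of Crucible.

load_versions() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    set -a          # auto-export every variable assigned while sourcing
    . "$1"
    set +a
}

# Usage (assumes LLAMA_CPP is a git checkout of llama.cpp):
#   load_versions ./VERSIONS.env &&
#   git -C "$LLAMA_CPP" checkout "$LLAMA_CPP_VERSION"
```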

Persistent configuration

Export variables in your shell profile to avoid repeating them:

export LLAMA_CPP="$HOME/src/llama.cpp"
export FUZZ_JOBS=8

CLI flags reference

Each binary accepts its own set of flags. The tables below summarize them; see the CLI section of the documentation for the full reference.

crucible

Flag	Description
`--harness`	Path to the compiled harness binary
`--corpus`	Corpus directory (passed to engine)
`--output`	Output directory for crashes
`--jobs`	Parallel worker count
`--engine`	`libfuzzer` (default) or `afl`
`--timeout`	Per-testcase timeout
`--max-len`	Maximum input size in bytes
`--dict`	Fuzzer dictionary path (auto-detected from corpus dir if empty)
`--dry-run`	Print configuration and exit
`--help`	Show help

crucible-gen

Flag	Description
`--output`	Directory to write generated seeds
`--count`	Number of mutated files to generate
`--seed`	Random seed (`0` = time-based)
`--mutate`	Also generate mutated variants
`--talos`	Generate Talos CVE-targeted seeds
`--arch`	Generate architecture-targeted seeds
`--clip`	Generate clip/vision-model seeds
`--help`	Show help

crucible-triage

Flag	Description
`--crashes`	Directory containing crash inputs
`--output`	Directory for report output
`--harness`	Harness binary (required for binary crash replay)
`--minimize`	Minimize crash reproducers (recurses into subdirectories)
`--replay-timeout`	Timeout per replay execution (default `30s`)
`--replay-env`	Extra env vars for replay (`KEY=VALUE`, repeatable)
`--target`	Target surface for reports (auto-detected from harness)
`--sarif`	Write SARIF 2.1.0 output to this file path
`--help`	Show help

Directory structure

A typical Crucible workspace looks like this:

crucible/
  corpus/
    minimal/        # Hand-crafted minimal valid GGUF files
    real/           # Real models downloaded from HuggingFace
    generated/      # Output of crucible-gen
    reconstructed/  # CVE-targeted seeds (crucible-gen --talos)
    targeted/       # Targeted seeds (crucible-gen --arch / --clip)
    arch-seeds/     # Per-architecture dispatch seeds (100+ architectures)
    grammar/        # GBNF grammar test cases
    jinja/          # Jinja template test cases
    json-schema/    # JSON Schema test cases
    rpc/            # RPC protocol message seeds
    server/         # HTTP endpoint test cases
    lora/           # LoRA adapter format seeds
    whisper/        # whisper.cpp model seeds
    whisper-audio/  # PCM audio seeds for whisper inference
    tflite/         # TensorFlow Lite model seeds
    torchscript/    # PyTorch TorchScript model seeds
    safetensors/    # SafeTensors format seeds
    onnx/           # ONNX model format seeds
    gguf.dict       # GGUF-specific fuzzer dictionary
    grammar.dict    # GBNF grammar dictionary
    jinja.dict      # Jinja template dictionary
    json-schema.dict # JSON Schema dictionary
    pytorch.dict    # PyTorch-specific dictionary
    rpc.dict        # RPC protocol dictionary
    server.dict     # HTTP/JSON endpoint dictionary
    tflite.dict     # TFLite operator dictionary
  crashes/          # Crashing inputs discovered during campaigns
  reports/          # Triage reports (one per unique crash)

Campaign directories

Active fuzzing campaigns may create additional subdirectories under corpus/ (e.g., campaign-deep/, libfuzzer-campaign/). These are runtime artifacts managed by the fuzzing engine and do not need to be committed.

Note

The --corpus flag passes the directory directly to the fuzzing engine. libFuzzer reads top-level files in the corpus directory; it does not recurse into subdirectories. Point --corpus at a flat directory containing your seed files, or use crucible generate --output ./seeds to create one.
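
Since libFuzzer only reads top-level files, seeds spread across several corpus/ subdirectories have to be copied into one flat directory before a run. A minimal sketch using plain `find` and `cp`; `flatten_corpus` and the ./seeds path are illustrative and not part of Crucible. Each copy gets a counter prefix so that identically named seeds from different subdirectories do not overwrite one another. The destination should live outside the source directories, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: copy seed files from nested directories into one flat directory.
# flatten_corpus is a hypothetical helper, not a Crucible command.

flatten_corpus() {
    dst=$1
    shift                                  # remaining args: source directories
    mkdir -p "$dst" || return 1
    n=0
    # Skip *.dict files: they are dictionaries, not seeds.
    find "$@" -type f ! -name '*.dict' | while IFS= read -r f; do
        n=$((n + 1))
        cp "$f" "$dst/$(printf '%06d' "$n")-$(basename "$f")"
    done
}

# Usage:
#   flatten_corpus ./seeds ./corpus/minimal ./corpus/generated
#   crucible run --harness ./crucible-libfuzzer --corpus ./seeds --output ./crashes
```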

Fuzzer dictionary

The file corpus/gguf.dict contains tokens and byte sequences that are significant to the GGUF binary format. Both libFuzzer and AFL++ can consume this dictionary to guide mutation toward structurally meaningful changes.

The dictionary includes:

Magic bytes — the GGUF file signature
Version constants — known GGUF version numbers
Metadata key prefixes — common key strings such as general.architecture and tokenizer.ggml.model
Tensor dtype identifiers — enum values for F16, Q4_0, Q8_0, etc.
Alignment values — common padding boundaries (32, 64, 128)

Using the dictionary

libFuzzer:

crucible run \
  --harness ./crucible-libfuzzer \
  --corpus ./corpus/generated \
  --output ./crashes

AFL++:

crucible run \
  --harness ./crucible-afl \
  --corpus ./corpus/generated \
  --output ./crashes \
  --engine afl

Makefile:

The run-libfuzzer and run-afl Makefile targets automatically pass corpus/gguf.dict if the file exists. No extra flags needed.

Extending the dictionary

If you are fuzzing a custom GGUF extension, add its magic values and key strings to gguf.dict. Add one token per line, using the libFuzzer dictionary format:

# Custom extension tokens
kw_custom="my_custom.key"
magic_ext="\x43\x55\x53\x54"
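
A malformed dictionary line can cause the fuzzing engine to reject the file at startup, so it can be worth checking gguf.dict after editing it. The sketch below only approximates the libFuzzer dictionary syntax (blank lines, `#` comments, and quoted values with an optional name) and is not a full parser; `check_dict` is a hypothetical helper, not a Crucible command.

```shell
#!/bin/sh
# Sketch: flag lines that do not look like libFuzzer dictionary entries.
# check_dict is a hypothetical helper; the patterns are an approximation.

check_dict() {
    # Accepted forms: blank lines, "# comment" lines, name="value", "value".
    # grep -v prints (with line numbers) every line matching neither pattern;
    # the leading "!" makes the function succeed only when nothing is printed.
    ! grep -nEv \
        -e '^[[:space:]]*(#.*)?$' \
        -e '^[[:space:]]*([A-Za-z0-9_]+=)?".*"[[:space:]]*$' \
        "$1"
}

# Usage:
#   check_dict corpus/gguf.dict && make run-libfuzzer
```

Offending lines are printed with their line numbers, which makes a bad entry easy to locate.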

Next: See the Architecture section for details on how Crucible's mutation engine targets GGUF structure.