Configuration

Makefile targets

The project Makefile is the primary build and run interface. All targets respect the environment variables listed below.

Target             Description
build              Compile all Go binaries (crucible, crucible-gen, crucible-triage)
test               Run Go unit tests (./cmd/... and ./pkg/...)
generate           Run crucible-gen to create a seed corpus
harness-libfuzzer  Build the libFuzzer harness (requires LLAMA_CPP)
harness-afl        Build the AFL++ harness (requires LLAMA_CPP)
run-libfuzzer      Build and run a libFuzzer campaign
run-afl            Build and run an AFL++ campaign
triage             Triage crashes in ./crashes and write reports
triage-libfuzzer   Replay binary crashes through harness and generate reports
fuzz-go            Run Go-native fuzz tests (go test -fuzz)
fuzz-mutator       Fuzz the mutation engine itself
fuzz-roundtrip     Fuzz the GGUF encode/decode round-trip path
coverage           Collect code coverage from corpus replay
test-short         Run Go unit tests with a shorter timeout
lint               Run go vet and go fmt
vet                Run go vet static analysis
fmt                Run go fmt code formatting
clean              Remove built binaries, crash artifacts, and generated corpus

Chaining targets

make generate harness-libfuzzer run-libfuzzer LLAMA_CPP=../llama.cpp FUZZ_JOBS=4

Environment variables

Set these before invoking make or the CLI tools directly.

Variable      Default                    Description
LLAMA_CPP     ~/src/llama.cpp            Path to a local llama.cpp checkout (required for harness targets)
FUZZ_JOBS     nproc / sysctl -n hw.ncpu  Number of parallel fuzzing workers
FUZZ_TIMEOUT  0 (unlimited)              Campaign timeout in seconds (0 = run until stopped)
FUZZ_MAX_LEN  10485760 (10 MiB)          Maximum input size in bytes fed to the harness
GO            go                         Path to the Go binary (useful for non-standard installs)
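
The defaults in the table can be reproduced in a wrapper script when you call the CLI tools directly instead of going through make. This is a minimal sketch, assuming the defaults behave as the table describes; cpu_count is a hypothetical helper, and its final fallback to 1 is an addition that the table does not mention.

```shell
# Sketch only: mirrors the documented defaults; the real Makefile may differ.
cpu_count() {
  # nproc is typical on Linux; sysctl -n hw.ncpu on macOS/BSD.
  # The trailing "echo 1" is an assumed last-resort fallback.
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

# ":=" assigns the default only when the variable is unset or empty,
# so values exported by the caller always win.
: "${LLAMA_CPP:=$HOME/src/llama.cpp}"
: "${FUZZ_JOBS:=$(cpu_count)}"
: "${FUZZ_TIMEOUT:=0}"                      # 0 = run until stopped
: "${FUZZ_MAX_LEN:=$((10 * 1024 * 1024))}"  # 10 MiB = 10485760 bytes
: "${GO:=go}"

echo "jobs=$FUZZ_JOBS timeout=$FUZZ_TIMEOUT max_len=$FUZZ_MAX_LEN"
```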

Version pinning

Target versions are pinned in VERSIONS.env at the project root for CI reproducibility:

LLAMA_CPP_VERSION=66c4f9d
OLLAMA_VERSION=v0.20.2
WHISPER_CPP_VERSION=v1.8.2
SD_CPP_VERSION=master-471-7010bb4
GO_MIN_VERSION=1.23.0
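
Because VERSIONS.env consists of plain KEY=VALUE lines, a POSIX shell can source it directly. The sketch below shows one way to load the pins into the environment; load_versions is a hypothetical helper and not part of Crucible.

```shell
# Load pinned versions from a KEY=VALUE file into the current shell.
# load_versions is a hypothetical helper name used for illustration.
load_versions() {
  file=${1:-VERSIONS.env}
  # "." searches PATH for names without a slash, so anchor bare names to ./
  case $file in
    */*) ;;
    *) file=./$file ;;
  esac
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  set -a      # export every variable assigned while sourcing
  . "$file"
  set +a
}
```

After load_versions ./VERSIONS.env, a CI step could run something like git -C "$LLAMA_CPP" checkout "$LLAMA_CPP_VERSION", assuming the pinned value is a commit or tag in that checkout.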

Persistent configuration

Export variables in your shell profile to avoid repeating them:

export LLAMA_CPP="$HOME/src/llama.cpp"
export FUZZ_JOBS=8

CLI flags reference

Each binary accepts its own set of flags. The tables below are a summary; see the CLI section of the documentation for the full reference.

crucible

Flag       Description
--harness  Path to the compiled harness binary
--corpus   Corpus directory (passed to engine)
--output   Output directory for crashes
--jobs     Parallel worker count
--engine   libfuzzer (default) or afl
--timeout  Per-testcase timeout
--max-len  Maximum input size in bytes
--dict     Fuzzer dictionary path (auto-detected from corpus dir if empty)
--dry-run  Print configuration and exit
--help     Show help

crucible-gen

Flag      Description
--output  Directory to write generated seeds
--count   Number of mutated files to generate
--seed    Random seed (0 = time-based)
--mutate  Also generate mutated variants
--talos   Generate Talos CVE-targeted seeds
--arch    Generate architecture-targeted seeds
--clip    Generate clip/vision-model seeds
--help    Show help

crucible-triage

Flag              Description
--crashes         Directory containing crash inputs
--output          Directory for report output
--harness         Harness binary (required for binary crash replay)
--minimize        Minimize crash reproducers (recurses into subdirectories)
--replay-timeout  Timeout per replay execution (default 30s)
--replay-env      Extra env vars for replay (KEY=VALUE, repeatable)
--target          Target surface for reports (auto-detected from harness)
--sarif           Write SARIF 2.1.0 output to this file path
--help            Show help
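
Since --replay-env is repeatable, each KEY=VALUE pair gets its own flag. The sketch below assembles such a command line and prints it rather than running it. triage_cmd is a hypothetical helper, the paths are placeholders, and the sanitizer settings are only examples of variables one might forward.

```shell
# Print a crucible-triage invocation with one --replay-env per argument.
# Values containing spaces would need extra quoting; this sketch ignores that.
triage_cmd() {
  printf '%s' 'crucible-triage --crashes ./crashes --output ./reports --harness ./crucible-libfuzzer'
  for kv in "$@"; do
    printf ' --replay-env %s' "$kv"
  done
  printf '\n'
}

triage_cmd ASAN_OPTIONS=detect_leaks=0 UBSAN_OPTIONS=print_stacktrace=1
```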

Directory structure

A typical Crucible workspace looks like this:

crucible/
  corpus/
    minimal/        # Hand-crafted minimal valid GGUF files
    real/           # Real models downloaded from HuggingFace
    generated/      # Output of crucible-gen
    reconstructed/  # CVE-targeted seeds (crucible-gen --talos)
    targeted/       # Targeted seeds (crucible-gen --arch / --clip)
    arch-seeds/     # Per-architecture dispatch seeds (100+ architectures)
    grammar/        # GBNF grammar test cases
    jinja/          # Jinja template test cases
    json-schema/    # JSON Schema test cases
    rpc/            # RPC protocol message seeds
    server/         # HTTP endpoint test cases
    lora/           # LoRA adapter format seeds
    whisper/        # whisper.cpp model seeds
    whisper-audio/  # PCM audio seeds for whisper inference
    tflite/         # TensorFlow Lite model seeds
    torchscript/    # PyTorch TorchScript model seeds
    safetensors/    # SafeTensors format seeds
    onnx/           # ONNX model format seeds
    gguf.dict       # GGUF-specific fuzzer dictionary
    grammar.dict    # GBNF grammar dictionary
    jinja.dict      # Jinja template dictionary
    json-schema.dict # JSON Schema dictionary
    pytorch.dict    # PyTorch-specific dictionary
    rpc.dict        # RPC protocol dictionary
    server.dict     # HTTP/JSON endpoint dictionary
    tflite.dict     # TFLite operator dictionary
  crashes/          # Crashing inputs discovered during campaigns
  reports/          # Triage reports (one per unique crash)

Campaign directories

Active fuzzing campaigns may create additional subdirectories under corpus/ (e.g., campaign-deep/, libfuzzer-campaign/). These are runtime artifacts managed by the fuzzing engine and do not need to be committed.

Note

The --corpus flag passes the directory directly to the fuzzing engine. libFuzzer reads only the top-level files in that directory and does not recurse into subdirectories, so point --corpus at a flat directory containing your seed files, or run crucible-gen --output ./seeds to create one.
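
Following the note above, seeds spread across several corpus subdirectories can be copied into one flat directory before a campaign. This is a minimal sketch: flatten_corpus is a hypothetical helper, the destination is assumed to lie outside the source tree, and the path-based naming can collide in rare cases (for example a_b/c and a/b_c).

```shell
# Copy every seed under SRC into the flat directory DST.
# Dictionaries (*.dict) are skipped because they are not seed inputs.
flatten_corpus() {
  src=${1%/}
  dst=$2
  mkdir -p "$dst" || return 1
  find "$src" -type f ! -name '*.dict' | while IFS= read -r f; do
    rel=${f#"$src"/}                              # path relative to SRC
    name=$(printf '%s' "$rel" | tr '/' '_')       # minimal/a.gguf -> minimal_a.gguf
    cp "$f" "$dst/$name"
  done
}
```

For example, flatten_corpus ./corpus ./seeds produces a directory that can be passed as --corpus ./seeds.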


Fuzzer dictionary

The file corpus/gguf.dict contains tokens and byte sequences that are significant to the GGUF binary format. Both libFuzzer and AFL++ can consume this dictionary to guide mutation toward structurally meaningful changes.

The dictionary includes:

  • Magic bytes — the GGUF file signature
  • Version constants — known GGUF version numbers
  • Metadata key prefixes — common key strings such as general.architecture and tokenizer.ggml.model
  • Tensor dtype identifiers — enum values for F16, Q4_0, Q8_0, etc.
  • Alignment values — common padding boundaries (32, 64, 128)

Using the dictionary

With libFuzzer (the default engine):

crucible run \
  --harness ./crucible-libfuzzer \
  --corpus ./corpus/generated \
  --output ./crashes

With AFL++:

crucible run \
  --harness ./crucible-afl \
  --corpus ./corpus/generated \
  --output ./crashes \
  --engine afl

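Neither command passes --dict: when that flag is empty, crucible auto-detects the dictionary from the corpus directory. The run-libfuzzer and run-afl Makefile targets also pass corpus/gguf.dict automatically if the file exists, so no extra flags are needed.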

Extending the dictionary

If you are fuzzing a custom GGUF extension, add its magic values and key strings to gguf.dict. Put one token on each line, using the libFuzzer dictionary format:

# Custom extension tokens
kw_custom="my_custom.key"
magic_ext="\x43\x55\x53\x54"
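
Hex-escaping binary tokens by hand is error-prone, so an entry like magic_ext above can be generated instead. The sketch below escapes every byte of a value as \xNN; dict_entry is a hypothetical helper, and it assumes the value contains no NUL bytes (shell strings cannot hold them).

```shell
# Emit one dictionary line: NAME="<value with every byte as \xNN>".
dict_entry() {
  name=$1
  value=$2
  # od prints the bytes as hex pairs; tr strips the spacing; sed adds \x.
  hex=$(printf '%s' "$value" | od -An -v -tx1 | tr -d ' \n' | sed 's/../\\x&/g')
  printf '%s="%s"\n' "$name" "$hex"
}

dict_entry magic_ext CUST   # prints magic_ext="\x43\x55\x53\x54"
```

Escaping every byte is more verbose than writing printable keys literally, but it sidesteps quoting questions for characters such as " and \.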

Next: See the Architecture section for details on how Crucible's mutation engine targets GGUF structure.