Regression Testing

Regression testing verifies that Crucible can rediscover known vulnerabilities when run against previously vulnerable code. This validates the fuzzer's effectiveness and ensures mutation strategies cover real-world bug patterns.

Why Regression Test?

  • Validate strategy coverage — Confirm that mutation strategies actually trigger known bug classes
  • Benchmark effectiveness — Measure time-to-crash against a known-vulnerable target
  • Test new strategies — When adding mutations, verify that the updated set still finds the bugs the old set could
  • Confidence building — Prove the fuzzer works before committing to long campaigns against patched code

Version Pinning

The llama.cpp target supports pinning to specific commits or tags:

cd targets/llamacpp
make pin-version LLAMA_CPP_VERSION=b3561

This checks out the specified version and rebuilds the harness. Common versions for regression testing:

| Version | Known Vulnerability | Expected Finding |
|---------|---------------------|------------------|
| b3561 | CVE-2024-23496 (heap overflow in metadata parsing) | Crash via metadata.key_length or metadata.string_value |
| b3561 | Cisco Talos tensor info bugs (TALOS-2024-1912 through 1916) | Crash via tensorinfo.dim_overflow or tensorinfo.offset |

TALOS CVE-Targeted Seeds

Beyond version pinning, Crucible provides reconstruction seeds that target specific vulnerability patterns from published CVEs. These are particularly useful for validating that mutation strategies cover known bug classes:

crucible-gen --talos

This produces seeds in corpus/reconstructed/ that exercise:

| Seed | CVE Pattern | What It Exercises |
|------|-------------|-------------------|
| talos-1912-array-string-overflow.gguf | TALOS-2024-1912 | Array string length overflow |
| talos-1913-string-length-wrap.gguf | TALOS-2024-1913 | String length integer wrap |
| talos-1914-ndims-oob.gguf | TALOS-2024-1914 | Out-of-bounds dimension count |
| talos-1915-tensor-count-overflow.gguf | TALOS-2024-1915 | Tensor count integer overflow |
| talos-1916-kv-count-overflow.gguf | TALOS-2024-1916 | Metadata KV count overflow |
| databricks-array-size-wrap.gguf | Databricks CVEs | Array size integer wrap |
| databricks-type-index-oob.gguf | Databricks CVEs | Type index out-of-bounds |

These seeds prime the fuzzer to explore code paths around known bug patterns, increasing the chance of finding regressions or patch bypasses.
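Before starting a campaign it can help to confirm that every seed listed above actually landed in the corpus. The following is a minimal sketch, not part of Crucible: check_seeds is a hypothetical helper, and it assumes the seeds keep the file names shown in the table.

```shell
# Seed file names are taken from the table above.
# check_seeds is a hypothetical helper, not a Crucible command.
EXPECTED_SEEDS="
talos-1912-array-string-overflow.gguf
talos-1913-string-length-wrap.gguf
talos-1914-ndims-oob.gguf
talos-1915-tensor-count-overflow.gguf
talos-1916-kv-count-overflow.gguf
databricks-array-size-wrap.gguf
databricks-type-index-oob.gguf
"

# Prints one line per missing seed; returns 0 only if all seeds are present.
check_seeds() {
  local dir="${1:-corpus/reconstructed}" missing=0 seed
  for seed in $EXPECTED_SEEDS; do
    if [ ! -f "$dir/$seed" ]; then
      echo "missing: $seed"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_seeds corpus/reconstructed
```

A non-zero return suggests re-running crucible-gen --talos before the campaign.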

Regression validation strategy

Run TALOS seeds against a pinned vulnerable version first to confirm they trigger the expected crash, then run against the latest version to verify the fix holds. Historical CVE data shows that integer overflow patches in GGUF parsers have been bypassed at least twice (CVE-2025-53630 → CVE-2026-27940 → CVE-2026-33298).

Workflow

1. Pin to a Vulnerable Version

make -C targets/llamacpp pin-version LLAMA_CPP_VERSION=b3561

2. Build the Harness

make harness-libfuzzer LLAMA_CPP=~/src/llama.cpp

3. Generate CVE-Targeted Seeds

crucible-gen --talos

This produces reconstruction seeds in corpus/reconstructed/ based on known vulnerability patterns from Cisco Talos and other researchers.

4. Run a Focused Campaign

crucible run \
  --harness ./crucible-libfuzzer \
  --corpus ./corpus \
  --output ./crashes/regression \
  --jobs 8 \
  --timeout 30s

5. Verify Crashes

crucible triage --crashes ./crashes/regression --output ./reports/regression

Expected outcome

Against b3561, a well-seeded campaign should find heap buffer overflow crashes within the first hour. If the regression test does not find known bugs, investigate whether:

  • The CVE-targeted seeds are present in the corpus
  • The harness exercises the affected code path
  • Sanitizers are properly enabled in the build
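The sanitizer item in the checklist above can be approximated with a quick symbol check. This is a heuristic sketch under an assumption: an AddressSanitizer build normally references the runtime entry point __asan_init, so its absence suggests the harness was built without ASan. has_asan is a hypothetical helper name.

```shell
# has_asan is a hypothetical helper; the check is a heuristic, not a guarantee.
# Returns 0 if the ASan runtime symbol is present, 1 if absent, 2 if the file is missing.
has_asan() {
  local harness="$1"
  if [ ! -f "$harness" ]; then
    echo "no such file: $harness" >&2
    return 2
  fi
  # grep reports a match in a binary file through its exit status.
  if grep -q '__asan_init' "$harness"; then
    echo "ASan symbols found in $harness"
  else
    echo "no ASan symbols in $harness; check the sanitizer flags in the build"
    return 1
  fi
}

# Usage: has_asan ./crucible-libfuzzer
```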

Comparing Versions

Run the same reproducer against both the vulnerable and patched versions to confirm a fix:

# Against vulnerable version
./crucible-libfuzzer-b3561 crashes/regression/crash-abc123

# Against patched version
./crucible-libfuzzer crashes/regression/crash-abc123

If the crash reproduces on the old version but not the new one, the fix is confirmed for that input. If it crashes on both, either the fix is incomplete or the input reaches a different bug; compare the sanitizer stack traces to tell which.
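The comparison can be scripted so the verdict is explicit. The sketch below assumes each harness is a libFuzzer binary that replays a single input file and exits non-zero when the input crashes it; compare_fix is a hypothetical helper name.

```shell
# compare_fix is a hypothetical helper, not a Crucible command.
# Arguments: vulnerable harness, patched harness, reproducer file.
# A non-zero exit from a harness is treated as a crash.
compare_fix() {
  local old_harness="$1" new_harness="$2" reproducer="$3"
  local old_rc=0 new_rc=0
  "$old_harness" "$reproducer" >/dev/null 2>&1 || old_rc=$?
  "$new_harness" "$reproducer" >/dev/null 2>&1 || new_rc=$?

  if [ "$old_rc" -ne 0 ] && [ "$new_rc" -eq 0 ]; then
    echo "fix confirmed"
  elif [ "$old_rc" -ne 0 ]; then
    echo "still crashes"
    return 1
  else
    echo "not reproduced on vulnerable build"
    return 2
  fi
}

# Usage: compare_fix ./crucible-libfuzzer-b3561 ./crucible-libfuzzer crashes/regression/crash-abc123
```

The third outcome matters: a reproducer that does not crash the vulnerable build says nothing about the patch.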

Adding New Regression Targets

When Crucible discovers a new vulnerability:

  1. Record the affected commit and the reproducer file
  2. Add the reproducer to corpus/minimal/ for future seeding
  3. Document the version and expected crash in this workflow
  4. Re-run the regression tests whenever Crucible's mutation strategies change
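The first three steps can be captured in a small helper so nothing is forgotten. This is a sketch only: add_regression_target, the manifest file name regression-targets.tsv, and its tab-separated layout are assumptions made for illustration, not Crucible conventions.

```shell
# add_regression_target is a hypothetical helper, not a Crucible command.
# Arguments: reproducer file, affected commit or tag, expected crash description,
# optional corpus directory, optional manifest path (both defaults are assumptions).
add_regression_target() {
  local reproducer="$1" commit="$2" expected="$3"
  local corpus_dir="${4:-corpus/minimal}"
  local manifest="${5:-regression-targets.tsv}"
  local name
  # Prefix the seed with the affected version so its origin stays visible.
  name="$commit-$(basename "$reproducer")"
  mkdir -p "$corpus_dir" || return 1
  cp "$reproducer" "$corpus_dir/$name" || return 1
  # One line per target: seed name, version, expected crash.
  printf '%s\t%s\t%s\n' "$name" "$commit" "$expected" >> "$manifest"
}

# Usage: add_regression_target crashes/regression/crash-abc123 b3561 "heap-buffer-overflow"
```

The manifest is kept outside the corpus directory so the fuzzer does not pick it up as a seed.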

Multi-Target Regression

Crucible supports regression testing across multiple projects. When a vulnerability is found in shared code (e.g., gguf.cpp used by llama.cpp, whisper.cpp, and MLX), test the reproducer against all consumers:

# Test a GGUF reproducer against multiple targets
./crucible-libfuzzer crashes/regression/crash-abc123          # llama.cpp
./crucible-libfuzzer-whisper crashes/regression/crash-abc123  # whisper.cpp
./crucible-libfuzzer-mlx-gguf crashes/regression/crash-abc123 # Apple MLX

This cross-project validation has revealed cases where a fix applied to llama.cpp was not backported to downstream consumers, or where a vendored fork (like Ollama's llama.cpp) lags behind upstream security patches.
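When the list of consumers grows, a loop gives a one-line summary per target. The sketch below makes the same assumption as a manual replay: each harness takes the reproducer path as its only argument and exits non-zero on a crash. replay_all is a hypothetical helper name.

```shell
# replay_all is a hypothetical helper, not a Crucible command.
# First argument: reproducer file. Remaining arguments: harness binaries.
# Any non-zero exit is counted as a crash, which includes a missing harness,
# so confirm the binaries exist before trusting the summary.
replay_all() {
  local reproducer="$1" harness affected=0
  shift
  for harness in "$@"; do
    if "$harness" "$reproducer" >/dev/null 2>&1; then
      echo "ok      $harness"
    else
      echo "CRASH   $harness"
      affected=$((affected + 1))
    fi
  done
  echo "$affected of $# targets affected"
  # Succeed only when no target crashed.
  [ "$affected" -eq 0 ]
}

# Usage:
#   replay_all crashes/regression/crash-abc123 \
#     ./crucible-libfuzzer ./crucible-libfuzzer-whisper ./crucible-libfuzzer-mlx-gguf
```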