Regression Testing

Regression testing verifies that Crucible can rediscover known vulnerabilities when run against previously vulnerable code. This validates the fuzzer's effectiveness and checks that its mutation strategies cover real-world bug patterns.

Why Regression Test?

Validate strategy coverage — Confirm that mutation strategies actually trigger known bug classes
Benchmark effectiveness — Measure time-to-crash against a known-vulnerable target
Test new strategies — When adding mutations, verify they can find bugs the old set could
Confidence building — Prove the fuzzer works before committing to long campaigns against patched code

Version Pinning

The llama.cpp target supports pinning to specific commits or tags:

cd targets/llamacpp
make pin-version LLAMA_CPP_VERSION=b3561

This checks out the specified version and rebuilds the harness. Common versions for regression testing:

Version	Known Vulnerability	Expected Finding
`b3561`	CVE-2024-23496 (heap overflow in metadata parsing)	Crash via `metadata.key_length` or `metadata.string_value`
`b3561`	Cisco Talos tensor info bugs (TALOS-2024-1912 through 1916)	Crash via `tensorinfo.dim_overflow` or `tensorinfo.offset`
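The version table can also be kept as data so that a triage run is checked against it automatically. The sketch below is a hypothetical helper, not part of Crucible: it assumes triage can attribute each crash to the mutation strategy that produced it (the `metadata.*` and `tensorinfo.*` names from the table), and it counts a row as rediscovered when any of that row's strategies appears, mirroring the "Crash via X or Y" wording.

```python
# Hypothetical encoding of the version table: version -> [(vulnerability, strategies)].
# A row is satisfied if ANY of its strategies produced a crash ("Crash via X or Y").
EXPECTED_FINDINGS: dict[str, list[tuple[str, set[str]]]] = {
    "b3561": [
        ("CVE-2024-23496", {"metadata.key_length", "metadata.string_value"}),
        ("TALOS-2024-1912..1916", {"tensorinfo.dim_overflow", "tensorinfo.offset"}),
    ],
}


def check_findings(version: str, observed: set[str]) -> dict[str, bool]:
    """Map each expected vulnerability for `version` to whether it was rediscovered.

    `observed` is the set of strategy names that triage attributed to crashes
    (an assumption; adapt it to whatever your triage report actually records).
    Unknown versions yield an empty result.
    """
    return {
        vuln: bool(strategies & observed)
        for vuln, strategies in EXPECTED_FINDINGS.get(version, [])
    }
```

For example, `check_findings("b3561", {"metadata.key_length"})` returns `{"CVE-2024-23496": True, "TALOS-2024-1912..1916": False}`, flagging the tensor-info bugs as not yet rediscovered.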

TALOS CVE-Targeted Seeds

Beyond version pinning, Crucible provides reconstruction seeds that target specific vulnerability patterns from published CVEs. These are particularly useful for validating that mutation strategies cover known bug classes:

crucible-gen --talos

This produces seeds in corpus/reconstructed/ that exercise:

Seed	CVE Pattern	What It Exercises
`talos-1912-array-string-overflow.gguf`	TALOS-2024-1912	Array string length overflow
`talos-1913-string-length-wrap.gguf`	TALOS-2024-1913	String length integer wrap
`talos-1914-ndims-oob.gguf`	TALOS-2024-1914	Out-of-bounds dimension count
`talos-1915-tensor-count-overflow.gguf`	TALOS-2024-1915	Tensor count integer overflow
`talos-1916-kv-count-overflow.gguf`	TALOS-2024-1916	Metadata KV count overflow
`databricks-array-size-wrap.gguf`	Databricks CVEs	Array size integer wrap
`databricks-type-index-oob.gguf`	Databricks CVEs	Type index out-of-bounds

These seeds prime the fuzzer to explore code paths around known bug patterns, increasing the chance of finding regressions or patch bypasses.
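To make the header-level patterns concrete, the sketch below builds minimal inputs in the spirit of the `talos-1913`, `talos-1915`, and `talos-1916` seeds. It is our own illustration, not the output of `crucible-gen`, and it assumes the GGUF v3 little-endian layout: a 4-byte `GGUF` magic, a `uint32` version, a `uint64` tensor count, a `uint64` metadata KV count, then KV pairs whose strings are a `uint64` length followed by the bytes, with value type 8 meaning string. Each input declares sizes the file cannot back, which is what a parser must reject before allocating or reading.

```python
import struct
from typing import Optional

GGUF_MAGIC = b"GGUF"
GGUF_VERSION = 3
GGUF_TYPE_STRING = 8  # value-type tag for strings in the GGUF format
U64_MAX = 2**64 - 1


def gguf_header(tensor_count: int, kv_count: int, version: int = GGUF_VERSION) -> bytes:
    """Magic, uint32 version, uint64 tensor count, uint64 KV count (little-endian)."""
    return GGUF_MAGIC + struct.pack("<IQQ", version, tensor_count, kv_count)


def gguf_string(data: bytes, declared_len: Optional[int] = None) -> bytes:
    """A GGUF string: uint64 length prefix, then bytes. `declared_len` may lie."""
    length = len(data) if declared_len is None else declared_len
    return struct.pack("<Q", length) + data


def kv_count_overflow_seed() -> bytes:
    # Metadata KV count overflow (TALOS-2024-1916 pattern): claims 2^64-1 pairs
    # but provides none; an unchecked count * entry_size multiplication can wrap.
    return gguf_header(tensor_count=0, kv_count=U64_MAX)


def tensor_count_overflow_seed() -> bytes:
    # Tensor count integer overflow (TALOS-2024-1915 pattern): same idea.
    return gguf_header(tensor_count=U64_MAX, kv_count=0)


def string_length_wrap_seed() -> bytes:
    # String length integer wrap (TALOS-2024-1913 pattern): one KV pair whose
    # string value declares length 2^64-1; computing length + 1 for a NUL
    # terminator in 64-bit arithmetic can wrap to 0.
    key = gguf_string(b"general.name")
    value = gguf_string(b"", declared_len=U64_MAX)
    return (
        gguf_header(tensor_count=0, kv_count=1)
        + key
        + struct.pack("<I", GGUF_TYPE_STRING)
        + value
    )
```

Writing these bytes to `.gguf` files gives small, hand-checkable inputs that can sit next to the generated seeds when debugging why a strategy does or does not reach a particular check.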

Regression validation strategy

Run TALOS seeds against a pinned vulnerable version first to confirm they trigger the expected crash, then run against the latest version to verify the fix holds. Historical CVE data shows that integer overflow patches in GGUF parsers have been bypassed at least twice (CVE-2025-53630 → CVE-2026-27940 → CVE-2026-33298).

Workflow

1. Pin to a Vulnerable Version

make -C targets/llamacpp pin-version LLAMA_CPP_VERSION=b3561

2. Build the Harness

make harness-libfuzzer LLAMA_CPP=~/src/llama.cpp

3. Generate CVE-Targeted Seeds

crucible-gen --talos

This produces reconstruction seeds in corpus/reconstructed/ based on known vulnerability patterns from Cisco Talos and other researchers.

4. Run a Focused Campaign

crucible run \
  --harness ./crucible-libfuzzer \
  --corpus ./corpus \
  --output ./crashes/regression \
  --jobs 8 \
  --timeout 30s

5. Verify Crashes

crucible triage --crashes ./crashes/regression --output ./reports/regression

Expected outcome

Against b3561, a well-seeded campaign should find heap buffer overflow crashes within the first hour. If the regression test does not find known bugs, investigate whether:

The CVE-targeted seeds are present in the corpus
The harness exercises the affected code path
Sanitizers are properly enabled in the build
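The first and third checks can be scripted before a campaign starts. The sketch below is a hypothetical helper: the seed names come from the TALOS table, the `corpus/reconstructed/` location is where `crucible-gen --talos` writes them, and the sanitizer check is only a heuristic that looks for the `__asan_init` runtime symbol in the harness binary, which an unusually linked or stripped build may not expose.

```python
from pathlib import Path

# Seed names from the TALOS / Databricks seed table.
CVE_SEEDS = (
    "talos-1912-array-string-overflow.gguf",
    "talos-1913-string-length-wrap.gguf",
    "talos-1914-ndims-oob.gguf",
    "talos-1915-tensor-count-overflow.gguf",
    "talos-1916-kv-count-overflow.gguf",
    "databricks-array-size-wrap.gguf",
    "databricks-type-index-oob.gguf",
)


def missing_seeds(corpus: Path) -> list[str]:
    """Names of CVE-targeted seeds absent from <corpus>/reconstructed/."""
    reconstructed = corpus / "reconstructed"
    return [name for name in CVE_SEEDS if not (reconstructed / name).is_file()]


def looks_asan_instrumented(binary: Path) -> bool:
    """Heuristic: ASan builds normally carry the __asan_init runtime symbol."""
    return b"__asan_init" in binary.read_bytes()
```

A non-empty `missing_seeds(Path("corpus"))` suggests rerunning `crucible-gen --talos`, and `looks_asan_instrumented(Path("crucible-libfuzzer"))` returning `False` is a cue to inspect the harness build flags.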

Comparing Versions

Run the same reproducer against harnesses built from the vulnerable and the patched versions to confirm a fix (here, the b3561 build is kept as `crucible-libfuzzer-b3561`):

# Against vulnerable version
./crucible-libfuzzer-b3561 crashes/regression/crash-abc123

# Against patched version
./crucible-libfuzzer crashes/regression/crash-abc123

If the crash reproduces on the old version but not the new one, the fix is confirmed for that input (not necessarily for the whole bug class, given the bypass history noted above). If it crashes on both, the bug may not have been fully patched.
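The comparison can be scripted as a minimal sketch, not a Crucible command. It assumes each harness accepts a single input path, as libFuzzer binaries do, and counts a run as a reproduced crash only when it exits non-zero and its stderr contains an ASan or libFuzzer `ERROR:` banner or a UBSan `runtime error:` report. The outcome labels are our own.

```python
import subprocess
from typing import Sequence

# ASan and libFuzzer print an "ERROR:" banner; UBSan reports contain "runtime error:".
CRASH_MARKERS = (
    "ERROR: AddressSanitizer",
    "ERROR: libFuzzer",
    "runtime error:",
)


def reproduces(harness: Sequence[str], reproducer: str, timeout: float = 30.0) -> bool:
    """Run `harness reproducer`; True if it exits non-zero with a crash banner."""
    try:
        proc = subprocess.run(
            [*harness, reproducer],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # hangs are not counted as crashes in this sketch
    return proc.returncode != 0 and any(m in proc.stderr for m in CRASH_MARKERS)


def compare_versions(vulnerable: Sequence[str], patched: Sequence[str], reproducer: str) -> str:
    """Classify one reproducer against a vulnerable and a patched harness."""
    old = reproduces(vulnerable, reproducer)
    new = reproduces(patched, reproducer)
    if old and not new:
        return "fixed"  # crash gone on the patched build
    if old and new:
        return "still-vulnerable"  # possible incomplete patch or bypass
    if new:
        return "new-crash"  # crashes only on the patched build
    return "no-repro"  # input does not trigger the old bug either
```

`compare_versions(["./crucible-libfuzzer-b3561"], ["./crucible-libfuzzer"], "crashes/regression/crash-abc123")` mirrors the two commands above. Passing the harness as an argument list leaves room for extra flags, such as a libFuzzer timeout, without changing the function.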

Adding New Regression Targets

When Crucible discovers a new vulnerability:

Record the affected commit and the reproducer file
Add the reproducer to corpus/minimal/ for future seeding
Document the version and expected crash in the version table above
Periodically re-run regression tests after updating Crucible's mutation strategies
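The bookkeeping in the first two steps can be automated. The sketch below is hypothetical: the `regressions.json` manifest, its fields, and the `regress-<hash>` seed naming are our own choices rather than an existing Crucible convention. Naming the copied seed by its content hash keeps the same reproducer from being recorded twice.

```python
import hashlib
import json
import shutil
from pathlib import Path


def record_regression(
    reproducer: Path,
    commit: str,
    expected_crash: str,
    corpus_minimal: Path = Path("corpus/minimal"),
    manifest: Path = Path("regressions.json"),  # hypothetical manifest file
) -> dict:
    """Copy a reproducer into corpus/minimal/ and append it to a JSON manifest."""
    data = reproducer.read_bytes()
    digest = hashlib.sha256(data).hexdigest()

    # Content-hash naming makes re-recording the same file idempotent.
    corpus_minimal.mkdir(parents=True, exist_ok=True)
    dest = corpus_minimal / f"regress-{digest[:12]}{reproducer.suffix}"
    shutil.copyfile(reproducer, dest)

    entry = {
        "commit": commit,
        "reproducer": dest.name,
        "sha256": digest,
        "expected_crash": expected_crash,
    }
    entries = json.loads(manifest.read_text()) if manifest.exists() else []
    if not any(e["sha256"] == digest for e in entries):
        entries.append(entry)
        manifest.write_text(json.dumps(entries, indent=2) + "\n")
    return entry
```

The manifest then doubles as the list of reproducers to replay after each change to the mutation strategies.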

Multi-Target Regression

Crucible supports regression testing across multiple projects. When a vulnerability is found in shared GGUF parsing code (e.g., ggml's gguf.cpp, used by both llama.cpp and whisper.cpp), test the reproducer against every consumer, including other GGUF loaders such as Apple MLX:

# Test a GGUF reproducer against multiple targets
./crucible-libfuzzer crashes/regression/crash-abc123          # llama.cpp
./crucible-libfuzzer-whisper crashes/regression/crash-abc123  # whisper.cpp
./crucible-libfuzzer-mlx-gguf crashes/regression/crash-abc123 # Apple MLX

This cross-project validation has revealed cases where a fix applied to llama.cpp was not backported to downstream consumers, or where a vendored fork (like Ollama's llama.cpp) lags behind upstream security patches.
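Once each harness has been run on the same reproducers, the results can be reduced to the case this section describes: upstream no longer crashes, but a downstream consumer still does. The sketch below is a hypothetical helper over a reproducer-by-target matrix of crash results; the target labels are free-form.

```python
from typing import Mapping


def lagging_consumers(results: Mapping[str, bool], upstream: str = "llama.cpp") -> list[str]:
    """Targets that still crash on an input the upstream build no longer crashes on."""
    if results.get(upstream, True):
        # Upstream still crashes (or was not tested): nothing is lagging yet.
        return []
    return sorted(t for t, crashed in results.items() if t != upstream and crashed)


def lagging_report(
    matrix: Mapping[str, Mapping[str, bool]], upstream: str = "llama.cpp"
) -> dict[str, list[str]]:
    """Per reproducer, the consumers that lag behind the upstream fix."""
    report = {}
    for reproducer, results in matrix.items():
        lagging = lagging_consumers(results, upstream)
        if lagging:
            report[reproducer] = lagging
    return report
```

For `{"crash-abc123": {"llama.cpp": False, "whisper.cpp": True, "mlx-gguf": False}}`, `lagging_report` returns `{"crash-abc123": ["whisper.cpp"]}`. Treating an untested upstream as still crashing is deliberately conservative, so a missing run is never reported as a lagging fix.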