Regression Testing
Regression testing verifies that Crucible can rediscover known vulnerabilities when run against previously vulnerable code. This validates the fuzzer's effectiveness and ensures mutation strategies cover real-world bug patterns.
Why Regression Test?
- Validate strategy coverage: confirm that mutation strategies actually trigger known bug classes
- Benchmark effectiveness: measure time-to-crash against a known-vulnerable target
- Test new strategies: when adding mutations, verify that the updated set still finds the bugs the old set could
- Build confidence: prove the fuzzer works before committing to long campaigns against patched code
Version Pinning
The llama.cpp target supports pinning to specific commits or tags:
This checks out the specified version and rebuilds the harness. Common versions for regression testing:
| Version | Known Vulnerability | Expected Finding |
|---|---|---|
| b3561 | CVE-2024-23496 (heap overflow in metadata parsing) | Crash via metadata.key_length or metadata.string_value |
| b3561 | Cisco Talos tensor info bugs (TALOS-2024-1912 through 1916) | Crash via tensorinfo.dim_overflow or tensorinfo.offset |
TALOS CVE-Targeted Seeds
Beyond version pinning, Crucible provides reconstruction seeds that target specific vulnerability patterns from published CVEs. These are particularly useful for validating that mutation strategies cover known bug classes:
This produces seeds in corpus/reconstructed/ that exercise:
| Seed | CVE Pattern | What It Exercises |
|---|---|---|
| talos-1912-array-string-overflow.gguf | TALOS-2024-1912 | Array string length overflow |
| talos-1913-string-length-wrap.gguf | TALOS-2024-1913 | String length integer wrap |
| talos-1914-ndims-oob.gguf | TALOS-2024-1914 | Out-of-bounds dimension count |
| talos-1915-tensor-count-overflow.gguf | TALOS-2024-1915 | Tensor count integer overflow |
| talos-1916-kv-count-overflow.gguf | TALOS-2024-1916 | Metadata KV count overflow |
| databricks-array-size-wrap.gguf | Databricks CVEs | Array size integer wrap |
| databricks-type-index-oob.gguf | Databricks CVEs | Type index out-of-bounds |
These seeds prime the fuzzer to explore code paths around known bug patterns, increasing the chance of finding regressions or patch bypasses.
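To make the overflow patterns in the table concrete, here is a minimal sketch of what a KV count overflow input could look like at the byte level. It is not Crucible's seed generator: the function name `write_kv_overflow_header` is hypothetical, and the layout assumes the GGUF v3 header (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count, all little-endian).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Crucible.
# Writes a 24-byte GGUF v3 header whose metadata KV count is 2^64 - 1.
# A parser that multiplies this count by an element size without an
# overflow check may compute a wrapped (too small) allocation size.
write_kv_overflow_header() {
  local out=$1
  {
    printf 'GGUF'                               # magic
    printf '\003\000\000\000'                   # version = 3 (uint32 LE)
    printf '\000\000\000\000\000\000\000\000'   # tensor_count = 0 (uint64 LE)
    printf '\377\377\377\377\377\377\377\377'   # metadata_kv_count = 2^64 - 1
  } > "$out"
}
```

Calling `write_kv_overflow_header /tmp/kv-overflow.gguf` yields a file that a correct parser should reject cleanly; octal escapes are used because they are portable across `printf` implementations.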
Regression validation strategy
Run TALOS seeds against a pinned vulnerable version first to confirm they trigger the expected crash, then run against the latest version to verify the fix holds. Historical CVE data shows that integer overflow patches in GGUF parsers have been bypassed at least twice (CVE-2025-53630 → CVE-2026-27940 → CVE-2026-33298).
Workflow
1. Pin to a Vulnerable Version
2. Build the Harness
3. Generate CVE-Targeted Seeds
This produces reconstruction seeds in corpus/reconstructed/ based on known vulnerability patterns from Cisco Talos and other researchers.
4. Run a Focused Campaign
```bash
crucible run \
  --harness ./crucible-libfuzzer \
  --corpus ./corpus \
  --output ./crashes/regression \
  --jobs 8 \
  --timeout 30s
```
5. Verify Crashes
Expected outcome
Against b3561, a well-seeded campaign should find heap buffer overflow crashes within the first hour. If the regression test does not find known bugs, investigate whether:
- The CVE-targeted seeds are present in the corpus
- The harness exercises the affected code path
- Sanitizers are properly enabled in the build
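Two of the checks above can be approximated with a quick preflight script. This is a sketch under assumptions: `count_seeds` and `has_asan` are hypothetical helpers, and the sanitizer check assumes an unstripped AddressSanitizer build in which the runtime symbol `__asan_init` appears in the harness binary. Whether the harness reaches the affected code path needs coverage tooling and is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers, not part of Crucible.

# Print the number of .gguf seed files under a corpus directory
# (prints 0 if the directory does not exist).
count_seeds() {
  local dir=$1
  find "$dir" -type f -name '*.gguf' 2>/dev/null | wc -l | tr -d ' '
}

# Succeed if the binary contains the AddressSanitizer init symbol.
# Assumption: the build is unstripped, so the symbol name is present.
has_asan() {
  grep -q -a '__asan_init' "$1"
}
```

For example, `count_seeds corpus/reconstructed` should print a non-zero number after seed generation, and `has_asan ./crucible-libfuzzer` should succeed for a sanitizer-enabled build.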
Comparing Versions
Run the same reproducer against both the vulnerable and patched versions to confirm a fix:
```bash
# Against vulnerable version
./crucible-libfuzzer-b3561 crashes/regression/crash-abc123

# Against patched version
./crucible-libfuzzer crashes/regression/crash-abc123
```
If the crash reproduces on the old version but not the new one, the fix is confirmed. If it crashes on both, the bug may not have been fully patched.
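The decision rule above can be scripted so it is applied the same way to every reproducer. The sketch below assumes a libFuzzer-style harness that exits non-zero when an input crashes and zero otherwise; `classify_fix` and `compare_versions` are hypothetical names, not Crucible commands.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Crucible.

# classify_fix OLD_STATUS NEW_STATUS
# Maps the two harness exit statuses to a verdict.
classify_fix() {
  local old_status=$1 new_status=$2
  if [ "$old_status" -eq 0 ]; then
    echo "not-reproduced"      # old build did not crash: reproducer is suspect
  elif [ "$new_status" -eq 0 ]; then
    echo "fix-confirmed"       # crashes on old build only
  else
    echo "still-crashing"      # crashes on both: fix may be incomplete
  fi
}

# compare_versions OLD_HARNESS NEW_HARNESS REPRODUCER
# Runs the reproducer against both builds and prints the verdict.
compare_versions() {
  local old_harness=$1 new_harness=$2 reproducer=$3
  local old_status=0 new_status=0
  "$old_harness" "$reproducer" >/dev/null 2>&1 || old_status=$?
  "$new_harness" "$reproducer" >/dev/null 2>&1 || new_status=$?
  classify_fix "$old_status" "$new_status"
}
```

A call such as `compare_versions ./crucible-libfuzzer-b3561 ./crucible-libfuzzer crashes/regression/crash-abc123` prints one of the three verdicts. A `still-crashing` result is worth inspecting by hand, since the patched build may be failing at a different location than the original bug.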
Adding New Regression Targets
When Crucible discovers a new vulnerability:
- Record the affected commit and the reproducer file
- Add the reproducer to corpus/minimal/ for future seeding
- Document the version and expected crash in this workflow
- Periodically re-run regression tests after updating Crucible's mutation strategies
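The bookkeeping in the first three steps can be sketched as a small helper. Everything here is illustrative: `record_regression` and the `regressions.tsv` manifest are hypothetical and not a Crucible feature. The manifest is kept outside the corpus directory so the fuzzer does not pick it up as a seed.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Crucible.
# record_regression REPRODUCER COMMIT EXPECTED_CRASH [CORPUS_DIR] [MANIFEST]
# Copies the reproducer into the seed corpus and appends a
# tab-separated record (commit, file name, expected crash) to a manifest.
record_regression() {
  local reproducer=$1 commit=$2 expected=$3
  local corpus_dir=${4:-corpus/minimal}
  local manifest=${5:-regressions.tsv}
  local name

  if [ ! -f "$reproducer" ]; then
    echo "reproducer not found: $reproducer" >&2
    return 1
  fi

  name=$(basename "$reproducer")
  mkdir -p "$corpus_dir" || return 1
  cp "$reproducer" "$corpus_dir/$name" || return 1
  printf '%s\t%s\t%s\n' "$commit" "$name" "$expected" >> "$manifest"
}
```

For example, `record_regression crashes/regression/crash-abc123 b3561 heap-buffer-overflow` copies the file into corpus/minimal/ and logs which commit it is expected to crash.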
Multi-Target Regression
Crucible supports regression testing across multiple projects. When a vulnerability is found in shared code (e.g., gguf.cpp used by llama.cpp, whisper.cpp, and MLX), test the reproducer against all consumers:
```bash
# Test a GGUF reproducer against multiple targets
./crucible-libfuzzer crashes/regression/crash-abc123           # llama.cpp
./crucible-libfuzzer-whisper crashes/regression/crash-abc123   # whisper.cpp
./crucible-libfuzzer-mlx-gguf crashes/regression/crash-abc123  # Apple MLX
```
This cross-project validation has revealed cases where a fix applied to llama.cpp was not backported to downstream consumers, or where a vendored fork (like Ollama's llama.cpp) lags behind upstream security patches.
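When the list of consumers grows, the per-target commands can be wrapped in a loop that summarizes which targets are still affected. This is a sketch only: `check_targets` is a hypothetical helper, and it assumes each harness exits non-zero when the reproducer crashes it.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Crucible.
# check_targets REPRODUCER HARNESS...
# Runs one reproducer against each harness and prints a line per target.
# Returns 0 only if every harness was found and none crashed.
check_targets() {
  local reproducer=$1
  shift
  local harness status affected=0 missing=0
  for harness in "$@"; do
    if ! command -v "$harness" >/dev/null 2>&1; then
      echo "$harness: missing"
      missing=$((missing + 1))
      continue
    fi
    status=0
    "$harness" "$reproducer" >/dev/null 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
      echo "$harness: CRASH (exit $status)"
      affected=$((affected + 1))
    else
      echo "$harness: ok"
    fi
  done
  [ "$affected" -eq 0 ] && [ "$missing" -eq 0 ]
}
```

For example, `check_targets crashes/regression/crash-abc123 ./crucible-libfuzzer ./crucible-libfuzzer-whisper ./crucible-libfuzzer-mlx-gguf` prints one status line per consumer, and the non-zero return value makes it usable as a CI gate.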