Known CVEs¶
Known vulnerabilities across the llama.cpp ecosystem — GGUF parsers, RPC backend, grammar engines, tokenizers, and downstream projects. These are the real-world bugs that Crucible's mutation strategies are designed to find.
Crucible's own discoveries
For vulnerabilities discovered by Crucible's fuzzing campaigns, see Crucible Findings.
Underexplored attack surface
Community security research (Huntr, Protect AI) identified GGUF as the least fuzzed ML model format compared to ONNX, SafeTensors, and others. The format's complexity — variable-length metadata, multiple quantization types, alignment requirements — creates a large attack surface that has not been systematically tested. Despite this, GGUF parsing has already yielded 20+ CVEs since January 2024.
GGUF Parser Vulnerabilities¶
Bugs in gguf_init_from_file() and related parsing functions in ggml/src/gguf.cpp. These affect every application that loads GGUF files.
| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-21825 | -- | 8.8 | Integer Overflow | GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing | Cisco Talos |
| CVE-2024-23496 | -- | 8.8 | Heap Buffer Overflow | gguf_fread_str() string length | Cisco Talos |
| CVE-2024-21802 | -- | 8.8 | Heap Buffer Overflow | n_dims > GGML_MAX_DIMS array overwrite | Cisco Talos |
| CVE-2024-21836 | -- | 8.8 | Integer Overflow | n_tensors allocation sizing | Cisco Talos |
| CVE-2024-23605 | -- | 8.8 | Integer Overflow | n_kv allocation sizing | Cisco Talos |
| CVE-2024-25664 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2024-25665 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2024-25666 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2025-53630 | GHSA-vgg9-87g3-85w8 | -- | Integer Overflow | Cumulative tensor size ctx->size | -- |
| CVE-2026-27940 | GHSA-3p4r-fq3f-q74v | -- | Integer Overflow | Bypass of CVE-2025-53630 fix | -- |
| CVE-2026-33298 | GHSA-96jg-mvhq-q7q7 | -- | Integer Overflow | ggml_nbytes() dimension product | -- |
RPC Backend Vulnerabilities¶
Bugs in ggml-rpc.cpp, the network backend for distributed inference. The RPC backend ships with no authentication — the security policy states "do not use on untrusted networks."
| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-42477 | GHSA-mqp6-7pv6-fqjf | -- | Global Buffer Overflow | ggml_type_size lookup | 360 VRI |
| CVE-2024-42478 | GHSA-5vm9-p64x-gqw9 | 9.8 | Arbitrary Address Read | User-controlled rpc_tensor.data pointer | 360 VRI |
| CVE-2024-42479 | GHSA-wcr5-566p-9cwj | 9.8 | Write-What-Where → RCE | ggml_backend_buffer::iface callback overwrite | 360 VRI |
| CVE-2026-34159 | GHSA-j8rj-fmpv-wcxw | 9.8 | Unauthenticated RCE | GRAPH_COMPUTE buffer=0 deserialization bypass | -- |
Upstream RPC Fixes (Not CVE-Assigned)¶
These fixes address security-relevant bugs in ggml-rpc that were patched without CVE assignment. They form critical context for CRUCIBLE-2026-004 through 006.
| Commit | PR | Date | Type | Component | Notes |
|---|---|---|---|---|---|
| 1d20e53c4 | ggml/1103 | 2025-02 | OOB Write → RCE | copy_tensor | First ggml-rpc security fix |
| 2bcdddd5e | #20712 | 2026-03-21 | Div-by-Zero (DoS) | deserialize_tensor type/blck_size | Independently reported via GHSA; fixes CRUCIBLE-2026-004 RPC vector |
| 39bf0d3c6 | #20908 | 2026-03-23 | Null Deref → RCE | create_node null-buffer check | -- |
| ba38f3bec | #21030 | 2026-03-25 | Data Pointer Handling | deserialize_tensor data field | -- |
Grammar, Tokenizer, and Server Vulnerabilities¶
Bugs in text processing components: GBNF grammar parsing, tokenizer vocabulary handling, and llama-server request processing.
| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2026-2069 | -- | -- | Stack Buffer Overflow | GBNF grammar handler | -- |
| CVE-2025-49847 | GHSA-8wwf-w4qm-gpqr | -- | Buffer Overflow | token_to_piece() size_t→int32_t cast | -- |
| -- | GHSA-7rxv-5jhh-j6xx | -- | Heap Buffer Overflow | Tokenizer signed/unsigned overflow | -- |
| -- | GHSA-8947-pfff-2f3c | -- | OOB Write | llama-server negative n_discard context shift | -- |
Ecosystem Vulnerabilities¶
Bugs in downstream projects that wrap llama.cpp or share its ggml parsing code. These demonstrate that the attack surface extends beyond the core library.
| CVE | CVSS | Type | Project | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-34359 | 9.7 | SSTI → RCE | llama-cpp-python | Jinja2 chat templates via GGUF metadata | retr0reg |
| CVE-2024-37032 | -- | Path Traversal → RCE | Ollama | Model pull digest validation | Wiz Research |
| CVE-2025-14569 | -- | Use-After-Free | whisper.cpp | read_audio_data() | -- |
Why GGUF Bugs Propagate¶
Inherited attack surface
Every application that loads GGUF files inherits the parsing vulnerabilities of its underlying library. A heap overflow in llama.cpp is simultaneously a heap overflow in every tool built on top of it.
The GGUF ecosystem has a single-library dependency pattern:
llama.cpp / ggml (C/C++ GGUF parser)
|
+-- Ollama (wraps llama.cpp via cgo) — 175,000+ publicly-exposed servers
+-- llama-cpp-python (Python bindings)
+-- LM Studio (embeds llama.cpp)
+-- koboldcpp (fork of llama.cpp)
+-- LocalAI (wraps llama.cpp)
+-- GPT4All (uses llama.cpp backend)
+-- text-generation-webui (gguf loader)
+-- vLLM (optional GGUF support)
|
Shared ggml code (separate projects, same parser):
+-- whisper.cpp (speech recognition)
+-- stable-diffusion.cpp (image generation)
A single vulnerability in gguf_init_from_file() or ggml_nbytes() is exploitable across all of these tools. Users who download GGUF models from public repositories (Hugging Face, etc.) are exposed to malicious files that trigger these bugs.
The dependency tree extends beyond llama.cpp — whisper.cpp and stable-diffusion.cpp share the ggml library and inherit the same GGUF parser vulnerabilities. CVE-2024-34359 ("Llama Drama") demonstrated that 6,000+ models on Hugging Face were potentially affected by a single vulnerability in llama-cpp-python. SentinelOne and Censys identified 175,000 publicly-exposed Ollama servers across 130 countries as of January 2026.
Vulnerability Details¶
GGUF Parser — Cisco Talos Batch (January 2024)¶
Francesco Benvenuto of Cisco Talos discovered five heap-based buffer overflow vulnerabilities in gguf_init_from_file(), which at the time lived in ggml.c rather than ggml/src/gguf.cpp. All five were disclosed on February 26, 2024 and rated CVSS 8.8:
CVE-2024-21825 / TALOS-2024-1912 — Array/String Integer Overflow¶
- Root cause: Integer overflow in GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing. The multiplication kv->value.arr.n * sizeof(struct gguf_str) overflows, allocating less memory than needed. Subsequent element parsing writes past the buffer.
- Crucible strategies: metadata.array, metadata.int_overflow
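The wraparound behind this bug class can be modeled in a few lines. The sketch below is a Python simulation of 64-bit size_t arithmetic, not llama.cpp code: alloc_size and checked_alloc_size are hypothetical helpers, and the 16-byte element size is an assumption standing in for sizeof(struct gguf_str) on a 64-bit build.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit size_t


def alloc_size(count, elem_size):
    """count * elem_size as an unchecked 64-bit unsigned multiply computes it."""
    return (count * elem_size) & MASK64


def checked_alloc_size(count, elem_size):
    """Same product, but refuse operands whose true result exceeds SIZE_MAX."""
    if elem_size and count > MASK64 // elem_size:
        raise OverflowError("count * elem_size overflows size_t")
    return count * elem_size


GGUF_STR_SIZE = 16   # assumption: sizeof(struct gguf_str) on a 64-bit target
n = (1 << 60) + 1    # array length taken from the file

# The true product is 2**64 + 16, so the unchecked request wraps to 16 bytes.
wrapped = alloc_size(n, GGUF_STR_SIZE)
```

With these values the allocator is asked for room for one element while the parsing loop still iterates n times, which is the mismatch that turns the overflow into a heap write.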
CVE-2024-23496 / TALOS-2024-1913 — String Length Overflow¶
- Root cause: In gguf_fread_str(), calloc(p->n + 1, 1) wraps when p->n is UINT64_MAX, causing a tiny allocation followed by a large write from the file.
- Crucible strategies: metadata.key_length, metadata.string_value, metadata.string_truncated
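A GGUF string is a little-endian uint64 length followed by that many bytes, so a hostile length is trivial to encode. The following is a minimal sketch of how such an input might be built and why the allocation argument collapses; gguf_string and calloc_arg are hypothetical names, and the code only models the arithmetic rather than reproducing the upstream parser.

```python
import struct

UINT64_MAX = (1 << 64) - 1


def gguf_string(payload, declared_len=None):
    """Serialize a GGUF string: little-endian uint64 length, then raw bytes.

    declared_len lets the caller lie about the length, as a fuzzer would.
    """
    n = len(payload) if declared_len is None else declared_len
    return struct.pack("<Q", n) + payload


def calloc_arg(n):
    """Model the first argument of calloc(n + 1, 1) with 64-bit wraparound."""
    return (n + 1) & UINT64_MAX


honest = gguf_string(b"general.name")
hostile = gguf_string(b"A" * 64, declared_len=UINT64_MAX)

# A declared length of UINT64_MAX makes the requested allocation zero bytes.
requested = calloc_arg(UINT64_MAX)
```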
CVE-2024-21802 / TALOS-2024-1914 — Dimension Count Array Overwrite¶
- Root cause: n_dims is an arbitrary uint32_t from the file, used to iterate info->ne[j] beyond the fixed 4-element array (GGML_MAX_DIMS = 4). An n_dims > 4 writes adjacent struct fields.
- Crucible strategies: tensorinfo.n_dims
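The missing check is a single comparison before the dimension loop. As a sketch of that validation, assuming the tensor-info layout of a uint32 dimension count followed by that many uint64 dimensions, read_dims below is a hypothetical reader that rejects oversized counts before touching the fixed-size array:

```python
import struct

GGML_MAX_DIMS = 4


def read_dims(buf, offset=0):
    """Read n_dims and the dimensions, refusing counts above GGML_MAX_DIMS."""
    (n_dims,) = struct.unpack_from("<I", buf, offset)
    if n_dims > GGML_MAX_DIMS:
        raise ValueError(f"n_dims {n_dims} exceeds GGML_MAX_DIMS")
    dims = struct.unpack_from(f"<{n_dims}Q", buf, offset + 4)
    ne = [1] * GGML_MAX_DIMS       # unused trailing dimensions default to 1
    ne[:n_dims] = dims
    return ne


good = struct.pack("<I2Q", 2, 32, 64)
bad = struct.pack("<I5Q", 5, 1, 1, 1, 1, 1)
```

Here read_dims(good) yields a four-element shape, while read_dims(bad) raises instead of writing a fifth entry.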
CVE-2024-21836 / TALOS-2024-1915 — Tensor Count Allocation Overflow¶
- Root cause: header.n_tensors * sizeof(gguf_tensor_info) overflows, under-allocating the tensor info array. The parsing loop then writes past the end.
- Crucible strategies: header.tensor_count, consistency.tensor_count
CVE-2024-23605 / TALOS-2024-1916 — KV Count Allocation Overflow¶
- Root cause: Same pattern as TALOS-2024-1915 but for header.n_kv * sizeof(gguf_kv). Under-allocated KV array leads to heap buffer overflow during metadata parsing.
- Crucible strategies: header.metadata_kv_count, consistency.metadata_count
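Both count fields sit in the fixed 24-byte GGUF header (magic, uint32 version, uint64 tensor count, uint64 KV count), so a header-only mutation is enough to reach either bug. The sketch below builds such a header and shows one possible sanity check: comparing the declared counts against the bytes actually left in the file. counts_plausible and the two minimum-size constants are hypothetical and derived from the on-disk layout; this is not the check upstream adopted.

```python
import struct

MIN_TENSOR_INFO_BYTES = 24  # empty name (8) + n_dims (4) + type (4) + offset (8)
MIN_KV_BYTES = 13           # empty key (8) + value type (4) + 1-byte value
HEADER_BYTES = 24


def gguf_header(n_tensors, n_kv, version=3):
    """Serialize a GGUF header: magic, uint32 version, two uint64 counts."""
    return b"GGUF" + struct.pack("<IQQ", version, n_tensors, n_kv)


def counts_plausible(header, file_size):
    """Hypothetical pre-allocation check: can the file hold this many entries?"""
    if header[:4] != b"GGUF":
        return False
    _version, n_tensors, n_kv = struct.unpack_from("<IQQ", header, 4)
    remaining = file_size - HEADER_BYTES
    needed = n_tensors * MIN_TENSOR_INFO_BYTES + n_kv * MIN_KV_BYTES
    return needed <= remaining


honest = gguf_header(n_tensors=2, n_kv=3)
hostile = gguf_header(n_tensors=(1 << 64) - 1, n_kv=1)
```

A check of this shape rejects the hostile header before any multiplication by a struct size takes place, because no 4 KB file can contain 2^64 - 1 tensor descriptors.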
GGUF Parser — Databricks (January 2024)¶
Databricks independently reported three additional heap overflows from insufficient validation of GGUF metadata fields (CVE-2024-25664, CVE-2024-25665, CVE-2024-25666). These were parallel discoveries to the Cisco Talos findings.
Metadata injection
Databricks also documented a format-level design issue: the GGUF format allows arbitrary key-value metadata with no schema validation. Attackers can inject metadata keys that downstream tools interpret as trusted configuration (e.g., system prompts, execution parameters). Crucible targets this with metadata.key_shadow, metadata.add_extra, and metadata.key_content.
GGUF Parser — Integer Overflow Chain (2025–2026)¶
The most revealing pattern in GGUF parser security is the recurring cycle of overflow discovery, patch, and bypass. Each fix addresses the specific reported attack vector but misses the same pattern in adjacent code.
CVE-2025-53630 / GHSA-vgg9-87g3-85w8 — Cumulative Tensor Size Overflow¶
- Root cause: Integer overflow in ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment) during the tensor size accumulation loop in gguf_init_from_file_impl(). The wrapped ctx->size leads to a tiny allocation at the data buffer, followed by out-of-bounds pointer assignments for tensor data.
- Patch: SIZE_MAX - ctx->size < padded_size guard (commit 26a48ad).
- Crucible strategies: tensorinfo.dim_product_overflow, consistency.tensor_size, consistency.offset_beyond
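The accumulation and its guard can be simulated directly. This is a Python model under stated assumptions: ggml_pad mirrors the usual round-up-to-power-of-two macro, the default alignment of 32 matches the GGUF default, and accumulate_unchecked and accumulate_checked are hypothetical names for the pre-patch and post-patch loop shapes.

```python
SIZE_MAX = (1 << 64) - 1


def ggml_pad(x, n):
    """Round x up to a multiple of n (n a power of two), with 64-bit wraparound."""
    return ((x + n - 1) & ~(n - 1)) & SIZE_MAX


def accumulate_unchecked(sizes, alignment=32):
    """Pre-patch shape: the running total silently wraps."""
    total = 0
    for s in sizes:
        total = (total + ggml_pad(s, alignment)) & SIZE_MAX
    return total


def accumulate_checked(sizes, alignment=32):
    """Post-patch shape: refuse any addition that would exceed SIZE_MAX."""
    total = 0
    for s in sizes:
        padded = ggml_pad(s, alignment)
        if SIZE_MAX - total < padded:
            raise OverflowError("cumulative tensor size overflows size_t")
        total += padded
    return total


tensor_sizes = [1 << 63, (1 << 63) + 64]   # two tensors claiming ~8 EiB each
wrapped_total = accumulate_unchecked(tensor_sizes)
```

The two sizes sum to 2^64 + 64, so the unchecked total wraps to 64 bytes, while the guarded loop raises on the second tensor.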
CVE-2026-27940 / GHSA-3p4r-fq3f-q74v — Bypass of CVE-2025-53630 Fix¶
- Root cause: The CVE-2025-53630 patch addressed only one code path. Other allocation-size calculations in the same function remained vulnerable to the identical overflow pattern, allowing 528+ bytes of controlled data past the buffer boundary.
- Patch: Extend overflow guard to all accumulation paths (commit b8146).
- Crucible strategies: Same as CVE-2025-53630: combined mutations across the tensorinfo and consistency categories
CVE-2026-33298 / GHSA-96jg-mvhq-q7q7 — ggml_nbytes() Integer Overflow¶
- Root cause: ggml_nbytes() itself can integer-overflow when computing tensor data size from dimensions and quantization type. Crafted tensor dimensions cause the function to return a drastically undersized value (e.g., 4 MB instead of exabytes), corrupting the input to the CVE-2025-53630 overflow check before it even executes.
- Patch: Overflow-checked multiplication in ggml_nbytes() (commit b7824).
- Crucible strategies: tensorinfo.dim_product_overflow, tensorinfo.dim_overflow, consistency.tensor_size
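A concrete set of dimensions makes the "4 MB instead of exabytes" case easy to check by hand. The sketch below uses a simplified model, size = element size times the product of the dimensions, which is an assumption: the real ggml_nbytes() also accounts for block sizes and strides. nbytes_unchecked and nbytes_checked are hypothetical helpers.

```python
SIZE_MAX = (1 << 64) - 1


def nbytes_unchecked(ne, type_size):
    """Simplified size model with 64-bit wraparound at every multiply."""
    total = type_size
    for dim in ne:
        total = (total * dim) & SIZE_MAX
    return total


def nbytes_checked(ne, type_size):
    """Same model with an overflow test before each multiply."""
    total = type_size
    for dim in ne:
        if dim != 0 and total > SIZE_MAX // dim:
            raise OverflowError("tensor byte size overflows size_t")
        total *= dim
    return total


# 4-byte elements; the true size is 2**64 + 2**22 bytes (16 EiB plus 4 MiB).
ne = [1 << 20, (1 << 42) + 1, 1, 1]
reported = nbytes_unchecked(ne, type_size=4)
```

In this model the element count (2^62 + 2^20) still fits in a signed 64-bit integer, so a check on the dimension product alone would pass; only the final multiply by the element size wraps. The 4 MiB result then looks entirely reasonable to any downstream guard on the running total.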
RPC Backend (August 2024 – 2026)¶
7resp4ss and Guang Gong from 360 Vulnerability Research Institute discovered three critical vulnerabilities in ggml-rpc.cpp, fixed in version b3561. A fourth was found independently in 2026. Three of the four highest-severity CVEs in the entire llama.cpp ecosystem are in the RPC backend.
CVE-2024-42477 / GHSA-mqp6-7pv6-fqjf — Global Buffer Overflow¶
- Root cause: Out-of-bounds read in ggml_type_size lookup table via attacker-controlled tensor type field in RPC messages.
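The defensive pattern here is to range-check any wire-supplied enum before using it as a table index. A minimal sketch, assuming a four-entry table as a stand-in for the real per-type size table and a uint32 type field; TYPE_SIZES and type_size_from_wire are illustrative names, not upstream identifiers:

```python
import struct

# Illustrative stand-in for the per-type size table (bytes per block).
TYPE_SIZES = (4, 2, 18, 20)


def type_size_from_wire(msg, offset=0):
    """Decode a uint32 type field and look it up only if it is in range."""
    (wire_type,) = struct.unpack_from("<I", msg, offset)
    if wire_type >= len(TYPE_SIZES):
        raise ValueError(f"unknown tensor type {wire_type}")
    return TYPE_SIZES[wire_type]


valid_msg = struct.pack("<I", 1)
hostile_msg = struct.pack("<I", 0xFFFF)
```

In C the unchecked version reads whatever global data follows the table; Python would raise on its own, so the explicit comparison is what carries over to the native code.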
CVE-2024-42478 / GHSA-5vm9-p64x-gqw9 — Arbitrary Address Read (CVSS 9.8)¶
- Root cause: The data field in rpc_tensor is a user-controlled pointer. The RPC server dereferences it directly to read memory at any address the attacker specifies.
CVE-2024-42479 / GHSA-wcr5-566p-9cwj — Write-What-Where → Full RCE (CVSS 9.8)¶
- Root cause: Same user-controlled pointer mechanism as CVE-2024-42478 but for write operations. The original researchers chained this with CVE-2024-42478 into a full RCE exploit by overwriting ggml_backend_buffer::iface function pointer callbacks.
CVE-2026-34159 / GHSA-j8rj-fmpv-wcxw — GRAPH_COMPUTE Deserialization Bypass (CVSS 9.8)¶
- Root cause: Unauthenticated RCE via the GRAPH_COMPUTE command. When buffer=0, deserialize_tensor() skips bounds validation entirely, enabling arbitrary memory read/write with full ASLR bypass.
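The pointer-based RPC bugs above share one missing invariant: a tensor's data range must lie inside a buffer the server itself allocated. The sketch below models that invariant in Python. Everything in it is hypothetical (the Buffer record, the buffers registry, and in particular the policy that a tensor naming buffer 0 may not carry a data pointer); it illustrates the shape of the check, not the upstream fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    base: int   # start address of a server-allocated buffer
    size: int   # length in bytes


def tensor_in_bounds(buffers, buffer_id, data, nbytes):
    """Return True only if [data, data + nbytes) sits inside the named buffer."""
    if buffer_id == 0:
        # Hypothetical policy: no buffer means no data pointer at all,
        # so the "no buffer" case cannot be used to skip validation.
        return data == 0
    buf = buffers.get(buffer_id)
    if buf is None or nbytes > buf.size or data < buf.base:
        return False
    # Written as a subtraction so the comparison cannot overflow in C.
    return data - buf.base <= buf.size - nbytes


buffers = {7: Buffer(base=0x1000, size=0x100)}
```

For example, tensor_in_bounds(buffers, 7, 0x1000, 0x100) accepts a tensor that exactly fills the buffer, while a request with buffer_id 0 and an arbitrary address is rejected rather than passed through.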
No authentication
The ggml-rpc backend has no authentication mechanism. The project's security policy states it should not be used on untrusted networks, yet it listens on all interfaces by default.
Grammar, Tokenizer, and Server¶
CVE-2026-2069 — GBNF Grammar Stack Buffer Overflow¶
- Root cause: Stack-based buffer overflow in the GBNF grammar handler during grammar parsing.
- Crucible strategies: Targeted by the grammar harness via crafted GBNF input.
CVE-2025-49847 / GHSA-8wwf-w4qm-gpqr — Vocabulary Buffer Overflow¶
- Root cause: In llama_vocab::impl::token_to_piece(), a size_t to int32_t cast bypasses bounds checking, enabling a buffer overflow during vocabulary loading via crafted GGUF files.
- Crucible strategies: model.vocab
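Why a narrowing cast defeats a length check is easiest to see with numbers. The following sketch models a C cast from size_t to int32_t and two versions of a "does it fit" test; to_int32, fits_narrowed, and fits_wide are hypothetical helpers, and the exact comparison in the upstream function may differ.

```python
def to_int32(value):
    """Truncate to 32 bits and reinterpret as signed, like a C int32_t cast."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def fits_narrowed(capacity, size):
    """Check made after narrowing: a negative size always appears to fit."""
    return to_int32(size) <= capacity


def fits_wide(capacity, size):
    """Check made in the original unsigned width."""
    return 0 <= capacity and size <= capacity


size = (1 << 31) + 16   # token length just above INT32_MAX
narrowed = to_int32(size)
```

With an 8-byte destination, fits_narrowed(8, size) reports that the copy is safe because the narrowed length is negative, while fits_wide(8, size) correctly rejects it.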
GHSA-7rxv-5jhh-j6xx — Tokenizer Signed/Unsigned Heap Overflow¶
- Root cause: Signed/unsigned mismatch in llama_vocab::tokenize() leads to a heap buffer overflow during tokenization.
- Crucible strategies: model.vocab
GHSA-8947-pfff-2f3c — Server OOB Write via Negative n_discard¶
- Root cause: A negative n_discard value during context shift in llama-server causes an out-of-bounds write.
std::regex is fundamentally unsafe
GCC Bug #61582 documents stack overflow in std::regex due to recursive processing. GCC developers acknowledged the implementation is "unlikely to ever be fast or efficient, due to ABI compatibility reasons." Any C++ LLM infrastructure that runs std::regex over attacker-influenced input, including llama.cpp's tokenizer pre-tokenization patterns, is exposed to ReDoS and crash attacks. RE2, which matches in linear time using automata rather than backtracking, is the recommended alternative.
Ecosystem¶
CVE-2024-34359 — "Llama Drama" (CVSS 9.7)¶
- Project: llama-cpp-python
- Root cause: Server-side template injection (SSTI) through malicious Jinja2 chat templates embedded in GGUF metadata. Remote code execution via eval() in the template engine. Over 6,000 models on Hugging Face potentially affected.
- Discoverer: retr0reg. JFrog published detailed analysis.
- Crucible strategies: metadata.key_content, metadata.key_shadow (metadata injection vectors)
CVE-2024-37032 — "Probllama"¶
- Project: Ollama
- Root cause: Path traversal via model pull digest validation. Crafted model manifests write arbitrary files to the host filesystem, achieving RCE.
- Discoverer: Wiz Research
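Digest-based path traversal is generally closed by validating the digest against a strict pattern before it is ever joined to a directory. A minimal sketch of that idea follows; the sha256-only pattern, the blobs directory, and the blob_path helper are assumptions for illustration and do not describe Ollama's actual storage code.

```python
import re
from pathlib import PurePosixPath

# Accept only "sha256:" followed by exactly 64 lowercase hex digits.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def blob_path(blobs_dir, digest):
    """Map a manifest digest to a file path, refusing anything off-pattern."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"malformed digest: {digest!r}")
    return PurePosixPath(blobs_dir) / digest.replace(":", "-")


good_digest = "sha256:" + "ab" * 32
evil_digest = "../../../etc/ld.so.preload"
```

Using fullmatch rather than a prefix match matters: a digest that starts correctly but continues with path separators is rejected along with the obviously hostile one.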
CVE-2025-14569 — whisper.cpp Use-After-Free¶
- Project: whisper.cpp
- Root cause: Use-after-free in read_audio_data() affecting versions 1.8.0–1.8.2.
The Patch Bypass Pattern¶
The GGUF parser's vulnerability history demonstrates a recurring cycle: overflow discovered → point fix applied → same pattern found in adjacent code → deeper bypass found. This pattern is the strongest argument for continuous structure-aware fuzzing over manual code auditing.
Jan 2024 5 heap overflows found by Cisco Talos
(CVE-2024-21825, -21802, -21836, -23496, -23605)
→ All patched same day. Four were allocation-size overflows; -21802 was a fixed-array overwrite.
Jul 2025 CVE-2025-53630 — same overflow class in ctx->size accumulation
→ Patch: SIZE_MAX guard added (commit 26a48ad)
Mar 2026 CVE-2026-27940 — bypass of CVE-2025-53630 fix
Original patch missed alternate code paths with identical overflow
→ Patch: extend guard to all paths (commit b8146)
Mar 2026 CVE-2026-33298 — ggml_nbytes() itself overflows BEFORE the guard
Crafted dimensions cause ggml_nbytes() to return 4 MB instead of
exabytes, corrupting the input to the overflow check
→ Patch: overflow-checked multiplication (commit b7824)
Rate of discovery
Despite 12+ CVE-class fixes since January 2024, the function's fundamental architecture — reading attacker-controlled values and using them in size calculations, loop bounds, and pointer arithmetic — continues to produce new exploitable bugs at a rate of roughly one critical vulnerability every three months. The recurring pattern of incomplete fixes suggests the codebase needs systematic hardening rather than point fixes.
Mapping Strategies to CVEs¶
The table below shows which Crucible mutation categories correspond to known vulnerability classes:
| Category | Known Bug Classes | Example CVEs | Key Strategies |
|---|---|---|---|
| Header | Count mismatch, allocation overflow | CVE-2024-21836, CVE-2024-23605 | header.tensor_count, header.metadata_kv_count |
| Metadata | String overflow, type confusion, injection | CVE-2024-23496, CVE-2024-21825, CVE-2024-34359 | metadata.key_length, metadata.string_value, metadata.array |
| TensorInfo | Dimension overflow, offset OOB, type crash | CVE-2024-21802, CVE-2025-53630, CVE-2026-27940, CVE-2026-33298 | tensorinfo.n_dims, tensorinfo.dim_product_overflow, tensorinfo.offset |
| Alignment | Division by zero, padding miscalculation | -- | alignment.padding, metadata.alignment_poison |
| Data | Truncation, overlap, zero-length | -- | data.truncate, data.zero_length |
| Consistency | Count mismatch, size mismatch, offset OOB | CVE-2025-53630, CVE-2026-27940 | consistency.tensor_size, consistency.offset_beyond |
| Model-Loader | Vocab overflow, architecture dispatch crash | CVE-2025-49847, GHSA-7rxv-5jhh-j6xx | model.vocab, model.architecture |
| RPC | Arbitrary read/write, deserialization bypass | CVE-2024-42478, CVE-2024-42479, CVE-2026-34159 | RPC command harness |
| Grammar | Stack overflow, ReDoS | CVE-2026-2069 | Grammar/JSON schema harness |
Undiscovered bugs
The Alignment and Data categories have no public CVEs yet, but the underlying bug patterns (division by zero on alignment=0, null pointer on zero-length data) are well-established vulnerability classes. These categories exist to find the bugs that have not been reported yet.
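The alignment case shows how little it takes. The sketch below computes tensor-data padding from a file-supplied alignment, first validating that the value is a nonzero power of two; valid_alignment and padding_for are hypothetical helpers, and the modulo form is one common way to write the calculation rather than a copy of the upstream code.

```python
def valid_alignment(alignment):
    """A usable alignment is nonzero and a power of two."""
    return alignment > 0 and (alignment & (alignment - 1)) == 0


def padding_for(offset, alignment):
    """Bytes needed to advance offset to the next multiple of alignment."""
    if not valid_alignment(alignment):
        raise ValueError(f"invalid alignment {alignment}")
    return (alignment - offset % alignment) % alignment


pad = padding_for(100, 32)   # 100 rounds up to 128
```

Without the validation step, an alignment of 0 reaches the modulo directly: a ZeroDivisionError in this Python model, and a division-by-zero fault in native code.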