Known CVEs¶

Known vulnerabilities across the llama.cpp ecosystem — GGUF parsers, RPC backend, grammar engines, tokenizers, and downstream projects. These are the real-world bugs that Crucible's mutation strategies are designed to find.

Crucible's own discoveries

For vulnerabilities discovered by Crucible's fuzzing campaigns, see Crucible Findings.

Underexplored attack surface

Community security research (Huntr, Protect AI) identified GGUF as the least-fuzzed ML model format among those compared, including ONNX and SafeTensors. The format's complexity (variable-length metadata, multiple quantization types, alignment requirements) creates a large attack surface that has not been systematically tested. Even with this limited scrutiny, GGUF parsing has already yielded 20+ CVEs since January 2024.

GGUF Parser Vulnerabilities¶

Bugs in gguf_init_from_file() and related parsing functions in ggml/src/gguf.cpp. These affect every application that loads GGUF files.

CVE	GHSA	CVSS	Type	Component	Discoverer
CVE-2024-21825	--	8.8	Integer Overflow	`GGUF_TYPE_ARRAY`/`GGUF_TYPE_STRING` parsing	Cisco Talos
CVE-2024-23496	--	8.8	Heap Buffer Overflow	`gguf_fread_str()` string length	Cisco Talos
CVE-2024-21802	--	8.8	Heap Buffer Overflow	`n_dims` > `GGML_MAX_DIMS` array overwrite	Cisco Talos
CVE-2024-21836	--	8.8	Integer Overflow	`n_tensors` allocation sizing	Cisco Talos
CVE-2024-23605	--	8.8	Integer Overflow	`n_kv` allocation sizing	Cisco Talos
CVE-2024-25664	--	--	Heap Buffer Overflow	GGUF metadata validation	Databricks
CVE-2024-25665	--	--	Heap Buffer Overflow	GGUF metadata validation	Databricks
CVE-2024-25666	--	--	Heap Buffer Overflow	GGUF metadata validation	Databricks
CVE-2025-53630	GHSA-vgg9-87g3-85w8	--	Integer Overflow	Cumulative tensor size `ctx->size`	--
CVE-2026-27940	GHSA-3p4r-fq3f-q74v	--	Integer Overflow	Bypass of CVE-2025-53630 fix	--
CVE-2026-33298	GHSA-96jg-mvhq-q7q7	--	Integer Overflow	`ggml_nbytes()` dimension product	--

RPC Backend Vulnerabilities¶

Bugs in ggml-rpc.cpp, the network backend for distributed inference. The RPC backend ships with no authentication — the security policy states "do not use on untrusted networks."

CVE	GHSA	CVSS	Type	Component	Discoverer
CVE-2024-42477	GHSA-mqp6-7pv6-fqjf	--	Global Buffer Overflow	`ggml_type_size` lookup	360 VRI
CVE-2024-42478	GHSA-5vm9-p64x-gqw9	9.8	Arbitrary Address Read	User-controlled `rpc_tensor.data` pointer	360 VRI
CVE-2024-42479	GHSA-wcr5-566p-9cwj	9.8	Write-What-Where → RCE	`ggml_backend_buffer::iface` callback overwrite	360 VRI
CVE-2026-34159	GHSA-j8rj-fmpv-wcxw	9.8	Unauthenticated RCE	`GRAPH_COMPUTE` buffer=0 deserialization bypass	--

Upstream RPC Fixes (Not CVE-Assigned)¶

These fixes address security-relevant bugs in ggml-rpc that were patched without CVE assignment. They form critical context for CRUCIBLE-2026-004 through 006.

Commit	PR	Date	Type	Component	Notes
`1d20e53c4`	ggml/1103	2025-02	OOB Write → RCE	`copy_tensor`	First ggml-rpc security fix
`2bcdddd5e`	#20712	2026-03-21	Div-by-Zero (DoS)	`deserialize_tensor` type/blck_size	Independently reported via GHSA; fixes CRUCIBLE-2026-004 RPC vector
`39bf0d3c6`	#20908	2026-03-23	Null Deref → RCE	`create_node` null-buffer check	--
`ba38f3bec`	#21030	2026-03-25	Data Pointer Handling	`deserialize_tensor` data field	--

Grammar, Tokenizer, and Server Vulnerabilities¶

Bugs in text processing components: GBNF grammar parsing, tokenizer vocabulary handling, and llama-server request processing.

CVE	GHSA	CVSS	Type	Component	Discoverer
CVE-2026-2069	--	--	Stack Buffer Overflow	GBNF grammar handler	--
CVE-2025-49847	GHSA-8wwf-w4qm-gpqr	--	Buffer Overflow	`token_to_piece()` `size_t`→`int32_t` cast	--
--	GHSA-7rxv-5jhh-j6xx	--	Heap Buffer Overflow	Tokenizer signed/unsigned overflow	--
--	GHSA-8947-pfff-2f3c	--	OOB Write	llama-server negative `n_discard` context shift	--

Ecosystem Vulnerabilities¶

Bugs in downstream projects that wrap llama.cpp or share its ggml parsing code. These demonstrate that the attack surface extends beyond the core library.

CVE	CVSS	Type	Project	Component	Discoverer
CVE-2024-34359	9.7	SSTI → RCE	llama-cpp-python	Jinja2 chat templates via GGUF metadata	retr0reg
CVE-2024-37032	--	Path Traversal → RCE	Ollama	Model pull digest validation	Wiz Research
CVE-2025-14569	--	Use-After-Free	whisper.cpp	`read_audio_data()`	--

Why GGUF Bugs Propagate¶

Inherited attack surface

Every application that loads GGUF files inherits the parsing vulnerabilities of its underlying library. A heap overflow in llama.cpp is simultaneously a heap overflow in every tool built on top of it.

The GGUF ecosystem has a single-library dependency pattern:

llama.cpp / ggml (C/C++ GGUF parser)
  |
  +-- Ollama (wraps llama.cpp via cgo) — 175,000+ publicly-exposed servers
  +-- llama-cpp-python (Python bindings)
  +-- LM Studio (embeds llama.cpp)
  +-- koboldcpp (fork of llama.cpp)
  +-- LocalAI (wraps llama.cpp)
  +-- GPT4All (uses llama.cpp backend)
  +-- text-generation-webui (gguf loader)
  +-- vLLM (optional GGUF support)
  |
  Shared ggml code (separate projects, same parser):
  +-- whisper.cpp (speech recognition)
  +-- stable-diffusion.cpp (image generation)

A single vulnerability in gguf_init_from_file() or ggml_nbytes() is exploitable across all of these tools. Users who download GGUF models from public repositories (Hugging Face, etc.) are exposed to malicious files that trigger these bugs.

The dependency tree extends beyond llama.cpp — whisper.cpp and stable-diffusion.cpp share the ggml library and inherit the same GGUF parser vulnerabilities. CVE-2024-34359 ("Llama Drama") demonstrated that 6,000+ models on Hugging Face were potentially affected by a single vulnerability in llama-cpp-python. SentinelOne and Censys identified 175,000 publicly-exposed Ollama servers across 130 countries as of January 2026.

Vulnerability Details¶

GGUF Parser — Cisco Talos Batch (January 2024)¶

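Francesco Benvenuto of Cisco Talos discovered five vulnerabilities in gguf_init_from_file(), which at the time lived in ggml.c. Each is an integer overflow or unchecked count that ends in a heap-based buffer overflow. All five were disclosed on February 26, 2024, and all are rated CVSS 8.8: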

CVE-2024-21825 / TALOS-2024-1912 — Array/String Integer Overflow¶

Root cause: Integer overflow in GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing. The multiplication kv->value.arr.n * sizeof(struct gguf_str) overflows, allocating less memory than needed. Subsequent element parsing writes past the buffer.
Crucible strategies: metadata.array, metadata.int_overflow
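
The under-allocation can be illustrated with a minimal C++ sketch. The names `gguf_str_sketch`, `wrapped_array_bytes`, and `checked_array_bytes` are hypothetical stand-ins rather than the parser's actual code, and the sketch assumes a 64-bit `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the parser's string record (length + pointer).
struct gguf_str_sketch {
    uint64_t n;
    char *   data;
};

// The unchecked pattern: the product is taken modulo 2^64, so a huge
// element count can yield a tiny byte size.
uint64_t wrapped_array_bytes(uint64_t count, uint64_t elem_size) {
    return count * elem_size;
}

// A guarded variant: returns std::nullopt when count * elem_size would not
// fit in size_t, instead of silently wrapping.
std::optional<size_t> checked_array_bytes(uint64_t count, size_t elem_size) {
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return std::nullopt;
    }
    return static_cast<size_t>(count) * elem_size;
}
```

With a 16-byte element, a count of 2^60 + 1 wraps to a 16-byte allocation in the unchecked version, while the guarded version rejects the count outright.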

CVE-2024-23496 / TALOS-2024-1913 — String Length Overflow¶

Root cause: In gguf_fread_str(), calloc(p->n + 1, 1) wraps when p->n is UINT64_MAX, causing a tiny allocation followed by a large write from the file.
Crucible strategies: metadata.key_length, metadata.string_value, metadata.string_truncated
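
A short C++ sketch of the wrap described above. `unchecked_str_alloc` models the `p->n + 1` size computation; `str_len_is_sane` is a hypothetical guard, and its `bytes_remaining` parameter is an assumption about what a caller could know, not part of the real API.

```cpp
#include <cassert>
#include <cstdint>

// Size the vulnerable pattern passes to calloc(): the string length plus one
// terminator byte, computed in unsigned 64-bit arithmetic (wraps mod 2^64).
uint64_t unchecked_str_alloc(uint64_t n) {
    return n + 1;
}

// Hypothetical guard: reject a length that would wrap when the terminator is
// added, or that exceeds the bytes still available in the file.
bool str_len_is_sane(uint64_t n, uint64_t bytes_remaining) {
    return n != UINT64_MAX && n <= bytes_remaining;
}
```

A length of UINT64_MAX makes the allocation size zero, yet the subsequent read still uses the original length.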

CVE-2024-21802 / TALOS-2024-1914 — Dimension Count Array Overwrite¶

Root cause: n_dims is an arbitrary uint32_t read from the file and used as the loop bound when filling info->ne[j], a fixed 4-element array (GGML_MAX_DIMS = 4). Any n_dims > 4 overwrites adjacent struct fields.
Crucible strategies: tensorinfo.n_dims
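
A minimal sketch of the missing check, assuming a simplified record layout. `tensor_info_sketch`, `kMaxDims`, and `read_dims` are hypothetical names chosen for illustration; only the limit of 4 comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxDims = 4;  // mirrors GGML_MAX_DIMS = 4 from the text

// Hypothetical, simplified tensor-info record; the field after `ne` is what
// an unchecked loop would clobber.
struct tensor_info_sketch {
    uint32_t n_dims;
    uint64_t ne[kMaxDims];
    uint64_t offset;
};

// Copies the dimensions only after validating the count. Unused trailing
// dimensions are set to 1 so the element product stays meaningful.
bool read_dims(tensor_info_sketch & info, uint32_t n_dims, const uint64_t * src) {
    if (n_dims > kMaxDims) {
        return false;  // reject instead of writing past ne[3]
    }
    info.n_dims = n_dims;
    for (uint32_t j = 0; j < kMaxDims; ++j) {
        info.ne[j] = j < n_dims ? src[j] : 1;
    }
    return true;
}
```

Rejecting the file before the loop runs leaves the neighbouring `offset` field untouched.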

CVE-2024-21836 / TALOS-2024-1915 — Tensor Count Allocation Overflow¶

Root cause: header.n_tensors * sizeof(gguf_tensor_info) overflows, under-allocating the tensor info array. The parsing loop then writes past the end.
Crucible strategies: header.tensor_count, consistency.tensor_count

CVE-2024-23605 / TALOS-2024-1916 — KV Count Allocation Overflow¶

Root cause: Same pattern as TALOS-2024-1915, but for header.n_kv * sizeof(gguf_kv). The under-allocated KV array leads to a heap buffer overflow during metadata parsing.
Crucible strategies: header.metadata_kv_count, consistency.metadata_count
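
From the fuzzer's side, the interesting header values for TALOS-2024-1915 and TALOS-2024-1916 are the counts at which the size multiplication first wraps. The sketch below computes that boundary for an assumed element size; the function names are hypothetical, and the real `sizeof(gguf_tensor_info)` and `sizeof(gguf_kv)` depend on the build.

```cpp
#include <cassert>
#include <cstdint>

// Smallest element count whose byte size (count * elem_size) no longer fits
// in 64 bits. Requires elem_size >= 2: with 1-byte elements no count wraps.
uint64_t smallest_wrapping_count(uint64_t elem_size) {
    assert(elem_size >= 2);
    // UINT64_MAX / elem_size is the largest count that still fits.
    return UINT64_MAX / elem_size + 1;
}

// Allocation size the unchecked multiplication would produce (mod 2^64).
uint64_t wrapped_alloc_size(uint64_t count, uint64_t elem_size) {
    return count * elem_size;
}
```

For a 16-byte element the boundary is 2^60: that count wraps to a zero-byte allocation, while 2^60 - 1 still yields the true size.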

GGUF Parser — Databricks (January 2024)¶

Databricks independently reported three additional heap overflows caused by insufficient validation of GGUF metadata fields (CVE-2024-25664, CVE-2024-25665, CVE-2024-25666). These were found in parallel with the Cisco Talos findings.

Metadata injection

Databricks also documented a format-level design issue: the GGUF format allows arbitrary key-value metadata with no schema validation. Attackers can inject metadata keys that downstream tools interpret as trusted configuration (e.g., system prompts, execution parameters). Crucible targets this with metadata.key_shadow, metadata.add_extra, and metadata.key_content.

GGUF Parser — Integer Overflow Chain (2025–2026)¶

The most revealing pattern in GGUF parser security is the recurring cycle of overflow discovery, patch, and bypass. Each fix addresses the specific reported attack vector but misses the same pattern in adjacent code.

CVE-2025-53630 / GHSA-vgg9-87g3-85w8 — Cumulative Tensor Size Overflow¶

Root cause: Integer overflow in ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment) during the tensor size accumulation loop in gguf_init_from_file_impl(). The wrapped ctx->size leads to a tiny allocation for the data buffer, after which tensor data pointers are assigned out of bounds.
Patch: SIZE_MAX - ctx->size < padded_size guard (commit 26a48ad).
Crucible strategies: tensorinfo.dim_product_overflow, consistency.tensor_size, consistency.offset_beyond
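
The guard quoted in the patch line can be sketched as follows. This is an illustration, not the upstream code: `pad_to` reproduces the usual power-of-two rounding that GGML_PAD performs, and `accumulate_padded` is a hypothetical helper that also checks the padding step itself, which is an assumption about what a complete guard would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds x up to a multiple of n; n must be a nonzero power of two.
constexpr size_t pad_to(size_t x, size_t n) {
    return (x + n - 1) & ~(n - 1);
}

// Adds one padded tensor size to the running total. Returns false, leaving
// the total unchanged, if either the padding or the addition would wrap.
bool accumulate_padded(size_t & total, size_t nbytes, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (nbytes > SIZE_MAX - (alignment - 1)) {
        return false;  // rounding up would wrap
    }
    const size_t padded = pad_to(nbytes, alignment);
    if (SIZE_MAX - total < padded) {
        return false;  // the guard form quoted in the patch description
    }
    total += padded;
    return true;
}
```

Writing the comparison as `SIZE_MAX - total < padded` keeps the check itself free of overflow, which a naive `total + padded < total` style test does not guarantee once the operands come from earlier wrapped arithmetic.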

CVE-2026-27940 / GHSA-3p4r-fq3f-q74v — Bypass of CVE-2025-53630 Fix¶

Root cause: The CVE-2025-53630 patch addressed only one code path. Other allocation-size calculations in the same function remained vulnerable to the identical overflow pattern, allowing 528+ bytes of controlled data past the buffer boundary.
Patch: Extend overflow guard to all accumulation paths (commit b8146).
Crucible strategies: Same as CVE-2025-53630 — combined mutations across tensorinfo and consistency categories

CVE-2026-33298 / GHSA-96jg-mvhq-q7q7 — ggml_nbytes() Integer Overflow¶

Root cause: ggml_nbytes() itself can integer-overflow when computing tensor data size from dimensions and quantization type. Crafted tensor dimensions cause the function to return a drastically undersized value (e.g., 4 MB instead of exabytes), corrupting the input to the CVE-2025-53630 overflow check before it even executes.
Patch: Overflow-checked multiplication in ggml_nbytes() (commit b7824).
Crucible strategies: tensorinfo.dim_product_overflow, tensorinfo.dim_overflow, consistency.tensor_size
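
A simplified sketch of how a dimension product can wrap to a small value. It models the size as type_size times the product of four dimensions; the real ggml_nbytes() also accounts for block sizes and strides, so the functions below are hypothetical illustrations only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Multiplies two 64-bit values, returning std::nullopt on overflow.
std::optional<uint64_t> checked_mul(uint64_t a, uint64_t b) {
    if (a != 0 && b > UINT64_MAX / a) {
        return std::nullopt;
    }
    return a * b;
}

// Unchecked size computation: every step wraps modulo 2^64.
uint64_t wrapping_nbytes(const std::array<uint64_t, 4> & ne, uint64_t type_size) {
    uint64_t total = type_size;
    for (uint64_t d : ne) {
        total *= d;
    }
    return total;
}

// Checked variant: fails as soon as any partial product overflows.
std::optional<uint64_t> checked_nbytes(const std::array<uint64_t, 4> & ne, uint64_t type_size) {
    std::optional<uint64_t> total = type_size;
    for (uint64_t d : ne) {
        total = checked_mul(*total, d);
        if (!total) {
            return std::nullopt;
        }
    }
    return total;
}
```

With a 4-byte type and dimensions {2^20, 2^42 + 1, 1, 1}, the true size is 2^64 + 2^22 bytes, but the unchecked product reports 2^22 bytes (4 MiB). Any later guard that trusts this value sees nothing unusual.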

RPC Backend (August 2024 – 2026)¶

7resp4ss and Guang Gong from 360 Vulnerability Research Institute discovered three critical vulnerabilities in ggml-rpc.cpp, fixed in version b3561. A fourth was found independently in 2026. Three of the four highest-severity CVEs in the entire llama.cpp ecosystem are in the RPC backend.

CVE-2024-42477 / GHSA-mqp6-7pv6-fqjf — Global Buffer Overflow¶

Root cause: Out-of-bounds read in ggml_type_size lookup table via attacker-controlled tensor type field in RPC messages.
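
The bug class is an unchecked table index. A minimal sketch, assuming a tiny hypothetical table (`kTypeSizes`, with illustrative values; the real type table is much larger and is not reproduced here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr size_t kTypeCount = 4;                            // hypothetical
constexpr size_t kTypeSizes[kTypeCount] = {4, 2, 18, 20};   // illustrative values

// Looks up the size for a wire-supplied type id, refusing ids that fall
// outside the table instead of reading adjacent global memory.
std::optional<size_t> type_size_checked(uint32_t type) {
    if (type >= kTypeCount) {
        return std::nullopt;
    }
    return kTypeSizes[type];
}
```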

CVE-2024-42478 / GHSA-5vm9-p64x-gqw9 — Arbitrary Address Read (CVSS 9.8)¶

Root cause: The data field in rpc_tensor is a user-controlled pointer. The RPC server dereferences it directly to read memory at any address the attacker specifies.

CVE-2024-42479 / GHSA-wcr5-566p-9cwj — Write-What-Where → Full RCE (CVSS 9.8)¶

Root cause: Same user-controlled pointer mechanism as CVE-2024-42478 but for write operations. The original researchers chained this with CVE-2024-42478 into a full RCE exploit by overwriting ggml_backend_buffer::iface function pointer callbacks.
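
Both pointer bugs come down to trusting an address supplied over the wire. One way to express the missing validation is a range check against the buffer the tensor claims to live in. The `buffer_range` struct and `tensor_data_in_bounds` function are hypothetical names for this sketch and do not mirror the upstream fix line for line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a server-side buffer: base address and size in bytes.
struct buffer_range {
    uint64_t base;
    uint64_t size;
};

// True only if [data, data + nbytes) lies entirely inside the buffer.
// The comparison is arranged so that no addition can wrap.
bool tensor_data_in_bounds(const buffer_range & buf, uint64_t data, uint64_t nbytes) {
    if (data < buf.base) {
        return false;
    }
    const uint64_t off = data - buf.base;
    return off <= buf.size && nbytes <= buf.size - off;
}
```

Checking `nbytes <= buf.size - off` rather than `data + nbytes <= base + size` matters here, because an attacker-chosen `nbytes` near UINT64_MAX would wrap the naive sum back into range.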

CVE-2026-34159 / GHSA-j8rj-fmpv-wcxw — GRAPH_COMPUTE Deserialization Bypass (CVSS 9.8)¶

Root cause: Unauthenticated RCE via the GRAPH_COMPUTE command. When buffer=0, deserialize_tensor() skips bounds validation entirely, enabling arbitrary memory read/write with full ASLR bypass.

No authentication

The ggml-rpc backend has no authentication mechanism. The project's security policy states it should not be used on untrusted networks, yet it listens on all interfaces by default.

Grammar, Tokenizer, and Server¶

CVE-2026-2069 — GBNF Grammar Stack Buffer Overflow¶

Root cause: Stack-based buffer overflow in the GBNF grammar handler during grammar parsing.
Crucible strategies: Targeted by the grammar harness via crafted GBNF input.

CVE-2025-49847 / GHSA-8wwf-w4qm-gpqr — Vocabulary Buffer Overflow¶

Root cause: In llama_vocab::impl::token_to_piece(), a size_t to int32_t cast bypasses bounds checking, enabling a buffer overflow during vocabulary loading via crafted GGUF files.
Crucible strategies: model.vocab
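
The effect of the narrowing cast can be sketched in isolation. `flawed_fits` models a bounds check that compares against the casted length, and `piece_fits` is a hypothetical alternative that compares in the wider unsigned type; neither is the actual llama.cpp code. The sketch assumes a 64-bit `size_t` and the two's-complement narrowing that mainstream compilers implement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// What the cast does to a large length: values above INT32_MAX become
// negative after narrowing.
int32_t narrowed(size_t n) {
    return static_cast<int32_t>(n);
}

// Flawed pattern: the piece length is narrowed before the comparison, so a
// huge piece looks "smaller" than any non-negative buffer length.
bool flawed_fits(size_t piece_len, int32_t buf_len) {
    return !(buf_len < static_cast<int32_t>(piece_len));
}

// Hypothetical safer check: widen the buffer length instead of narrowing
// the piece length.
bool piece_fits(size_t piece_len, int32_t buf_len) {
    return buf_len >= 0 && piece_len <= static_cast<size_t>(buf_len);
}
```

A token piece of 0x80000010 bytes passes the flawed check against a 256-byte buffer, because the cast turns the length negative.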

GHSA-7rxv-5jhh-j6xx — Tokenizer Signed/Unsigned Heap Overflow¶

Root cause: Signed/unsigned mismatch in llama_vocab::tokenize() leads to a heap buffer overflow during tokenization.
Crucible strategies: model.vocab

GHSA-8947-pfff-2f3c — Server OOB Write via Negative n_discard¶

Root cause: A negative n_discard value during context shift in llama-server causes an out-of-bounds write.
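
A sketch of the kind of range check that rules out this class of bug. The parameter names `n_past` and `n_keep` and the exact rule (discard at least one token, and no more than the tokens past the kept prefix) are assumptions made for illustration, not the server's actual validation logic.

```cpp
#include <cassert>
#include <cstdint>

// Validates a context-shift request: tokens [n_keep, n_keep + n_discard)
// are dropped, so n_discard must be positive and must not exceed the number
// of tokens that follow the kept prefix.
bool context_shift_is_valid(int32_t n_past, int32_t n_keep, int32_t n_discard) {
    if (n_keep < 0 || n_past < n_keep) {
        return false;
    }
    return n_discard > 0 && n_discard <= n_past - n_keep;
}
```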

std::regex is fundamentally unsafe

GCC Bug #61582 documents stack overflow in std::regex due to recursive processing. GCC developers acknowledged the implementation is "unlikely to ever be fast or efficient, due to ABI compatibility reasons." Any C++ LLM infrastructure using std::regex — including llama.cpp's tokenizer pre-tokenization patterns — is vulnerable to ReDoS and crash attacks. RE2 (deterministic finite automata, linear time) is the recommended alternative.

Ecosystem¶

CVE-2024-34359 — "Llama Drama" (CVSS 9.7)¶

Project: llama-cpp-python
Root cause: Server-side template injection (SSTI) through malicious Jinja2 chat templates embedded in GGUF metadata. The template was rendered in an unsandboxed Jinja2 environment, so a crafted template could execute arbitrary code on the host. Over 6,000 models on Hugging Face potentially affected.
Discoverer: retr0reg. JFrog published detailed analysis.
Crucible strategies: metadata.key_content, metadata.key_shadow (metadata injection vectors)

CVE-2024-37032 — "Probllama"¶

Project: Ollama
Root cause: Path traversal via model pull digest validation. Crafted model manifests write arbitrary files to the host filesystem, achieving RCE.
Discoverer: Wiz Research
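
The general defence against this class of bug is to validate the digest's shape before it is ever used to build a file path. The sketch below assumes digests of the form "sha256:" followed by 64 lowercase hex characters; that format and the function name are assumptions for illustration, and Ollama itself is written in Go, so this is not its code.

```cpp
#include <cassert>
#include <string_view>

// Accepts only "sha256:" followed by exactly 64 lowercase hex characters,
// so path separators and ".." sequences can never reach the filesystem.
bool is_sha256_digest(std::string_view d) {
    constexpr std::string_view prefix = "sha256:";
    if (d.size() != prefix.size() + 64 || d.substr(0, prefix.size()) != prefix) {
        return false;
    }
    for (char c : d.substr(prefix.size())) {
        const bool is_hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
        if (!is_hex) {
            return false;
        }
    }
    return true;
}
```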

CVE-2025-14569 — whisper.cpp Use-After-Free¶

Project: whisper.cpp
Root cause: Use-after-free in read_audio_data() affecting versions 1.8.0–1.8.2.

The Patch Bypass Pattern¶

The GGUF parser's vulnerability history demonstrates a recurring cycle: overflow discovered → point fix applied → same pattern found in adjacent code → deeper bypass found. This pattern is the strongest argument for continuous structure-aware fuzzing over manual code auditing.

Jan 2024    5 heap overflows found by Cisco Talos
            (CVE-2024-21825, -21802, -21836, -23496, -23605)
            → All patched same day. Four were allocation-size overflows;
            CVE-2024-21802 was an unchecked dimension count.

Jul 2025    CVE-2025-53630 — same overflow class in ctx->size accumulation
            → Patch: SIZE_MAX guard added (commit 26a48ad)

Mar 2026    CVE-2026-27940 — bypass of CVE-2025-53630 fix
            Original patch missed alternate code paths with identical overflow
            → Patch: extend guard to all paths (commit b8146)

Mar 2026    CVE-2026-33298 — ggml_nbytes() itself overflows BEFORE the guard
            Crafted dimensions cause ggml_nbytes() to return 4 MB instead of
            exabytes, corrupting the input to the overflow check
            → Patch: overflow-checked multiplication (commit b7824)

Rate of discovery

Despite 12+ CVE-class fixes since January 2024, the function's fundamental architecture (reading attacker-controlled values and using them in size calculations, loop bounds, and pointer arithmetic) continues to produce new exploitable bugs, averaging roughly one critical vulnerability every three months. The recurring pattern of incomplete fixes suggests the codebase needs systematic hardening rather than point fixes.

Mapping Strategies to CVEs¶

The table below shows which Crucible mutation categories correspond to known vulnerability classes:

Category	Known Bug Classes	Example CVEs	Key Strategies
Header	Count mismatch, allocation overflow	CVE-2024-21836, CVE-2024-23605	`header.tensor_count`, `header.metadata_kv_count`
Metadata	String overflow, type confusion, injection	CVE-2024-23496, CVE-2024-21825, CVE-2024-34359	`metadata.key_length`, `metadata.string_value`, `metadata.array`
TensorInfo	Dimension overflow, offset OOB, type crash	CVE-2024-21802, CVE-2025-53630, CVE-2026-27940, CVE-2026-33298	`tensorinfo.n_dims`, `tensorinfo.dim_product_overflow`, `tensorinfo.offset`
Alignment	Division by zero, padding miscalculation	--	`alignment.padding`, `metadata.alignment_poison`
Data	Truncation, overlap, zero-length	--	`data.truncate`, `data.zero_length`
Consistency	Count mismatch, size mismatch, offset OOB	CVE-2025-53630, CVE-2026-27940	`consistency.tensor_size`, `consistency.offset_beyond`
Model-Loader	Vocab overflow, architecture dispatch crash	CVE-2025-49847, GHSA-7rxv-5jhh-j6xx	`model.vocab`, `model.architecture`
RPC	Arbitrary read/write, deserialization bypass	CVE-2024-42478, CVE-2024-42479, CVE-2026-34159	RPC command harness
Grammar	Stack overflow, ReDoS	CVE-2026-2069	Grammar/JSON schema harness

Undiscovered bugs

The Alignment and Data categories have no public CVEs yet, but the underlying bug patterns (division by zero on alignment=0, null-pointer dereference on zero-length data) are well-established vulnerability classes. These categories exist to find the bugs that have not been reported yet.
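
The alignment=0 case can be made concrete with a short sketch. It assumes the alignment must be a nonzero power of two before it is used as a divisor; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// A usable alignment is nonzero and a power of two.
bool alignment_is_valid(uint64_t a) {
    return a != 0 && (a & (a - 1)) == 0;
}

// Bytes of padding needed to bring `offset` up to the next multiple of
// `alignment`. Returns std::nullopt for an invalid alignment instead of
// evaluating `offset % 0`.
std::optional<uint64_t> padding_for(uint64_t offset, uint64_t alignment) {
    if (!alignment_is_valid(alignment)) {
        return std::nullopt;
    }
    return (alignment - offset % alignment) % alignment;
}
```

A file that sets the alignment to zero is rejected up front rather than reaching the modulo, which is the crash an alignment-poisoning mutation is meant to provoke in parsers that skip the check.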