Known CVEs

Known vulnerabilities across the llama.cpp ecosystem — GGUF parsers, RPC backend, grammar engines, tokenizers, and downstream projects. These are the real-world bugs that Crucible's mutation strategies are designed to find.

Crucible's own discoveries

For vulnerabilities discovered by Crucible's fuzzing campaigns, see Crucible Findings.

Underexplored attack surface

Community security research (Huntr, Protect AI) identified GGUF as the least-fuzzed of the major ML model formats, behind ONNX, SafeTensors, and others. The format's complexity (variable-length metadata, multiple quantization types, alignment requirements) creates a large attack surface that has not been systematically tested. Despite this, the tables on this page already list 20 CVEs across the llama.cpp ecosystem since January 2024, 11 of them in the GGUF parser itself.


GGUF Parser Vulnerabilities

Bugs in gguf_init_from_file() and related parsing functions in ggml/src/gguf.cpp. These affect every application that loads GGUF files.

| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-21825 | -- | 8.8 | Integer Overflow | GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing | Cisco Talos |
| CVE-2024-23496 | -- | 8.8 | Heap Buffer Overflow | gguf_fread_str() string length | Cisco Talos |
| CVE-2024-21802 | -- | 8.8 | Heap Buffer Overflow | n_dims > GGML_MAX_DIMS array overwrite | Cisco Talos |
| CVE-2024-21836 | -- | 8.8 | Integer Overflow | n_tensors allocation sizing | Cisco Talos |
| CVE-2024-23605 | -- | 8.8 | Integer Overflow | n_kv allocation sizing | Cisco Talos |
| CVE-2024-25664 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2024-25665 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2024-25666 | -- | -- | Heap Buffer Overflow | GGUF metadata validation | Databricks |
| CVE-2025-53630 | GHSA-vgg9-87g3-85w8 | -- | Integer Overflow | Cumulative tensor size ctx->size | -- |
| CVE-2026-27940 | GHSA-3p4r-fq3f-q74v | -- | Integer Overflow | Bypass of CVE-2025-53630 fix | -- |
| CVE-2026-33298 | GHSA-96jg-mvhq-q7q7 | -- | Integer Overflow | ggml_nbytes() dimension product | -- |

RPC Backend Vulnerabilities

Bugs in ggml-rpc.cpp, the network backend for distributed inference. The RPC backend ships with no authentication — the security policy states "do not use on untrusted networks."

| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-42477 | GHSA-mqp6-7pv6-fqjf | -- | Global Buffer Overflow | ggml_type_size lookup | 360 VRI |
| CVE-2024-42478 | GHSA-5vm9-p64x-gqw9 | 9.8 | Arbitrary Address Read | User-controlled rpc_tensor.data pointer | 360 VRI |
| CVE-2024-42479 | GHSA-wcr5-566p-9cwj | 9.8 | Write-What-Where → RCE | ggml_backend_buffer::iface callback overwrite | 360 VRI |
| CVE-2026-34159 | GHSA-j8rj-fmpv-wcxw | 9.8 | Unauthenticated RCE | GRAPH_COMPUTE buffer=0 deserialization bypass | -- |

Upstream RPC Fixes (Not CVE-Assigned)

These fixes address security-relevant bugs in ggml-rpc that were patched without CVE assignment. They form critical context for CRUCIBLE-2026-004 through 006.

| Commit | PR | Date | Type | Component | Notes |
|---|---|---|---|---|---|
| 1d20e53c4 | ggml/1103 | 2025-02 | OOB Write → RCE | copy_tensor | First ggml-rpc security fix |
| 2bcdddd5e | #20712 | 2026-03-21 | Div-by-Zero (DoS) | deserialize_tensor type/blck_size | Independently reported via GHSA; fixes CRUCIBLE-2026-004 RPC vector |
| 39bf0d3c6 | #20908 | 2026-03-23 | Null Deref → RCE | create_node null-buffer check | -- |
| ba38f3bec | #21030 | 2026-03-25 | Data Pointer Handling | deserialize_tensor data field | -- |

Grammar, Tokenizer, and Server Vulnerabilities

Bugs in text processing components: GBNF grammar parsing, tokenizer vocabulary handling, and llama-server request processing.

| CVE | GHSA | CVSS | Type | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2026-2069 | -- | -- | Stack Buffer Overflow | GBNF grammar handler | -- |
| CVE-2025-49847 | GHSA-8wwf-w4qm-gpqr | -- | Buffer Overflow | token_to_piece() size_t → int32_t cast | -- |
| -- | GHSA-7rxv-5jhh-j6xx | -- | Heap Buffer Overflow | Tokenizer signed/unsigned overflow | -- |
| -- | GHSA-8947-pfff-2f3c | -- | OOB Write | llama-server negative n_discard context shift | -- |

Ecosystem Vulnerabilities

Bugs in downstream projects that wrap llama.cpp or share its ggml parsing code. These demonstrate that the attack surface extends beyond the core library.

| CVE | CVSS | Type | Project | Component | Discoverer |
|---|---|---|---|---|---|
| CVE-2024-34359 | 9.7 | SSTI → RCE | llama-cpp-python | Jinja2 chat templates via GGUF metadata | retr0reg |
| CVE-2024-37032 | -- | Path Traversal → RCE | Ollama | Model pull digest validation | Wiz Research |
| CVE-2025-14569 | -- | Use-After-Free | whisper.cpp | read_audio_data() | -- |

Why GGUF Bugs Propagate

Inherited attack surface

Every application that loads GGUF files inherits the parsing vulnerabilities of its underlying library. A heap overflow in llama.cpp is simultaneously a heap overflow in every tool built on top of it.

The GGUF ecosystem has a single-library dependency pattern:

llama.cpp / ggml (C/C++ GGUF parser)
  |
  +-- Ollama (wraps llama.cpp via cgo) — 175,000+ publicly-exposed servers
  +-- llama-cpp-python (Python bindings)
  +-- LM Studio (embeds llama.cpp)
  +-- koboldcpp (fork of llama.cpp)
  +-- LocalAI (wraps llama.cpp)
  +-- GPT4All (uses llama.cpp backend)
  +-- text-generation-webui (gguf loader)
  +-- vLLM (optional GGUF support)
  |
  Shared ggml code (separate projects, same parser):
  +-- whisper.cpp (speech recognition)
  +-- stable-diffusion.cpp (image generation)

A single vulnerability in gguf_init_from_file() or ggml_nbytes() is exploitable across all of these tools. Users who download GGUF models from public repositories (Hugging Face, etc.) are exposed to malicious files that trigger these bugs.

The dependency tree extends beyond llama.cpp — whisper.cpp and stable-diffusion.cpp share the ggml library and inherit the same GGUF parser vulnerabilities. CVE-2024-34359 ("Llama Drama") demonstrated that 6,000+ models on Hugging Face were potentially affected by a single vulnerability in llama-cpp-python. SentinelOne and Censys identified 175,000 publicly-exposed Ollama servers across 130 countries as of January 2026.


Vulnerability Details

GGUF Parser — Cisco Talos Batch (January 2024)

Francesco Benvenuto of Cisco Talos discovered five heap-based buffer overflow vulnerabilities in gguf_init_from_file(), which at the time lived in ggml.c rather than ggml/src/gguf.cpp. All five were disclosed on February 26, 2024 and rated CVSS 8.8:

CVE-2024-21825 / TALOS-2024-1912 — Array/String Integer Overflow

  • Root cause: Integer overflow in GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing. The multiplication kv->value.arr.n * sizeof(struct gguf_str) overflows, allocating less memory than needed. Subsequent element parsing writes past the buffer.
  • Crucible strategies: metadata.array, metadata.int_overflow
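
The under-allocation described above can be modelled in a few lines. This is a minimal sketch rather than the upstream code: it assumes a 64-bit size_t and a 16-byte struct gguf_str (a uint64 length plus a pointer), and the helper names are hypothetical.

```python
SIZE_MAX = 2**64 - 1      # size_t on a 64-bit target (assumption)
SIZEOF_GGUF_STR = 16      # assumed sizeof(struct gguf_str): uint64 length + pointer


def unchecked_alloc_size(n: int, elem_size: int = SIZEOF_GGUF_STR) -> int:
    """Model C's wrapping size_t multiplication: (n * elem_size) mod 2**64."""
    return (n * elem_size) & SIZE_MAX


def checked_alloc_size(n: int, elem_size: int = SIZEOF_GGUF_STR) -> int:
    """Reject element counts whose total byte size does not fit in size_t."""
    if elem_size != 0 and n > SIZE_MAX // elem_size:
        raise OverflowError(f"array of {n} elements overflows size_t")
    return n * elem_size


# An attacker-chosen count of 2**60 + 1 elements wraps to a 16-byte allocation,
# while the parsing loop still expects to write 2**60 + 1 elements.
hostile_n = 2**60 + 1
wrapped = unchecked_alloc_size(hostile_n)
```

The checked variant divides instead of multiplying, so the comparison itself cannot wrap.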

CVE-2024-23496 / TALOS-2024-1913 — String Length Overflow

  • Root cause: In gguf_fread_str(), calloc(p->n + 1, 1) wraps when p->n is UINT64_MAX, causing a tiny allocation followed by a large write from the file.
  • Crucible strategies: metadata.key_length, metadata.string_value, metadata.string_truncated
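
A sketch of the wrap and of one way a reader could reject it, assuming the GGUF string encoding of a little-endian uint64 length followed by that many bytes. The function names are hypothetical and this is not the upstream patch.

```python
import struct

UINT64_MAX = 2**64 - 1


def wrapped_calloc_count(n: int) -> int:
    """Model the first argument of calloc(n + 1, 1) with 64-bit wrapping."""
    return (n + 1) & UINT64_MAX


def read_gguf_string(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Read a length-prefixed string and return (value, next_offset).

    The declared length is compared against the bytes actually remaining
    before anything is allocated or copied.
    """
    if offset < 0 or len(buf) - offset < 8:
        raise ValueError("truncated string length field")
    (length,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    remaining = len(buf) - start
    if length > remaining:
        raise ValueError(f"string length {length} exceeds remaining {remaining} bytes")
    return buf[start:start + length], start + length
```

With p->n equal to UINT64_MAX, wrapped_calloc_count returns 0, which is the undersized allocation the advisory describes; the bounded reader refuses the same input because the declared length cannot fit in the file.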

CVE-2024-21802 / TALOS-2024-1914 — Dimension Count Array Overwrite

  • Root cause: n_dims is an arbitrary uint32_t read from the file and used as the loop bound when filling info->ne[j], a fixed 4-element array (GGML_MAX_DIMS = 4). Any n_dims > 4 writes past the array and overwrites adjacent struct fields.
  • Crucible strategies: tensorinfo.n_dims
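
The missing check amounts to bounding n_dims before it is used as a loop count. A minimal sketch, assuming the tensor-info layout of a uint32 n_dims followed by n_dims little-endian uint64 extents; read_tensor_dims is a hypothetical helper, not a ggml function.

```python
import struct

GGML_MAX_DIMS = 4


def read_tensor_dims(buf: bytes, offset: int) -> tuple[list[int], int]:
    """Read n_dims and its extents, returning (ne, next_offset)."""
    if len(buf) - offset < 4:
        raise ValueError("truncated n_dims field")
    (n_dims,) = struct.unpack_from("<I", buf, offset)
    # Bound the count before it drives any indexing into the fixed-size array.
    if n_dims > GGML_MAX_DIMS:
        raise ValueError(f"n_dims {n_dims} exceeds GGML_MAX_DIMS ({GGML_MAX_DIMS})")
    offset += 4
    if len(buf) - offset < 8 * n_dims:
        raise ValueError("truncated dimension list")
    ne = [1] * GGML_MAX_DIMS  # unused trailing dimensions default to 1
    ne[:n_dims] = struct.unpack_from(f"<{n_dims}Q", buf, offset)
    return ne, offset + 8 * n_dims
```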

CVE-2024-21836 / TALOS-2024-1915 — Tensor Count Allocation Overflow

  • Root cause: header.n_tensors * sizeof(gguf_tensor_info) overflows, under-allocating the tensor info array. The parsing loop then writes past the end.
  • Crucible strategies: header.tensor_count, consistency.tensor_count

CVE-2024-23605 / TALOS-2024-1916 — KV Count Allocation Overflow

  • Root cause: Same pattern as TALOS-2024-1915 but for header.n_kv * sizeof(gguf_kv). Under-allocated KV array leads to heap buffer overflow during metadata parsing.
  • Crucible strategies: header.metadata_kv_count, consistency.metadata_count
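
Both count overflows can be cut off at the header by comparing the declared counts against the bytes actually present in the file. The sketch below assumes the GGUF v2/v3 header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 KV count, little-endian); the minimum record sizes are rough lower bounds chosen for illustration, not values taken from ggml.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FMT = "<4sIQQ"  # magic, version, tensor_count, metadata_kv_count
HEADER_SIZE = struct.calcsize(HEADER_FMT)

# Assumed lower bounds on the encoded size of one record.
MIN_KV_BYTES = 8 + 4               # key length field + value type field
MIN_TENSOR_INFO_BYTES = 8 + 4 + 4 + 8  # name length + n_dims + type + offset


def parse_header(buf: bytes) -> tuple[int, int, int]:
    """Return (version, n_tensors, n_kv) after sanity-checking both counts."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, version, n_tensors, n_kv = struct.unpack_from(HEADER_FMT, buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("bad magic")
    remaining = len(buf) - HEADER_SIZE
    # A file cannot hold more records than its remaining bytes allow.
    if n_kv > remaining // MIN_KV_BYTES:
        raise ValueError(f"n_kv {n_kv} cannot fit in {remaining} bytes")
    if n_tensors > remaining // MIN_TENSOR_INFO_BYTES:
        raise ValueError(f"n_tensors {n_tensors} cannot fit in {remaining} bytes")
    return version, n_tensors, n_kv
```

Bounding the counts by file size also keeps any later count-times-sizeof multiplication far below the wrap point.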

GGUF Parser — Databricks (January 2024)

Databricks independently reported three additional heap overflows caused by insufficient validation of GGUF metadata fields (CVE-2024-25664, CVE-2024-25665, CVE-2024-25666). They were found in parallel with the Cisco Talos batch.

Metadata injection

Databricks also documented a format-level design issue: the GGUF format allows arbitrary key-value metadata with no schema validation. Attackers can inject metadata keys that downstream tools interpret as trusted configuration (e.g., system prompts, execution parameters). Crucible targets this with metadata.key_shadow, metadata.add_extra, and metadata.key_content.

GGUF Parser — Integer Overflow Chain (2025–2026)

The most revealing pattern in GGUF parser security is the recurring cycle of overflow discovery, patch, and bypass. Each fix addresses the specific reported attack vector but misses the same pattern in adjacent code.

CVE-2025-53630 / GHSA-vgg9-87g3-85w8 — Cumulative Tensor Size Overflow

  • Root cause: Integer overflow in ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment) during the tensor size accumulation loop in gguf_init_from_file_impl(). The wrapped ctx->size leads to a tiny allocation at the data buffer, followed by out-of-bounds pointer assignments for tensor data.
  • Patch: SIZE_MAX - ctx->size < padded_size guard (commit 26a48ad).
  • Crucible strategies: tensorinfo.dim_product_overflow, consistency.tensor_size, consistency.offset_beyond
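
The accumulation bug and the guard quoted in the patch line can be sketched side by side. This models 64-bit size_t arithmetic in Python; the function names are hypothetical and the padding helper only mirrors what GGML_PAD does for power-of-two alignments.

```python
SIZE_MAX = 2**64 - 1


def ggml_pad(x: int, n: int) -> int:
    """Round x up to a multiple of n, where n is a power of two."""
    return (x + n - 1) & ~(n - 1)


def total_size_unchecked(tensor_sizes: list[int], alignment: int = 32) -> int:
    """Model ctx->size += GGML_PAD(...) with wrapping size_t addition."""
    size = 0
    for nbytes in tensor_sizes:
        size = (size + ggml_pad(nbytes, alignment)) & SIZE_MAX
    return size


def total_size_checked(tensor_sizes: list[int], alignment: int = 32) -> int:
    """Same loop with the SIZE_MAX - size < padded_size guard applied first."""
    size = 0
    for nbytes in tensor_sizes:
        padded = ggml_pad(nbytes, alignment)
        if SIZE_MAX - size < padded:
            raise OverflowError("cumulative tensor size overflows size_t")
        size += padded
    return size
```

For example, three tensors of 2**63, 2**63, and 64 bytes sum to 64 in the unchecked loop, so a 64-byte data buffer would back tensors that claim exabytes.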

CVE-2026-27940 / GHSA-3p4r-fq3f-q74v — Bypass of CVE-2025-53630 Fix

  • Root cause: The CVE-2025-53630 patch addressed only one code path. Other allocation-size calculations in the same function remained vulnerable to the identical overflow pattern, allowing 528+ bytes of controlled data past the buffer boundary.
  • Patch: Extend overflow guard to all accumulation paths (commit b8146).
  • Crucible strategies: Same as CVE-2025-53630 — combined mutations across tensorinfo and consistency categories

CVE-2026-33298 / GHSA-96jg-mvhq-q7q7 — ggml_nbytes() Integer Overflow

  • Root cause: ggml_nbytes() itself can integer-overflow when computing tensor data size from dimensions and quantization type. Crafted tensor dimensions cause the function to return a drastically undersized value (e.g., 4 MB instead of exabytes), corrupting the input to the CVE-2025-53630 overflow check before it even executes.
  • Patch: Overflow-checked multiplication in ggml_nbytes() (commit b7824).
  • Crucible strategies: tensorinfo.dim_product_overflow, tensorinfo.dim_overflow, consistency.tensor_size
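
A simplified model of how a dimension product can wrap to about 4 MB. It treats the tensor size as type_size multiplied by every extent, which ignores the block sizes and strides that the real ggml_nbytes() handles, so read it as an illustration of the overflow class rather than of the exact function.

```python
SIZE_MAX = 2**64 - 1


def nbytes_wrapping(ne: list[int], type_size: int) -> int:
    """Multiply out the extents with 64-bit wraparound, as unchecked C would."""
    total = type_size
    for extent in ne:
        total = (total * extent) & SIZE_MAX
    return total


def nbytes_checked(ne: list[int], type_size: int) -> int:
    """Same product, but fail as soon as a multiplication would exceed size_t."""
    total = type_size
    for extent in ne:
        if extent != 0 and total > SIZE_MAX // extent:
            raise OverflowError("tensor byte size overflows size_t")
        total *= extent
    return total


# True size is 2**64 + 2**22 bytes; the wrapped result is 2**22 (4 MiB).
hostile_ne = [2**20, 2**42 + 1, 1, 1]
wrapped = nbytes_wrapping(hostile_ne, type_size=4)
```

Because the wrapped value is small and plausible, any downstream guard that only inspects the result has nothing suspicious left to catch.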

RPC Backend (August 2024 – 2026)

7resp4ss and Guang Gong from 360 Vulnerability Research Institute discovered three critical vulnerabilities in ggml-rpc.cpp, fixed in version b3561. A fourth was found independently in 2026. Three of the four highest-severity CVEs in the entire llama.cpp ecosystem are in the RPC backend.

CVE-2024-42477 / GHSA-mqp6-7pv6-fqjf — Global Buffer Overflow

  • Root cause: Out-of-bounds read in ggml_type_size lookup table via attacker-controlled tensor type field in RPC messages.

CVE-2024-42478 / GHSA-5vm9-p64x-gqw9 — Arbitrary Address Read (CVSS 9.8)

  • Root cause: The data field in rpc_tensor is a user-controlled pointer. The RPC server dereferences it directly to read memory at any address the attacker specifies.

CVE-2024-42479 / GHSA-wcr5-566p-9cwj — Write-What-Where → Full RCE (CVSS 9.8)

  • Root cause: Same user-controlled pointer mechanism as CVE-2024-42478 but for write operations. The original researchers chained this with CVE-2024-42478 into a full RCE exploit by overwriting ggml_backend_buffer::iface function pointer callbacks.

CVE-2026-34159 / GHSA-j8rj-fmpv-wcxw — GRAPH_COMPUTE Deserialization Bypass (CVSS 9.8)

  • Root cause: Unauthenticated RCE via the GRAPH_COMPUTE command. When buffer=0, deserialize_tensor() skips bounds validation entirely, enabling arbitrary memory read/write with full ASLR bypass.
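
The four RPC bugs share one shape: a field taken from the wire is used as an index or an address without being tied back to a server-owned buffer. The sketch below shows the kind of validation that closes that gap. The types, field names, and the rule for buffer-less tensors are assumptions made for illustration and do not reproduce the upstream fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    base: int  # start address of a server-owned allocation
    size: int  # its length in bytes


@dataclass
class WireTensor:
    type: int    # type index as received from the client
    data: int    # client-supplied address
    nbytes: int  # size the tensor claims to occupy


def tensor_is_safe(t: WireTensor, buf: Optional[Buffer], type_count: int) -> bool:
    """Accept a tensor only if its type and address range are provably valid."""
    # The type index must fall inside the type-size lookup table.
    if not (0 <= t.type < type_count):
        return False
    # Without a backing buffer there is nothing to bound the address against,
    # so this sketch refuses any non-null data pointer.
    if buf is None:
        return t.data == 0
    if t.data < buf.base or t.nbytes > buf.size:
        return False
    # Written as a subtraction so the comparison cannot wrap in fixed-width code.
    return t.data - buf.base <= buf.size - t.nbytes
```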

No authentication

The ggml-rpc backend has no authentication mechanism. The project's security policy states it should not be used on untrusted networks, yet it listens on all interfaces by default.

Grammar, Tokenizer, and Server

CVE-2026-2069 — GBNF Grammar Stack Buffer Overflow

  • Root cause: Stack-based buffer overflow in the GBNF grammar handler during grammar parsing.
  • Crucible strategies: Targeted by the grammar harness via crafted GBNF input.

CVE-2025-49847 / GHSA-8wwf-w4qm-gpqr — Vocabulary Buffer Overflow

  • Root cause: In llama_vocab::impl::token_to_piece(), a size_t to int32_t cast bypasses bounds checking, enabling a buffer overflow during vocabulary loading via crafted GGUF files.
  • Crucible strategies: model.vocab
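
The effect of the narrowing cast can be reproduced with plain integer arithmetic. This is a simplified model of the check, assuming the length comparison is made on the value after it has been cast to int32_t; the function names are hypothetical.

```python
def to_int32(value: int) -> int:
    """Emulate a C cast to int32_t on a two's-complement target."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low


def flawed_check_allows_copy(piece_len: int, buf_len: int) -> bool:
    """Compare after narrowing: a huge length turns negative and passes."""
    return to_int32(piece_len) <= buf_len


def safe_check_allows_copy(piece_len: int, buf_len: int) -> bool:
    """Compare at full width, before any narrowing takes place."""
    return 0 <= piece_len <= buf_len


# A token piece slightly longer than INT32_MAX bytes, copied into a 64-byte buffer.
hostile_len = 2**31 + 5
```

Here to_int32(hostile_len) is negative, so the flawed comparison treats a 2 GiB copy as fitting in 64 bytes.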

GHSA-7rxv-5jhh-j6xx — Tokenizer Signed/Unsigned Heap Overflow

  • Root cause: Signed/unsigned mismatch in llama_vocab::tokenize() leads to a heap buffer overflow during tokenization.
  • Crucible strategies: model.vocab

GHSA-8947-pfff-2f3c — Server OOB Write via Negative n_discard

  • Root cause: A negative n_discard value during context shift in llama-server causes an out-of-bounds write.

std::regex is fundamentally unsafe

GCC Bug #61582 documents a stack overflow in libstdc++'s std::regex caused by recursive processing. GCC developers acknowledged the implementation is "unlikely to ever be fast or efficient, due to ABI compatibility reasons." Any C++ LLM infrastructure that runs std::regex over untrusted input, including llama.cpp's tokenizer pre-tokenization patterns, is exposed to ReDoS and stack-exhaustion crashes. RE2 (deterministic finite automata, linear time) is the recommended alternative.

Ecosystem

CVE-2024-34359 — "Llama Drama" (CVSS 9.7)

  • Project: llama-cpp-python
  • Root cause: Server-side template injection (SSTI) through malicious Jinja2 chat templates embedded in GGUF metadata. Remote code execution via eval() in the template engine. Over 6,000 models on Hugging Face potentially affected.
  • Discoverer: retr0reg. JFrog published detailed analysis.
  • Crucible strategies: metadata.key_content, metadata.key_shadow (metadata injection vectors)

CVE-2024-37032 — "Probllama"

  • Project: Ollama
  • Root cause: Path traversal caused by insufficient validation of the digest field during model pull. A crafted model manifest can write arbitrary files to the host filesystem, which leads to RCE.
  • Discoverer: Wiz Research
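
A minimal sketch of strict digest validation before a digest is turned into a filesystem path. The directory layout, the "sha256-" file naming, and the blob_path helper are hypothetical and only illustrate the class of check; they are not Ollama's code.

```python
import re
from pathlib import PurePosixPath

# Accept exactly "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def blob_path(blobs_dir: str, digest: str) -> PurePosixPath:
    """Map a manifest digest to a blob path, rejecting anything path-like."""
    if DIGEST_RE.fullmatch(digest) is None:
        raise ValueError(f"malformed digest: {digest!r}")
    return PurePosixPath(blobs_dir) / digest.replace(":", "-")
```

Because fullmatch only accepts hex characters after the prefix, a digest containing "/" or ".." can never reach the path join.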

CVE-2025-14569 — whisper.cpp Use-After-Free

  • Project: whisper.cpp
  • Root cause: Use-after-free in read_audio_data() affecting versions 1.8.0–1.8.2.

The Patch Bypass Pattern

The GGUF parser's vulnerability history demonstrates a recurring cycle: overflow discovered → point fix applied → same pattern found in adjacent code → deeper bypass found. This pattern is the strongest argument for continuous structure-aware fuzzing over manual code auditing.

Jan 2024    5 heap overflows found by Cisco Talos
            (CVE-2024-21825, -21802, -21836, -23496, -23605)
            → All patched same day. Four were allocation-size overflows;
              CVE-2024-21802 was an unchecked n_dims loop bound.

Jul 2025    CVE-2025-53630 — same overflow class in ctx->size accumulation
            → Patch: SIZE_MAX guard added (commit 26a48ad)

Mar 2026    CVE-2026-27940 — bypass of CVE-2025-53630 fix
            Original patch missed alternate code paths with identical overflow
            → Patch: extend guard to all paths (commit b8146)

Mar 2026    CVE-2026-33298 — ggml_nbytes() itself overflows BEFORE the guard
            Crafted dimensions cause ggml_nbytes() to return 4 MB instead of
            exabytes, corrupting the input to the overflow check
            → Patch: overflow-checked multiplication (commit b7824)

Rate of discovery

Despite 12+ CVE-class fixes since January 2024, the function's fundamental architecture — reading attacker-controlled values and using them in size calculations, loop bounds, and pointer arithmetic — continues to produce new exploitable bugs at a rate of roughly one critical vulnerability every three months. The recurring pattern of incomplete fixes suggests the codebase needs systematic hardening rather than point fixes.


Mapping Strategies to CVEs

The table below shows which Crucible mutation categories correspond to known vulnerability classes:

| Category | Known Bug Classes | Example CVEs | Key Strategies |
|---|---|---|---|
| Header | Count mismatch, allocation overflow | CVE-2024-21836, CVE-2024-23605 | header.tensor_count, header.metadata_kv_count |
| Metadata | String overflow, type confusion, injection | CVE-2024-23496, CVE-2024-21825, CVE-2024-34359 | metadata.key_length, metadata.string_value, metadata.array |
| TensorInfo | Dimension overflow, offset OOB, type crash | CVE-2024-21802, CVE-2025-53630, CVE-2026-27940, CVE-2026-33298 | tensorinfo.n_dims, tensorinfo.dim_product_overflow, tensorinfo.offset |
| Alignment | Division by zero, padding miscalculation | -- | alignment.padding, metadata.alignment_poison |
| Data | Truncation, overlap, zero-length | -- | data.truncate, data.zero_length |
| Consistency | Count mismatch, size mismatch, offset OOB | CVE-2025-53630, CVE-2026-27940 | consistency.tensor_size, consistency.offset_beyond |
| Model-Loader | Vocab overflow, architecture dispatch crash | CVE-2025-49847, GHSA-7rxv-5jhh-j6xx | model.vocab, model.architecture |
| RPC | Arbitrary read/write, deserialization bypass | CVE-2024-42478, CVE-2024-42479, CVE-2026-34159 | RPC command harness |
| Grammar | Stack overflow, ReDoS | CVE-2026-2069 | Grammar/JSON schema harness |

Undiscovered bugs

The Alignment and Data categories have no public CVEs yet, but the underlying bug patterns (division by zero on alignment=0, null pointer on zero-length data) are well-established vulnerability classes. These categories exist to find the bugs that have not been reported yet.
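
The alignment=0 case can be illustrated with a padding helper that validates the value before dividing by it. This sketch assumes the alignment comes from file metadata and that a loader wants a positive power of two; the helper names are hypothetical.

```python
def validate_alignment(alignment: int) -> int:
    """Require a positive power of two before the value is used in arithmetic."""
    if alignment <= 0 or (alignment & (alignment - 1)) != 0:
        raise ValueError(f"invalid alignment: {alignment}")
    return alignment


def padding_needed(offset: int, alignment: int) -> int:
    """Bytes to add so that offset becomes a multiple of alignment."""
    alignment = validate_alignment(alignment)
    # Without the validation above, alignment == 0 would raise
    # ZeroDivisionError here; the C equivalent is a division-by-zero crash.
    return (alignment - offset % alignment) % alignment
```

A fuzzer that poisons the alignment value is probing exactly for loaders that skip the first function.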