
Injection vs. Model Scale

Page embargoed pending paper publication

The detailed analysis of how prompt-injection rates vary with model scale (7B → 72B), framework adaptation, and cross-model mismatch belongs to a research paper currently under peer review. The full results, including paired-replay claim-grade evidence across the scale ladder, will appear in the paper and be re-summarized on this page after publication.

Until then, this page gives only a brief qualitative summary. The numerical effect sizes that previously appeared here came from historical exploratory sweeps (pre-fix epoch, April 2–12, 2026) that the paper does not cite as evidence; they have been removed from the public repository while the paper is under review.

What this page will cover (after publication)

  • Baseline injection rates across the scale ladder (7B BF16 → 32B BF16 → 72B AWQ-4bit → 72B FP8-dynamic)
  • How parameters tuned by Bayesian optimization at 7B transfer up the scale curve
  • Per-framework variance at each scale rung
  • Cross-family generalization probe (Llama 3.1 8B)

Qualitative observations that hold

The following qualitative observations are paper-track findings; specific magnitudes are deferred to the paper:

  • Scale matters but is not protective. Larger models do not uniformly resist corpus-poisoning attacks; the relationship is non-monotonic across categories.
  • Framework choice is consequential. The four canonical RAG frameworks (LangChain, LlamaIndex, Unstructured, Haystack) differ measurably in how much they amplify or suppress poisoned-document influence on the model's output.
  • Retrieval and injection decouple. Embedding-similarity optimization that improves retrieval ranking does not consistently translate to improved end-to-end injection success.

Reproduce locally

While the headline analysis is embargoed, the infrastructure for reproducing it is public:

You can run your own scale sweep using these primitives against any models and frameworks you have access to.
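The repository's own primitives are not named on this page, so the following is only a minimal, hypothetical sketch of how a sweep over models and frameworks could be organized. `scale_sweep`, `wilson_interval`, and the `run_trial` callable are illustrative names, not part of the published infrastructure; `run_trial` is assumed to return `True` when the poisoned document's injected instruction shows up in the model's output for one query. The model and framework labels are plain placeholder strings. Attaching a Wilson score interval to each per-cell rate is an assumption on our part, one standard way to compare rates across cells, not the paper's method.

```python
from itertools import product
from math import sqrt
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical trial function: (model, framework, seed) -> did the injection succeed?
TrialFn = Callable[[str, str, int], bool]


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate is unconstrained
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return (max(0.0, centre - half), min(1.0, centre + half))


def scale_sweep(
    models: Iterable[str],
    frameworks: Iterable[str],
    run_trial: TrialFn,
    n_trials: int = 50,
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Run n_trials per (model, framework) cell and summarize injection rates."""
    results: Dict[Tuple[str, str], Dict[str, float]] = {}
    for model, framework in product(models, frameworks):
        # Seeds double as trial indices so each cell is replayable.
        hits = sum(run_trial(model, framework, seed) for seed in range(n_trials))
        lo, hi = wilson_interval(hits, n_trials)
        results[(model, framework)] = {
            "rate": hits / n_trials,
            "ci_low": lo,
            "ci_high": hi,
        }
    return results


if __name__ == "__main__":
    # Placeholder labels; wire run_trial to your own RAG pipeline in practice.
    models = ["7B BF16", "32B BF16", "72B AWQ-4bit", "72B FP8-dynamic"]
    frameworks = ["LangChain", "LlamaIndex", "Unstructured", "Haystack"]
    stub = lambda m, f, seed: seed % 4 == 0  # deterministic stand-in, not real data
    for (m, f), s in scale_sweep(models, frameworks, stub).items():
        print(f"{m:16} {f:13} {s['rate']:.2f} [{s['ci_low']:.2f}, {s['ci_high']:.2f}]")
```

Keeping the trial function injectable lets the same grid run against any model server or framework you have access to. The Wilson interval is used instead of the plain normal approximation because it stays inside [0, 1] and behaves better when a cell's rate is near 0 or 1 with few trials.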