Research Foundation¶
hemlock operationalizes two peer-reviewed research programs into a practical security testing tool. This page summarizes the key findings from each paper, explains how hemlock translates those findings into capabilities, and provides context on the broader RAG security research landscape.
PoisonedRAG¶
Full title: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
Venue: USENIX Security 2025
Threat Model¶
PoisonedRAG considers an attacker who can inject a small number of crafted documents into a RAG system's knowledge base. The attacker does not need access to the LLM, the embedding model, or the retrieval infrastructure. The only requirement is the ability to add documents to the corpus that the RAG pipeline indexes---a realistic threat given that many knowledge bases accept contributions from multiple users, ingest documents from shared drives, or scrape external sources.
Methodology¶
The paper develops two attack variants:
- Black-box attack. The attacker has no knowledge of the embedding model or retrieval algorithm. Crafted documents are optimized to be semantically relevant to target queries using only the query text itself.
- White-box attack. The attacker knows the embedding model and can optimize document content to maximize cosine similarity with target query embeddings.
Both variants generate documents that contain legitimate-looking text alongside hidden prompt injection payloads. When retrieved, these documents provide context that overrides the LLM's behavior.
Key Results¶
- Achieved over 90% attack success rate with as few as 5 injected documents across multiple LLMs (GPT-4, LLaMA-2, Vicuna).
- The black-box attack---requiring no model access---remained effective across different embedding models and retrieval configurations.
- Existing RAG defenses (perplexity filtering, paraphrasing, duplicate detection) provided limited protection against the attack.
Relevance to hemlock¶
hemlock's document generation pipeline directly implements the PoisonedRAG methodology:
- Cover text generation produces semantically relevant documents that rank highly in retrieval results.
- Query-targeted optimization (
--target-query) enriches cover text with keywords from the target retrieval query, implementing the black-box attack's semantic relevance optimization. - Similarity scoring (
--embed-provider) computes cosine similarity between query and document embeddings using OpenAI or Ollama, providing quantitative retrieval effectiveness metrics. - Payload categories (override, exfiltrate, redirect, denial) cover the attack objectives studied in the paper.
- Format diversity extends the attack surface beyond the plain-text documents used in the original research.
PhantomText¶
Full title: The Hidden Threat in Plain Text: Attacking RAG Data Loaders
Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi
Venue: AISec '25 --- 18th ACM Workshop on Artificial Intelligence and Security, co-located with ACM CCS 2025, Taipei, Taiwan. DOI: 10.1145/3733799.3762976
Toolkit: PhantomText on PyPI (pip install phantomtext) and experiment notebooks
Research Scope¶
PhantomText exposes a critical security gap at the data loading stage of RAG pipelines. The paper introduces two threat vector families---Content Obfuscation (disrupting existing text with invisible characters) and Content Injection (inserting new hidden content)---targeting three common document formats: PDF, DOCX, and HTML.
The research tested 5 data loaders: Docling, Haystack, LangChain, LlamaIndex, and LLMSherpa, evaluating 21 distinct parsers across those frameworks. It further validated attacks on 6 end-to-end RAG systems: 3 white-box (Llama 3.2 3B, Gemma 3 27B, and DeepSeek R1 via LangChain + Chroma) and 3 black-box (OpenAI GPT-4o, OpenAI o3-mini, and Google NotebookLM).
Methodology¶
The paper catalogs 19 technique variants organized into three families:
- Content Obfuscation: Diacritical marks, homoglyph substitution, OCR poisoning, bidirectional (Bidi) reordering, zero-width characters (3 mask variants)
- Content Injection: Camouflage element (text behind images), metadata injection, out-of-bound text (2 variants), transparent text (5 variants: background-color, opacity-0, opacity-0.01, visibility-hidden, DOCX vanish), zero-size font (3 variants)
- Two-fold (obfuscation + injection): Font poisoning --- custom cmap tables that map displayed glyphs to different underlying characters
The evaluation dataset comprised 4,200 poisoned documents generated from 100 Amazon Review samples, producing 35,900 individual parser evaluations.
Key Results¶
- Achieved a 74.4% overall attack success rate across 357 technique---format---loader scenarios.
- 238 of 375 tests yielded a success rate above 95%.
- Per-loader ranking: LangChain was the most vulnerable (ASR 0.83 injection, 0.85 obfuscation); LlamaIndex was the most resistant (ASR 0.63 obfuscation, 0.67 injection). Docling, LLMSherpa, and Haystack fell in the mid-range.
- Per-format ranking: DOCX was the most vulnerable (0.89 injection ASR); PDF offered the most resistance.
- Font poisoning and homoglyphs achieved 100% ASR across all data loaders. Metadata injection was largely ineffective except against LangChain.
- In end-to-end testing, most techniques (camouflage, font poisoning, transparent text, out-of-bound, zero-size) achieved ASR = 1.0 across all 6 RAG systems. Metadata had 0% E2E ASR. Homoglyphs showed model-dependent variation.
- The paper also mapped attacks to a 9-category CIA triad taxonomy: pipeline failure, reasoning overload, unreadable output, empty statement, vague output, bias injection, factual distortion, outdated knowledge, and sensitive data disclosure.
Relevance to hemlock¶
hemlock extends the PhantomText research in several dimensions:
- Format coverage: PhantomText covers 3 formats (PDF, DOCX, HTML). hemlock extends this to 11 formats, adding TXT, Markdown, RTF, EPUB, CSV, JSON, XLSX, and Image.
- Technique count: PhantomText implements 19 technique variants. hemlock implements 57 techniques, of which approximately 12 have direct PhantomText counterparts (zero-width, homoglyph, bidi-override, diacritical, fontzero, whitefont, metadata, offscreen/offpage, CSS-hide, invisible-text, camouflage, hidden-paragraph/vanish). The remaining ~45 are hemlock-original extensions.
- Data loader coverage: PhantomText tested Docling, Haystack, LangChain, LlamaIndex, and LLMSherpa. hemlock's validation engine simulates LangChain, LlamaIndex, Unstructured, and Haystack. Three loaders overlap; hemlock uniquely adds Unstructured, while PhantomText uniquely covers Docling and LLMSherpa.
- Evaluation layers: PhantomText evaluates extraction survival and E2E RAG manipulation as two experiments. hemlock-lab's harness adds an explicit retrieval ranking layer between extraction and injection, isolating the three-stage pipeline: extraction → retrieval → injection.
PhantomText techniques not yet in hemlock
Two PhantomText technique families are not currently implemented in hemlock: OCR poisoning and font cmap poisoning (custom font character remapping). Three others have been added: diacritical marks (TXT), camouflage element (HTML), and DOCX vanish (<w:vanish/> as hidden-paragraph).
How hemlock Operationalizes the Research¶
flowchart TD
subgraph Research
A["PoisonedRAG<br/>Attack methodology<br/>+ payload design"]
B["PhantomText<br/>Hiding techniques<br/>+ extraction testing"]
end
subgraph hemlock
C["craft package<br/>Document generation"]
D["payloads package<br/>Injection templates"]
E["formats packages<br/>57 hiding techniques"]
F["validate package<br/>Framework simulation"]
end
A --> C
A --> D
B --> E
B --> F
C --> G["Poisoned Documents"]
G --> H["RAG Pipeline Testing"]
| Research Contribution | hemlock Implementation |
|---|---|
| Knowledge corpus poisoning methodology | craft.Craft() with cover text and payload embedding |
| Prompt injection payload design | payloads package with 6 categories, 70 variants |
| 19 document hiding techniques (3 formats) | 57 techniques across 11 format packages (extends PhantomText) |
| Extraction tool survival testing | validate package simulating 5 frameworks |
| Attack success measurement | Stealth scores and validation results |
Related Work in RAG Security¶
The PoisonedRAG and PhantomText papers build on and intersect with several related research directions:
Prompt injection attacks. Research by Perez and Ribeiro (2022), Greshake et al. (2023), and others established that LLMs are vulnerable to instruction injection through their input context. RAG poisoning extends this by weaponizing the retrieval step as the injection vector.
Adversarial document attacks. Work on adversarial examples for document understanding models (e.g., adversarial patches on scanned documents, perturbed OCR inputs) explores related but distinct attack surfaces where the goal is to fool the extraction model itself rather than the downstream LLM.
Data poisoning in machine learning. Traditional data poisoning attacks target model training data. RAG poisoning is distinct because it operates at inference time---the model weights are unchanged, but the retrieved context is manipulated.
Supply chain attacks on AI systems. RAG knowledge base poisoning is a form of supply chain attack where the data pipeline is the vector. This connects to broader research on AI supply chain security, including model provenance and data integrity.
Defensive Research and Mitigations¶
Both papers discuss potential defenses. hemlock enables testing these defenses by providing a controlled adversarial input source:
| Defense Category | Approach | Effectiveness |
|---|---|---|
| Input sanitization | Strip hidden content during document ingestion | Partial---effective against some techniques but not all (see the survival matrix) |
| Perplexity filtering | Reject documents with unusual perplexity scores | Limited---well-crafted cover text maintains normal perplexity |
| Provenance verification | Track and verify document sources | Effective in controlled environments but not in open knowledge bases |
| Duplicate/near-duplicate detection | Identify suspiciously similar documents | Effective against bulk poisoning but not targeted single-document attacks |
| Instruction hierarchy | Train LLMs to prioritize system instructions over context | Promising but requires model-level changes |
| Output filtering | Detect injection artifacts in LLM output | Partial---can catch some attack categories but not all |
Using hemlock for Defensive Testing
Generate documents across all techniques and validate against your pipeline's extraction layer. Techniques that survive extraction represent gaps in your sanitization. Use this data to prioritize defensive improvements.
Citations¶
If you use hemlock in academic work, please cite the underlying research:
@inproceedings{zou2025poisonedrag,
title = {PoisonedRAG: Knowledge Corruption Attacks to
Retrieval-Augmented Generation of Large Language Models},
author = {Zou, Wei and Geng, Runpeng and Wang, Binghui and Jia, Jinyuan},
booktitle = {Proceedings of the 34th USENIX Security Symposium},
year = {2025}
}
@inproceedings{castagnaro2025hidden,
author = {Castagnaro, Alberto and Salviati, Umberto and Conti, Mauro
and Pajola, Luca and Pizzi, Simeone},
title = {The Hidden Threat in Plain Text: Attacking RAG Data Loaders},
booktitle = {Proceedings of the 2025 Workshop on Artificial Intelligence
and Security (AISec '25)},
year = {2025},
publisher = {ACM},
doi = {10.1145/3733799.3762976},
note = {Co-located with ACM CCS 2025}
}
Next Steps¶
- Validation Engine --- Test payload survival against RAG frameworks
- Framework Comparison --- Full survival matrix across all techniques and frameworks
- Techniques Reference --- Detailed documentation for each hiding technique