hemlock¶
Weaponize documents. Test RAG defenses. Harden pipelines.
hemlock is a Go library and CLI tool that generates documents containing hidden prompt injection payloads for testing the security of Retrieval-Augmented Generation (RAG) pipelines. It operationalizes two peer-reviewed research programs---PoisonedRAG (Zou et al., USENIX Security 2025) and PhantomText (Castagnaro et al., AISec '25)---into a single, portable binary that security professionals can deploy during engagements.
Authorized Use Only
hemlock is a security testing tool. Use it only against systems you own or have explicit written authorization to test. Unauthorized injection of poisoned documents into production knowledge bases may violate computer fraud laws in your jurisdiction. The authors assume no liability for misuse.
What Is hemlock?¶
RAG systems retrieve documents from a knowledge base and feed them to a language model as context. If an attacker can insert crafted documents into that knowledge base, hidden instructions survive extraction and reach the LLM---overriding behavior, exfiltrating data, or denying service.
hemlock automates this attack surface:
- Generates documents across 11 formats with 63 hiding techniques
- Embeds prompt injection payloads invisible to human readers but extracted by RAG loaders
- Validates whether payloads survive processing by LangChain, LlamaIndex, Unstructured.io, and Haystack
Quick Install¶
Requires Go 1.25+. See Installation for building from source and cross-compilation.
Quick Example¶
Generate five HTML documents with override payloads targeting an employee handbook topic:
[hemlock] Generated 50 documents in ./test-docs
poisoned-comment-001.html (stealth: 30)
poisoned-comment-002.html (stealth: 30)
poisoned-comment-003.html (stealth: 30)
poisoned-comment-004.html (stealth: 30)
poisoned-comment-005.html (stealth: 30)
poisoned-invisible-div-001.html (stealth: 55)
poisoned-invisible-div-002.html (stealth: 55)
poisoned-invisible-div-003.html (stealth: 55)
poisoned-invisible-div-004.html (stealth: 55)
poisoned-invisible-div-005.html (stealth: 55)
poisoned-aria-hidden-001.html (stealth: 70)
...
poisoned-noscript-005.html (stealth: 60)
poisoned-camouflage-005.html (stealth: 90)
Each document contains legitimate cover text about employee handbooks with a hidden override payload embedded using the specified technique. The stealth score (0--100) indicates how likely the payload is to evade visual inspection and basic content filters.
Feature Highlights¶
-
11 Document Formats
HTML, DOCX, PDF, TXT, Markdown, RTF, EPUB, CSV, JSON, XLSX, and Image---covering the formats RAG pipelines ingest most frequently.
-
63 Hiding Techniques
From zero-width Unicode and font-size-zero runs to CSS hiding, PDF annotations, chunk-boundary distribution, and steganographic image encoding. Each technique maps to real-world attack vectors documented in the research literature.
-
75 Payload Templates
Seven preset categories---override, exfiltrate, redirect, denial, multi-stage, authority, and manyshot---with 75 templates total, plus support for custom injection text.
-
Validation Engine
Simulates text extraction by LangChain, LlamaIndex, Unstructured.io, and Haystack to confirm payloads survive the RAG processing pipeline before deployment.
-
Go Library API
Import
github.com/professor-moody/hemlock/pkg/craftdirectly into your security tooling, CI pipelines, or custom harnesses. -
Single Static Binary
No runtime dependencies. Cross-compiles to Linux, macOS, and Windows from a single
make releaseinvocation.
Who Is This For?¶
hemlock operationalizes PoisonedRAG (Zou et al., USENIX Security 2025) and PhantomText (Castagnaro et al., AISec '25) into reproducible experiments. Use it to:
- Replicate poisoning attack scenarios against RAG architectures
- Measure stealth and extraction survival across frameworks
- Benchmark defensive techniques such as input sanitization and provenance verification
During authorized engagements, hemlock generates deployment-ready poisoned documents. Use it to:
- Produce format-specific payloads that target a client's RAG stack
- Validate payload survival before inserting documents into knowledge bases
- Generate full engagement document sets with a single
hemlock batchcommand
hemlock provides a structured adversarial test suite for your pipeline. Use it to:
- Stress-test document loaders against known injection techniques
- Integrate validation into CI/CD to catch extraction regressions
- Evaluate the effectiveness of content filtering and sanitization layers
Documentation¶
| Section | Description |
|---|---|
| Installation | Install via go install, build from source, or cross-compile |
| Quickstart | Step-by-step first use: craft, list, validate, batch |
| Concepts | RAG poisoning fundamentals and the research behind hemlock |
| CLI Reference | Complete command and flag documentation |
| Techniques | All 63 hiding techniques with format-specific details |
| Payloads | Payload categories, variants, and custom injection |
| Validation | Framework simulation engine and extraction behavior |
| Go API | Library usage, types, and integration patterns |
| Research | PoisonedRAG, PhantomText (AISec '25), and the academic foundation |