Quickstart¶
This guide walks through hemlock's core workflow: generating poisoned documents, exploring available techniques and payloads, validating extraction survival, and running full batch operations.
Prerequisites
hemlock must be installed and on your PATH. See Installation if you have not set it up yet.
Step 1: Generate Poisoned Documents¶
Generate 5 HTML documents per technique using override payloads, themed around an employee handbook:
hemlock craft \
--format html \
--payload override \
--topic "employee handbook" \
--output ./test-docs
hemlock creates documents for every HTML hiding technique (10 techniques x 5 variants = 50 documents):
[hemlock] Generated 50 documents in ./test-docs
poisoned-comment-001.html (stealth: 30)
poisoned-comment-002.html (stealth: 30)
poisoned-comment-003.html (stealth: 30)
poisoned-comment-004.html (stealth: 30)
poisoned-comment-005.html (stealth: 30)
poisoned-invisible-div-001.html (stealth: 55)
poisoned-invisible-div-002.html (stealth: 55)
poisoned-invisible-div-003.html (stealth: 55)
poisoned-invisible-div-004.html (stealth: 55)
poisoned-invisible-div-005.html (stealth: 55)
poisoned-aria-hidden-001.html (stealth: 70)
...
poisoned-microdata-001.html (stealth: 60)
...
poisoned-chunk-boundary-001.html (stealth: 65)
...
poisoned-offscreen-001.html (stealth: 80)
...
poisoned-color-transparent-001.html (stealth: 85)
...
poisoned-noscript-001.html (stealth: 60)
...
poisoned-camouflage-001.html (stealth: 90)
...
Each file contains legitimate cover text with a hidden payload embedded using the named technique. The stealth score (0--100) reflects how resistant the technique is to visual detection and automated content filtering.
Fewer variants
Pass --count 1 to generate a single variant per technique instead of 5. Useful for quick exploration.
Step 2: Explore Available Techniques¶
List every hiding technique hemlock supports for HTML:
FORMAT TECHNIQUE STEALTH DESCRIPTION
------ --------- ------- -----------
html comment 30 Hidden HTML comment containing payload
html invisible-div 55 Invisible div with display:none and offscreen positioning
html aria-hidden 70 aria-hidden span with offscreen CSS positioning
html css-hide 75 CSS class-based hiding with zero font size and transparent color
html microdata 60 Payload in schema.org microdata meta content attributes
html chunk-boundary 65 Fragments distributed across DOM with filler between
html offscreen 80 Payload positioned far offscreen with CSS transforms
html color-transparent 85 Very small transparent text on matching background
html noscript 60 Payload inside noscript tags
Omit --format to see techniques across all 11 formats:
63 techniques total
hemlock implements 10 HTML, 8 DOCX, 7 PDF, 5 TXT, 5 Markdown, 5 RTF, 6 EPUB, 5 CSV, 4 JSON, 4 XLSX, and 4 Image techniques across 11 formats. See Techniques for detailed documentation of each.
Step 3: Target a Specific Technique¶
Generate a single DOCX document using the fontzero technique with a redirect payload:
hemlock craft \
--format docx \
--technique fontzero \
--payload redirect \
--count 1 \
--output ./test-docs
The fontzero technique embeds the payload as a 1-point font run---invisible when the document is opened in Word but extracted by most DOCX parsers used in RAG pipelines.
Step 4: Validate Payload Survival¶
Test whether a payload in a generated document survives extraction by a specific RAG framework:
hemlock validate \
--file ./test-docs/poisoned-invisible-div-001.html \
--framework langchain \
--payload "Ignore all previous instructions."
File: ./test-docs/poisoned-invisible-div-001.html
Framework: langchain
Payload Found: true
Confidence: high
Notes: Payload found in extracted text. LangChain strips HTML tags but retains display:none content.
The validation engine simulates how each framework's document loader processes the file. A result of Payload Found: true means the hidden injection text appears in the extracted content that would be sent to the LLM.
Supported frameworks
The --framework flag accepts langchain, llamaindex, unstructured, or haystack. Each simulates the extraction behavior of the corresponding Python library's default document loaders.
Step 5: Explore Payload Templates¶
List all available payload templates:
CATEGORY NAME DESCRIPTION
-------- ---- -----------
override override-system-tag Uses a fake [SYSTEM] tag to discard prior context
override override-deprecation-notice Claims previous knowledge-base entries are deprecated
override override-correction Asserts prior documents contain errors and supplies a correction
override override-priority Declares a priority override that supersedes other sources
override override-admin-note Impersonates an admin context-window update
...
authority authority-version-control Uses version control metadata to imply authoritative document lineage
authority authority-classification-marking Uses classification markings and handling instructions to convey authority
manyshot manyshot-10 10-shot ICL attack — fits small retrieval chunks (~2k tokens)
manyshot manyshot-25 25-shot ICL attack — fits medium retrieval chunks (~4k tokens)
manyshot manyshot-50 50-shot ICL attack — targets 8k context windows
manyshot manyshot-100 100-shot ICL attack — targets 8-16k context windows
manyshot manyshot-250 250-shot ICL attack — targets 32k+ contexts, highest success rate
list-payloads prints one row per concrete template. The current registry contains 75 templates across seven preset categories: override (10), exfiltrate (10), redirect (10), denial (10), multistage (20), authority (10), and manyshot (5). custom is accepted by craft and batch, but it is not listed because the text is supplied by the operator.
Step 6: Generate a Full Batch¶
Produce one document per technique across every format---a complete engagement document set:
[hemlock] Generated 63 documents in ./engagement-docs
poisoned-comment-001.html (stealth: 30)
poisoned-invisible-div-001.html (stealth: 55)
poisoned-aria-hidden-001.html (stealth: 70)
poisoned-css-hide-001.html (stealth: 75)
poisoned-microdata-001.html (stealth: 60)
poisoned-chunk-boundary-001.html (stealth: 65)
poisoned-offscreen-001.html (stealth: 80)
poisoned-color-transparent-001.html (stealth: 85)
poisoned-noscript-001.html (stealth: 60)
poisoned-camouflage-001.html (stealth: 90)
poisoned-html-comment-001.md (stealth: 35)
poisoned-frontmatter-001.md (stealth: 55)
poisoned-link-title-001.md (stealth: 65)
poisoned-image-alt-001.md (stealth: 60)
poisoned-chunk-boundary-001.md (stealth: 50)
poisoned-zero-width-001.txt (stealth: 85)
poisoned-homoglyph-001.txt (stealth: 80)
poisoned-bidi-override-001.txt (stealth: 70)
poisoned-diacritical-001.txt (stealth: 85)
poisoned-chunk-boundary-001.txt (stealth: 45)
poisoned-metadata-001.docx (stealth: 60)
poisoned-fontzero-001.docx (stealth: 80)
poisoned-whitefont-001.docx (stealth: 70)
poisoned-comment-001.docx (stealth: 50)
poisoned-custom-xml-001.docx (stealth: 65)
poisoned-metadata-distributed-001.docx (stealth: 70)
poisoned-chunk-boundary-001.docx (stealth: 60)
poisoned-hidden-paragraph-001.docx (stealth: 75)
poisoned-annotation-001.pdf (stealth: 65)
poisoned-invisible-text-001.pdf (stealth: 75)
poisoned-javascript-001.pdf (stealth: 40)
poisoned-xmp-metadata-001.pdf (stealth: 60)
...and 33 more across RTF, EPUB, CSV, JSON, XLSX, Image
The batch command iterates over all 11 formats and all techniques within each format, producing one variant per technique (with --count 1). This is the recommended starting point for an engagement: it gives you one document per attack vector to test which techniques survive the target's specific RAG pipeline.
File count scales with --count
The default --count is 5, which produces 315 documents (63 techniques x 5 variants). Set --count 1 for initial reconnaissance, then increase for techniques that prove effective.
Next Steps¶
- Concepts --- Understand the RAG poisoning research behind hemlock
- CLI Reference --- Full documentation for every command and flag
- Techniques --- Deep dive into each hiding technique
- Payloads --- Payload categories, templates, and custom injection
- Validation --- Framework simulation details and extraction behavior
- Go API --- Use hemlock as a library in your own tools