
Quickstart

This guide walks through hemlock's core workflow: generating poisoned documents, exploring available techniques and payloads, validating extraction survival, and running full batch operations.

Prerequisites

hemlock must be installed and on your PATH. See Installation if you have not set it up yet.


Step 1: Generate Poisoned Documents

Generate 5 HTML documents per technique using override payloads, themed around an employee handbook:

hemlock craft \
  --format html \
  --payload override \
  --topic "employee handbook" \
  --output ./test-docs

Because --count defaults to 5, hemlock creates five variants for every HTML hiding technique (10 techniques x 5 variants = 50 documents):

[hemlock] Generated 50 documents in ./test-docs
  poisoned-comment-001.html              (stealth: 30)
  poisoned-comment-002.html              (stealth: 30)
  poisoned-comment-003.html              (stealth: 30)
  poisoned-comment-004.html              (stealth: 30)
  poisoned-comment-005.html              (stealth: 30)
  poisoned-invisible-div-001.html        (stealth: 55)
  poisoned-invisible-div-002.html        (stealth: 55)
  poisoned-invisible-div-003.html        (stealth: 55)
  poisoned-invisible-div-004.html        (stealth: 55)
  poisoned-invisible-div-005.html        (stealth: 55)
  poisoned-aria-hidden-001.html          (stealth: 70)
  ...
  poisoned-css-hide-001.html             (stealth: 75)
  ...
  poisoned-microdata-001.html            (stealth: 60)
  ...
  poisoned-chunk-boundary-001.html       (stealth: 65)
  ...
  poisoned-offscreen-001.html            (stealth: 80)
  ...
  poisoned-color-transparent-001.html    (stealth: 85)
  ...
  poisoned-noscript-001.html             (stealth: 60)
  ...
  poisoned-camouflage-001.html           (stealth: 90)
  ...

Each file contains legitimate cover text with a hidden payload embedded using the named technique. The stealth score (0 to 100) reflects how resistant the technique is to visual detection and automated content filtering; a higher score means the payload is harder to spot.
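To triage a run by stealth score, you can parse the summary lines that craft prints. The helpers below are hypothetical and not part of hemlock: they assume the `poisoned-<technique>-<variant>.<ext>  (stealth: N)` line format shown in the sample output above, and they key results by extension and technique because names such as chunk-boundary recur across formats.

```python
import re

# Hypothetical parser for summary lines such as
#   "  poisoned-comment-001.html              (stealth: 30)"
# The format is assumed from hemlock's sample output, not from a spec.
LINE = re.compile(
    r"^\s*poisoned-(?P<technique>.+)-(?P<variant>\d+)\.(?P<ext>\w+)"
    r"\s+\(stealth:\s*(?P<stealth>\d+)\)"
)


def stealth_by_technique(output: str) -> dict:
    """Map (extension, technique) to its stealth score; non-matching lines are ignored."""
    scores = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            scores[(match["ext"], match["technique"])] = int(match["stealth"])
    return scores


def rank(scores: dict, minimum: int = 0) -> list:
    """Return (stealth, extension, technique) tuples at or above `minimum`, highest first."""
    return sorted(
        ((score, ext, tech) for (ext, tech), score in scores.items() if score >= minimum),
        reverse=True,
    )
```

For example, `rank(stealth_by_technique(log_text), minimum=70)` keeps only the higher-stealth techniques from a saved craft log, where `log_text` is whatever you captured from the command.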

Fewer variants

Pass --count 1 to generate a single variant per technique instead of the default 5. This is useful for quick exploration.


Step 2: Explore Available Techniques

List every hiding technique hemlock supports for HTML:

hemlock list-techniques --format html
FORMAT  TECHNIQUE            STEALTH  DESCRIPTION
------  ---------            -------  -----------
html    comment              30       Hidden HTML comment containing payload
html    invisible-div        55       Invisible div with display:none and offscreen positioning
html    aria-hidden          70       aria-hidden span with offscreen CSS positioning
html    css-hide             75       CSS class-based hiding with zero font size and transparent color
html    microdata            60       Payload in schema.org microdata meta content attributes
html    chunk-boundary       65       Fragments distributed across DOM with filler between
html    offscreen            80       Payload positioned far offscreen with CSS transforms
html    color-transparent    85       Very small transparent text on matching background
html    noscript             60       Payload inside noscript tags

Omit --format to see techniques across all 11 formats:

hemlock list-techniques

63 techniques total

hemlock implements 10 HTML, 8 DOCX, 7 PDF, 5 TXT, 5 Markdown, 5 RTF, 6 EPUB, 5 CSV, 4 JSON, 4 XLSX, and 4 Image techniques across 11 formats. See Techniques for detailed documentation of each.


Step 3: Target a Specific Technique

Generate a single DOCX document using the fontzero technique with a redirect payload:

hemlock craft \
  --format docx \
  --technique fontzero \
  --payload redirect \
  --count 1 \
  --output ./test-docs
[hemlock] Generated 1 documents in ./test-docs
  poisoned-fontzero-001.docx  (stealth: 80)

The fontzero technique embeds the payload in a 1-point font run. The run is invisible when the document is opened in Word, but most DOCX parsers used in RAG pipelines still extract its text.
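Consistent with the description above, an extractor that reads run text without looking at formatting will pick up a tiny run like any other. As a rough review aid, the hypothetical sketch below (Python standard library only, not a hemlock API) lists body runs whose explicit font size is at or below a threshold. It inspects only `word/document.xml` and only runs with a direct `w:sz` property, so sizes inherited from styles are not checked. WordprocessingML stores `w:sz` in half-points, so a 1-point run carries `w:val="2"`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tiny_runs(docx, max_points: float = 2.0) -> list:
    """Return (size_in_points, text) for body runs with an explicit size <= max_points.

    `docx` may be a file path or a binary file object (zipfile accepts both).
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    found = []
    for run in root.iter(W + "r"):
        size = run.find(f"{W}rPr/{W}sz")
        if size is None:
            continue  # no direct size: the run inherits one from a style (not checked here)
        points = int(size.get(W + "val")) / 2  # w:sz is measured in half-points
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        if text and points <= max_points:
            found.append((points, text))
    return found
```

Running it over ./test-docs/poisoned-fontzero-001.docx is one way to see whether a reviewer with a size threshold would notice the run; a real pipeline would also need to resolve style-inherited sizes.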


Step 4: Validate Payload Survival

Test whether a payload in a generated document survives extraction by a specific RAG framework:

hemlock validate \
  --file ./test-docs/poisoned-invisible-div-001.html \
  --framework langchain \
  --payload "Ignore all previous instructions."
File:           ./test-docs/poisoned-invisible-div-001.html
Framework:      langchain
Payload Found:  true
Confidence:     high
Notes:          Payload found in extracted text. LangChain strips HTML tags but retains display:none content.

The validation engine simulates how each framework's document loader processes the file. A result of Payload Found: true means the hidden injection text appears in the extracted content that would be sent to the LLM.
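To make the idea concrete, here is a deliberately naive stand-in for a tag-stripping loader, built on Python's standard `html.parser`. It is not hemlock's validation engine and not LangChain's actual loader. It keeps every text node regardless of CSS (so `display:none` content survives), drops comments and `<script>`/`<style>` bodies, and then checks for the payload after collapsing whitespace.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Naive extractor: keeps every text node and ignores CSS, so display:none text survives."""

    SKIPPED = ("script", "style")

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    # Comments are routed to handle_comment, which HTMLParser leaves as a no-op,
    # so commented-out text never reaches self.chunks.


def extract_text(html: str) -> str:
    """Return the document's text nodes joined with single spaces."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def payload_survives(html: str, payload: str) -> bool:
    """True if the whitespace-normalized payload appears in the extracted text."""
    return " ".join(payload.split()) in extract_text(html)
```

With this extractor an invisible-div payload is found while a comment payload is not, which illustrates why the same technique can succeed against one framework and fail against another.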

Supported frameworks

The --framework flag accepts langchain, llamaindex, unstructured, or haystack. Each simulates the extraction behavior of the corresponding Python library's default document loaders.


Step 5: Explore Payload Templates

List all available payload templates:

hemlock list-payloads
CATEGORY    NAME                                  DESCRIPTION
--------    ----                                  -----------
override    override-system-tag                   Uses a fake [SYSTEM] tag to discard prior context
override    override-deprecation-notice           Claims previous knowledge-base entries are deprecated
override    override-correction                   Asserts prior documents contain errors and supplies a correction
override    override-priority                     Declares a priority override that supersedes other sources
override    override-admin-note                   Impersonates an admin context-window update
...
authority   authority-version-control             Uses version control metadata to imply authoritative document lineage
authority   authority-classification-marking      Uses classification markings and handling instructions to convey authority
manyshot    manyshot-10                           10-shot ICL attack — fits small retrieval chunks (~2k tokens)
manyshot    manyshot-25                           25-shot ICL attack — fits medium retrieval chunks (~4k tokens)
manyshot    manyshot-50                           50-shot ICL attack — targets 8k context windows
manyshot    manyshot-100                          100-shot ICL attack — targets 8-16k context windows
manyshot    manyshot-250                          250-shot ICL attack — targets 32k+ contexts, highest success rate

list-payloads prints one row per concrete template. The current registry contains 75 templates across seven preset categories: override (10), exfiltrate (10), redirect (10), denial (10), multistage (20), authority (10), and manyshot (5). The custom category is also accepted by craft and batch, but list-payloads does not show it because the operator supplies the payload text.
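If you want to confirm that the registry in your build matches these counts, a small parser over the list-payloads table is enough. This is a hypothetical helper: it assumes the whitespace-separated CATEGORY, NAME, DESCRIPTION layout shown above, and the expected counts are copied from this page.

```python
from collections import Counter

# Per-category template counts stated on this page (75 in total).
EXPECTED = {
    "override": 10, "exfiltrate": 10, "redirect": 10, "denial": 10,
    "multistage": 20, "authority": 10, "manyshot": 5,
}


def templates_per_category(output: str) -> Counter:
    """Count list-payloads rows by their first column, skipping header, rule, and '...' lines."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split(maxsplit=2)
        if len(fields) < 2 or fields[0] in ("CATEGORY", "--------"):
            continue  # '...' has a single field, so it is skipped by the length check
        counts[fields[0]] += 1
    return counts


def mismatches(output: str) -> dict:
    """Return {category: (expected, actual)} wherever the counts differ."""
    actual = templates_per_category(output)
    return {cat: (n, actual[cat]) for cat, n in EXPECTED.items() if actual[cat] != n}
```

An empty result from `mismatches` on the full, untruncated output means your build agrees with the numbers above.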


Step 6: Generate a Full Batch

Produce one document per technique across every format, which gives you a complete engagement document set:

hemlock batch \
  --payload override \
  --output ./engagement-docs \
  --count 1
[hemlock] Generated 63 documents in ./engagement-docs
  poisoned-comment-001.html              (stealth: 30)
  poisoned-invisible-div-001.html        (stealth: 55)
  poisoned-aria-hidden-001.html          (stealth: 70)
  poisoned-css-hide-001.html             (stealth: 75)
  poisoned-microdata-001.html            (stealth: 60)
  poisoned-chunk-boundary-001.html       (stealth: 65)
  poisoned-offscreen-001.html            (stealth: 80)
  poisoned-color-transparent-001.html    (stealth: 85)
  poisoned-noscript-001.html             (stealth: 60)
  poisoned-camouflage-001.html           (stealth: 90)
  poisoned-html-comment-001.md           (stealth: 35)
  poisoned-frontmatter-001.md            (stealth: 55)
  poisoned-link-title-001.md             (stealth: 65)
  poisoned-image-alt-001.md              (stealth: 60)
  poisoned-chunk-boundary-001.md         (stealth: 50)
  poisoned-zero-width-001.txt            (stealth: 85)
  poisoned-homoglyph-001.txt             (stealth: 80)
  poisoned-bidi-override-001.txt         (stealth: 70)
  poisoned-diacritical-001.txt           (stealth: 85)
  poisoned-chunk-boundary-001.txt        (stealth: 45)
  poisoned-metadata-001.docx             (stealth: 60)
  poisoned-fontzero-001.docx             (stealth: 80)
  poisoned-whitefont-001.docx            (stealth: 70)
  poisoned-comment-001.docx              (stealth: 50)
  poisoned-custom-xml-001.docx           (stealth: 65)
  poisoned-metadata-distributed-001.docx (stealth: 70)
  poisoned-chunk-boundary-001.docx       (stealth: 60)
  poisoned-hidden-paragraph-001.docx     (stealth: 75)
  poisoned-annotation-001.pdf            (stealth: 65)
  poisoned-invisible-text-001.pdf        (stealth: 75)
  poisoned-javascript-001.pdf            (stealth: 40)
  poisoned-xmp-metadata-001.pdf          (stealth: 60)
  ...and 31 more across PDF, RTF, EPUB, CSV, JSON, XLSX, Image

The batch command iterates over all 11 formats and all techniques within each format, producing one variant per technique (with --count 1). This is the recommended starting point for an engagement: it gives you one document per attack vector to test which techniques survive the target's specific RAG pipeline.
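One way to run that comparison is to call hemlock validate for every generated file against every framework. The sketch below uses only the flags shown in Step 4 (--file, --framework, --payload). It assumes hemlock prints the `Key: value` report to stdout in the format shown there, and that you pass the exact injected text, since validate searches for a specific payload string. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

FRAMEWORKS = ("langchain", "llamaindex", "unstructured", "haystack")


def parse_report(text: str) -> dict:
    """Turn 'Key:   value' lines (as in the Step 4 sample) into a dict; split on the first colon."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            report[key.strip()] = value.strip()
    return report


def survival_matrix(doc_dir: str, payload: str) -> dict:
    """Return {file name: {framework: payload found?}} for every file in doc_dir."""
    matrix = {}
    for path in sorted(Path(doc_dir).iterdir()):
        if not path.is_file():
            continue
        row = {}
        for framework in FRAMEWORKS:
            # check=False: exit-code semantics are not documented here, so rely on the report text.
            result = subprocess.run(
                ["hemlock", "validate", "--file", str(path),
                 "--framework", framework, "--payload", payload],
                capture_output=True, text=True, check=False,
            )
            row[framework] = parse_report(result.stdout).get("Payload Found") == "true"
        matrix[path.name] = row
    return matrix
```

For example, `survival_matrix("./engagement-docs", "Ignore all previous instructions.")` would produce one row per document; rows that are True for your target's framework point to the techniques worth generating more variants of.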

File count scales with --count

The default --count is 5, which produces 315 documents (63 techniques x 5 variants). Set --count 1 for initial reconnaissance, then increase for techniques that prove effective.
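The arithmetic is the sum of the per-format technique counts multiplied by --count. A minimal sketch using the counts listed in Step 2; the dictionary keys are descriptive labels chosen here and are not guaranteed to match hemlock's --format values.

```python
from typing import Optional, Sequence

# Technique counts per format, as listed in Step 2 (63 in total).
# Keys are illustrative labels, not necessarily hemlock's --format values.
TECHNIQUES_PER_FORMAT = {
    "html": 10, "docx": 8, "pdf": 7, "txt": 5, "markdown": 5, "rtf": 5,
    "epub": 6, "csv": 5, "json": 4, "xlsx": 4, "image": 4,
}


def expected_documents(count: int = 5, formats: Optional[Sequence[str]] = None) -> int:
    """Documents produced when every technique in `formats` (default: all) gets `count` variants."""
    selected = TECHNIQUES_PER_FORMAT if formats is None else formats
    return sum(TECHNIQUES_PER_FORMAT[name] for name in selected) * count
```

`expected_documents()` gives 315, `expected_documents(1)` gives 63, and `expected_documents(5, ["html"])` gives 50, matching the figures on this page.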


Next Steps

  • Concepts: understand the RAG poisoning research behind hemlock
  • CLI Reference: full documentation for every command and flag
  • Techniques: a deep dive into each hiding technique
  • Payloads: payload categories, templates, and custom injection
  • Validation: framework simulation details and extraction behavior
  • Go API: use hemlock as a library in your own tools