Techniques Overview

hemlock ships 55 hiding techniques across 11 document formats, each designed to place a prompt injection payload where it survives RAG pipeline text extraction. Techniques range from trivial HTML comments (stealth 30) to LSB steganography in PNG images (stealth 90), and how well each one works depends heavily on which framework ingests the document.

This section provides a technical reference for every technique: how it works at the byte level, which extraction pipelines it defeats, and what defenders should look for.

Master Technique Matrix

The table below summarizes every technique, its stealth score, and whether the hidden payload survives extraction by each of the four major RAG frameworks.

Format	Technique	Stealth	Description
HTML	`comment`	30	Hidden HTML comment
HTML	`invisible-div`	55	`display:none` div with offscreen positioning
HTML	`aria-hidden`	70	`aria-hidden="true"` span with offscreen CSS
HTML	`css-hide`	75	Class-based `font-size:0; color:transparent`
HTML	`microdata`	60	Payload in schema.org microdata meta tag
HTML	`chunk-boundary`	65	Payload split across `<p>` tags at chunk-size intervals
HTML	`offscreen`	80	`position:absolute;left:-9999px` div
HTML	`color-transparent`	85	`color:transparent; user-select:none` text
HTML	`noscript`	60	Payload in `<noscript>` tag
DOCX	`metadata`	60	Payload in `docProps/core.xml` Dublin Core fields
DOCX	`metadata-distributed`	70	Payload split across subject, description, keywords, category
DOCX	`fontzero`	80	1pt font `w:r` run in document body
DOCX	`whitefont`	70	White text on white background
DOCX	`comment`	50	Word comment in `comments.xml`
DOCX	`custom-xml`	65	Custom XML data part in the ZIP archive
DOCX	`chunk-boundary`	60	Payload in separate `<w:p>` elements with filler between
DOCX	`hidden-paragraph`	75	`<w:vanish/>` paragraph (Word's hidden text flag)
PDF	`annotation`	65	Near-invisible text annotation
PDF	`invisible-text`	75	0.5pt white text in content stream
PDF	`javascript`	40	PDF JavaScript action containing payload
PDF	`xmp-metadata`	60	XMP metadata stream (Subject, Keywords, Author)
PDF	`xmp-distributed`	70	Payload split across multiple XMP properties
PDF	`chunk-boundary`	55	Payload on separate pages
PDF	`offpage`	70	Text placed at coordinates beyond page dimensions
TXT	`zero-width`	85	Zero-width Unicode character encoding
TXT	`homoglyph`	80	Cyrillic lookalike substitution with hidden payload
TXT	`bidi-override`	70	Right-to-left override (RLO) characters that obscure the payload's reading order
TXT	`chunk-boundary`	45	Payload fragments separated by ~512 chars of benign text
Markdown	`html-comment`	35	HTML comments in `.md` files
Markdown	`frontmatter`	55	Payload in YAML front matter metadata
Markdown	`link-title`	65	Payload distributed across link title attributes
Markdown	`image-alt`	60	Payload distributed across image alt text
Markdown	`chunk-boundary`	50	Payload fragments in separate heading sections
RTF	`metadata`	55	Payload in RTF `\info` block properties
RTF	`fontzero`	75	Zero-point font group invisible to readers
RTF	`comment`	40	Payload in RTF `\*\annotation` group
EPUB	`metadata`	60	Payload in OPF metadata (`dc:description`)
EPUB	`metadata-distributed`	65	Payload split across multiple OPF `<meta>` tags
EPUB	`css-hide`	70	CSS-hidden span with zero font size
EPUB	`comment`	35	XHTML comment in chapter body
EPUB	`aria-hidden`	65	`aria-hidden="true"` span in XHTML content
EPUB	`toc`	55	Payload hidden in EPUB table-of-contents `<navPoint>` labels
CSV	`extra-column`	45	Payload in extra `_metadata` column
CSV	`bom-prefix`	50	Payload after UTF-8 BOM in Notes field
CSV	`formula-injection`	60	`=CONCATENATE()` formula reconstructing payload from cell references
JSON	`metadata-key`	55	Payload in `_metadata.description` JSON key
JSON	`unicode-escape`	70	Payload encoded as `\uXXXX` sequences
XLSX	`hidden-sheet`	75	Payload on a hidden worksheet
XLSX	`metadata`	60	Payload in `docProps/core.xml` properties
XLSX	`comment`	50	Payload in cell comment
XLSX	`fontzero`	80	Zero-point font cell invisible to readers
Image	`text-chunk`	55	Payload in PNG tEXt `Description` chunk
Image	`xmp-metadata`	60	Payload in XMP `dc:description` via PNG iTXt chunk
Image	`multi-chunk`	65	Payload distributed across Title, Author, Description, Comment
Image	`steganographic`	90	Payload encoded in LSB of pixel RGB channels

Legend: Survives / Stripped / Uncertain (pipeline-dependent)
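From the defender's side, the `zero-width` and `bidi-override` rows in the table suggest a simple first check for plain text. The sketch below assumes those techniques rely on Unicode format characters (general category `Cf`), which this page does not spell out. The function name `find_format_chars` is hypothetical and not part of hemlock.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each 'Cf' character.

    Assumption: zero-width and bidirectional-override characters are the
    signal of interest. Both belong to general category Cf.
    """
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "UNNAMED")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


# Illustrative input only: a zero-width space and a right-to-left override.
sample = "quarterly\u200b report\u202e draft"
for index, codepoint, name in find_format_chars(sample):
    print(index, codepoint, name)
```

Category `Cf` also covers legitimate characters such as soft hyphens and the zero-width joiners used in emoji sequences and some scripts, so a hit is a prompt for review rather than proof of a payload.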

Understanding Stealth Scores

Stealth scores range from 0 to 100 and reflect how difficult a technique is to detect through casual inspection or automated scanning:

Range	Rating	Meaning
0–39	Low	Easily detected by basic sanitizers or manual review
40–59	Moderate	Requires format-aware inspection to detect
60–79	High	Survives most casual review; needs targeted detection tooling
80–100	Very High	Extremely difficult to detect without specialized analysis

Scores are assigned based on visual invisibility, resistance to automated stripping, and the likelihood that a human reviewer examining the raw file would notice the payload.
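The rating bands above can be expressed as a small lookup, which is handy when triaging scan results by score. This is a minimal sketch: the names `StealthBand`, `BANDS`, and `stealth_rating` are hypothetical and do not come from hemlock itself.

```python
from typing import NamedTuple


class StealthBand(NamedTuple):
    low: int
    high: int
    rating: str


# Inclusive score ranges, copied from the rating table.
BANDS = (
    StealthBand(0, 39, "Low"),
    StealthBand(40, 59, "Moderate"),
    StealthBand(60, 79, "High"),
    StealthBand(80, 100, "Very High"),
)


def stealth_rating(score: int) -> str:
    """Map a stealth score in [0, 100] to its rating label."""
    for band in BANDS:
        if band.low <= score <= band.high:
            return band.rating
    raise ValueError(f"stealth score out of range: {score}")


print(stealth_rating(30))  # HTML `comment` in the matrix
print(stealth_rating(90))  # Image `steganographic` in the matrix
```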

Stealth is not the same as survival

A technique can have a high stealth score (hard to detect) but low survival (stripped by most frameworks). The `zero-width` technique scores 85 for stealth because its characters are invisible in most text editors, yet Unstructured strips them during Unicode normalization. Conversely, DOCX `fontzero` pairs a high stealth score (80) with full survival: it passes through all four frameworks because every loader extracts `w:t` elements regardless of font size.
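The `fontzero` observation can be turned into a detection sketch. The code below reads `word/document.xml` the way a formatting-blind loader would, collecting every `w:t` run, and also records run properties that suggest hidden text. The function name `scan_docx_runs`, the 2pt threshold, and the reason labels are assumptions made for illustration. They are not hemlock's implementation or any framework's API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def scan_docx_runs(docx_bytes: bytes, max_half_points: int = 4):
    """Return (text, reasons) for every text run in the document body.

    `reasons` is empty for ordinary runs. The default threshold of 4
    half-points (2pt) is an arbitrary choice for this sketch.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    runs = []
    for run in root.iter(f"{{{W_NS}}}r"):
        # A formatting-blind loader keeps this text whatever its size.
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if not text:
            continue
        reasons = []
        props = run.find("w:rPr", NS)
        if props is not None:
            size = props.find("w:sz", NS)
            if size is not None:
                # w:sz is measured in half-points, so 2 means 1pt.
                val = size.get(f"{{{W_NS}}}val", "")
                if val.isdigit() and int(val) <= max_half_points:
                    reasons.append("tiny-font")
            vanish = props.find("w:vanish", NS)
            if vanish is not None:
                # An explicit val of "0" or "false" switches the flag off.
                flag = vanish.get(f"{{{W_NS}}}val", "true")
                if flag not in ("0", "false"):
                    reasons.append("vanish")
        runs.append((text, reasons))
    return runs
```

Joining the returned text shows what a loader would index, while the non-empty `reasons` lists show what a reviewer should inspect. A production scanner would also need to resolve paragraph and style-level formatting, and to parse untrusted XML with a hardened parser.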

Format Deep Dives

Each format page covers the full technical detail for its techniques, including generated markup, framework-specific extraction behavior, CLI examples, and detection guidance.

HTML Techniques -- 9 techniques targeting web content and HTML-based knowledge bases
DOCX Techniques -- 8 techniques exploiting the Office Open XML ZIP structure
PDF Techniques -- 7 techniques leveraging PDF content streams, annotations, and metadata
TXT Techniques -- 4 Unicode-based techniques for plain text files
Markdown Techniques -- 5 techniques targeting raw Markdown ingestion
RTF Techniques -- 3 techniques exploiting legacy Rich Text Format structure
EPUB Techniques -- 6 techniques targeting EPUB XHTML and OPF metadata
CSV Techniques -- 3 techniques for comma-separated value data
JSON Techniques -- 2 techniques exploiting JSON key structures and encoding
XLSX Techniques -- 4 techniques targeting Excel spreadsheet internals
Image Techniques -- 4 techniques for PNG metadata injection and steganography