RTF Techniques

hemlock provides five hiding techniques for Rich Text Format documents. RTF is a legacy format still encountered in enterprise document pipelines, email attachments, and older knowledge management systems.

Technique Overview

| Technique | Stealth | Description |
| --- | --- | --- |
| metadata | 55 | Payload in RTF \info block properties |
| fontzero | 75 | Zero-point font group invisible to readers |
| comment | 40 | Payload in RTF \*\annotation group |
| fonttable | 65 | Payload as font name in \fonttbl group |
| white-text | 70 | White-on-white text via color table |

metadata

How It Works

The payload is embedded inside the RTF {\info ...} block, which holds document metadata properties. This block is parsed by metadata-aware extractors but not rendered as visible text.
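
To make the structure concrete, here is a minimal Python sketch of how a reviewer might list what sits inside the {\info ...} block. It is not hemlock's code or any framework's extractor. The helper names (find_group, info_properties) and the sample document are hypothetical, and the source does not say which info property hemlock writes to, so the use of \doccomm below is an assumption.

```python
import re
from typing import Dict, Optional


def find_group(rtf: str, control: str) -> Optional[str]:
    """Return the brace-balanced group that opens with '{' + control, if any."""
    start = rtf.find("{" + control)
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character (covers \{ and \})
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return rtf[start:i + 1]
        i += 1
    return None  # unbalanced braces


# Matches simple property subgroups such as {\title Some text}.
PROPERTY = re.compile(r"\{\\([a-z]+) ([^{}\\]*)\}")


def info_properties(rtf: str) -> Dict[str, str]:
    """Map each textual property in the \\info block to its value."""
    info = find_group(rtf, r"\info")
    if info is None:
        return {}
    return {m.group(1): m.group(2).strip() for m in PROPERTY.finditer(info)}


# Hypothetical sample; \doccomm is an assumed carrier property.
SAMPLE = (
    r"{\rtf1\ansi"
    r"{\info{\title Quarterly report}{\doccomm PAYLOAD}}"
    r"\pard Visible body text.\par}"
)

if __name__ == "__main__":
    print(info_properties(SAMPLE))
```

Any property whose value is unusually long or reads like an instruction is worth a closer look before the document enters a pipeline.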

Framework Survival

| Framework | Survives | Mechanism |
| --- | --- | --- |
| LangChain | Yes | Extracts body text and info block metadata |
| LlamaIndex | No | Extracts body text only |
| Unstructured | No | Extracts body text only, strips metadata |
| Haystack | No | Extracts body text only |

CLI Example

hemlock craft --format rtf --technique metadata --payload override --output ./output

fontzero

How It Works

The payload is placed in a group whose font size is set to zero ({\fs0 ...}; the \fs control word takes a size in half-points). The text remains in the RTF body but renders at zero size, so readers do not see it. Extractors that read all text content regardless of formatting will capture it.
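
The sketch below shows one way to surface such text: a small tokenizer tracks the current font size per group and collects any text that appears while the size is zero. It is an illustrative assumption, not how hemlock or the listed frameworks work. The names TOKEN and zero_size_text are hypothetical, and the tokenizer ignores parts of RTF (hex escapes, Unicode escapes, binary data) that a full parser would handle.

```python
import re

# Control word (with optional numeric parameter), control symbol,
# brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)

DEFAULT_SIZE = 24  # RTF default font size, in half-points


def zero_size_text(rtf: str) -> str:
    """Collect text that appears while the active font size is zero."""
    sizes = [DEFAULT_SIZE]  # one entry per open group
    hidden = []
    for match in TOKEN.finditer(rtf):
        word, num, _symbol, brace, text = match.groups()
        if brace == "{":
            sizes.append(sizes[-1])  # a group inherits the enclosing size
        elif brace == "}":
            if len(sizes) > 1:
                sizes.pop()  # formatting ends with the group
        elif word == "fs" and num is not None:
            sizes[-1] = int(num)
        elif word == "plain":
            sizes[-1] = DEFAULT_SIZE  # \plain resets character formatting
        elif text and sizes[-1] == 0:
            hidden.append(text.replace("\r", "").replace("\n", ""))
    return "".join(hidden).strip()


# Hypothetical sample document.
SAMPLE = r"{\rtf1\ansi\fs24 Visible paragraph.{\fs0 PAYLOAD} More visible text.\par}"

if __name__ == "__main__":
    print(zero_size_text(SAMPLE))
```

Tracking size per group matters because RTF formatting is scoped: once the {\fs0 ...} group closes, the enclosing size applies again and the following text is visible.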

Framework Survival

| Framework | Survives | Mechanism |
| --- | --- | --- |
| LangChain | Yes | Extracts all body text regardless of font size |
| LlamaIndex | Yes | Extracts all body text regardless of font size |
| Unstructured | Yes | Extracts all body text regardless of font size |
| Haystack | Yes | Extracts all body text regardless of font size |

CLI Example

hemlock craft --format rtf --technique fontzero --payload override --output ./output

comment

How It Works

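The payload is embedded in an RTF annotation group ({\*\annotation ...}). The \* prefix marks an ignorable destination, so readers that do not support annotations skip the whole group. RTF viewers typically do not render annotations inline, but document processing tools may extract them.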
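
As a rough illustration, the following Python sketch pulls the text out of every {\*\annotation ...} group in a document. The function names (destination_groups, group_text) and the sample are hypothetical, and the control-word stripping is deliberately simple, so treat it as a sketch for inspection rather than a complete RTF parser.

```python
import re
from typing import Iterator


def destination_groups(rtf: str, destination: str) -> Iterator[str]:
    """Yield each brace-balanced {\\*\\<destination> ...} group."""
    marker = "{\\*\\" + destination
    pos = rtf.find(marker)
    while pos != -1:
        depth, i = 0, pos
        while i < len(rtf):
            if rtf[i] == "\\":
                i += 2  # skip escaped character
                continue
            if rtf[i] == "{":
                depth += 1
            elif rtf[i] == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        yield rtf[pos:i + 1]
        pos = rtf.find(marker, i + 1)


def group_text(group: str) -> str:
    """Strip the \\* marker, control words, and braces, keeping plain text."""
    return re.sub(r"\\\*|\\[a-zA-Z]+-?\d* ?|[{}]", "", group).strip()


# Hypothetical sample document.
SAMPLE = r"{\rtf1\ansi Body text.{\*\annotation PAYLOAD}\par}"

if __name__ == "__main__":
    for group in destination_groups(SAMPLE, "annotation"):
        print(group_text(group))
```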

Framework Survival

| Framework | Survives | Mechanism |
| --- | --- | --- |
| LangChain | No | Annotation groups not extracted |
| LlamaIndex | No | Annotation groups not extracted |
| Unstructured | No | Annotation groups stripped |
| Haystack | No | Annotation groups not extracted |

CLI Example

hemlock craft --format rtf --technique comment --payload override --output ./output

fonttable

How It Works

The payload is embedded as a font name entry in the RTF {\fonttbl} group. Font table entries define the fonts available in the document but are not rendered as body text. The payload appears as {\f1 PAYLOAD;} in the font definitions. Most text extractors skip font table contents entirely.
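
One way to check for this pattern is to list the font table and flag names that do not look like font names. The sketch below assumes a flat font table (no nested groups inside entries), and the length and word-count thresholds are arbitrary illustrative values. The function names and the sample document are hypothetical.

```python
import re
from typing import Dict

# Assumes every font entry is a single, non-nested group.
FONTTBL = re.compile(r"\{\\fonttbl\s*((?:\{[^{}]*\}\s*)*)\}")
# {\fN<optional control words> Name;}
ENTRY = re.compile(r"\{\\f(\d+)(?:\\[a-z]+-?\d*)* ?([^;{}\\]*);\}")


def font_names(rtf: str) -> Dict[int, str]:
    """Map each font index in the font table to its declared name."""
    table = FONTTBL.search(rtf)
    if table is None:
        return {}
    return {int(m.group(1)): m.group(2).strip()
            for m in ENTRY.finditer(table.group(1))}


def suspicious_fonts(rtf: str, max_len: int = 40, max_words: int = 5) -> Dict[int, str]:
    """Flag names that are too long or too wordy (thresholds are assumptions)."""
    return {index: name for index, name in font_names(rtf).items()
            if len(name) > max_len or len(name.split()) > max_words}


# Hypothetical sample document with a placeholder payload in font 1.
SAMPLE = (
    r"{\rtf1\ansi"
    r"{\fonttbl{\f0\fswiss Arial;}"
    r"{\f1 PAYLOAD TEXT THAT IS MUCH LONGER THAN ANY REAL FONT NAME WOULD BE;}}"
    r"\f0 Body text.\par}"
)

if __name__ == "__main__":
    print(suspicious_fonts(SAMPLE))
```

A short payload would pass these thresholds, so a stricter check might compare names against a list of fonts actually expected in the corpus.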

Framework Survival

| Framework | Survives | Mechanism |
| --- | --- | --- |
| LangChain | No | Font table entries not extracted as text |
| LlamaIndex | No | Font table entries not extracted |
| Unstructured | No | Font table entries stripped |
| Haystack | No | Font table entries not extracted |

CLI Example

hemlock craft --format rtf --technique fonttable --payload override --output ./output

white-text

How It Works

The payload is placed in the RTF body using {\cf1 ...}, where color index 1 is defined as white (255,255,255) in the color table. The text is invisible in rendered output (white on a white background) but appears in the output of any text extractor that ignores formatting attributes.
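
The lookup from color index to color can be sketched as follows: parse the color table, find which indices resolve to pure white, then report text in groups that select one of those indices. This assumes a white page background and the simple {\cfN text} group form described above. The function names and the sample are hypothetical.

```python
import re
from typing import List, Set

COLORTBL = re.compile(r"\{\\colortbl\s*([^{}]*)\}")
RGB = re.compile(r"\\red(\d+)\s*\\green(\d+)\s*\\blue(\d+)")
# Simple form only: a group that opens with \cfN followed by plain text.
COLORED_RUN = re.compile(r"\{\\cf(\d+) ?([^{}\\]*)\}")


def white_indices(rtf: str) -> Set[int]:
    """Return color table indices defined as (255, 255, 255)."""
    table = COLORTBL.search(rtf)
    if table is None:
        return set()
    # Every entry ends with ';', so the piece after the last ';' is dropped.
    entries = table.group(1).split(";")[:-1]
    whites = set()
    for index, entry in enumerate(entries):
        rgb = RGB.search(entry)
        if rgb and tuple(int(v) for v in rgb.groups()) == (255, 255, 255):
            whites.add(index)
    return whites


def white_runs(rtf: str) -> List[str]:
    """Return text from groups whose foreground color index is white."""
    whites = white_indices(rtf)
    return [text.strip() for index, text in COLORED_RUN.findall(rtf)
            if int(index) in whites]


# Hypothetical sample: index 0 is the empty "auto" entry, index 1 is white.
SAMPLE = (
    r"{\rtf1\ansi"
    r"{\colortbl;\red255\green255\blue255;}"
    r"Visible text.{\cf1 PAYLOAD}\par}"
)

if __name__ == "__main__":
    print(white_indices(SAMPLE), white_runs(SAMPLE))
```

Near-white colors such as (254,254,254) would slip past the exact comparison, so a practical check would more likely compare against the background with a tolerance.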

Framework Survival

| Framework | Survives | Mechanism |
| --- | --- | --- |
| LangChain | Yes | Extracts all body text regardless of color |
| LlamaIndex | Yes | Extracts all body text regardless of color |
| Unstructured | Yes | Extracts all body text regardless of color |
| Haystack | Yes | Extracts all body text regardless of color |

CLI Example

hemlock craft --format rtf --technique white-text --payload override --output ./output

Survival Matrix

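| Technique | Stealth | LangChain | LlamaIndex | Haystack | Unstructured |
| --- | --- | --- | --- | --- | --- |
| metadata | 55 | Yes | No | No | No |
| fontzero | 75 | Yes | Yes | Yes | Yes |
| comment | 40 | No | No | No | No |
| fonttable | 65 | No | No | No | No |
| white-text | 70 | Yes | Yes | Yes | Yes |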

RTF extraction varies by technique type

Body text techniques (fontzero, white-text) survive all four frameworks because extractors process every text run regardless of formatting. Structural groups (fonttable) and metadata-style content (metadata, comment) survive only when a framework's RTF parser reads non-body content, so their results vary by framework.
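
The split between body text and non-body content can be illustrated with a deliberately naive extractor that keeps every body text run, ignores formatting, and skips destination groups. This is a sketch for intuition only. It does not reproduce the behavior of LangChain, LlamaIndex, Haystack, or Unstructured, and the names (naive_body_text, SKIPPED), the skipped-destination list, and the sample document are assumptions.

```python
import re

TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)

# Destinations this sketch treats as non-body content (an assumed list).
SKIPPED = {"fonttbl", "colortbl", "info", "stylesheet"}


def naive_body_text(rtf: str) -> str:
    """Keep body text runs; drop formatting and non-body destination groups."""
    out = []
    skip = [False]          # one flag per open group
    at_group_start = False  # True for the first token after '{'
    for match in TOKEN.finditer(rtf):
        word, _num, symbol, brace, text = match.groups()
        if brace == "{":
            skip.append(skip[-1])  # nested groups inherit the skip flag
            at_group_start = True
            continue
        if brace == "}":
            if len(skip) > 1:
                skip.pop()
        elif at_group_start and (symbol == "*" or word in SKIPPED):
            skip[-1] = True  # ignorable or known non-body destination
        elif text and not skip[-1]:
            out.append(text.replace("\r", "").replace("\n", ""))
        at_group_start = False
    return "".join(out)


# Hypothetical document carrying one placeholder per technique.
SAMPLE = (
    r"{\rtf1\ansi"
    r"{\fonttbl{\f0 Arial;}{\f1 FONT-PAYLOAD;}}"
    r"{\colortbl;\red255\green255\blue255;}"
    r"{\info{\doccomm META-PAYLOAD}}"
    r"\f0\fs24 Visible text. "
    r"{\fs0 ZERO-PAYLOAD}"
    r"{\cf1 WHITE-PAYLOAD}"
    r"{\*\annotation NOTE-PAYLOAD}"
    r"\par}"
)

if __name__ == "__main__":
    print(naive_body_text(SAMPLE))
```

In this sketch the fontzero and white-text placeholders come through with the visible text, while the font table, info block, and annotation placeholders are dropped, which mirrors the pattern described above.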