XLSX Techniques

hemlock provides four hiding techniques for XLSX spreadsheet files. XLSX files are ZIP archives containing XML parts (like DOCX), making them susceptible to metadata injection, hidden sheets, comments, and font-size tricks. XLSX is ubiquitous in enterprise data workflows, financial reporting, and knowledge base exports.
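Because an XLSX file is an ordinary ZIP container, its parts can be listed with the Python standard library alone. The following is a minimal sketch, not hemlock's own code; the function name `list_xlsx_parts` and the in-memory sample archive are assumptions made for illustration.

```python
import io
import zipfile


def list_xlsx_parts(xlsx_bytes: bytes) -> list:
    """Return the sorted names of every part stored in an XLSX (ZIP) container."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return sorted(archive.namelist())


# Build a tiny stand-in archive in memory (hypothetical content, not a valid workbook).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("docProps/core.xml", "<coreProperties/>")

for part in list_xlsx_parts(buffer.getvalue()):
    print(part)
```

Listing the parts of a suspicious file this way is a quick first check for unexpected entries such as extra worksheets or comment parts.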

Technique Overview

| Technique | Stealth | Description |
|---|---|---|
| hidden-sheet | 75 | Payload on a hidden worksheet |
| metadata | 60 | Payload in docProps/core.xml properties |
| comment | 50 | Payload in cell comment |
| fontzero | 80 | Zero-point font cell invisible to readers |

hidden-sheet

How It Works

A second worksheet (Sheet2) is added to the workbook with state="hidden" in the workbook XML. The payload is placed in this hidden sheet's cell A1 via the shared strings table. Most spreadsheet viewers only display visible sheets, but extractors that read all sheets will find the payload.
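The hidden state described above can be checked by reading the `state` attribute of each `sheet` element in xl/workbook.xml. This is a rough sketch under stated assumptions: the helper name `hidden_sheet_names` and the sample XML are illustrative and do not come from hemlock.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def hidden_sheet_names(workbook_xml: str) -> list:
    """Return the names of sheets whose state is 'hidden' or 'veryHidden'."""
    root = ET.fromstring(workbook_xml)
    hidden = []
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        # A missing state attribute means the sheet is visible.
        if sheet.get("state", "visible") in ("hidden", "veryHidden"):
            hidden.append(sheet.get("name"))
    return hidden


# Illustrative workbook.xml; the relationship ids are made up for this sketch.
SAMPLE_WORKBOOK = """<workbook
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
    <sheet name="Sheet2" sheetId="2" state="hidden" r:id="rId2"/>
  </sheets>
</workbook>"""

print(hidden_sheet_names(SAMPLE_WORKBOOK))
```

A pipeline that wants to drop this content before ingestion could skip any sheet whose name appears in the returned list.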

Framework Survival

| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Reads all shared strings, including hidden sheet references |
| LlamaIndex | Yes | Reads all shared strings |
| Unstructured | Yes | Reads all shared strings |
| Haystack | Yes | Reads all shared strings |

CLI Example

hemlock craft --format xlsx --technique hidden-sheet --payload override --output ./output

metadata

How It Works

The payload is embedded in docProps/core.xml within the dc:description Dublin Core field. This follows the same OPC metadata pattern as DOCX metadata injection.
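To see what a loader that reads metadata would pick up, the `dc:description` element can be pulled out of docProps/core.xml directly. The sketch below assumes the field is a direct child of the core properties root; `core_description` and the sample document are hypothetical names and data for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Dublin Core elements namespace used for dc:description in core.xml.
DC_NS = "http://purl.org/dc/elements/1.1/"


def core_description(core_xml: str) -> Optional[str]:
    """Return the dc:description text from core.xml, or None if absent."""
    root = ET.fromstring(core_xml)
    node = root.find(f"{{{DC_NS}}}description")
    return node.text if node is not None else None


# Illustrative core.xml with placeholder text standing in for a payload.
SAMPLE_CORE = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Quarterly report</dc:title>
  <dc:description>placeholder payload text</dc:description>
</cp:coreProperties>"""

print(core_description(SAMPLE_CORE))
```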

Framework Survival

| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Extracts shared strings and core.xml metadata |
| LlamaIndex | No | Reads shared strings only |
| Unstructured | No | Reads shared strings only, strips metadata |
| Haystack | No | Reads shared strings and comments, not metadata |

CLI Example

hemlock craft --format xlsx --technique metadata --payload override --output ./output

comment

How It Works

The payload is placed in a cell comment on A1. The comment is stored in xl/comments1.xml and linked via a sheet-level relationship. Cell comments are a common hiding spot because many spreadsheet viewers show them only on hover.
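Comment text lives in `t` elements nested under each `comment` element of the comments part, keyed by the cell reference in the `ref` attribute. The following sketch collects that text per cell; the function name `comment_texts` and the sample XML are assumptions for illustration, not hemlock output.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/comments1.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def comment_texts(comments_xml: str) -> dict:
    """Map each commented cell reference (e.g. 'A1') to its full comment text."""
    root = ET.fromstring(comments_xml)
    result = {}
    for comment in root.iter(f"{{{MAIN_NS}}}comment"):
        # A comment may be split across several rich-text runs; join them.
        runs = [t.text or "" for t in comment.iter(f"{{{MAIN_NS}}}t")]
        result[comment.get("ref")] = "".join(runs)
    return result


# Illustrative comments part with placeholder text on cell A1.
SAMPLE_COMMENTS = """<comments
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <authors><author>Author</author></authors>
  <commentList>
    <comment ref="A1" authorId="0">
      <text><r><t>placeholder </t></r><r><t>payload</t></r></text>
    </comment>
  </commentList>
</comments>"""

print(comment_texts(SAMPLE_COMMENTS))
```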

Framework Survival

| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | No | Does not read comment XML parts |
| LlamaIndex | No | Does not read comment XML parts |
| Unstructured | No | Strips comments |
| Haystack | Yes | Reads shared strings and comments |

CLI Example

hemlock craft --format xlsx --technique comment --payload override --output ./output

fontzero

How It Works

The payload is placed in cell B1, which references shared string index 1 and is styled with s="1". That style index maps to a zero-point font (fontId="1" with sz val="0"). The text lives in the shared strings table and is extracted by all loaders, but it is invisible when rendered at zero font size.
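The chain from cell to font can be followed in two steps: a cell's `s` attribute indexes into `cellXfs` in the styles part, and that entry's `fontId` indexes into `fonts`. The sketch below flags cells whose resolved font size is zero. It assumes the styles part contains both `fonts` and `cellXfs`; the helper names and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the SpreadsheetML main namespace.
Q = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def zero_font_style_ids(styles_xml: str) -> set:
    """Return the cellXfs indexes whose font has sz val equal to 0."""
    root = ET.fromstring(styles_xml)
    zero_fonts = set()
    for index, font in enumerate(root.find(f"{Q}fonts").findall(f"{Q}font")):
        size = font.find(f"{Q}sz")
        if size is not None and float(size.get("val", "11")) == 0:
            zero_fonts.add(index)
    xfs = root.find(f"{Q}cellXfs").findall(f"{Q}xf")
    return {i for i, xf in enumerate(xfs) if int(xf.get("fontId", "0")) in zero_fonts}


def zero_font_cells(sheet_xml: str, styles_xml: str) -> list:
    """Return references of cells whose style resolves to a zero-point font."""
    style_ids = zero_font_style_ids(styles_xml)
    root = ET.fromstring(sheet_xml)
    # Cells without an s attribute use style index 0.
    return [c.get("r") for c in root.iter(f"{Q}c") if int(c.get("s", "0")) in style_ids]


# Illustrative parts: font 1 is zero-point, style 1 uses it, cell B1 uses style 1.
SAMPLE_STYLES = """<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <fonts count="2">
    <font><sz val="11"/><name val="Calibri"/></font>
    <font><sz val="0"/><name val="Calibri"/></font>
  </fonts>
  <cellXfs count="2"><xf fontId="0"/><xf fontId="1"/></cellXfs>
</styleSheet>"""

SAMPLE_SHEET = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1" t="s" s="1"><v>1</v></c>
    </row>
  </sheetData>
</worksheet>"""

print(zero_font_cells(SAMPLE_SHEET, SAMPLE_STYLES))
```

Resolving styles this way is what loaders that read only the shared strings table skip, which is why the text reaches them regardless of how it is rendered.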

Framework Survival

| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Reads all shared strings regardless of styling |
| LlamaIndex | Yes | Reads all shared strings |
| Unstructured | Yes | Reads all shared strings |
| Haystack | Yes | Reads all shared strings |

CLI Example

hemlock craft --format xlsx --technique fontzero --payload override --output ./output