XLSX Techniques¶
hemlock provides four hiding techniques for XLSX spreadsheet files. XLSX files are ZIP archives containing XML parts (like DOCX), making them susceptible to metadata injection, hidden sheets, comments, and font-size tricks. XLSX is ubiquitous in enterprise data workflows, financial reporting, and knowledge base exports.
Technique Overview¶
| Technique | Stealth | Description |
|---|---|---|
| hidden-sheet | 75 | Payload on a hidden worksheet |
| metadata | 60 | Payload in docProps/core.xml properties |
| comment | 50 | Payload in cell comment |
| fontzero | 80 | Zero-point font cell invisible to readers |
hidden-sheet¶
How It Works¶
A second worksheet (Sheet2) is added to the workbook with state="hidden" in the workbook XML. The payload is placed in this hidden sheet's cell A1 via the shared strings table. Most spreadsheet viewers only display visible sheets, but extractors that read all sheets will find the payload.
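The layout described above can be inspected with the Python standard library alone. The following is a minimal sketch, not hemlock's implementation: the helper names (`build_demo_workbook`, `hidden_sheets`, `shared_strings`) and the `PLACEHOLDER_PAYLOAD` string are hypothetical, and the demo archive contains only the two parts needed for the illustration, so it is not a complete XLSX file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by workbook.xml and sharedStrings.xml.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Hypothetical workbook part: Sheet2 carries state="hidden".
WORKBOOK_XML = f"""<workbook xmlns="{MAIN}"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
    <sheet name="Sheet2" sheetId="2" state="hidden" r:id="rId2"/>
  </sheets>
</workbook>"""

# Hypothetical shared strings table: index 1 stands in for the payload.
SHARED_STRINGS_XML = f"""<sst xmlns="{MAIN}" count="2" uniqueCount="2">
  <si><t>visible text</t></si>
  <si><t>PLACEHOLDER_PAYLOAD</t></si>
</sst>"""


def build_demo_workbook() -> bytes:
    """Pack the two demo parts into an in-memory ZIP (not a full XLSX)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", WORKBOOK_XML)
        zf.writestr("xl/sharedStrings.xml", SHARED_STRINGS_XML)
    return buf.getvalue()


def hidden_sheets(xlsx: bytes) -> list[str]:
    """Return names of sheets marked hidden or veryHidden in workbook.xml."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [
        sheet.get("name")
        for sheet in root.findall("m:sheets/m:sheet", NS)
        if sheet.get("state") in ("hidden", "veryHidden")
    ]


def shared_strings(xlsx: bytes) -> list[str]:
    """Return every shared string, regardless of which sheet references it."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as zf:
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    return [
        "".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
        for si in root.findall("m:si", NS)
    ]


demo = build_demo_workbook()
print(hidden_sheets(demo))
print(shared_strings(demo))
```

Because the shared strings table is workbook-wide, a reader that walks it directly sees the hidden sheet's text without ever consulting the `state` attribute.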
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Reads all shared strings including hidden sheet references |
| LlamaIndex | Yes | Reads all shared strings |
| Unstructured | Yes | Reads all shared strings |
| Haystack | Yes | Reads all shared strings |
CLI Example¶
metadata¶
How It Works¶
The payload is embedded in docProps/core.xml within the dc:description Dublin Core field. This follows the same OPC metadata pattern as DOCX metadata injection.
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Extracts shared strings and core.xml metadata |
| LlamaIndex | No | Reads shared strings only |
| Unstructured | No | Reads shared strings only, strips metadata |
| Haystack | No | Reads shared strings and comments, not metadata |
CLI Example¶
comment¶
How It Works¶
The payload is placed in a cell comment on A1. The comment is stored in xl/comments1.xml and linked via a sheet-level relationship. Cell comments are a common hiding spot because many spreadsheet viewers show them only on hover.
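The comment part can be read directly from the archive. The sketch below assumes the usual SpreadsheetML comments layout (`commentList` containing `comment` elements with a `ref` attribute); the helper names and the `PLACEHOLDER_PAYLOAD` string are hypothetical, and the demo archive omits the relationship part and everything else a complete XLSX would need.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, also used by the comments part.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Hypothetical comments part with a single comment anchored to A1.
COMMENTS_XML = f"""<comments xmlns="{MAIN}">
  <authors><author>demo</author></authors>
  <commentList>
    <comment ref="A1" authorId="0">
      <text><r><t>PLACEHOLDER_PAYLOAD</t></r></text>
    </comment>
  </commentList>
</comments>"""


def build_demo_package() -> bytes:
    """Pack the demo comments part into an in-memory ZIP (not a full XLSX)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/comments1.xml", COMMENTS_XML)
    return buf.getvalue()


def read_comments(xlsx: bytes) -> dict[str, str]:
    """Map each commented cell reference to its comment text."""
    found: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(xlsx)) as zf:
        for name in zf.namelist():
            # Comment parts are conventionally named xl/comments<N>.xml.
            if not (name.startswith("xl/comments") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            for comment in root.findall("m:commentList/m:comment", NS):
                # A comment may be split across several rich-text runs.
                text = "".join(t.text or "" for t in comment.iter(f"{{{MAIN}}}t"))
                found[comment.get("ref")] = text
    return found


print(read_comments(build_demo_package()))
```

A stricter reader would resolve the comments part through the sheet's relationship file instead of matching on the part name; the name match keeps the sketch short.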
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | No | Does not read comment XML parts |
| LlamaIndex | No | Does not read comment XML parts |
| Unstructured | No | Strips comments |
| Haystack | Yes | Reads shared strings and comments |
CLI Example¶
fontzero¶
How It Works¶
The payload is placed in cell B1 referencing shared string index 1, styled with s="1" which maps to a zero-point font (fontId="1" with sz val="0"). The text exists in the shared strings table and is extracted by all loaders, but is invisible at zero font size.
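The style chain described above (cell `s` attribute, then `cellXfs` entry, then `fontId`, then `sz`) can be followed with a short standard-library sketch. It is a minimal illustration under assumptions, not hemlock's implementation: the helper names and the `PLACEHOLDER_PAYLOAD` string are hypothetical, and the demo archive contains only the three parts the lookup needs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Hypothetical styles part: font 1 has sz val="0", and style index 1 uses it.
STYLES_XML = f"""<styleSheet xmlns="{MAIN}">
  <fonts count="2">
    <font><sz val="11"/><name val="Calibri"/></font>
    <font><sz val="0"/><name val="Calibri"/></font>
  </fonts>
  <cellXfs count="2">
    <xf fontId="0"/>
    <xf fontId="1"/>
  </cellXfs>
</styleSheet>"""

# Hypothetical sheet part: B1 references shared string 1 with s="1".
SHEET_XML = f"""<worksheet xmlns="{MAIN}"><sheetData><row r="1">
  <c r="A1" t="s"><v>0</v></c>
  <c r="B1" t="s" s="1"><v>1</v></c>
</row></sheetData></worksheet>"""

SHARED_STRINGS_XML = f"""<sst xmlns="{MAIN}">
  <si><t>visible text</t></si>
  <si><t>PLACEHOLDER_PAYLOAD</t></si>
</sst>"""


def build_demo_package() -> bytes:
    """Pack the three demo parts into an in-memory ZIP (not a full XLSX)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/styles.xml", STYLES_XML)
        zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
        zf.writestr("xl/sharedStrings.xml", SHARED_STRINGS_XML)
    return buf.getvalue()


def zero_point_cells(xlsx: bytes) -> dict[str, str]:
    """Map cell references styled with a zero-point font to their text."""
    with zipfile.ZipFile(io.BytesIO(xlsx)) as zf:
        styles = ET.fromstring(zf.read("xl/styles.xml"))
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
        sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))

    # Font sizes by font index; None when a font has no sz element.
    sizes = []
    for font in styles.findall("m:fonts/m:font", NS):
        sz = font.find("m:sz", NS)
        sizes.append(float(sz.get("val")) if sz is not None else None)

    # Style index (the cell's s attribute) -> font index.
    xf_fonts = [int(xf.get("fontId", "0")) for xf in styles.findall("m:cellXfs/m:xf", NS)]
    strings = [
        "".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
        for si in sst.findall("m:si", NS)
    ]

    hits: dict[str, str] = {}
    for cell in sheet.iter(f"{{{MAIN}}}c"):
        value = cell.find("m:v", NS)
        if cell.get("t") != "s" or value is None:
            continue  # only shared-string cells are considered here
        font_index = xf_fonts[int(cell.get("s", "0"))]
        if sizes[font_index] == 0:
            hits[cell.get("r")] = strings[int(value.text)]
    return hits


print(zero_point_cells(build_demo_package()))
```

The lookup also shows why styling does not affect extraction: the text is resolved from the shared strings table, and the font size only matters to a renderer.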
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Yes | Reads all shared strings regardless of styling |
| LlamaIndex | Yes | Reads all shared strings |
| Unstructured | Yes | Reads all shared strings |
| Haystack | Yes | Reads all shared strings |