JSON Techniques¶
hemlock provides four hiding techniques for JSON files. JSON is the default data interchange format for APIs, configuration files, and structured data stores. Since JSON is plain text, payloads are embedded in additional keys or encoded values.
Technique Overview¶
| Technique | Stealth | Description |
|---|---|---|
metadata-key |
55 | Payload in _metadata.description key |
unicode-escape |
70 | Payload encoded as \uXXXX sequences |
nested-object |
65 | Payload deeply nested in object hierarchy |
prototype-key |
60 | Payload in __proto__ key |
metadata-key¶
How It Works¶
The payload is placed in a _metadata.description key within the JSON object. The cover text is split into named sections in the main body. Keys prefixed with _ are conventionally treated as internal/hidden metadata by many systems.
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Raw text pass-through, all keys visible | |
| LlamaIndex | Raw text pass-through | |
| Unstructured | Raw text pass-through | |
| Haystack | Raw text pass-through |
CLI Example¶
unicode-escape¶
How It Works¶
Each character of the payload is encoded as a \uXXXX Unicode escape sequence and placed in a notes field using json.RawMessage. While semantically equivalent to the plain text, the escape encoding makes the payload invisible to simple string matching on the raw file bytes—only JSON-aware parsers will decode it.
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Raw text pass-through (escapes are valid JSON) | |
| LlamaIndex | Raw text pass-through | |
| Unstructured | Raw text pass-through | |
| Haystack | Raw text pass-through |
Detection difficulty
The \uXXXX encoding is perfectly valid JSON. The payload is only revealed when the JSON is parsed and the string value is decoded, making it harder to detect with grep-style scanning of raw file contents.
CLI Example¶
nested-object¶
How It Works¶
The payload is placed deep inside a nested JSON object hierarchy at a path like _config.internal.system.trace_notes. Shallow extractors or depth-limited traversals may not reach the payload, while full JSON text extraction will include it.
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Raw text pass-through, all nesting levels visible | |
| LlamaIndex | Raw text pass-through | |
| Unstructured | Raw text pass-through | |
| Haystack | Raw text pass-through |
CLI Example¶
prototype-key¶
How It Works¶
The payload is placed as a value under a __proto__ key in the JSON object. In JavaScript environments, __proto__ is a special property that can trigger prototype pollution. Security-focused JSON sanitizers may strip these keys, but standard Python JSON parsers used by RAG frameworks pass them through without filtering.
Framework Survival¶
| Framework | Survives | Mechanism |
|---|---|---|
| LangChain | Python json.loads does not filter key names |
|
| LlamaIndex | Raw text pass-through | |
| Unstructured | Raw text pass-through | |
| Haystack | Raw text pass-through |
Security scanner interaction
Some security tools flag __proto__ and constructor keys as potential prototype pollution vectors. This makes the technique higher-stealth against content scanners focused on payload text, but potentially detectable by security-focused JSON validators.
CLI Example¶
Survival Matrix¶
| Technique | Stealth | LangChain | LlamaIndex | Haystack | Unstructured |
|---|---|---|---|---|---|
metadata-key |
55 | ||||
unicode-escape |
70 | ||||
nested-object |
65 | ||||
prototype-key |
60 |
JSON is plain text
Since JSON files are plain text, literal payload text survives all RAG frameworks via raw extraction. Only encoded techniques like unicode-escape (where the payload is encoded as \uXXXX sequences) fail to survive, as raw text extractors see the escape sequences rather than the decoded payload.