Skip to content

JSON Techniques

hemlock provides four hiding techniques for JSON files. JSON is the default data interchange format for APIs, configuration files, and structured data stores. Since JSON is plain text, payloads are embedded in additional keys or encoded values.

Technique Overview

Technique Stealth Description
metadata-key 55 Payload in _metadata.description key
unicode-escape 70 Payload encoded as \uXXXX sequences
nested-object 65 Payload deeply nested in object hierarchy
prototype-key 60 Payload in __proto__ key

metadata-key

How It Works

The payload is placed in a _metadata.description key within the JSON object. The cover text is split into named sections in the main body. Keys prefixed with _ are conventionally treated as internal/hidden metadata by many systems.

Framework Survival

Framework Survives Mechanism
LangChain Raw text pass-through, all keys visible
LlamaIndex Raw text pass-through
Unstructured Raw text pass-through
Haystack Raw text pass-through

CLI Example

hemlock craft --format json --technique metadata-key --payload override --output ./output

unicode-escape

How It Works

Each character of the payload is encoded as a \uXXXX Unicode escape sequence and placed in a notes field using json.RawMessage. While semantically equivalent to the plain text, the escape encoding makes the payload invisible to simple string matching on the raw file bytes—only JSON-aware parsers will decode it.

Framework Survival

Framework Survives Mechanism
LangChain Raw text pass-through (escapes are valid JSON)
LlamaIndex Raw text pass-through
Unstructured Raw text pass-through
Haystack Raw text pass-through

Detection difficulty

The \uXXXX encoding is perfectly valid JSON. The payload is only revealed when the JSON is parsed and the string value is decoded, making it harder to detect with grep-style scanning of raw file contents.

CLI Example

hemlock craft --format json --technique unicode-escape --payload override --output ./output

nested-object

How It Works

The payload is placed deep inside a nested JSON object hierarchy at a path like _config.internal.system.trace_notes. Shallow extractors or depth-limited traversals may not reach the payload, while full JSON text extraction will include it.

Framework Survival

Framework Survives Mechanism
LangChain Raw text pass-through, all nesting levels visible
LlamaIndex Raw text pass-through
Unstructured Raw text pass-through
Haystack Raw text pass-through

CLI Example

hemlock craft --format json --technique nested-object --payload override --output ./output

prototype-key

How It Works

The payload is placed as a value under a __proto__ key in the JSON object. In JavaScript environments, __proto__ is a special property that can trigger prototype pollution. Security-focused JSON sanitizers may strip these keys, but standard Python JSON parsers used by RAG frameworks pass them through without filtering.

Framework Survival

Framework Survives Mechanism
LangChain Python json.loads does not filter key names
LlamaIndex Raw text pass-through
Unstructured Raw text pass-through
Haystack Raw text pass-through

Security scanner interaction

Some security tools flag __proto__ and constructor keys as potential prototype pollution vectors. This makes the technique higher-stealth against content scanners focused on payload text, but potentially detectable by security-focused JSON validators.

CLI Example

hemlock craft --format json --technique prototype-key --payload override --output ./output

Survival Matrix

Technique Stealth LangChain LlamaIndex Haystack Unstructured
metadata-key 55
unicode-escape 70
nested-object 65
prototype-key 60

JSON is plain text

Since JSON files are plain text, literal payload text survives all RAG frameworks via raw extraction. Only encoded techniques like unicode-escape (where the payload is encoded as \uXXXX sequences) fail to survive, as raw text extractors see the escape sequences rather than the decoded payload.