Core Concepts¶
IEEE 754 Float32 Representation¶
Every float32 value is stored as 32 bits:
```
┌───────┬────────────┬─────────────────────────┐
│ Sign  │  Exponent  │        Mantissa         │
│ 1 bit │   8 bits   │         23 bits         │
└───────┴────────────┴─────────────────────────┘
 bit 31   bits 30-23          bits 22-0
```
The mantissa (fractional part) stores the precision of the number. The lowest bits contribute the least to the value. Flipping bit 0 of a weight like 1.0 changes it by ~\(1.19 \times 10^{-7}\) — well below the noise floor that affects model behavior.
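This is easy to verify directly. A minimal sketch using Python's `struct` module to reinterpret bit patterns (the helper names are ours, for illustration):

```python
import struct

def f32_to_bits(x: float) -> int:
    """Reinterpret a float32 as its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a float32."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

bits = f32_to_bits(1.0)          # 0x3F800000: sign 0, exponent 127, mantissa 0
flipped = bits_to_f32(bits ^ 1)  # flip mantissa bit 0
print(flipped - 1.0)             # 2**-23, about 1.19e-7
```

The delta is exactly one unit in the last place (ULP) of 1.0, i.e. \(2^{-23}\).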
LSB Steganography¶
Least Significant Bit (LSB) steganography replaces the bottom N bits of each float32 mantissa with payload data. At `lsb_depth=N`, each value carries N bits of payload.
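Per value, the operation is a mask-and-or on the raw bit pattern. A sketch for a single float32 (the function name is illustrative, not Manta's API; depth 3 matches the default mentioned below):

```python
import struct

def embed_in_f32(x: float, payload_bits: int, depth: int = 3) -> float:
    """Replace the low `depth` mantissa bits of one float32 with payload."""
    mask = (1 << depth) - 1
    raw = struct.unpack("<I", struct.pack("<f", x))[0]
    raw = (raw & ~mask & 0xFFFFFFFF) | (payload_bits & mask)
    return struct.unpack("<f", struct.pack("<I", raw))[0]

# Embedding 0b101 into 1.0 shifts it by 5 ULPs, about 6e-7:
y = embed_in_f32(1.0, 0b101)
```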
Empirical capacity numbers under embargo
Specific safe-depth ranges, per-model capacities, and perplexity impact measurements are part of the embargoed academic submission and are not published on this site.
bf16/fp16 Native Embedding¶
Most modern LLMs ship in bfloat16 (7 mantissa bits) or float16 (10 mantissa bits). Manta embeds directly in the native format using uint16 bit manipulation — no f32 cast required.
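In the native 16-bit case the same mask-and-or applies to the raw word; no float conversion is needed at all. A sketch on plain integers (function name illustrative; not Manta's API):

```python
def bf16_embed(raw: int, payload: int, depth: int = 2) -> int:
    """Overwrite the low `depth` of bfloat16's 7 mantissa bits.
    `raw` is the value as a uint16 word; no f32 cast involved."""
    mask = (1 << depth) - 1
    return (raw & 0xFFFF & ~mask) | (payload & mask)

# bfloat16 is the top 16 bits of float32, so bf16(1.0) == 0x3F80:
word = bf16_embed(0x3F80, 0b11)  # -> 0x3F83
```

Note that with only 7 (bf16) or 10 (fp16) mantissa bits, each payload bit perturbs the value more than the same bit would in float32.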
Depth vs. Stealth Tradeoff¶
| Depth | Bits/float | Perturbation | Stealth |
|---|---|---|---|
| 1–2 | 1–2 | Negligible | Extremely high |
| 3 | 3 | Below noise floor | High (default) |
| 4–5 | 4–5 | Measurable | Moderate |
| 6+ | 6+ | Significant | Low |
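A back-of-envelope bound for the perturbation column: rewriting the low N mantissa bits of a normalized float32 changes it by at most \(2^N - 1\) ULPs, giving a worst-case relative error of roughly \((2^N - 1) \cdot 2^{-23}\). A sketch (our own helper, for intuition only; the embargoed measurements above are the authoritative numbers):

```python
def worst_case_rel_error(depth: int, mantissa_bits: int = 23) -> float:
    """Upper bound on the relative perturbation from rewriting the low
    `depth` mantissa bits of a normalized float."""
    return (2 ** depth - 1) * 2.0 ** (-mantissa_bits)

for d in (1, 3, 6):
    print(d, worst_case_rel_error(d))
# depth 3 is still under 1e-6; depth 6 approaches 1e-5
```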
MANT Frame Format¶
Every embedded payload is wrapped in a frame:
```
┌─────────┬─────────┬─────────────────┐
│  MAGIC  │ LENGTH  │     PAYLOAD     │
│ "MANT"  │ u32 LE  │     N bytes     │
│ 4 bytes │ 4 bytes │                 │
└─────────┴─────────┴─────────────────┘
```
- Magic (`0x4D414E54`, ASCII `"MANT"`): identifies embedded data. After encryption, this is ciphertext — not detectable.
- Length: payload size in bytes (little-endian u32). Max ~4 GB per tensor.
The frame allows the extractor to know exactly how many bytes to read.
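The frame is a straightforward `struct` pack/unpack. A sketch (helper names are ours, not Manta's API):

```python
import struct

MAGIC = b"MANT"  # 0x4D414E54 as ASCII

def frame(payload: bytes) -> bytes:
    """Prefix the payload with the MANT magic and a little-endian u32 length."""
    return MAGIC + struct.pack("<I", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Validate the magic, then read exactly `length` payload bytes."""
    if data[:4] != MAGIC:
        raise ValueError("no MANT header")
    (length,) = struct.unpack("<I", data[4:8])
    return data[8:8 + length]
```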
Embed/Extract Pipeline¶
Embedding (writing)¶
```
payload
  │
  ├─ [ECC encode]  Reed-Solomon adds redundancy shards
  │                (recovers from missing/zeroed tensor shards)
  │
  ├─ [Encrypt]     AES-256-GCM with Argon2id-derived key
  │                (makes the bit plane look random)
  │
  ├─ [Frame]       MANT header + length prefix
  │
  └─ [LSB embed]   Write framed data into float32 mantissa bits
                   (distributed across target tensors)
```
Extraction (reading)¶
```
model file
  │
  ├─ [LSB extract]  Read bits from float32 mantissa
  │
  ├─ [Unframe]      Parse MANT header, read exact length
  │
  ├─ [Decrypt]      AES-256-GCM with the same passphrase
  │
  └─ [ECC decode]   Reed-Solomon recovers from missing shards
                    → original payload
```
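The two pipelines compose into a round trip. A self-contained toy version of the frame + LSB stages (encryption and ECC omitted; all names are ours, not Manta's API):

```python
import struct

def _f2i(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _i2f(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def embed(floats, payload: bytes, depth: int = 3):
    """Frame the payload and write its bits into the carriers' low mantissa bits."""
    framed = b"MANT" + struct.pack("<I", len(payload)) + payload
    bits = [(byte >> i) & 1 for byte in framed for i in range(8)]
    if len(bits) > len(floats) * depth:
        raise ValueError("payload too large for carrier")
    mask = (1 << depth) - 1
    out = list(floats)
    for j in range(0, len(bits), depth):
        chunk = 0
        for i, bit in enumerate(bits[j:j + depth]):
            chunk |= bit << i
        idx = j // depth
        out[idx] = _i2f((_f2i(out[idx]) & ~mask & 0xFFFFFFFF) | chunk)
    return out

def extract(floats, depth: int = 3) -> bytes:
    """Read the bit plane back, parse the MANT frame, return the payload."""
    bits = []
    for x in floats:
        w = _f2i(x)
        bits.extend((w >> i) & 1 for i in range(depth))
    data = bytes(
        sum(b << i for i, b in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits) - len(bits) % 8, 8)
    )
    if data[:4] != b"MANT":
        raise ValueError("no frame found")
    (length,) = struct.unpack("<I", data[4:8])
    return data[8:8 + length]
```

Usage: `extract(embed(carrier, b"secret"))` returns `b"secret"` while each carrier value moves by at most a few ULPs.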
Tensor Targeting¶
Not all tensors are equal carriers. The tool targets tensors in priority order:
- MLP projection layers (`down_proj`, `gate_proj`, `up_proj`) — large, tolerant to noise
- Attention output projections (`o_proj`) — moderate size, reasonable tolerance
- Other non-sensitive tensors — fallback
Avoided tensors (high sensitivity to perturbation):
- `embed_tokens` — token embeddings directly affect vocabulary mapping
- `lm_head` — output logits, directly affects generation
- Layer norms (`layernorm`, `rmsnorm`) — small tensors, high impact per bit
- Rotary position embeddings — structural, not suitable
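The selection above can be sketched as substring heuristics over safetensors-style tensor names. The lists mirror the priority and avoid sets described here, but are illustrative, not Manta's exact rules:

```python
AVOID = ("embed_tokens", "lm_head", "norm", "rotary")
PREFER = ("down_proj", "gate_proj", "up_proj", "o_proj")

def select_targets(tensor_names):
    """Drop sensitive tensors, then rank the rest by carrier priority."""
    safe = [n for n in tensor_names if not any(a in n for a in AVOID)]
    return sorted(
        safe,
        key=lambda n: next((i for i, p in enumerate(PREFER) if p in n),
                           len(PREFER)),
    )
```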
Extraction Keys¶
After embedding, the tool outputs an extraction key — a JSON object containing everything needed to recover the payload:
```json
{
  "tensor_names": ["model.layers.0.mlp.down_proj.weight"],
  "lsb_depth": 3,
  "encrypted": true,
  "ecc_enabled": true,
  "ecc_ratio": 0.5
}
```
The passphrase is not included in the key — it must be known by the operator.
Operational Security
The extraction key must be transmitted out-of-band (never alongside the model). Consider encoding as base64, QR code, DNS TXT record, or embedding in another artifact.
Memory Requirements¶
Manta loads the entire model file into memory for both embedding and extraction (full-file buffering). During embedding, peak memory usage is approximately 2–3× the model file size due to the original read buffer, a mutable copy, and intermediary payload/ECC buffers.
| Model Size | Approx. RAM Needed |
|---|---|
| 1B (~2 GB) | ~4–6 GB |
| 7B (~14 GB safetensors) | ~30–40 GB |
| 13B+ (~26 GB) | ~64+ GB |
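The table rows follow from the 2–3× rule. A trivial estimator (our own helper, for planning purposes only):

```python
def peak_ram_gb(model_file_gb: float) -> tuple:
    """Rough peak-RSS range for embedding: original read buffer +
    mutable copy + payload/ECC scratch comes to roughly 2-3x file size."""
    return (2.0 * model_file_gb, 3.0 * model_file_gb)

print(peak_ram_gb(14.0))  # (28.0, 42.0), in the ballpark of the 7B row
```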
Large models
For models that exceed available RAM, run on a machine with sufficient memory (e.g. Strix Halo with 128 GB unified RAM). Streaming / mmap support is planned but not yet implemented.