manta metrics

Compare an original model against a modified model and rank tensors by perturbation impact, largest first.

Usage

manta metrics \
  --original clean_model.safetensors \
  --modified weaponized.safetensors

What It Measures

The command uses the lightweight perturbation metrics implemented in manta-core, not language-model perplexity, and is intended for fast iteration while tuning carrier selection and lsb_depth.

Per tensor it reports:

  • l2_norm — Euclidean distance between original and modified tensor values
  • linf_norm — largest absolute element change
  • relative_error_mean — mean relative element error
  • relative_error_p99 — 99th percentile relative element error
  • impact_score — heuristic ranking score used to sort tensors
  • warnings — threshold labels exceeded by that tensor
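These quantities can be approximated outside the tool. Below is a minimal NumPy sketch assuming plain elementwise definitions; manta-core's exact formulas, epsilon handling, and impact_score weighting may differ.

```python
# Sketch of the per-tensor metrics, assuming straightforward elementwise
# definitions (not manta-core's actual implementation).
import numpy as np

def perturbation_metrics(original, modified, eps=1e-12):
    diff = modified.astype(np.float64) - original.astype(np.float64)
    abs_diff = np.abs(diff)
    # Per-element relative error, guarded against division by zero.
    rel = abs_diff / (np.abs(original.astype(np.float64)) + eps)
    return {
        "l2_norm": float(np.linalg.norm(diff)),
        "linf_norm": float(abs_diff.max()),
        "relative_error_mean": float(rel.mean()),
        "relative_error_p99": float(np.percentile(rel, 99)),
    }

orig = np.ones((4, 4), dtype=np.float32)
mod = orig + np.float32(1e-6)
m = perturbation_metrics(orig, mod)
```

Note that the metrics above are computed in float64 after casting, so they capture the float32 rounding that an LSB embed actually produces.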

Output Shape

{
  "summary": {
    "tensor_count": 2,
    "tensors_with_warnings": 0,
    "mean_l2_norm": 0.000012,
    "mean_relative_error_mean": 0.000000003,
    "max_l2_norm": {
      "name": "model.layers.0.mlp.down_proj.weight",
      "value": 0.000021
    },
    "max_linf_norm": {
      "name": "model.layers.0.mlp.down_proj.weight",
      "value": 0.000001
    },
    "max_relative_error_p99": {
      "name": "model.layers.0.mlp.down_proj.weight",
      "value": 0.0000002
    },
    "thresholds": {
      "l2_norm": 0.0001,
      "linf_norm": 0.000001,
      "relative_error_mean": 0.00000001,
      "relative_error_p99": 0.000001
    }
  },
  "per_tensor": [
    {
      "impact_rank": 1,
      "impact_score": 0.42,
      "name": "model.layers.0.mlp.down_proj.weight",
      "l2_norm": 0.000021,
      "linf_norm": 0.000001,
      "relative_error_mean": 0.000000003,
      "relative_error_p99": 0.0000002,
      "warnings": []
    }
  ]
}
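Because the report is plain JSON on stdout, it is easy to consume from a script. A hedged sketch, with the report inlined here for illustration; the field names are taken from the output shape above:

```python
# Sketch: consuming a manta metrics JSON report, e.g. after
#   manta metrics --original clean.safetensors --modified mod.safetensors > report.json
# The report below is inlined sample data mirroring the documented shape.
import json

report_text = """
{
  "summary": {"tensor_count": 2, "tensors_with_warnings": 0},
  "per_tensor": [
    {"impact_rank": 1, "impact_score": 0.42,
     "name": "model.layers.0.mlp.down_proj.weight",
     "l2_norm": 2.1e-05, "linf_norm": 1e-06,
     "relative_error_mean": 3e-09, "relative_error_p99": 2e-07,
     "warnings": []}
  ]
}
"""
report = json.loads(report_text)

# Highest-impact tensor and any tensors that tripped a threshold.
top = min(report["per_tensor"], key=lambda t: t["impact_rank"])
flagged = [t["name"] for t in report["per_tensor"] if t["warnings"]]
```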

Interpretation

Use the summary block first:

  • max_l2_norm shows the single tensor with the largest total perturbation
  • max_relative_error_p99 is usually the best quick indicator of outlier distortion
  • tensors_with_warnings tells you how many tensors exceeded the built-in heuristic thresholds

Then use per_tensor:

  • Start with impact_rank = 1
  • Check warnings before looking at raw magnitudes
  • Compare the same command across different lsb_depth values to see when a specific tensor starts standing out
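The lsb_depth sweep in the last bullet can be illustrated offline. The sketch below assumes an LSB-style embed perturbs the low `depth` mantissa bits of float32 weights (an assumption for illustration, not a description of manta's actual encoding) and shows how linf_norm grows as depth increases:

```python
# Sketch: why linf_norm is sensitive to lsb_depth. We simulate an LSB-style
# perturbation by flipping the low `depth` mantissa bits of float32 values
# (an assumed model of embedding, not manta's real encoder).
import numpy as np

def flip_low_bits(x, depth):
    bits = x.view(np.uint32)
    mask = np.uint32((1 << depth) - 1)
    # XOR only touches mantissa bits, so exponent and sign are unchanged.
    return (bits ^ mask).view(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)

linf_by_depth = {}
for depth in (1, 2, 3, 4):
    perturbed = flip_low_bits(w, depth)
    diff = perturbed.astype(np.float64) - w.astype(np.float64)
    linf_by_depth[depth] = float(np.abs(diff).max())
```

Running manta metrics after embedding at each depth should show the same trend: linf_norm (and the relative-error percentiles) climb with depth, which is when specific tensors start standing out.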

Typical Workflow

manta embed \
  -m clean_model.safetensors \
  -o weaponized.safetensors \
  -p payload.json \
  -d 3 \
  > extraction_key.json

manta metrics \
  --original clean_model.safetensors \
  --modified weaponized.safetensors

python3 scripts/scanner_baseline.py \
  clean_model.safetensors \
  weaponized.safetensors \
  --lsb-depth 3

Use manta metrics to measure numerical drift, then scanner_baseline.py to estimate what a safetensors-aware scanner would notice.