hemlock score

Calculate composite effectiveness scores for format/technique/framework/payload combinations using hemlock's scoring engine. Scores are computed from lookup tables without generating any documents.

Synopsis

hemlock score --format <format> --framework <framework> [flags]

Flags

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --format | string | (required) | Document format to score: html, docx, pdf, txt, markdown, rtf, epub, csv, json, xlsx, image |
| --framework | string | (required) | Target RAG framework: langchain, llamaindex, haystack, unstructured, generic |
| --technique | string | | Specific technique to score. If omitted, scores all techniques for the format. |
| --payload | string | override | Payload category for complexity factor: override, exfiltrate, redirect, denial, multistage, authority, custom |
| --json | bool | false | Output results as JSON instead of a table |

Description

score computes a composite effectiveness score for each technique based on three factors:

  1. Stealth score (0–100): How well hidden the payload is from human inspection
  2. Survival probability (0.0–1.0): Whether the technique survives extraction by the target framework
  3. Payload complexity (0.0–1.0): How sophisticated the payload category is

The composite score is calculated as:

$$\text{composite} = \frac{\text{stealth}}{100} \times \text{survival} \times (0.5 + 0.5 \times \text{complexity})$$

Each result receives a letter rating:

| Rating | Score Range |
| --- | --- |
| A | ≥ 0.80 |
| B | ≥ 0.60 |
| C | ≥ 0.40 |
| D | ≥ 0.20 |
| F | < 0.20 |
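The formula and rating bands above can be sketched in a few lines of Python. This is an illustrative reimplementation, not hemlock's source: the function names are hypothetical, and rounding to six decimals before thresholding is this sketch's own choice to keep floating-point noise from flipping a boundary case such as 0.60.

```python
def composite_score(stealth: float, survival: float, complexity: float) -> float:
    """Composite = (stealth / 100) * survival * (0.5 + 0.5 * complexity)."""
    score = (stealth / 100) * survival * (0.5 + 0.5 * complexity)
    # Rounding is an assumption of this sketch, not documented hemlock behavior.
    return round(score, 6)


def letter_rating(score: float) -> str:
    """Map a composite score to a letter using the documented lower bounds."""
    for letter, floor in (("A", 0.80), ("B", 0.60), ("C", 0.40), ("D", 0.20)):
        if score >= floor:
            return letter
    return "F"


# Stealth and survival values taken from the HTML / LangChain example on this
# page, scored with the default override payload (complexity 0.6).
for technique, stealth, survival in [
    ("comment", 30, 0.00),
    ("invisible-div", 55, 1.00),
    ("css-hide", 75, 1.00),
    ("color-transparent", 85, 1.00),
]:
    score = composite_score(stealth, survival, 0.6)
    print(f"{technique:<18} {score:.2f} {letter_rating(score)}")
```

Checking the bands from highest to lowest means each score receives the first rating whose lower bound it meets, which is how the rating table reads.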

Lookup-based scoring

Scores are computed entirely from hemlock's built-in lookup tables (stealth scores, framework survival maps, payload complexity factors). No documents are generated and no live validation is performed. Use hemlock validate for runtime survival testing.


Examples

Score all HTML techniques against LangChain

hemlock score --format html --framework langchain
FORMAT  TECHNIQUE            STEALTH  SURVIVAL  COMPLEXITY  SCORE  RATING
------  ---------            -------  --------  ----------  -----  ------
html    comment              30       0.00      0.60        0.00   F
html    invisible-div        55       1.00      0.60        0.44   C
html    aria-hidden          70       1.00      0.60        0.56   C
html    css-hide             75       1.00      0.60        0.60   B
html    microdata            60       1.00      0.60        0.48   C
html    chunk-boundary       65       1.00      0.60        0.52   C
html    offscreen            80       1.00      0.60        0.64   B
html    color-transparent    85       1.00      0.60        0.68   B
html    noscript             60       1.00      0.60        0.48   C

Score a specific technique with multistage payloads

hemlock score \
  --format html \
  --technique css-hide \
  --framework langchain \
  --payload multistage
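
As a worked example of what this command evaluates, assume css-hide keeps the stealth (75) and LangChain survival (1.00) values shown in the HTML example above, and take the multistage complexity of 0.9 from the Payload Complexity Factors table. The documented formula then gives:

$$\text{composite} = \frac{75}{100} \times 1.00 \times (0.5 + 0.5 \times 0.9) = 0.75 \times 0.95 = 0.7125$$

That lands in the B band (≥ 0.60), up from 0.60 with the default override payload. The figure is hand arithmetic from the formula, so the tool's displayed value may be rounded differently.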

JSON output for programmatic use

hemlock score \
  --format markdown \
  --framework langchain \
  --json | jq '.[] | select(.rating == "A" or .rating == "B")'
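
The same filter can be written in Python when jq is not available. The sketch below assumes, as the jq expression does, that the JSON output is an array of objects with a "rating" key. The "technique" and "score" keys and all sample values are hypothetical placeholders, not captured hemlock output.

```python
import json

# Hypothetical stand-in for `hemlock score ... --json` output. Only the
# "rating" key is implied by the jq filter; the other keys are assumed.
sample = """[
  {"technique": "technique-a", "score": 0.64, "rating": "B"},
  {"technique": "technique-b", "score": 0.48, "rating": "C"},
  {"technique": "technique-c", "score": 0.82, "rating": "A"}
]"""


def top_rated(raw_json: str, keep=("A", "B")) -> list:
    """Parse a JSON array and keep entries whose rating is in `keep`."""
    return [row for row in json.loads(raw_json) if row.get("rating") in keep]


for row in top_rated(sample):
    print(row["technique"], row["rating"])
```

Using `row.get("rating")` rather than `row["rating"]` lets the filter skip entries that lack the key instead of raising.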

Compare techniques across frameworks

# Score Markdown techniques against each framework
for fw in langchain llamaindex haystack unstructured; do
  echo "=== $fw ==="
  hemlock score --format markdown --framework "$fw"
  echo
done

Payload Complexity Factors

The --payload flag determines the complexity factor used in scoring:

| Category | Complexity | Rationale |
| --- | --- | --- |
| override | 0.6 | Direct instruction override; moderate sophistication |
| exfiltrate | 0.8 | Requires external endpoint interaction; high sophistication |
| redirect | 0.7 | Social engineering cues; above average sophistication |
| denial | 0.5 | Simple disruption; lower sophistication |
| multistage | 0.9 | Two-phase attack with primer/trigger coordination; highest sophistication |
| authority | 0.85 | Authority mimicry with institutional framing; very high sophistication |
| custom | 0.5 | Default factor for user-provided payloads |
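
To see how much each category can move the composite score, the sketch below evaluates the payload term of the formula, (0.5 + 0.5 × complexity), for every factor in the table. The dictionary and function names are hypothetical, and the rounding is only there to tidy floating-point output.

```python
# Complexity factors copied from the table above.
COMPLEXITY = {
    "override": 0.6,
    "exfiltrate": 0.8,
    "redirect": 0.7,
    "denial": 0.5,
    "multistage": 0.9,
    "authority": 0.85,
    "custom": 0.5,
}


def payload_multiplier(category: str) -> float:
    """Return the (0.5 + 0.5 * complexity) term of the composite formula."""
    return round(0.5 + 0.5 * COMPLEXITY[category], 4)


# List categories from the smallest to the largest multiplier.
for category in sorted(COMPLEXITY, key=COMPLEXITY.get):
    print(f"{category:<11} {payload_multiplier(category):.3f}")
```

For the listed categories the term only ranges from 0.75 (denial, custom) to 0.95 (multistage), so stealth and survival account for most of the variation in the composite score.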

Tips

Use score to plan engagements

Run score before generating documents to identify the most effective technique/framework combinations for your target. Focus document generation on techniques rated A or B.

Generic framework scoring

When --framework generic is used, survival probability is computed as the proportion of frameworks the technique survives (e.g., if a technique survives 3 of 4 frameworks, survival = 0.75).
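
The generic calculation can be illustrated as a simple proportion. This sketch assumes survival is recorded as a yes/no value per framework and that the average runs over the four named frameworks. The function name and the example data are hypothetical.

```python
def generic_survival(survives_by_framework: dict) -> float:
    """Fraction of frameworks in which the technique survives extraction."""
    if not survives_by_framework:
        raise ValueError("need at least one framework")
    survived = sum(bool(v) for v in survives_by_framework.values())
    return survived / len(survives_by_framework)


# Hypothetical technique that survives three of the four named frameworks.
example = {
    "langchain": True,
    "llamaindex": True,
    "haystack": False,
    "unstructured": True,
}
print(generic_survival(example))  # 0.75
```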