hemlock score
Calculate composite effectiveness scores for format/technique/framework/payload combinations using hemlock's scoring engine. Scores are computed from lookup tables without generating any documents.
Synopsis
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| `--format` | string | (required) | Document format to score: html, docx, pdf, txt, markdown, rtf, epub, csv, json, xlsx, image |
| `--framework` | string | (required) | Target RAG framework: langchain, llamaindex, haystack, unstructured, generic |
| `--technique` | string | (none) | Specific technique to score. If omitted, scores all techniques for the format. |
| `--payload` | string | `override` | Payload category that sets the complexity factor: override, exfiltrate, redirect, denial, multistage, authority, custom |
| `--json` | bool | `false` | Output results as JSON instead of a table |
Description
`score` computes a composite effectiveness score for each technique from three factors:
- Stealth score (0 to 100): how well the payload is hidden from human inspection
- Survival probability (0.0 to 1.0): how likely the technique is to survive extraction by the target framework
- Payload complexity (0.0 to 1.0): how sophisticated the payload category is
The composite score is calculated as:
$$\text{composite} = \frac{\text{stealth}}{100} \times \text{survival} \times (0.5 + 0.5 \times \text{complexity})$$
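The formula is simple enough to check by hand. The following POSIX shell sketch evaluates it with `awk`; the function name `composite` is illustrative and not part of hemlock, and hemlock's own rounding may differ from the two-decimal `printf` used here.

```sh
# composite STEALTH SURVIVAL COMPLEXITY
# Evaluates (stealth / 100) * survival * (0.5 + 0.5 * complexity)
# and prints the result rounded to two decimals.
composite() {
  awk -v s="$1" -v p="$2" -v c="$3" \
    'BEGIN { printf "%.2f\n", (s / 100) * p * (0.5 + 0.5 * c) }'
}

# css-hide against LangChain with the default override payload (complexity 0.6)
composite 75 1.00 0.60   # prints 0.60
```

Because the payload term ranges from 0.5 (complexity 0.0) to 1.0 (complexity 1.0), complexity can at most halve a score, while a survival probability of 0.0 forces the composite to 0.00 regardless of stealth.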
Each result receives a letter rating:
| Rating | Score Range |
|---|---|
| A | ≥ 0.80 |
| B | ≥ 0.60 |
| C | ≥ 0.40 |
| D | ≥ 0.20 |
| F | < 0.20 |
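A sketch of the rating lookup, assuming each lower bound is inclusive as the ≥ signs in the table indicate. The `rating` function name is hypothetical.

```sh
# rating SCORE -> prints the letter grade for a composite score
rating() {
  awk -v v="$1" 'BEGIN {
    x = v + 0                     # force numeric comparison
    if (x >= 0.80)      print "A"
    else if (x >= 0.60) print "B"
    else if (x >= 0.40) print "C"
    else if (x >= 0.20) print "D"
    else                print "F"
  }'
}

rating 0.64   # prints B
rating 0.80   # prints A (a score exactly on a bound takes the higher grade)
rating 0.19   # prints F
```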
Lookup-based scoring
Scores are computed entirely from hemlock's built-in lookup tables (stealth scores, framework survival maps, payload complexity factors). No documents are generated and no live validation is performed. Use `hemlock validate` for runtime survival testing.
Examples
Score all HTML techniques against LangChain

```text
FORMAT  TECHNIQUE          STEALTH  SURVIVAL  COMPLEXITY  SCORE  RATING
------  ---------          -------  --------  ----------  -----  ------
html    comment            30       0.00      0.60        0.00   F
html    invisible-div      55       1.00      0.60        0.44   C
html    aria-hidden        70       1.00      0.60        0.56   C
html    css-hide           75       1.00      0.60        0.60   B
html    microdata          60       1.00      0.60        0.48   C
html    chunk-boundary     65       1.00      0.60        0.52   C
html    offscreen          80       1.00      0.60        0.64   B
html    color-transparent  85       1.00      0.60        0.68   B
html    noscript           60       1.00      0.60        0.48   C
```
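The SCORE and RATING columns above follow directly from the other three columns. This sketch recomputes them from the rows of the example output using the formula from the Description section; it does not call hemlock, and the `recompute` function is illustrative only.

```sh
# recompute: reads "technique stealth survival complexity" rows on stdin and
# prints each technique with its recomputed composite score and letter rating.
recompute() {
  awk '
    function grade(x) {
      if (x >= 0.80) return "A"
      if (x >= 0.60) return "B"
      if (x >= 0.40) return "C"
      if (x >= 0.20) return "D"
      return "F"
    }
    {
      score = ($2 / 100) * $3 * (0.5 + 0.5 * $4)
      printf "%-18s %.2f %s\n", $1, score, grade(score)
    }'
}

# Rows copied from the LangChain/HTML example output (override payload, complexity 0.6).
recompute <<'EOF'
comment            30 0.00 0.60
invisible-div      55 1.00 0.60
aria-hidden        70 1.00 0.60
css-hide           75 1.00 0.60
microdata          60 1.00 0.60
chunk-boundary     65 1.00 0.60
offscreen          80 1.00 0.60
color-transparent  85 1.00 0.60
noscript           60 1.00 0.60
EOF
```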
Score a specific technique with multistage payloads
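With `--payload multistage` the complexity factor rises from 0.6 to 0.9, so the payload term becomes 0.5 + 0.5 × 0.9 = 0.95 instead of 0.8. The sketch below applies that change to two HTML/LangChain rows from the previous example, assuming (as the Description implies) that the payload category affects only the complexity term. The commented `hemlock score` line is a hypothetical combination of documented flags, not captured output, and `multistage_score` is an illustrative helper.

```sh
# Hypothetical invocation built from the documented flags:
#   hemlock score --format html --framework langchain --technique css-hide --payload multistage
# multistage_score STEALTH SURVIVAL -> composite with the multistage complexity factor (0.9)
multistage_score() {
  awk -v s="$1" -v p="$2" \
    'BEGIN { printf "%.2f\n", (s / 100) * p * (0.5 + 0.5 * 0.9) }'
}

multistage_score 75 1.00   # css-hide: prints 0.71 (0.60 with override)
multistage_score 85 1.00   # color-transparent: prints 0.81, an A (0.68 with override)
```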
JSON output for programmatic use

```sh
hemlock score \
  --format markdown \
  --framework langchain \
  --json | jq '.[] | select(.rating == "A" or .rating == "B")'
```
Compare techniques across frameworks

```sh
# Score Markdown techniques against each framework
for fw in langchain llamaindex haystack unstructured; do
  echo "=== $fw ==="
  hemlock score --format markdown --framework "$fw"
  echo
done
```
Payload Complexity Factors
The `--payload` flag selects the complexity factor used in scoring:

| Category | Complexity | Rationale |
|---|---|---|
| `override` | 0.6 | Direct instruction override; moderate sophistication |
| `exfiltrate` | 0.8 | Requires external endpoint interaction; high sophistication |
| `redirect` | 0.7 | Social engineering cues; above average sophistication |
| `denial` | 0.5 | Simple disruption; lower sophistication |
| `multistage` | 0.9 | Two-phase attack with primer/trigger coordination; highest sophistication |
| `authority` | 0.85 | Authority mimicry with institutional framing; very high sophistication |
| `custom` | 0.5 | Default factor for user-provided payloads |
Tips
Use score to plan engagements
Run `score` before generating documents to identify the most effective technique/framework combinations for your target. Focus document generation on techniques rated A or B.
Generic framework scoring
When `--framework generic` is used, survival probability is the proportion of frameworks the technique survives (for example, a technique that survives 3 of 4 frameworks gets survival = 0.75).
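A worked example of generic scoring, assuming the survival proportion feeds into the same composite formula as framework-specific survival. The `generic_composite` helper and the stealth value of 75 are illustrative.

```sh
# generic_composite STEALTH SURVIVED TOTAL COMPLEXITY
# Survival is the fraction of frameworks survived; the composite uses the documented formula.
generic_composite() {
  awk -v s="$1" -v k="$2" -v n="$3" -v c="$4" 'BEGIN {
    p = k / n
    printf "survival=%.2f composite=%.2f\n", p, (s / 100) * p * (0.5 + 0.5 * c)
  }'
}

# Stealth 75, override payload (complexity 0.6), survives 3 of 4 frameworks
generic_composite 75 3 4 0.6   # prints survival=0.75 composite=0.45
```

Under these assumptions, the same technique would score 0.60 (B) against a single framework it survives, so failing one framework in four drops it to 0.45 (C).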