hemlock attack optimize

Declaration-gated front door for the existing CEM, genetic, and whitebox optimizers. It differs from craft --optimize-iterations / --genetic / --whitebox in three deliberate ways:

  1. The optimizer is mandatory and the budget is pinned by a declaration. A TOML file (.hemlock-threatmodel.toml) selects the optimizer and caps max_iterations. CLI flags can shrink the budget but never exceed it.
  2. The declaration captures operator-claimed white-box access, payload category, and legitimate-use justification. Whitebox optimizer runs are rejected unless the operator has declared at least embedder-level access.
  3. The output is an optimization-record.json that binds together the declaration's SHA-256, the hemlock SHA, the engagement context, and the generated documents. This single artifact is sufficient to audit who ran which optimizer against which target with what claimed access.

Synopsis

hemlock attack optimize init     [--out PATH]
hemlock attack optimize validate [--threat-model PATH]
hemlock attack optimize run      [flags]

Subcommands

init

Write a starter .hemlock-threatmodel.toml with placeholder values for the operator to fill in. The template is structurally valid, but the validate subcommand rejects the placeholder strings until they are replaced with real values.

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --out | string | ./.hemlock-threatmodel.toml | Destination path |

validate

Validate a declaration without running the optimizer. Reports the SHA-256 of the declaration on success.

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --threat-model | string | ./.hemlock-threatmodel.toml | Path to declaration |
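
To cross-check the reported hash independently, a minimal Python sketch follows. It assumes the digest is computed over the raw bytes of the declaration file, and that the short hash=... value in the CLI output (see Examples) is a prefix of the full hex digest. Neither detail is confirmed by this page, and declaration_sha256 is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def declaration_sha256(path: str) -> str:
    """Return the hex SHA-256 of the file's raw bytes.

    Assumption: hemlock hashes the declaration file byte-for-byte,
    with no canonicalization of the TOML before hashing.
    """
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def short_hash(digest: str, length: int = 12) -> str:
    """Return the leading characters of a digest (assumed display form)."""
    return digest[:length]
```

Calling short_hash(declaration_sha256(".hemlock-threatmodel.toml")) would give a value to compare against the validate output, if both assumptions hold.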

run

Run the optimizer named in the declaration and write an optimization record alongside the generated documents.

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --threat-model | string | ./.hemlock-threatmodel.toml | Path to declaration |
| --target-query | string | (required) | Retrieval query to optimize against |
| --embed-provider | string | (required) | Embedding provider for similarity scoring: openai, ollama |
| --cover-text | string | | Explicit cover text for the document body |
| --cover-text-file | string | | Path to a file whose contents become the cover text |
| --topic | string | general knowledge base | Topic for auto-generated cover text |
| --injection-model-host | string | http://localhost:9090 | Reward model server URL (only used when the declaration's injection_weight > 0) |
| --technique | string | all | Hiding technique to use |
| --format | string | txt | Output format |
| --count | int | 1 | Number of variants per technique |
| --seed | int64 | 0 | Random seed (0 = random) |
| --output | string | ./output | Output directory for generated files and the optimization record |
| --record | string | <output>/optimization-record.json | Optimization record path |
| --scope-check | bool | true | Verify declared target.framework is in engagement scope |
| --max-iterations | int | 0 | Override the declaration's max_iterations (must be ≤ declared) |

Threat-model schema

schema_version = 1
declared_at = "2026-05-06T12:00:00Z"   # RFC3339

[target]
  framework = "langchain"               # langchain | llamaindex | unstructured | haystack | generic
  model = "qwen"                        # qwen | llama3 | mistral | gemma | phi | deepseek | gpt-4 | claude | llama
  payload_category = "override"         # override | exfiltrate | redirect | denial | multistage | authority | manyshot

[access]
  white_box = "embedder"                # none | embedder | embedder_and_llm | full
  knowledge_source = "vendor open-source docs"

[justification]
  purpose = "engagement EX-2026-001 authorized red-team measurement"
  ethics_review = "in_scope"            # optional

[constraints]
  optimizer = "genetic"                 # cem | genetic | whitebox
  max_iterations = 30
  trigger_length = 10
  injection_weight = 0.0                # [0,1]
  naturalness_weight = 0.3              # [0,1]
  population_size = 20                  # genetic only
  generations = 30                      # genetic only

access.white_box = "none" is rejected when constraints.optimizer = "whitebox": the whitebox optimizer requires gradient access to at least the embedder, so claiming no white-box access while running it is incoherent.
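
The coherence rule above can be sketched as a small check. This is an illustration only: it assumes the access levels are ordered as listed in the schema comment (none < embedder < embedder_and_llm < full), and check_access is a hypothetical name, not a hemlock function.

```python
# Access levels in ascending order, as listed in the schema comment.
# Assumption: the listing order reflects increasing access.
ACCESS_LEVELS = ["none", "embedder", "embedder_and_llm", "full"]
OPTIMIZERS = ("cem", "genetic", "whitebox")


def check_access(optimizer: str, white_box: str) -> None:
    """Raise ValueError if the declared access cannot support the optimizer."""
    if optimizer not in OPTIMIZERS:
        raise ValueError(f"unknown constraints.optimizer: {optimizer!r}")
    if white_box not in ACCESS_LEVELS:
        raise ValueError(f"unknown access.white_box: {white_box!r}")

    # The whitebox optimizer needs at least embedder-level access.
    declared = ACCESS_LEVELS.index(white_box)
    required = ACCESS_LEVELS.index("embedder")
    if optimizer == "whitebox" and declared < required:
        raise ValueError(
            "constraints.optimizer = \"whitebox\" requires "
            "access.white_box of at least \"embedder\""
        )
```

Under this sketch, check_access("genetic", "none") passes, while check_access("whitebox", "none") raises.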

Output

The optimization-record.json written by run is a snapshot of the run including:

  • threat_model_sha256 — SHA-256 of the declaration at run time (auditable, content-addressable)
  • threat_model_snapshot — the declaration's content embedded inline
  • hemlock_sha, hemlock_version, operator_identity, engagement_id, engagement_chain_sha256
  • target_query, embed_provider, effective_iterations, elapsed_seconds
  • outputs[] — one entry per generated document (filename, technique, similarity score, stealth score)
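
An auditor could use the record to confirm that a declaration on disk is the one a run was made under. The sketch below assumes the record is a JSON object with a top-level threat_model_sha256 key holding a hex digest of the declaration file's raw bytes. The key name comes from the list above, but the hashing scheme is an assumption, and record_matches_declaration is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def record_matches_declaration(record_path: str, declaration_path: str) -> bool:
    """Return True if the record's threat_model_sha256 matches the file on disk.

    Assumption: the digest covers the declaration file byte-for-byte.
    """
    record = json.loads(Path(record_path).read_text(encoding="utf-8"))
    recorded = str(record["threat_model_sha256"]).lower()
    on_disk = hashlib.sha256(Path(declaration_path).read_bytes()).hexdigest()
    return recorded == on_disk
```

A mismatch means the declaration was edited after the run, or the wrong file is being compared. The threat_model_snapshot field in the record would then be the copy to trust.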

Examples

# 1. Write the template
hemlock attack optimize init

# 2. Edit ./.hemlock-threatmodel.toml — fill in justification.purpose and
#    access.knowledge_source, set the optimizer + budget, etc.

# 3. Confirm valid
hemlock attack optimize validate
#   [hemlock] threat model OK (hash=77dce42cee51)

# 4. Run
hemlock attack optimize run \
  --target-query "What is the recommended security policy?" \
  --topic "corporate security policy" \
  --embed-provider ollama \
  --technique zero-width \
  --format txt \
  --count 1 \
  --output ./out

A --max-iterations value above the declared budget is rejected at flag-parse time. In this example the declaration sets max_iterations = 5:

hemlock attack optimize run --max-iterations 999 ...
# Error: optimize run: --max-iterations=999 exceeds declared budget 5
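
The budget rule can be summarized in a few lines of Python. This is a sketch, not hemlock's code: it assumes that the flag's default of 0 means "use the declared budget", which this page does not state explicitly, and effective_iterations is a hypothetical function named after the record field.

```python
def effective_iterations(declared: int, flag: int = 0) -> int:
    """Resolve the iteration budget from the declaration and --max-iterations.

    Assumption: flag == 0 (the CLI default) means the flag was not set,
    so the declared max_iterations applies unchanged.
    """
    if flag < 0:
        raise ValueError(f"--max-iterations={flag} must not be negative")
    if flag == 0:
        return declared
    if flag > declared:
        # The flag may shrink the budget but never exceed it.
        raise ValueError(
            f"--max-iterations={flag} exceeds declared budget {declared}"
        )
    return flag
```

With a declared budget of 30, a flag value of 10 yields 10 iterations, and omitting the flag yields 30.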