# hemlock Integration

hemlock-lab's primary purpose is to validate and improve the accuracy of hemlock's survival predictions. This page covers the complete workflow, from generating test documents to fixing drifts (predictions that disagree with what the real pipelines actually do).

## Workflow

```mermaid
graph TD
    A["1. hemlock batch<br/>Generate poisoned docs"] --> B["2. hemlock validate --json<br/>Export predictions"]
    B --> C["3. make test<br/>Run against real pipelines"]
    C --> D["4. Read drift.md<br/>Identify incorrect predictions"]
    D --> E{"Drifts found?"}
    E -->|Yes| F["5. Fix pkg/validate/*.go<br/>Update survival matrix"]
    F --> G["6. go test ./...<br/>Verify hemlock tests pass"]
    G --> H["7. Reinstall hemlock on VM"]
    H --> I["8. make restore && make test<br/>Verify drifts are fixed"]
    I --> D
    E -->|No| J["Done! Matrix is<br/>calibrated for current versions"]
```

## Step 1: Generate Test Documents

```bash
# On the VM (or locally if hemlock is installed)
hemlock batch --output-dir /tmp/hemlock-batch --all-formats --all-techniques
```

This generates a poisoned document for every valid format×technique combination. With 10 formats and 36 techniques there are at most 10 × 36 = 360 pairs; not every technique applies to every format, so the actual count is lower, but it still runs to hundreds of documents.
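To make "every valid combination" concrete, here is a minimal Go sketch of enumerating format×technique pairs while skipping invalid ones. The format and technique lists and the `validFor` compatibility table are hypothetical placeholders (only `html` and `csshide` come from this page); hemlock's real lists and validity rules live inside hemlock itself.

```go
package main

import "fmt"

// HYPOTHETICAL subsets of hemlock's formats and techniques; only "html" and
// "csshide" appear on this page. hemlock's real lists have 10 and 36 entries.
var formats = []string{"html", "pdf", "docx"}
var techniques = []string{"csshide", "whitetext", "metadata"}

// validFor is an ASSUMED compatibility table: which techniques each format can carry.
var validFor = map[string]map[string]bool{
	"html": {"csshide": true, "whitetext": true},
	"pdf":  {"whitetext": true, "metadata": true},
	"docx": {"whitetext": true, "metadata": true},
}

// combinations returns every valid format×technique pair in a stable order.
func combinations() [][2]string {
	var out [][2]string
	for _, f := range formats {
		for _, t := range techniques {
			if validFor[f][t] { // missing entries read as false
				out = append(out, [2]string{f, t})
			}
		}
	}
	return out
}

func main() {
	combos := combinations()
	fmt.Printf("%d of %d pairs are valid\n", len(combos), len(formats)*len(techniques))
	// prints "6 of 9 pairs are valid"
}
```

The same filter applied to the full lists is why the batch produces fewer than 360 documents.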

## Step 2: Export Predictions

```bash
hemlock validate --json --output hemlock-predictions.json
```

This produces hemlock's prediction for every format×technique×framework combination:

```json
{
  "predictions": [
    {
      "format": "html",
      "technique": "csshide",
      "framework": "langchain",
      "prediction": "survive"
    },
    {
      "format": "html",
      "technique": "csshide",
      "framework": "unstructured",
      "prediction": "stripped"
    }
  ]
}
```
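If you want to work with the export programmatically, a minimal Go sketch that decodes it and indexes predictions by combination might look like this. It relies only on the fields shown in the sample; `Prediction`, `Key`, and `loadPredictions` are hypothetical names, not part of hemlock.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Prediction mirrors one entry of hemlock-predictions.json (fields as in the sample).
type Prediction struct {
	Format     string `json:"format"`
	Technique  string `json:"technique"`
	Framework  string `json:"framework"`
	Prediction string `json:"prediction"` // e.g. "survive" or "stripped"
}

// Key identifies one format×technique×framework combination.
type Key struct{ Format, Technique, Framework string }

// loadPredictions decodes the export and indexes predictions by combination.
func loadPredictions(data []byte) (map[Key]string, error) {
	var file struct {
		Predictions []Prediction `json:"predictions"`
	}
	if err := json.Unmarshal(data, &file); err != nil {
		return nil, err
	}
	idx := make(map[Key]string, len(file.Predictions))
	for _, p := range file.Predictions {
		idx[Key{p.Format, p.Technique, p.Framework}] = p.Prediction
	}
	return idx, nil
}

func main() {
	// Inline copy of the sample; in practice use os.ReadFile("hemlock-predictions.json").
	sample := []byte(`{"predictions": [
		{"format": "html", "technique": "csshide", "framework": "langchain", "prediction": "survive"},
		{"format": "html", "technique": "csshide", "framework": "unstructured", "prediction": "stripped"}
	]}`)
	idx, err := loadPredictions(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(idx[Key{"html", "csshide", "unstructured"}]) // stripped
}
```

Keying on the full triple matters because the same format×technique pair can get different predictions per framework, as the two sample entries show.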

## Step 3: Test Against Real Pipelines

```bash
make test
```

The harness sends each document to each pipeline's `/extract` endpoint and records whether the payload was found in the extracted text.
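The per-document check can be sketched in Go as follows. The request and response shapes are assumptions, not the harness's documented API: the sketch uploads the document as a multipart field named `file` and expects a JSON body like `{"text": "..."}`. `payloadSurvives` and `newEchoServer` are hypothetical; the echo server stands in for a pipeline that keeps all text, so the sketch runs offline.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"strings"
)

// extractResponse is an ASSUMED response shape for /extract: {"text": "..."}.
type extractResponse struct {
	Text string `json:"text"`
}

// payloadSurvives uploads doc to baseURL+"/extract" as the multipart field
// "file" (an assumption) and reports whether payload appears in the returned text.
func payloadSurvives(client *http.Client, baseURL, filename string, doc []byte, payload string) (bool, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	fw, err := mw.CreateFormFile("file", filename)
	if err != nil {
		return false, err
	}
	if _, err := fw.Write(doc); err != nil {
		return false, err
	}
	if err := mw.Close(); err != nil {
		return false, err
	}

	resp, err := client.Post(baseURL+"/extract", mw.FormDataContentType(), &body)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		return false, fmt.Errorf("extract failed: %s: %s", resp.Status, msg)
	}

	var out extractResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return false, err
	}
	return strings.Contains(out.Text, payload), nil
}

// newEchoServer is a HYPOTHETICAL stand-in pipeline that returns the uploaded
// file verbatim as its "extracted" text, i.e. it strips nothing.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, _, err := r.FormFile("file")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer f.Close()
		data, err := io.ReadAll(f)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(extractResponse{Text: string(data)})
	}))
}

func main() {
	srv := newEchoServer()
	defer srv.Close()
	doc := []byte(`<p>Quarterly report</p><p style="display:none">PAYLOAD-123</p>`)
	survived, err := payloadSurvives(srv.Client(), srv.URL, "doc.html", doc, "PAYLOAD-123")
	fmt.Println(survived, err) // true <nil>: the echo pipeline keeps hidden text
}
```

A plain substring match is the simplest reading of "found in the extracted text"; a real harness may also need to normalize whitespace or encoding before comparing.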

## Step 4: Read the Drift Report

```bash
cat reports/*/drift.md
```

The drift report shows every prediction that was wrong:

```markdown
## Drifts (28 total)

### LangChain (8 drifts)
| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| csshide   | html   | survive   | stripped |
```

## Step 5: Fix hemlock's Validators

Each drift maps to a specific prediction in hemlock's validator code. Switch to your hemlock checkout:

```bash
cd ~/projects/hemlock
```

The validators live in `pkg/validate/`:

| Framework | File |
|-----------|------|
| LangChain | `pkg/validate/langchain.go` |
| LlamaIndex | `pkg/validate/llamaindex.go` |
| Unstructured | `pkg/validate/unstructured.go` |

Find the prediction for the drifted combination and update it:

```go
// Before (incorrect), e.g. in pkg/validate/langchain.go for the LangChain drift above
case "csshide":
	return "survive"

// After (matches the behavior the harness observed)
case "csshide":
	return "stripped"
```

## Step 6: Run hemlock's Tests

```bash
go test ./pkg/validate/...
```

hemlock's own test suite verifies the validator logic is internally consistent.
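To keep a fixed drift from silently coming back, one option is to pin each corrected combination in a table and check the validator against it. The Go sketch below is hypothetical: `predictLangChain` is a stand-in reduced to the single case from Step 5, not hemlock's actual function.

```go
package main

import "fmt"

// predictLangChain is a HYPOTHETICAL stand-in for the LangChain validator,
// reduced to the one technique corrected in Step 5.
func predictLangChain(format, technique string) string {
	switch technique {
	case "csshide":
		return "stripped" // corrected after the csshide/html drift
	default:
		return "survive" // placeholder for every other technique
	}
}

// regressionCase pins one combination to the outcome the harness observed.
type regressionCase struct {
	format, technique, want string
}

// fixedDrifts would hold one entry per drift fixed so far.
var fixedDrifts = []regressionCase{
	{"html", "csshide", "stripped"},
}

// failures lists every pinned case the predictor no longer gets right.
func failures(predict func(format, technique string) string, cases []regressionCase) []string {
	var out []string
	for _, c := range cases {
		if got := predict(c.format, c.technique); got != c.want {
			out = append(out, fmt.Sprintf("%s/%s: got %q, want %q", c.format, c.technique, got, c.want))
		}
	}
	return out
}

func main() {
	problems := failures(predictLangChain, fixedDrifts)
	for _, p := range problems {
		fmt.Println("REGRESSION:", p)
	}
	fmt.Printf("%d of %d pinned cases failing\n", len(problems), len(fixedDrifts))
}
```

Inside hemlock this would more naturally be a table-driven `_test.go` test that reports each mismatch with `t.Errorf`, so `go test` catches it directly.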

## Step 7: Reinstall hemlock

```bash
# Build locally
go build -o hemlock ./cmd/hemlock

# Install on host (hemlock runs alongside the Docker stack)
sudo install -m 755 hemlock /usr/local/bin/hemlock
```

Alternatively, tag a new release and install it with `go install`.

## Step 8: Verify Fixes

```bash
# Reset ChromaDB (down -v removes the stack's volumes) and re-run the tests
docker compose down -v && docker compose up -d
docker compose --profile test run --rm harness
```

The new drift report should show fewer drifts. Repeat steps 4 to 8 until all drifts are resolved.
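To check that each iteration is actually making progress, you could compare the totals from consecutive reports. This Go sketch assumes the summary heading format from the sample report (`## Drifts (28 total)`); `driftTotal` is a hypothetical helper and the report excerpts in `main` are placeholders.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// totalRe matches the summary heading of drift.md, e.g. "## Drifts (28 total)".
// The heading format is taken from the sample report and is an assumption.
var totalRe = regexp.MustCompile(`(?m)^## Drifts \((\d+) total\)`)

// driftTotal extracts the total drift count from the text of a drift.md report.
func driftTotal(report string) (int, error) {
	m := totalRe.FindStringSubmatch(report)
	if m == nil {
		return 0, fmt.Errorf("no %q heading found", "## Drifts (N total)")
	}
	return strconv.Atoi(m[1])
}

func main() {
	// Placeholder excerpts; in practice read two reports/*/drift.md files.
	previous := "## Drifts (28 total)\n\n### LangChain (8 drifts)\n"
	current := "## Drifts (12 total)\n"

	before, err1 := driftTotal(previous)
	after, err2 := driftTotal(current)
	if err1 != nil || err2 != nil {
		fmt.Println("could not parse reports:", err1, err2)
		return
	}
	if after < before {
		fmt.Printf("progress: %d -> %d drifts\n", before, after)
	} else {
		fmt.Printf("no progress: %d -> %d drifts\n", before, after)
	}
}
```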

## Version Tracking

When all drifts are resolved, the survival matrix is calibrated for the current framework versions:

| Framework | Version | Matrix Status |
|-----------|---------|---------------|
| LangChain | 0.3.35 | ✓ Calibrated |
| LlamaIndex | 0.12.33 | ✓ Calibrated |
| Unstructured | 0.17.2 | ✓ Calibrated |
| Haystack | 2.12.1 | ✓ Calibrated |

When a framework releases a new version, update the pinned version in the relevant Dockerfile or in `docker-compose.yml`, rebuild with `docker compose up -d --build`, and run the cycle again to find new drifts.
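A minimal Go sketch of that bookkeeping: record the calibrated versions from the table and flag any framework whose installed version differs. How installed versions are read from the containers is not covered on this page, so the `installed` map in `main` is hypothetical, including the bumped Unstructured version.

```go
package main

import (
	"fmt"
	"sort"
)

// calibrated records the versions from the Version Tracking table.
var calibrated = map[string]string{
	"LangChain":    "0.3.35",
	"LlamaIndex":   "0.12.33",
	"Unstructured": "0.17.2",
	"Haystack":     "2.12.1",
}

// needsRecalibration returns, sorted by name, every framework whose installed
// version differs from its calibrated version or has no calibration record.
func needsRecalibration(installed map[string]string) []string {
	var stale []string
	for name, version := range installed {
		if want, ok := calibrated[name]; !ok || want != version {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// HYPOTHETICAL installed versions, e.g. after bumping Unstructured in its Dockerfile.
	installed := map[string]string{
		"LangChain":    "0.3.35",
		"LlamaIndex":   "0.12.33",
		"Unstructured": "0.18.0",
		"Haystack":     "2.12.1",
	}
	fmt.Println(needsRecalibration(installed)) // [Unstructured]
}
```

Sorting the result keeps the output deterministic, since Go randomizes map iteration order.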

## Next Steps

- Drift Report: detailed drift report format
- aipostex Integration: cross-lab connectivity