hemlock Integration
hemlock-lab's primary purpose is to validate and improve hemlock's accuracy. This page covers the complete workflow from generating test documents to fixing drifts.
Workflow
graph TD
A["1. hemlock batch<br/>Generate poisoned docs"] --> B["2. hemlock validate --json<br/>Export predictions"]
B --> C["3. make test<br/>Run against real pipelines"]
C --> D["4. Read drift.md<br/>Identify incorrect predictions"]
D --> E{"Drifts found?"}
E -->|Yes| F["5. Fix pkg/validate/*.go<br/>Update survival matrix"]
F --> G["6. go test ./...<br/>Verify hemlock tests pass"]
G --> H["7. Reinstall hemlock on VM"]
H --> I["8. make restore && make test<br/>Verify drifts are fixed"]
I --> D
E -->|No| J["Done! Matrix is<br/>calibrated for current versions"]
Step 1: Generate Test Documents
# On the VM (or locally if hemlock is installed)
hemlock batch --output-dir /tmp/hemlock-batch --all-formats --all-techniques
This generates a poisoned document for every valid format×technique combination. With 10 formats and 36 techniques, the upper bound is 10 × 36 = 360 documents; the actual count is lower because not all combinations are valid.
Step 2: Export Predictions
Running hemlock validate --json exports hemlock's prediction for every format×technique×framework combination:
{
  "predictions": [
    {
      "format": "html",
      "technique": "csshide",
      "framework": "langchain",
      "prediction": "survive"
    },
    {
      "format": "html",
      "technique": "csshide",
      "framework": "unstructured",
      "prediction": "stripped"
    }
  ]
}
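For scripting against this export, the JSON can be decoded into Go structs and indexed by combination. The following is a minimal sketch based only on the fields shown in the sample above; the type names (`Prediction`, `Export`, `Key`) and the helper `indexPredictions` are illustrative and are not part of hemlock's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Prediction mirrors one entry of the "predictions" array in the sample export.
type Prediction struct {
	Format     string `json:"format"`
	Technique  string `json:"technique"`
	Framework  string `json:"framework"`
	Prediction string `json:"prediction"`
}

// Export mirrors the top-level JSON object.
type Export struct {
	Predictions []Prediction `json:"predictions"`
}

// Key identifies one format×technique×framework combination (hypothetical type).
type Key struct {
	Format, Technique, Framework string
}

// indexPredictions decodes the export and indexes predictions by combination.
func indexPredictions(data []byte) (map[Key]string, error) {
	var e Export
	if err := json.Unmarshal(data, &e); err != nil {
		return nil, fmt.Errorf("decode predictions: %w", err)
	}
	idx := make(map[Key]string, len(e.Predictions))
	for _, p := range e.Predictions {
		idx[Key{p.Format, p.Technique, p.Framework}] = p.Prediction
	}
	return idx, nil
}

// sample is the export shown above, condensed.
const sample = `{"predictions":[
 {"format":"html","technique":"csshide","framework":"langchain","prediction":"survive"},
 {"format":"html","technique":"csshide","framework":"unstructured","prediction":"stripped"}]}`

func main() {
	idx, err := indexPredictions([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(idx[Key{"html", "csshide", "langchain"}]) // prints "survive"
}
```

Indexing by the full triple makes it cheap to look up the prediction for any single harness result later on.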
Step 3: Test Against Real Pipelines
Running make test starts the harness, which sends each document to each pipeline's /extract endpoint and records whether the payload appears in the extracted text.
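The per-document check can be sketched roughly as follows. This is not the harness's actual code: it assumes the /extract endpoint accepts the raw document as the request body and returns the extracted text as the response body, and the helpers `extract` and `classify`, the marker `PAYLOAD-123`, and the stub server are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// classify maps extracted text to the outcome labels used in the predictions.
func classify(extracted, payload string) string {
	if strings.Contains(extracted, payload) {
		return "survive"
	}
	return "stripped"
}

// extract posts a document to <baseURL>/extract and returns the response body.
// Assumption: raw document in, plain extracted text out; the real request and
// response formats may differ.
func extract(baseURL string, doc []byte) (string, error) {
	resp, err := http.Post(baseURL+"/extract", "application/octet-stream", bytes.NewReader(doc))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("extract: unexpected status %s", resp.Status)
	}
	text, err := io.ReadAll(resp.Body)
	return string(text), err
}

func main() {
	// Stub pipeline standing in for a real framework container: it always
	// returns only the visible text, so the hidden payload is dropped.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "visible text")
	}))
	defer stub.Close()

	doc := []byte(`<p>visible text</p><span style="display:none">PAYLOAD-123</span>`)
	text, err := extract(stub.URL, doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(classify(text, "PAYLOAD-123")) // prints "stripped"
}
```

Keeping `classify` separate from the HTTP call means the survive/stripped decision can be tested without a running pipeline.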
Step 4: Read the Drift Report
The drift report shows every prediction that was wrong:
## Drifts (28 total)
### LangChain (8 drifts)
| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| csshide | html | survive | stripped |
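Conceptually, a drift is any result where the predicted outcome differs from the observed one. The sketch below shows that comparison and the per-framework grouping seen in the sample report; the `Result` type and `findDrifts` function are illustrative names, not the harness's real implementation, and the second input row is made up for contrast.

```go
package main

import "fmt"

// Result pairs hemlock's prediction with the outcome observed by the harness.
type Result struct {
	Framework, Technique, Format string
	Predicted, Actual            string
}

// findDrifts returns the results whose prediction disagrees with the
// observation, grouped by framework.
func findDrifts(results []Result) map[string][]Result {
	drifts := make(map[string][]Result)
	for _, r := range results {
		if r.Predicted != r.Actual {
			drifts[r.Framework] = append(drifts[r.Framework], r)
		}
	}
	return drifts
}

func main() {
	results := []Result{
		// The drift from the sample report above.
		{"LangChain", "csshide", "html", "survive", "stripped"},
		// Hypothetical correct prediction; it must not appear in the report.
		{"Unstructured", "csshide", "html", "stripped", "stripped"},
	}
	for framework, ds := range findDrifts(results) {
		fmt.Printf("### %s (%d drifts)\n", framework, len(ds))
		for _, d := range ds {
			fmt.Printf("| %s | %s | %s | %s |\n", d.Technique, d.Format, d.Predicted, d.Actual)
		}
	}
}
```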
Step 5: Fix hemlock's Validators
Each drift maps to a specific location in hemlock's code. The validators live in pkg/validate/, one file per framework:
| Framework | File |
|---|---|
| LangChain | pkg/validate/langchain.go |
| LlamaIndex | pkg/validate/llamaindex.go |
| Unstructured | pkg/validate/unstructured.go |
Find the prediction for the drifted combination and update it:
// Before (incorrect)
case "csshide":
	return "survive"

// After (matches real behavior)
case "csshide":
	return "stripped"
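The snippet above is a fragment of a switch statement. To show how such a case might sit inside a complete function, here is a self-contained sketch; `predictLangChainHTML` and the `"unknown"` fallback are hypothetical, since hemlock's real function names and signatures in pkg/validate/ are not shown on this page.

```go
package main

import "fmt"

// predictLangChainHTML is a hypothetical stand-in for a validator in
// pkg/validate/langchain.go. It returns the predicted outcome for an
// HTML document that uses the given technique.
func predictLangChainHTML(technique string) string {
	switch technique {
	case "csshide":
		// Updated after the drift report showed the payload is stripped.
		return "stripped"
	default:
		// Assumption: techniques without an entry get an explicit fallback.
		return "unknown"
	}
}

func main() {
	fmt.Println(predictLangChainHTML("csshide")) // prints "stripped"
}
```

Pinning each corrected combination in a table-driven test is one way to keep a fixed drift from regressing silently.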
Step 6: Run hemlock's Tests
Run go test ./... in the hemlock repository. hemlock's own test suite verifies that the validator logic is internally consistent.
Step 7: Reinstall hemlock
# Build locally
go build -o hemlock ./cmd/hemlock
# Install on host (hemlock runs alongside Docker stack)
sudo install -m 755 hemlock /usr/local/bin/hemlock
Alternatively, push a new release and install it with go install.
Step 8: Verify Fixes
# Reset ChromaDB and re-run tests
docker compose down -v && docker compose up -d
docker compose --profile test run --rm harness
The new drift report should show fewer drifts. Repeat from Step 4 until no drifts remain.
Version Tracking
When all drifts are resolved, the survival matrix is calibrated for the current framework versions:
| Framework | Version | Matrix Status |
|---|---|---|
| LangChain | 0.3.35 | ✓ Calibrated |
| LlamaIndex | 0.12.33 | ✓ Calibrated |
| Unstructured | 0.17.2 | ✓ Calibrated |
| Haystack | 2.12.1 | ✓ Calibrated |
When a framework releases a new version, update the Dockerfile or docker-compose.yml, rebuild with docker compose up -d --build, and run the cycle again to find new drifts.
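One way to notice when recalibration is due is to compare the installed framework versions against the calibrated ones. The sketch below hard-codes the versions from the table above; the `staleFrameworks` helper and the installed versions in `main` (including the Haystack upgrade) are hypothetical, and how the installed versions are collected is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// calibrated lists the framework versions from the table above.
var calibrated = map[string]string{
	"LangChain":    "0.3.35",
	"LlamaIndex":   "0.12.33",
	"Unstructured": "0.17.2",
	"Haystack":     "2.12.1",
}

// staleFrameworks returns, in sorted order, the frameworks whose installed
// version differs from the version the matrix was calibrated against.
func staleFrameworks(installed map[string]string) []string {
	var stale []string
	for name, version := range installed {
		if calibrated[name] != version {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	installed := map[string]string{
		"LangChain": "0.3.35",
		"Haystack":  "2.13.0", // hypothetical upgrade, not a claim about a real release
	}
	fmt.Println(staleFrameworks(installed)) // prints "[Haystack]"
}
```

Any framework this reports is a candidate for another pass through the cycle.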
Next Steps
- Drift Report — Detailed drift report format
- aipostex Integration — Cross-lab connectivity