Drift Report
The drift report is the primary output of hemlock-lab. It identifies every case where hemlock's survival predictions don't match real-world framework behavior.
What is Drift?
Drift occurs when hemlock predicts one outcome but reality produces another:
hemlock predicts:  csshide + langchain → survive
actual result:     csshide + langchain → stripped
                                         ^^^^^^^^ DRIFT
Each drift may indicate outdated hemlock predictions, framework version changes, or differences between the harness matching logic and hemlock's validator. Not every drift is necessarily a hemlock bug.
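As a minimal sketch of the definition above, a drift is any test where the predicted outcome differs from the observed one. The `Observation` type and `is_drift` helper are hypothetical names used for illustration; they are not taken from the hemlock-lab harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One technique/framework test result (hypothetical shape)."""
    technique: str
    framework: str
    predicted: str  # what hemlock expects, e.g. "survive"
    actual: str     # what the framework produced, e.g. "stripped"


def is_drift(obs: Observation) -> bool:
    """Return True when the prediction and the observed outcome differ."""
    return obs.predicted != obs.actual


# The example from the text: hemlock predicts "survive", reality is "stripped".
example = Observation("csshide", "langchain", predicted="survive", actual="stripped")
print(is_drift(example))  # True
```

Whether a given drift is a hemlock bug still has to be decided by triage; the comparison only flags the mismatch.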
Report Structure
drift.md is generated by harness/drift_report.py and contains three sections: a summary, drifts grouped by framework, and action items.
Summary
# Drift Report — 2026-04-02T10:30:00
## Summary
| Metric | Count |
|--------|-------|
| Total tests | 576 |
| Matches | 541 |
| Drifts | 28 |
| Errors | 7 |
| Accuracy | 93.9% |
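The sample figures are consistent with accuracy being the share of matches among all tests, with errors counted against it. That formula is inferred from the numbers shown, not confirmed from harness/drift_report.py, so treat the sketch below as an assumption. The `accuracy` function is a hypothetical helper.

```python
def accuracy(matches: int, drifts: int, errors: int) -> float:
    """Percentage of tests whose prediction matched the observed outcome.

    Assumes total = matches + drifts + errors, as in the sample summary.
    """
    total = matches + drifts + errors
    if total == 0:
        raise ValueError("no tests recorded")
    return 100.0 * matches / total


# Sample summary: 541 matches, 28 drifts, 7 errors out of 576 tests.
print(f"{accuracy(541, 28, 7):.1f}%")  # 93.9%
```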
Drifts by Framework
## LangChain (8 drifts)
| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| csshide | html | survive | stripped |
| metadata | pdf | stripped | survive |
| ... | ... | ... | ... |
## Unstructured (12 drifts)
| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| comment | html | survive | stripped |
| ... | ... | ... | ... |
Action Items
Action items point to the validator lines to investigate first. Because a drift can stem from a hemlock prediction error, a framework version change, or a harness matching difference, triage each one before updating hemlock.
## Action Items
### pkg/validate/langchain.go
- [ ] Line 45: Review csshide+html — predicted "survive", observed "stripped"
- [ ] Line 78: Review metadata+pdf — predicted "stripped", observed "survive"
### pkg/validate/unstructured.go
- [ ] Line 23: Review comment+html — predicted "survive", observed "stripped"
Reading the Report
Match Rate
The overall accuracy percentage tells you how well hemlock's survival matrix matches reality:
| Accuracy | Meaning |
|---|---|
| >95% | Matrix is well-calibrated for current framework versions |
| 90-95% | Some updates needed, likely from framework version changes |
| <90% | Significant drift — may indicate a major framework release |
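The bands in the table can be expressed as a small lookup. This is a sketch only: `interpret_accuracy` is a hypothetical helper, and placing exactly 95% in the middle band is an assumption, since the table does not say which side of the boundary it belongs to.

```python
def interpret_accuracy(pct: float) -> str:
    """Map an accuracy percentage to the band described in the table."""
    if pct > 95.0:
        return "well-calibrated"
    if pct >= 90.0:
        return "some updates needed"
    return "significant drift"


print(interpret_accuracy(93.9))  # some updates needed
```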
Drift Patterns
Look for patterns in the drifts; each one suggests a likely cause:
- All drifts for one framework → the framework probably updated its parser
- All drifts for one technique → the technique's behavior changed across frameworks
- All drifts for one format → the format handler changed
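One way to spot the patterns listed above is to count drifts along each dimension and see where they concentrate. The sketch below assumes each drift is available as a dict with `framework`, `technique`, and `format` keys; that shape and the `drift_concentration` helper are illustrative, not the harness's actual data model. The three rows are the ones shown in the sample report.

```python
from collections import Counter


def drift_concentration(drifts, key):
    """Count drifts per value of `key` ("framework", "technique", or "format")."""
    return Counter(d[key] for d in drifts)


drifts = [
    {"framework": "langchain", "technique": "csshide", "format": "html"},
    {"framework": "langchain", "technique": "metadata", "format": "pdf"},
    {"framework": "unstructured", "technique": "comment", "format": "html"},
]

# A dimension dominated by a single value hints at the corresponding cause.
for key in ("framework", "technique", "format"):
    print(key, drift_concentration(drifts, key).most_common())
```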
Acting on Drifts
1. Identify the Validator
Each framework has a validator in hemlock:
| Framework | File |
|---|---|
| LangChain | pkg/validate/langchain.go |
| LlamaIndex | pkg/validate/llamaindex.go |
| Unstructured | pkg/validate/unstructured.go |
| Haystack | (uses generic validator) |
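When triaging many drifts, the table can be kept as a lookup. In this sketch `VALIDATOR_FILES` and `validator_file` are hypothetical names. Haystack is deliberately left out of the mapping because the table does not give a path for the generic validator, so the lookup returns `None` for it.

```python
from typing import Optional

# Framework name (lowercase) -> validator source file, as listed in the table.
VALIDATOR_FILES = {
    "langchain": "pkg/validate/langchain.go",
    "llamaindex": "pkg/validate/llamaindex.go",
    "unstructured": "pkg/validate/unstructured.go",
}


def validator_file(framework: str) -> Optional[str]:
    """Return the validator path, or None if the framework uses the generic validator."""
    return VALIDATOR_FILES.get(framework.lower())


print(validator_file("LangChain"))  # pkg/validate/langchain.go
print(validator_file("Haystack"))   # None
```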
2. Find the Prediction
The drift report's action items point to specific lines. Open the file and find the technique×format prediction:
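// pkg/validate/langchain.go
func (v *LangChainValidator) Predict(technique, format string) string {
	switch technique {
	case "csshide":
		switch format {
		case "html":
			return "survive" // ← DRIFT: should be "stripped"
		}
	}
	// ... remaining technique×format cases elided ...
	return "" // placeholder so this excerpt compiles; the real fallback is not shown here
}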
3. Update and Verify
# Fix the prediction in hemlock
cd ~/projects/hemlock
# ... edit the validator ...
# Run hemlock's own tests
go test ./pkg/validate/...
# Reinstall on the VM
make restore && make test
4. Iterate
The goal is to reach 100% accuracy for the current framework versions. Each iteration should reduce the drift count:
Run 1: 28 drifts (93.9% accuracy)
Run 2: 12 drifts (97.9% accuracy) ← fixed 16 predictions
Run 3: 3 drifts (99.5% accuracy) ← fixed 9 predictions
Run 4: 0 drifts (100% accuracy) ← matrix calibrated
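Between iterations it helps to know which drifts were fixed, which remain, and whether any new ones appeared, since a falling count can hide regressions. The sketch below compares two runs as sets of `(framework, technique, format)` tuples; the tuple shape and the `compare_runs` helper are assumptions made for illustration.

```python
def compare_runs(previous, current):
    """Split drifts into fixed, remaining, and new between two runs."""
    prev, curr = set(previous), set(current)
    return {
        "fixed": prev - curr,      # drifted before, matches now
        "remaining": prev & curr,  # still drifting
        "new": curr - prev,        # regressions introduced since the last run
    }


run_1 = [
    ("langchain", "csshide", "html"),
    ("langchain", "metadata", "pdf"),
    ("unstructured", "comment", "html"),
]
run_2 = [("langchain", "metadata", "pdf")]

diff = compare_runs(run_1, run_2)
for label in ("fixed", "remaining", "new"):
    print(label, len(diff[label]), sorted(diff[label]))
```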
Drift Report as CI Signal
In a CI workflow, the drift report can gate releases:
# Fail if any drifts detected
drift_count=$(jq '.results | map(select(.status == "DRIFT")) | length' extraction-results.json)
if [[ "$drift_count" -gt 0 ]]; then
    echo "FAIL: $drift_count drifts detected"
    exit 1
fi
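Where jq is not available on the CI runner, the same gate can be sketched in Python. It relies only on what the jq filter above implies about extraction-results.json: a top-level `results` array whose entries carry a `status` field. The `count_drifts` and `gate` functions are hypothetical helpers, and treating a missing `results` key as zero drifts is an assumption.

```python
import json


def count_drifts(doc: dict) -> int:
    """Count entries in doc["results"] whose status is "DRIFT"."""
    return sum(1 for r in doc.get("results", []) if r.get("status") == "DRIFT")


def gate(path: str = "extraction-results.json") -> int:
    """Return a process exit code: 1 if any drifts were recorded, else 0."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    drift_count = count_drifts(doc)
    if drift_count > 0:
        print(f"FAIL: {drift_count} drifts detected")
        return 1
    return 0
```

In a CI step, the return value can be passed to `sys.exit`, for example `sys.exit(gate())`, so that a non-zero drift count fails the job.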
Future: automated drift detection
A future hemlock-lab enhancement could run make test on a schedule and open GitHub issues for new drifts.
Next Steps
- Extraction Tests — Understanding Layer 1 results
- Quick Start — First test run walkthrough
- Integration with hemlock — The full feedback loop