
Drift Report

The drift report is the primary output of hemlock-lab. It identifies every case where hemlock's survival predictions don't match real-world framework behavior.


What is Drift?

Drift occurs when hemlock predicts one outcome but reality produces another:

hemlock predicts: csshide + langchain → survive
actual result:    csshide + langchain → stripped
                                        ^^^^^^^^ DRIFT

Each drift may indicate outdated hemlock predictions, framework version changes, or differences between the harness matching logic and hemlock's validator. Not every drift is necessarily a hemlock bug.
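The comparison behind a drift can be sketched in a few lines. This is an illustration only: the `Result` type and its field names are hypothetical and are not taken from the hemlock-lab harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One test outcome. Field names are assumptions for illustration."""
    technique: str
    framework: str
    predicted: str  # what hemlock expects: "survive" or "stripped"
    actual: str     # what the framework actually produced


def is_drift(result: Result) -> bool:
    # A drift is any case where the prediction and the observation differ.
    return result.predicted != result.actual


example = Result("csshide", "langchain", "survive", "stripped")
print(is_drift(example))  # True
```

A record whose `predicted` and `actual` values agree would count as a match instead.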


Report Structure

drift.md is generated by harness/drift_report.py and contains:

Summary

# Drift Report — 2026-04-02T10:30:00

## Summary

| Metric | Count |
|--------|-------|
| Total tests | 576 |
| Matches | 541 |
| Drifts | 28 |
| Errors | 7 |
| Accuracy | 93.9% |
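
The Accuracy row in this sample is consistent with matches divided by total tests, with errors counted against accuracy (541 / 576 ≈ 93.9%). The sketch below reproduces that arithmetic; the formula is inferred from the sample numbers, not confirmed against harness/drift_report.py, and the function name is hypothetical.

```python
def accuracy(matches: int, drifts: int, errors: int) -> float:
    """Percentage of tests whose prediction matched the observed result.

    Assumption: errors count toward the total, as the sample figures suggest.
    """
    total = matches + drifts + errors
    if total == 0:
        raise ValueError("no test results")
    return 100.0 * matches / total


print(f"{accuracy(541, 28, 7):.1f}%")  # 93.9%
```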

Drifts by Framework

## LangChain (8 drifts)

| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| csshide | html | survive | stripped |
| metadata | pdf | stripped | survive |
| ... | ... | ... | ... |

## Unstructured (12 drifts)

| Technique | Format | Predicted | Actual |
|-----------|--------|-----------|--------|
| comment | html | survive | stripped |
| ... | ... | ... | ... |
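
As a rough sketch of how per-framework sections like these could be produced, the snippet below groups drift records and renders one table per framework. The record keys and the `render_by_framework` helper are assumptions for illustration; the actual logic in harness/drift_report.py may differ.

```python
from collections import defaultdict

# Hypothetical drift records; the key names are assumptions.
drifts = [
    {"framework": "LangChain", "technique": "csshide", "format": "html",
     "predicted": "survive", "actual": "stripped"},
    {"framework": "LangChain", "technique": "metadata", "format": "pdf",
     "predicted": "stripped", "actual": "survive"},
    {"framework": "Unstructured", "technique": "comment", "format": "html",
     "predicted": "survive", "actual": "stripped"},
]


def render_by_framework(rows):
    """Return Markdown with one heading and table per framework."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["framework"]].append(row)

    lines = []
    for framework in sorted(grouped):
        items = grouped[framework]
        noun = "drift" if len(items) == 1 else "drifts"
        lines.append(f"## {framework} ({len(items)} {noun})")
        lines.append("")
        lines.append("| Technique | Format | Predicted | Actual |")
        lines.append("|-----------|--------|-----------|--------|")
        for r in items:
            lines.append(
                f"| {r['technique']} | {r['format']} "
                f"| {r['predicted']} | {r['actual']} |"
            )
        lines.append("")
    return "\n".join(lines)


print(render_by_framework(drifts))
```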

Action Items

Action items point to the validator lines worth reviewing. Because a drift can come from a hemlock prediction error, a framework version change, or a harness matching difference, triage each item before updating hemlock.

## Action Items

### pkg/validate/langchain.go

- [ ] Line 45: Review csshide+html — predicted "survive", observed "stripped"
- [ ] Line 78: Review metadata+pdf — predicted "stripped", observed "survive"

### pkg/validate/unstructured.go

- [ ] Line 23: Review comment+html — predicted "survive", observed "stripped"

Reading the Report

Match Rate

The overall accuracy percentage tells you how well hemlock's survival matrix matches reality:

| Accuracy | Meaning |
|----------|---------|
| >95% | Matrix is well-calibrated for current framework versions |
| 90-95% | Some updates needed, likely from framework version changes |
| <90% | Significant drift; may indicate a major framework release |
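
The bands above can be written as a small lookup. This is a sketch: the function name and the short labels are hypothetical, and treating exactly 95% as part of the middle band follows the ">95%" wording.

```python
def interpret_accuracy(pct: float) -> str:
    """Map an accuracy percentage to the band it falls in."""
    if pct > 95:
        return "well-calibrated"
    if pct >= 90:
        return "some updates needed"
    return "significant drift"


print(interpret_accuracy(93.9))  # some updates needed
```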

Drift Patterns

Look for patterns in the drifts:

  • All drifts for one framework → Framework updated its parser
  • All drifts for one technique → Technique behavior changed across frameworks
  • All drifts for one format → Format handler changed
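
One way to spot these patterns mechanically is to check whether every drift shares the same value on some dimension. The sketch below assumes each drift is a dict with `framework`, `technique`, and `format` keys; both the keys and the `shared_dimensions` helper are hypothetical.

```python
def shared_dimensions(drifts):
    """Return {dimension: value} for each dimension on which all drifts agree."""
    shared = {}
    for dim in ("framework", "technique", "format"):
        values = {d[dim] for d in drifts}
        if len(values) == 1:
            shared[dim] = values.pop()
    return shared


sample = [
    {"framework": "langchain", "technique": "csshide", "format": "html"},
    {"framework": "langchain", "technique": "metadata", "format": "pdf"},
]
print(shared_dimensions(sample))  # {'framework': 'langchain'}
```

Here every drift belongs to one framework, which matches the first pattern in the list. An empty result means the drifts are spread across all three dimensions and need case-by-case review.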

Acting on Drifts

1. Identify the Validator

Each framework has a validator in hemlock:

| Framework | File |
|-----------|------|
| LangChain | pkg/validate/langchain.go |
| LlamaIndex | pkg/validate/llamaindex.go |
| Unstructured | pkg/validate/unstructured.go |
| Haystack | (uses generic validator) |

2. Find the Prediction

The drift report's action items point to specific lines. Open the file and find the technique×format prediction:

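// pkg/validate/langchain.go
// Predict returns the expected outcome for a technique×format pair.
func (v *LangChainValidator) Predict(technique, format string) string {
    switch technique {
    case "csshide":
        switch format {
        case "html":
            return "survive" // ← DRIFT: should be "stripped"
        }
    }
    // ... other techniques and formats ...
    return "" // placeholder so the excerpt compiles; the real fallback is not shown here
}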

3. Update and Verify

# Fix the prediction in hemlock
cd ~/projects/hemlock
# ... edit the validator ...

# Run hemlock's own tests
go test ./pkg/validate/...

# Reinstall on the VM
make restore && make test

4. Iterate

The goal is to reach 100% accuracy for the current framework versions. Each iteration should reduce the drift count:

Run 1: 28 drifts (93.9% accuracy)
Run 2: 12 drifts (97.9% accuracy)  ← fixed 16 predictions
Run 3:  3 drifts (99.5% accuracy)  ← fixed 9 predictions
Run 4:  0 drifts (100% accuracy)   ← matrix calibrated

Drift Report as CI Signal

In a CI workflow, the drift report can gate releases:

# Fail if any drifts detected
drift_count=$(jq '.results | map(select(.status == "DRIFT")) | length' extraction-results.json)
if [[ "$drift_count" -gt 0 ]]; then
    echo "FAIL: $drift_count drifts detected"
    exit 1
fi
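
The same gate can be sketched in Python for pipelines where jq is not available. It assumes the layout implied by the jq filter above: a top-level `results` array whose entries carry a `status` field. Only the `"DRIFT"` value comes from that filter; the `"MATCH"` and `"ERROR"` values in the sample, and the function names, are assumptions.

```python
import json


def count_drifts(report: dict) -> int:
    """Count entries in report["results"] whose status is "DRIFT"."""
    return sum(1 for r in report.get("results", []) if r.get("status") == "DRIFT")


def gate(report: dict) -> int:
    """Return a process exit code: 1 if any drift was found, else 0."""
    drift_count = count_drifts(report)
    if drift_count > 0:
        print(f"FAIL: {drift_count} drifts detected")
        return 1
    return 0


# Inline sample standing in for extraction-results.json.
sample = json.loads(
    '{"results": [{"status": "MATCH"}, {"status": "DRIFT"}, {"status": "ERROR"}]}'
)
exit_code = gate(sample)
```

In a CI step, the report would be loaded from extraction-results.json with `json.load` and the return value of `gate` passed to `sys.exit`.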

Future: automated drift detection

A future hemlock-lab enhancement could run make test on a schedule and open GitHub issues for new drifts.


Next Steps