Optimization Architecture

This page describes how the optimization components across both hemlock (Go) and hemlock-lab (Python) fit together to form the joint optimization pipeline.

System Overview

flowchart TB
    subgraph "hemlock-lab (Python)"
        BT["build_training_data.py<br/>Reports → Parquet"]
        RM["reward_model.py<br/>Train MLP"]
        RS["reward_server.py<br/>FastAPI :9090"]
        BO["bayesian_optimizer.py<br/>GP + EI search"]
        IT["injection_test.py<br/>Injection detection"]
        RT["retrieval_test.py<br/>Retrieval measurement"]
        PS["pareto_sweep.py<br/>Weight ablation"]
        VR["validation_runner.py<br/>A/B experiments"]
        SA["statistical_analysis.py<br/>Bootstrap CIs"]
        GF["generate_figures.py<br/>Publication plots"]
    end

    subgraph "hemlock (Go)"
        HC["hemlock batch<br/>Document generation"]
        OPT["Optimizers<br/>CEM / Genetic / Whitebox"]
        SI["score_injection.go<br/>Reward model HTTP client"]
    end

    subgraph "Infrastructure"
        CD["ChromaDB :8000"]
        OL["Ollama :11434"]
        FW["RAG Pipelines<br/>:8100–:8104"]
    end

    BT -->|training_data.parquet| RM
    RM -->|reward_model.pt| RS
    RS -.->|POST /predict-injection| SI
    SI --> OPT
    OPT --> HC

    BO -->|hemlock batch| HC
    HC -->|documents| CD
    CD --> FW
    FW --> IT
    FW --> RT

    PS -->|hemlock batch| HC
    VR -->|hemlock batch| HC

    IT -->|injection-results.json| SA
    RT -->|retrieval-results.json| SA
    PS -->|pareto-summary.json| SA
    VR -->|validation-summary.json| SA
    SA -->|statistics.json| GF

    OL --> HC
    OL --> FW

    style RS fill:#00695c,stroke:#00897b,color:#ffffff
    style OPT fill:#4a148c,stroke:#7c43bd,color:#ffffff
    style BO fill:#4a148c,stroke:#7c43bd,color:#ffffff

Data Flow

The optimization system has three main data flows:

1. Training Pipeline

reports/**/injection-results.json
reports/**/retrieval-results.json
        ↓
build_training_data.py → training_data.parquet
        ↓
reward_model.py (5-fold CV, class-weighted BCE)
        ↓
reward_model.pt (MLP weights + scaler)
        ↓
reward_server.py (FastAPI on :9090)

This pipeline runs once to produce the trained model and is re-run whenever new experiment data becomes available.
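Once the server is running, scoring a candidate is a single HTTP call. A minimal client sketch, assuming a `{"text": ...}` request body; the `score` response key matches the sequence diagram further down, but check `reward_server.py` for the actual schema:

```python
# Minimal reward-server client (request/response shape is an assumption).
import requests

REWARD_SERVER = "http://localhost:9090"

def predict_injection(text: str) -> float:
    """POST candidate text to the reward server and return its injection score."""
    resp = requests.post(
        f"{REWARD_SERVER}/predict-injection",
        json={"text": text},  # assumed field name; see reward_server.py
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["score"]  # matches the {score: 0.73} in the diagram below
```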

2. Optimization Loop

bayesian_optimizer.py
  ├─ Select parameters (GP + EI)
  │   │
  │   ▼
  ├─ hemlock batch --genetic --injection-weight W ...
  │   │
  │   ▼
  ├─ Ingest into ChromaDB collection
  │   │
  │   ▼
  ├─ injection_test.py → reward
  │   │
  │   ▼
  └─ Update GP with (params, reward) → next iteration

Each Bayesian optimizer evaluation is a full generate-ingest-test cycle. Note the two layers of scoring: hemlock queries the reward model during document generation (score_injection.go calls the reward server over HTTP when --injection-weight > 0), while the Bayesian optimizer uses end-to-end injection test results as its objective.
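The cycle above can be condensed into one objective function. A sketch under stated assumptions: `ingest_documents` and `run_injection_test` are hypothetical placeholders for the ingestion and `injection_test.py` steps, and only the `--injection-weight` flag mapping is shown:

```python
# One Bayesian-optimizer evaluation: generate, ingest, test, score.
# ingest_documents and run_injection_test are illustrative placeholders.
import subprocess

def evaluate(params: dict) -> float:
    # 1. Generate documents with hemlock, mapping sampled params to CLI flags.
    subprocess.run(
        ["hemlock", "batch", "--genetic",
         "--injection-weight", str(params["injection_weight"])],
        check=True,
    )
    # 2. Ingest the generated documents into a throwaway ChromaDB collection.
    collection = ingest_documents("bo-eval")
    # 3. Run end-to-end injection tests; the resulting success rate is the
    #    reward that the GP surrogate is fit to.
    return run_injection_test(collection)
```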

3. Evaluation Pipeline

validation_runner.py / pareto_sweep.py
        ↓
injection-results.json + retrieval-results.json
        ↓
statistical_analysis.py → statistics.json
        ↓
generate_figures.py → PDF + PNG figures
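The "Bootstrap CIs" step reduces each experiment's results to a point estimate with a confidence interval. A minimal percentile-bootstrap sketch of that idea, not `statistical_analysis.py`'s actual implementation:

```python
# Percentile-bootstrap confidence interval for a mean.
import numpy as np

def bootstrap_ci(samples, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # Resample with replacement, recording the mean of each replicate.
    means = np.array([
        rng.choice(samples, size=samples.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return samples.mean(), (lo, hi)
```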

Component Interaction

During Document Generation

When hemlock runs with --injection-weight > 0, the scoring function in each optimizer (CEM, Genetic, Whitebox) calls scoreInjection() for each candidate:

sequenceDiagram
    participant G as Genetic Optimizer
    participant E as Ollama Embeddings
    participant R as Reward Server

    loop Each generation
        loop Each candidate in population
            G->>E: Embed candidate text
            E-->>G: Embedding vector (768-dim)
            G->>G: similarity = cosine(embedding, query)
            G->>R: POST /predict-injection
            R-->>G: {score: 0.73}
            G->>G: fitness = (1-w_inj-w_nat)*sim + w_nat*nat + w_inj*inj
        end
        G->>G: Selection + crossover + mutation
    end
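The fitness line in the diagram blends three signals with two tunable weights. Rendered in Python for clarity (the real implementation lives in the Go optimizers):

```python
# Candidate fitness, mirroring the combination shown in the diagram:
#   fitness = (1 - w_inj - w_nat) * similarity + w_nat * naturalness + w_inj * injection
def fitness(similarity: float, naturalness: float, injection: float,
            w_nat: float, w_inj: float) -> float:
    w_sim = 1.0 - w_inj - w_nat  # remaining weight goes to retrieval similarity
    return w_sim * similarity + w_nat * naturalness + w_inj * injection
```

With `w_inj = 0` the injection term drops out entirely, which is why the reward server is only contacted when `--injection-weight > 0`.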

During Bayesian Optimization

The Bayesian optimizer treats the entire generate-ingest-test pipeline as a black-box objective function:

sequenceDiagram
    participant BO as Bayesian Optimizer
    participant H as hemlock batch
    participant C as ChromaDB
    participant IT as injection_test.py
    participant RT as retrieval_test.py

    loop 50–100 evaluations
        BO->>BO: GP surrogate → EI → next params
        BO->>H: subprocess: hemlock batch --<mapped flags>
        H->>H: Generate documents (may call reward server internally)
        BO->>C: Ingest generated documents
        BO->>IT: Run injection tests
        IT-->>BO: Per-framework injection results
        BO->>RT: Run retrieval tests
        RT-->>BO: Per-framework retrieval rank
        BO->>BO: reward = 0.3×retrieval + 0.7×injection
        BO->>C: Delete collection (cleanup)
        BO->>BO: Update GP with observation
    end
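The "GP surrogate → EI → next params" step is standard Bayesian optimization. A compact sketch using scikit-learn; the kernel choice and candidate generation here are assumptions, not necessarily what `bayesian_optimizer.py` does:

```python
# Expected-improvement acquisition over a Gaussian-process surrogate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def next_params(X, y, candidates, xi=0.01):
    """Fit a GP to observed (params, reward) pairs, then pick the candidate
    parameter vector with the highest expected improvement."""
    X, y, candidates = np.asarray(X), np.asarray(y), np.asarray(candidates)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    improvement = mu - y.max() - xi
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[np.argmax(ei)]
```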

Port Map

| Service | Port | Role |
|---|---|---|
| ChromaDB | 8000 | Vector store |
| LangChain | 8100 | RAG pipeline |
| LlamaIndex | 8101 | RAG pipeline |
| Unstructured | 8102 | RAG pipeline |
| Haystack | 8103 | RAG pipeline |
| ColPALI | 8104 | RAG pipeline |
| Reward Server | 9090 | Injection score prediction |
| Ollama | 11434 | LLM inference + embeddings |
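Before launching a run, it can be worth checking that everything in this table is actually listening. A small preflight sketch, assuming all services are on localhost:

```python
# Preflight: check each service from the port map is accepting connections.
import socket

SERVICES = {
    "ChromaDB": 8000,
    "LangChain": 8100,
    "LlamaIndex": 8101,
    "Unstructured": 8102,
    "Haystack": 8103,
    "ColPALI": 8104,
    "Reward Server": 9090,
    "Ollama": 11434,
}

for name, port in SERVICES.items():
    with socket.socket() as s:
        s.settimeout(1.0)
        status = "up" if s.connect_ex(("localhost", port)) == 0 else "DOWN"
    print(f"{name:14} :{port:<5} {status}")
```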

Dependencies Between Scripts

| Script | Requires Running | Requires Files |
|---|---|---|
| build_training_data.py | — | reports/**/injection-results.json |
| reward_model.py | — | training_data.parquet |
| reward_server.py | — | reward_model.pt |
| bayesian_optimizer.py | Docker stack, Ollama | Config JSON |
| pareto_sweep.py | Docker stack, Ollama, reward server | Config JSON |
| validation_runner.py | Docker stack, Ollama | Config JSON, optionally best-params.json |
| statistical_analysis.py | — | *-summary.json |
| generate_figures.py | — | statistics.json |

See Also