Optimization Architecture

This page describes how the optimization components across both hemlock (Go) and hemlock-lab (Python) fit together to form the joint optimization pipeline.

System Overview

flowchart TB
    subgraph "hemlock-lab (Python)"
        BT["build_training_data.py<br/>Reports → Parquet"]
        RM["reward_model.py<br/>Train MLP"]
        RS["reward_server.py<br/>FastAPI :9090"]
        BO["bayesian_optimizer.py<br/>GP + EI search"]
        IT["injection_test.py<br/>Injection detection"]
        RT["retrieval_test.py<br/>Retrieval measurement"]
        PS["pareto_sweep.py<br/>Weight ablation"]
        VR["validation_runner.py<br/>A/B experiments"]
        SA["statistical_analysis.py<br/>Bootstrap CIs"]
        GF["generate_figures.py<br/>Publication plots"]
    end

    subgraph "hemlock (Go)"
        HC["hemlock batch<br/>Document generation"]
        OPT["Optimizers<br/>CEM / Genetic / Whitebox"]
        SI["score_injection.go<br/>Reward model HTTP client"]
    end

    subgraph "Infrastructure"
        CD["ChromaDB :8000"]
        OL["Ollama :11434"]
        FW["RAG Pipelines<br/>:8100–:8104"]
    end

    BT -->|training_data.parquet| RM
    RM -->|reward_model.pt| RS
    RS -.->|POST /predict-injection| SI
    SI --> OPT
    OPT --> HC

    BO -->|hemlock batch| HC
    HC -->|documents| CD
    CD --> FW
    FW --> IT
    FW --> RT

    PS -->|hemlock batch| HC
    VR -->|hemlock batch| HC

    IT -->|injection-results.json| SA
    RT -->|retrieval-results.json| SA
    PS -->|pareto-summary.json| SA
    VR -->|validation-summary.json| SA
    SA -->|statistics.json| GF

    OL --> HC
    OL --> FW

    style RS fill:#00695c,stroke:#00897b,color:#ffffff
    style OPT fill:#4a148c,stroke:#7c43bd,color:#ffffff
    style BO fill:#4a148c,stroke:#7c43bd,color:#ffffff

Data Flow

The optimization system has three main data flows:

1. Training Pipeline

reports/**/injection-results.json
reports/**/retrieval-results.json
        ↓
build_training_data.py → training_data.parquet
        ↓
reward_model.py (5-fold CV, class-weighted BCE)
        ↓
reward_model.pt (MLP weights + scaler)
        ↓
reward_server.py (FastAPI on :9090)

This pipeline runs once to produce the trained model and is re-run whenever new experiment data becomes available.
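Once the server is running, scoring a candidate is a single HTTP call. A minimal client sketch, assuming a `{"text": ...}` request body; the `score` response key matches the sequence diagram further down, but check `reward_server.py` for the actual schema:

```python
# Minimal reward-server client (request/response shape is an assumption).
import requests

REWARD_SERVER = "http://localhost:9090"

def predict_injection(text: str) -> float:
    """POST candidate text to the reward server and return its injection score."""
    resp = requests.post(
        f"{REWARD_SERVER}/predict-injection",
        json={"text": text},  # assumed field name; see reward_server.py
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["score"]  # matches the {score: 0.73} in the diagram below
```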

2. Optimization Loop

bayesian_optimizer.py
  ├─ Select parameters (GP + EI)
  │   │
  │   ▼
  ├─ hemlock batch --genetic --injection-weight W ...
  │   │
  │   ▼
  ├─ Ingest into ChromaDB collection
  │   │
  │   ▼
  ├─ injection_test.py → reward
  │   │
  │   ▼
  └─ Update GP with (params, reward) → next iteration

Each Bayesian optimizer evaluation is a full generate-ingest-test cycle. Note the two layers of scoring: hemlock queries the reward model during document generation (score_injection.go calls the reward server over HTTP when --injection-weight > 0), while the Bayesian optimizer uses end-to-end injection test results as its objective.
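The cycle above can be condensed into one objective function. A sketch under stated assumptions: `ingest_documents` and `run_injection_test` are hypothetical placeholders for the ingestion and `injection_test.py` steps, and only the `--injection-weight` flag mapping is shown:

```python
# One Bayesian-optimizer evaluation: generate, ingest, test, score.
# ingest_documents and run_injection_test are illustrative placeholders.
import subprocess

def evaluate(params: dict) -> float:
    # 1. Generate documents with hemlock, mapping sampled params to CLI flags.
    subprocess.run(
        ["hemlock", "batch", "--genetic",
         "--injection-weight", str(params["injection_weight"])],
        check=True,
    )
    # 2. Ingest the generated documents into a throwaway ChromaDB collection.
    collection = ingest_documents("bo-eval")
    # 3. Run end-to-end injection tests; the resulting success rate is the
    #    reward that the GP surrogate is fit to.
    return run_injection_test(collection)
```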

3. Evaluation Pipeline

validation_runner.py / pareto_sweep.py
        ↓
injection-results.json + retrieval-results.json
        ↓
statistical_analysis.py → statistics.json
        ↓
generate_figures.py → PDF + PNG figures
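The "Bootstrap CIs" step reduces each experiment's results to a point estimate with a confidence interval. A minimal percentile-bootstrap sketch of that idea, not `statistical_analysis.py`'s actual implementation:

```python
# Percentile-bootstrap confidence interval for a mean.
import numpy as np

def bootstrap_ci(samples, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # Resample with replacement, recording the mean of each replicate.
    means = np.array([
        rng.choice(samples, size=samples.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return samples.mean(), (lo, hi)
```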

Component Interaction

During Document Generation

When hemlock runs with --injection-weight > 0, the scoring function in each optimizer (CEM, Genetic, Whitebox) calls scoreInjection() for each candidate:

sequenceDiagram
    participant G as Genetic Optimizer
    participant E as Ollama Embeddings
    participant R as Reward Server

    loop Each generation
        loop Each candidate in population
            G->>E: Embed candidate text
            E-->>G: Embedding vector (768-dim)
            G->>G: similarity = cosine(embedding, query)
            G->>R: POST /predict-injection
            R-->>G: {score: 0.73}
            G->>G: fitness = (1-w_inj-w_nat)*sim + w_nat*nat + w_inj*inj
        end
        G->>G: Selection + crossover + mutation
    end
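The fitness line in the diagram blends three signals with two tunable weights. Rendered in Python for clarity (the real implementation lives in the Go optimizers):

```python
# Candidate fitness, mirroring the combination shown in the diagram:
#   fitness = (1 - w_inj - w_nat) * similarity + w_nat * naturalness + w_inj * injection
def fitness(similarity: float, naturalness: float, injection: float,
            w_nat: float, w_inj: float) -> float:
    w_sim = 1.0 - w_inj - w_nat  # remaining weight goes to retrieval similarity
    return w_sim * similarity + w_nat * naturalness + w_inj * injection
```

With `w_inj = 0` the injection term drops out entirely, which is why the reward server is only contacted when `--injection-weight > 0`.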

During Bayesian Optimization

The Bayesian optimizer treats the entire generate-ingest-test pipeline as a black-box objective function:

sequenceDiagram
    participant BO as Bayesian Optimizer
    participant H as hemlock batch
    participant C as ChromaDB
    participant IT as injection_test.py
    participant RT as retrieval_test.py

    loop 50–100 evaluations
        BO->>BO: GP surrogate → EI → next params
        BO->>H: subprocess: hemlock batch --<mapped flags>
        H->>H: Generate documents (may call reward server internally)
        BO->>C: Ingest generated documents
        BO->>IT: Run injection tests
        IT-->>BO: Per-framework injection results
        BO->>RT: Run retrieval tests
        RT-->>BO: Per-framework retrieval rank
        BO->>BO: reward = 0.3×retrieval + 0.7×injection
        BO->>C: Delete collection (cleanup)
        BO->>BO: Update GP with observation
    end
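The "GP surrogate → EI → next params" step is standard Bayesian optimization. A compact sketch using scikit-learn; the kernel choice and candidate generation here are assumptions, not necessarily what `bayesian_optimizer.py` does:

```python
# Expected-improvement acquisition over a Gaussian-process surrogate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def next_params(X, y, candidates, xi=0.01):
    """Fit a GP to observed (params, reward) pairs, then pick the candidate
    parameter vector with the highest expected improvement."""
    X, y, candidates = np.asarray(X), np.asarray(y), np.asarray(candidates)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    improvement = mu - y.max() - xi
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[np.argmax(ei)]
```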

Port Map

| Service | Port | Role |
|---|---|---|
| ChromaDB | 8000 | Vector store |
| LangChain | 8100 | RAG pipeline |
| LlamaIndex | 8101 | RAG pipeline |
| Unstructured | 8102 | RAG pipeline |
| Haystack | 8103 | RAG pipeline |
| ColPALI | 8104 | RAG pipeline |
| Reward Server | 9090 | Injection score prediction |
| Ollama | 11434 | LLM inference + embeddings |
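Before launching a run, it can be worth checking that everything in this table is actually listening. A small preflight sketch, assuming all services are on localhost:

```python
# Preflight: check each service from the port map is accepting connections.
import socket

SERVICES = {
    "ChromaDB": 8000,
    "LangChain": 8100,
    "LlamaIndex": 8101,
    "Unstructured": 8102,
    "Haystack": 8103,
    "ColPALI": 8104,
    "Reward Server": 9090,
    "Ollama": 11434,
}

for name, port in SERVICES.items():
    with socket.socket() as s:
        s.settimeout(1.0)
        status = "up" if s.connect_ex(("localhost", port)) == 0 else "DOWN"
    print(f"{name:14} :{port:<5} {status}")
```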

Dependencies Between Scripts

| Script | Requires Running | Requires Files |
|---|---|---|
| build_training_data.py | — | reports/**/injection-results.json |
| reward_model.py | — | training_data.parquet |
| reward_server.py | — | reward_model.pt |
| bayesian_optimizer.py | Docker stack, Ollama | Config JSON |
| pareto_sweep.py | Docker stack, Ollama, reward server | Config JSON |
| validation_runner.py | Docker stack, Ollama | Config JSON, optionally best-params.json |
| statistical_analysis.py | — | *-summary.json |
| generate_figures.py | — | statistics.json |

See Also