Services

hemlock-lab runs 6 Docker containers (ChromaDB plus the 5 RAG pipelines) on a hemlock-net bridge network, with Ollama running on the host. This page documents each service's configuration and management.


ChromaDB

The vector database backing all 5 RAG pipelines.

| Property | Value |
| --- | --- |
| Image | chromadb/chroma:0.6.3 |
| Port | 8000 |
| Volume | chromadb-data |
| Network | hemlock-net |
| Healthcheck | curl -f http://localhost:8000/api/v2/heartbeat |

Configuration

ChromaDB runs as an official Docker image with persistent storage via a named volume:

# docker/docker-compose.yml (excerpt)
chromadb:
  image: chromadb/chroma:0.6.3
  ports:
    - "8000:8000"
  volumes:
    - chromadb-data:/chroma/chroma
  networks:
    - hemlock-net
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat"]
    interval: 10s
    timeout: 5s
    retries: 5

Management

# Check status
docker compose ps chromadb

# View logs
docker compose logs -f chromadb

# Health check
curl http://localhost:8000/api/v2/heartbeat
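The same readiness policy the compose healthcheck enforces (probe, wait, retry up to 5 times) can be scripted from the host. A minimal sketch; the `wait_for` and `chromadb_alive` helpers are illustrative, not part of the harness:

```python
import time
import urllib.error
import urllib.request

def wait_for(probe, retries=5, interval=10.0):
    """Retry a probe callable, mirroring the compose healthcheck policy."""
    for _ in range(retries):
        if probe():
            return True
        time.sleep(interval)
    return False

def chromadb_alive(url="http://localhost:8000/api/v2/heartbeat"):
    """Return True if the ChromaDB heartbeat endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Usage: `wait_for(chromadb_alive)` blocks until ChromaDB is healthy or the retries are exhausted, matching the container's 10s interval / 5 retries policy.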

Collections

After seeding, ChromaDB contains:

| Collection | Documents | Purpose |
| --- | --- | --- |
| noise-corpus | 11 | Legitimate business documents |

Test runs create additional temporary collections for retrieval and injection testing.


Ollama

LLM inference server running on the host (not in Docker), providing embeddings and text generation.

| Property | Value |
| --- | --- |
| Port | 11434 |
| Bind address | localhost (default) |
| LLM model | smollm2:135m |
| Embedding model | nomic-embed-text |

Containers reach Ollama via the extra_hosts mapping:

extra_hosts:
  - "host.docker.internal:host-gateway"

Each pipeline's OLLAMA_HOST is set to http://host.docker.internal:11434.

Models

# Pull required models
ollama pull smollm2:135m       # LLM inference
ollama pull nomic-embed-text   # Embedding model

# List available models
curl http://localhost:11434/api/tags | jq '.models[].name'

# Test embedding generation
curl http://localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":"test"}'
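The same embedding call can be made from Python with only the standard library. A sketch, assuming Ollama's /api/embed endpoint and its `embeddings` response field; the helper names are illustrative:

```python
import json
import urllib.request

OLLAMA_HOST = "http://host.docker.internal:11434"  # as set in each pipeline container

def embed_request(text, model="nomic-embed-text", host=OLLAMA_HOST):
    """Build the POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode()
    return urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(text, **kwargs):
    """Send the request and return the embedding vectors."""
    with urllib.request.urlopen(embed_request(text, **kwargs)) as resp:
        return json.load(resp)["embeddings"]
```

From the host, pass `host="http://localhost:11434"` since Ollama binds to localhost there.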

Pipeline Containers

All 5 RAG pipeline containers follow the same pattern:

  • Dockerfile in docker/<framework>-rag/
  • FastAPI application with 4 uniform endpoints
  • depends_on ChromaDB healthcheck
  • restart: unless-stopped for automatic recovery
  • Environment variables for Ollama host, model, and ChromaDB connection
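Inside a container, those environment variables (listed in full under Container Environment below) are read at startup. A minimal sketch with the compose defaults; the `load_settings` helper is illustrative, not actual pipeline code:

```python
import os

def load_settings(env=os.environ):
    """Collect the shared pipeline settings, falling back to the compose defaults."""
    return {
        "chromadb_host": env.get("CHROMADB_HOST", "chromadb"),
        "chromadb_port": int(env.get("CHROMADB_PORT", "8000")),
        "ollama_host": env.get("OLLAMA_HOST", "http://host.docker.internal:11434"),
        "ollama_model": env.get("OLLAMA_MODEL", "smollm2:135m"),
        "ollama_embed_model": env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        "system_prompt_file": env.get("SYSTEM_PROMPT_FILE", "") or None,
    }
```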

LangChain Pipeline

| Property | Value |
| --- | --- |
| Version | 0.3.35 |
| Port | 8100 |
| Dockerfile | docker/langchain-rag/Dockerfile |

Extraction approach: Maps file extensions to loaders:

| Extension | Loader |
| --- | --- |
| .html | BSHTMLLoader |
| .pdf | PyPDFLoader |
| .docx | Docx2txtLoader |
| .md | UnstructuredMarkdownLoader |
| .txt | Plain open() read |

RAG chain: RetrievalQA with ChromaDB retriever + Ollama LLM.
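The extension-to-loader dispatch above can be sketched as a plain mapping; the `pick_loader` helper is illustrative, but the loader names are the ones listed in the table:

```python
from pathlib import Path

# Loader class names from the table; plain .txt files are read directly with open().
LOADERS = {
    ".html": "BSHTMLLoader",
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".md": "UnstructuredMarkdownLoader",
}

def pick_loader(filename):
    """Return the loader name for a file, or None for a plain-text read."""
    return LOADERS.get(Path(filename).suffix.lower())
```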


LlamaIndex Pipeline

| Property | Value |
| --- | --- |
| Version | 0.12.33 |
| Port | 8101 |
| Dockerfile | docker/llamaindex-rag/Dockerfile |

Extraction approach: Writes uploaded file to a temp directory, then uses SimpleDirectoryReader which auto-detects format.

RAG chain: VectorStoreIndex with ChromaDB vector store + Ollama via Settings.
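The temp-directory handoff can be sketched with the standard library; the `save_upload` helper is illustrative, and the SimpleDirectoryReader call is shown as a comment since it needs the running stack:

```python
import tempfile
from pathlib import Path

def save_upload(filename, data: bytes) -> Path:
    """Write an uploaded file into a fresh temp directory for SimpleDirectoryReader."""
    tmpdir = Path(tempfile.mkdtemp())
    path = tmpdir / Path(filename).name
    path.write_bytes(data)
    # LlamaIndex then ingests the directory, auto-detecting the format:
    # docs = SimpleDirectoryReader(str(tmpdir)).load_data()
    return tmpdir
```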


Unstructured Pipeline

| Property | Value |
| --- | --- |
| Version | 0.17.2 |
| Port | 8102 |
| Dockerfile | docker/unstructured-rag/Dockerfile |

Extraction approach: Uses partition() which returns typed Element objects. Text content is joined from all elements.

RAG chain: Manual embedding + ChromaDB search + Ollama prompt (no wrapper chain).
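With no wrapper chain, the prompt is assembled by hand from the retrieved chunks before being sent to Ollama. A minimal sketch; `build_prompt` and its template are illustrative:

```python
def build_prompt(question, chunks):
    """Join retrieved chunks into a grounded prompt for the Ollama LLM."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```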


Haystack Pipeline

| Property | Value |
| --- | --- |
| Version | 2.12.1 |
| Port | 8103 |
| Dockerfile | docker/haystack-rag/Dockerfile |

Extraction approach: Maps file extensions to converters:

| Extension | Converter |
| --- | --- |
| .html | HTMLToDocument |
| .pdf | PyPDFToDocument |
| .docx | DocxToDocument |
| .md | MarkdownToDocument |
| .txt | TextFileToDocument |

RAG chain: Document objects + ChromaDB retriever + Ollama generator.


ColPALI Pipeline

| Property | Value |
| --- | --- |
| Port | 8104 |
| Dockerfile | docker/colpali-rag/Dockerfile |

Extraction approach: Vision-based multimodal extraction — renders document pages as images and uses ColPALI embeddings rather than text-based parsing.

RAG chain: ColPALI embeddings + ChromaDB retriever + Ollama generator.


Container Environment

All pipeline containers receive the same environment variables:

environment:
  CHROMADB_HOST: chromadb
  CHROMADB_PORT: "8000"
  OLLAMA_HOST: http://host.docker.internal:11434
  OLLAMA_MODEL: ${OLLAMA_MODEL:-smollm2:135m}
  OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-nomic-embed-text}
  SYSTEM_PROMPT_FILE: ${SYSTEM_PROMPT_FILE:-}

Managing All Pipelines

# Status of all containers
docker compose ps

# View logs for a specific pipeline
docker compose logs -f langchain-rag

# Restart a single pipeline
docker compose restart haystack-rag

# Restart all pipelines
docker compose restart langchain-rag llamaindex-rag unstructured-rag haystack-rag colpali-rag

Harness Container

The test harness runs as a separate container with the test profile:

| Property | Value |
| --- | --- |
| Dockerfile | docker/harness/Dockerfile |
| Network mode | host (accesses all services via localhost) |
| Reports | Mounted at ../reports:/app/reports |

# Run full test suite
docker compose --profile test run --rm harness

Next Steps