Query Endpoint

Runs a full RAG pipeline: embed the query, retrieve relevant documents from ChromaDB, build a prompt with the context, and generate an answer via Ollama.


Request

```http
POST /query
Content-Type: application/json

{
  "query": "What is the refund policy?",
  "collection": "test-collection"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | The question to answer |
| `collection` | string | Yes | ChromaDB collection to search |

Response

Success (200)

```json
{
  "answer": "The refund policy allows returns within 30 days...",
  "sources": [
    "refund-policy.html",
    "faq.html"
  ],
  "retrieved_count": 5
}
```
| Field | Type | Description |
|---|---|---|
| `answer` | string | LLM-generated response |
| `sources` | array[string] | Document IDs of retrieved context |
| `retrieved_count` | integer | Number of documents retrieved (top-k) |

Error (400)

```json
{
  "error": "Missing required field: query"
}
```
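The 400 can be reproduced by omitting the `query` field from the request body:

```shell
# Send a request without the required "query" field and extract the error message
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"collection": "test-collection"}' \
| jq -r '.error'
# Per the error format above, this should print: Missing required field: query
```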

Error (503)

```json
{
  "error": "Ollama unavailable",
  "detail": "Connection refused on port 11434"
}
```
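Ollama's availability can be checked directly before issuing queries; `/api/tags` is Ollama's model-listing endpoint and returns 200 when the server is up (11434 is Ollama's default port):

```shell
# Probe Ollama before querying; -f makes curl exit nonzero on HTTP errors
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is up"
else
  echo "Ollama unavailable -- expect 503 from /query"
fi
```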

RAG Pipeline Flow

```mermaid
graph LR
    Q["Query text"] --> E["Embed query<br/>(Ollama)"]
    E --> R["Vector search<br/>(ChromaDB)"]
    R --> K["Top-k docs"]
    K --> P["Build prompt"]
    P --> G["Generate answer<br/>(Ollama LLM)"]
    G --> A["Response"]
```
  1. Embed query — Convert query text to a vector using nomic-embed-text
  2. Vector search — Find the closest document chunks in the specified collection
  3. Top-k retrieval — Return the k most relevant chunks (default k=3)
  4. Build prompt — Construct a prompt with the retrieved context + user query
  5. Generate — Send prompt to Ollama for text generation
  6. Return — Answer + source document IDs
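The embedding and generation steps can be exercised against Ollama's HTTP API directly (`/api/embeddings` and `/api/generate` are Ollama's standard routes; the static context string stands in for the ChromaDB retrieval step, which the service performs internally):

```shell
# Step 1: embed the query text with nomic-embed-text
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "What is the refund policy?"}' \
| jq '.embedding | length'   # prints the dimensionality of the query vector

# Steps 4-5: build a prompt from context and generate an answer
# "(retrieved chunks here)" is a placeholder for the top-k documents
curl -s http://localhost:11434/api/generate \
  -d '{"model": "smollm2:135m", "prompt": "Context: (retrieved chunks here)\n\nQuestion: What is the refund policy?", "stream": false}' \
| jq -r '.response'
```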

Examples

Basic Query

```shell
curl -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}'
```

Query with jq Formatting

```shell
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
| jq '{answer: .answer, sources: .sources}'
```

Compare Across Frameworks

```shell
for port in 8100 8101 8102 8103; do
  echo "=== Port ${port} ==="
  curl -s -X POST "http://localhost:${port}/query" \
    -H "Content-Type: application/json" \
    -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
  | jq -r '.answer' | head -3
  echo
done
```

Injection Testing via Query

The /query endpoint is the entry point for Layer 3 (injection) tests. When poisoned documents have been ingested:

  1. The vector search may retrieve the poisoned document as context
  2. The prompt includes the poisoned content alongside the user's query
  3. The LLM may follow the injected instructions
```shell
# Query after ingesting poisoned documents
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "test-poisoned"}' \
| jq -r '.answer'
# If injection succeeds, the answer contains override/redirect/denial content
```
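To make the pass/fail check scriptable, the answer can be scanned for a marker phrase that only the injected instructions would produce (the marker string below is a placeholder assumption, not from the corpus):

```shell
# Flag likely injection success by scanning the answer for a canary phrase.
# "INJECTED-MARKER" is a placeholder -- use whatever string your poisoned docs
# instruct the model to emit.
ANSWER=$(curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "test-poisoned"}' \
| jq -r '.answer')

if echo "$ANSWER" | grep -qi "INJECTED-MARKER"; then
  echo "injection succeeded"
else
  echo "no marker found"
fi
```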

Response Time

Expected response times with smollm2:135m:

| Component | Time |
|---|---|
| Query embedding | ~50ms |
| ChromaDB search | ~20ms |
| LLM generation | ~500ms-2s |
| Total | ~1-3 seconds |

Larger models will increase the LLM generation time significantly.
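End-to-end latency can be measured with curl's `-w` write-out option (`time_total` is a standard curl variable):

```shell
# Measure total request latency; -o /dev/null discards the response body
curl -s -o /dev/null -w 'total: %{time_total}s\n' \
  -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}'
```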


Next Steps