Query Endpoint

Runs a full RAG pipeline: embed the query, retrieve relevant documents from ChromaDB, build a prompt with the context, and generate an answer via Ollama.


Request

```http
POST /query
Content-Type: application/json

{
  "query": "What is the refund policy?",
  "collection": "test-collection"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | The question to answer |
| `collection` | string | Yes | ChromaDB collection to search |

Response

Success (200)

```json
{
  "answer": "The refund policy allows returns within 30 days...",
  "sources": [
    "refund-policy.html",
    "faq.html"
  ],
  "retrieved_count": 5
}
```
| Field | Type | Description |
|---|---|---|
| `answer` | string | LLM-generated response |
| `sources` | array[string] | Document IDs of retrieved context |
| `retrieved_count` | integer | Number of documents retrieved (top-k) |

Error (400)

```json
{
  "error": "Missing required field: query"
}
```
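The 400 can be reproduced by omitting the `query` field from the request body:

```shell
# Send a request without the required "query" field and extract the error message
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"collection": "test-collection"}' \
| jq -r '.error'
# Per the error format above, this should print: Missing required field: query
```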

Error (503)

```json
{
  "error": "Ollama unavailable",
  "detail": "Connection refused on port 11434"
}
```
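Ollama's availability can be checked directly before issuing queries; `/api/tags` is Ollama's model-listing endpoint and returns 200 when the server is up (11434 is Ollama's default port):

```shell
# Probe Ollama before querying; -f makes curl exit nonzero on HTTP errors
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is up"
else
  echo "Ollama unavailable -- expect 503 from /query"
fi
```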

RAG Pipeline Flow

```mermaid
graph LR
    Q["Query text"] --> E["Embed query<br/>(Ollama)"]
    E --> R["Vector search<br/>(ChromaDB)"]
    R --> K["Top-k docs"]
    K --> P["Build prompt"]
    P --> G["Generate answer<br/>(Ollama LLM)"]
    G --> A["Response"]
```
  1. Embed query — Convert query text to a vector using nomic-embed-text
  2. Vector search — Find the closest document chunks in the specified collection
  3. Top-k retrieval — Return the k most relevant chunks (default k=3)
  4. Build prompt — Construct a prompt with the retrieved context + user query
  5. Generate — Send prompt to Ollama for text generation
  6. Return — Answer + source document IDs
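The embedding and generation steps can be exercised against Ollama's HTTP API directly (`/api/embeddings` and `/api/generate` are Ollama's standard routes; the static context string stands in for the ChromaDB retrieval step, which the service performs internally):

```shell
# Step 1: embed the query text with nomic-embed-text
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "What is the refund policy?"}' \
| jq '.embedding | length'   # prints the dimensionality of the query vector

# Steps 4-5: build a prompt from context and generate an answer
# "(retrieved chunks here)" is a placeholder for the top-k documents
curl -s http://localhost:11434/api/generate \
  -d '{"model": "smollm2:135m", "prompt": "Context: (retrieved chunks here)\n\nQuestion: What is the refund policy?", "stream": false}' \
| jq -r '.response'
```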

Examples

Basic Query

```shell
curl -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}'
```

Query with jq Formatting

```shell
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
| jq '{answer: .answer, sources: .sources}'
```

Compare Across Frameworks

```shell
for port in 8100 8101 8102 8103; do
  echo "=== Port ${port} ==="
  curl -s -X POST "http://localhost:${port}/query" \
    -H "Content-Type: application/json" \
    -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
  | jq -r '.answer' | head -3
  echo
done
```

Injection Testing via Query

The /query endpoint is the entry point for Layer 3 (injection) tests. When poisoned documents have been ingested:

  1. The vector search may retrieve the poisoned document as context
  2. The prompt includes the poisoned content alongside the user's query
  3. The LLM may follow the injected instructions
```shell
# Query after ingesting poisoned documents
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "test-poisoned"}' \
| jq -r '.answer'
# If injection succeeds, the answer contains override/redirect/denial content
```
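To make the pass/fail check scriptable, the answer can be scanned for a marker phrase that only the injected instructions would produce (the marker string below is a placeholder assumption, not from the corpus):

```shell
# Flag likely injection success by scanning the answer for a canary phrase.
# "INJECTED-MARKER" is a placeholder -- use whatever string your poisoned docs
# instruct the model to emit.
ANSWER=$(curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "test-poisoned"}' \
| jq -r '.answer')

if echo "$ANSWER" | grep -qi "INJECTED-MARKER"; then
  echo "injection succeeded"
else
  echo "no marker found"
fi
```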

Response Time

Expected response times with smollm2:135m:

| Component | Time |
|---|---|
| Query embedding | ~50ms |
| ChromaDB search | ~20ms |
| LLM generation | ~500ms-2s |
| Total | ~1-3 seconds |

Larger models will increase the LLM generation time significantly.
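End-to-end latency can be measured with curl's `-w` write-out option (`time_total` is a standard curl variable):

```shell
# Measure total request latency; -o /dev/null discards the response body
curl -s -o /dev/null -w 'total: %{time_total}s\n' \
  -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}'
```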


Next Steps