# Query Endpoint
Runs a full RAG pipeline: embed the query, retrieve relevant documents from ChromaDB, build a prompt with the context, and generate an answer via Ollama.
## Request

| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | ✓ | The question to answer |
| `collection` | string | ✓ | ChromaDB collection to search |
## Response

### Success (200)

```json
{
  "answer": "The refund policy allows returns within 30 days...",
  "sources": [
    "refund-policy.html",
    "faq.html"
  ],
  "retrieved_count": 5
}
```
| Field | Type | Description |
|---|---|---|
| `answer` | string | LLM-generated response |
| `sources` | array[string] | Document IDs of retrieved context |
| `retrieved_count` | integer | Number of documents retrieved (top-k) |
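Client code can validate this shape before using it; a minimal sketch (the field names come from the table above, the function itself is illustrative):

```python
from typing import Any


def parse_query_response(payload: dict[str, Any]) -> tuple[str, list[str], int]:
    """Validate and unpack a /query response into (answer, sources, retrieved_count)."""
    answer = payload["answer"]
    sources = payload["sources"]
    count = payload["retrieved_count"]
    if not isinstance(answer, str):
        raise TypeError("answer must be a string")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        raise TypeError("sources must be a list of strings")
    if not isinstance(count, int):
        raise TypeError("retrieved_count must be an integer")
    return answer, sources, count
```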
### Error (400)

Returned when the request body is malformed or a required field (`query` or `collection`) is missing.

### Error (503)

Returned when a backing service (Ollama or ChromaDB) is unreachable.
## RAG Pipeline Flow

```mermaid
graph LR
    Q["Query text"] --> E["Embed query<br/>(Ollama)"]
    E --> R["Vector search<br/>(ChromaDB)"]
    R --> K["Top-k docs"]
    K --> P["Build prompt"]
    P --> G["Generate answer<br/>(Ollama LLM)"]
    G --> A["Response"]
```
- Embed query — Convert query text to a vector using `nomic-embed-text`
- Vector search — Find the closest document chunks in the specified collection
- Top-k retrieval — Return the k most relevant chunks (default k=3)
- Build prompt — Construct a prompt with the retrieved context + user query
- Generate — Send prompt to Ollama for text generation
- Return — Answer + source document IDs
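The vector-search, top-k, and prompt-building steps above can be sketched in a few lines of Python. This is illustrative only: in the real pipeline the embeddings come from `nomic-embed-text` via Ollama, the search runs inside ChromaDB, and the prompt template here is a hypothetical stand-in.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the IDs of the k document vectors closest to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True)
    return ranked[:k]


def build_prompt(context: list[str], query: str) -> str:
    """Assemble retrieved chunks plus the user query (hypothetical template)."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {query}"
```

For example, with `docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}`, `top_k([1.0, 0.1], docs, k=2)` returns `["a", "c"]`.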
## Examples

### Basic Query

```bash
curl -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}'
```
### Query with jq Formatting

```bash
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
  | jq '{answer: .answer, sources: .sources}'
```
### Compare Across Frameworks

```bash
for port in 8100 8101 8102 8103; do
  echo "=== Port ${port} ==="
  curl -s -X POST "http://localhost:${port}/query" \
    -H "Content-Type: application/json" \
    -d '{"query": "What is the refund policy?", "collection": "noise-corpus"}' \
    | jq -r '.answer' | head -3
  echo
done
```
## Injection Testing via Query

The `/query` endpoint is the entry point for Layer 3 (injection) tests. When poisoned documents have been ingested:
- The vector search may retrieve the poisoned document as context
- The prompt includes the poisoned content alongside the user's query
- The LLM may follow the injected instructions
```bash
# Query after ingesting poisoned documents
curl -s -X POST http://localhost:8100/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the refund policy?", "collection": "test-poisoned"}' \
  | jq -r '.answer'
# If injection succeeds, the answer contains override/redirect/denial content
```
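A crude way to flag a successful injection is to scan the answer for override/redirect/denial content. A minimal sketch — the marker phrases below are hypothetical examples; a real test suite would check for markers planted in its own poisoned documents:

```python
# Hypothetical marker phrases; substitute the markers actually planted
# in the poisoned corpus.
INJECTION_MARKERS = [
    "ignore previous instructions",
    "visit http://",
    "i cannot answer",
]


def looks_injected(answer: str, markers: list[str] = INJECTION_MARKERS) -> bool:
    """Return True if the answer echoes any known injection marker."""
    lowered = answer.lower()
    return any(marker in lowered for marker in markers)
```

String matching like this is a weak detector — it misses paraphrased injections — but it is cheap enough to run on every automated test.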
## Response Time

Expected response times with `smollm2:135m`:

| Component | Time |
|---|---|
| Query embedding | ~50 ms |
| ChromaDB search | ~20 ms |
| LLM generation | ~500 ms–2 s |
| Total | ~1–3 s |
Larger models will increase the LLM generation time significantly.
## Next Steps

- Health — Health check endpoint
- Injection Tests — Automated testing of query injection