RAG Example
Trace a retrieval-augmented generation pipeline.
Overview
This example demonstrates a RAG pipeline with tracing. A simulated document store returns relevant context, which is passed to OpenAI for grounded generation. Each traced stage (retrieval, reranking, generation) is captured as a step in a single trace: @trace_tool marks the individual steps and @trace wraps the pipeline function that calls them. Context assembly happens inline between reranking and generation.
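To make the decorator mechanics more concrete, here is a minimal sketch of how a decorator could record each call as a timed step. It is not TraceLLM's implementation: `record_step` and `STEPS` are hypothetical names used only for illustration, and the two stub functions stand in for the real pipeline stages.

```python
import functools
import time

# Hypothetical in-memory step log (an assumption; not how TraceLLM stores steps).
STEPS = []


def record_step(name):
    """Hypothetical step decorator: times a call and logs its status."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "OK"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise
            finally:
                # Runs on both success and failure, so every call is logged.
                duration_ms = (time.perf_counter() - start) * 1000
                STEPS.append(
                    {"tool": name, "duration_ms": duration_ms, "status": status}
                )
        return wrapper
    return decorator


@record_step("retrieve")
def retrieve(query):
    # Stub retrieval that always returns one document.
    return [{"topic": "tracing", "score": 0.95}]


@record_step("rerank")
def rerank(documents):
    # Keep the two highest-scoring documents.
    return sorted(documents, key=lambda d: d["score"], reverse=True)[:2]


docs = rerank(retrieve("How does tracing help?"))
for index, step in enumerate(STEPS, start=1):
    print(index, step["tool"], step["status"])
```

Steps are appended in the order the calls finish, which is why the inner `retrieve` call is logged before `rerank`.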
Code
rag_example.py
python
import os
import sys

# Add this file's parent directory to the import path.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from openai import OpenAI
from tracellm import trace, trace_tool
from tracellm.integrations.openai import wrap_openai

# Wrap the OpenAI client with the TraceLLM integration.
client = OpenAI()
client = wrap_openai(client)

# Simulated document store: topic -> passage.
KNOWLEDGE_BASE = {
    "observability": (
        "Observability in LLM systems provides visibility into prompts, completions, "
        "token usage, latency, error rates, and cost across the entire pipeline."
    ),
    "tracing": (
        "Tracing captures the full execution graph of LLM applications including "
        "tool calls, retrieval steps, intermediate reasoning, and external API calls."
    ),
    "monitoring": (
        "Production LLM monitoring tracks token throughput, error rates, latency "
        "percentiles (p50/p95/p99), and cost per request across models and providers."
    ),
    "rag": (
        "Retrieval-Augmented Generation combines vector search with LLM generation "
        "to produce grounded, factually accurate responses with source attribution."
    ),
}


@trace_tool(name="retrieve")
def retrieve(query: str) -> list[dict]:
    # Simulated retrieval: substring matching between the query and topic names.
    # The per-word check is also a substring test, so a short word such as "in"
    # matches topics like "tracing" and "monitoring".
    query_lower = query.lower()
    results = []
    for topic, content in KNOWLEDGE_BASE.items():
        if topic in query_lower or any(word in topic for word in query_lower.split()):
            results.append({"topic": topic, "content": content, "score": 0.95})
    # Fall back to a generic passage when nothing matches.
    if not results:
        results.append({
            "topic": "general",
            "content": "LLM systems benefit from comprehensive observability practices.",
            "score": 0.50,
        })
    return results


@trace_tool(name="rerank")
def rerank(documents: list[dict], query: str) -> list[dict]:
    # Simulated reranking: keep the two highest-scoring documents.
    return sorted(documents, key=lambda d: d["score"], reverse=True)[:2]


@trace_tool(name="generate")
def generate(context: str, query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a RAG system. Answer concisely using only the "
                           "provided context. Cite sources.",
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {query}",
            },
        ],
        max_tokens=300,
        temperature=0.2,
    )
    return response.choices[0].message.content


@trace(project="rag-demo", environment="development")
def run_rag(query: str) -> dict:
    docs = retrieve(query)
    top_docs = rerank(docs, query)
    # Context assembly: join the selected passages with a separator.
    context = "\n---\n".join(d["content"] for d in top_docs)
    answer = generate(context, query)
    return {
        "query": query,
        "sources": [d["topic"] for d in top_docs],
        "answer": answer,
    }


if __name__ == "__main__":
    result = run_rag("How does tracing help in LLM observability?")
    print(f"\nSources: {', '.join(result['sources'])}")
    print(f"\nAnswer:\n{result['answer']}")
Tip
Set the API key with export OPENAI_API_KEY="sk-..." before running. The retrieval and reranking steps are simulated; the generation step calls OpenAI.
Expected Output
Console output
text
╭── TraceLLM Trace ───────────────────────────────── SUCCESS ──╮
│                                                              │
│  Trace ID      tr_e1f4a7c2                                   │
│  Prompt        How does tracing help in LLM observability?   │
│  Model         unknown                                       │
│  Project       rag-demo                                      │
│  Environment   development                                   │
│  Latency       2,145.80 ms                                   │
│  Token Count   198                                           │
│  Retries       0                                             │
│  Steps         3                                             │
│  Status        SUCCESS                                       │
│                                                              │
╰──────────────────────────────────────────────────────────────╯

#  Tool      Duration  Status  Detail
1  retrieve  12ms      OK
2  rerank    2ms       OK
3  generate  2130ms    OK

Sources: tracing, observability

Answer:
Tracing helps LLM observability by capturing the full execution graph of your application, including each tool call, retrieval step, and API interaction. With TraceLLM, every step is recorded with its duration, input, output, and status, making it possible to pinpoint exactly where latency spikes or errors occur in the pipeline. This granular visibility is essential for debugging production RAG systems where failures can originate in retrieval, context assembly, or generation stages.
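As a quick sanity check on the sample numbers, the sketch below adds up the three step durations and compares them with the reported trace latency. The values are copied from the console output; the variable names are illustrative, and treating the remainder as time spent outside the traced steps is an assumption rather than something the output states.

```python
# Step durations (ms) and total latency copied from the sample console output.
steps = {"retrieve": 12, "rerank": 2, "generate": 2130}
total_latency_ms = 2145.80

step_sum = sum(steps.values())
# Assumption: whatever is left over was spent outside the three traced steps.
unaccounted_ms = total_latency_ms - step_sum

for name, ms in steps.items():
    share = ms / total_latency_ms * 100
    print(f"{name:<10}{ms:>6} ms  {share:5.1f}%")
print(f"outside steps: {unaccounted_ms:.2f} ms")
```

In this sample the generation call accounts for roughly 99% of the latency, so it is the first place to look when optimizing the pipeline.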
Dashboard Result
The RAG pipeline renders as a single trace with 3 ordered steps:
Dashboard UI
text
TraceLLM Dashboard > Traces
Status     Trace ID         Prompt                                         Model    Latency     Tokens    Time
─────────  ───────────────  ─────────────────────────────────────────────  ───────  ──────────  ────────  ─────────────────────
● Success  tr_e1f4a7c2      How does tracing help in LLM observability?    unknown  2,146 ms    198       2026-05-31 14:26:30
> Detail view — step timeline shows the full RAG pipeline order:
┌─ Step Timeline ───────────────────────────────────────────────────────────┐
│                                                                           │
│  retrieve  ──── 12ms  OK                                                  │
│  rerank    ─ 2ms  OK                                                      │
│  generate  ──────────────────────────────────────────── 2130ms  OK        │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
> The Prompt panel shows the original user query; the Response panel
  shows the generated answer with source citations.
> The Analytics page attributes all traces to the "rag-demo" project,
  making it easy to monitor RAG pipeline health separately from
  other application traces.
Replay Result
Replay animates each RAG stage with input/output detail:
terminal
bash
tracellm replay tr_e1f4a7c2 --speed 1.5
Replay output
text
╭────────────────── Replaying execution timeline... ──────────────────╮
│                                                                     │
│  ╭─ Replay ──────────────────────────────────────────────────────╮  │
│  │                                                               │  │
│  │  trace_id   tr_e1f4a7c2                                       │  │
│  │  status     SUCCESS                                           │  │
│  │  latency    2145.80 ms                                        │  │
│  │  retries    0                                                 │  │
│  │  steps      3                                                 │  │
│  │                                                               │  │
│  ╰───────────────────────────────────────────────────────────────╯  │
│                                                                     │
│  ╭─ Step 1/3 ─────────────────────────────────────────╮             │
│  │  step       1/3                                     │             │
│  │  tool       retrieve                                │             │
│  │  duration   12 ms                                   │             │
│  │  status     OK                                      │             │
│  │  input      {'query': 'How does tracing help...'}   │             │
│  │  output     {'result': [{'topic': 'tracing', ...}]  │             │
│  ╰────────────────────────────────────────────────────╯             │
│                                                                     │
│  ╭─ Step 2/3 ─────────────────────────────────────────╮             │
│  │  step       2/3                                     │             │
│  │  tool       rerank                                  │             │
│  │  duration   2 ms                                    │             │
│  │  status     OK                                      │             │
│  ╰────────────────────────────────────────────────────╯             │
│                                                                     │
│  ╭─ Step 3/3 ─────────────────────────────────────────╮             │
│  │  step       3/3                                     │             │
│  │  tool       generate                                │             │
│  │  duration   2130 ms                                 │             │
│  │  status     OK                                      │             │
│  │  output     {'content': 'Tracing helps LLM          │             │
│  │             observability by capturing...'}         │             │
│  ╰────────────────────────────────────────────────────╯             │
╰─────────────────────────────────────────────────────────────────────╯
Replay complete
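The source does not define what --speed does. Assuming it acts as a playback-rate multiplier, the pacing of a replay could be sketched as below. The `replay` function and its `sleep` parameter are hypothetical and are not part of the TraceLLM CLI; the step records are copied from the sample output.

```python
import time

# Step records copied from the sample replay output.
STEPS = [
    {"tool": "retrieve", "duration_ms": 12},
    {"tool": "rerank", "duration_ms": 2},
    {"tool": "generate", "duration_ms": 2130},
]


def replay(steps, speed=1.0, sleep=time.sleep):
    """Hypothetical replay loop: pause for each step's duration divided by `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    delays = []
    for index, step in enumerate(steps, start=1):
        delay_s = step["duration_ms"] / 1000 / speed
        sleep(delay_s)
        delays.append(delay_s)
        print(f"step {index}/{len(steps)}  {step['tool']}  {step['duration_ms']} ms")
    return delays


# Pass a no-op sleep to inspect the pacing without actually waiting.
delays = replay(STEPS, speed=1.5, sleep=lambda _: None)
```

Under this assumption, a speed of 1.5 would play the 2130 ms generation step back in about 1.42 seconds.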