RAG Example

Trace a retrieval-augmented generation pipeline.

Overview

This example traces a RAG pipeline end to end. A simulated document store returns matching passages, a rerank step keeps the top two, and OpenAI generates an answer grounded in them. Each stage (retrieval, reranking, generation) is recorded via @trace_tool as a step inside the single trace opened by @trace.

Code

rag_example.py

python

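import os
import sys

# Add the parent directory to the import path so a local tracellm checkout
# is importable when this file is run directly.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))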

from openai import OpenAI
from tracellm import trace, trace_tool
from tracellm.integrations.openai import wrap_openai


client = OpenAI()
client = wrap_openai(client)


KNOWLEDGE_BASE = {
    "observability": (
        "Observability in LLM systems provides visibility into prompts, completions, "
        "token usage, latency, error rates, and cost across the entire pipeline."
    ),
    "tracing": (
        "Tracing captures the full execution graph of LLM applications including "
        "tool calls, retrieval steps, intermediate reasoning, and external API calls."
    ),
    "monitoring": (
        "Production LLM monitoring tracks token throughput, error rates, latency "
        "percentiles (p50/p95/p99), and cost per request across models and providers."
    ),
    "rag": (
        "Retrieval-Augmented Generation combines vector search with LLM generation "
        "to produce grounded, factually accurate responses with source attribution."
    ),
}


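@trace_tool(name="retrieve")
def retrieve(query: str) -> list[dict]:
    """Simulated retrieval: keyword-match the query against KNOWLEDGE_BASE topics."""
    query_lower = query.lower()
    # Ignore very short words: "in" is a substring of "tracing" and
    # "monitoring", so matching on it would return unrelated topics.
    words = [word for word in query_lower.split() if len(word) > 3]
    results = []
    for topic, content in KNOWLEDGE_BASE.items():
        if topic in query_lower or any(word in topic for word in words):
            results.append({"topic": topic, "content": content, "score": 0.95})
    if not results:
        # Fall back to a generic, lower-scored passage when nothing matches.
        results.append({
            "topic": "general",
            "content": "LLM systems benefit from comprehensive observability practices.",
            "score": 0.50,
        })
    return results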


@trace_tool(name="rerank")
def rerank(documents: list[dict], query: str) -> list[dict]:
    # Simulated rerank: `query` is unused; keep the two highest-scoring documents.
    return sorted(documents, key=lambda d: d["score"], reverse=True)[:2]


@trace_tool(name="generate")
def generate(context: str, query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a RAG system. Answer concisely using only the "
                           "provided context. Cite sources.",
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {query}",
            },
        ],
        max_tokens=300,
        temperature=0.2,
    )
    # `content` can be None, so fall back to an empty string to honor `-> str`.
    return response.choices[0].message.content or ""


@trace(project="rag-demo", environment="development")
def run_rag(query: str) -> dict:
    docs = retrieve(query)
    top_docs = rerank(docs, query)
    context = "\n---\n".join(d["content"] for d in top_docs)
    answer = generate(context, query)
    return {
        "query": query,
        "sources": [d["topic"] for d in top_docs],
        "answer": answer,
    }


if __name__ == "__main__":
    result = run_rag("How does tracing help in LLM observability?")
    print(f"\nSources: {', '.join(result['sources'])}")
    print(f"\nAnswer:\n{result['answer']}")

Tip

Run export OPENAI_API_KEY="sk-..." before running the example. The retrieval and reranking steps are simulated; only the generation step calls OpenAI.
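A missing key otherwise only surfaces once the OpenAI client is used, so a small preflight check can fail earlier with a clearer message. The sketch below uses a hypothetical helper, `require_api_key`, which is not part of TraceLLM or the OpenAI library; it relies only on the fact that the OpenAI client reads OPENAI_API_KEY from the environment by default.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI key from the environment or fail with a clear message.

    Hypothetical helper for illustration; `env` can be any mapping, which
    makes the check easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running rag_example.py"
        )
    return key


# Demonstrate the failure path with an empty mapping instead of os.environ.
try:
    require_api_key({})
except RuntimeError as exc:
    print(exc)
```

In rag_example.py, calling `require_api_key()` with no arguments before `client = OpenAI()` would be one place to run this check.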

Expected Output

Console output

text

  ╭── TraceLLM Trace ───────────────────────────── SUCCESS ──╮
  │                                                              │
  │  Trace ID     tr_e1f4a7c2                                    │
  │  Prompt       How does tracing help in LLM observability?    │
  │  Model        unknown                                        │
  │  Project      rag-demo                                       │
  │  Environment  development                                    │
  │  Latency      2,145.80 ms                                    │
  │  Token Count  198                                            │
  │  Retries      0                                              │
  │  Steps        3                                              │
  │  Status        SUCCESS                                       │
  │                                                              │
  ╰──────────────────────────────────────────────────────────────╯

  #  Tool              Duration  Status  Detail
  1  retrieve              12ms     OK
  2  rerank                 2ms     OK
  3  generate           2130ms     OK

Sources: tracing, observability

Answer:
Tracing helps LLM observability by capturing the full execution
graph of your application, including each tool call, retrieval step,
and API interaction. With TraceLLM, every step is recorded with its
duration, input, output, and status, making it possible to pinpoint
exactly where latency spikes or errors occur in the pipeline.
This granular visibility is essential for debugging production
RAG systems where failures can originate in retrieval, context
assembly, or generation stages.

Dashboard Result

The RAG pipeline renders as a single trace with 3 ordered steps:

Dashboard UI

text

TraceLLM Dashboard  >  Traces

  Status  Trace ID        Prompt                                        Model    Latency    Tokens    Time
  ─────── ─────────────── ───────────────────────────────────────────── ─────── ────────── ──────── ─────────────────────
  ● Success  tr_e1f4a7c2  How does tracing help in LLM observability?  unknown  2,146 ms   198      2026-05-31 14:26:30

  > In the detail view, the step timeline shows the full RAG pipeline order:

  ┌─ Step Timeline ───────────────────────────────────────────────────────────┐
  │                                                                             │
  │    retrieve   ──── 12ms  OK                                                 │
  │    rerank     ─ 2ms  OK                                                     │
  │    generate   ──────────────────────────────────────────── 2130ms  OK       │
  │                                                                             │
  └─────────────────────────────────────────────────────────────────────────────┘

  > The Prompt panel shows the original user query; the Response panel
    shows the generated answer with source citations.

  > The Analytics page attributes all traces to the "rag-demo" project,
    making it easy to monitor RAG pipeline health separately from
    other application traces.
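The step durations reported above can be turned into a per-stage latency breakdown with a little arithmetic. The sketch below copies the numbers from the example output; `latency_breakdown` is a hypothetical helper, not a TraceLLM function. Because the displayed step durations are rounded to whole milliseconds, the untraced remainder is approximate.

```python
# Durations copied from the example output above (milliseconds).
TRACE_LATENCY_MS = 2145.80
STEP_DURATIONS_MS = {"retrieve": 12, "rerank": 2, "generate": 2130}


def latency_breakdown(total_ms, steps_ms):
    """Return each step's share of total latency (percent) and untraced time (ms)."""
    shares = {name: ms / total_ms * 100 for name, ms in steps_ms.items()}
    # Time spent outside any traced step, e.g. joining the context string.
    overhead_ms = total_ms - sum(steps_ms.values())
    return shares, overhead_ms


shares, overhead_ms = latency_breakdown(TRACE_LATENCY_MS, STEP_DURATIONS_MS)
for name, share in shares.items():
    print(f"{name:<10}{share:6.1f}%")
print(f"untraced  {overhead_ms:6.2f} ms")
```

With these numbers the generate step accounts for roughly 99% of the trace latency, so the OpenAI call is the first place to look when tuning this pipeline.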

Replay Result

Replay animates each RAG stage with input/output detail:

terminal

bash

tracellm replay tr_e1f4a7c2 --speed 1.5

Replay output

text

╭────────────────── Replaying execution timeline... ──────────────────╮
│                                                                      │
│  ╭─ Replay ───────────────────────────────────────────────────────╮ │
│  │                                                                 │ │
│  │  trace_id  tr_e1f4a7c2                                         │ │
│  │  status    SUCCESS                                              │ │
│  │  latency   2145.80 ms                                           │ │
│  │  retries   0                                                    │ │
│  │  steps     3                                                    │ │
│  │                                                                 │ │
│  ╰─────────────────────────────────────────────────────────────────╯ │
│                                                                      │
│  ╭─ Step 1/3 ───────────────────────────────────────╮               │
│  │   step     1/3                                    │               │
│  │   tool     retrieve                               │               │
│  │   duration 12 ms                                  │               │
│  │   status   OK                                     │               │
│  │   input    {'query': 'How does tracing help...'}   │               │
│  │   output   {'result': [{'topic': 'tracing', ...}]  │               │
│  ╰────────────────────────────────────────────────────╯               │
│                                                                      │
│  ╭─ Step 2/3 ───────────────────────────────────────╮               │
│  │   step     2/3                                    │               │
│  │   tool     rerank                                 │               │
│  │   duration 2 ms                                   │               │
│  │   status   OK                                     │               │
│  ╰────────────────────────────────────────────────────╯               │
│                                                                      │
│  ╭─ Step 3/3 ───────────────────────────────────────╮               │
│  │   step     3/3                                    │               │
│  │   tool     generate                               │               │
│  │   duration 2130 ms                                │               │
│  │   status   OK                                     │               │
│  │   output   {'content': 'Tracing helps LLM         │               │
│  │             observability by capturing...'}       │               │
│  ╰────────────────────────────────────────────────────╯               │
╰──────────────────────────────────────────────────────────────────────╯

Replay complete
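As a rough illustration of what a timed replay involves, the sketch below plays back recorded steps with a speed factor. It is not how the tracellm CLI is implemented: the `replay` function and its parameters are hypothetical, and it assumes that `--speed` acts as a playback multiplier that divides each recorded duration.

```python
import time


def replay(steps, speed=1.0, sleep=time.sleep, emit=print):
    """Play back recorded steps, waiting duration / speed before each one.

    Hypothetical sketch: treating `speed` as a divisor of the recorded
    duration is an assumption, not documented TraceLLM behavior.
    `sleep` and `emit` are injectable so the function can be tested quickly.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    total = len(steps)
    for index, step in enumerate(steps, start=1):
        sleep(step["duration_ms"] / 1000 / speed)
        emit(
            f"step {index}/{total}  {step['tool']}  "
            f"{step['duration_ms']} ms  {step['status']}"
        )
    emit("Replay complete")


# Step data copied from the replay output above.
recorded = [
    {"tool": "retrieve", "duration_ms": 12, "status": "OK"},
    {"tool": "rerank", "duration_ms": 2, "status": "OK"},
    {"tool": "generate", "duration_ms": 2130, "status": "OK"},
]

replay(recorded, speed=1.5)
```

Under this assumption, a speed of 1.5 would replay the 2130 ms generate step in about 1.42 seconds.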