What is TraceLLM?

A short overview of TraceLLM and the problems it solves.

Tracey Says

Welcome to TraceLLM! I'm Tracey, your guide. I'll help you get started with installation, setup, and troubleshooting throughout this documentation.

TraceLLM is an open-source, local-first observability platform for LLMs and AI agents. It captures every step of every execution (prompts, responses, tool calls, latency, token usage, and errors) so you can debug, replay, and analyze your AI workflows one step at a time.

Trace prompts

Record every prompt sent to any LLM provider

Monitor execution

Watch agentic workflows unfold in real time

Inspect tool calls

See which tools were invoked and what they returned

Replay workflows

Step through past executions as if they were live

Debug failures

Pinpoint errors, retries, and unexpected branches

Analyze latency

Measure time spent per span and per tool call

Track token usage

Count prompt and completion tokens per trace

What problem does TraceLLM solve?

Building with LLMs means dealing with non-deterministic outputs, hidden tool call chains, and unpredictable latency. When something goes wrong — a hallucination, a broken tool call, a timeout — you need more than logs. You need a complete, replayable record of exactly what happened.

TraceLLM gives you that record. Every trace is a first-class artifact you can inspect, share, and replay. Instead of guessing why an agent took a wrong turn, you open the trace and see every decision, every API response, every millisecond.

Info

Traces are stored locally by default. No data leaves your machine unless you choose to export it.

Why traditional debugging fails for AI systems

Traditional logging and APM tools assume deterministic request-response patterns. A web request comes in, the server processes it, a response goes out. AI systems break that model entirely.

A single user request can spawn multiple LLM calls, tool executions, retries, and parallel branches. The execution is a directed graph — not a straight line. Standard metrics like p99 latency lose meaning when you need to trace an agentic loop that called three tools, retried twice, and then took a different path on the fifth attempt.

TraceLLM models execution as a directed graph of spans, events, and state transitions, an abstraction rich enough to represent workflows that branch, retry, and run in parallel. It stores the full context of each step, not just a log line.
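To make the graph idea concrete, here is a minimal sketch in Python. It uses a tree, the simplest kind of directed graph. The `Span` class, its field names, and the span kinds are assumptions made for illustration; they are not TraceLLM's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in the execution graph (hypothetical shape, not TraceLLM's schema)."""
    span_id: str
    kind: str          # e.g. "request", "llm_call", "tool_call"
    latency_ms: int
    children: list = field(default_factory=list)


def walk(span, depth=0):
    """Yield (depth, span) pairs in depth-first order."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


def total_latency_by_kind(root):
    """Sum latency per span kind across the whole graph."""
    totals = {}
    for _, span in walk(root):
        totals[span.kind] = totals.get(span.kind, 0) + span.latency_ms
    return totals


# One request: an LLM call that invokes a tool twice (the second is a retry),
# followed by a second LLM call.
root = Span("s0", "request", 0, [
    Span("s1", "llm_call", 800, [
        Span("s2", "tool_call", 120),
        Span("s3", "tool_call", 95),
    ]),
    Span("s4", "llm_call", 440),
])

for depth, span in walk(root):
    print("  " * depth + f"{span.span_id} {span.kind} {span.latency_ms}ms")

print(total_latency_by_kind(root))
```

A flat log would lose the parent-child relationships that show which LLM call triggered which tool call. Keeping them lets you aggregate per branch or per span kind.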

How TraceLLM works

When you instrument your code — via the @trace decorator, CLI, or SDK integration — TraceLLM captures the full execution context:

  • Prompts and model responses
  • Tool calls and their return values
  • Latency per span and per call
  • Token counts (prompt and completion)
  • Errors, retries, and exception stacks
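The sketch below shows how a decorator can capture this kind of context. It is a self-contained stand-in written with the Python standard library, not TraceLLM's `@trace` implementation. The names `traced`, `TRACES`, and `lookup_weather` are hypothetical, and the real decorator's signature and behavior may differ.

```python
import functools
import time
import traceback

TRACES = []  # in-memory stand-in for a trace store (assumption for this sketch)


def traced(func):
    """Record arguments, result, latency, and any exception for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["result"] = func(*args, **kwargs)
            return record["result"]
        except Exception as exc:
            # Keep the error and its stack, then re-raise so callers still see it.
            record["error"] = repr(exc)
            record["stack"] = traceback.format_exc()
            raise
        finally:
            # Runs on success and on failure, so every call gets a latency.
            record["latency_ms"] = (time.perf_counter() - start) * 1000
            TRACES.append(record)
    return wrapper


@traced
def lookup_weather(city):
    # Hypothetical tool call; a real one would query an API.
    if not city:
        raise ValueError("city is required")
    return {"city": city, "temp_c": 21}


lookup_weather("Oslo")
try:
    lookup_weather("")
except ValueError:
    pass

for t in TRACES:
    print(t["name"], "failed" if "error" in t else "ok", f"{t['latency_ms']:.3f}ms")
```

Recording in the `finally` block means a failed call leaves a trace too, which matters because failures are usually the calls you want to inspect.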

Each trace is stored in MongoDB, streamed in real time via WebSocket, and surfaced in the dashboard for inspection and replay. The entire stack runs locally, with no cloud dependency, and trace data stays on your machine unless you export it.

Trace shape (JSON)

A trace captures the full lifecycle of an LLM interaction:

{
  "id": "tr_abc123",
  "prompt": "Explain transformers",
  "response": "A transformer is a neural network architecture...",
  "model": "gpt-4o",
  "latency_ms": 1240,
  "tokens": { "prompt": 12, "completion": 184 },
  "timestamp": "2026-05-31T10:30:00Z",
  "spans": [ ... ]
}
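
As a small illustration of working with a record of this shape, the following Python sketch loads the example above and derives a few figures from it. It assumes a trace is available as a JSON string. How you export one from TraceLLM is not shown here, and the elided `spans` list is left empty as a placeholder.

```python
import json
from datetime import datetime

# The example record from above; "spans" is left empty because the
# original elides its contents.
raw = """
{
  "id": "tr_abc123",
  "prompt": "Explain transformers",
  "response": "A transformer is a neural network architecture...",
  "model": "gpt-4o",
  "latency_ms": 1240,
  "tokens": { "prompt": 12, "completion": 184 },
  "timestamp": "2026-05-31T10:30:00Z",
  "spans": []
}
"""

trace = json.loads(raw)

# Total tokens billed for this interaction.
total_tokens = trace["tokens"]["prompt"] + trace["tokens"]["completion"]

# Rough generation speed: wall-clock latency divided by completion tokens.
ms_per_completion_token = trace["latency_ms"] / trace["tokens"]["completion"]

# Parse the UTC timestamp; "Z" is rewritten so older Python versions accept it.
started = datetime.fromisoformat(trace["timestamp"].replace("Z", "+00:00"))

print(trace["id"], trace["model"])
print("total tokens:", total_tokens)
print("ms per completion token:", round(ms_per_completion_token, 2))
print("started:", started.isoformat())
```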

Who should use TraceLLM

  • AI engineers debugging complex agentic workflows with multiple tool calls and branching logic.
  • Teams shipping LLM features to production who need visibility into latency, cost, and failure modes.
  • Researchers analyzing model behavior across prompts, providers, and parameters.
  • Anyone tired of printf-debugging their AI stack and ready for first-class observability tooling.