MongoDB

Configure and operate the trace database.

Version 1 Requirement

TraceLLM currently requires MongoDB for trace storage.

Required environment variables:

  • MONGO_URL
  • DB_NAME

Future versions may support additional storage options.

Tracey Guide

TraceLLM requires a MongoDB connection before traces can be stored. Make sure your MONGO_URL and DB_NAME are set before running tracellm start.
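
A pre-flight check along these lines can catch a missing variable before startup. This is a minimal sketch that assumes only the two variable names listed above; the helper name `load_mongo_settings` is hypothetical and not part of TraceLLM.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("MONGO_URL", "DB_NAME")


def load_mongo_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (mongo_url, db_name), or raise if either variable is unset or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return env["MONGO_URL"], env["DB_NAME"]
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dict instead of the real process environment.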

TraceLLM uses MongoDB as its persistent store for all trace documents, project records, and API keys. The connection is managed via the Motor async driver (AsyncIOMotorClient), which integrates natively with FastAPI's async event loop. The CLI bridges sync code to Motor through a persistent event loop in db.py.

Connection Management

The MongoDB connection is managed in app/database/mongodb.py. The module uses a singleton pattern with module-level globals:

Connection manager
python
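from typing import Optional

from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase

# Module-level globals in mongodb.py
client: Optional[AsyncIOMotorClient] = None
database: Optional[AsyncIOMotorDatabase] = None

async def connect_to_mongo(mongo_url: str, db_name: str) -> AsyncIOMotorDatabase:
    # Rebind the module-level singletons; without this declaration the
    # assignments below would create function-local variables instead
    global client, database
    if database is not None:
        return database  # Already connected

    client = AsyncIOMotorClient(mongo_url)
    database = client[db_name]
    await client.admin.command("ping")  # Verify connectivity
    return database

def get_database() -> AsyncIOMotorDatabase:
    if database is None:
        raise RuntimeError("MongoDB is not connected yet.")
    return database

async def close_mongo_connection():
    if client is not None:
        client.close()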

Warning

If the connection fails, the API still starts, but traces are finalized in memory and not persisted. The @trace decorator logs a yellow "Trace persistence skipped" warning. Traces skipped during an outage are not saved retroactively when MongoDB becomes available.
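
The skip-and-warn behavior can be pictured roughly as follows. This is a hedged sketch, not the decorator's actual code: `persist_trace` is a hypothetical helper, and `get_database` is assumed to raise RuntimeError when no connection exists, as in the connection manager above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("tracellm.sketch")


async def persist_trace(trace: dict, get_database: Callable[[], Any]) -> bool:
    """Insert the trace if a database is available; otherwise skip and warn."""
    try:
        db = get_database()  # Raises RuntimeError when not connected
    except RuntimeError:
        logger.warning("Trace persistence skipped: MongoDB is not connected.")
        return False  # The trace is dropped, not queued for a later retry
    await db["traces"].insert_one(trace)
    return True
```

Returning a boolean rather than raising keeps the traced function's own result unaffected by a storage outage.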

Collections & Schema

Three MongoDB collections store all TraceLLM data:

Collection | Schema Model  | Purpose                                             | Key Indexes
traces     | TraceSchema   | Full trace documents with steps, metadata, status   | trace_id, created_at, status, model_name, project_id, environment
projects   | ProjectSchema | Project records with name, description, timestamps  | project_id (unique), name (unique)
api_keys   | ApiKeySchema  | API key records with key hash, project, environment | key (unique), project_id, environment

Each trace document follows the TraceSchema Pydantic model, which enforces field types, defaults, and validators at both write and read boundaries:

Schema definitions
python
TraceSchema:
  trace_id: str           # UUID4, prefixed "tr_"
  prompt: str              # Input prompt or operation name
  response: Optional[str]  # LLM or system response text
  latency: float           # Total execution time in ms (>= 0)
  token_count: int         # Estimated or actual tokens (>= 0)
  model_name: Optional[str] # Model identifier (e.g. gpt-4o)
  project_id: str          # Project grouping ("default")
  project_name: Optional[str]
  api_key: Optional[str]   # Stored for audit purposes
  environment: str         # "development", "staging", "production"
  status: Literal["success", "warning", "failed"]
  steps: list[StepSchema]  # Ordered execution steps
  retry_count: int         # Number of retries
  slow_request: bool       # True if latency >= 1500ms
  failure_reason: Optional[str]
  created_at: datetime     # Execution start (UTC)
  updated_at: datetime     # Persistence time (UTC)

StepSchema:
  step_id: str             # UUID4
  tool_name: str           # e.g. "vector_retrieval"
  input: dict              # Input parameters
  output: dict             # Returned result
  duration: float          # Wall-clock time in ms (>= 0)
  success: bool            # Completed without error
  timestamp: datetime      # Execution time (UTC)

Index Strategy

Indexes are created automatically during the FastAPI startup event via the on_event("startup") handler. The creation functions are idempotent and safe to call on every restart:

MongoDB indexes
python
# traces collection
traces.create_index("trace_id")    # Single trace lookup
traces.create_index("created_at")   # Time-range queries
traces.create_index("status")       # Filter by status
traces.create_index("model_name")   # Filter by model
traces.create_index("project_id")   # Multi-tenant isolation
traces.create_index("environment")  # Environment scoping

# projects collection
projects.create_index("project_id", unique=True)
projects.create_index("name", unique=True)

# api_keys collection
api_keys.create_index("key", unique=True)         # Key lookup
api_keys.create_index("project_id")               # List by project
api_keys.create_index("environment")              # Filter by env
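
With Motor, `create_index` is awaitable, so a startup routine would normally await each call. The following sketch shows one way the list above could be driven from a table. The function name `ensure_indexes` and the `INDEX_SPECS` constant are hypothetical, and `db` is only assumed to behave like an AsyncIOMotorDatabase.

```python
from typing import Any

# (collection, field, unique) triples taken from the index list above
INDEX_SPECS = [
    ("traces", "trace_id", False),
    ("traces", "created_at", False),
    ("traces", "status", False),
    ("traces", "model_name", False),
    ("traces", "project_id", False),
    ("traces", "environment", False),
    ("projects", "project_id", True),
    ("projects", "name", True),
    ("api_keys", "key", True),
    ("api_keys", "project_id", False),
    ("api_keys", "environment", False),
]


async def ensure_indexes(db: Any) -> int:
    """Request every index in INDEX_SPECS and return how many were requested.

    `db` is assumed to expose collections by subscript, each with an
    awaitable create_index(keys, **kwargs).
    """
    for collection, field, unique in INDEX_SPECS:
        if unique:
            await db[collection].create_index(field, unique=True)
        else:
            await db[collection].create_index(field)
    return len(INDEX_SPECS)
```

Requesting an index that already exists with the same definition is a no-op in MongoDB, which is what makes calling such a routine on every restart safe.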

Info

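The traces collection indexes cover the fields the dashboard filters and sorts on: status, project, model, environment, and created_at for the time-sorted queries behind the analytics time-series charts. Latency is not in the index list above, so a latency-range filter is best combined with an indexed field such as status or project_id.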

Trace Normalization Pipeline

Before insertion, every trace document passes through a normalization pipeline in normalize_trace_document() (in trace_service.py):

Normalization pipeline
text
Input: raw trace dict from @trace/CLI
  │
  ├── 1. Parse created_at ──► _coerce_datetime()
  │      Supports datetime objects, ISO strings, or falls back to utcnow()
  │
  ├── 2. Normalize steps ──► _normalize_steps()
  │      Maps input/input_data, output/output_data keys
  │      Validates each step against StepSchema
  │      Generates step_id if missing
  │
  ├── 3. Infer retry count ──► _infer_retry_count()
  │      Counts duplicate tool_name occurrences in step list
  │      Uses explicit retry_count if provided
  │
  ├── 4. Infer status ──► _infer_status()
  │      explicit status > any failed step > failure_reason/retries > success
  │
  ├── 5. Infer failure_reason ──► _infer_failure_reason()
  │      explicit message > first failed step's output.error > tool_name
  │
  ├── 6. Set slow_request flag
  │      True if latency >= SLOW_TRACE_THRESHOLD_MS (1500ms)
  │
  └── 7. Validate ──► TraceSchema.model_dump(mode="python")
         Pydantic validation catches negative values, wrong types, etc.

Output: clean MongoDB document
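
Steps 1, 3, 4, and 6 of the pipeline can be sketched with the standard library. This is an illustration under stated assumptions, not the code in trace_service.py: the function names are simplified stand-ins for the private helpers, each repeat of a tool_name is assumed to count as one retry, and the mapping of "any failed step" to "failed" and "failure_reason/retries" to "warning" is a guess, since the source gives only the precedence order.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Optional

SLOW_TRACE_THRESHOLD_MS = 1500


def coerce_datetime(value: Any) -> datetime:
    """Step 1: accept datetime objects or ISO strings, else fall back to now (UTC)."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, str):
        try:
            # Older Python versions reject a trailing "Z", so map it to an offset
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return datetime.now(timezone.utc)
    else:
        return datetime.now(timezone.utc)
    # Treat naive timestamps as UTC (an assumption of this sketch)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def infer_retry_count(steps: list, explicit: Optional[int] = None) -> int:
    """Step 3: an explicit value wins; otherwise count repeated tool_name entries."""
    if explicit is not None:
        return explicit
    counts = Counter(step.get("tool_name") for step in steps)
    return sum(n - 1 for n in counts.values() if n > 1)


def infer_status(doc: dict, retry_count: int) -> str:
    """Step 4: explicit status > any failed step > failure_reason/retries > success."""
    if doc.get("status"):
        return doc["status"]
    if any(not step.get("success", True) for step in doc.get("steps", [])):
        return "failed"  # Assumed mapping for a failed step
    if doc.get("failure_reason") or retry_count > 0:
        return "warning"  # Assumed mapping; the source does not name this status
    return "success"


def is_slow(latency_ms: float) -> bool:
    """Step 6: flag traces at or above the slow threshold."""
    return latency_ms >= SLOW_TRACE_THRESHOLD_MS
```

For example, a trace whose steps call the same tool twice and then a different tool once would get a retry count of 1 under this counting rule.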

Common Query Patterns

The trace service provides these query patterns used by the API and dashboard:

MongoDB query patterns
javascript
// List traces with filters
db.traces.find({
    status: "failed",
    project_id: "my-app",
    environment: "production",
    latency: { $gte: 100, $lte: 5000 },
    token_count: { $gte: 50 }
}).sort({ created_at: -1 }).limit(50)

// Get single trace
db.traces.findOne({ trace_id: "tr_2kf9q3m1" })

// Analytics - all traces in date order
db.traces.find({}).sort({ created_at: 1 })

// Failures - recent failed/retry/slow traces
db.traces.find({
    $or: [
        { status: "failed" },
        { retry_count: { $gt: 0 } },
        { slow_request: true }
    ]
}).sort({ created_at: -1 }).limit(25)
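
On the Python side, the first filter above is typically assembled from optional request parameters. The sketch below shows one way to build that filter document. The function name `build_trace_filter` and its parameter names are hypothetical; only the field names and operators come from the queries above.

```python
from typing import Any, Dict, Optional


def build_trace_filter(
    status: Optional[str] = None,
    project_id: Optional[str] = None,
    environment: Optional[str] = None,
    min_latency: Optional[float] = None,
    max_latency: Optional[float] = None,
    min_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a MongoDB filter document, skipping any filter left as None."""
    query: Dict[str, Any] = {}
    exact_matches = (
        ("status", status),
        ("project_id", project_id),
        ("environment", environment),
    )
    for field, value in exact_matches:
        if value is not None:
            query[field] = value

    # Combine both latency bounds into a single range expression
    latency: Dict[str, float] = {}
    if min_latency is not None:
        latency["$gte"] = min_latency
    if max_latency is not None:
        latency["$lte"] = max_latency
    if latency:
        query["latency"] = latency

    if min_tokens is not None:
        query["token_count"] = {"$gte": min_tokens}
    return query
```

With Motor, the resulting dict could be passed to something like `db.traces.find(query).sort("created_at", -1).limit(50)`. Omitting unset filters keeps an unfiltered request equivalent to `find({})`.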

Running MongoDB

Start a local MongoDB instance for development:

Start MongoDB
bash
# Docker (recommended)
docker run -d --name tracellm-mongo -p 27017:27017 mongo:7

# Native
mongod --dbpath /data/db --port 27017

# Verify connection
mongosh --eval "db.runCommand({ ping: 1 })"

Tip

MongoDB Atlas is also supported. Set MONGO_URL to your Atlas SRV connection string (a URL that starts with mongodb+srv://). The startup.py module tests connectivity with a 3-second timeout and logs a warning if the server is unreachable.
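
A bounded connectivity check of this kind can be written with `asyncio.wait_for`. The sketch below is an assumption about the general shape, not the contents of startup.py: `check_connectivity` is a hypothetical name, and `ping` stands for any zero-argument callable that returns an awaitable.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("tracellm.sketch")


async def check_connectivity(
    ping: Callable[[], Awaitable[object]], timeout: float = 3.0
) -> bool:
    """Return True if `ping()` completes within `timeout` seconds, else warn."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("MongoDB unreachable after %.1f seconds", timeout)
    except Exception as exc:  # e.g. authentication or DNS errors
        logger.warning("MongoDB connectivity check failed: %s", exc)
    return False
```

With a Motor client, the ping could be supplied as `lambda: client.admin.command("ping")`. Drivers also accept a `serverSelectionTimeoutMS` option on the client, which may be how the real module bounds the wait.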