Groq Example

Trace Groq API calls through the OpenAI-compatible client.

Overview

Groq exposes an OpenAI-compatible API, so the TraceLLM OpenAI integration works directly. The only change is setting base_url and using a Groq API key. This example runs llama-3.3-70b-versatile on Groq hardware.
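
As a rough illustration of how little changes between providers, the sketch below gathers the two per-provider settings in one place. The PROVIDERS table and the client_kwargs helper are hypothetical names, not part of TraceLLM or the OpenAI SDK; only the Groq base URL is taken from this page, and OPENAI_API_KEY is the OpenAI SDK's usual environment variable.

```python
import os

# Hypothetical provider table for illustration only.
# base_url=None means "use the OpenAI SDK default endpoint".
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "key_env": "GROQ_API_KEY",
    },
}


def client_kwargs(provider: str, env=None) -> dict:
    """Build keyword arguments for the OpenAI client constructor."""
    env = os.environ if env is None else env
    settings = PROVIDERS[provider]
    # The API key is always required; it is read from the provider's variable.
    kwargs = {"api_key": env[settings["key_env"]]}
    # Only override base_url when the provider is not the SDK default.
    if settings["base_url"] is not None:
        kwargs["base_url"] = settings["base_url"]
    return kwargs
```

The result could then be passed along as OpenAI(**client_kwargs("groq")) before wrapping the client with wrap_openai, which is equivalent to the explicit constructor call used in groq_example.py.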

Code

groq_example.py
python
import os
import sys

# Add the parent directory to sys.path so imports resolve from the repository root.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from openai import OpenAI
from tracellm import trace
from tracellm.integrations.openai import wrap_openai


client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
# Wrap the client so its chat completion calls are recorded as trace steps.
client = wrap_openai(client)


@trace(
    prompt="groq_inference",
    model_name="llama-3.3-70b-versatile",
    project="multi-provider",
    environment="development",
)
def run_groq(prompt: str) -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
        max_tokens=1024,
    )
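    # message.content is Optional in the OpenAI SDK; fall back to "" to honor the str return type.
    return response.choices[0].message.content or ""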


if __name__ == "__main__":
    result = run_groq(
        "Explain how Groq's LPU inference architecture achieves "
        "low latency compared to traditional GPU-based inference."
    )
    print(f"\nResponse ({len(result)} chars):\n{result}")

Warning

Run export GROQ_API_KEY="gsk_..." in your shell before running the example. Keys are available at console.groq.com.
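
If the variable is missing, os.environ["GROQ_API_KEY"] in the example raises a bare KeyError. A minimal sketch of a friendlier check is shown here; require_groq_key is a hypothetical helper name, not something TraceLLM provides.

```python
import os


def require_groq_key(env=None) -> str:
    """Return the Groq API key, or raise a readable error if it is missing."""
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value the same as an unset variable.
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'GROQ_API_KEY is not set. Run export GROQ_API_KEY="gsk_..." first.'
        )
    return key
```

Calling require_groq_key() at the top of the script, and passing its result as api_key, would surface the problem before any request is made.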

Expected Output

Console output
text
  ╭── TraceLLM Trace ───────────────────────────── SUCCESS ──╮
  │                                                              │
  │  Trace ID     tr_d7c4b2e9                                    │
  │  Prompt       groq_inference                                 │
  │  Model        llama-3.3-70b-versatile                        │
  │  Project      multi-provider                                 │
  │  Environment  development                                    │
  │  Latency      542.18 ms                                      │
  │  Token Count  267                                            │
  │  Retries      0                                              │
  │  Steps        1                                              │
  │  Status       SUCCESS                                        │
  │                                                              │
  ╰──────────────────────────────────────────────────────────────╯

  #  Tool              Duration  Status  Detail
  1  openai_chat          542ms     OK

Response (891 chars):
Groq's LPU (Language Processing Unit) achieves low latency by using
a deterministic, sequential processor architecture specifically
designed for LLM inference workloads. Unlike GPUs, which rely on
massive parallel SIMT execution and face memory bandwidth bottlenecks
from HBM, the LPU eliminates the need for external memory lookups
during autoregressive decoding. Its near-calculator compute model
enables tokens to be processed in a single pass through the silicon,
reducing per-token latency by 10-50x compared to GPU-based inference
for models like Llama. This makes Groq ideal for real-time
applications where response time is critical.
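
The Latency and Token Count fields in the trace summary can be combined into a rough throughput figure. The sketch below is only arithmetic on the sample values shown above; tokens_per_second is a hypothetical helper, and because the replay output reports 267 as total_tokens (prompt plus completion), the result is an end-to-end figure rather than a pure decode speed.

```python
def tokens_per_second(token_count: int, latency_ms: float) -> float:
    """Convert a token count and a latency in milliseconds to tokens per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return token_count / (latency_ms / 1000.0)


# Sample values copied from the trace summary above.
rate = tokens_per_second(267, 542.18)
print(f"{rate:.1f} tokens/s")  # prints "492.5 tokens/s"
```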

Dashboard Result

Open http://localhost:3000/traces to see the Groq trace:

Dashboard UI
text
TraceLLM Dashboard  >  Traces

  Status     Trace ID     Prompt          Model                    Latency  Tokens  Time
  ─────────  ───────────  ──────────────  ───────────────────────  ───────  ──────  ───────────────────
  ● Success  tr_d7c4b2e9  groq_inference  llama-3.3-70b-versatile  542 ms   267     2026-05-31 14:23:45

  > Detail view summary bar:

  Model: llama-3.3-70b-versatile  |  Latency: 542 ms  |  Tokens: 267
  Retries: 0  |  Steps: 1  |  At: 2026-05-31 14:23:45

  > The Analytics page (/analytics) groups this trace under the
    "multi-provider" project, showing it alongside OpenAI traces for
    cross-provider latency and cost comparisons.

Replay Result

Replay the trace to see the step execution timeline:

terminal
bash
tracellm replay tr_d7c4b2e9 --speed 2.0

Replay output
text
╭────────────────── Replaying execution timeline... ──────────────────╮
│                                                                      │
│  ╭─ Replay ───────────────────────────────────────────────────────╮ │
│  │                                                                 │ │
│  │  trace_id  tr_d7c4b2e9                                         │ │
│  │  status    SUCCESS                                              │ │
│  │  latency   542.18 ms                                            │ │
│  │  retries   0                                                    │ │
│  │  steps     1                                                    │ │
│  │                                                                 │ │
│  ╰─────────────────────────────────────────────────────────────────╯ │
│                                                                      │
│  ╭─ Step 1/1 ───────────────────────────────────────╮               │
│  │                                                    │               │
│  │  step     1/1                                      │               │
│  │  tool     openai_chat                              │               │
│  │  duration 542 ms                                   │               │
│  │  status   OK                                       │               │
│  │  input    {'model': 'llama-3.3-70b-versatile', ...}│               │
│  │  output   {'content': "Groq's LPU (Language...",   │               │
│  │            'usage': {'total_tokens': 267}}         │               │
│  │                                                    │               │
│  ╰────────────────────────────────────────────────────╯               │
╰──────────────────────────────────────────────────────────────────────╯

Replay complete