LangSmith tells you something broke. TraceGuard AI fixes it - fetching your real code, generating a patch PR, and scoring the fix before you ever look at it.
Your monitoring tool tells you something broke. Everything after that - root cause, fix, test, PR, review - is still entirely manual.
LangSmith, Langfuse, and Helicone tell you a run failed. Finding which file caused it, understanding why, writing a fix, and opening a PR is still entirely your problem.
An infinite loop or hallucination bug doesn't crash anything - it silently burns credits and returns bad answers until someone notices and the manual fix cycle begins.
Reproduce the failure, find the file, write the patch, open a PR, add a regression test - for every single incident. This is toil that compounds as your agent usage grows.
A reviewer rejects your patch with notes. Those notes live in a GitHub comment. The next iteration starts from scratch - no memory of what was tried, what failed, or why.
When a failure alert arrives from LangSmith, Langfuse, or any monitoring tool, TraceGuard classifies the root cause, dispatches a LangGraph agent to fetch your actual code from GitHub, generates a targeted multi-file fix, and opens a PR - all in under 60 seconds.
If you reject a fix with notes, the patch bot reads your feedback and generates a revised PR automatically. You stay in the loop for final approval - everything else is handled.
Failure arrives via LangSmith webhook, Langfuse webhook, or generic ingest endpoint.
Groq LLM maps the trace to one of 10 failure types with severity and root cause.
LangGraph agent fetches your real code from GitHub and generates a multi-file fix in a single clean commit.
Groq-as-judge scores the fix quality before you approve. Scores are written back to LangSmith.
One click to merge. Reject with notes and the bot generates a revised fix addressing your feedback.
Every component runs asynchronously and keeps its own session, so a failure in any stage doesn't block the others.
Each stage runs asynchronously with its own DB session. A crash in one stage doesn't cascade to the others.
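The isolation described above can be sketched roughly as follows. This is a minimal illustration, not TraceGuard's actual code: `run_stage`, `write_eval`, and `shadow_score` are hypothetical names, and the per-stage DB session is only indicated by a comment.

```python
import asyncio

async def run_stage(name, stage, payload, results):
    """Run one stage in isolation and record a failure instead of raising."""
    try:
        # Assumption: a real stage would open and close its own DB session here.
        results[name] = await stage(payload)
    except Exception as exc:
        results[name] = f"failed: {exc}"

async def write_eval(payload):
    # Hypothetical stage that succeeds.
    return "eval written"

async def shadow_score(payload):
    # Hypothetical stage that crashes.
    raise RuntimeError("judge unavailable")

async def main():
    results = {}
    await asyncio.gather(
        run_stage("eval", write_eval, {}, results),
        run_stage("shadow", shadow_score, {}, results),
    )
    return results

results = asyncio.run(main())
```

Because each stage catches its own exceptions, the crash in `shadow_score` is recorded while `write_eval` still completes.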
A Groq LLM (llama-3.3-70b-versatile) receives a truncated excerpt of the failing trace - inputs, outputs, error, and child_run names - and returns structured JSON: failure type, severity, title, description, root cause, and trace evidence quotes.
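As a rough sketch of how the classifier's JSON reply might be validated before it is stored: the key names below (`failure_type`, `root_cause`, `evidence`, and so on) are assumptions derived from the field list above, not the project's confirmed schema.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Classification:
    failure_type: str
    severity: str
    title: str
    description: str
    root_cause: str
    evidence: list = field(default_factory=list)  # trace evidence quotes

# Hypothetical key names; the real schema may differ.
REQUIRED = ("failure_type", "severity", "title", "description", "root_cause")

def parse_classification(raw: str) -> Classification:
    """Parse the model's JSON reply and reject replies with missing keys."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"classifier reply missing keys: {missing}")
    return Classification(
        failure_type=data["failure_type"],
        severity=data["severity"],
        title=data["title"],
        description=data["description"],
        root_cause=data["root_cause"],
        evidence=list(data.get("evidence", [])),
    )
```

Failing loudly on a malformed reply keeps a half-parsed classification from reaching the patch stage.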
A LangGraph state machine runs three nodes: fetch_code extracts real file paths from stack traces and pulls them from your GitHub repo; generate_fix produces a minimal targeted patch; open_pr creates a branch, commits, and opens a PR with explanation.
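The first half of fetch_code, pulling file paths out of a stack trace, could look something like this. It is a sketch only: `extract_paths`, the `/app/` repo root, and the rule for skipping `site-packages` frames are illustrative assumptions, and the GitHub fetch itself is omitted.

```python
import re

# Matches standard Python traceback frames: File "path.py", line N
FRAME = re.compile(r'File "([^"]+\.py)", line (\d+)')

def extract_paths(stack_trace: str, repo_root: str = "/app/") -> list:
    """Return unique repo-relative paths in first-seen order."""
    paths = []
    for path, _line in FRAME.findall(stack_trace):
        if "site-packages" in path:
            continue  # skip third-party frames (assumed heuristic)
        rel = path[len(repo_root):] if path.startswith(repo_root) else path
        if rel not in paths:
            paths.append(rel)
    return paths

trace = '''Traceback (most recent call last):
  File "/app/agent/runner.py", line 42, in run
  File "/usr/lib/python3.12/site-packages/langgraph/x.py", line 7, in step
  File "/app/agent/tools.py", line 10, in call
  File "/app/agent/runner.py", line 50, in run
RuntimeError: recursion limit reached
'''
paths = extract_paths(trace)
```

Here `paths` holds `agent/runner.py` and `agent/tools.py`, which are the files a later step would request from the repository.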
Groq synthesizes a LangSmith evaluator function - Python code you can add directly to your test suite. It includes a representative test input derived from the original failing trace and an expected output that the patched agent should produce.
Groq-as-judge scores the original failing output and the patched expected output on a 0.0–1.0 quality scale. If the patched score is ≥10% better, the patch is auto-promoted. Both scores are written back to LangSmith as traceguard/quality_before and traceguard/quality_after_patch feedback.
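The promotion rule can be written down in a few lines. The text above does not say whether "10% better" means ten points on the 0.0–1.0 scale or a 10% relative gain; this sketch assumes the former, and `should_promote` and `min_gain` are hypothetical names.

```python
def should_promote(before: float, after: float, min_gain: float = 0.10) -> bool:
    """Promote when the patched score beats the original by at least min_gain.

    Assumption: the gain is measured in absolute points on the 0.0-1.0 scale.
    """
    for score in (before, after):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return after - before >= min_gain

promoted = should_promote(0.21, 0.82)   # gain of 0.61
rejected = should_promote(0.50, 0.55)   # gain of 0.05
```

Under this reading, a jump from 0.21 to 0.82 is promoted and a move from 0.50 to 0.55 is not.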
The React dashboard shows every failure card with its classification, root cause, linked PR URL, diff, and shadow scores. Approve squash-merges the PR on GitHub and marks the failure resolved. Reject closes the PR with a comment. All state updates are pushed live over WebSocket.
On startup, TraceGuard scans for failures stuck in classified state (e.g. from a previous container crash) and re-queues them through the patch pipeline with a staggered 10-second delay between each to stay within Groq rate limits.
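A staggered re-queue of this kind can be sketched with plain asyncio. `requeue_stuck` and `process` are hypothetical names; the real recovery job would also need to query the database for failures in the classified state.

```python
import asyncio

async def requeue_stuck(failure_ids, process, delay_seconds=10.0):
    """Start one pipeline run per failure, spacing the starts by delay_seconds."""
    tasks = []
    for index, failure_id in enumerate(failure_ids):
        if index:
            # Wait between starts so LLM calls stay under the rate limit.
            await asyncio.sleep(delay_seconds)
        tasks.append(asyncio.create_task(process(failure_id)))
    return await asyncio.gather(*tasks)
```

Spacing the start times, rather than waiting for each run to finish, lets slow runs overlap while still limiting how quickly new requests are issued.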
Every trace is mapped to one of these types. The taxonomy drives which patch strategy is applied and which evaluator is generated.
Agent or tool calling itself in a cycle with no exit condition
Agent fabricated facts, citations, or data not present in context
Wrong arguments, schema mismatch, or wrong tool chosen for task
Input exceeded the model's maximum context window limit
Agent returned empty or null output with no fallback
Response did not match the required output schema or format
Chain-of-thought broke down, leading to a wrong conclusion
Response time degraded significantly from the established baseline
External tool or API call timed out with no fallback strategy
Failure could not be mapped to any known taxonomy type
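One way to represent the ten types in code is an enum with a catch-all fallback. Only `infinite_loop` and `hallucination` appear as identifiers elsewhere on this page (as simulate hints); the other eight names below are placeholders invented for this sketch.

```python
from enum import Enum

class FailureType(str, Enum):
    INFINITE_LOOP = "infinite_loop"
    HALLUCINATION = "hallucination"
    # The remaining identifiers are assumed, not taken from the project.
    TOOL_MISUSE = "tool_misuse"
    CONTEXT_OVERFLOW = "context_overflow"
    EMPTY_OUTPUT = "empty_output"
    FORMAT_VIOLATION = "format_violation"
    REASONING_FAILURE = "reasoning_failure"
    LATENCY_REGRESSION = "latency_regression"
    TOOL_TIMEOUT = "tool_timeout"
    UNKNOWN = "unknown"

def coerce(label: str) -> FailureType:
    """Map a model-produced label to a known type, falling back to UNKNOWN."""
    try:
        return FailureType(label.strip().lower())
    except ValueError:
        return FailureType.UNKNOWN
```

Falling back to `UNKNOWN` means an unexpected label from the model still yields a valid record instead of an exception.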
Native support for LangSmith and Langfuse webhooks. A generic /ingest endpoint accepts normalized failures from Helicone, Arize, custom scripts, or any tool that can fire an HTTP POST.
Reject a PR with reviewer notes and the patch bot reads your feedback, incorporating it into a revised fix - automatically. No restarting from scratch; the context carries forward.
Uses llama-3.3-70b-versatile on Groq's free tier. All traces are truncated to stay within the 12k TPM limit. Swap the model via GROQ_MODEL env var.
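How the traces are truncated is not specified here. A simple approach, shown purely as an assumption, is to keep the head and tail of a long field and drop the middle, using a character budget as a stand-in for tokens. `truncate_middle` and its marker text are hypothetical.

```python
def truncate_middle(text: str, max_chars: int,
                    marker: str = "\n...[truncated]...\n") -> str:
    """Keep the start and end of text so the result is at most max_chars long."""
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        return text[:max_chars]  # budget too small to fit the marker
    keep = max_chars - len(marker)
    head = keep - keep // 2      # head gets the extra character when keep is odd
    tail = keep - head
    return text[:head] + marker + (text[-tail:] if tail else "")
```

Keeping both ends preserves the original input at the top of a trace and the error at the bottom, which are usually the most useful parts for classification.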
Every pipeline event - failure classified, patch generated, eval written, shadow scored - is pushed to the dashboard instantly over a persistent WebSocket connection.
Set API_KEY to protect write endpoints. /api/webhook/langsmith is intentionally open - LangSmith can't send custom headers. Frontend picks up the key from VITE_API_KEY.
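The access rule described above might be expressed like this. It is a framework-free sketch with hypothetical names (`is_authorized`, `OPEN_PATHS`); it assumes an unset API_KEY disables the check and, for brevity, treats the header name as case-sensitive.

```python
import hmac
from typing import Optional

# The LangSmith webhook stays open, as described above.
OPEN_PATHS = {"/api/webhook/langsmith"}

def is_authorized(path: str, headers: dict, api_key: Optional[str]) -> bool:
    """Return True when the request may proceed."""
    if not api_key or path in OPEN_PATHS:
        return True  # assumption: no API_KEY configured means no protection
    supplied = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), api_key.encode())
```

In a real deployment this logic would more likely sit in a request dependency or middleware than in a standalone function.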
SQLite for local dev, PostgreSQL for production. Railway's plugin injects DATABASE_URL automatically. Alembic manages schema migrations on every deploy.
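The SQLite-or-PostgreSQL choice usually comes down to one environment lookup. In this sketch the function name and the SQLite file name `traceguard.db` are assumptions.

```python
import os

def database_url(env=os.environ) -> str:
    """Use DATABASE_URL when set, else fall back to a local SQLite file."""
    url = env.get("DATABASE_URL", "").strip()
    return url or "sqlite:///./traceguard.db"  # hypothetical default path
```

On Railway the injected DATABASE_URL takes precedence, so the same code path serves both environments.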
Shadow scores are written back to LangSmith as structured feedback (traceguard/quality_before, traceguard/quality_after_patch), closing the observability loop.
Patch Bot is a proper LangGraph StateGraph - each node gets the full typed state, errors are logged individually, and the graph is compiled once and reused.
The patch bot fixes multiple files in a single clean Git commit - system prompt, tool definition, and agent runner in one PR. No piecemeal commits, no rebasing.
Pre-built image published on every version tag. No Python setup required - just docker run with your env vars and you're live in under 60 seconds.
GitHub Actions runs backend import checks and frontend TypeScript + build on every push. Docker Hub publish triggers on v* tags.
Dark-theme React 19 UI with TanStack Query and live WebSocket updates.
Added recursion_limit=10 to the agent invocation config to prevent unbounded tool call loops. Shadow score improved from 0.21 → 0.82.
Three paths depending on your use case.
# Fastest path - no clone required
docker run -d \
  -e GROQ_API_KEY=gsk_... \
  -e LANGCHAIN_API_KEY=lsv2_... \
  -e LANGCHAIN_PROJECT=traceguard-ai \
  -e GITHUB_TOKEN=ghp_... \
  -e GITHUB_REPO=your-org/your-agent-repo \
  -e DATABASE_URL=postgresql://user:pass@host:5432/db \
  -e API_KEY=your-secret \
  -e CORS_ORIGINS=http://localhost:5173 \
  -p 8000:8000 \
  sauvast/traceguard-ai:latest
curl -X POST http://localhost:8000/api/webhook/simulate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret" \
  -d '{"failure_hint": "infinite_loop"}'
# Expected: {"status":"simulated","failure_id":"..."}
# Watch Railway logs - pipeline runs in ~30 seconds
git clone https://github.com/saurabh-oss/traceguard-ai
cd traceguard-ai/backend
python3.12 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env - set GROQ_API_KEY at minimum
uvicorn app.main:app --reload --port 8000
cd frontend
npm install
npm run dev
# → http://localhost:5173
curl -X POST http://localhost:8000/api/webhook/simulate \
  -H "Content-Type: application/json" \
  -d '{"failure_hint": "hallucination"}'
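The same simulate call can be made from Python with only the standard library, which is convenient in a smoke-test script. The endpoint, header name, and payload come from the curl commands on this page; `build_simulate_request` is a hypothetical helper.

```python
import json
import urllib.request

def build_simulate_request(base_url, failure_hint, api_key=None):
    """Build a POST request for the simulate endpoint without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # only needed when API_KEY is set
    body = json.dumps({"failure_hint": failure_hint}).encode()
    return urllib.request.Request(
        f"{base_url}/api/webhook/simulate",
        data=body,
        headers=headers,
        method="POST",
    )

req = build_simulate_request("http://localhost:8000", "hallucination")
# To send it against a running backend:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Separating request construction from sending keeps the helper easy to check without a live server.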
Railway → New Project → Deploy from GitHub → select traceguard-ai. Add a PostgreSQL plugin - Railway injects DATABASE_URL automatically.
| Variable | Value | Required? |
|---|---|---|
| GROQ_API_KEY | Your Groq key from console.groq.com | required |
| LANGCHAIN_API_KEY | Your LangSmith key | required |
| LANGCHAIN_PROJECT | traceguard-ai | required |
| GITHUB_TOKEN | GitHub PAT with repo scope | for PRs |
| GITHUB_REPO | your-org/your-agent-repo | for PRs |
| API_KEY | Strong random secret | recommended |
| CORS_ORIGINS | Your Vercel frontend URL | recommended |
| SECRET_KEY | openssl rand -hex 32 | recommended |
Vercel → Add New Project → import traceguard-ai. Set Root Directory to frontend. Add two env vars:
VITE_API_URL=https://your-railway-url.up.railway.app
VITE_API_KEY=same-value-as-API_KEY-in-railway
LangSmith → your project → Settings → Webhooks → Add Webhook:
URL: https://your-railway-url.up.railway.app/api/webhook/langsmith
Trigger: Run Failed
Every failed run in your LangSmith project now flows into TraceGuard automatically.
Full source - FastAPI backend, LangGraph agents, React 19 frontend, Docker + Railway config.
Companion repo with 5 failure scenarios you can fire against your TraceGuard instance to see the full pipeline in action.
Pre-built image. Run the full backend with a single docker run - no Python environment needed.
How to set up a local dev environment, add new failure types, and submit PRs.
Free tier. No credit card required. Fast enough to run the full pipeline in under 30 seconds per failure.
Connect your existing LangSmith project to TraceGuard via webhook. Every failed run becomes a tracked failure automatically.