Production-Ready LLM Optimization

A closed-loop system for building and automatically optimizing LangChain RAG applications: prompt optimization with GEPA, workflow optimization with MEGA.


Three-Layer Architecture

Build (LangChain: Retrieval + Generation) → Measure (MLflow: Tracing + Metrics) → Optimize (GEPA + MEGA: Prompts + Workflows)

Repeat → Continuous Improvement

Core Features

GEPA Optimization

Evolutionary prompt optimization using LLM reflection. Generates prompt variants, tests against your evaluation set, and keeps the Pareto frontier of best candidates.
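
The "keep the Pareto frontier" step can be sketched in a few lines of Python. This is a minimal illustration, not GEPA's or this project's actual implementation: the candidate names, the per-example scores, and the `dominates` and `pareto_frontier` helpers are all hypothetical. The assumption is that a candidate stays on the frontier unless another candidate scores at least as well on every evaluation example and strictly better on at least one.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """True if score vector a is >= b on every example and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
    ]


# Hypothetical per-example scores (one entry per evaluation example).
scores = {
    "baseline": [0.6, 0.5, 0.4],
    "variant_a": [0.9, 0.5, 0.4],  # better than baseline on example 1
    "variant_b": [0.6, 0.5, 0.8],  # better than baseline on example 3
    "variant_c": [0.5, 0.4, 0.3],  # worse than baseline everywhere
}

frontier = pareto_frontier(scores)
print(frontier)
```

Keeping every non-dominated candidate, rather than only the one with the highest average, preserves variants that win on different examples, so later rounds can build on either.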

MEGA Workflow

Optimize workflow structure and routing decisions. Tests different tool selection strategies, retrieval intensity, and agent decision logic.
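
As a rough illustration of what "testing workflow variants" could look like, the sketch below enumerates a small grid of workflow configurations and picks the highest-scoring one. Everything here is an assumption made for illustration: the `WorkflowConfig` fields, the router names, and the `toy_score` function (a stand-in for running the evaluation set, not a real measurement) are hypothetical and do not describe MEGA's internals.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass(frozen=True)
class WorkflowConfig:
    top_k: int    # retrieval intensity: number of chunks to fetch
    rerank: bool  # whether retrieved chunks are reranked
    router: str   # hypothetical routing strategy name


def enumerate_variants() -> List[WorkflowConfig]:
    """Build the full grid of workflow variants to test."""
    return [
        WorkflowConfig(top_k=k, rerank=r, router=s)
        for k, r, s in product(
            [2, 4, 8], [False, True], ["always_retrieve", "retrieve_if_unsure"]
        )
    ]


def toy_score(cfg: WorkflowConfig) -> int:
    """Stand-in for an evaluation run: made-up quality points minus a cost penalty."""
    quality = {2: 60, 4: 70, 8: 72}[cfg.top_k]
    quality += 5 if cfg.rerank else 0
    quality += 3 if cfg.router == "retrieve_if_unsure" else 0
    return quality - 2 * cfg.top_k  # more retrieved chunks cost more


def best_variant(
    variants: List[WorkflowConfig], score_fn: Callable[[WorkflowConfig], int]
) -> WorkflowConfig:
    """Pick the variant with the highest score."""
    return max(variants, key=score_fn)


variants = enumerate_variants()
best = best_variant(variants, toy_score)
print(len(variants), best)
```

In a real loop, `toy_score` would be replaced by a function that runs each variant against the shared evaluation set.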

MLflow Integration

Automatic tracing of all executions. Track metrics, versions, and parameters. Full observability into what your system is doing.

Shared Eval Harness

A single evaluation set drives both GEPA and MEGA optimization, keeping scores comparable across both optimization layers.
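
One way to picture a shared harness is a single `evaluate` function that both optimizers call with the same examples. The sketch below assumes a very simple metric (does the answer contain the expected string?); the example questions, the `EVAL_SET` contents, and the two stand-in candidate functions are hypothetical and are not taken from the repository.

```python
from typing import Callable, Dict, List

EvalExample = Dict[str, str]

# Hypothetical evaluation set; a real one would be loaded from a file.
EVAL_SET: List[EvalExample] = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Which plan includes SSO?", "expected": "Enterprise"},
]


def evaluate(
    answer_fn: Callable[[str], str], eval_set: List[EvalExample] = EVAL_SET
) -> float:
    """Fraction of examples whose answer contains the expected string."""
    hits = sum(
        1
        for ex in eval_set
        if ex["expected"].lower() in answer_fn(ex["question"]).lower()
    )
    return hits / len(eval_set)


# Stand-ins for the app under a candidate prompt and a candidate workflow.
def prompt_candidate(question: str) -> str:
    if "refund" in question:
        return "Refunds are accepted within 30 days."
    return "I don't know."


def workflow_candidate(question: str) -> str:
    answers = {
        "refund": "You have 30 days.",
        "SSO": "SSO is part of the Enterprise plan.",
    }
    return next((a for key, a in answers.items() if key in question), "I don't know.")


prompt_score = evaluate(prompt_candidate)
workflow_score = evaluate(workflow_candidate)
print(prompt_score, workflow_score)
```

Because both candidates are scored by the same function on the same examples, a gain reported by one optimizer is directly comparable to a gain reported by the other.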

Groq Integration

Fast, cost-effective inference with Groq. Well suited to iterative optimization loops. Easy to swap for OpenAI, Anthropic, or another LLM provider.

Production Ready

Clean architecture, comprehensive documentation, and battle-tested patterns. Deploy to production with confidence.

Two Optimization Layers

🎯 GEPA: What Your Agent Says

Optimize system prompts
Improve answer templates
Better tone & formatting
Reduce hallucinations
Up to 35× fewer rollouts than RL (GRPO), as reported by the GEPA authors

🔄 MEGA: How Your Agent Works

Optimize routing decisions
Tool selection & ordering
Retrieval strategy tuning
Block-level scoring
Workflow structure evolution

See It In Action

Terminal: Baseline RAG App

Baseline Application

Run your LangChain RAG app. Ask questions, observe behavior. This is what we'll optimize.

Terminal: GEPA Optimization

GEPA Prompt Optimization

Watch GEPA generate variants, test them, and converge on the best prompt. 14% improvement.

Terminal: MEGA Workflow Opt

MEGA Workflow Optimization

MEGA tests workflow variants and finds optimal routing. 25% improvement through better structure.

Metrics: Before → After

Results Comparison

Side-by-side metrics: Accuracy +17%, Groundedness +13%, Hallucinations -83%.

MLflow: Traces & Experiments

MLflow Dashboard

Full observability into traces, metrics, and prompt versions. Track everything.

Architecture: 3-Layer System

Production Architecture

Clean separation: Build, Measure, Optimize. Scales from prototypes to production.

Perfect For

Support Chatbots

Answer questions from knowledge bases. Improve accuracy and reduce hallucinations systematically.

Internal Assistants

HR, IT, compliance bots. Help employees find information quickly and accurately.

Developer Copilots

Code documentation assistants. Provide better suggestions through continuous optimization.

Tool-Using Agents

Complex workflows with retrieval. Optimize both what agents say and how they route.

Quick Start

Get Up and Running in 5 Minutes

# Clone the repo
git clone https://github.com/saurabh-oss/gepa-langchain-lab
cd gepa-langchain-lab

# Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

# Add your GROQ_API_KEY to .env

# Start the MLflow UI (run in a separate terminal; this command blocks)
mlflow ui --host 0.0.0.0 --port 5000

# Run the optimization pipeline
python src/app.py            # Baseline
python src/eval.py           # Evaluate
python src/optimize.py       # GEPA prompt optimization
python src/optimize_mega.py  # MEGA workflow optimization

That's it! Your system is now optimized. Both prompts and workflows have been automatically improved based on your evaluation set.
