Agents that optimize
themselves - automatically

Tune every hyperparameter in your agent stack - prompts, model selection, temperature, retrieval config, tool descriptions, few-shot examples - using evolutionary algorithms. GEPA (ICLR 2026) outperforms RL by 6% with 35x fewer rollouts. No fine-tuning. No weight changes.

Agent Optimization

Converged
billing-support-agent | GEPA · 8 generations
Overall Score: 0.91 (from 0.72, +26%)
Evaluations: 342 rollouts
Params Tuned: 12 hyperparams
Parameter | Before | After | Impact
System Prompt (Instruction · Persona) | v1 · 847 tokens | v8 · 623 tokens | +19%
Temperature (Sampling · Creativity) | 0.7 | 0.42 | +8%
Few-shot Examples (Selection · Ordering) | 3 static | 5 optimized | +11%
Retrieval top_k (RAG · Context Window) | 10 | 6 | +5%
Tool Descriptions (Function Calling · Args) | 4 tools · manual | 4 tools · refined | +7%
Model (Provider · Cost) | gpt-4o | gpt-4o-mini | −$$$

GEPA reflection - Generation 6 → 7

Reflection: context noise (mutated config is Pareto-dominant)
"top_k=10 retrieves 4 irrelevant chunks that dilute answer quality. Reducing to 6 with reranker threshold 0.7 removes noise without losing relevant context."
Mutation: top_k 10→6, reranker_threshold 0.5→0.7 - faithfulness +0.05, relevancy +0.03

Reflection: temperature interaction (+0.08 composite)
"Lower temperature (0.42) works synergistically with tighter retrieval - fewer retrieved chunks + less sampling randomness = more grounded answers."
12 params optimized | 342 rollouts · 8 gen | Pareto: 3 solutions
Core Features

Self-optimizing agents -
not manual prompt tweaking

GEPA Engine
evolving
GEPA - Genetic-Pareto evolutionary optimization (ICLR 2026). Generation evolution: G0 (seed) → G1 → G2 → ··· → G8 (best, +6% vs GRPO). Reflection cycle: "The prompt lacks specificity in edge cases - add guard clauses for ambiguous input" → mutation: "When user input is ambiguous, request clarification first." Multi-objective Pareto frontier over accuracy and safety. 9 generations · 14 candidates · multi-objective Pareto selection.
Algorithm Selection
6 strategies
Optimization algorithms - select strategy:
Random - quick baseline sampling (basic)
Bayesian - intelligent surrogate search (efficient)
ProTeGi - textual gradient descent (gradient)
Meta-Prompt - LLM self-improvement loop (meta)
PromptWizard - critique + synthesis chain (synthetic)
GEPA - evolutionary + reflection (state of the art)
Convergence comparison: score vs. iterations across all six. 6 algorithms · configurable per task · auto-selection available.
LLM Integration
universal
Any LLM - zero weight changes. Supported providers: GPT-4o, Claude, Gemini, Llama, Mistral + more. Optimization pipeline: input prompt ("You are a helpful assistant that...", v1.0 baseline) → agent-opt (select algorithm, run evaluation loop, evolve & converge; API-only, no fine-tuning) → output prompt ("You are a precise AI that validates...", v8.0 optimized). Key properties: no weight changes, prompt-level only. $ pip install agent-opt - Python SDK · CLI · REST API. Powered by LiteLLM proxy: streaming, caching, fallbacks. 5+ providers · 100+ models.
Evaluation Engine
scoring
Evaluation-driven optimization. Scoring functions: Heuristic (BLEU, ROUGE, embedding - token-level metrics), LLM-as-Judge (custom rubric, pairwise, Likert - rubric-based scoring), Platform (50+ pre-built templates: faithfulness, toxicity, relevance, fluency, coherence + 45 more). Objective composition: faithfulness (w=0.4) + toxicity (w=0.3) + relevance (w=0.3) → objective(x); optimize(prompt) to maximize the objective. Score trajectory: G0 0.58 → G1 0.64 → G2 0.69 → G3 0.72 → G4 0.74 → G5 0.79 → G6 0.83 → G7 0.87 → G8 0.91. 3 scorer types · 50+ templates · weighted composition · auto-converge.

System prompts, few-shot examples, tool descriptions, temperature, top_p, model selection, retrieval parameters (chunk size, top_k, reranker config), routing logic, output format - every hyperparameter that shapes your agent's behavior. The optimizer explores the full configuration space, not a single text field. One run tunes the whole agent stack.
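In code, a config space like this is just a mapping from parameter names to candidate values. A minimal illustrative sketch in plain Python (not the agent-opt SDK API; all names and values are hypothetical):

```python
import random

# Illustrative config space: each tunable agent hyperparameter maps to
# its candidate values. Real spaces can also hold continuous ranges.
CONFIG_SPACE = {
    "system_prompt":   ["v1_verbose", "v2_concise"],   # prompt variants
    "temperature":     [0.2, 0.42, 0.7],               # sampling temperature
    "model":           ["gpt-4o", "gpt-4o-mini"],      # model selection
    "retrieval_top_k": [4, 6, 10],                     # RAG chunks retrieved
    "num_fewshot":     [3, 5],                         # few-shot examples
}

def sample_config(space, rng):
    """Draw one complete agent configuration from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

rng = random.Random(0)
config = sample_config(CONFIG_SPACE, rng)
```

Every candidate the optimizer evaluates is one such complete configuration, so a single run explores prompts, sampling, retrieval, and model choice together rather than one text field in isolation.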

See agent config space

Genetic-Pareto evolutionary optimization (GEPA) uses natural language reflection - not scalar reward gradients - to evolve agent configurations over generations. Accepted at ICLR 2026 as an oral presentation, it outperforms GRPO by 6% on average (up to 19pp) while using 35x fewer rollouts. A multi-objective Pareto frontier balances accuracy, safety, cost, and latency simultaneously.
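The Pareto-frontier selection GEPA relies on can be sketched in a few lines: a candidate survives if no other candidate beats it on every objective at once. A minimal illustration over (accuracy, safety) score tuples (the values are illustrative, not taken from the paper):

```python
def dominates(a, b):
    """a Pareto-dominates b if a >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

scores = [(0.91, 0.80), (0.85, 0.92), (0.88, 0.88), (0.70, 0.75)]
frontier = pareto_frontier(scores)
# (0.70, 0.75) is beaten on both objectives by (0.88, 0.88), so it is dropped;
# the other three each win on at least one objective and all survive.
```

Selecting from the frontier rather than a single scalar score is what lets the optimizer hold accuracy, safety, cost, and latency in tension instead of collapsing them prematurely.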

Read the GEPA paper

Random Search for quick baselines. Bayesian Search (Optuna-backed) for intelligent hyperparameter tuning. ProTeGi for textual gradients that identify failure patterns. Meta-Prompt for teacher-model rewriting. PromptWizard for mutation-critique-refinement pipelines. GEPA for multi-objective evolutionary search across the full config space. From 10 trials to 500 - you control the compute.
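As a reference point for the simplest of these strategies, here is a minimal random-search baseline in plain Python (the scorer is a stub standing in for a real evaluation run; none of this is the agent-opt API):

```python
import random

def random_search(space, score_fn, n_trials=50, seed=0):
    """Baseline optimizer: sample n_trials configs, keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {k: rng.choice(v) for k, v in space.items()}
        s = score_fn(config)
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score

# Stub scorer standing in for a real evaluation loop: rewards a temperature
# near 0.4 and a top_k of 6. With 50 trials over this 9-point grid the
# optimum is found with high probability.
def score_fn(config):
    return (1.0 - abs(config["temperature"] - 0.4)) + (0.1 if config["top_k"] == 6 else 0.0)

space = {"temperature": [0.2, 0.4, 0.7], "top_k": [4, 6, 10]}
best, score = random_search(space, score_fn)
```

The smarter strategies listed above differ only in how the next candidate is proposed - surrogate models (Bayesian), textual gradients (ProTeGi), or reflective mutation (GEPA) - while the evaluate-and-select loop stays the same.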

Compare optimizers

Every candidate configuration is scored by real evaluation metrics - faithfulness, toxicity, function call accuracy, instruction adherence, BLEU, ROUGE, embedding similarity, LLM-as-judge with custom rubrics, or 50+ pre-built templates. Compose multiple metrics into a single optimization objective. The optimizer improves what you measure, not what you hope for.
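Composing metrics into one objective is a weighted sum, with penalty-style metrics such as toxicity inverted so that lower raw toxicity scores higher. A minimal sketch using the example weights from the card above (w=0.4 / 0.3 / 0.3; the inversion rule is an assumption, real composers may handle penalties differently):

```python
WEIGHTS = {"faithfulness": 0.4, "toxicity": 0.3, "relevance": 0.3}

def objective(metrics, weights=WEIGHTS):
    """Weighted composite score; toxicity is inverted (lower toxicity is better)."""
    return (weights["faithfulness"] * metrics["faithfulness"]
            + weights["toxicity"] * (1.0 - metrics["toxicity"])
            + weights["relevance"] * metrics["relevance"])

score = objective({"faithfulness": 0.9, "toxicity": 0.05, "relevance": 0.8})
# 0.4*0.9 + 0.3*0.95 + 0.3*0.8 ≈ 0.885
```

Because the optimizer maximizes exactly this function, the weights are where you encode what "better" means for your agent.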

Explore evaluation templates
Use Cases

Optimize the full agent
for any use case

RAG pipeline → feedback-driven optimization: Query → Retriever → Context → LLM → Answer, with an optimization feedback loop; faithfulness 0.71 → 0.93.

Tune a RAG agent end-to-end

Optimize the system prompt, chunk size, top_k retrieval count, reranker threshold, and temperature together. The optimizer finds configurations where retrieval and generation work in concert - not prompt-only fixes that ignore retrieval quality.

GEPA · Faithfulness · RAG Config
Refine tool descriptions → higher accuracy: tool definition (name: search_db; params: query: string, limit: int) → optimize → refined description (+ context, constraints); accuracy 62% → 94%.

Optimize tool-calling agents

Tool descriptions, argument schemas, routing logic, and the system prompt that orchestrates them - optimize the full tool-calling stack. Score with function call accuracy and argument correctness across your entire tool suite.
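A function-call accuracy scorer can be as simple as exact-match over tool name and arguments. An illustrative sketch (the `search_db` schema follows the diagram above; the strict matching rule is an assumption - production scorers may compare arguments more leniently):

```python
def call_accuracy(predicted, expected):
    """Fraction of expected tool calls matched exactly (name + arguments)."""
    if not expected:
        return 1.0  # nothing to call and nothing called: perfect score
    hits = sum(1 for p, e in zip(predicted, expected)
               if p["name"] == e["name"] and p["args"] == e["args"])
    return hits / len(expected)

expected  = [{"name": "search_db", "args": {"query": "refund policy", "limit": 5}}]
predicted = [{"name": "search_db", "args": {"query": "refund policy", "limit": 5}}]
acc = call_accuracy(predicted, expected)  # 1.0
```

Scoring every candidate config against a suite of such expected calls is what lets the optimizer tune tool descriptions and schemas toward measurably correct invocations.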

ProTeGi · Function Call Accuracy
Same quality, fraction of the cost: GPT-4o ($$$ / 1K tokens) → optimize → GPT-4o-mini ($ / 1K tokens); quality gap closed before → after.

Same quality, cheaper model

Jointly optimize the prompt + temperature + few-shot selection so GPT-4o-mini matches GPT-4o quality. The optimizer searches for configurations that close the gap - not just prompt tweaks, but the full parameter set that makes the cheaper model perform.

Bayesian Search · Cost + Quality
Pareto frontier: safety vs. helpfulness - optimal frontier across the helpfulness/safety plane.

Balance safety and helpfulness

Safety guardrails often hurt helpfulness. GEPA's multi-objective Pareto optimization finds agent configurations that maximize both simultaneously - safety constraints, temperature, response length, and prompt tone tuned together. The Pareto frontier shows you exactly where the trade-offs are.

GEPA · Pareto Frontier
Optimize prompt → maximize conversion rate: prompt variant v3 (optimized); conversion rate 2.1% (iter 1) → 3.4% (iter 2) → 4.8% (iter 3); revenue uplift across 3 iterations.

Maximize conversion and CSAT

Sales and support agents need to convert - but you can't fine-tune GPT-4o. Optimize the full agent config for conversion rate, CSAT, or deal value. Improvement ships as a config change - model selection, temperature, prompt, and few-shot examples all tuned together.

Meta-Prompt · Custom Rewards
Per-language prompt optimization: independent optimization per locale - EN 0.94, ES 0.88, DE 0.85, JA 0.82, each tuned separately.

Per-language agent optimization

Multilingual agents need different configurations per language - not just different prompts, but different temperature, few-shot examples, and even model selection. Run independent optimization per locale with language-specific evaluation datasets.

PromptWizard · Datasets
How It Works

From baseline agent to
optimized config in three steps

01

Define your agent config space

Specify which parameters to optimize - system prompt, few-shot examples, temperature, top_p, model, tool descriptions, retrieval settings, or any custom hyperparameter. Choose an optimizer (Random, Bayesian, ProTeGi, Meta-Prompt, PromptWizard, GEPA) and set your scoring metrics.

Configure Optimizer
Optimization algorithm: Random · Bayesian · ProTeGi · Meta-Prompt · PromptWizard · GEPA
Scoring functions: Faithfulness · Toxicity · Relevance · Instruction Adherence · Safety (multi-objective Pareto optimization)
Selected: 3 metrics · GEPA · multi-objective

config:
  algorithm: GEPA
  scoring: [faithfulness, relevance, adherence]
  mode: pareto_multi_objective
02

Run optimization on your dataset

The optimizer explores your config space over generations - mutating, reflecting, critiquing, and selecting the best agent configurations. GEPA converges in 100-500 evaluations, not 25,000. Multi-objective Pareto search balances competing goals automatically.
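A typical convergence rule stops the run once generation-over-generation gains stay below a threshold. A minimal sketch against the score trajectory shown in the card below (the patience/threshold rule is illustrative, not the optimizer's actual stopping criterion):

```python
def converged(scores, patience=2, min_delta=0.015):
    """Stop when the last `patience` generation-over-generation gains
    are each below min_delta."""
    if len(scores) <= patience:
        return False
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    return all(d < min_delta for d in deltas[-patience:])

# Per-generation best scores, G0 through G8 (from the card below's trajectory).
trajectory = [0.74, 0.77, 0.80, 0.83, 0.85, 0.87, 0.89, 0.90, 0.91]
done = converged(trajectory)  # the last two gains are ~0.01 each → True
```

Stopping on a plateau like this is how a run settles at a few hundred rollouts instead of burning the full evaluation budget.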

Optimization Running
Generation 8/8 - Converged
Generation progress (convergence): G0 0.74 → G1 0.77 → G2 0.80 → G3 0.83 → G4 0.85 → G5 0.87 → G6 0.89 → G7 0.90 → G8 0.91. Total rollouts: 342. Reflection (G5 → G6): "Add explicit chain-of-thought before answering to improve faithfulness." Status: converged.
03

Deploy the optimized agent

The result is a better agent config - not a model checkpoint. Export as JSON/YAML, push to your deployment pipeline, or A/B test against the original in Experiments. No weight changes - works with any LLM provider, any model.
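Exporting the result is just serializing the winning config. A sketch using values from the dashboard above (the JSON schema is illustrative, not a fixed export format):

```python
import json

# Optimized agent config assembled from the winning candidate
# (values mirror the dashboard example above; schema is illustrative).
optimized = {
    "version": "v8",
    "model": "gpt-4o-mini",
    "temperature": 0.42,
    "retrieval": {"top_k": 6, "reranker_threshold": 0.7},
    "system_prompt": "You are a precise research assistant. ...",
}

payload = json.dumps(optimized, indent=2, sort_keys=True)
restored = json.loads(payload)  # round-trips cleanly for a deployment pipeline
```

Because the artifact is plain data rather than a model checkpoint, it drops into version control, CI, and A/B experiments like any other config change.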

Deploy Optimized Prompt
Prompt comparison:
Before (score 0.74): "You are a helpful assistant. Answer the user's question based on the provided context. Be concise and accurate."
After, optimized (score 0.91): "You are a precise research assistant. First, reason step-by-step about the query. Then provide a faithful answer using only the provided context. Cite sources inline."
Deployment options: copy to codebase (paste into your app code) · push to prompt manager (version-controlled prompts) · A/B test in Experiments. Works with any LLM: OpenAI, Anthropic, Google, Meta, Mistral, ...

Powering teams from
prototype to production

From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.