Agents that optimize themselves - automatically
Tune every hyperparameter in your agent stack - prompts, model selection, temperature, retrieval config, tool descriptions, few-shot examples - using evolutionary algorithms. GEPA (ICLR 2026) outperforms RL by 6% with 35x fewer rollouts. No fine-tuning. No weight changes.
Agent Optimization
Self-optimizing agents - not manual prompt tweaking
System prompts, few-shot examples, tool descriptions, temperature, top_p, model selection, retrieval parameters (chunk size, top_k, reranker config), routing logic, output format - every hyperparameter that shapes your agent's behavior. The optimizer explores the full configuration space, not a single text field. One run tunes the whole agent stack.
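As an illustration, a configuration space like the one described above can be represented as a plain mapping from parameter names to candidate values. This is a minimal Python sketch - the parameter names and values are hypothetical examples, not the product's actual schema:

```python
import random

# Hypothetical agent config space (illustrative only - not a fixed SDK schema).
CONFIG_SPACE = {
    "model": ["gpt-4o", "gpt-4o-mini"],
    "temperature": [0.0, 0.3, 0.7, 1.0],
    "top_p": [0.9, 0.95, 1.0],
    "chunk_size": [256, 512, 1024],
    "retrieval_top_k": [3, 5, 10],
    "system_prompt": [
        "You are a concise support agent.",
        "You are a thorough support agent. Cite retrieved sources.",
    ],
}

def sample_config(space, rng=random):
    """Draw one candidate agent configuration from the space."""
    return {param: rng.choice(values) for param, values in space.items()}

candidate = sample_config(CONFIG_SPACE)
```

Every optimizer then works over candidates of this shape - the search covers retrieval and generation parameters together, not a single prompt string.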
See agent config space
Genetic-Pareto (GEPA) evolutionary optimization uses natural-language reflection - not scalar reward gradients - to evolve agent configurations over generations. Accepted at ICLR 2026 as an oral presentation. Outperforms GRPO by 6% on average (up to 19pp) while using 35x fewer rollouts. A multi-objective Pareto frontier balances accuracy, safety, cost, and latency simultaneously.
Read the GEPA paper
Random Search for quick baselines. Bayesian Search (Optuna-backed) for intelligent hyperparameter tuning. ProTeGi for textual gradients that identify failure patterns. Meta-Prompt for teacher-model rewriting. PromptWizard for mutation-critique-refinement pipelines. GEPA for multi-objective evolutionary search across the full config space. From 10 trials to 500 - you control the compute.
Compare optimizers
Every candidate configuration is scored by real evaluation metrics - faithfulness, toxicity, function call accuracy, instruction adherence, BLEU, ROUGE, embedding similarity, LLM-as-judge with custom rubrics, or 50+ pre-built templates. Compose multiple metrics into a single optimization objective. The optimizer improves what you measure, not what you hope for.
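Composing several metrics into a single objective can be as simple as a weighted sum - a minimal sketch, assuming each metric already returns a normalized score in [0, 1] (the metric names and weights here are invented for illustration):

```python
def composite_score(scores, weights):
    """Combine per-metric scores (each in [0, 1]) into one objective.
    Weights are normalized so the composite also lands in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[metric] * w for metric, w in weights.items()) / total

# Hypothetical scores for one candidate configuration.
scores = {"faithfulness": 0.82, "instruction_adherence": 0.91, "toxicity_free": 0.99}
weights = {"faithfulness": 2.0, "instruction_adherence": 1.0, "toxicity_free": 1.0}
objective = composite_score(scores, weights)
```

Weighting faithfulness double, as above, is one way to tell the optimizer which metric matters most when candidates disagree.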
Explore evaluation templates
Optimize the full agent for any use case
Tune a RAG agent end-to-end
Optimize the system prompt, chunk size, top_k retrieval count, reranker threshold, and temperature together. The optimizer finds configurations where retrieval and generation work in concert - not prompt-only fixes that ignore retrieval quality.
Optimize tool-calling agents
Tool descriptions, argument schemas, routing logic, and the system prompt that orchestrates them - optimize the full tool-calling stack. Score with function call accuracy and argument correctness across your entire tool suite.
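A sketch of what scoring tool calls might look like - the record format (`tool`, `args` fields) and the two-part score are assumptions for illustration, not the platform's fixed metric:

```python
def function_call_accuracy(predicted, gold):
    """Score tool-calling outputs against gold labels: exact tool-name
    matches, and separately the fraction of gold arguments reproduced."""
    name_hits, arg_hits = 0, 0.0
    for p, g in zip(predicted, gold):
        name_hits += p["tool"] == g["tool"]
        if g["args"]:
            arg_hits += sum(p["args"].get(k) == v for k, v in g["args"].items()) / len(g["args"])
        else:
            arg_hits += 1.0  # no required arguments: trivially correct
    n = len(gold)
    return {"tool_accuracy": name_hits / n, "argument_accuracy": arg_hits / n}

# Hypothetical eval pair: both tool names right, one argument wrong.
predicted = [
    {"tool": "search_docs",  "args": {"query": "refund policy"}},
    {"tool": "lookup_order", "args": {"order_id": "A-113"}},
]
gold = [
    {"tool": "search_docs",  "args": {"query": "refund policy"}},
    {"tool": "lookup_order", "args": {"order_id": "A-114"}},
]
result = function_call_accuracy(predicted, gold)
```

Splitting name accuracy from argument accuracy makes it visible whether the agent is picking the wrong tool or calling the right tool badly.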
Same quality, cheaper model
Jointly optimize the prompt + temperature + few-shot selection so GPT-4o-mini matches GPT-4o quality. The optimizer searches for configurations that close the gap - not just prompt tweaks, but the full parameter set that makes the cheaper model perform.
Balance safety and helpfulness
Safety guardrails often hurt helpfulness. GEPA's multi-objective Pareto optimization finds agent configurations that maximize both simultaneously - safety constraints, temperature, response length, and prompt tone tuned together. The Pareto frontier shows you exactly where the trade-offs are.
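The Pareto logic itself is simple to state: keep every configuration that no other configuration beats on all objectives at once. A minimal sketch with made-up safety/helpfulness scores:

```python
def pareto_frontier(candidates, objectives=("safety", "helpfulness")):
    """Keep non-dominated configs (higher is better on every objective):
    a config survives unless some other config is >= on all objectives
    and strictly > on at least one."""
    def dominates(a, b):
        return (all(a[o] >= b[o] for o in objectives)
                and any(a[o] > b[o] for o in objectives))
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Invented scores: "loose" is dominated by "balanced" on both axes.
candidates = [
    {"name": "strict",   "safety": 0.98, "helpfulness": 0.70},
    {"name": "balanced", "safety": 0.93, "helpfulness": 0.85},
    {"name": "loose",    "safety": 0.80, "helpfulness": 0.84},
]
front = pareto_frontier(candidates)
```

The frontier here keeps both "strict" and "balanced" - each wins on a different objective - and drops "loose", which loses on both. That surviving set is exactly the trade-off curve you pick your deployment point from.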
Maximize conversion and CSAT
Sales and support agents need to convert - but you can't fine-tune GPT-4o. Optimize the full agent config for conversion rate, CSAT, or deal value. The improvement ships as a config change - model selection, temperature, prompt, and few-shot examples all tuned together.
Per-language agent optimization
Multilingual agents need different configurations per language - not just different prompts, but different temperature, few-shot examples, and even model selection. Run independent optimization per locale with language-specific evaluation datasets.
From baseline agent to optimized config in three steps
Define your agent config space
Specify which parameters to optimize - system prompt, few-shot examples, temperature, top_p, model, tool descriptions, retrieval settings, or any custom hyperparameter. Choose an optimizer (Random, Bayesian, ProTeGi, Meta-Prompt, PromptWizard, GEPA) and set your scoring metrics.
Run optimization on your dataset
The optimizer explores your config space over generations - mutating, reflecting, critiquing, and selecting the best agent configurations. GEPA converges in 100-500 evaluations, not 25,000. Multi-objective Pareto search balances competing goals automatically.
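The mutate-score-select loop can be sketched in a few lines. This toy stand-in uses random numeric mutation rather than GEPA's natural-language reflection, and the "optimum" at temperature 0.3 is invented purely for illustration:

```python
import random

def evolve(initial, mutate, evaluate, generations=10, population=8, keep=2, seed=0):
    """Toy evolutionary loop: fill the pool with mutants of survivors,
    score every candidate, keep the top performers, repeat."""
    rng = random.Random(seed)
    pool = [initial]
    for _ in range(generations):
        pool += [mutate(rng.choice(pool), rng) for _ in range(population - len(pool))]
        pool.sort(key=evaluate, reverse=True)
        pool = pool[:keep]  # survivors seed the next generation
    return pool[0]

# Toy objective: prefer temperatures near a hypothetical optimum of 0.3.
best = evolve(
    initial={"temperature": 1.0},
    mutate=lambda c, rng: {"temperature": min(1.0, max(0.0, c["temperature"] + rng.gauss(0, 0.1)))},
    evaluate=lambda c: -abs(c["temperature"] - 0.3),
)
```

Because survivors are carried into each generation's scoring, the best candidate never regresses - the loop only spends evaluations to improve on it.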
Deploy the optimized agent
The result is a better agent config - not a model checkpoint. Export as JSON/YAML, push to your deployment pipeline, or A/B test against the original in Experiments. No weight changes - works with any LLM provider, any model.
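For example, a winning configuration serializes to plain JSON that any deployment pipeline can consume - the field names below are illustrative, not a fixed export schema:

```python
import json

# Hypothetical winning configuration from an optimization run.
best_config = {
    "model": "gpt-4o-mini",
    "temperature": 0.3,
    "system_prompt": "You are a concise support agent. Cite retrieved sources.",
    "retrieval": {"chunk_size": 512, "top_k": 5},
}

# Serialize for deployment; the same artifact can back an A/B test
# against the unoptimized baseline.
exported = json.dumps(best_config, indent=2, sort_keys=True)
restored = json.loads(exported)
```

Because the artifact is data rather than weights, rolling back is a one-line config swap, and the same export works across LLM providers.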
Powering teams from prototype to production
From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.