At Future AGI, we are committed to building the next generation of evaluation-first AI systems. April was a significant month for us, packed with exciting new features, vibrant community events, and serious engineering wins. Let’s dive into everything we shipped, celebrated, and discovered this month.
✅ Product Updates
Launched Compare Data - A New Standard for LLM Comparison
Comparing model outputs across different experiments has always been a tedious and manual task for AI engineers. Without standardized tools, teams are forced to rely on spreadsheets, screenshots, and subjective assessments to determine which model or prompt performed better. This approach not only slows down iteration cycles but also introduces inconsistencies and biases in model selection.
Future AGI's Compare Data is designed to make LLM comparisons structured, visual, and lightning-fast, enabling:
Side-by-side output comparisons across models and prompts
Prompt-level breakdowns and behavior diagnostics
Faster iteration cycles with clearer decisions
Visual summaries that surface patterns without the noise
Users can zoom out for high-level summaries across datasets or zoom in to perform detailed prompt-level comparisons. This structured comparison eliminates subjectivity, provides granular visibility into model behavior shifts, and enables faster, data-backed decision-making.
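To make the idea concrete, here is a minimal sketch, not the Compare Data product itself, of lining up two models' outputs on the same prompts so they can be reviewed side by side instead of across spreadsheets and screenshots (the model names and outputs below are illustrative):

```python
# A minimal sketch, not the Compare Data product: align two runs on the same
# prompts so every row shows both candidates for the same input.
import pandas as pd

run_a = pd.DataFrame(
    {"prompt": ["Summarise the refund policy", "Draft a greeting"],
     "output": ["Refunds within 14 days.", "Hello! How can I help?"]}
)
run_b = pd.DataFrame(
    {"prompt": ["Summarise the refund policy", "Draft a greeting"],
     "output": ["Refunds are issued within 14 business days.", "Hi there, how may I assist you today?"]}
)

# Join on the prompt so each row pairs the two models' answers side by side.
comparison = run_a.merge(run_b, on="prompt", suffixes=("_model_a", "_model_b"))
print(comparison.to_string(index=False))
```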
To compare your LLM models against the best in the world, click here!
Launched Knowledge Base Integration - for Reliable Synthetic Data
Traditional synthetic data generation often lacks grounding in real organizational context, leading to hallucinated outputs that are unusable in high-stakes environments like finance, healthcare, and legal. Organizations building evaluation sets or fine-tuning models need a way to create synthetic data that reflects their real-world knowledge and domain-specific language.
We introduced the Knowledge Base-powered Synthetic Data Generation feature to directly solve this gap.
With this capability:
Users can upload their own documents, such as PDFs, SOPs, product manuals, and internal guidelines, to build a custom knowledge base.
Synthetic data is generated with every datapoint anchored to the uploaded knowledge, ensuring factual precision.
The system adapts to the organization's specific language, structure, and tone, avoiding generic, hallucinated outputs.
By maintaining a ~90% content overlap with the original documents, the generated datasets become high-fidelity and immediately usable for creating evaluation datasets or fine-tuning models.
This gives organizations complete control over synthetic data generation while ensuring regulatory compliance and relevance.
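As an illustrative sketch (not Future AGI's internal metric), one way to sanity-check how closely a generated datapoint stays anchored to its source passage is a simple token-overlap score; the passage and Q&A pair below are made up for the example:

```python
# A minimal sketch, not the platform's internal metric: a rough token-overlap
# score between a generated datapoint and the source passage it should be
# grounded in.
import re

def token_overlap(generated: str, source: str) -> float:
    """Fraction of tokens in the generated text that also appear in the source."""
    gen_tokens = re.findall(r"\w+", generated.lower())
    src_tokens = set(re.findall(r"\w+", source.lower()))
    if not gen_tokens:
        return 0.0
    return sum(t in src_tokens for t in gen_tokens) / len(gen_tokens)

source_passage = "Refunds are processed within 14 business days of receiving the returned item."
synthetic_qa = "How long do refunds take? Refunds are processed within 14 business days."
print(f"overlap: {token_overlap(synthetic_qa, source_passage):.0%}")
```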
To learn how Future AGI creates accurate synthetic datasets, read our documentation here!
Launched Audio Evaluations - Powering the Multimodal Stack
Evaluating audio data and outputs has been a major challenge due to the lack of consistent tools, high manual review costs, and unreliable subjective assessments. As audio LLMs become central to customer interactions, from IVR systems to AI-powered support calls, ensuring high-quality audio at scale is now essential.
To address this, we launched state-of-the-art Audio Evaluations, a comprehensive set of metrics for automated, objective, and scalable evaluation of audio outputs.
Here’s how it works:
Users can import audio datasets via CSV/JSON uploads, Hugging Face datasets, or SDK scripts.
Our system provides pre-built evaluation metrics tailored for audio.
Evaluations can be run at scale, with support for testing on over 5,000 audio datapoints in a batch.
Error localization highlights exactly where an audio output fails, enabling targeted feedback and improvements.
The platform not only accelerates development cycles by providing instant evaluation reports but also helps fine-tune audio LLMs for domain-specific needs like multilingual IVR conversations and customer support call analysis.
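As a rough illustration of the CSV/JSON import path, here is a minimal sketch with hypothetical column names; it only shapes a local manifest into upload-ready batches, and the actual upload call lives in the Future AGI SDK (see the documentation linked below):

```python
# A minimal sketch, assuming a hypothetical manifest layout: one row per call
# recording plus its transcript, chunked into JSON batches for upload.
import pandas as pd

manifest = pd.DataFrame(
    {
        "audio_path": ["calls/0001.wav", "calls/0002.wav"],
        "transcript": ["Hi, I'd like to reset my password.", "My invoice looks wrong."],
        "language": ["en", "en"],
    }
)

# Batch evaluations can cover thousands of datapoints, so chunking the
# manifest keeps each upload manageable.
batch_size = 5000
for start in range(0, len(manifest), batch_size):
    batch = manifest.iloc[start : start + batch_size]
    batch.to_json(f"audio_batch_{start // batch_size}.json", orient="records")
```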
To see how you can evaluate your audio using LLMs, read our documentation here!
Future AGI integrated with OpenAI Agents SDK
We’re excited to share that our platform has been officially recognized by OpenAI and is now listed in the OpenAI Agents SDK documentation as a provider for tracing and evaluations.
With the OpenAI Agents SDK still in its early stages, we’re proud to offer essential tools that make observability, evaluation, and tracing more accessible and reliable for developers.
If you’re exploring OpenAI Agents, we invite you to check out our resources and see how we can help you build faster, smarter, and more safely. You can find all the details here.
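If you want a feel for the SDK itself, here is a minimal sketch using the openai-agents package (an OpenAI API key is assumed); the trace-processor registration shown in the comments is a placeholder, not the real integration call, so check the linked docs for the actual entry point:

```python
# A minimal sketch, assuming the openai-agents package ("agents" module) and
# an OPENAI_API_KEY in the environment.
from agents import Agent, Runner

# Hypothetical: a third-party tracing processor would typically be registered
# once at startup so every agent run is exported for observability/evaluation.
# add_trace_processor(my_future_agi_processor)  # placeholder, not the real API

agent = Agent(
    name="support-bot",
    instructions="Answer billing questions concisely.",
)
result = Runner.run_sync(agent, "Why was I charged twice this month?")
print(result.final_output)
```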

🌐 Other Updates
Webinar on "Evaluating AI with Confidence"
Too often, teams focus on building and fine-tuning models first and only test for issues like hallucinations, incomplete responses, or reliability gaps just before, or sometimes after, launch. By then, fixes are slower, costlier, and riskier.
In this session, we dove deep into Future AGI’s evaluation workflow, covering multi-modal evaluations, custom metrics, feedback loops, and error localization, and showed how it empowers AI teams to catch issues early, improve model reliability, and build with confidence.
Perfect for anyone looking to make AI development faster, sharper, and more aligned.
Watch the webinar: https://futureagi.com/blogs/evaluating-ai-with-confidence

Register now for our upcoming Webinar: "Modern AI Engineering: Strategies That Scale"
Sandeep Kaipu, Engineering Leader @ Broadcom, will share actionable strategies for building scalable infrastructure for your modern GenAI stack.
Data & Eval Driven Development: A hands-on session at the AI User Conference, SF
One of April’s biggest highlights was the AI User Conference, a major global event attended by AI professionals from across industries and countries. Our Founder, Nikhil, led a hands-on workshop on making AI agents truly customer-ready using a data- and evaluation-driven development approach.
A key takeaway from the event was the growing recognition that powerful AI alone isn’t enough; what matters is how reliably it performs in real-world use. The conversations underscored a rising demand for evaluation and observability as core pillars in building trustworthy, user-centric AI systems. For us, it reaffirmed our mission to make transparency and continuous assessment foundational to every AI deployment.

Closing Thoughts
Every launch and every conversation points to one truth: AI needs more discipline, trust, and care.
At Future AGI, we’re staying curious, moving fast, and staying true to our mission: Helping teams build AI that works — reliably, safely, and at scale.
For more updates, join the conversation in our Slack Community.
Your partner in building Trustworthy AI!
