Company News

Future AGI May Roundup


Last Updated

May 31, 2025


By

Rishav Hada

Time to read

13 mins


At Future AGI, we’re focused on solving the real challenges teams face when working with LLMs - from evaluation and observability to faster iteration and safer deployment. May was a steady step forward, with practical product upgrades, hackathons, webinars, podcasts, and a case study showing what better infra and workflows can unlock.

Here’s everything we built, supported, and shared this month.


✅ Product Updates

Introduced Future AGI MCP Server 

We’re excited to announce that Future AGI now runs its own MCP (Model Context Protocol) server, enabling seamless integration with tools like Claude, Cursor, Crew AI, and any other MCP-compatible clients.

With this integration, you can now connect your LLM workflows directly to Future AGI’s evaluation engine - no context switching, no manual uploads.

The MCP protocol standardizes how models, tools, and evaluation layers communicate. By running our own MCP server, we’ve made it easier for teams to:

  • Run inline evaluations during development and testing

  • Automate feedback loops for continuous improvement

  • Debug agent behavior with structured trace-level insights
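For a feel of what this looks like in practice, here’s a minimal client-side sketch using the official MCP Python SDK. The server command (`futureagi-mcp-server`) and the `evaluate_response` tool name below are placeholders for illustration, not Future AGI’s actual endpoints - the docs have the real configuration.

```python
# Minimal MCP client sketch (illustrative only).
# The server command and tool name are placeholders, not Future AGI's
# actual MCP endpoints -- check the official docs for the real setup.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Hypothetical command that launches an MCP server over stdio.
    server = StdioServerParameters(command="futureagi-mcp-server", args=[])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the evaluation tools the server exposes.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            # Run a hypothetical inline evaluation on a model response.
            result = await session.call_tool(
                "evaluate_response",  # placeholder tool name
                arguments={
                    "input": "What is our refund policy?",
                    "output": "Refunds are issued within 30 days.",
                },
            )
            print(result)


asyncio.run(main())
```

Because MCP standardizes the tool-calling surface, the same server works unchanged from Claude, Cursor, Crew AI, or any other MCP-compatible client.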

👉 Ready to democratize AI workflows at your organization? Explore the docs and get started today.


30% Faster Synthetic Data Generation with Improved UI

We’ve rolled out a major update to our Synthetic Data Generation workflow - making it faster, easier to use, and better suited for teams working in regulated environments like finance, healthcare, and legal.

🔧 What’s New:

  • Revamped UI/UX: A cleaner interface now guides users step-by-step with sample examples, making the process more intuitive, even for non-technical users.

  • 30% Faster Generation: We’ve improved system performance to reduce the time it takes to generate synthetic datasets, helping teams move from raw data to training-ready assets faster than ever.

Whether you're fine-tuning models or building robust eval sets, this update helps you ship faster with safer, smarter data.

👉 Get step-by-step guidance on how to generate synthetic data using Future AGI.


Improved Trace View with Inline Annotations

Debugging and analyzing LLM behavior just got a whole lot smoother.

Our new trace view is now cleaner, more navigable, and optimized for real-time analysis. Whether you're prototyping or evaluating in production, this update helps you move faster with more clarity.

🔍 What’s New:

  • A streamlined interface for exploring traces and spans.

  • Quick filters to slice metadata like evaluation scores, token usage, and processing times.

  • Ability to add and view inline annotations directly in the trace tree.

With a cleaner trace view and inline annotations, understanding model behavior is faster and more precise - helping you go from prototype to production with confidence.
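To make the span-and-attribute model concrete, here’s a rough OpenTelemetry-style sketch (not Future AGI’s actual SDK calls) of recording an LLM call as a span carrying the kind of metadata - evaluation score, token usage, latency - that the quick filters slice on.

```python
# Illustrative only: records an LLM call as an OpenTelemetry span with
# the kind of metadata the trace view filters on. Attribute names are
# placeholders; Future AGI's SDK handles the actual instrumentation.
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo-agent")


def call_llm(prompt: str) -> str:
    with tracer.start_as_current_span("llm.generate") as span:
        start = time.time()
        response = "stubbed model output"  # stand-in for a real model call
        span.set_attribute("llm.prompt", prompt)
        span.set_attribute("llm.completion", response)
        span.set_attribute("llm.token_count", 42)      # placeholder value
        span.set_attribute("eval.score", 0.87)         # placeholder value
        span.set_attribute("latency_ms", (time.time() - start) * 1000)
        return response


call_llm("Summarize our May product updates.")
```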

👉 Dive into our docs to see how prototyping can streamline your LLM development and de-risk production.


50% Faster Dataset Creation for AI Workflows

Creating datasets for LLM experimentation, prompt tuning, or evaluation used to be slow, manual, and error-prone.

Now, it’s automated and up to 50% faster.

With our latest update, you can extract datapoints from traces - including inputs, outputs, latency, evaluation scores, and more - and instantly convert them into structured datasets for analysis or training.
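Conceptually, the extraction boils down to flattening trace records into rows, something like the sketch below. The field names here are hypothetical and the platform does this for you from the UI - the sketch just shows the shape of the resulting dataset.

```python
# Conceptual sketch: turning trace records into a structured dataset.
# Field names are hypothetical; the platform extracts these for you.
import json

traces = [
    {
        "input": "What is our refund policy?",
        "output": "Refunds are issued within 30 days.",
        "latency_ms": 412,
        "eval_scores": {"groundedness": 0.91, "tone": 0.85},
    },
    # ... more trace records
]

# Flatten each trace into one dataset row suitable for analysis or training.
rows = []
for t in traces:
    row = {
        "input": t["input"],
        "output": t["output"],
        "latency_ms": t["latency_ms"],
    }
    row.update({f"score_{k}": v for k, v in t["eval_scores"].items()})
    rows.append(row)

# Write as JSONL, a common format for eval and fine-tuning datasets.
with open("dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```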

👉 Read the full release notes here 


🌐 In the Field

Case Study: Future AGI Prompt Playground Cuts Problem Resolution Time by 30%

What do you do when a 500+ agent team still can’t keep up with thousands of daily support tickets? That was the reality for one global tech company - overwhelmed by repetitive Tier-1 queries, burning out agents, and watching CSAT scores drop fast.

Future AGI’s Prompt Playground flipped the script: by automating 85% of Tier-1 tickets, optimizing prompts in real time, and streamlining triage, the team cut average resolution time by 30% while freeing agents to tackle 25% more high-value cases.

Future AGI’s Improve Existing Prompt feature enabled the customer support team to refine prompts in real time based on agent feedback and evolving support needs.

📖 Read the full case study to see how they did it.


Webinar on "Modern AI Engineering: Strategies That Scale

Too often, AI systems are scaled without the right foundations - leading to high costs, latency issues, and misalignment with business outcomes. As models grow more complex, the need for scalable infrastructure, robust observability, and clear performance tracking becomes non-negotiable.

In this session, we shared a practical playbook for modern AI engineering, featuring Sandeep Kaipu, Engineering Leader at Broadcom, and Nikhil Pareek, Founder at Future AGI. From aligning AI initiatives with real KPIs to designing scalable systems and embedding compliance from day one, we covered the critical steps to take AI from prototype to production, at scale.

Perfect for teams looking to build faster, operate smarter, and deploy AI that performs in the real world.

👉 Watch the full webinar here!


Podcast on "Unlocking Product Management with Reliable AI

We aired another awesome episode of ‘Accelerate AI’, where Nikhil sat down with Jorge Alcantara, Founder & CEO of Zentrix, for a sharp and honest conversation on how AI is reshaping product management.

In this episode, Jorge breaks down how PMs can move beyond the noise - automating routine work, debugging agent pipelines, and applying scientific thinking to product strategy.

💡 Key Takeaways:

  • How to design measurable, explainable GenAI products that move beyond surface-level demos

  • Why context-driven product thinking is the key to scaling reliable AI tools

🎧 Tune in here to explore why product managers are becoming Chief Context Officers, and what it takes to build GenAI tools that actually work in the real world.


Co-organized the AWS MCP Agents Hackathon in SF

Last week, we co-organized the AWS MCP Agents Hackathon alongside an incredible lineup of partners - including Anthropic, DuploCloud, Clarifai, Auth0, Make, n8n, and many more.

Developers from across the world came together to build the next generation of AI agents - with over $50,000 in prizes up for grabs. Participants got exclusive access to Future AGI’s evaluation, optimization, and guardrail toolkit, designed to help teams move fast without compromising trust or reliability.


💡 Closing Thoughts

Everything we worked on this month came back to one idea: making it easier for teams to build AI with confidence and care. Whether through new features, community events, or shared learnings, we’re here to support the people behind the progress.

For more updates, join the conversation in our Slack Community or get in touch with us directly. 

Your partner in building Trustworthy AI!


Rishav Hada is an Applied Scientist at Future AGI, specializing in AI evaluation and observability. Previously at Microsoft Research, he built frameworks for generative AI evaluation and multilingual language technologies. His research, funded by Twitter and Meta, has been published in top AI conferences and earned the Best Paper Award at FAccT’24.


Ready to deploy Accurate AI?

Book a Demo