AI Evaluations

Company News

Protect: Trustworthy AI Guardrails for Enterprises

Last Updated

Oct 21, 2025

Rishav Hada

Time to read

1 min read

Explore Future AGI

Introduction

As Artificial Intelligence rapidly moves from labs to boardrooms, one question looms large - how do we keep large language models (LLMs) safe, reliable, and compliant in the real world?

From financial chatbots to healthcare assistants, enterprises rely on AI systems to handle sensitive data and high-stakes decisions. But without proper guardrails, these models can hallucinate, leak private information, or even be tricked into unsafe actions through prompt injection attacks.

That’s where Future AGI’s Protect comes in - a next-generation, multi-modal guardrailing framework built to make enterprise AI safe, explainable, and production-ready.

📃 Read the full research paper here.

What Are AI Guardrails and Why Current Ones Fall Short

Guardrails act as the safety filters for AI systems. They monitor what goes in (user prompts) and what comes out (AI responses), ensuring compliance with company policies and regulatory standards.

However, most existing guardrails share three major weaknesses:

Text-only focus - They can’t handle images or audio, even though modern enterprises use voice assistants, visual search, and document understanding tools daily.
Lack of explainability - They flag issues but rarely explain why, which limits trust and makes auditing difficult.
Slow and fragmented systems - Chaining multiple external safety checks adds latency and complexity.

In industries like finance, healthcare, and public services, where regulations are strict and decisions have real-world impact, these gaps make legacy guardrails unfit for deployment.

Introducing Protect: A Multi-Modal Guardrailing Stack

Protect is a first-of-its-kind, enterprise-grade guardrailing system designed to work seamlessly across text, image, and audio inputs - the full range of today’s multi-modal AI.

At its core, Protect combines three key innovations:

Multi-Modal Safety Intelligence
Protect doesn’t just scan text. It can analyze spoken conversations, screenshots, memes, or visual content to detect risks like toxicity, sexism, data leaks, or prompt injection attempts.
Teacher-Assisted Annotation Pipeline
A “teacher” model helps generate smarter, context-aware safety labels by reasoning about why something might be unsafe. This improves accuracy and interpretability - a big step up from basic keyword filters, enriching the dataset quality.
Lightweight, Real-Time Performance
Protect uses Low-Rank Adaptation (LoRA) fine-tuning to stay fast and efficient, making it ideal for on-device or cloud deployments with minimal latency.

In short, Protect gives enterprises a unified safety layer for all forms of AI interaction - from a chatbot message to a call center recording to a product image.

Four Critical Safety Dimensions Your Business Needs

Protect focuses on four areas that matter most to businesses:

Toxicity Detection Catches hate speech, harassment, and offensive language before it reaches your customers. This protects your brand reputation and creates safer customer interactions.
Gender Bias Prevention Identifies and blocks sexist content or gender discrimination. This is crucial for maintaining inclusive workplace communications and customer-facing content.
Data Privacy Protection Prevents accidental exposure of sensitive information like credit card numbers, social security numbers, medical records, or personal addresses. This helps you stay compliant with GDPR, CCPA, and other privacy regulations.
Prompt Injection Defense Stops attackers from manipulating your AI systems through clever prompts designed to bypass safety rules. Think of it as protection against AI "hacking" attempts.

Inside the Dataset: Teaching AI What “Unsafe” Means

A guardrail is only as good as the data it learns from. Protect’s team curated one of the most diverse multi-modal safety datasets to date, spanning:

Text datasets: (from sources like WildGuardTest, ToxicChat, and ToxiGen)
Image datasets: (including Hateful Memes, VizWiz-Priv, and graphical violence collections)
A large-scale, custom-synthesized Audio dataset

Each data point was carefully categorized under four safety dimensions: Toxicity, Sexism, Data Privacy, and Prompt Injection.

To create a unique audio dataset, our team synthesized the existing text samples using a sophisticated process. By systematically varying accents, emotions, and speaking rates, and adding realistic background noise, we built a dataset that teaches Protect to recognize crucial risk factors in a speaker’s tone - like sarcasm or anger - that plain text transcripts would miss. The diagram below illustrates our end-to-end pipeline for this process:

A key part of this pipeline was generating very specific commands to control the speech synthesis. Here are a few examples that show how we controlled the accent, emotion, and style for each audio clip:

Smarter Labeling Through “Teacher-Assisted” Learning

Traditional safety datasets rely on keyword tagging - a method that often misclassified nuanced or context-dependent content. Protect fixes this using a teacher-assisted relabeling pipeline:

The teacher model first explains its reasoning (“thinking trace”) before suggesting a label (Passed/Failed).
Human reviewers then validated these automated suggestions through iterative audits, ensuring the final labels were high-quality, consistent, and accurate.
The result: fewer false positives and a dataset enriched with rationales for why something was unsafe.

This approach cut labeling disagreements by over 20%, meaning Protect’s training data became cleaner, more consistent, and better suited for regulated enterprise contexts where transparency is critical.

Training the Guardrail: Four Specialized Safety Adapters

Rather than using one giant model for everything, Protect employs four small, specialized adapters, each fine-tuned for a specific safety task (toxicity, sexism, privacy, and prompt injection).

These adapters were trained under different configurations - some focusing purely on classification (“Vanilla”), while others generated reasoning or explanations before giving a verdict (“Thinking” and “Explanation” variants). Here’s what those different output formats look like in practice for a single prompt injection attempt:

Our results showed that different adapter styles excelled at different tasks. While the simple 'Vanilla' adapter was most effective for clear-cut violations like Prompt Injection, the 'Explanation' variant proved superior for nuanced categories like Sexism. This highlights the importance of explainability, which is critical for audit trails and building trust in enterprise settings.

Results: Beating the Best (Even GPT-4.1)

Protect’s performance on benchmark tests was exceptional. When we compared its ability to catch critical safety violations against other models, the results were clear:

Protect proved to be highly competitive with proprietary giants like GPT-4.1 and even outperformed them in critical enterprise scenarios. It was significantly better at catching prompt injection and privacy violations-two of the most difficult enterprise safety challenges.

And it did all this while staying lightweight enough for real-time deployment. Protect’s average decision latency was around 67 ms for text and 109 ms for images, making it one of the fastest guardrails for production workloads.

Why This Matters for Enterprises

AI guardrails aren’t just a security feature anymore, they’re a business requirement. As companies move toward AI-powered automation, voice agents, and data-driven insights, the need for trust, explainability, and auditability grows exponentially.

Protect delivers on all three fronts:

Native multi-modal coverage with text, image, and audio in one stack
Real-time safety checks without slowing applications
Transparent explanations for every decision
Open-source LoRA adapters trained exclusively on our text dataset, providing a transparent benchmark for text modality safety.

This makes Protect especially relevant for regulated industries like finance, healthcare, government, and education, where a single misstep in data or compliance can have massive consequences.

. Real Business Applications

Customer Service Centers Monitor voice calls and chat messages in real-time to ensure quality interactions and catch potential PR disasters before they escalate.
Content Moderation Platforms Automatically screen user-generated content across text, images, and video for social media platforms, forums, or review sites.
Healthcare Communications Protect patient privacy by automatically detecting and redacting PHI (Protected Health Information) in transcriptions, chat logs, and documentation.
Financial Services Prevent accidental disclosure of account numbers, SSNs, and other sensitive financial data in customer communications.
HR and Workplace Tools Maintain inclusive workplace communications by detecting and flagging biased or discriminatory language in emails, chat systems, and documents.

Conclusion

As AI becomes more sophisticated and integrated into business operations, safety can't be an afterthought. Protect represents the next generation of AI guardrails- comprehensive, multimodal, and built for real-world enterprise deployment.

The system's combination of multi-modal coverage, low latency, high accuracy, and explainable decisions makes it ideal for regulated industries where both performance and auditability matter.

With businesses facing increasing regulatory scrutiny around AI systems, having robust guardrails isn't just about avoiding problems - it's about building trust with customers and demonstrating responsible AI deployment. Our team is committed to the ongoing improvement of Protect, continuously expanding its knowledge base to better handle complex and nuanced content.

Ready to Secure Your AI Systems?

Protect your business with enterprise-grade AI safety. The text-based models are available as open source, allowing your team to evaluate and integrate them into your existing systems.

Learn more about Protect and access the models from HuggingFace. See how multi-modal guard railing can give you confidence in your AI deployments.

Get started or Contact our team for enterprise deployment support, custom training, or integration consulting. Let's build safer AI systems together.

Automated Optimization For Your Agents: A Complete Workflow

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

How to Implement Voice AI Observability for Real-Time Production Monitoring

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Future AGI's Voice Evaluation: Beyond Transcript Testing for Voice AI

Automated Optimization For Your Agents: A Complete Workflow

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

How to Implement Voice AI Observability for Real-Time Production Monitoring

Automated Optimization For Your Agents: A Complete Workflow

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

How to Implement Voice AI Observability for Real-Time Production Monitoring

Automated Optimization For Your Agents: A Complete Workflow

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

How to Implement Voice AI Observability for Real-Time Production Monitoring

Rishav Hada

Senior Applied Scientist

Rishav Hada is an Applied Scientist at Future AGI, specializing in AI evaluation and observability. Previously at Microsoft Research, he built frameworks for generative AI evaluation and multilingual language technologies. His research, funded by Twitter and Meta, has been published in top AI conferences and earned the Best Paper Award at FAccT’24.

Rishav Hada

Feb 2, 2026

Inference Performance as a Competitive Advantage

Join our webinar on LLM inference optimization with FriendliAI. Learn to reduce GPU costs 90%, boost model serving speed in production AI deployment.

Webinars

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Master voice agent development from prototype to production using synthetic data, simulation, and AI-driven optimization. Build drive-thru agents in 1 hour.

AI Agents

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Audit voice AI agents for compliance before launch. TCPA consent, HIPAA security, PII protection, and automated testing to avoid regulatory fines.

AI Evaluations

AI Regulations

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Implement voice AI observability for real-time production monitoring. Track latency, conversation quality & detect performance drift with Future AGI.

AI Agents

Rishav Hada

Feb 2, 2026

Inference Performance as a Competitive Advantage

Join our webinar on LLM inference optimization with FriendliAI. Learn to reduce GPU costs 90%, boost model serving speed in production AI deployment.

Webinars

Podcasts

Products

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Master voice agent development from prototype to production using synthetic data, simulation, and AI-driven optimization. Build drive-thru agents in 1 hour.

Podcasts

Products

AI Agents

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Audit voice AI agents for compliance before launch. TCPA consent, HIPAA security, PII protection, and automated testing to avoid regulatory fines.

AI Evaluations

AI Regulations

Podcasts

Products

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Implement voice AI observability for real-time production monitoring. Track latency, conversation quality & detect performance drift with Future AGI.

Podcasts

Products

AI Agents

Rishav Hada

Feb 2, 2026

Inference Performance as a Competitive Advantage

Join our webinar on LLM inference optimization with FriendliAI. Learn to reduce GPU costs 90%, boost model serving speed in production AI deployment.

Webinars

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Master voice agent development from prototype to production using synthetic data, simulation, and AI-driven optimization. Build drive-thru agents in 1 hour.

AI Agents

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Audit voice AI agents for compliance before launch. TCPA consent, HIPAA security, PII protection, and automated testing to avoid regulatory fines.

AI Evaluations

AI Regulations

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Implement voice AI observability for real-time production monitoring. Track latency, conversation quality & detect performance drift with Future AGI.

AI Agents

Rishav Hada

Feb 2, 2026

Inference Performance as a Competitive Advantage

Join our webinar on LLM inference optimization with FriendliAI. Learn to reduce GPU costs 90%, boost model serving speed in production AI deployment.

Webinars

Podcasts

Products

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Master voice agent development from prototype to production using synthetic data, simulation, and AI-driven optimization. Build drive-thru agents in 1 hour.

Podcasts

Products

AI Agents

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Audit voice AI agents for compliance before launch. TCPA consent, HIPAA security, PII protection, and automated testing to avoid regulatory fines.

AI Evaluations

AI Regulations

Podcasts

Products

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Implement voice AI observability for real-time production monitoring. Track latency, conversation quality & detect performance drift with Future AGI.

Podcasts

Products

AI Agents

Rishav Hada

Feb 2, 2026

Inference Performance as a Competitive Advantage

Join our webinar on LLM inference optimization with FriendliAI. Learn to reduce GPU costs 90%, boost model serving speed in production AI deployment.

Webinars

Podcasts

Products

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Master voice agent development from prototype to production using synthetic data, simulation, and AI-driven optimization. Build drive-thru agents in 1 hour.

Podcasts

Products

AI Agents

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Audit voice AI agents for compliance before launch. TCPA consent, HIPAA security, PII protection, and automated testing to avoid regulatory fines.

AI Evaluations

AI Regulations

Podcasts

Products

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Implement voice AI observability for real-time production monitoring. Track latency, conversation quality & detect performance drift with Future AGI.

Podcasts

Products

AI Agents

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

Rishav Hada

Jan 19, 2026

Automated Optimization For Your Agents: A Complete Workflow

Learn to build production-ready voice agents in 5 steps using synthetic data generation, simulation testing, and automated prompt optimization with FutureAGI.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

NVJK Kartik

Jan 7, 2026

How to Audit Voice AI Agents for Regulatory Compliance Before Going Live

Learn how to audit voice AI agents for TCPA, HIPAA compliance. Automated testing, PII protection, and regulatory requirements before going live.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Jan 6, 2026

How to Implement Voice AI Observability for Real-Time Production Monitoring

Monitor voice AI agents in production with real-time observability. Track latency, conversation quality & performance drift before customers complain.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

Sahil N

Dec 23, 2025

How to Test 10,000 Voice Agent Scenarios in Minutes Without Manual QA

Automate voice agent testing with Future AGI. Test 10,000 Vapi & Retell scenarios in minutes, eliminate manual QA bottlenecks, and catch failures before production.

FutureAGI for Startups: Get 6 months of Pro access free plus $5,000 in credits. Apply Now!