Evaluate Cloudflare Workers AI
Score Cloudflare Workers AI responses against 70+ purpose-built evaluators — Groundedness, Context Relevance, Prompt Injection, Toxicity, function-call accuracy, and your own custom templates.
Recipes for Cloudflare Workers AI
Prerequisites
Before you start
- A working Cloudflare Workers AI app, local or already in production.
- A free Future AGI account with `FI_API_KEY` and `FI_SECRET_KEY`.
- Python 3.9+ / Node 18+ / Java 17+, depending on which SDK you're installing.
- Trace input/output payloads (or a dataset) ready to score.
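Rather than pasting the keys into source, you can pull them from the environment. A minimal sketch (the variable names `api_key` / `secret_key` and the placeholder fallbacks are illustrative, not part of the SDK):

```python
import os

# Read the Future AGI credentials from the environment instead of
# hardcoding them. Falls back to the placeholder strings used in this
# recipe so the snippet still runs before you've exported the keys.
api_key = os.environ.get("FI_API_KEY", "<FI_API_KEY>")
secret_key = os.environ.get("FI_SECRET_KEY", "<FI_SECRET_KEY>")
```

Pass these to `EvalClient(api_key=..., secret_key=...)` exactly as in the recipe below.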
Install
```bash
pip install traceAI-openai
```

Evaluate recipe
```python
from fi.evals import EvalClient
from fi.evals.templates import (
    ContextRelevance,
    Groundedness,
    PromptInjection,
    Toxicity,
)

client = EvalClient(api_key="<FI_API_KEY>", secret_key="<FI_SECRET_KEY>")

# Reuse the trace input/output from your Cloudflare Workers AI run
result = client.evaluate(
    eval_templates=[ContextRelevance(), Groundedness(), PromptInjection(), Toxicity()],
    inputs=[{
        "input": user_query,                       # the user's prompt
        "output": cloudflare_workers_ai_response,  # the model's reply
        "context": retrieved_context,              # retrieved chunks, for RAG evals
    }],
)
print(result.eval_results)
```

What Future AGI captures
Evaluate fields you'll see in the dashboard
- Run any of the 70+ Future AGI evaluator templates against trace input/output
- Score in real time on production spans, in CI on a dataset, or as a guardrail before responding
- Build custom evaluators via the builder API: heuristic, LLM-as-judge, or fine-tuned Turing models
- Eval results land back on the originating trace as searchable attributes
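The guardrail use above reduces to a threshold check over the per-template scores. This is only a sketch: the metric names, thresholds, and the exact shape you extract from `result.eval_results` all depend on your templates and SDK version, so the `scores` dict here is a stand-in.

```python
def passes_guardrails(scores: dict[str, float],
                      thresholds: dict[str, float]) -> bool:
    """Return True only if every metric clears its threshold.

    `scores` stands in for the per-template values you pull out of
    result.eval_results; a missing metric counts as a failure.
    """
    return all(scores.get(name, 0.0) >= cutoff
               for name, cutoff in thresholds.items())

# Hypothetical cutoffs: block the response if groundedness is low
# or the toxicity check doesn't come back clean.
thresholds = {"groundedness": 0.7, "toxicity_safe": 0.9}
```

Call `passes_guardrails(...)` before returning the Workers AI response to the user, and fall back to a safe reply when it returns `False`.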
Common gotchas
Read these before you ship
1. Eval templates expect specific input keys; check the template signature in `fi.evals.templates`.
2. For RAG evaluators, pass the retrieved chunks as `context`, not the full document.
3. LLM-as-judge templates count against your eval-model token budget; switch to Turing flash for high-volume runs.
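Gotcha 02 in practice: build the `context` field from the retrieved chunks only. A minimal sketch, assuming your retriever returns a list of chunk strings (the helper name and character cap are illustrative):

```python
def build_context(chunks: list[str], max_chars: int = 4000) -> str:
    """Join retrieved chunks into one `context` string, capped at
    max_chars so an oversized chunk doesn't blow the token budget."""
    out, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        out.append(chunk)
        used += len(chunk)
    return "\n\n".join(out)

# e.g. retrieved_context = build_context(retriever_results)
retrieved_context = build_context(["chunk one ...", "chunk two ..."])
```

Passing the whole source document instead tends to drag down Context Relevance scores and inflates eval cost.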
Next: chain it with the other recipes
Evaluate is the first step. Most teams add an evaluator the same week and start optimising or simulating once they have a baseline. Each recipe takes minutes to wire up.
Adjacent integrations
More integrations like Cloudflare Workers AI
- Vertex AI: Google Cloud's hosted Gemini, Anthropic, and Llama endpoints.
- AWS Bedrock: Amazon Bedrock invocation across Claude, Llama, Mistral, Nova, and Titan.
- Azure OpenAI: Microsoft Azure's regulated OpenAI deployments and assistants.
- IBM watsonx: IBM watsonx.ai foundation models for regulated workloads.