Evaluate AWS Bedrock
Score AWS Bedrock responses against 70+ purpose-built evaluators — Groundedness, Context Relevance, Prompt Injection, Toxicity, function-call accuracy, and your own custom templates.
Recipes for AWS Bedrock
Prerequisites
Before you start
- A working AWS Bedrock app — local or already in production.
- A free Future AGI account with `FI_API_KEY` and `FI_SECRET_KEY`.
- Python 3.9+ / Node 18+ / Java 17+ depending on which SDK you're installing.
- Trace input/output payloads (or a dataset) ready to score.
Install

```bash
pip install traceAI-bedrock
```

Evaluate recipe
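If you don't yet have a response payload to score, here is a minimal sketch of capturing one from Bedrock via `boto3` and the Converse API. The model ID, region, and the `extract_text` helper are illustrative assumptions, not part of the Future AGI SDK:

```python
def extract_text(response: dict) -> str:
    """Pull the assistant's text out of a Bedrock Converse API response."""
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    # Assumes AWS credentials and model access are already configured
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    user_query = "What is our refund policy?"
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": user_query}]}],
    )
    aws_bedrock_response = extract_text(response)
```

The resulting `user_query` and `aws_bedrock_response` are exactly what the evaluate recipe below plugs into `inputs`.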
```python
from fi.evals import EvalClient
from fi.evals.templates import (
    ContextRelevance,
    Groundedness,
    PromptInjection,
    Toxicity,
)

client = EvalClient(api_key="<FI_API_KEY>", secret_key="<FI_SECRET_KEY>")

# Reuse the trace input/output from your AWS Bedrock run
result = client.evaluate(
    eval_templates=[ContextRelevance(), Groundedness(), PromptInjection(), Toxicity()],
    inputs=[{
        "input": user_query,
        "output": aws_bedrock_response,
        "context": retrieved_context,
    }],
)
print(result.eval_results)
```

What Future AGI captures
Evaluate fields you'll see in the dashboard

- Run any of the 70+ Future AGI evaluator templates against trace input/output
- Score in real time on production spans, in CI on a dataset, or as a guardrail before a response is returned
- Custom evaluators via the builder API — heuristic, LLM-as-judge, or fine-tuned Turing models
- Eval results land back on the originating trace as searchable attributes
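The CI mode above can be sketched as a small batch helper that maps dataset rows onto the `input`/`output`/`context` shape the recipe uses. The row field names (`question`, `model_answer`, `retrieved_chunks`) are illustrative assumptions about your dataset:

```python
def build_eval_inputs(rows: list[dict]) -> list[dict]:
    """Map dataset rows onto the keys the eval templates expect."""
    return [
        {
            "input": row["question"],
            "output": row["model_answer"],
            "context": row["retrieved_chunks"],
        }
        for row in rows
    ]

# In CI, score the whole dataset in one call, reusing the client
# from the recipe above:
# result = client.evaluate(
#     eval_templates=[Groundedness(), ContextRelevance()],
#     inputs=build_eval_inputs(rows),
# )
```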
Common gotchas

Read these before you ship

1. Eval templates expect specific input keys — check the template signature in `fi.evals.templates`.
2. For RAG evaluators, pass the retrieved chunks as `context`, not the full document.
3. LLM-as-judge templates count against your eval-model token budget — switch to Turing flash for high-volume workloads.
Next: chain it with the other recipes
Evaluate is the first step. Most teams add an evaluator in the first week and start optimising or simulating once they have a baseline. Each recipe takes minutes to wire up.
Adjacent integrations
More integrations like AWS Bedrock
- Vertex AI: Google Cloud's hosted Gemini, Anthropic, and Llama endpoints.
- Azure OpenAI: Microsoft Azure's regulated OpenAI deployments and assistants.
- IBM watsonx: IBM watsonx.ai foundation models for regulated workloads.
- Replicate: run open-source AI models on Replicate's serverless GPUs.