Future AGI vs LangSmith in 2026: Self-Improving Runtime vs Hosted Observability

Future AGI vs LangSmith on tracing, evaluation, prompt management, deployment, security, DX. Honest verdict, May 2026, why only one closes the loop.

May 12, 2026

If you have to pick today: Pick Future AGI if you want a runtime that closes the loop, trace to eval to optimizer to gateway, so the system updates its own prompts and routing policies from outcomes instead of staying an observation layer. Pick LangSmith if your runtime is LangChain or LangGraph, you want native graph-execution semantics in every trace, and you’re comfortable on LangChain Inc.’s closed-source hosted control plane.

Future AGI ranks first when the workload is continuous, the gateway has to keep improving on its own, and the team needs guardrails inline at the request boundary. LangSmith is a credible second when LangChain is the runtime, HIPAA is the procurement gate, and the team wants the most LangGraph-shaped trace UI available today. Six axes, honest scoring, pricing on both sides, three tradeoffs for Future AGI and four limitations for LangSmith, and how the loop changes the math.

TL;DR: capability snapshot

Capability	Future AGI	LangSmith
Core identity	Self-improving runtime: trace + eval + optimizer + gateway + guardrails	LangChain Inc.’s closed-source eval + observability + agent platform
License	`traceAI`, `ai-evaluation`, `agent-opt` Apache 2.0; Agent Command Center closed	SDKs MIT; platform closed-source
Tracing	OTel-native; OpenInference spans in Python, TS, Java, C# (50+ integrations)	LangGraph-native + OTel ingest; Python, TS, Go, Java SDKs
Evaluation	72 metrics: RAG, agent trajectory, function calling, hallucination, code, toxicity	openevals, agentevals, LLM-as-judge, pytest/Vitest/Jest CI
Prompt management	Opinionated versioning + access control; optimizer writes the next version from eval signal	Playground + Polly + Prompt Hub trio
Optimizer	`agent-opt` with 6 algorithms (ProTeGi, Bayesian, GEPA, Meta-Prompt, Random Search, PromptWizard)	Not part of the product
Inline guardrails	Protect at 65 ms text / 107 ms image median time-to-label (arXiv 2510.13351)	Not part of the product
Gateway / router	Agent Command Center: routing, fallbacks, cost + quality attribution	Not part of the product
Agent builder	API-first	LangSmith Fleet (no-code) + Arcade MCP integrations
Self-host	Apache 2.0 libraries, BYOC, or SaaS; no Enterprise gate	Self-host gated to Enterprise (v0.13)
Compliance	SOC 2 Type II, HIPAA, GDPR, CCPA certified; ISO 27001 in active audit; AWS Marketplace	SOC 2 Type 2, HIPAA, GDPR shipped today
Pricing entry	Free (100K traces/mo), Scale $99/mo, Enterprise custom	Developer $0; Plus $39/seat/mo; Enterprise custom
Rank in 2026	#1 for self-improving runtime workloads	#2 for LangChain-native trace + eval with HIPAA in procurement

One-line verdict: Future AGI wins on the closed loop (trace + eval + optimizer + gateway + inline guardrails) and on three deployment on-ramps including Apache 2.0 libraries you can self-host without enterprise procurement. LangSmith wins on LangGraph-native depth, four-language SDK fleet, Fleet no-code builder, and HIPAA shipped today. Only one of the two updates the runtime itself. That’s the wedge.

What changed in 2026

LangSmith Fleet (March 19, 2026). Agent Builder was renamed and expanded. Templates accelerate “internal-tool agent” patterns. Arcade-backed MCP integrations cover Gmail, Slack, Drive, Notion, GitHub, Jira, Linear, Salesforce, and HubSpot. The Fleet tier on Plus ships 1 Fleet agent and 500 Fleet runs per month, with $0.05 per run overage. It isn’t closed-loop optimization, but it shortens “trace to internal agent” considerably.

LangSmith Self-Hosted v0.13 (January 16, 2026). IAM auth, mTLS for Postgres / Redis / ClickHouse, KEDA autoscaling, and IngestQueues on by default. AWS, GCP, and Azure supported BYOC. The gate is still the Enterprise contract.

Future AGI Protect on arXiv (October 2025) + AWS Marketplace (Q1 2026). Protect’s published latency of 65 ms text / 107 ms image median time-to-label (arXiv 2510.13351) put a number on the inline-guardrail story. AWS Marketplace collapsed the procurement path for AWS-committed buyers. agent-opt v1 shipped six algorithms (ProTeGi, Bayesian, GEPA, Meta-Prompt, Random Search, PromptWizard) that turn the trace + eval store into a system that proposes the next prompt or routing-policy revision. LangSmith ships neither inline guardrails nor an optimizer. Both are the wedge.

What each product actually is

Future AGI is a self-improving runtime for LLM agents. The Agent Command Center is the hosted control plane. The building blocks are three Apache 2.0 libraries:

traceAI (github.com/future-agi/traceAI) is OpenTelemetry-native from the first byte. OpenInference spans across Python, TypeScript, Java, and C# in 50+ framework integrations: LangChain, LangGraph, Pydantic AI, OpenAI Agents SDK, CrewAI, AutoGen, Vercel AI SDK, Mastra, and more.
ai-evaluation (github.com/future-agi/ai-evaluation) is FAGI’s evaluation platform. Five concrete pillars: (1) a 50+ pre-built rubric catalog covering task completion, faithfulness, tool-use, structured-output, agentic surfaces, hallucination, groundedness, context relevance, instruction-following, with 72 named metrics including RAG correctness, agent trajectory, function calling, code correctness, and toxicity; (2) unlimited custom evaluators authored by an in-product eval-authoring agent that uses tool calling on your code; (3) self-improving evaluators that learn from live production traces; (4) FAGI’s in-house classifier model family running continuous evaluation at very low cost-per-token (lower per-eval cost than Galileo Luna-2); (5) Apache 2.0 OSS library. LLM-as-judge with BYOK.
agent-opt (github.com/future-agi/agent-opt) is the optimizer. Six algorithms (ProTeGi, Bayesian, GEPA, Meta-Prompt, Random Search, PromptWizard) consume a labelled dataset and propose the next prompt or routing-policy revision.
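To make "consume a labelled dataset and propose the next prompt" concrete, here is a minimal sketch of the simplest of the six strategies, random search. It is not agent-opt's API: `score_prompt`, `random_search`, and `fake_model` are hypothetical names, and the toy model stands in for a real LLM call.

```python
import random
from typing import Callable, Sequence

# Hypothetical type for illustration; not taken from agent-opt.
Example = tuple[str, str]  # (input text, expected label)


def score_prompt(prompt: str, dataset: Sequence[Example],
                 run_model: Callable[[str, str], str]) -> float:
    """Fraction of labelled examples the prompt answers correctly."""
    hits = sum(run_model(prompt, text) == label for text, label in dataset)
    return hits / len(dataset)


def random_search(candidates: Sequence[str], dataset: Sequence[Example],
                  run_model: Callable[[str, str], str],
                  trials: int, seed: int = 0) -> tuple[str, float]:
    """Score up to `trials` randomly drawn candidates and keep the best."""
    rng = random.Random(seed)
    # Sample without replacement so no trial is wasted on a repeat.
    drawn = rng.sample(list(candidates), k=min(trials, len(candidates)))
    best_prompt, best_score = "", -1.0
    for prompt in drawn:
        score = score_prompt(prompt, dataset, run_model)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score


def fake_model(prompt: str, text: str) -> str:
    """Toy stand-in for an LLM: only behaves when asked for a one-word label."""
    if "one word" not in prompt:
        return "unsure"
    return "positive" if "great" in text else "negative"


dataset = [("great product", "positive"), ("broke in a day", "negative")]
candidates = [
    "Classify the sentiment.",
    "Classify the sentiment. Answer in one word: positive or negative.",
]
best_prompt, best_score = random_search(candidates, dataset, fake_model, trials=20)
```

The other algorithms differ in how they generate and pick candidates, but the contract sketched here stays the same: a labelled dataset goes in, a scored prompt proposal comes out.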

Add Error Feed, the clustering and what-to-fix layer of FAGI’s eval stack. It feeds the self-improving evaluators via HDBSCAN clustering plus a Sonnet 4.5 judge that writes an immediate_fix, and it runs zero-config the moment traces hit an Observe project. It detects errors in five categories (factual grounding failures, tool crashes, broken workflows, safety violations, reasoning gaps), auto-clusters related traces into named issues, auto-generates an analysis per issue (what went wrong, root causes, evidence, quick fix, long-term recommendation), and tracks trends per issue. It works with every OpenInference integration.

Add the Future AGI Protect model family for inline guardrails (65 ms text median time-to-label, 107 ms image median time-to-label per arXiv 2510.13351). Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio. The Agent Command Center is also the gateway: it picks the cheaper model for easy turns, falls back on rate limits, and splits traffic by metadata.
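The three gateway behaviors named above (cheaper model for easy turns, fallback on rate limits, traffic split by metadata) can be sketched in a few lines. This is an illustration under stated assumptions, not the Agent Command Center's implementation: the model names, the `tier` metadata key, and the word-count heuristic for an "easy turn" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


class RateLimited(Exception):
    """Raised by a provider call when the upstream rejects for rate limits."""


@dataclass
class Request:
    prompt: str
    metadata: dict = field(default_factory=dict)


def pick_models(req: Request) -> list[str]:
    """Return models in the order they should be tried."""
    # Split by metadata: a hypothetical "tier" key pins premium traffic.
    if req.metadata.get("tier") == "premium":
        return ["large-model", "small-model"]
    # Crude "easy turn" heuristic: a short prompt goes to the cheaper model.
    if len(req.prompt.split()) < 20:
        return ["small-model", "large-model"]
    return ["large-model", "small-model"]


def route(req: Request, call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order, falling back when one is rate limited."""
    last_error = None
    for model in pick_models(req):
        try:
            return model, call_model(model, req.prompt)
        except RateLimited as err:
            last_error = err
    raise RuntimeError("all models rate limited") from last_error


def flaky_call(model: str, prompt: str) -> str:
    """Toy provider: the small model is always rate limited."""
    if model == "small-model":
        raise RateLimited(model)
    return f"{model}: ok"


used_model, reply = route(Request("hi there"), flaky_call)
```

In a learned router the ordering would come from eval scores and cost data rather than a fixed heuristic; the fallback loop is the part that stays the same.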

LangSmith is LangChain Inc.’s closed-source platform across six surfaces: Observability, Evaluation, Deployment via Agent Servers, Prompt Engineering, Fleet, and Studio. SDKs are MIT across Python, TypeScript, Go, and Java. The platform is hosted by default with a self-hosted Enterprise option (v0.13, January 2026). The center of gravity is LangChain and LangGraph. Every graph node becomes a span, Prompt Hub versions flow through traces, openevals + agentevals ship LangChain-tuned evaluators. Fleet extends this into a visual agent builder. Polly is the AI assistant for trace analysis and prompt rewrites. What Polly doesn’t do is roll the rewrites into the next gateway request.

LangSmith is a polished observation + LangChain-affined eval layer. Future AGI is a runtime that observes and acts.

Head-to-head on the six axes

1. Tracing

Future AGI’s traceAI is OpenTelemetry-native and OpenInference-compatible, with first-party SDKs in Python, TypeScript, Java, and C#. Spans cover 50+ named integrations including LangGraph, CrewAI, AutoGen, Pydantic AI, OpenAI Agents SDK, Mastra, Vercel AI SDK, and Microsoft Agent Framework. Tool calls become child spans by default. Every model call attaches input, output, model, and eval score as span attributes. Apache 2.0 means you can read the instrumentation and fork it.
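A rough picture of the span shape described above, using plain dataclasses instead of a real OpenTelemetry SDK so it runs anywhere. The attribute keys `openinference.span.kind`, `llm.model_name`, `input.value`, and `output.value` follow the OpenInference semantic conventions; `eval.score`, the `Span` class, and the sample values are assumptions for illustration, not traceAI's actual output.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    """Minimal stand-in for an OTel span: a name, attributes, and children."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)

    def child(self, name: str, attributes: dict) -> "Span":
        span = Span(name, attributes)
        self.children.append(span)
        return span


def walk(span: Span, depth: int = 0) -> Iterator[tuple[int, Span]]:
    """Yield (depth, span) pairs in parent-before-child order."""
    yield depth, span
    for c in span.children:
        yield from walk(c, depth + 1)


# One model call with its input, output, model, and eval score attached.
llm_span = Span("chat_completion", {
    "openinference.span.kind": "LLM",
    "llm.model_name": "example-model",
    "input.value": "What is the weather in Paris?",
    "output.value": "Calling get_weather for Paris.",
    "eval.score": 0.92,  # hypothetical key for the joined eval score
})

# The tool call it triggers becomes a child span.
tool_span = llm_span.child("get_weather", {
    "openinference.span.kind": "TOOL",
    "input.value": '{"city": "Paris"}',
})

outline = [(depth, s.name) for depth, s in walk(llm_span)]
```

Because the score lives on the span itself, a query for "low-scoring LLM spans with a failing tool child" needs no join against a separate eval table.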

LangSmith’s tracing was designed around LangChain semantics. A LangGraph execution emits one span per node, chain and tool boundaries are first-class, and the trace tree mirrors the graph topology. Non-LangChain runtimes ingest via OTel through otlphttp/langsmith across 15 named agent frameworks, but the experience is two notches less polished than the LangGraph-native lane. SDKs cover Python, TypeScript, Go, and Java.

Verdict. Future AGI wins on the OTel-native default and on agent-aware spans joined to eval scores under Apache 2.0. LangSmith wins on LangGraph-native depth and on having the only Go SDK of the two. For C# teams, Future AGI is the only option of the two.

2. Evaluation

Future AGI’s ai-evaluation ships a 50+ rubric catalog (72 named metrics in total) out of the box: RAG faithfulness, context relevance, answer correctness, agent trajectory, tool-call accuracy, function calling, hallucination, code correctness, toxicity, and more. The catalog isn’t the ceiling. Users define any rubric they need, and an in-product eval-authoring agent uses tool calling on the user’s code to generate, refine, and tune custom evaluators end-to-end. Every evaluator is self-improving, learning from live production traces so the rubric gets sharper as traffic flows. FAGI’s in-house classifier model family runs the continuous evaluation at very low cost-per-token (lower per-eval cost than Galileo Luna-2) and scores any rubric, built-in or custom. LLM-as-judge with BYOK to avoid platform markup. Apache 2.0. Named metrics drop in as imports. The eval scores aren’t just dashboard data. They feed agent-opt, which rewrites the prompt or routing policy from the failure clusters. Error Feed sits alongside as the zero-config error monitor that auto-clusters related failures into named issues.

LangSmith ships two OSS eval packages (openevals, agentevals) with named-prompt evaluators (CORRECTNESS_PROMPT, CONCISENESS_PROMPT, exact-match), plus LLM-as-judge (default openai:o3-mini), pairwise + code evaluators, annotation queues, and first-class pytest, Vitest, Jest integrations for CI gates. Polly walks through trace data and proposes prompt optimizations in the UI.
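A CI gate of the kind described here boils down to an evaluator, a pass rate, and a threshold assertion. The sketch below uses a hand-written exact-match check in a pytest-style test function. It does not call openevals or the LangSmith pytest plugin; `CASES`, `THRESHOLD`, and the function names are hypothetical.

```python
THRESHOLD = 0.70  # hypothetical minimum pass rate for the gate

# (model output, reference answer) pairs; in CI these would come from a dataset.
CASES = [
    ("Paris", "Paris"),
    ("  Berlin\n", "Berlin"),
    ("Rome, Italy", "Rome"),
    ("Madrid", "Madrid"),
]


def exact_match(output: str, reference: str) -> bool:
    """True when output equals the reference after trimming whitespace."""
    return output.strip() == reference.strip()


def pass_rate(cases: list[tuple[str, str]]) -> float:
    """Share of cases where the evaluator passes."""
    return sum(exact_match(out, ref) for out, ref in cases) / len(cases)


def test_exact_match_gate() -> None:
    """pytest collects this; a failing assert fails the CI job."""
    rate = pass_rate(CASES)
    assert rate >= THRESHOLD, f"pass rate {rate:.2f} is below {THRESHOLD:.2f}"
```

Swapping `exact_match` for an LLM-as-judge evaluator changes the scoring function but not the shape of the gate.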

Verdict. Future AGI wins on the depth of the eval platform. The ai-evaluation SDK (Apache 2.0) ships 60+ EvalTemplate classes, 13 guardrail backends (9 open-weight backends drawn from Llama Guard / Qwen3Guard / Granite Guardian / WildGuard / ShieldGemma, plus 4 API backends), and 8 fast Scanners. The Future AGI Platform layers on self-improving evaluators, unlimited custom evaluators authored by an in-product agent from natural language to rubric, in-house classifier models at lower per-eval cost than Galileo Luna-2, Error Feed for auto-clustered error monitoring, and the eval-to-optimizer wire. LangSmith wins on named-prompt evaluator polish and on CI surface breadth across pytest, Vitest, and Jest.

3. Prompt management

Future AGI’s prompt surface ships versioning, environment-based deploys, and access control. The optimizer is the wedge here: agent-opt consumes eval scores and proposes the next prompt version automatically, so the typical “edit, deploy, watch, repeat” loop shortens to “watch the score, accept the optimizer’s suggestion, ship.” The prompt library is opinionated by design. Fewer review-and-collaboration knobs than LangSmith Hub’s Playground + Polly + Prompt Hub trio, which keeps the daily workflow tighter when prompts are being authored by agent-opt from eval data.

LangSmith’s prompt surface is the Playground + Polly + Prompt Hub trio. Templates support f-string and Mustache, versioning is first-class, deployments target environments, Polly autogenerates tool schemas from a prompt. Hub is mature, well-integrated with traces, and good for teams who write prompts collaboratively.

Verdict. LangSmith wins on shared human authoring. Future AGI wins on automated prompt updates from the optimizer, a different workflow entirely. The honest pattern many 2026 teams adopt is to keep both: LangSmith Hub for human-written prompts, Future AGI agent-opt to optimize production prompts at scale.

4. Self-host and deployment

Future AGI offers SaaS, BYOC, and Apache 2.0 OSS libraries you can deploy without the hosted product at all. traceAI, ai-evaluation, and agent-opt run anywhere Python or TypeScript runs. You can self-host the entire trace + eval + optimizer stack without an Enterprise contract. The hosted Agent Command Center is SaaS or BYOC. AWS Marketplace is live.

LangSmith self-host is gated to Enterprise. The v0.13 release (January 16, 2026) added IAM auth, mTLS, KEDA autoscaling, and IngestQueues on by default. AWS, GCP, Azure are supported BYOC. SDKs are MIT, but the platform (eval backend, prompt store, Studio IDE) isn’t source-readable.

Verdict. Future AGI wins, clearly. Apache 2.0 libraries plus BYOC plus AWS Marketplace plus no Enterprise gate on self-host. LangSmith’s v0.13 self-host is technically excellent, but the Enterprise procurement gate is the difference for teams who want source readability before signing.

5. Security and compliance

The Future AGI Protect model family runs inline at 65 ms text / 107 ms image median time-to-label (arXiv 2510.13351). Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio. Inline at the request boundary, so PII detection and policy violations are caught synchronously. RBAC and audit logs are solid. SOC 2 Type II, HIPAA (BAA), GDPR, and CCPA are all certified. ISO 27001 is in active audit. BYOC and AWS Marketplace are both available.
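"Inline at the request boundary" means the check runs synchronously before the model is called, and its latency is part of the request. The sketch below shows that control flow with two regexes standing in for the classifier. Protect itself is a model family, not a regex list, and `check_text`, `guarded_call`, and the label names are hypothetical.

```python
import re
import time
from typing import Callable

# Regex stand-ins for a real PII classifier; illustrative only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_text(text: str) -> list[str]:
    """Return the labels of every policy the text violates."""
    labels = []
    if EMAIL.search(text):
        labels.append("pii.email")
    if SSN.search(text):
        labels.append("pii.ssn")
    return labels


def guarded_call(prompt: str, call_model: Callable[[str], str]) -> dict:
    """Run the guard first; only forward clean prompts to the model."""
    start = time.perf_counter()
    labels = check_text(prompt)
    guard_ms = (time.perf_counter() - start) * 1000  # time-to-label
    if labels:
        # Blocked synchronously: the model is never called.
        return {"blocked": True, "labels": labels, "guard_ms": guard_ms, "reply": None}
    return {"blocked": False, "labels": [], "guard_ms": guard_ms,
            "reply": call_model(prompt)}


blocked = guarded_call("mail me at a@b.com", lambda p: "never reached")
allowed = guarded_call("hello", lambda p: "hi!")
```

The `guard_ms` field is where a median time-to-label figure would be measured in production, since every request pays that cost before the model sees it.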

LangSmith ships SOC 2 Type 2, HIPAA, and GDPR materials on the platform page. Enterprise SSO, deployed engineers, contractual SLAs, and four named alert destinations (PagerDuty, Dynatrace, Slack, generic webhook). LangSmith doesn’t ship an inline guardrail layer at the request boundary.

Verdict. Future AGI wins on inline runtime guardrails (the Future AGI Protect model family at 65 ms text / 107 ms image median time-to-label across four safety dimensions, multi-modal), which LangSmith doesn’t ship at all, on Apache 2.0 source-readable instrumentation, and on certified GDPR + CCPA in addition to SOC 2 Type II + HIPAA. LangSmith ties on SOC 2 Type 2 + HIPAA + GDPR and wins on named-alert destination depth (PagerDuty, Dynatrace, Slack) for teams whose primary compliance surface is incident routing rather than runtime enforcement.

6. Developer experience

Future AGI’s DX is built around the iteration loop most AI teams actually run: write a rubric, watch the eval score, let the optimizer rewrite the prompt, ship the routing update. Four-language SDK fleet (Python, TS, Java, C#); 50+ framework integrations under traceAI; eval and optimizer UIs are strong. The prompt library is opinionated by design. Versioning and access control ship today. The optimizer is the wedge, with agent-opt writing the next prompt version from eval signal.

LangSmith’s DX is mature. Four SDKs (Python, TS, Go, Java), four CI paths (pytest, Vitest, Jest, LangSmith CLI), MCP support, named integrations for Claude Code, OpenAI Codex, OpenCode, and Instructor, plus the Fleet visual builder. The hosted experience is polished and the LangGraph-native trace UI is best in category for LangChain teams.

Verdict. LangSmith wins on raw DX surface breadth (Fleet, Polly, Studio IDE, Go SDK). Future AGI wins on the eval-to-optimizer daily workflow, Apache 2.0 readability, and C# coverage.

Pricing snapshot: May 2026

Pulled from each vendor’s pricing page on May 17, 2026.

Tier	Future AGI	LangSmith
Free	100K traces/month, basic eval + routing, no SSO	Developer $0/seat/mo, 5,000 base traces, 1 seat, 1 Fleet agent, 50 Fleet runs/mo
Scale / Pro	$99/mo, 10M traces, full eval suite, `agent-opt`, RBAC	Plus $39/seat/mo, 10,000 base traces, unlimited seats, 1 dev deployment, 500 Fleet runs/mo, $0.05 extra Fleet run
Overage	Per-trace at Scale; optimizer + guardrails + gateway included in one bill	Base traces $2.50 / 1,000 (14-day retention), extended $5.00 / 1,000 (400-day retention)
Enterprise	Custom; SOC 2 Type II, HIPAA (BAA), GDPR, CCPA certified; ISO 27001 in active audit; BYOC; AWS Marketplace	Custom; cloud, hybrid, self-hosted; custom SSO, SLA, deployed engineers; HIPAA + SOC 2 Type 2 + GDPR

The shapes don’t line up cleanly. LangSmith prices per seat plus per trace, with Fleet runs metered separately at $0.05 each. Future AGI prices the whole runtime in one bill: trace + eval + optimizer + gateway + inline guardrails for $99/mo at Scale. At enterprise volume, both negotiate. For continuous production workloads, the optimizer typically pays for itself in routing savings within four weeks of live trace data flowing. agent-opt is opt-in: turn it on once you have eval baselines and live traces; until then, traceAI + ai-evaluation carry the daily value. AWS Marketplace listing is a procurement shortcut on the Future AGI side. Deployed engineers are LangSmith’s.

Where each one falls short

Future AGI: three deliberate tradeoffs

Prompt-authoring UI is opinionated by design. LangSmith Hub’s Playground + Polly + Prompt Hub trio is the sharpest shared-human-authoring surface in the category. Future AGI ships versioning and access control with fewer collaboration knobs because the optimizer is the wedge: agent-opt writes prompts from eval signal, so the typical “edit-deploy-watch” loop shortens to “watch the score, accept the suggestion, ship.” Teams running large multi-author prompt libraries should preview the workflow before standardizing.
API-first agent surface, not drag-and-drop. LangSmith Fleet covers the no-code internal-tool agent builder. Future AGI’s surface is the OpenTelemetry-instrumented agents your team writes in Python, TypeScript, Java, or C#. First-class for production-grade agent runtimes, deliberately not the answer for non-engineer drag-and-drop builders. Pair the two if both matter.
Federal procurement runs through BYOC. FedRAMP authorization is on the partner roadmap. Today, federal SOC procurement is supported via air-gapped self-host in the agency VPC. Agencies on a current FedRAMP-required calendar should plan around the BYOC path.

Three deliberate tradeoffs in pursuit of the closed loop. Every one has a clear path or workaround for buyers who need it today.

LangSmith: four honest limitations

No optimizer. Polly proposes prompt rewrites in the UI but doesn’t roll them into the next gateway request. There is no agent-opt-style closed loop.
No gateway, no router. LangSmith captures traces and runs evals. It doesn’t sit in the request path as a router with fallbacks, virtual keys, or routing policies. You bring your own gateway downstream.
No inline guardrails. No sub-100 ms PII redactor or prompt-injection filter at the request boundary.
Closed-source platform with Enterprise-only self-host. SDKs are MIT but the eval backend, prompt store, and Studio IDE aren’t. Self-host is technically excellent (v0.13) but gated to the Enterprise contract.

Choose Future AGI if

Your runtime workload needs to keep improving on its own. The optimizer rewrites prompts and routing policies from eval outcomes. Storing them isn’t enough.
You want trace + eval + optimizer + gateway + inline guardrails in one runtime, on one bill, with Apache 2.0 OSS libraries you can self-host without an Enterprise contract.
Your stack spans Python, TypeScript, Java, or C#, and uniform OpenInference spans across all of them matter.
Inline AI guardrails at sub-100 ms latency at the request boundary are a requirement.
You procure through AWS Marketplace.

Choose LangSmith if

Your runtime is LangChain or LangGraph and you want native graph-execution semantics in every trace.
You need HIPAA on the procurement checklist today plus SOC 2 Type 2 and GDPR.
Go is part of your stack and you need a first-party Go SDK.
The Fleet no-code agent builder + Arcade MCP integrations accelerate internal-tool agents.
Shared human authoring of prompts in a polished hub (Playground + Polly + Prompt Hub) is your daily workflow.

Verdict matrix: when to pick which

Situation	Best pick	Why
Self-improving runtime: gateway updates its own prompts and routes from outcomes	Future AGI	`agent-opt` + Agent Command Center is the closed loop LangSmith doesn’t implement
Trace + eval + optimizer + gateway + inline guardrails in one bill	Future AGI	One product covers the runtime; LangSmith covers one layer
Inline AI guardrails at sub-100 ms (prompt injection, PII)	Future AGI	Future AGI Protect (own Gemma 3n model family with four safety adapters; multi-modal text/image/audio) at 65 ms text median time-to-label; LangSmith doesn’t ship a guardrail layer
Continuous evaluation across production traffic	Future AGI	`ai-evaluation` SDK (Apache 2.0, 60+ EvalTemplate classes, 13 guardrail backends including 9 open-weight Llama Guard / Qwen3Guard / Granite Guardian / WildGuard / ShieldGemma plus 4 API backends, and 8 fast Scanners) layered with the Future AGI Platform: self-improving evaluators, unlimited custom evaluators authored by an in-product agent from natural language to rubric, and in-house classifier models at lower per-eval cost than Galileo Luna-2
Auto-clustered agent error monitoring	Future AGI	Error Feed is zero-config, auto-clusters traces into named issues with auto-analysis and fix recommendations
Apache 2.0 self-host with no Enterprise gate	Future AGI	The three OSS libraries are genuinely Apache 2.0
AWS Marketplace procurement path	Future AGI	Live on Marketplace; LangSmith is direct contract
Certified SOC 2 Type II, HIPAA, GDPR, CCPA for regulated buyers	Future AGI	Trust page lists all four certified today; ISO 27001 in active audit
LangGraph-native trace UI with one span per graph node	LangSmith	The deepest LangGraph integration in the category
First-party Go SDK	LangSmith	The only product of the two with Go covered
No-code internal-tool agent builder	LangSmith Fleet	Template-based agents with Arcade MCP integrations
Shared human authoring of prompts in a polished hub	LangSmith	Playground + Polly + Prompt Hub trio

Migration playbook: LangSmith to Future AGI

If you’re already on LangSmith, the migration doesn’t have to be a rip-and-replace.

Pattern 1: Compose. Keep LangSmith Hub for human authoring and Polly as the editorial assistant. Drop traceAI (Apache 2.0) into the application code alongside the LangSmith SDK. The two coexist without duplicating instrumentation. Layer ai-evaluation on captured traces, graduate to agent-opt for the closed loop, add Protect as the inline guardrail.

Pattern 2: Migrate. Export prompts via LangSmith’s versioned-export API and re-import into Future AGI’s prompt store. Replace LangSmith trace instrumentation with traceAI. OpenInference spans are a superset, so the migration is additive. Map openevals named-prompt evaluators (CORRECTNESS_PROMPT, CONCISENESS_PROMPT, exact-match) to the equivalent ai-evaluation metrics. Re-point CI pipelines from LangSmith pytest gates to ai-evaluation pytest gates. Wire agent-opt against the migrated dataset. The first optimizer run typically produces a measurable improvement within two weeks of trace ingestion.

What’s hard to migrate. Fleet agents don’t have a direct Future AGI equivalent. If your team has Fleet agents in production, keep those on LangSmith and run Future AGI under the application-code agents in parallel.

Pricing comparison: effective rate at scale

For greenfield and mid-stage workloads above 100K traces/month, LangSmith’s per-1K trace pricing ($2.50 / 1K at 14-day retention, $5.00 / 1K at 400-day) is the dominant cost line, with per-seat Plus ($39/seat/mo) on top. Future AGI’s flat Scale tier ($99/mo) includes the trace store, the full eval suite, agent-opt, the gateway, and the Future AGI Protect model family inline. At 5M traces/month with eight developers, Future AGI is ~$99/mo flat. LangSmith Plus is ~$12,800/mo at 14-day retention: 8 × $39 in seats plus roughly 4,990K overage traces at $2.50 / 1K. Below 100K traces and where LangChain Hub is the highest-value surface, LangSmith Plus is a credible buy.

For regulated HIPAA workloads, both ship the paperwork today. Future AGI is certified on SOC 2 Type II, HIPAA, GDPR, and CCPA. ISO 27001 is in active audit. LangSmith Enterprise ships SOC 2 Type 2, HIPAA, and GDPR.
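To make the arithmetic behind the comparison explicit, here is a small calculator built only from the list prices quoted in this article. Two things are assumptions: that the 10,000 included base traces on Plus apply once per workspace rather than per seat, and that the 400-day rate replaces the 14-day rate rather than adding to it. Check the vendor pricing pages before relying on either.

```python
# List prices as quoted in this article (May 2026).
SEAT_PRICE = 39.00             # LangSmith Plus, per seat per month
INCLUDED_BASE_TRACES = 10_000  # assumed per workspace, not per seat
BASE_PER_1K = 2.50             # 14-day retention
EXTENDED_PER_1K = 5.00         # 400-day retention, assumed to replace the base rate
FUTURE_AGI_SCALE = 99.00       # flat Scale tier, 10M traces included


def langsmith_plus_monthly(traces: int, seats: int,
                           per_1k: float = BASE_PER_1K) -> float:
    """Seats plus overage on traces beyond the included allowance."""
    billable = max(0, traces - INCLUDED_BASE_TRACES)
    return seats * SEAT_PRICE + billable / 1000 * per_1k


# The article's scenario: 5M traces per month, eight developers.
base_cost = langsmith_plus_monthly(5_000_000, seats=8)
extended_cost = langsmith_plus_monthly(5_000_000, seats=8, per_1k=EXTENDED_PER_1K)
gap_vs_scale = base_cost - FUTURE_AGI_SCALE
```

Under these assumptions the 14-day figure is $12,787 and the 400-day figure is $25,262, so retention choice moves the LangSmith bill far more than seat count does.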

How the loop changes the math

The closed loop in practice: traceAI emits a span tree for every request, ai-evaluation scores each turn against rubrics drawn from a 50+ built-in catalog plus any custom evaluator your team authors (generated and tuned by an in-product eval-authoring agent that uses tool calling on your code), every evaluator self-improves from live production traces, and FAGI’s in-house classifier models score continuously at very low cost-per-token (lower per-eval cost than Galileo Luna-2). Error Feed runs alongside as the zero-config error monitor, auto-clustering related failures into named issues with auto-generated root-cause analysis. Low-scoring sessions cluster by failure mode, agent-opt rewrites the prompt or routing policy, Agent Command Center applies the update on the next request, and the new version auto-rolls back if the score regresses. The Future AGI Protect model family (Gemma 3n + fine-tuned adapters, four safety dimensions, multi-modal) enforces inline at 65 ms text / 107 ms image median time-to-label (arXiv 2510.13351).
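The last step of the loop, applying a new version and rolling back on regression, is sketched below. It assumes a single numeric eval score per prompt version and a caller-supplied `evaluate` function. `PromptSlot` and `apply_with_rollback` are hypothetical names, not Agent Command Center APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PromptSlot:
    """The prompt version currently served, plus what it replaced."""
    active: str
    baseline_score: float
    history: list[str] = field(default_factory=list)

    def promote(self, candidate: str) -> None:
        self.history.append(self.active)
        self.active = candidate

    def rollback(self) -> None:
        self.active = self.history.pop()


def apply_with_rollback(slot: PromptSlot, candidate: str,
                        evaluate: Callable[[str], float],
                        tolerance: float = 0.0) -> bool:
    """Promote the candidate, then revert if its score regresses."""
    slot.promote(candidate)
    new_score = evaluate(slot.active)
    if new_score < slot.baseline_score - tolerance:
        slot.rollback()  # score regressed: restore the previous version
        return False
    slot.baseline_score = new_score  # the new version becomes the baseline
    return True


# Toy scores standing in for live eval results on each version.
scores = {"v2": 0.85, "v3": 0.70}
slot = PromptSlot(active="v1", baseline_score=0.80)
kept_v2 = apply_with_rollback(slot, "v2", scores.__getitem__)
kept_v3 = apply_with_rollback(slot, "v3", scores.__getitem__)
```

Here v2 improves on the baseline and stays, while v3 regresses and is reverted, leaving v2 active. A production rollout would score the candidate on a traffic slice over time rather than with one call.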

Net effect for continuous production workloads: 15-30% cost reduction within four weeks of live trace data flowing, with no change to developer behavior required. The router learns to pick the cheaper model for easy turns, the optimizer rewrites over-prompting, the eval data tells the loop where to focus. This is the math LangSmith leaves on the table because it’s an observation + eval layer by design. For LangSmith customers, the practical pattern is to keep LangSmith as the LangGraph-native trace store and Hub for shared prompt authoring, and add Future AGI as the optimizer + gateway + guardrail layer downstream. For wider context, the Best LLM Observability and Tracing Gateways for 2026 listicle covers the cohort, and the Future AGI vs Langfuse comparison covers the OSS-vs-self-improving-runtime axis specifically.

Sources

LangSmith pricing, langchain.com/pricing
LangSmith docs, docs.langchain.com/langsmith
LangSmith Fleet rename announcement (March 19, 2026), changelog.langchain.com/announcements/agent-builder-is-now-langsmith-fleet
LangSmith Self-Hosted v0.13 (January 16, 2026), changelog.langchain.com/announcements/langsmith-self-hosted-v0-13
LangSmith integrations, docs.langchain.com/langsmith/integrations
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI traceAI (Apache 2.0), github.com/future-agi/traceAI
Future AGI ai-evaluation (Apache 2.0), github.com/future-agi/ai-evaluation
Future AGI agent-opt (Apache 2.0), github.com/future-agi/agent-opt
Future AGI Protect latency, arxiv.org/abs/2510.13351
AWS Marketplace listing for Future AGI, aws.amazon.com/marketplace

Frequently asked questions

What is the main difference between Future AGI and LangSmith?

Future AGI is a self-improving runtime: trace + eval + optimizer + gateway + inline guardrails in one product, with Apache 2.0 building blocks plus the Agent Command Center as the hosted control plane. LangSmith is LangChain Inc.'s closed-source eval + observability + agent platform with first-class LangGraph semantics, four SDK languages, the no-code Fleet builder, and HIPAA + SOC 2 Type 2 + GDPR. Future AGI updates the runtime; LangSmith watches it.

Is Future AGI open-source? Is LangSmith open-source?

Future AGI's three building blocks (`traceAI`, `ai-evaluation`, `agent-opt`) are Apache 2.0; the hosted Agent Command Center is the closed-source control plane on top. LangSmith's SDKs are MIT, but the platform (eval backend, prompt store, Studio IDE) is closed source and self-host is Enterprise-only.

Does LangSmith have an optimizer or a gateway?

No. LangSmith is an observation + eval + agent-builder layer. Polly proposes prompt rewrites in the UI but does not roll them into the next gateway request. There is no routing layer. Closed-loop optimization and inline sub-100 ms guardrails are not part of the product.

Can I run Future AGI alongside LangSmith instead of replacing it?

Yes. Keep LangSmith Hub for human prompt authoring and the LangGraph-native trace UI, drop `traceAI` into application code, layer `ai-evaluation` and `agent-opt` on top, and route through Agent Command Center with Protect inline. The libraries are gateway-agnostic and compose with LangSmith without duplicating instrumentation.

How does pricing compare?

Future AGI: free tier (100K traces/month), Scale at $99/mo (10M traces + full eval + `agent-opt` + RBAC + gateway + Protect), Enterprise custom with AWS Marketplace. LangSmith: Developer $0/seat/mo, Plus $39/seat/mo, base traces $2.50 / 1K (14-day), extended $5.00 / 1K (400-day), Enterprise custom with HIPAA. Above 5M traces/month, Future AGI's flat Scale tier is dramatically cheaper than LangSmith's per-1K trace pricing.

What is the alternative if neither fits?

For an MIT-licensed framework-agnostic OSS, Langfuse. For a hosted enterprise gateway with mature virtual keys, Portkey. For an Elastic-License-2.0 OpenTelemetry + OpenInference workbench, Arize Phoenix. For hosted closed-loop eval without owning infra, Braintrust.
