What Are Support Vector Machines (SVM)?
A family of supervised learning algorithms that classify or regress by finding maximum-margin hyperplanes, with kernel functions handling non-linear data and variants for binary, multi-class, regression, and novelty detection.
Support vector machines, formalised by Vapnik and colleagues in the 1990s, are a family of supervised algorithms that find the maximum-margin hyperplane separating classes (or fitting a regression) in feature space. The points closest to the boundary — the support vectors — define the decision surface. The kernel trick lets the algorithm operate in arbitrarily high-dimensional spaces without ever explicitly computing the lifted features. Variants cover the four supervised regimes: binary SVM, multi-class SVM (one-vs-rest, one-vs-one, error-correcting output codes), support vector regression (SVR) for continuous targets, and one-class SVM for novelty/anomaly detection.
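The four variants can be sketched in a few lines with scikit-learn (an illustrative library choice; the toy data and parameters here are assumptions, not prescriptions):

```python
import numpy as np
from sklearn.svm import SVC, SVR, OneClassSVM

rng = np.random.default_rng(0)

# Binary classification: SVC finds the maximum-margin boundary;
# the RBF kernel handles non-linear separation implicitly.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="rbf").fit(X, y)

# Support vector regression: fit a function for a continuous target.
y_cont = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
reg = SVR(kernel="rbf", epsilon=0.1).fit(X, y_cont)

# One-class novelty detection: train on "normal" data only;
# predict() returns +1 for inliers, -1 for novelties.
ood = OneClassSVM(nu=0.05).fit(X)

print(clf.predict(X[:2]), reg.predict(X[:2]).round(2), ood.predict(X[:2]))
```

Multi-class SVC handles more than two labels out of the box via one-vs-one voting, so the same `SVC` call covers that regime as well.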
Why It Matters in Production LLM and Agent Systems
LLM stacks have a class of decisions that are too frequent or too latency-sensitive for an LLM judge call but still need a learned classifier. Routing a query between cheap and expensive models. Gating user input through a binary safety filter. Detecting out-of-distribution traffic at ingress. Predicting a confidence or effort score over an embedding. These decisions multiply: in a six-step agent, ten such decisions per request is normal — and an LLM judge for each would dwarf the agent’s own cost.
The pain shows up across roles. A platform engineer’s cost-optimized-routing policy uses a 7B model to make a routing decision that ought to take 200 microseconds. A safety lead wants real-time toxicity gating but cannot afford the LLM-judge per-call cost. A reliability engineer notices p99 latency dominated by classifier-LLM calls that would be irrelevant if a tiny SVM handled the work.
In 2026, the SVM family is the right tool for high-frequency edge decisions over LLM embeddings. They do not replace deep models for the reasoning layer; they sit at the perimeter, doing the cheap deterministic work, freeing the LLM budget for tasks that need it. Multi-step agents and human-on-loop workflows compound this — every step that can be gated, routed, or filtered with an SVM is an LLM call that does not need to be made.
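A perimeter routing decision of this kind might look like the following sketch, assuming precomputed query embeddings and a labelled history of which model tier handled each query well (the tier names and 384-dim embeddings are hypothetical):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Hypothetical training data: query embeddings labelled by which
# model tier resolved them acceptably (0 = cheap, 1 = expensive).
embeddings = rng.normal(size=(500, 384))
labels = (embeddings[:, 0] > 0).astype(int)

router = LinearSVC().fit(embeddings, labels)

def route(query_embedding: np.ndarray) -> str:
    """Deterministic routing in microseconds: no GPU, no LLM call."""
    tier = router.predict(query_embedding.reshape(1, -1))[0]
    return "expensive-model" if tier == 1 else "cheap-model"

print(route(rng.normal(size=384)))
```

A linear kernel keeps inference to a single dot product per query, which is what makes the per-step budget negligible next to an LLM judge call.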
How FutureAGI Handles SVM-Based Decisions
FutureAGI’s approach is to evaluate the LLM outputs and embeddings that flow through SVM-based gates and routers, and surface whether the SVM is helping or hurting downstream metrics. The platform integrates at three layers: embedding stability (EmbeddingSimilarity), classifier accuracy (GroundTruthMatch against a labelled Dataset test split), and end-to-end downstream eval (AnswerRelevancy, TaskCompletion, per-route slicing).
Concretely: a multi-tenant agent on traceAI-openai-agents uses an SVM ensemble — a one-class SVM at ingress to flag novel inputs, a multi-class SVM to bucket intent, and an SVR head to predict per-trace expected token cost (used by the cost-optimized-routing policy). Each SVM emits a span attribute (novelty.is_ood, route.intent, cost.predicted_tokens) so the tracing dashboard can compare every downstream metric across SVM-decided paths. After an embedding-model swap, EmbeddingSimilarity flags drift, the team retrains the SVMs against the new embeddings, and GroundTruthMatch against the held-out Dataset confirms no regression before promotion.
For one-class novelty detection, the simulate-sdk’s Persona injects out-of-distribution payloads — adversarial prompts, atypical phrasing — to confirm the detector flags them. This becomes a regression test for the SVM itself. Unlike scikit-learn’s standalone OneClassSVM, which a team operates as a script, FutureAGI keeps the classifier decision, downstream eval score, and route in one auditable trace.
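The regression-test idea can be illustrated standalone with scikit-learn's OneClassSVM; here synthetic shifted embeddings stand in for Persona-generated adversarial payloads (the simulate-sdk API itself is not shown):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)

# In-distribution traffic: a tight cluster of "normal" query embeddings.
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 16))
detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(normal)

# Regression test: shifted payloads standing in for adversarial or
# atypical inputs must be flagged out-of-distribution (predict == -1).
ood_payloads = rng.normal(loc=8.0, scale=1.0, size=(20, 16))
flagged = (detector.predict(ood_payloads) == -1).mean()
assert flagged > 0.9, f"OOD detection rate too low: {flagged:.2f}"
```

Run on every retrain, this catches the silent failure mode where a refit detector stops flagging the payload classes it previously caught.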
How to Measure or Detect It
- GroundTruthMatch: binary or scored match against the labelled gold class for binary and multi-class SVMs.
- Per-class precision and recall: the canonical multi-class SVM diagnostic.
- EmbeddingSimilarity: drift signal on the SVM's embedding input.
- SVR mean absolute error: regression-variant accuracy signal for predicted continuous targets.
- One-class detection rate (dashboard): proportion of incoming traffic flagged as out-of-distribution; track over time for distribution shifts.
- Downstream eval lift per SVM-decided route: AnswerRelevancy and TaskCompletion sliced by route, vs a no-routing baseline.
```python
from fi.evals import GroundTruthMatch, EmbeddingSimilarity

match = GroundTruthMatch()
sim = EmbeddingSimilarity()

# Classifier accuracy: does the SVM's predicted class match the gold label?
result_a = match.evaluate(output="technical", expected_response="technical")

# Embedding stability: how close are two inputs in embedding space?
result_b = sim.evaluate(text_a="App keeps crashing", text_b="The application freezes")

print(result_a.score, result_b.score)
```
Common Mistakes
- Choosing kernel by reflex. RBF is the popular default; check linear and polynomial empirically — the right kernel often beats RBF on tabular embeddings.
- Conflating SVR with SVM classification. They have different loss functions and different evaluation metrics; do not benchmark them with each other’s signals.
- Skipping calibration. SVM outputs are decision functions, not probabilities; if you need probabilities, run Platt scaling or isotonic regression on top.
- Training once, deploying forever. Embedding versions change, distributions shift; SVMs need a refresh schedule and a regression-eval gate.
- Ignoring class imbalance. A 95/5 split lets a degenerate SVM hit 95% accuracy by always predicting the majority class.
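The calibration and imbalance mistakes above have standard scikit-learn remedies; a minimal sketch, assuming a 95/5 safety-gating workload (the toy data is illustrative):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)

# Imbalanced toy data: roughly 95/5 split, like a safety-gating workload.
X = rng.normal(size=(1000, 32))
y = (X[:, 0] > 1.6).astype(int)  # rare positive ("unsafe") class

# class_weight="balanced" counters the skew so the majority class cannot
# dominate; CalibratedClassifierCV wraps the raw decision function with
# Platt scaling ("sigmoid") to turn margins into usable probabilities.
base = LinearSVC(class_weight="balanced")
clf = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X, y)

proba = clf.predict_proba(X[:5])[:, 1]  # calibrated P(positive class)
print(proba.round(3))
```

Swap `method="isotonic"` for isotonic regression when the calibration set is large enough to support it.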
Frequently Asked Questions
What are support vector machines (SVM)?
SVMs are a family of supervised learning algorithms that find the maximum-margin hyperplane separating classes, with kernel functions extending the technique to non-linear data and variants for classification, regression, and novelty detection.
What is the difference between SVM and SVR?
SVM (classification) finds the hyperplane separating classes. SVR (support vector regression) fits a function within an epsilon-tube around the data, treating points outside the tube as the support vectors that define the regressor.
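The epsilon-tube behaviour is visible directly in scikit-learn's SVR (a toy one-dimensional example, not from the original text):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = np.linspace(0, 4, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=200)

# Points inside the epsilon-tube incur zero loss; only points on or
# outside the tube become support vectors and define the regressor.
reg = SVR(kernel="rbf", epsilon=0.2).fit(X, y)
print(len(reg.support_), "of", len(X), "points are support vectors")
```

Widening `epsilon` shrinks the support set and smooths the fit; narrowing it does the opposite.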
Why use SVM in an LLM pipeline instead of a small transformer?
SVMs have sub-millisecond inference, deterministic outputs, no GPU requirement, and minimal labelled-data needs. For high-frequency routing or gating decisions over fixed embeddings, those properties beat what a tiny transformer offers.