What Is Security (AI/ML)?
Security in AI is the discipline of protecting AI systems and their data from intentional attack and accidental exposure. It spans five layers: model security (extraction, inversion, poisoning), prompt-layer security (injection, jailbreaks), data security (PII exposure, training-data leakage), tool-layer security (excessive agency, code injection), and infrastructure security (supply chain, secrets, dependencies). In production LLM and agent stacks, security shows up as detectors at every seam, evaluators on traces, and gateway guardrails. FutureAGI implements security with PromptInjection, ProtectFlash, ActionSafety, PII, and the security-detector evaluator family.
Why It Matters in Production LLM and Agent Systems
Security in AI is asymmetric: a single successful attack costs more than years of incremental quality work. The surface is wide and the attackers are creative. Prompt injection, jailbreak chains, indirect injection through retrieved documents, training-data extraction, model extraction via API queries, supply-chain compromise of inference dependencies, code injection through tool arguments — each is a documented class with public proof-of-concept exploits.
Engineers feel this when they see exotic input patterns in logs and cannot tell whether they are noise or a probe. SREs see traffic spikes that don’t map to a marketing event. Compliance leads need evidence of which controls ran on which request — and the answer cannot be “we trust the model.” Product teams discover an attack worked when a customer’s data appears in a third-party log or a tool action is taken without consent.
These risks are now codified: the OWASP Top 10 for LLM Applications puts prompt injection at the head of the list for LLM and agent systems. Useful production symptoms include rising PromptInjection scores on retrieved context, new dangerous-action patterns in ActionSafety findings, PII matches in tool arguments, security-detector hits on generated code (SQLInjectionDetector, XSSDetector), unexpected outbound DNS resolutions from inference containers, and dependency-scan findings on transitive packages. Security has to be operationalised as continuous detection, not a quarterly audit.
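Those symptoms only pay off if someone is watching them. Below is a minimal sketch of turning them into alerts over a rolling window; the metric names and limits are illustrative assumptions, not FutureAGI defaults.

# Illustrative only: metric names and limits are assumptions, not FutureAGI defaults.
WINDOW_THRESHOLDS = {
    "prompt_injection_p95": 0.80,     # PromptInjection score on retrieved context
    "action_safety_severe": 1,        # severe ActionSafety findings in the window
    "pii_in_tool_args": 1,            # PII matches inside tool arguments
    "code_detector_hit_rate": 0.02,   # security-detector hits per generated-code response
}

def breached(window_metrics: dict) -> list[str]:
    """Return the security signals in a rolling window that meet or exceed their limit."""
    return [name for name, limit in WINDOW_THRESHOLDS.items()
            if window_metrics.get(name, 0) >= limit]

# Example: page on-call when any signal breaches, then open the offending traces.
if breached({"prompt_injection_p95": 0.91, "pii_in_tool_args": 0}):
    print("security signal breached; review the traces behind it")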
How FutureAGI Handles Security
FutureAGI’s approach is to layer detectors and evaluators at every seam where attacks can hide. At the input edge, PromptInjection and ProtectFlash score user input and retrieved context. At the agent step, ActionSafety scores trajectories for dangerous tool calls. At the output, ContentSafety and PII check responses and tool arguments. For generated code, the security-detector family runs static analysis: SQLInjectionDetector, XSSDetector, CodeInjectionDetector, CommandInjectionDetector, PathTraversalDetector, HardcodedSecretsDetector, UnsafeDeserializationDetector, WeakCryptoDetector, XXEDetector, SSRFDetector. Each detector maps to a CWE so compliance evidence is concrete.
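As a rough sketch of the code-layer pass, the snippet below runs a subset of those detectors over one generated-code string and keeps the named findings as trace evidence. It assumes each detector class in fi.evals exposes the same evaluate(code=...) call as the SQLInjectionDetector example later in this section.

# Assumes each detector follows the evaluate(code=...) pattern shown later in this section.
from fi.evals import (
    SQLInjectionDetector,
    CommandInjectionDetector,
    HardcodedSecretsDetector,
    PathTraversalDetector,
)

CODE_DETECTORS = [
    SQLInjectionDetector(),
    CommandInjectionDetector(),
    HardcodedSecretsDetector(),
    PathTraversalDetector(),
]

def scan_generated_code(generated_code: str) -> dict:
    """Run every code-layer detector once; each finding maps to a CWE for compliance."""
    return {type(d).__name__: d.evaluate(code=generated_code) for d in CODE_DETECTORS}

findings = scan_generated_code('os.system("rm -rf " + user_dir)')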
A practical example: a coding-assistant agent generates code that gets executed in a sandbox. The team configures Agent Command Center with a pre-guardrail that runs PromptInjection on input and a post-guardrail that runs the security-detector family on every code response. traceAI-openai-agents writes spans with the prompt, generated code, detector scores, and execution result. When eval-fail-rate-by-cohort rises on a cohort that includes shell-related prompts, the team uses the trace evidence to localise the failure to a specific detector and prompt pattern.
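A plain-Python sketch of that pre-guardrail and post-guardrail flow follows. It is illustrative only: the wiring is not the Agent Command Center API, and it assumes, for the example, that evaluate() returns a score normalised to [0, 1] with higher meaning more suspicious.

# Illustrative flow, not the Agent Command Center API; assumes evaluate() returns
# a score in [0, 1], higher meaning more suspicious.
from fi.evals import PromptInjection, SQLInjectionDetector, CommandInjectionDetector

BLOCK_AT = 0.8  # illustrative threshold

def guarded_generation(user_input, retrieved_text, generate_code, run_sandbox):
    # Pre-guardrail: score the input and retrieved context before generation.
    pre = PromptInjection().evaluate(input=user_input, retrieved=retrieved_text)
    if pre >= BLOCK_AT:
        return {"blocked": True, "reason": "prompt_injection", "score": pre}

    # Post-guardrail: scan the generated code before the sandbox executes it.
    code = generate_code(user_input, retrieved_text)
    post = {"sqli": SQLInjectionDetector().evaluate(code=code),
            "cmd": CommandInjectionDetector().evaluate(code=code)}
    if any(score >= BLOCK_AT for score in post.values()):
        return {"blocked": True, "reason": "code_security", "findings": post}

    # Only clean code reaches the sandbox; all scores land on the same trace.
    return {"blocked": False, "result": run_sandbox(code), "findings": post}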
Unlike a single Lakera-style filter at the output, FutureAGI’s evaluators run at every layer, and security findings sit on the same trace as quality and cost metrics. The next engineering action is operational: alert, fallback, regression set, or dependency rotation.
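One way to keep that routing explicit is a small mapping from finding class to next action. The mapping below is an assumed encoding for illustration, not a FutureAGI feature.

# Assumed mapping for illustration: which operational action follows which finding class.
NEXT_ACTION = {
    "prompt_injection": "alert",                # page on-call and review the trace
    "action_safety": "fallback",                # route to a constrained tool set
    "code_security": "regression_set",          # add the prompt to the red-team suite
    "dependency_scan": "dependency_rotation",   # pin or rotate the affected package
}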
How to Measure or Detect It
Security in AI is measured by signals at each layer:
- Prompt-layer — PromptInjection and ProtectFlash scores on user input and retrieved context.
- Agent-layer — ActionSafety findings per trajectory; severe findings page on-call.
- Data-layer — PII matches in inputs, outputs, and tool arguments.
- Code-layer — security-detector pass rate on every generated-code response (CWE mapping for compliance).
- Infra-layer — dependency-scan findings, signed-artefact verification, secrets-rotation freshness.
from fi.evals import PromptInjection, SQLInjectionDetector, PII

# One detector per layer: prompt (injection), code (SQLi), and data (PII).
inj = PromptInjection()
sqli = SQLInjectionDetector()
pii = PII()

# user_input, retrieved_text, generated_code, response, and tool_args come from
# the request and its trace spans.
scores = {
    "prompt": inj.evaluate(input=user_input, retrieved=retrieved_text),
    "sqli": sqli.evaluate(code=generated_code),
    "pii": pii.evaluate(output=response, tool_args=tool_args),
}
If your traces lack per-step security scores, the auditor will get a story instead of evidence. We’ve found that teams with per-layer scores resolve incidents in hours; teams with output-only checks negotiate impact for days while reading raw prompt logs.
Common Mistakes
- Treating security as a single filter at the output. Real systems fail at inputs, retrieved context, tool arguments, and generated code — each needs its own check.
- Confusing safety with security. Safety prevents harmful outputs in normal operation; security defends against intentional attack. You need both.
- Trusting the model’s training-time alignment. Alignment is not a security control; runtime detectors are.
- Skipping the security-detector family on generated code. Code-generating agents produce SQL, shell, and script payloads that other tools will execute; scan every output.
- No regression suite of red-team prompts. Without a versioned attack set, you cannot know whether a model swap regressed security (a minimal suite is sketched after this list).
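A minimal sketch of such a suite, assuming a local JSON attack file (hypothetical path) and the PromptInjection evaluator with an illustrative threshold; a fuller suite would also replay each attack against the candidate model and score the responses.

# Sketch only: the attack-set path and threshold are illustrative assumptions.
import json
from fi.evals import PromptInjection

ATTACK_SET = "redteam/attacks-v3.json"  # hypothetical, versioned alongside the model config
FLAG_AT = 0.8                           # assumes scores normalised to [0, 1]

def coverage_report() -> dict:
    """Check that the injection detector still flags every known attack prompt."""
    detector = PromptInjection()
    with open(ATTACK_SET) as f:
        attacks = json.load(f)
    missed = [a["id"] for a in attacks
              if detector.evaluate(input=a["prompt"]) < FLAG_AT]
    return {"attack_set": ATTACK_SET, "total": len(attacks), "missed": missed}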
Frequently Asked Questions
What is security in AI/ML?
Security in AI is the discipline of protecting AI systems and their data from intentional attack and accidental exposure. It spans model, prompt-layer, data, tool-layer, and infrastructure security.
How is security different from safety in AI?
Safety focuses on preventing harmful outputs and risky actions during normal operation. Security focuses on protecting against intentional attack: prompt injection, model extraction, data poisoning, supply-chain compromise. The two overlap but are not identical.
How do you measure AI security?
FutureAGI measures it with PromptInjection and ProtectFlash on inputs and retrieved context, ActionSafety on agent trajectories, PII on outputs and tool arguments, and security-detector evaluators for code-level vulnerabilities.