What Is the KYC Process?

A regulated identity-verification workflow that combines document checks, biometrics, sanctions screening, and risk scoring, increasingly powered by AI extractors and classifiers.

The Know Your Customer (KYC) process is a regulated identity-verification workflow used by banks, fintechs, and other high-risk providers to confirm a customer is who they claim to be, screen them against sanctions and politically-exposed-person (PEP) lists, and continuously assess risk. A typical flow captures an identity document, runs OCR, performs a biometric face-match against a selfie, verifies address, screens adverse media, and scores risk. Modern KYC outsources much of that pipeline to AI models — and that is where LLM and vision-model evaluation becomes a compliance requirement, not a nice-to-have.
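The flow above can be sketched as an ordered set of checks that short-circuits to manual review on the first failure. Everything here is illustrative — the stage names, thresholds, and applicant fields are hypothetical, not any vendor's API:

```python
# Illustrative KYC pipeline: each stage is a named predicate over the
# applicant record. Stage names and the 0.90 face-match threshold are
# made up for the sketch.
def run_kyc_pipeline(applicant: dict) -> dict:
    stages = [
        ("document_capture", lambda a: "id_image" in a),
        ("ocr_extraction",   lambda a: bool(a.get("extracted_fields"))),
        ("face_match",       lambda a: a.get("face_match_score", 0.0) >= 0.90),
        ("address_check",    lambda a: a.get("address_verified", False)),
        ("sanctions_screen", lambda a: not a.get("sanctions_hit", False)),
    ]
    results = {}
    for name, check in stages:
        results[name] = check(applicant)
        if not results[name]:
            break  # stop at the first failing stage; route to manual review
    results["approved"] = all(results.get(name, False) for name, _ in stages)
    return results

applicant = {
    "id_image": b"<jpeg bytes>",
    "extracted_fields": {"name": "Jane Doe", "dob": "1990-01-01"},
    "face_match_score": 0.93,
    "address_verified": True,
    "sanctions_hit": False,
}
print(run_kyc_pipeline(applicant)["approved"])  # True
```

Short-circuiting matters operationally: a sanctions hit should halt the pipeline before any auto-approval logic runs.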

Why It Matters in Production LLM and Agent Systems

KYC is the highest-stakes deployment surface most fintech teams ship. A wrong yes-decision admits a fraudster; a wrong no-decision drops a real customer and may trigger fair-lending complaints. The AI components inside the pipeline carry that risk silently — an OCR model that mis-reads “1990” as “1900” approves an underage applicant; a face-match scorer with demographic bias rejects darker-skinned customers at higher rates; an LLM-based adverse-media summariser hallucinates a sanctions hit and freezes an innocent account.

The pain shows up across roles. A compliance lead is asked, mid-audit, “how do you know the OCR model has not regressed since last quarter?” and has no per-cohort accuracy chart to point at. An ML engineer ships a new vision model that improves global accuracy by 2 points but worsens accuracy on one passport country by 11 points; nobody notices for six weeks. A product manager debugs a fraud spike and finds the LLM summariser is missing recent media because the retriever’s index is stale.
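The cohort-regression failure mode is easy to reproduce: slice accuracy by cohort instead of reporting one global number. A minimal sketch with synthetic data (cohort codes are hypothetical):

```python
from collections import defaultdict

def cohort_accuracy(records):
    """records: iterable of (cohort, predicted, expected) triples."""
    totals, correct = defaultdict(int), defaultdict(int)
    for cohort, pred, gold in records:
        totals[cohort] += 1
        correct[cohort] += int(pred == gold)
    return {c: correct[c] / totals[c] for c in totals}

# Synthetic eval set: global accuracy looks healthy while one
# passport-country cohort ("XYZ") quietly regresses.
records = (
    [("DEU", "ok", "ok")] * 95 + [("DEU", "bad", "ok")] * 5 +
    [("XYZ", "ok", "ok")] * 7  + [("XYZ", "bad", "ok")] * 3
)
per_cohort = cohort_accuracy(records)
overall = sum(p == g for _, p, g in records) / len(records)
print(round(overall, 3), per_cohort["DEU"], per_cohort["XYZ"])
```

A 92.7% global score here hides a 70% score on the small cohort — exactly the gap a per-cohort dashboard is meant to surface.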

In 2026-era KYC stacks, the AI surface keeps growing — agentic workflows triage cases, write reviewer notes, and propose decisions for human sign-off. Each step is an evaluable model with an audit obligation. Treating KYC AI as black-box vendor magic is no longer defensible.

How FutureAGI Handles KYC AI Components

FutureAGI does not run KYC end-to-end — it is not an identity-verification vendor. What it does is evaluate the AI components inside a KYC pipeline so compliance teams have evidence. Three places it slots in:

  1. Document extraction quality. Wrap the OCR or vision-language extractor with fi.evals.FactualAccuracy and FieldCompleteness against a labelled Dataset of passports and IDs. Run a regression eval on every model version; surface per-document-type and per-country cohort scores in the dashboard.
  2. Bias detection on the risk model. The BiasDetection evaluator scores LLM or classifier outputs across protected-attribute cohorts; ship a daily report showing approval-rate parity by age band, gender, and country code. Pair with human-in-the-loop annotation queues for ambiguous cases.
  3. PII protection on logs and prompts. PII and DataPrivacyCompliance flag any KYC document text that leaks into trace payloads, system prompts, or eval-output reports. The audit log captures every block as a span_event, giving regulators a deterministic replay.
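The redaction idea in item 3 can be sketched as a pre-ingest scrub pass over trace payload text. The two regexes below (US-style SSN, ISO date of birth) are illustrative only — a production KYC stack needs a vetted PII detector, not a handful of patterns:

```python
import re

# Hypothetical patterns; real PII coverage is far broader than this.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "dob": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def redact(text: str):
    """Return (scrubbed_text, labels_of_leaks_found)."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            hits.append(label)  # record the leak so it can be logged as an event
            text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text, hits

clean, hits = redact("dob=1990-01-01 ssn=123-45-6789")
print(clean)
print(hits)  # ['ssn', 'dob']
```

Logging each hit separately from the scrubbed text is what enables the deterministic audit replay described above: the event records *that* a leak was caught without re-storing the leaked value.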

Concretely: a fintech team instruments their LLM-based adverse-media summariser via traceAI, runs Faithfulness and Groundedness against retrieved articles, and gates production releases on cohort fail-rate. When BiasDetection flags rising disparate impact on a country cohort, the team retrains the upstream classifier rather than tuning the LLM prompt — FutureAGI surfaced the issue at the right layer.
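Gating a release on cohort fail-rate reduces to a small check in CI. A minimal sketch, assuming fail-rates per cohort are already exported by the eval run (the 5% gate is an arbitrary example):

```python
def gate_release(cohort_fail_rates: dict, max_fail_rate: float = 0.05) -> bool:
    """Block the release if any cohort's eval fail-rate exceeds the gate."""
    failing = {c: r for c, r in cohort_fail_rates.items() if r > max_fail_rate}
    if failing:
        print(f"release blocked; cohorts over gate: {sorted(failing)}")
        return False
    return True

print(gate_release({"FRA": 0.02, "BRA": 0.11}))  # False — BRA exceeds the 5% gate
```

The point of gating on the *worst* cohort rather than the average is the same as above: an aggregate number can pass while one country's documents fail badly.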

How to Measure or Detect It

KYC AI quality is multi-evaluator; pick metrics for each component:

  • FactualAccuracy — for document-extraction outputs against ground-truth labels.
  • FieldCompleteness — coverage of required fields in extracted JSON.
  • BiasDetection — disparity scores across protected-attribute cohorts.
  • PII — leak detection on every prompt, response, and log line.
  • DataPrivacyCompliance — broader compliance template covering retention and access rules.
  • Per-cohort approval-rate dashboard — tracks fairness drift by age, gender, country.
  • Reviewer-disagreement rate — proxy for model uncertainty when humans override AI decisions.
A minimal usage sketch (evaluator arguments abbreviated — check the fi SDK reference for exact signatures):

from fi.evals import PII, BiasDetection

pii = PII()
bias = BiasDetection()

# Extracted KYC fields must never reach logs unredacted; score them first.
extracted = {"name": "Jane Doe", "dob": "1990-01-01", "ssn": "123-45-6789"}
print(pii.evaluate(output=str(extracted)))

# Score a risk-model explanation for demographic bias.
print(bias.evaluate(input="risk-score-explanation", output="..."))
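The per-cohort approval-rate dashboard in the list above boils down to a disparate-impact ratio. A sketch using the common "four-fifths" rule of thumb (cohort names and counts are synthetic):

```python
def disparate_impact(approvals: dict) -> float:
    """approvals: cohort -> (approved, total). Returns min/max approval-rate ratio."""
    rates = {c: approved / total for c, (approved, total) in approvals.items()}
    return min(rates.values()) / max(rates.values())

ratio = disparate_impact({"cohort_a": (80, 100), "cohort_b": (52, 100)})
print(round(ratio, 2))  # 0.65 — below the 0.8 four-fifths threshold, flag for review
```

A ratio under 0.8 does not prove unlawful bias, but it is the conventional trigger for a deeper fairness investigation, which is exactly when you want the dashboard to alert.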

Common Mistakes

  • Treating the OCR model as deterministic. Vendor models update silently; regression-test extraction on a versioned dataset weekly.
  • Measuring overall accuracy without cohort slices. A 96% global score can hide a 78% score on one country and trigger fair-lending exposure.
  • Storing raw KYC documents in trace payloads. Use redaction at trace ingest; never debug from production identity images.
  • Letting the LLM summariser pull from an unbounded retriever. Adverse-media hallucination starts when the retriever returns nothing relevant and the model makes something up.
  • Skipping human-in-the-loop on low-confidence cases. A confidence-threshold queue is the difference between an audit-friendly and an audit-disastrous KYC stack.
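The confidence-threshold queue from the last bullet is a few lines of routing logic. A sketch, with a hypothetical 0.85 threshold:

```python
def route(decision: str, confidence: float, threshold: float = 0.85) -> str:
    """Auto-apply high-confidence decisions; queue everything else for a human."""
    return decision if confidence >= threshold else "human_review"

cases = [("approve", 0.97), ("reject", 0.62), ("approve", 0.80)]
print([route(d, c) for d, c in cases])  # ['approve', 'human_review', 'human_review']
```

The threshold itself should be tuned against the reviewer-disagreement rate tracked earlier: if humans frequently overturn decisions just above the cutoff, the cutoff is too low.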

Frequently Asked Questions

What is the KYC process?

KYC (Know Your Customer) is a regulated workflow for verifying a customer's identity, screening against sanctions and PEP lists, and assessing risk. Modern KYC uses AI components for document OCR, face-match, and risk scoring.

How is KYC different from KYB?

KYC verifies an individual customer; KYB (Know Your Business) verifies a business entity, including its directors, beneficial owners, and corporate structure. Both run similar risk and screening logic but on different data sources.

How do you measure AI quality inside a KYC pipeline?

Score the document extractor with FactualAccuracy and FieldCompleteness, the bias profile of the risk model with BiasDetection, and the redaction layer with PII. Track per-cohort fail rates so demographic skew surfaces before regulators do.