A clique is a subset of graph vertices where every vertex connects directly to every other vertex, forming a fully connected subgraph.

Why are cliques used in machine learning?

Clique detection underpins community detection, deduplication, knowledge-graph construction, and motif analysis. In LLM stacks, cliques often appear in knowledge graphs and embedding-similarity graphs to identify clusters of strongly-related entities or queries.

How does FutureAGI use cliques?

FutureAGI treats clique IDs as trace and evaluation cohorts. `EmbeddingSimilarity` checks cluster coherence, while `Groundedness` checks whether LLM responses stay faithful to retrieved clique context.

Clique (Graph Theory): Definition for LLM Graphs

What Is a Clique?

A clique is a subset of graph vertices where every vertex is directly connected to every other vertex. In AI reliability, it is a model-graph concept that shows up in knowledge graphs, embedding-similarity graphs, retrieval clusters, and agent tool-dependency graphs. A clique often marks a set of entities, chunks, or tools that should be evaluated together because the system treats them as tightly related. FutureAGI uses clique identifiers as downstream cohorts for tracing and evaluation, not as a graph-mining engine.
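The definition above can be checked mechanically. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of adjacency sets; the helper name `is_clique` and the toy drug graph are hypothetical illustrations, not part of any FutureAGI API.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """Return True if every pair of distinct nodes is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

# Hypothetical toy graph: undirected edges stored as adjacency sets.
adj = {
    "aspirin": {"ibuprofen", "naproxen"},
    "ibuprofen": {"aspirin", "naproxen"},
    "naproxen": {"aspirin", "ibuprofen", "warfarin"},
    "warfarin": {"naproxen"},
}

print(is_clique(adj, ["aspirin", "ibuprofen", "naproxen"]))  # True
print(is_clique(adj, ["aspirin", "naproxen", "warfarin"]))   # False: no aspirin-warfarin edge
```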

Why Clique Matters in Production LLM and Agent Systems

Modern LLM stacks lean heavily on graph structure. A knowledge graph powering a RAG system has nodes for entities and edges for relationships; cliques inside that graph represent tightly coupled entity sets, for example a set of diseases that all share the same symptom and the same drug. Retrieving the clique often gives better grounding than retrieving any single node. Embedding similarity graphs (k-NN graphs over chunk embeddings) use clique detection to identify near-duplicate chunks that should be deduplicated before retrieval. Agent tool graphs use clique structure to spot tools that always co-occur in a successful trajectory, suggesting routing pre-loads.

The pain of getting graph structure wrong shows up across roles. A RAG engineer indexes 200K chunks without dedup; cliques of near-identical chunks dominate top-k retrieval and groundedness on long-tail questions tanks. A product lead reviews “knowledge graph quality” and finds the largest community is actually a clique of stale entities that should have been merged. A platform engineer debugging an agent finds three tool calls per trajectory are redundant because they form a clique of always-called functions.

Unlike PageRank, which scores node importance across a graph, clique detection asks whether a local subgraph is fully connected. Unlike Ragas faithfulness, it is not an answer-quality metric; it is an upstream structure signal that still needs downstream evaluation.

In 2026 agent stacks, cliques surface in agent-memory graphs (agent-workflow-memory) where strongly-coupled past trajectories form replay clusters. Detection feeds back into retrieval design.

How FutureAGI Handles Clique-Derived Structures

FutureAGI’s approach is to treat clique-derived groups as evaluation cohorts, then test whether the model response remains grounded when that cohort is retrieved, summarized, or used by an agent. FutureAGI does not implement clique detection directly. It sits downstream of graph-construction pipelines: when your knowledge graph, embedding graph, or trajectory graph groups items by clique, FutureAGI evaluates whether the resulting LLM behavior preserves the cluster’s semantics.

Concretely: a healthcare-RAG team builds a knowledge graph over its drug-interaction corpus, runs Bron-Kerbosch to identify cliques of co-mentioned drugs, and uses each clique as a retrieval unit. They wire the chain through traceAI-langchain and log clique_id, retrieval.cluster_size, and llm.token_count.prompt on the retrieval span. FutureAGI’s Groundedness evaluator scores responses against the retrieved clique; the dashboard plots groundedness by clique ID, exposing which clusters produce supported answers and which produce confabulated drug interactions. When a new chunk is added to the corpus, the team reruns clique detection and uses EmbeddingSimilarity to verify the new clique’s coherence before promoting it to retrieval.
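The clique-enumeration step in this workflow can be sketched as follows. This is an illustrative pure-Python version of Bron-Kerbosch with pivoting, not the team's or FutureAGI's implementation; the toy graph, the `clique-N` ID scheme, and the `cohorts` mapping are assumptions made for the example.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph (pivoting variant)."""
    if r is None:
        r, p, x = set(), set(adj), set()
    if not p and not x:
        yield frozenset(r)  # r cannot be extended, so it is maximal
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}

# Hypothetical co-mention graph over drugs (adjacency sets, undirected).
adj = {
    "aspirin": {"ibuprofen", "naproxen"},
    "ibuprofen": {"aspirin", "naproxen"},
    "naproxen": {"aspirin", "ibuprofen", "warfarin"},
    "warfarin": {"naproxen"},
}

cliques = list(bron_kerbosch(adj))

# Assign deterministic IDs (largest first) so each clique can be logged as a cohort.
ordered = sorted(cliques, key=lambda c: (-len(c), sorted(c)))
cohorts = {f"clique-{i}": sorted(c) for i, c in enumerate(ordered)}
print(cohorts)
```

Sorting before assigning IDs keeps the labels stable across runs on an unchanged graph, which matters if the ID is later logged as `clique_id` on a retrieval span.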

For embedding-graph deduplication, EmbeddingSimilarity scores pairwise semantic similarity before k-NN graph construction. If a new clique's pairwise similarity drops below the team’s threshold, the engineer blocks the retrieval index update, reviews the source chunks, and reruns a regression eval before the cluster reaches production.

How to Measure or Detect It

Clique-derived structure is graded by downstream evaluation plus intrinsic graph metrics:

EmbeddingSimilarity (fi.evals): pairwise 0–1 similarity used to build similarity graphs and find cliques in offline batch.
Groundedness: per-response 0–1 score against retrieved context; segment by clique-id to spot bad clusters.
Trace fields: log clique_id, retrieval.cluster_size, and llm.token_count.prompt so poor clusters can be tied to cost and context length.
Clique density: edges/(n choose 2). This should be 1.0 for a true clique; lower values indicate noisy graph construction.
Clique stability: do the same cliques recur across re-runs over the same corpus? Variance suggests embedding instability.
Per-clique retrieval recall: fraction of relevant chunks captured when retrieving the whole clique vs. top-k nearest neighbours.

from fi.evals import EmbeddingSimilarity

sim = EmbeddingSimilarity()
result = sim.evaluate(
    text_a="Aspirin and ibuprofen interaction risk",
    text_b="NSAID drug interaction warning",
)
print(result.score, result.reason)
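The two intrinsic graph metrics from the list, clique density and clique stability, can be computed without any evaluator. The sketch below assumes adjacency sets for the graph and defines stability as the Jaccard overlap between the clique sets of two runs; that definition, the function names, and the toy data are assumptions chosen for illustration, since the text does not fix a formula for stability.

```python
from itertools import combinations

def clique_density(adj, nodes):
    """edges / C(n, 2) over the induced subgraph; 1.0 for a true clique."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 1.0  # a single vertex is trivially a clique
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / (n * (n - 1) / 2)

def clique_stability(run_a, run_b):
    """Jaccard overlap between the clique sets found in two runs (assumed definition)."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical toy graph and two hypothetical detection runs.
adj = {
    "aspirin": {"ibuprofen", "naproxen"},
    "ibuprofen": {"aspirin", "naproxen"},
    "naproxen": {"aspirin", "ibuprofen", "warfarin"},
    "warfarin": {"naproxen"},
}
run_a = [frozenset({"aspirin", "ibuprofen", "naproxen"}), frozenset({"naproxen", "warfarin"})]
run_b = [frozenset({"aspirin", "ibuprofen", "naproxen"}), frozenset({"naproxen", "heparin"})]

tight = clique_density(adj, ["aspirin", "ibuprofen", "naproxen"])             # 3 of 3 edges
loose = clique_density(adj, ["aspirin", "ibuprofen", "naproxen", "warfarin"])  # 4 of 6 edges
stability = clique_stability(run_a, run_b)                                     # 1 shared of 3 total
print(tight, loose, stability)
```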

Common Mistakes

Treating any dense subgraph as a clique. A near-clique with 95% of its possible edges is useful but not the same; pick the algorithm to match.
Running maximum-clique search on graphs that are too large. Finding a maximum clique is NP-hard, and Bron-Kerbosch enumeration of maximal cliques is exponential in the worst case; cap graph size or use approximations.
Building cliques on cosine similarity without thresholding. Without a threshold, every node connects at least weakly to every other, so the whole graph forms one trivial clique.
Skipping clique-stability checks. Embedding model upgrades shift similarity scores enough to reshape cliques; verify before redeploy.
Using cliques as retrieval units without grounding eval. A coherent clique upstream does not guarantee a grounded answer downstream.
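The thresholding mistake is easy to reproduce on a toy example. The sketch below builds a similarity graph from hand-written two-dimensional vectors; the vectors, the chunk names, and the 0.9 cutoff are hypothetical, and a real pipeline would use model embeddings and a threshold tuned on its own data.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_graph(vectors, threshold):
    """Connect two items only if their cosine similarity is >= threshold."""
    adj = {name: set() for name in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def edge_count(adj):
    return sum(len(neighbours) for neighbours in adj.values()) // 2

# Hypothetical embeddings: chunk_1 and chunk_2 are near-duplicates, chunk_3 is not.
vectors = {
    "chunk_1": [1.0, 0.1],
    "chunk_2": [0.9, 0.2],
    "chunk_3": [0.1, 1.0],
}

unthresholded = similarity_graph(vectors, threshold=-1.0)  # keeps every pair
thresholded = similarity_graph(vectors, threshold=0.9)     # keeps only strong pairs

print(edge_count(unthresholded))  # 3: all pairs connected, one trivial clique
print(edge_count(thresholded))    # 1: only chunk_1 and chunk_2 remain linked
```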