
Self-Host LLMOps in 2026: Postgres, ClickHouse, and the Architecture Tradeoffs

Self-hosting LLM observability in 2026: Postgres vs ClickHouse, OTel collector, queue, blob storage, K8s footprint, ARM. Vendor-neutral architecture guide.

[Cover image: wireframe server-stack diagram with four racks labeled APP, COLLECTOR, QUEUE, and STORAGE.]

A team needs LLM observability inside its perimeter because the workload handles regulated healthcare data. The procurement team rejects every hosted option. The eng team gets the green light to self-host. Three months later the team has a working stack: Langfuse on ClickHouse plus Postgres, an OTel collector with a redaction pipeline, a Redis queue for ingest fanout, and a small worker pool for online judges. Total infrastructure cost: $620 per month for 8M spans per day. Total engineering cost: roughly 15% of one engineer’s time. The team would pay 4x that hosted, but hosted was not an option.

This piece is the architecture guide for that team. Vendor-neutral. The components are the same regardless of which OSS platform you pick. Postgres or ClickHouse, queue or no queue, K8s footprint, ARM versus x86, blob storage, OTel collector configuration. Where the platform choice matters, this piece names the trade-offs honestly without shilling one stack over another.

TL;DR: Four layers, two storage choices, one collector

Self-hosted LLMOps is four layers. The application layer emits OTel-native spans via traceAI, OpenInference, or a vendor SDK. The OTel collector receives, filters, redacts, tail-samples, and routes. The queue (Redis Streams, Kafka) buffers ingest and fans async work to evaluators. The storage layer holds spans (ClickHouse for high volume, Postgres for low volume), control-plane data (Postgres), and content (S3 or MinIO blob).

The two storage decisions matter most. Postgres is operationally familiar but caps out around 1-5M spans per day. ClickHouse is the standard past that volume but adds an operational learning curve. Most production-scale OSS LLMOps platforms run both: ClickHouse for spans, Postgres for the control plane.

If you only read one paragraph: self-hosting is a real cost in engineering time and infrastructure. Pencil it out against hosted pricing before committing. Past 100M spans per month or with hard residency requirements, self-hosting becomes the default.

When self-hosting makes sense

Three scenarios.

Compliance and residency. HIPAA, strict GDPR regimes, FedRAMP, regulated finance, on-prem-only contracts. The data cannot leave the perimeter, which rules out every hosted option. This is the most common reason teams self-host.

Volume and cost. Past 100M spans per month, hosted pricing climbs into a range where in-house engineering cost is competitive. Below 10M spans per month, hosted is almost always cheaper. The 10-100M range is where teams pencil out the trade.

Latency and locality. A workload running in a region far from the hosted vendor’s nearest backend. The latency from your application to the trace backend matters for tail-based sampling and online evaluator workflows.

When none of these apply, hosted is usually the right answer. Self-hosting takes engineering time you could spend on the workload itself.
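
To pencil it out concretely, here is a back-of-envelope sketch. Every number in it is an assumption (the hosted quote, the infrastructure bill, the loaded engineer cost); substitute your own figures before deciding anything.

```python
# Back-of-envelope hosted vs self-hosted comparison.
# All figures below are hypothetical placeholders, not quotes.
spans_per_month = 30_000_000
hosted_price_per_million = 25.0            # assumed hosted rate, $ per million spans
infra_per_month = 800.0                    # assumed nodes + storage + egress, self-hosted
engineer_loaded_cost_per_year = 200_000.0  # assumed fully loaded engineering cost
ops_fraction = 0.15                        # ~15% of one engineer's time on operations

hosted = spans_per_month / 1_000_000 * hosted_price_per_million
self_hosted = infra_per_month + ops_fraction * engineer_loaded_cost_per_year / 12

print(f"hosted: ${hosted:,.0f}/mo  self-hosted: ${self_hosted:,.0f}/mo")
# hosted: $750/mo  self-hosted: $3,300/mo -> at this assumed volume, hosted wins
```

The engineering time, not the infrastructure, usually dominates the self-hosted column; that is why the break-even sits so far up the volume curve unless residency forces the decision.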

The architecture in 2026

[Figure: four-tier stack diagram. APP / SDK (traceAI, OpenInference) → OTEL COLLECTOR (filter, redact, sample) → QUEUE (BullMQ / Redis Streams) → STORAGE (ClickHouse + S3 blob). Annotations: K8s namespace, ARM-compatible, TLS internal, backup to S3.]

Application layer

Your services emit OTLP-native spans. Three viable instrumentation libraries:

  • traceAI. Future AGI’s OSS framework. 50+ integrations across Python, TypeScript, Java (including LangChain4j and Spring AI), and a C# core library. OTel-native, OTLP exporters, ships traces to any OTel backend. See the GitHub repository for the current compatibility matrix.
  • OpenInference. Arize’s framework. Around 31 Python instrumentation packages, 13 JavaScript, 4 Java. Complementary to OTel GenAI conventions.
  • Vendor SDK. Langfuse SDK, LangSmith SDK, Helicone SDK, etc. Some are OTel-native; some emit proprietary formats.

For self-hosting flexibility, prefer OTel-native instrumentation. A proprietary SDK couples your instrumentation to one backend; OTel-native lets you swap backends.
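
To make OTel-native concrete, here is a minimal sketch using the plain OpenTelemetry Python SDK; traceAI and OpenInference instrumentors sit on top of the same provider-and-exporter wiring. The collector endpoint, service name, and attribute values are placeholders.

```python
# Minimal OTel-native setup: spans go out over OTLP/gRPC to an in-cluster
# collector. Endpoint and names are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "rag-api"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("rag-api")
with tracer.start_as_current_span("llm.generate") as span:
    span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
    # ... call the model here, then record usage from the response
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```

Swapping backends later means changing the exporter endpoint, not rewriting the instrumentation.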

OTel collector

The collector is where most of the operational policy lives. Functions:

  • Receive. OTLP HTTP and gRPC endpoints. The application points at the collector.
  • Filter. Drop spans matching exclusion rules (health checks, internal endpoints).
  • Redact. Apply redaction processors to PII-bearing attributes (gen_ai.input.messages, gen_ai.output.messages). Deterministic placeholder per value.
  • Tail-sample. Buffer the full trace; decide at trace end. Keep all errors, low scores, top-cost, experiment cohorts; sample 5-20% of the rest.
  • Route. Send to one or more backends (your storage, plus optionally a vendor backend for parallel comparison).

The OTel collector contrib distribution ships the tail sampling processor, the attribute redaction processor, and many others. Configuration is YAML; deployment is a stateless K8s deployment that scales horizontally.
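
The filter, redaction, and tail-sampling policy itself lives in that YAML, so there is no application code to show for it. What is worth sketching is the deterministic-placeholder idea, and (as the common-mistakes list below argues) redaction belongs at the client as well as the collector. A hypothetical client-side helper, assuming a keyed hash so the same value always maps to the same placeholder:

```python
# Hypothetical client-side redaction helper (defense in depth; the collector's
# redaction processor stays the authoritative layer). A keyed BLAKE2 hash makes
# the placeholder deterministic per value, so traces remain correlatable
# without carrying the raw text.
import hashlib
import os
import re

REDACTION_KEY = os.environ.get("REDACTION_KEY", "rotate-me").encode()
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def placeholder(value: str) -> str:
    digest = hashlib.blake2b(value.encode(), key=REDACTION_KEY, digest_size=6).hexdigest()
    return f"<pii:{digest}>"

def redact(text: str) -> str:
    """Replace email addresses with stable placeholders before the span is emitted."""
    return EMAIL_RE.sub(lambda m: placeholder(m.group()), text)
```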

Queue

The queue absorbs spikes and decouples ingest rate from storage write rate. Two viable patterns:

  • Redis Streams. Used by Langfuse and others. Lightweight, easy to operate. BullMQ on top adds job queue semantics for async eval workers.
  • Kafka. The JVM-stack default. Higher throughput ceiling, more complex to operate.

The queue also fans out async work: online evaluators consume from the queue and produce score events; drift detectors consume from the queue and update rolling means; cost calculators consume from the queue and update per-user totals.

For workloads under 50M spans per day, Redis Streams is sufficient. Past that, Kafka starts to look better.
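
A minimal sketch of the drain side, assuming the ingest path XADDs JSON span payloads onto a spans stream; the stream, group, and field names are illustrative:

```python
# Consumer-group worker that drains spans from Redis Streams into storage.
# Acknowledging after a successful write gives at-least-once delivery.
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

try:
    r.xgroup_create("spans", "storage-writers", id="0", mkstream=True)
except redis.ResponseError:
    pass  # consumer group already exists

def write_batch(rows: list[dict]) -> None:
    ...  # bulk insert into ClickHouse or Postgres here

while True:
    resp = r.xreadgroup("storage-writers", "worker-1", {"spans": ">"}, count=500, block=5000)
    for _stream, entries in resp:
        rows = [json.loads(fields["payload"]) for _entry_id, fields in entries]
        write_batch(rows)
        r.xack("spans", "storage-writers", *[entry_id for entry_id, _ in entries])
```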

Storage

Two decisions: span storage and control-plane storage.

Span storage: Postgres or ClickHouse

Postgres. Familiar, durable, transactional. Works up to roughly 1-5M spans per day with careful indexing on (trace_id, span_id, parent_span_id, timestamp) and partitioning by day or by week. Past 5M spans per day, query latency degrades and storage cost climbs.

Workloads where Postgres-only is appropriate: small internal tools, prototype workloads, regulated environments where ClickHouse is not approved.

ClickHouse. Columnar analytical store, designed for the trace-search workload shape. Comfortably handles 100M+ spans per day on a single shard; sharded clusters scale further. Storage cost is 5-10x lower than Postgres for the same data, and common trace-search queries run 10-100x faster.

The trade-off: ClickHouse has a steeper operational learning curve. Schema migrations work differently (an ALTER on a 100M-row table behaves very differently than in Postgres), and backup and restore needs the right tooling. Budgeting dedicated DBA time from day one pays for itself; without it, ClickHouse can become a source of operational risk.
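
To make the workload-shape point concrete, here is a hypothetical spans table (not any platform's actual schema): a MergeTree ordered by trace and time, partitioned by day, with a TTL handling retention. The sketch assumes the clickhouse-connect client; column names and the 90-day TTL are illustrative.

```python
# Illustrative ClickHouse schema for span storage. Column names and TTL are
# assumptions, not a recommendation or any platform's real schema.
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse", username="default", password="")

client.command("""
CREATE TABLE IF NOT EXISTS spans (
    trace_id        String,
    span_id         String,
    parent_span_id  String,
    start_time      DateTime64(6),
    duration_ms     Float64,
    name            LowCardinality(String),
    status_code     LowCardinality(String),
    attributes      Map(String, String),
    body_ref        String  -- S3 key pointing at the prompt/completion body
)
ENGINE = MergeTree
PARTITION BY toDate(start_time)
ORDER BY (trace_id, start_time)
TTL toDateTime(start_time) + INTERVAL 90 DAY
""")
```

The ORDER BY key is what keeps trace-id lookups cheap; analytical scans (cost per route per day) ride the columnar layout.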

Most production LLMOps platforms (Langfuse, Future AGI’s open-source stack) run ClickHouse for spans and Postgres for control plane. Phoenix runs Postgres-first, which fits its lighter-weight profile.

Control-plane storage

Postgres for users, projects, prompts, configs. Standard Postgres ops. Backup with pg_dump or wal-g; replication with logical or streaming.

Blob storage

S3 or MinIO for prompt and completion bodies. The body is large (kilobytes per span); storing it inline in ClickHouse or Postgres bloats the hot tables. Store an S3 reference in the span row; the body lives in object storage.

Lifecycle: hot for 30-90 days in S3 standard, then transition to S3 IA or Glacier. Trace queries that need the body re-fetch from object storage on demand.
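
A sketch of the offload path using boto3 against S3 or MinIO; the bucket name, key scheme, and endpoint are made up for illustration:

```python
# Store the prompt/completion body in object storage and keep only a reference
# in the span row. Bucket, key scheme, and endpoint are assumptions.
import json
import uuid
import boto3

s3 = boto3.client("s3", endpoint_url="http://minio:9000")  # omit endpoint_url for AWS S3

def store_body(trace_id: str, span_id: str, body: dict) -> str:
    key = f"spans/{trace_id}/{span_id}/{uuid.uuid4().hex}.json"
    s3.put_object(
        Bucket="llm-trace-bodies",
        Key=key,
        Body=json.dumps(body).encode(),
        ContentType="application/json",
    )
    return key  # this reference is what lands in the span row
```

Lifecycle transitions to IA or Glacier are then bucket policy, not application logic.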

Worker tier

The worker tier consumes from the queue and runs:

  • Online evaluators. Pull a span sample, run the judge, write the score back as a span attribute or a separate score event.
  • Drift detectors. Compute rolling-mean rubric scores per route, per version, per cohort; emit alerts.
  • Cost calculators. Aggregate token-cost per user, per tenant, per route; emit budget alerts.
  • Scheduled jobs. Daily cleanups, retention enforcement, calibration runs.

Workers are stateless; horizontal scaling adds throughput. Memory pressure depends on the judge model; the smallest distilled judges run in 50-150ms with a modest memory footprint.
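
As one hedged example of what a worker does with the queue, a toy rolling-mean drift check for a single route; the window, baseline, and tolerance are illustrative rather than recommendations:

```python
# Toy drift detector: compare the rolling mean of a rubric score against a
# fixed baseline. A real deployment would track a baseline per route/version.
from collections import deque

class DriftDetector:
    def __init__(self, window: int = 500, baseline: float = 0.82, tolerance: float = 0.05):
        self.scores = deque(maxlen=window)
        self.baseline = baseline
        self.tolerance = tolerance

    def observe(self, score: float) -> bool:
        """Return True once the rolling mean drops below baseline - tolerance."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance

# In the queue consumer: if detector.observe(score_event["value"]), emit an alert.
```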

OSS LLMOps platforms compared honestly

| Platform | License | Storage | OTel posture | Footprint | Notes |
| --- | --- | --- | --- | --- | --- |
| Langfuse | MIT (core) | ClickHouse + Postgres | OTLP-compatible (gateway) | Mid (4-6 services) | Largest OSS LLMOps community; ClickHouse is the heavy dependency |
| Phoenix | ELv2 | SQLite default; PostgreSQL for production | OTLP-native | Small (2-3 services) | Lighter weight; better for low-volume workloads |
| Future AGI traceAI + OSS backend | Apache 2.0 | ClickHouse + Postgres | OTLP-native | Mid | Tighter integration between instrumentation and backend |
| Helicone (self-host) | Apache 2.0 | ClickHouse + Postgres | OpenAI proxy SDK; check current OTLP support before adopting | Mid | Gateway-first design; trace surface integrated with the gateway |
| Lunary | Apache 2.0 (source) | Postgres | OTel via SDK | Small | Production Docker self-host packaging is Enterprise-gated; verify before adoption |

The license differences matter for some workloads. Apache 2.0 (Future AGI, Helicone, Lunary) is the most permissive. MIT (Langfuse core) is similar but watch the enterprise edition split. ELv2 (Phoenix) permits free self-hosting on your own infrastructure; it restricts offering Phoenix itself as a hosted/managed service and tampering with license-key or notice functionality.

The deploy footprint matters at scale. Langfuse, Future AGI’s stack, and Helicone all assume ClickHouse, which is a meaningful operational dependency. Phoenix and Lunary on Postgres are lighter to operate but cap out at lower span volume.

What changed in self-hosted LLMOps in 2026

| Date | Event | Why it matters |
| --- | --- | --- |
| 2026 | OTel GenAI semantic conventions broadly supported by major instrumentation libraries (still Development status) | Cross-platform compatibility at the trace layer |
| 2026 | ARM64 images mainstream across all major LLMOps platforms | Graviton and Apple Silicon support without compatibility shims |
| 2026 | Tail-sampling processor in OTel collector contrib distribution (beta for traces) widely deployed | Outcome-aware sampling moved closer to off-the-shelf |
| 2026 | ClickHouse Cloud and self-managed both saw schema improvements for trace data | Span ingestion latency and query patterns improved |
| 2026 | EU AI Act enforcement phase began | Audit-trail and PII-redaction requirements pushed more workloads to self-hosting |

Common mistakes when self-hosting

  • Underestimating ClickHouse operational complexity. Schema changes on multi-billion-row tables work very differently than in Postgres. Plan for dedicated DBA time from day one.
  • Skipping the queue layer. A spike in trace volume without buffering drops spans or kills the storage. The queue is not optional past trivial volume.
  • Storing prompt bodies inline. Bloats hot tables. Use blob storage with S3 references.
  • No backup discipline. Backups for ClickHouse, Postgres, and the OTel collector configuration. Test restores quarterly.
  • Forgetting the control plane. Users, projects, prompts, configs. Treat with the same backup and HA discipline as the application data.
  • No tail-sampling configuration. Default head sampling at 1% hides the failures.
  • PII redaction at the client only. Defense in depth; redact at the collector as well.
  • No retention policy. Storage cost grows linearly with retention. Configure lifecycle from day one.
  • Single-tenant ClickHouse without sharding plan. Past 100M spans per day, the path forward is sharded ClickHouse; plan it before you need it.
  • Pinning to a vendor SDK. A proprietary SDK couples instrumentation to one backend; OTel-native instrumentation lets you swap.

How to actually deploy self-hosted LLMOps in 2026

  1. Pencil out hosted vs self-hosted. Hosted price for your span volume vs (infrastructure + 15% of one engineer’s time). Self-host only when one wins clearly.
  2. Pick the platform. Langfuse, Phoenix, Future AGI, Helicone, Lunary. Match storage architecture to volume.
  3. Provision storage day-one. ClickHouse cluster (or Postgres for low volume); Postgres control plane; S3 or MinIO blob.
  4. Wire the OTel collector. Receive, redact, tail-sample, route. Configuration in repo.
  5. Add the queue. Redis Streams or Kafka. Connect the collector and the workers.
  6. Deploy the API service and UI. Per the platform’s deploy guide.
  7. Wire the worker tier. Online evaluators, drift detectors, cost calculators.
  8. Configure backups. Both span storage and control plane. Test restore.
  9. Configure retention. Lifecycle policies on hot and blob storage.
  10. Run a chaos drill. Drop a backend node; verify the queue absorbs and the cluster recovers.


Related: LLM Tracing Best Practices, What is LLM Tracing?, Production LLM Monitoring Checklist, LLM Cost Tracking Best Practices

Frequently asked questions

When does self-hosting LLM observability make sense?
Three scenarios. Data residency or compliance requires the data stay inside your perimeter (HIPAA, EU GDPR strict, FedRAMP, regulated finance). Latency to a hosted backend is unacceptable for your traffic volume. Hosted pricing crosses the in-house engineering cost. For most teams under 10M spans per month, hosted is cheaper. Past 100M spans per month or with hard residency requirements, self-hosting becomes the default.
What is the minimum architecture for self-hosted LLM observability?
Four layers. Application layer with OTel-native instrumentation (traceAI, OpenInference, vendor SDK). OTel collector for filter, redact, sample. Queue (BullMQ on Redis Streams, or Kafka, or SQS-compatible) for buffering ingest. Storage layer (Postgres for low volume, ClickHouse for high volume, plus blob storage such as S3 or MinIO for prompt content). The four layers map to standard K8s deployments; ARM64 builds are now mainstream.
Postgres or ClickHouse for trace storage?
Postgres works up to roughly 1-5M spans per day with careful indexing and partitioning. Past that, ClickHouse is the standard. ClickHouse is built for analytical queries over wide rows, which is the trace-search workload shape. The trade-off: Postgres is operationally familiar (most teams have a DBA), ClickHouse has a steeper operational learning curve. Most production LLMOps platforms run ClickHouse for spans and Postgres for control-plane data (users, projects, prompts).
What does the OTel collector actually do?
Five things. (1) Receive OTLP from instrumentation libraries. (2) Process: filter spans, redact PII, enrich with collector-side attributes. (3) Tail-sample: buffer the trace, decide at trace end whether to keep. (4) Route: send to one or more backends (your storage, plus optionally a vendor backend or Jaeger). (5) Buffer: handle backend outages without dropping spans. The collector is where most of the operational policy lives; the application emits raw, the collector applies the rules.
What about the queue layer between collector and storage?
The queue handles spike absorption and graceful storage outages. Without a queue, a spike in trace volume can overwhelm the storage, and a brief storage outage drops spans. With a queue, the collector writes to the queue and a worker drains the queue into storage at a sustainable rate. BullMQ on Redis Streams is the common pattern in TypeScript stacks; Kafka is the standard in JVM stacks. The queue is also where async eval scoring fans out from.
What is the K8s footprint for a self-hosted LLMOps stack?
Roughly: 4-6 stateless deployments (API service, collector, workers, UI), 3-5 StatefulSets for storage (ClickHouse, Postgres, Redis or Kafka), and an ingress for the OTLP endpoints. The hardware footprint scales with span volume: a workload at 10M spans per day runs comfortably on a 3-node cluster with 16-32 GB RAM per node. Past 100M spans per day, ClickHouse needs sharding and the cluster grows. ARM64 images are the default in 2026; x86_64 images are still shipped for compatibility.
What are the operational costs of self-hosting?
Three categories. Infrastructure: nodes, storage IOPS, egress, blob storage. For a 10M-span-per-day workload, infrastructure cost in 2026 is roughly $300-$1,500 per month depending on cloud and retention. Engineering: ongoing operations of ClickHouse, Redis, Postgres, the collector. Conservatively, 10-20% of one engineer's time. Risk: data residency and compliance sign-off. The total is meaningful; compare against hosted pricing before committing to self-host.
Which OSS LLMOps platforms are deployable self-hosted?
Several with different architectures. Langfuse (MIT core, ClickHouse plus Postgres, OTLP-compatible) is the dominant OSS LLMOps stack at higher volume. Phoenix (Arize, ELv2, Postgres-first, OTLP-first) is lighter-weight. Future AGI traceAI plus the Apache 2.0 backend (Apache 2.0, ClickHouse plus Postgres, OTLP-native) covers the same surface with different tradeoffs. Helicone (Apache 2.0) and Lunary (Apache 2.0) are lighter. Pick by storage architecture preference, deploy footprint, and feature set.