What is Haystack? Deepset's RAG and Agents Framework in 2026

Haystack is Deepset's open-source pipeline framework for RAG and agents. Components, pipelines, document stores, agents, and the Haystack 2.x rewrite.


A team is operating a customer-support search assistant over a 5-million-document corpus. The retrieval needs to be hybrid (BM25 plus dense vectors). Documents need filtering by tenant, by language, and by recency. Cross-encoder reranking is non-negotiable. The retrieval-path latency budget is tight, and the system has to behave like a search engine first and an LLM application second. The team evaluates frameworks: LangChain feels too LLM-first for the search-engine workload; LlamaIndex’s data layer is strong but the pipeline composition feels less explicit; Haystack’s typed pipelines map directly to the workflow shape. They pick Haystack and ship.

This is the niche Haystack continues to serve well in 2026. Where LangChain and LlamaIndex came from the LLM application world, Haystack came from the search-and-QA world and brought that engineering discipline to the LLM era. This guide covers what Haystack is, the 2.x architecture, how its primitives work, and when to pick it.

TL;DR: What Haystack is

Haystack is an open-source Apache 2.0 Python framework from Deepset for building production NLP and LLM applications as composable pipelines. The repo at github.com/deepset-ai/haystack has approximately 25,000 GitHub stars as of mid-2026. The current major version is 2.x, a 2024 rewrite that introduced typed component sockets, explicit connections, and an async-friendly directed-graph pipeline runtime that allows cycles for agent loops. Deepset also operates a commercial Haystack Enterprise Platform (formerly deepset AI Platform / deepset Cloud) on top of Haystack with managed deployment and evaluation features. The framework’s strengths are typed pipelines, deep document store integrations, and a strong heritage in classical NLP retrieval primitives.

Why Haystack matters in 2026

Three things kept Haystack relevant through the LLM-framework consolidation.

First, the search-engineer audience kept choosing Haystack. Teams whose product is fundamentally a search system that uses LLMs (legal-doc search, biomedical literature search, financial-filing search) tend to find Haystack’s pipeline abstractions a better match for their mental model than LangChain’s chains or LlamaIndex’s looser composition. The typed inputs and outputs at every stage match how search teams think.

Second, Deepset shipped the 2.x rewrite. The 1.x API was showing its age; 2.x stripped the historical baggage, made the runtime async, added explicit directed-graph semantics with branching and cycles for loops, and serialized pipelines to YAML for declarative deployment. The result is a framework that feels modern in 2026 rather than retrofitted.

Third, the agent layer landed. Haystack 2.x now ships a first-class Agent component that fits naturally inside a Pipeline alongside retrievers and generators. The framework can express both pure RAG pipelines and agentic workflows with the same primitives.

The anatomy of a Haystack 2.x application

The framework’s primitives map to typed dataflow.

Component. A Python class decorated with @component. It declares typed inputs and outputs via @component.output_types(...) and a run method. Components are single-purpose: a Retriever is a component, a Generator is a component, a Ranker is a component, a PromptBuilder is a component. Components are stateless within a run.

Pipeline. A graph of Components connected by typed sockets. You add components with pipe.add_component(name, component) and connect them with pipe.connect("source.output", "target.input"). The Pipeline schedules each component when its required inputs are available; acyclic pipelines behave like ordered dataflow, while cyclic graphs run under Haystack’s loop-execution rules with bounded iteration counts.
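The scheduling rule ("run a component when its required inputs are available") can be illustrated with a toy pure-Python dataflow runner. This is an illustration of the semantics only, not Haystack's actual runtime:

```python
# Toy input-driven scheduler, NOT Haystack's runtime: each node runs as
# soon as every one of its input sockets has a value.

def run_dataflow(nodes, connections, inputs):
    """nodes: {name: (fn, [input socket names])}
    connections: {(src, out_socket): (dst, in_socket)}
    inputs: {(node, socket): value} for the entry sockets."""
    ready = dict(inputs)
    results, pending = {}, set(nodes)
    while pending:
        for name in sorted(pending):
            fn, sockets = nodes[name]
            if all((name, s) in ready for s in sockets):
                out = fn(**{s: ready[(name, s)] for s in sockets})
                results[name] = out
                # Propagate outputs along the declared connections.
                for (src, out_sock), (dst, in_sock) in connections.items():
                    if src == name:
                        ready[(dst, in_sock)] = out[out_sock]
                pending.discard(name)
                break
        else:
            raise RuntimeError("unsatisfiable inputs")
    return results

nodes = {
    "embed": (lambda text: {"embedding": [float(len(text))]}, ["text"]),
    "retrieve": (lambda query_embedding: {"documents": ["doc-a", "doc-b"]},
                 ["query_embedding"]),
}
connections = {("embed", "embedding"): ("retrieve", "query_embedding")}
out = run_dataflow(nodes, connections, {("embed", "text"): "what is otel"})
print(out["retrieve"]["documents"])  # ['doc-a', 'doc-b']
```

The "retrieve" node only fires after "embed" has produced its `embedding` socket, which is the same ready-when-inputs-arrive behavior the Pipeline gives you with typed checks on top.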

Document. The data unit. A Document has content (text or media), metadata (a dict of arbitrary fields), an optional embedding (a list of floats), an optional sparse_embedding, and an id. Documents flow through the pipeline.

Document Store. The storage backend for Documents. The built-in store is InMemoryDocumentStore (for prototyping); deepset-maintained and community integrations cover Elasticsearch, OpenSearch, Weaviate, Pinecone, Qdrant, Chroma, MongoDB Atlas, pgvector, AstraDB, and Milvus.
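The FAQ below notes the common DocumentStore protocol (write_documents, count_documents, filter_documents, delete_documents). A pure-Python toy that mimics that shape, not the real InMemoryDocumentStore:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Toy stand-in for Haystack's Document: content plus metadata."""
    id: str
    content: str
    meta: dict = field(default_factory=dict)

class ToyDocumentStore:
    def __init__(self):
        self._docs = {}

    def write_documents(self, docs):
        for d in docs:
            self._docs[d.id] = d
        return len(docs)

    def count_documents(self):
        return len(self._docs)

    def filter_documents(self, filters):
        # Exact-match metadata filtering, e.g. {"tenant": "acme"}.
        return [d for d in self._docs.values()
                if all(d.meta.get(k) == v for k, v in filters.items())]

    def delete_documents(self, ids):
        for i in ids:
            self._docs.pop(i, None)

store = ToyDocumentStore()
store.write_documents([
    Doc("1", "BM25 basics", {"tenant": "acme", "lang": "en"}),
    Doc("2", "Vektorsuche", {"tenant": "acme", "lang": "de"}),
])
print([d.id for d in store.filter_documents({"lang": "en"})])  # ['1']
```

Real stores add embeddings, scoring, and backend-specific filter syntax on top of this protocol, which is why swapping stores also means swapping the matching retriever.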

Generator. A component that calls an LLM. OpenAIGenerator, AnthropicGenerator, HuggingFaceLocalGenerator, OllamaGenerator, CohereGenerator, and similar are first-party. Each takes a model name and other provider-specific params; the input is a prompt string and the output is the completion.

Retriever. A component that queries a Document Store. Vector retrievers (one per backend), BM25 retrievers, hybrid retrievers, and metadata filter helpers are first-party.

Agent. A component that wraps an LLM with tools. Tools are components themselves (or thin wrappers around Python functions). The agent component runs the LLM in a loop, dispatches tool calls, and returns when the model emits a final answer.
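The loop the Agent component runs (call the model, dispatch tool calls, stop on a final answer) can be sketched in pure Python with a scripted stand-in for the model. This is an illustration of the control flow, not Haystack's Agent API:

```python
# Toy agent loop with a stubbed "model"; illustrative only.

def toy_agent(model, tools, question, max_steps=5):
    transcript = [("user", question)]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "final":
            return action["answer"]
        # Dispatch the requested tool call and feed the result back.
        result = tools[action["tool"]](**action["args"])
        transcript.append(("tool", result))
    raise RuntimeError("agent did not finish within max_steps")

def scripted_model(transcript):
    # First turn: ask for a tool; second turn: answer from the tool result.
    if len(transcript) == 1:
        return {"type": "tool", "tool": "search", "args": {"q": "haystack"}}
    return {"type": "final", "answer": transcript[-1][1]}

tools = {"search": lambda q: f"top hit for {q!r}"}
print(toy_agent(scripted_model, tools, "what is haystack?"))
```

The bounded `max_steps` mirrors why agent loops in pipelines run with iteration limits: an LLM that never emits a final answer must not spin forever.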

Haystack in 30 lines

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
# (assume documents have been written with embeddings already)

template = """Answer using only this context.
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}"""

rag = Pipeline()
rag.add_component("embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
rag.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store, top_k=5))
rag.add_component("prompt", PromptBuilder(template=template, required_variables=["question"]))
rag.add_component("llm", OpenAIGenerator(model="gpt-4o"))  # reads OPENAI_API_KEY from the environment

rag.connect("embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt.documents")
rag.connect("prompt.prompt", "llm.prompt")

result = rag.run({"embedder": {"text": "What is OpenTelemetry?"}, "prompt": {"question": "What is OpenTelemetry?"}})
print(result["llm"]["replies"][0])
```

The Pipeline reads as a typed dataflow graph: the embedder produces an embedding, the retriever consumes it and produces documents, the prompt builder consumes documents and a question, the LLM consumes the prompt and produces replies.

How Haystack compares to alternatives

| Framework | Lead with | Best for | License |
| --- | --- | --- | --- |
| Haystack | Typed pipeline composition | Search-engineer workflows, hybrid retrieval, production NLP | Apache 2.0 |
| LlamaIndex | Data ingestion and retrieval primitives | RAG-heavy applications, heterogeneous data sources | MIT |
| LangChain + LangGraph | Chains and stateful graphs | Multi-agent orchestration with retrieval as one tool | MIT |
| DSPy | Compiled prompt programs | Optimization-driven prompt programming | MIT |

Haystack’s strength is the engineering discipline of typed pipelines. If your team thinks in dataflow, the framework reads naturally. If your team thinks in chains or in graphs, LangChain or LangGraph is closer to your mental model.

Production patterns with Haystack

Three patterns recur.

Pattern 1: Hybrid RAG pipeline. A Pipeline with a vector retriever (sentence-transformers + Qdrant) running in parallel with a BM25 retriever (Elasticsearch), a DocumentJoiner that merges results, a Ranker (CrossEncoderRanker) for reranking, a PromptBuilder, and an OpenAIGenerator. This is the canonical Haystack RAG path.
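One common strategy a joiner can use to merge the BM25 and vector rankings is reciprocal rank fusion. A pure-Python sketch of the merge step (in a real pipeline the DocumentJoiner component does this work; check its supported join modes for your version):

```python
# Reciprocal rank fusion: documents score higher the nearer the top they
# appear in each input ranking; appearing in multiple rankings compounds.

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]   # lexical ranking
dense = ["d1", "d9", "d3"]  # vector ranking
print(reciprocal_rank_fusion([bm25, dense]))  # ['d1', 'd3', 'd9', 'd7']
```

Note that "d1" wins despite topping only one list: it places well in both rankings, which is exactly the agreement signal hybrid retrieval is after. The fused list then goes to the cross-encoder Ranker for the final ordering.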

Pattern 2: Indexing pipeline plus query pipeline. Two pipelines deployed together. The indexing pipeline ingests documents (FileTypeRouter, PyPDFToDocument, MarkdownToDocument, DocumentSplitter, DocumentEmbedder, DocumentWriter). The query pipeline retrieves and generates. Both pipelines share a Document Store. This separates the offline indexing concern from the online query concern.
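The splitting step in the indexing pipeline is worth seeing concretely. A toy word-window splitter with overlap, illustrating what DocumentSplitter does (parameter names here are illustrative, not the component's exact API):

```python
# Toy chunker: fixed-size word windows with overlap, so that facts
# straddling a chunk boundary survive in at least one chunk.

def split_by_words(text, split_length=5, split_overlap=2):
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break
        start += split_length - split_overlap
    return chunks

doc = "one two three four five six seven eight"
print(split_by_words(doc))
# ['one two three four five', 'four five six seven eight']
```

Because indexing and querying are separate pipelines sharing one Document Store, you can re-run the indexing side with different split parameters without touching the query path.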

Pattern 3: Agent inside a pipeline. An Agent component wired as one node in a larger Pipeline. The agent has tool components (a Retriever wrapped as a tool, a Python function tool, an MCP server tool). The Pipeline can pre-process the user question, hand it to the agent, post-process the agent’s reply (with a guardrail check, for example), and return. This is the agentic Haystack pattern.

Common mistakes when adopting Haystack

  • Mixing 1.x and 2.x patterns. The two are not compatible. New code should target 2.x; 1.x is archived. Old tutorials online still reference 1.x; use the Deepset docs as the source of truth.
  • Forgetting connect calls. A Pipeline with components added but no connections runs nothing. Every component’s input must be either connected from another component or supplied as a top-level run input.
  • Skipping the typed output declaration. Custom components without the @component.output_types(...) decorator confuse the runtime. Always declare output types.
  • Using InMemoryDocumentStore in production. It is for prototyping. Pick a real store (Elasticsearch, Weaviate, Qdrant, pgvector) for production.
  • Hard-coding component parameters at construction. Component params can also be supplied as run-time inputs in pipe.run. The latter is more flexible for production where the same pipeline serves many tenants.
  • Skipping the Ranker. Hybrid retrievers return more candidates than the synthesizer needs. A CrossEncoderRanker step adds another model call but typically improves answer quality on retrieval and groundedness metrics; benchmark on your own dataset before adopting.
  • Building Pipelines without YAML serialization. YAML serialization makes pipelines deployable as artifacts. For production, save and load Pipelines as YAML rather than reconstructing them in code on every deploy.

How to trace Haystack with FutureAGI

Haystack 2.x can emit OpenTelemetry-compatible spans through its native tracing module plus OpenInference and traceAI instrumentation packages. To ship traces to FutureAGI’s observability platform or any other OTel backend with traceAI:

```shell
pip install traceai-haystack
```

```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_haystack import HaystackInstrumentor

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="docs-search",
)
HaystackInstrumentor().instrument(tracer_provider=trace_provider)
```

The resulting trace tree shows the Pipeline run at the root, every Component invocation as a child span with input and output sockets, Retriever spans that can expose top-k Document IDs, scores, and content when content tracing is enabled, and any Agent loop as a nested span tree.

How FutureAGI implements Haystack observability and RAG evaluation

FutureAGI is the production-grade RAG observability and evaluation platform for Haystack built around the closed reliability loop that other Haystack stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:

  • Pipeline and component tracing: traceAI (Apache 2.0) auto-wraps Haystack 2.x Pipelines, every Component invocation, Retriever and Ranker spans, Agent loops, and provider-level model spans across Python, TypeScript, Java, and C#.
  • RAG evals: 50+ first-party metrics including Faithfulness, Groundedness, Context Recall, Context Precision, Answer Relevance, Answer Correctness, Aspect Critic, and Noise Sensitivity attach as span attributes; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
  • Simulation: persona-driven scenarios exercise the RAG path in pre-prod with the same scorer contract that judges production traces.
  • Gateway and guardrails: the Agent Command Center fronts 100+ providers with BYOK routing, and 18+ runtime guardrails enforce policy on the same plane.

Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.

Most teams running Haystack in production end up running three or four tools alongside it: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching. For the broader tracing model, read What is LLM Tracing?.

Related: What is LlamaIndex?, What is RAG Evaluation?, Best RAG Evaluation Tools in 2026, What is LLM Tracing?

Frequently asked questions

What is Haystack in plain terms?
Haystack is an open-source Python framework from Deepset for building production NLP and LLM applications as pipelines of composable components. The core abstractions are Components (single-purpose units like a retriever, a generator, or a ranker), Pipelines (directed graphs of components with typed connections), Document Stores (storage for embedded documents), and Agents (LLM-powered workers with tools). Haystack predates the LLM-framework wave; the 2.x line is the modern LLM-era rewrite.
Who maintains Haystack and what license is it under?
Haystack is maintained by Deepset, a German venture-backed company founded in 2018. The codebase at github.com/deepset-ai/haystack is Apache 2.0 licensed and crossed the 25,000 GitHub stars milestone (~25.1k as of May 2026). Deepset also operates a hosted commercial product called the Haystack Enterprise Platform (formerly deepset AI Platform / deepset Cloud) on top of Haystack, with managed deployment, evaluation, and admin features. The Haystack 2.x line (latest 2.28 in 2026) is the current major version; Haystack 1.x is no longer actively developed.
What changed between Haystack 1.x and 2.x?
Haystack 2.x was a near-complete rewrite released in early 2024. The 1.x API was built around monolithic Pipelines with imperatively defined components. 2.x introduced typed inputs and outputs on every component, explicit connections between components by socket, async support, and a directed-graph pipeline runtime that allows cycles for loops and agentic patterns. The 2.x design is closer to a typed dataflow graph than to 1.x's chain-of-objects. New code targets 2.x; 1.x is still importable but archived.
How is Haystack different from LlamaIndex?
Haystack and LlamaIndex overlap heavily on RAG primitives. The differences are taste and history. Haystack predates the LLM era and was originally a search and QA framework; the design leans toward typed pipelines with explicit connections. LlamaIndex started LLM-first with looser primitives and consolidated around Workflows in 0.11+. For pure RAG over documents, both are viable. Haystack tends to feel more pipeline-engineer-friendly; LlamaIndex tends to feel more application-engineer-friendly. The vector store and provider integration surfaces are comparable.
What is a Haystack Pipeline?
A Pipeline is a directed graph of Components connected by typed sockets. You add components, connect their outputs to others' inputs, and run the pipeline by passing inputs to the entry components. Components run when their inputs are ready. Pipelines support branching, cycles for loops and agentic patterns (added in 2.x), async execution, and serialization to YAML. The Pipeline is the core orchestration primitive; agents in Haystack are built as components inside pipelines.
What document stores does Haystack support?
Haystack has deepset-maintained and community integrations for most production document stores: Elasticsearch, OpenSearch, Weaviate, Pinecone, Qdrant, Chroma, MongoDB Atlas, pgvector, AstraDB, Milvus, and an InMemoryDocumentStore for prototyping. The integrations live in haystack-integrations packages. The common DocumentStore protocol (write_documents, count_documents, filter_documents, delete_documents) makes stores easier to swap, but production swaps usually require changing the document store, the matching retriever, credentials, and backend-specific filter or index configuration.
How do you trace a Haystack pipeline?
Haystack 2.x ships a native OpenTelemetry tracing module that emits Pipeline-run and component spans. You enable it via configuration, and content tracing (prompts, completions, retrieved documents) is opt-in. For deeper provider visibility (HTTP, OpenAI, etc.), pair the native tracing with provider OTel instrumentations. OpenInference and traceAI both provide Haystack instrumentation packages that wrap components and emit semantic-convention attributes (verify the exact namespace your stack expects before standardizing dashboards). The trace tree shows the Pipeline run at the root, every component as a child span with input and output sockets, and every LLM call deeper when a provider instrumentation is active.
When should I pick Haystack over LlamaIndex or LangChain?
Pick Haystack when your team's mental model is typed dataflow pipelines and you want explicit input/output contracts on every step. Pick it for production NLP search workloads where the framework's history in classical NLP (BM25, sparse retrievers, reranking) shows. Pick it when the Haystack Enterprise Platform is on the procurement shortlist; Haystack-native applications deploy there cleanly. Skip it for pure agentic workflows that need fine-grained graph control (LangGraph) or for primarily LLM-driven workflows where the pipeline overhead is friction.