What is Haystack? Deepset's RAG and Agents Framework in 2026
Haystack is Deepset's open-source pipeline framework for RAG and agents. Components, pipelines, document stores, agents, and the Haystack 2.x rewrite.
A team is operating a customer-support search assistant over a 5-million document corpus. The retrieval needs to be hybrid (BM25 plus dense vectors). Documents need filtering by tenant, by language, and by recency. Cross-encoder reranking is non-negotiable. The retrieval-path latency budget is tight, and the system has to behave like a search engine first and an LLM application second. The team evaluates frameworks: LangChain feels too LLM-first for the search-engine workload; LlamaIndex’s data layer is strong but the pipeline composition feels less explicit; Haystack’s typed pipelines map directly to the workflow shape. They pick Haystack and ship.
This is the niche Haystack continues to serve well in 2026. Where LangChain and LlamaIndex came from the LLM application world, Haystack came from the search-and-QA world and brought that engineering discipline to the LLM era. This guide covers what Haystack is, the 2.x architecture, how its primitives work, and when to pick it.
TL;DR: What Haystack is
Haystack is an open-source Apache 2.0 Python framework from Deepset for building production NLP and LLM applications as composable pipelines. The repo at github.com/deepset-ai/haystack has approximately 25,000 GitHub stars as of mid-2026. The current major version is 2.x, a 2024 rewrite that introduced typed component sockets, explicit connections, and an async-friendly directed-graph pipeline runtime that allows cycles for agent loops. Deepset also operates a commercial Haystack Enterprise Platform (formerly deepset AI Platform / deepset Cloud) on top of Haystack with managed deployment and evaluation features. The framework’s strengths are typed pipelines, deep document store integrations, and a strong heritage in classical NLP retrieval primitives.
Why Haystack matters in 2026
Three things kept Haystack relevant through the LLM-framework consolidation.
First, the search-engineer audience kept choosing Haystack. Teams whose product is fundamentally a search system that uses LLMs (legal-doc search, biomedical literature search, financial-filing search) tend to find Haystack’s pipeline abstractions a better match for their mental model than LangChain’s chains or LlamaIndex’s looser composition. The typed inputs and outputs at every stage match how search teams think.
Second, Deepset shipped the 2.x rewrite. The 1.x API was showing its age; 2.x stripped the historical baggage, made the runtime async, added explicit directed-graph semantics with branching and cycles for loops, and made pipelines serializable to YAML for declarative deployment. The result is a framework that feels modern in 2026 rather than retrofitted.
Third, the agent layer landed. Haystack 2.x now ships a first-class Agent component that fits naturally inside a Pipeline alongside retrievers and generators. The framework can express both pure RAG pipelines and agentic workflows with the same primitives.
The anatomy of a Haystack 2.x application
The framework’s primitives map to typed dataflow.
Component. A Python class decorated with @component. It declares typed inputs and outputs via @component.output_types(...) and a run method. Components are single-purpose: a Retriever is a component, a Generator is a component, a Ranker is a component, a PromptBuilder is a component. Components are stateless within a run.
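A minimal custom-component sketch; the class name and truncation logic are illustrative, not part of Haystack:

```python
from haystack import component

@component
class Truncator:
    """Illustrative single-purpose component: trims text to a word budget."""

    @component.output_types(text=str)
    def run(self, text: str, max_words: int = 100):
        # run returns a dict keyed by the declared output socket names.
        return {"text": " ".join(text.split()[:max_words])}
```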
Pipeline. A graph of Components connected by sockets. You add components with pipe.add_component(name, component) and connect them with pipe.connect("source.output", "target.input"). The Pipeline schedules each component when its required inputs are available; acyclic pipelines behave like ordered dataflow, while cyclic graphs run under Haystack’s loop-execution rules with bounded iteration counts.
Document. The data unit. A Document has content (text or media), meta (a dict of arbitrary fields), an optional embedding (a list of floats), an optional sparse_embedding, and an id. Documents flow through the pipeline.
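Constructing one is a single call; the field values here are illustrative:

```python
from haystack import Document

doc = Document(
    content="OpenTelemetry is an observability framework.",
    meta={"tenant": "acme", "lang": "en"},  # arbitrary filterable fields
)
```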
Document Store. The storage backend for Documents. The built-in store is InMemoryDocumentStore (for prototyping); deepset-maintained and community integrations cover Elasticsearch, OpenSearch, Weaviate, Pinecone, Qdrant, Chroma, MongoDB Atlas, pgvector, AstraDB, and Milvus.
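Swapping the prototype store for a real backend is typically a one-line change. A sketch using the Qdrant integration, assuming `pip install qdrant-haystack`; the endpoint, index name, and dimension are illustrative:

```python
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

doc_store = QdrantDocumentStore(
    url="http://localhost:6333",  # illustrative endpoint
    index="support-docs",         # illustrative index name
    embedding_dim=384,            # must match the embedding model
)
```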
Generator. A component that calls an LLM. OpenAIGenerator, AnthropicGenerator, HuggingFaceLocalGenerator, OllamaGenerator, CohereGenerator, and similar are first-party. Each takes a model name and other provider-specific params; the input is a prompt string and the output is the completion.
Retriever. A component that queries a Document Store. Vector retrievers (one per backend), BM25 retrievers, hybrid retrievers, and metadata filter helpers are first-party.
Agent. A component that wraps an LLM with tools. Tools are components themselves (or thin wrappers around Python functions). The agent component runs the LLM in a loop, dispatches tool calls, and returns when the model emits a final answer.
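A standalone sketch of the Agent component, assuming the first-party Agent and Tool classes from recent 2.x releases; the tool is a stub for illustration:

```python
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

def lookup_order(order_id: str) -> str:
    # Stubbed backend call for illustration.
    return f"Order {order_id}: shipped"

order_tool = Tool(
    name="lookup_order",
    description="Look up the status of a customer order by ID.",
    parameters={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    function=lookup_order,
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    tools=[order_tool],
)
result = agent.run(messages=[ChatMessage.from_user("Where is order 42?")])
print(result["messages"][-1].text)
```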
Haystack in 30 lines
```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
# (assume documents have been written with embeddings already)

template = """Answer using only this context.
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}"""

rag = Pipeline()
rag.add_component("embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
rag.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store, top_k=5))
rag.add_component("prompt", PromptBuilder(template=template, required_variables=["question"]))
rag.add_component("llm", OpenAIGenerator(model="gpt-4o"))

rag.connect("embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt.documents")
rag.connect("prompt.prompt", "llm.prompt")

result = rag.run({"embedder": {"text": "What is OpenTelemetry?"}, "prompt": {"question": "What is OpenTelemetry?"}})
print(result["llm"]["replies"][0])
```
The Pipeline reads as a typed dataflow graph: the embedder produces an embedding, the retriever consumes it and produces documents, the prompt builder consumes documents and a question, the LLM consumes the prompt and produces replies.
How Haystack compares to alternatives
| Framework | Lead with | Best for | License |
|---|---|---|---|
| Haystack | Typed pipeline composition | Search-engineer workflows, hybrid retrieval, production NLP | Apache 2.0 |
| LlamaIndex | Data ingestion and retrieval primitives | RAG-heavy applications, heterogeneous data sources | MIT |
| LangChain + LangGraph | Chains and stateful graphs | Multi-agent orchestration with retrieval as one tool | MIT |
| DSPy | Compiled prompt programs | Optimization-driven prompt programming | MIT |
Haystack’s strength is the engineering discipline of typed pipelines. If your team thinks in dataflow, the framework reads naturally. If your team thinks in chains or in graphs, LangChain or LangGraph is closer to your mental model.
Production patterns with Haystack
Three patterns recur.
Pattern 1: Hybrid RAG pipeline. A Pipeline with a vector retriever (sentence-transformers + Qdrant) running in parallel with a BM25 retriever (Elasticsearch), a DocumentJoiner that merges results, a cross-encoder Ranker (TransformersSimilarityRanker) for reranking, a PromptBuilder, and an OpenAIGenerator. This is the canonical Haystack RAG path.
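A minimal sketch of the hybrid wiring, using in-memory retrievers so it stays self-contained; production swaps in the Qdrant and Elasticsearch retrievers with the same connections, and the PromptBuilder and Generator stages attach to the ranker output exactly as in the 30-line example:

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.retrievers.in_memory import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
)
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()  # assume documents are already indexed

hybrid = Pipeline()
hybrid.add_component("embedder", SentenceTransformersTextEmbedder())
hybrid.add_component("dense", InMemoryEmbeddingRetriever(document_store=store))
hybrid.add_component("sparse", InMemoryBM25Retriever(document_store=store))
hybrid.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))
hybrid.add_component("ranker", TransformersSimilarityRanker(top_k=5))

hybrid.connect("embedder.embedding", "dense.query_embedding")
hybrid.connect("dense.documents", "joiner.documents")
hybrid.connect("sparse.documents", "joiner.documents")
hybrid.connect("joiner.documents", "ranker.documents")

query = "What is OpenTelemetry?"
result = hybrid.run({
    "embedder": {"text": query},
    "sparse": {"query": query},
    "ranker": {"query": query},
})
```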
Pattern 2: Indexing pipeline plus query pipeline. Two pipelines deployed together. The indexing pipeline ingests documents (FileTypeRouter, PyPDFToDocument, MarkdownToDocument, DocumentSplitter, DocumentEmbedder, DocumentWriter). The query pipeline retrieves and generates. Both pipelines share a Document Store. This separates the offline indexing concern from the online query concern.
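A sketch of the indexing half, with the FileTypeRouter branch omitted for brevity; the file path is illustrative and PyPDFToDocument assumes pypdf is installed:

```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()  # shared with the query pipeline

indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=store))

indexing.connect("converter.documents", "splitter.documents")
indexing.connect("splitter.documents", "embedder.documents")
indexing.connect("embedder.documents", "writer.documents")

indexing.run({"converter": {"sources": ["manual.pdf"]}})  # illustrative path
```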
Pattern 3: Agent inside a pipeline. An Agent component wired as one node in a larger Pipeline. The agent has tool components (a Retriever wrapped as a tool, a Python function tool, an MCP server tool). The Pipeline can pre-process the user question, hand it to the agent, post-process the agent’s reply (with a guardrail check, for example), and return. This is the agentic Haystack pattern.
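A sketch of that wiring, assuming ChatPromptBuilder as the pre-processing node; tools and post-processing are elided for brevity:

```python
from haystack import Pipeline
from haystack.components.agents import Agent
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

pipe = Pipeline()
pipe.add_component(
    "chat_prompt",
    ChatPromptBuilder(template=[ChatMessage.from_user("{{question}}")]),
)
pipe.add_component(
    "agent",  # real deployments pass tool components here
    Agent(chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"), tools=[]),
)
pipe.connect("chat_prompt.prompt", "agent.messages")

result = pipe.run({"chat_prompt": {"question": "Where is order 42?"}})
print(result["agent"]["messages"][-1].text)
```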
Common mistakes when adopting Haystack
- Mixing 1.x and 2.x patterns. The two are not compatible. New code should target 2.x; 1.x is archived. Old tutorials online still reference 1.x; use the Deepset docs as the source of truth.
- Forgetting connect calls. A Pipeline with components added but no connections runs nothing. Every component’s input must be either connected from another component or supplied as a top-level run input.
- Skipping the typed output declaration. Custom components without @component.output_types(...) decorations confuse the runtime. Always declare output types.
- Using InMemoryDocumentStore in production. It is for prototyping. Pick a real store (Elasticsearch, Weaviate, Qdrant, pgvector) for production.
- Hard-coding component parameters at construction. Component params can also be supplied as run-time inputs in pipe.run; the latter is more flexible in production where the same pipeline serves many tenants (see the sketch after this list).
- Skipping the Ranker. Hybrid retrieval returns more candidates than the generator needs. A cross-encoder Ranker step adds another model call but typically improves answer quality on retrieval and groundedness metrics; benchmark on your own dataset before adopting.
- Building Pipelines without YAML serialization. YAML serialization makes pipelines deployable as artifacts. For production, save and load Pipelines as YAML rather than reconstructing them in code on every deploy.
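A sketch covering the last two points, reusing the `rag` pipeline from the 30-line example; the filter values are illustrative:

```python
from haystack import Pipeline

# Supply retriever parameters at run time instead of construction time.
result = rag.run({
    "embedder": {"text": "What is OpenTelemetry?"},
    "prompt": {"question": "What is OpenTelemetry?"},
    "retriever": {
        "top_k": 3,
        "filters": {"field": "meta.tenant", "operator": "==", "value": "acme"},
    },
})

# Serialize the pipeline to YAML and load it back as a deploy artifact.
yaml_spec = rag.dumps()
restored = Pipeline.loads(yaml_spec)
```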
How to trace Haystack with FutureAGI
Haystack 2.x can emit OpenTelemetry-compatible spans through its native tracing module plus OpenInference and traceAI instrumentation packages. To ship traces to FutureAGI’s observability platform or any other OTel backend with traceAI:
```bash
pip install traceai-haystack
```

```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_haystack import HaystackInstrumentor

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="docs-search",
)
HaystackInstrumentor().instrument(tracer_provider=trace_provider)
```
The resulting trace tree shows the Pipeline run at the root, every Component invocation as a child span with input and output sockets, Retriever spans that can expose top-k Document IDs, scores, and content when content tracing is enabled, and any Agent loop as a nested span tree.
How FutureAGI implements Haystack observability and RAG evaluation
FutureAGI is the production-grade RAG observability and evaluation platform for Haystack built around the closed reliability loop that other Haystack stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:
- Pipeline and component tracing: traceAI (Apache 2.0) auto-wraps Haystack 2.x Pipelines, every Component invocation, Retriever and Ranker spans, Agent loops, and provider-level model spans across Python, TypeScript, Java, and C#.
- RAG evals: 50+ first-party metrics including Faithfulness, Groundedness, Context Recall, Context Precision, Answer Relevance, Answer Correctness, Aspect Critic, and Noise Sensitivity attach as span attributes; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
- Simulation: persona-driven scenarios exercise the RAG path in pre-prod with the same scorer contract that judges production traces.
- Gateway and guardrails: the Agent Command Center fronts 100+ providers with BYOK routing, and 18+ runtime guardrails enforce policy on the same plane.
Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise starts at $2,000 per month with SOC 2 Type II.
Most teams running Haystack in production end up running three or four tools alongside it: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching. For the broader tracing model, read What is LLM Tracing?.
Sources
- Haystack GitHub repo
- Haystack documentation
- Haystack 2.x release announcement
- Haystack Enterprise Platform
- Deepset blog
- Haystack integrations registry
- LlamaIndex GitHub repo
- LangChain GitHub repo
- traceAI repo
- OpenInference Haystack instrumentation
Series cross-link
Related: What is LlamaIndex?, What is RAG Evaluation?, Best RAG Evaluation Tools in 2026, What is LLM Tracing?
Frequently asked questions
What is Haystack in plain terms?
Who maintains Haystack and what license is it under?
What changed between Haystack 1.x and 2.x?
How is Haystack different from LlamaIndex?
What is a Haystack Pipeline?
What document stores does Haystack support?
How do you trace a Haystack pipeline?
When should I pick Haystack over LlamaIndex or LangChain?