Time Series Data Analysis: Forecasting Frameworks, Foundation Models, and When to Use Each (2026 Update)
Time series data analysis in 2026: Prophet, Darts, statsforecast, neuralforecast, TimesFM, Chronos. Code, benchmarks, when to use each model.
TL;DR Time Series Data Analysis in 2026
| Use case | Best pick |
|---|---|
| Quick business forecast | Prophet |
| All-in-one Python lib | Darts |
| Fast statistical or ML at scale | Nixtla statsforecast and mlforecast |
| Deep learning forecasting | NeuralForecast (NHITS, NBEATS, PatchTST, TFT) |
| Zero-shot foundation model | TimesFM (Google) or Chronos (AWS Labs) |
| Anomaly detection | PyOD, ADTK, Darts anomaly detectors |
| Probabilistic intervals | NeuralForecast, GluonTS |
| Explanation and reporting | Pair numerical model with LLM (e.g., Claude, GPT, Gemini) |
| Hierarchical and grouped forecasts | hierarchicalforecast (Nixtla) |
What Time Series Analysis Looks Like in 2026
Time series data is any sequence of observations ordered in time: a stock price, a server CPU metric, a daily order count, a smart-meter reading every 15 minutes. The job of time series analysis is to:
- Decompose the signal into trend, seasonality, cycle, and residual (a minimal sketch follows this list).
- Detect anomalies that deviate from the expected pattern.
- Forecast future values along with their uncertainty.
- Attribute changes to drivers (changepoints, exogenous variables).
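Decomposition, the first job above, is a few lines with statsmodels' STL. A minimal sketch on a synthetic daily series (the series, seasonal period, and seed are illustrative):
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
# Synthetic daily series: linear trend + weekly seasonality + noise
idx = pd.date_range("2024-01-01", periods=365, freq="D")
y = pd.Series(
    np.linspace(100, 150, 365)
    + 10 * np.sin(2 * np.pi * np.arange(365) / 7)
    + np.random.default_rng(0).normal(0, 2, 365),
    index=idx,
)
result = STL(y, period=7).fit()  # exposes trend, seasonal, and resid components
print(result.trend.iloc[-3:], result.seasonal.iloc[-3:], result.resid.iloc[-3:])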
In 2026 the toolkit splits into four tiers. Each has a place; the trick is knowing which to reach for first.
| Tier | Examples | When to use |
|---|---|---|
| Classical statistical | ARIMA, ETS, TBATS, theta | Stationary or near-stationary, single-series, interpretability |
| Machine learning | XGBoost, LightGBM with lag and calendar features | Tabular forecasting at scale, exogenous variables |
| Deep learning | NHITS, NBEATS, PatchTST, TFT, DeepAR | Long horizons, complex seasonality, multivariate, probabilistic |
| Foundation models | TimesFM, Chronos, Moirai, Lag-Llama | Zero-shot baselines, low-data settings, broad coverage |
What’s New in the 2026 Forecasting Stack
Three trends shape the 2026 landscape:
- Foundation models for time series went mainstream. TimesFM from Google Research, Chronos from AWS Labs, Moirai from Salesforce, and Lag-Llama all shipped open weights for zero-shot forecasting, with successive releases pushing the state of the art on the Monash benchmarks.
- Nixtla’s ecosystem consolidated the practitioner stack. statsforecast, mlforecast, neuralforecast, hierarchicalforecast, and the unified NixtlaClient cover the most-used patterns with consistent APIs (a minimal client sketch follows this list).
- LLM-explanation overlays got production-ready. LLMs do not beat purpose-built models at the numerical task, but they are the right tool to translate a forecast into a business narrative, flag anomalies in natural language, and answer ad-hoc analyst questions. Treat them as the explanation layer above your numerical model.
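For the hosted route, the unified client is a single call. A minimal sketch, assuming a valid Nixtla API key and the standard long-format frame with unique_id, ds, and y columns (key and column names here are illustrative):
from nixtla import NixtlaClient
# Hosted TimeGPT endpoint
client = NixtlaClient(api_key="your-api-key")
fcst = client.forecast(df=df, h=14, freq="D", time_col="ds", target_col="y")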
Tier 1: Classical Statistical Methods
ARIMA and its cousins remain the right starting point for stationary, single-series problems with a few hundred to a few thousand observations. They are fast, interpretable, and tough to beat on small data.
The fastest path in 2026 is statsforecast:
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS
df = pd.DataFrame({
"unique_id": ["series_a"] * 100,
"ds": pd.date_range("2024-01-01", periods=100, freq="D"),
"y": range(100),
})
sf = StatsForecast(
models=[AutoARIMA(season_length=7), AutoETS(season_length=7)],
freq="D",
)
forecasts = sf.forecast(df=df, h=14)
Strengths: fits thousands of series in minutes; interpretable; a strong baseline. Weaknesses: univariate, weak with multiple seasonalities, and no exogenous variables in plain ARIMA.
Tier 2: Machine Learning Forecasting
Feature-engineered tabular ML wins when you have many series, exogenous variables, and large datasets. The pattern: build lag features, calendar features, holiday flags, then train a gradient-boosted tree.
import pandas as pd
from mlforecast import MLForecast
from mlforecast.lag_transforms import RollingMean
from lightgbm import LGBMRegressor
mlf = MLForecast(
models=[LGBMRegressor()],
freq="D",
lags=[7, 14, 28],
lag_transforms={7: [RollingMean(7)]},
date_features=["dayofweek", "month"],
)
mlf.fit(df)  # df in long format with unique_id, ds, y, as in the Tier 1 example
forecasts = mlf.predict(h=14)
Strengths: handles exogenous variables naturally, fast at scale, well-understood feature engineering. Weaknesses: needs manual feature design, no native probabilistic output without quantile regressors.
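If you do need intervals from this tier, one workaround is a pair of explicit quantile regressors. A minimal sketch with LightGBM's quantile objective (the 10th and 90th percentiles are illustrative choices):
from lightgbm import LGBMRegressor
from mlforecast import MLForecast
# One model per quantile; together they bracket an ~80 percent interval
mlf_q = MLForecast(
    models={
        "lgbm_q10": LGBMRegressor(objective="quantile", alpha=0.1),
        "lgbm_q90": LGBMRegressor(objective="quantile", alpha=0.9),
    },
    freq="D",
    lags=[7, 14, 28],
)
mlf_q.fit(df)
intervals = mlf_q.predict(h=14)  # columns lgbm_q10 and lgbm_q90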
Tier 3: Deep Learning
For long horizons, complex seasonality, multivariate signals, or probabilistic forecasts, deep learning takes over. The current 2026 stack:
- NHITS and NBEATS: pure MLP architectures with hierarchical interpolation; strong default for medium-length series.
- PatchTST: transformer with patching; strong for long-horizon multivariate.
- TFT (Temporal Fusion Transformer): best for interpretable multivariate with static and time-varying covariates.
- DeepAR: probabilistic autoregressive RNN; production workhorse at Amazon.
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS, PatchTST
nf = NeuralForecast(
models=[
NHITS(input_size=28, h=14, max_steps=500),
PatchTST(input_size=28, h=14, max_steps=500),
],
freq="D",
)
nf.fit(df=df)
forecasts = nf.predict()
Strengths: complex patterns, probabilistic intervals, multivariate. Weaknesses: GPU cost, slower iteration, easy to overfit on small data.
For probabilistic forecasting and broader algorithm coverage see also GluonTS and Darts.
Tier 4: Foundation Models for Time Series
The 2026 entrants. These are transformer architectures pretrained on huge time series corpora, ready for zero-shot forecasting.
| Model | Source | Weights | Zero-shot |
|---|---|---|---|
| TimesFM | Google Research | Open (HF) | Yes |
| Chronos | AWS Labs | Open (HF) | Yes |
| Moirai | Salesforce | Open (HF) | Yes |
| Lag-Llama | Mila and ServiceNow Research | Open (HF) | Yes |
Quick start with Chronos:
import torch
from chronos import ChronosPipeline
pipeline = ChronosPipeline.from_pretrained(
"amazon/chronos-t5-large",
device_map="cuda" if torch.cuda.is_available() else "cpu",
torch_dtype=torch.float16,
)
context = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
forecast = pipeline.predict(context=context, prediction_length=12)
print(forecast.shape)  # torch.Size([1, 20, 12]): (num_series, num_samples, horizon)
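Because Chronos returns sample paths rather than a single trajectory, prediction intervals fall out of quantiles over the sample dimension. A minimal sketch:
# Empirical 80 percent interval from the 20 sample paths
low, median, high = torch.quantile(
    forecast[0].float(), torch.tensor([0.1, 0.5, 0.9]), dim=0
)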
When to use foundation models:
- You need a strong baseline fast on new data
- You do not have enough historical data to train a deep model
- You want one model that works across many heterogeneous series
When to skip them:
- You have large in-domain data; a tuned deep model usually wins on accuracy
- You need quantitative interpretability per feature
- Latency or cost matters and your existing classical model is “good enough”
Anomaly Detection in Time Series
Anomaly detection in 2026 splits into:
- Statistical: rolling median absolute deviation, Tukey fences, STL decomposition residuals (a rolling-MAD sketch follows this list)
- Forecast-residual: predict, compute residual, flag when residual exceeds threshold
- Density-based: isolation forest, LOF, autoencoders, PyOD catalog
- Foundation-model-based: use Chronos or TimesFM forecasts as a baseline and flag deviations
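The first two patterns need only pandas. A minimal rolling median-absolute-deviation sketch (the window and threshold are illustrative):
import numpy as np
import pandas as pd
def rolling_mad_anomalies(y: pd.Series, window: int = 28, threshold: float = 3.5) -> pd.Series:
    """Flag points whose robust z-score against a rolling median exceeds the threshold."""
    med = y.rolling(window, min_periods=window).median()
    mad = (y - med).abs().rolling(window, min_periods=window).median().replace(0, np.nan)
    robust_z = 0.6745 * (y - med) / mad  # 0.6745 makes MAD consistent with the std dev
    return robust_z.abs() > threshold
The forecast-residual pattern is the same test applied to the residuals of a fitted model instead of the raw series.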
Darts ships a unified anomaly detection API:
from darts import TimeSeries
from darts.ad import KMeansScorer, QuantileDetector
from darts.models import ARIMA
series = TimeSeries.from_dataframe(df, time_col="ds", value_cols="y")
# Rolling in-sample forecasts, so predictions share a time index with the actuals
model = ARIMA(p=12, d=1, q=0)
predictions = model.historical_forecasts(series, start=0.5, forecast_horizon=1)
actual = series.slice_intersect(predictions)
# The KMeans scorer is trainable: fit it on (actual, prediction) pairs, then score
scorer = KMeansScorer(k=2, window=7)
scorer.fit_from_prediction(actual, predictions)
scores = scorer.score_from_prediction(actual, predictions)
# Flag the top 5 percent of anomaly scores
detector = QuantileDetector(high_quantile=0.95)
anomalies = detector.fit_detect(scores)
Evaluation: What to Measure
For point forecasts:
| Metric | Use when |
|---|---|
| MAE | Symmetric error, robust to outliers |
| RMSE | Penalize large errors more |
| MAPE | Scale-free, but breaks on zeros |
| sMAPE | Bounded symmetric MAPE |
| MASE | Scale-free, compare across series |
For probabilistic forecasts:
| Metric | Use when |
|---|---|
| CRPS | Continuous, proper score for distributions |
| Pinball loss | Quantile-specific accuracy |
| Coverage | Empirical vs nominal prediction interval coverage |
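MASE and pinball loss are each a few lines of NumPy. Minimal reference implementations (the seasonal period of 7 is an assumption; set it per series):
import numpy as np
def mase(y, y_hat, y_train, season_length=7):
    """MAE scaled by the in-sample seasonal-naive error."""
    naive_err = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return np.mean(np.abs(y - y_hat)) / naive_err
def pinball(y, q_hat, q=0.9):
    """Quantile (pinball) loss for the predicted q-th quantile."""
    diff = y - q_hat
    return np.mean(np.maximum(q * diff, (q - 1) * diff))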
Always benchmark against:
- Naive: last observation
- Seasonal naive: last value at same season
- Exponential smoothing: ETS auto-fit
If you cannot beat naive, seasonal naive, and ETS on cross-validation, your fancier model is not earning its complexity cost.
Cross-Validation Done Right
Random k-fold leaks information. Use time-aware cross-validation:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq="D")
cv_results = sf.cross_validation(
df=df,
h=14, # forecast horizon
n_windows=4, # number of rolling windows
step_size=14,
)
Rolling-origin or expanding-window splits respect the temporal structure of the problem. The most recent period stays held out until final evaluation.
LLMs in a 2026 Time Series Stack
LLMs are not a forecasting tool. They are an explanation, orchestration, and reporting tool. Two patterns work well:
Forecast narrative generation
Run your tuned numerical model, then ask an LLM to translate the result for a non-technical reader:
from openai import OpenAI
client = OpenAI()
prompt = f"""
You are an analyst. Given the forecast below, write a 3-sentence narrative
for a retail operations lead. Highlight expected demand, uncertainty, and
risk drivers.
Forecast (next 14 days mean and 80 percent interval):
{forecast.to_dict()}
"""
resp = client.chat.completions.create(
model="gpt-5-2025-08-07",
messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
Agentic forecasting workflows
A multi-step agent that loads data, picks a model, fits, forecasts, and writes a report. Frameworks like LangChain, LangGraph, and CrewAI orchestrate the tool calls. See Agentic AI frameworks for the broader landscape.
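Whichever framework you pick, the tool the agent calls is ultimately a plain function. A framework-agnostic sketch (the file path, model roster, and season length are illustrative):
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, Naive
def forecast_and_summarize(csv_path: str, h: int = 14) -> str:
    """One agent tool: load, cross-validate, pick a winner, return text for the LLM to narrate."""
    df = pd.read_csv(csv_path, parse_dates=["ds"])
    sf = StatsForecast(models=[Naive(), AutoARIMA(season_length=7)], freq="D")
    cv = sf.cross_validation(df=df, h=h, n_windows=3, step_size=h)
    maes = {m: (cv["y"] - cv[m]).abs().mean() for m in ["Naive", "AutoARIMA"]}
    best = min(maes, key=maes.get)
    fcst = sf.forecast(df=df, h=h)
    return f"Best model by CV MAE: {best} ({maes[best]:.2f})\n{fcst.head().to_string()}"
The LLM decides when to call the tool and narrates the returned summary; the numbers stay with the numerical model.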
Industry Applications in 2026
Retail and demand forecasting
Hierarchical forecasting at SKU × store × day grain. hierarchicalforecast reconciles forecasts so they sum consistently across the hierarchy. Foundation models give strong cold-start forecasts on new SKUs.
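A minimal reconciliation sketch, assuming the summing matrix S, the tags dict, base forecasts Y_hat_df, and in-sample values Y_df have already been built (for example with hierarchicalforecast's aggregate utility):
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
# Reconcile base forecasts so SKU, store, and total levels sum consistently
hrec = HierarchicalReconciliation(reconcilers=[BottomUp(), MinTrace(method="ols")])
Y_rec = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_df, S=S, tags=tags)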
Energy and grid
Probabilistic forecasts with prediction intervals. NeuralForecast or DeepAR for half-hourly demand. PatchTST shines on multi-day horizons.
Healthcare
Vital-sign monitoring, length-of-stay, ICU readmission. Standardize signals, handle missing data carefully, and prefer interpretable models for clinician acceptance.
Cloud capacity and SRE
CPU, memory, and request-rate forecasting. Foundation models work well here because the data is high-volume with short, regular seasonal cycles.
Finance
Volatility forecasting (GARCH, realized volatility), regime detection. LLMs add news and earnings call text on top of price series.
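For the volatility piece, the arch package is the usual tool. A minimal GARCH(1,1) sketch on an illustrative returns series:
import numpy as np
import pandas as pd
from arch import arch_model
# Illustrative returns; in practice use 100 * log-price differences
returns = pd.Series(np.random.default_rng(0).normal(0, 1, 1000))
res = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
vol_forecast = res.forecast(horizon=5).variance  # conditional variance, next 5 steps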
When You Combine Time Series Models With LLMs
If your stack feeds time series outputs into LLM-generated narratives or agentic flows, you need observability over the LLM side too. The numerical model has standard ML monitoring; the LLM side needs:
- Trace ingestion for every LLM call (traceAI ships OpenInference instrumentation, Apache 2.0)
- Faithfulness evaluation so the narrative does not hallucinate numbers that contradict the forecast (fi.evals faithfulness template)
- Tone and clarity scoring for downstream consumers
Future AGI is not a time series analysis tool, but it is the evaluation and observability companion for the LLM layer that wraps your forecasts. The BYOK Agent Command Center at /platform/monitor/command-center provides routing, fallbacks, PII redaction, and audit logs for the LLM calls. See Real-time LLM evaluation setup for how to wire this up.
Common Pitfalls in 2026
Random k-fold on time series. Information leaks. Use rolling-origin or expanding-window splits.
No naive baseline. Beat naive, seasonal naive, and ETS before claiming a win.
MAPE on series with zeros. Use sMAPE or MASE instead.
Overfitting on small data. Classical methods often win below 1,000 observations.
Ignoring exogenous variables. Holidays, promotions, weather often matter more than the model choice.
Treating LLMs as forecasters. General-purpose LLMs underperform tuned numerical models on numerical tasks. Use time series foundation models (TimesFM, Chronos) for zero-shot baselines, and reserve general-purpose LLMs for explanation and orchestration.
Skipping uncertainty. A point forecast without an interval is half a forecast. Probabilistic models pay off in business decisions; a minimal sketch follows.
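Turning intervals on is often a single argument. A minimal sketch with NeuralForecast's multi-quantile loss, reusing the long-format df from earlier (the levels are illustrative):
from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import MQLoss
from neuralforecast.models import NHITS
# MQLoss trains the network to emit quantiles directly
nf = NeuralForecast(
    models=[NHITS(h=14, input_size=28, loss=MQLoss(level=[80, 95]), max_steps=500)],
    freq="D",
)
nf.fit(df=df)
intervals = nf.predict()  # columns like NHITS-median, NHITS-lo-95, NHITS-hi-95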
Get Started in 30 Minutes
pip install nixtla statsforecast neuralforecast mlforecast chronos-forecasting darts
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, Naive
df = pd.read_csv("your_series.csv", parse_dates=["ds"])
sf = StatsForecast(
models=[Naive(), AutoARIMA(season_length=7), AutoETS(season_length=7)],
freq="D",
)
cv = sf.cross_validation(df=df, h=14, n_windows=4, step_size=14)
# MAE per series for each model, so the baseline comparison is explicit
mae = cv.groupby("unique_id").apply(
    lambda x: pd.Series({m: (x["y"] - x[m]).abs().mean() for m in ["Naive", "AutoARIMA", "AutoETS"]})
)
print(mae)
Compare your auto-ARIMA against naive on cross-validation. If it does not beat naive, simplify or get more data.
Related reading:
- Best LLMs in May 2026
- LLM evaluation frameworks and best practices
- Real-time LLM evaluation setup
- Agentic AI frameworks
- LLM leaderboard explained
For deeper Nixtla docs go to nixtla.io. For Chronos and TimesFM start at their respective Hugging Face model cards.
Frequently asked questions
What is time series data analysis in 2026?
What are the most-used time series frameworks in 2026?
Are time series foundation models worth using in 2026?
When should I use classical methods like ARIMA vs deep learning?
How do LLMs fit into time series analysis?
What metrics should I use to evaluate a time series model?
How do I avoid overfitting in time series models?
How do I monitor time series models in production?