Polars TS

Documentation · Source Code · PyPI


polars-ts is a batteries-included time series toolkit built on Polars. It gives you Rust-accelerated distance metrics, 10+ clustering algorithms, a full forecasting stack, and diagnostics — all from a single pip install, no heavyweight frameworks required.

Why polars-ts?

| Pain point | How polars-ts helps |
| --- | --- |
| "I need DTW but scipy is slow" | 12 distance metrics compiled to native code via Rust + Rayon, orders of magnitude faster on large panels |
| "I want to cluster time series but tslearn/sktime have too many deps" | K-Medoids, K-Shape, HDBSCAN, Spectral, Hierarchical, K-Means DBA, CLARA/CLARANS, U-Shapelets — all built-in; scikit-learn is needed only as an optional extra for the density-based methods |
| "Setting up a forecast pipeline takes too long" | ForecastPipeline wires up lags, rolling stats, calendar features, target transforms, and any sklearn model in 5 lines |
| "I don't know which clustering method to pick" | auto_cluster sweeps methods × distances × k values and returns the best result with evaluation scores |
| "Polars doesn't have time series functions" | Mann-Kendall, Sen's slope, CUSUM, PELT, decomposition, ACF/PACF — all group-aware and Polars-native |

TL;DR — what you can do in 3 lines

import polars_ts as pts

# Cluster 1 000 series by shape similarity
labels = pts.auto_cluster(df, methods=["kmedoids", "spectral"], distances=["sbd", "dtw"])

# Forecast with a full ML pipeline
pipe = pts.ForecastPipeline(model, lags=[1,7,14], rolling_windows=[7], calendar=["day_of_week"])
pipe.fit(train); forecasts = pipe.predict(train, h=7)

# Detect changepoints
breaks = pts.pelt(df, cost="meanvar", pen=10)

Installation

pip install polars-timeseries

Extras for optional features:

pip install "polars-timeseries[clustering]"     # HDBSCAN, DBSCAN, spectral (sklearn + scipy)
pip install "polars-timeseries[forecast]"       # SCUM, auto_arima (statsforecast)
pip install "polars-timeseries[decomposition]"  # Fourier decomposition (polars-ds)
pip install "polars-timeseries[all]"            # Everything

Requires Python 3.12+ and Polars 1.30+.


Quick start

Pairwise DTW distance

import polars as pl
import polars_ts as pts

df = pl.DataFrame({
    "unique_id": ["A"] * 5 + ["B"] * 5,
    "y": [1.0, 2.0, 3.0, 2.0, 1.0,
          1.0, 3.0, 5.0, 3.0, 1.0],
})

result = pts.compute_pairwise_dtw(df, df)

Auto-cluster time series

result = pts.auto_cluster(
    df,
    methods=["kmedoids", "spectral", "kshape"],
    distances=["sbd", "dtw"],
    k_range=range(2, 6),
)
print(result.best_method, result.best_k, result.best_score)
print(result.best_labels)  # DataFrame[unique_id, cluster]

End-to-end forecast pipeline

from sklearn.ensemble import GradientBoostingRegressor
import polars_ts as pts

pipe = pts.ForecastPipeline(
    GradientBoostingRegressor(),
    lags=[1, 2, 7],
    rolling_windows=[7],
    calendar=["day_of_week", "month"],
    target_transform="log",
)
pipe.fit(train_df)
forecasts = pipe.predict(train_df, h=7)

ARIMA forecasting

import polars_ts as pts

# Fit ARIMA(1,1,1) and forecast 12 steps ahead
fitted = pts.arima_fit(df, order=(1, 1, 1))
forecast = pts.arima_forecast(fitted, h=12)

# Or use automatic order selection
forecast = pts.auto_arima(df, h=12, season_length=12)

Exponential smoothing

import polars_ts as pts

# Holt-Winters seasonal forecast
result = pts.holt_winters_forecast(df, h=12, season_length=12, seasonal="additive")

Conformal prediction intervals

import polars_ts as pts

# Distribution-free prediction intervals
result = pts.conformal_interval(cal_residuals, predictions, coverage=0.9)

Weighted ensemble

import polars_ts as pts

ens = pts.WeightedEnsemble(weights="inverse_error")
combined = ens.combine([forecast_a, forecast_b], validation_dfs=[val_a, val_b])

Mann-Kendall trend test

import polars as pl
import polars_ts as pts

df = pl.DataFrame({
    "group": ["A"] * 10 + ["B"] * 10,
    "y": list(range(10)) + [10 - x for x in range(10)],
})

result = df.group_by("group").agg(
    pts.mann_kendall(pl.col("y")).alias("trend"),
    pts.sens_slope(pl.col("y")).alias("slope"),
)

Seasonal decomposition

import polars as pl
import polars_ts as pts

df = pl.DataFrame({
    "unique_id": ["A"] * 48,
    "ds": list(range(48)),
    "y": [10 + 5 * (i % 12 > 5) + 0.5 * i for i in range(48)],
})

result = pts.seasonal_decomposition(df, freq=12, method="additive")

Features

Distance metrics (Rust, parallelized via Rayon)

All distance functions return a tidy DataFrame with columns [id_1, id_2, <metric>]. A unified compute_pairwise_distance(method=...) API lets you swap metrics with a single string.
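
For example, reusing the df from the Quick start, swapping metrics could look like the minimal sketch below. It assumes compute_pairwise_distance accepts the same two frames as compute_pairwise_dtw plus a method string; the exact keyword names may differ, so check the API docs.

import polars_ts as pts

# Unified API sketch: keyword names are assumptions, not the documented signature.
dtw = pts.compute_pairwise_distance(df, df, method="dtw")
sbd = pts.compute_pairwise_distance(df, df, method="sbd")
# Both return tidy frames with columns [id_1, id_2, <metric>].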

| Metric | Function | Key Parameters |
| --- | --- | --- |
| Dynamic Time Warping | compute_pairwise_dtw | method: standard, sakoe_chiba, itakura, fast |
| Derivative DTW | compute_pairwise_ddtw | Shape-sensitive comparison |
| Weighted DTW | compute_pairwise_wdtw | g: weight sharpness |
| Move-Split-Merge | compute_pairwise_msm | c: move cost |
| Edit Distance (Real Penalty) | compute_pairwise_erp | g: gap value |
| Longest Common Subsequence | compute_pairwise_lcss | epsilon: matching threshold |
| Time Warp Edit Distance | compute_pairwise_twe | nu: stiffness, lambda_: deletion cost |
| Shape-Based Distance | compute_pairwise_sbd | Cross-correlation based |
| Frechet Distance | compute_pairwise_frechet | Geometric coupling distance |
| Edit Distance on Real Sequences | compute_pairwise_edr | Edit-operation cost |
| Multivariate DTW | compute_pairwise_dtw_multi | metric: manhattan, euclidean |
| Multivariate MSM | compute_pairwise_msm_multi | c: move cost |

Clustering & classification

| Method | Function | When to use |
| --- | --- | --- |
| K-Medoids (PAM) | kmedoids | Known k, any distance metric, interpretable medoids |
| K-Shape | KShape | Shape-based grouping via cross-correlation centroids |
| Spectral (KSC) | spectral_cluster | Non-convex clusters, graph Laplacian structure |
| HDBSCAN | hdbscan_cluster | Unknown k, varying density, noise detection |
| DBSCAN | dbscan_cluster | Fixed-radius neighbourhood, noise detection |
| Hierarchical | agglomerative_cluster | Dendrogram visualization, flexible linkage |
| K-Means DBA | kmeans_dba | DTW Barycenter Averaging centroids |
| CLARA | clara | Scalable k-medoids via sampling |
| CLARANS | clarans | Randomized k-medoids neighbourhood search |
| U-Shapelets | shapelet_cluster | Interpretable sub-sequence patterns |
| ROCKET / MiniRocket | rocket_features, minirocket_features | Random convolutional kernel feature extraction |
| Auto-cluster | auto_cluster | Sweep methods × distances × k, pick the best |

Evaluation: silhouette_score, davies_bouldin_score, calinski_harabasz_score

Classification: knn_classify (distance-based k-NN), TimeSeriesKNNClassifier (OOP), KShapeClassifier (centroid-based)
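
To see how these pieces compose, here is a rough sketch: the function names come from the table and lists above, but every keyword argument is an assumption for illustration, not the documented signature.

import polars_ts as pts

# Hypothetical arguments throughout — consult the API reference for real signatures.
labels = pts.kmedoids(df, k=3, distance="dtw")              # cluster with a chosen metric
score = pts.silhouette_score(df, labels, distance="dtw")    # evaluate the partition
# new_df is an illustrative frame of unseen series; the call shape is assumed.
preds = pts.knn_classify(labels, new_df, k=5, distance="sbd")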

Trend & changepoint detection

  • Mann-Kendall test — non-parametric trend detection (Rust)
  • Sen's slope — robust trend magnitude estimation (Rust)
  • CUSUM — cumulative sum changepoint detection (Rust)
  • PELT — multiple changepoints with mean/variance/meanvar cost functions
  • BOCPD — Bayesian Online Changepoint Detection
  • Regime detection — Hidden Markov Model state inference

Decomposition

  • Seasonal decomposition — additive or multiplicative (classical)
  • Fourier decomposition — harmonic decomposition with configurable frequencies
  • Decomposition features — trend/seasonal strength extraction (simple or MSTL)
  • Anomaly flagging — residual-based anomaly detection from any decomposition

Feature engineering

  • Lag features — create lagged versions of a target column per group
  • Rolling features — rolling window aggregations (mean, std, min, max, sum, median, var)
  • Calendar features — extract day_of_week, month, quarter, is_weekend, etc.
  • Fourier features — sin/cos pairs for seasonal modelling
  • Target encoding — smoothed categorical encoding by target mean
  • Holiday features — binary holiday indicators + distance-to-holiday (requires holidays package)
  • Interaction features — cross-term column generation
  • Time embeddings — cyclical sin/cos encoding for time components

Target transforms

  • Log transform — log1p / expm1 with automatic validation and lossless inversion
  • Box-Cox transform — parametric power transform with configurable lambda
  • Differencing — configurable order and seasonal period with metadata for lossless inversion

All transforms are group-aware, invertible, and accessible via the df.pts namespace.
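
As an illustration of the namespace, a hypothetical round-trip might look like the sketch below; the method names (log_transform, inverse_transform) are assumptions, not the documented API.

import polars_ts as pts  # importing typically registers the df.pts namespace

# Hypothetical method names — see the transforms documentation for the real API.
transformed = df.pts.log_transform("y")
restored = transformed.pts.inverse_transform("y")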

Data preprocessing

  • Missing value imputation — forward/backward fill, linear interpolation, mean, median, seasonal
  • Outlier detection — z-score, IQR, Hampel filter, rolling z-score
  • Outlier treatment — clip (winsorize), median replacement, interpolation, null
  • Temporal resampling — downsample/upsample with configurable aggregation

Validation strategies

  • Expanding window CV — growing training window cross-validation
  • Sliding window CV — fixed-size training window cross-validation
  • Rolling origin CV — general rolling-origin with configurable initial/fixed train size

Forecasting

  • SCUM — ensemble model combining AutoARIMA, AutoETS, AutoCES, and DynamicOptimizedTheta
  • ARIMA/SARIMA — explicit (p,d,q) order via statsmodels (arima_fit/arima_forecast) or automatic selection via statsforecast (auto_arima)
  • Baseline models — naive, seasonal naive, moving average, and FFT-based forecasts
  • Exponential smoothing — SES, Holt's linear, Holt-Winters (additive/multiplicative, Rust-accelerated)
  • Multi-step strategies — RecursiveForecaster and DirectForecaster (sketched below)
  • ForecastPipeline — end-to-end ML pipeline with feature engineering + transforms
  • GlobalForecaster — cross-series panel model with optional ID encoding
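
A minimal sketch of the two multi-step strategies, modelled loosely on the ForecastPipeline example above — the constructor arguments (model, lags) and the predict signature are assumptions, not the documented API.

from sklearn.linear_model import Ridge
import polars_ts as pts

# Recursive strategy: a single model whose predictions are fed back as future lags.
# (Arguments assumed for illustration.)
rec = pts.RecursiveForecaster(Ridge(), lags=[1, 7, 14])
rec.fit(train_df)
recursive_fc = rec.predict(train_df, h=14)

# Direct strategy: one model fitted per horizon step. (Arguments assumed.)
direct = pts.DirectForecaster(Ridge(), lags=[1, 7, 14])
direct.fit(train_df)
direct_fc = direct.predict(train_df, h=14)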

Probabilistic forecasting

  • QuantileRegressor — one model per quantile level with CRPS-compatible output
  • Conformal prediction — distribution-free intervals with coverage guarantees
  • EnbPI — Ensemble Batch Prediction Intervals with adaptive online updates

Ensembling

  • WeightedEnsemble — equal, manual, or inverse-error-optimized weights
  • StackingForecaster — meta-learner trained on out-of-fold predictions

Forecast evaluation & diagnostics

  • Metrics — MAE, RMSE, MAPE, sMAPE, MASE, CRPS
  • Kaboudan metric — model robustness evaluation via block-shuffle backtesting
  • Bias detection & correction — mean, regression, quantile mapping
  • Calibration diagnostics — calibration table, PIT histogram, reliability diagram
  • Residual diagnostics — ACF, PACF, Ljung-Box test
  • Permutation importance — model-agnostic feature importance

Multivariate & hierarchical

  • VAR — Vector Autoregression with OLS fitting and multi-step forecasts
  • Granger causality — F-test for causal relationships between series
  • GARCH — volatility modelling and conditional variance forecasting
  • Forecast reconciliation — bottom-up, top-down, and MinTrace-OLS

Anomaly detection

  • Decomposition-based — residual threshold anomaly flagging
  • Isolation Forest — unsupervised anomaly detection on engineered features

Integration adapters

  • NeuralForecast — convert to/from N-BEATS, PatchTST, N-HiTS format
  • PyTorch Forecasting — convert to/from TFT, DeepAR format
  • HuggingFace — convert to Dataset for Chronos, TimesFM, Lag-Llama
  • Chronos / MOMENT embeddings — foundation model feature extraction for clustering
  • ForecastEnv — Gymnasium-compatible RL environment for decision making

Tutorials

The notebooks/ directory contains 10 end-to-end tutorials:

| # | Topic |
| --- | --- |
| 01 | Data wrangling & exploration |
| 02 | Feature engineering & transforms |
| 03 | Forecasting fundamentals |
| 04 | ML forecasting pipelines |
| 05 | Uncertainty & calibration |
| 06 | Changepoint & anomaly detection |
| 07 | Time series similarity & clustering |
| 08 | Multivariate & volatility |
| 09 | Ensembles & reconciliation |
| 10 | Ecosystem adapters |

Development

git clone https://github.com/drumtorben/polars-ts.git
cd polars-ts
uv sync
uv pip install -e .
uv run pytest

Code quality

Pre-commit hooks run via prek (Rust reimplementation of pre-commit) or standard pre-commit — both read .pre-commit-config.yaml:

# Option A: prek (faster)
uv tool install prek
prek run --all-files

# Option B: standard pre-commit
pre-commit run --all-files

Type checking

# mypy (authoritative)
uv run mypy polars_ts/

# ty (fast, informational — beta)
uvx ty check polars_ts/

License

MIT
