Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting
Meta Learning for Time Series Forecasting: we add the code for meta-learning-based model selection used in the paper. You can:
- Run meta learning experiments:
python meta/run.py --mode simple --test_dataset ETTh2 --meta_model_type mlp
- Extract meta-features for datasets:
python meta/meta_features/get_meta_features_LTF.py --meta_feature_type tabpfn
- Apply meta selection to new datasets:
python meta/run_custom.py --new_dataset my_dataset --checkpoint_path <path> --new_dataset_path <csv_path> --scripts_root <scripts_dir>
We add distribution-plot analyses of meta-features extracted by our TabPFN-based method and by other statistical methods. We found that the meta-features extracted by TabPFN follow a markedly more normal distribution.
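As a toy illustration of how such a normality comparison can be run (synthetic arrays standing in for real meta-features; `scipy` assumed available — this is not part of the released code):

```python
# Illustrative sketch: compare how close two sets of "meta-features" are to a
# normal distribution using the Shapiro-Wilk test. The arrays are synthetic
# stand-ins, not actual TabPFN or statistical meta-features.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tabpfn_like = rng.normal(0.0, 1.0, size=200)       # stand-in for TabPFN meta-features
statistical_like = rng.exponential(1.0, size=200)  # stand-in for skewed statistical features

for name, feats in [("tabpfn", tabpfn_like), ("statistical", statistical_like)]:
    stat, p = stats.shapiro(feats)
    # A higher p-value means less evidence against normality.
    print(f"{name}: Shapiro-Wilk W={stat:.3f}, p={p:.3g}")
```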
Official implementation of TSCOMP.
As the field of multivariate time series forecasting (MTSF) continues to diversify across Transformers, MLPs, Large Language Models (LLMs), and Time Series Foundation Models (TSFMs), existing studies typically address concerns about methodological effectiveness by conducting large-scale benchmarks. These studies consistently indicate that no single approach dominates across all scenarios.
However, existing benchmarks typically evaluate models holistically, failing to analyze the multi-level hierarchy of MTSF pipelines. Consequently, the contributions of internal mechanisms remain obscured, hindering the combination of effective designs into superior solutions.
To bridge these gaps, we propose TSCOMP, a comprehensive framework designed to systematically deconstruct and benchmark deep MTSF methods. Instead of viewing models as indivisible black boxes, TSCOMP performs a hierarchical deconstruction across three levels: the Pipeline, Component Dimensions, and Deconstructed Components.
- Comprehensive benchmark via hierarchical deconstruction: We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep MTSF methods. TSCOMP examines the MTSF workflow through a hierarchical design space, spanning from the overall modeling pipeline to fine-grained specific components. To rigorously assess these elements, we design a constrained orthogonal evaluation protocol that isolates the core mechanisms driving forecasting performance.
- Multi-view analysis and insights: We conduct a large-scale analysis that provides both overall and conditional insights. Beyond evaluating general component effectiveness, we extensively investigate performance variations across different backbones (including specific models and emerging LLMs/TSFMs), diverse data domains, and data characteristics. Furthermore, we explore the intricate interaction effects among deconstructed components, verifying community claims with rigorous experimental evidence.
- Open-sourced corpus and automated construction: We open-source the resulting fine-grained performance corpus and validate its utility for model design. This corpus facilitates automated construction of MTSF methods that are adaptively tailored to different forecasting scenarios, consistently achieving better results than state-of-the-art methods.
Overview of the proposed TSCOMP framework. TSCOMP deconstructs existing SOTA models into a modular component pool. Through large-scale experimental analysis, TSCOMP conducts bottom-up evaluation from component-level comparisons to dimension-level and pipeline-level importance ranking. The resulting performance corpus enables automated model construction via a pre-trained meta-predictor that delivers zero-shot, data-adaptive component selection.
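For intuition only, zero-shot component selection from a performance corpus can be sketched as a nearest-neighbour lookup in meta-feature space. The data, dimensions, and combo names below are made up, and the actual meta-predictor is a trained model (e.g. the MLP used in `meta/run.py`), not this stand-in:

```python
# Toy sketch of data-adaptive component selection from a performance corpus.
import numpy as np

# Toy corpus: meta-feature vectors for "seen" datasets and, for each, the
# best-performing component combination observed in the benchmark.
corpus_features = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])
corpus_best_combo = ["RevIN+patch+sparse_attn", "no_norm+point+full_attn", "RevIN+point+full_attn"]

def select_components(new_features):
    """Pick the combo of the nearest seen dataset in meta-feature space."""
    dists = np.linalg.norm(corpus_features - new_features, axis=1)
    return corpus_best_combo[int(np.argmin(dists))]

print(select_components(np.array([0.15, 0.85])))  # -> RevIN+patch+sparse_attn
```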
Deconstructed component taxonomy in TSCOMP. We organize forecasting model design into a hierarchical component space for controlled and interpretable benchmarking.
The design space is structured into three levels:
- Pipeline level: the standard MTSF workflow is modeled as Series Preprocessing -> Series Encoding -> Network Architecture -> Network Optimization.
- Dimension level: each pipeline stage contains multiple component dimensions, such as normalization, tokenization, and attention mechanisms.
- Component level: each dimension includes concrete implementations extracted from SOTA models, such as RevIN normalization, series patching, and sparse attention.
This deconstruction forms a structured and extensible design space that covers diverse modeling strategies.
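The three levels above can be pictured as nested mappings. The sketch below uses a tiny illustrative subset of dimensions and components (example names only, not the full TSCOMP pool):

```python
# Sketch of the three-level design space: pipeline stage -> component
# dimension -> concrete component implementations.
from itertools import product

design_space = {
    "Series Preprocessing": {"normalization": ["none", "RevIN"]},
    "Series Encoding":      {"tokenization": ["point", "patch"]},
    "Network Architecture": {"attention": ["full", "sparse"]},
}

# Flatten the dimensions and enumerate the Cartesian product of component choices.
dims = [(stage, dim, opts)
        for stage, stage_dims in design_space.items()
        for dim, opts in stage_dims.items()]
combos = list(product(*[opts for _, _, opts in dims]))
print(len(combos))  # 2 * 2 * 2 = 8 candidate configurations
```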
Constrained orthogonal pool generation process. Following the protocol in our paper, TSCOMP constructs valid model combinations under compatibility constraints to ensure fair and systematic large-scale evaluation.
Design Space Complexity.
The Cartesian product of component dimensions yields more than
Pairwise Coverage Criterion.
To balance rigor and efficiency, we adopt a constrained orthogonal design that targets pairwise coverage of valid component interactions. Compared with exhaustive
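For intuition, pairwise coverage can be approximated with a textbook greedy covering-array construction. The sketch below uses made-up dimensions and a made-up compatibility rule; it illustrates the idea, not the paper's actual protocol:

```python
# Toy greedy pairwise-coverage selection over a small design space, subject
# to a compatibility constraint (all names and the constraint are illustrative).
from itertools import combinations, product

dimensions = {
    "norm": ["none", "RevIN"],
    "token": ["point", "patch"],
    "attn": ["full", "sparse"],
}

def compatible(cfg):
    # Example constraint: sparse attention is only paired with patch tokens.
    return not (cfg["attn"] == "sparse" and cfg["token"] == "point")

keys = list(dimensions)
all_cfgs = [dict(zip(keys, vals)) for vals in product(*dimensions.values())]
valid = [c for c in all_cfgs if compatible(c)]

def pairs(cfg):
    """All (dimension, value) pairs a configuration covers."""
    return {((a, cfg[a]), (b, cfg[b])) for a, b in combinations(keys, 2)}

# Target: every dimension-value pair that occurs in some valid configuration.
target = set().union(*(pairs(c) for c in valid))

chosen, covered = [], set()
while covered != target:
    # Greedily add the valid config covering the most uncovered pairs.
    best = max(valid, key=lambda c: len(pairs(c) - covered))
    chosen.append(best)
    covered |= pairs(best)

print(f"{len(chosen)} configs cover all {len(target)} valid pairs "
      f"(vs {len(valid)} valid configs in total)")
```

Exhaustive enumeration runs every valid configuration; the greedy pool covers every valid pair with fewer runs, which is the efficiency/rigor trade-off the protocol targets.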
- data_provider/: dataset loading and preprocessing.
- models/: forecasting model implementations.
- layers/: reusable neural network building blocks.
- exp/: experiment pipelines for forecasting tasks.
- scripts/: generated batch scripts for benchmark execution.
- meta/: meta-feature extraction and meta-learning based model selection.
- figures/: framework and analysis figures used in the paper and README.
To reproduce the experimental results for TSCOMP, you need to first generate the execution scripts for the Constrained Orthogonal Pool and the Random Pool, and then run these generated scripts.
conda env create -f environment.yml
conda activate tscomp

Please run the following Python scripts to generate bash scripts for batch testing of short-term and long-term forecasting tasks:
- Short-term forecasting:
python notebooks/bash_generator_short_term_forecasting_sota_seed.py
- Long-term forecasting:
python notebooks/bash_generator_long_term_forecasting_sota_seed.py
After executing the above code, a series of .sh script files will be generated in scripts/ (or the output directory specified in the code).
Once generated, you can directly run the .sh scripts to build and evaluate the TSCOMP model combinations within the benchmark, for example:
bash scripts/<generated_script_name>.sh
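To execute every generated script in sequence, a simple driver loop might look like the following (a hypothetical helper, not part of the repository; the `logs/` directory name is an assumption):

```shell
# Run every generated benchmark script sequentially, logging each one's output.
mkdir -p logs
for script in scripts/*.sh; do
    [ -e "$script" ] || continue   # skip if no scripts have been generated yet
    name=$(basename "$script" .sh)
    echo "Running $name ..."
    bash "$script" > "logs/${name}.log" 2>&1
done
```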
If you find this work useful, please consider citing:
@inproceedings{liang2025beyond,
title={Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting},
author={Liang, Shuang and Hou, Chaochuan and Yao, Xu and Wang, Shiping and Huang, Hailiang and Han, Songqiao and Jiang, Minqi},
booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},
year={2025}
}


