BeatNet is the state-of-the-art AI-based Python library for joint music beat, downbeat, tempo, and meter tracking. This repo includes the BeatNet neural structure along with the efficient two-stage cascade particle filtering algorithm that is proposed in the paper. It offers four distinct working modes, as follows:
- Streaming mode: This mode captures streaming audio directly from the microphone.
- Real-time mode: In this mode, audio files are read and processed in real-time, yielding immediate results.
- Online mode: Similar to Real-time mode, Online mode employs the same causal algorithm for track processing. However, rather than reading files at real-time speed, it reads them as fast as possible, while still producing outcomes identical to the real-time mode.
- Offline mode: Infers beats and downbeats in an offline fashion.
New in v1.2.0: The official training pipeline is now included. You can train the BeatNet CRNN from scratch on your own data or reproduce the paper's results using the provided training scripts, dataset handlers, and test suite.
To gain a better understanding of each mode, please refer to the Usage examples provided in this document.
This repository contains the user package and the source code of the Monte Carlo particle filtering inference model of the "BeatNet" music online joint beat/downbeat/tempo/meter tracking system. The arXiv version of the original ISMIR-2021 paper:
In addition to the proposed online inference, we added madmom's DBN beat/downbeat inference model for offline usage. Note that the offline model still utilizes BeatNet's neural network rather than Madmom's, which leads to better performance and significantly faster results.
Note: All pre-trained models are included in the models folder. The official training script is now part of this repository (see Training section below).
Raw audio waveform object or directory.
- When an audio file directory is used as the system input, the system automatically resamples the audio file to 22050 Hz. However, when an audio object is passed directly, make sure that its sample rate is 22050 Hz.
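For example, an audio object at the expected rate can be prepared with Librosa, which resamples at load time (a minimal sketch; the file name is hypothetical):

import librosa

# Load and resample to the 22050 Hz rate BeatNet expects
audio, sr = librosa.load("song.wav", sr=22050)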
An array with two columns, holding the beats and downbeats respectively, with the following shape: numpy_array(num_beats, 2); see the sketch after the parameter list below.
model: An integer in the range [1, 3] selecting which pre-trained CRNN model to utilize.
mode: A string determining the working mode, i.e. 'stream', 'realtime', 'online', or 'offline'.
inference_model: A string choosing the inference approach, i.e. 'PF' (Particle Filtering) for causal inference or 'DBN' (Dynamic Bayesian Network) for non-causal usage.
plot: A list of strings specifying the figures to plot. It can include 'activations', 'beat_particles', and 'downbeat_particles'. Note that, to speed up plotting, the previous plots get updated rather than drawing new plots per frame. However, to secure real-time results, it is recommended to plot nothing or to keep the number of plots as small as possible.
thread: A boolean deciding whether the inference is performed in the main thread or in a separate thread.
device: The device to run on, 'cuda' or 'cpu' (by default).
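Putting the parameters and the output format together (a minimal sketch; the file name is hypothetical, and the column semantics follow the output description above):

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='offline', inference_model='DBN', plot=[], thread=False, device='cpu')
output = estimator.process("song.wav")

beat_times = output[:, 0]      # first column: beat times in seconds
beat_positions = output[:, 1]  # second column: beat position in the bar (1 marks a downbeat)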
Approach #1: Installing binaries from the pypi website:
pip install BeatNet
Approach #2: Installing directly from the Git repository:
pip install git+https://github.com/mjhydri/BeatNet
- Note: Before installing BeatNet, make sure the Librosa and Madmom packages are installed. Also, PyAudio is a Python binding for PortAudio that handles audio streaming. If PyAudio is not installed on your machine, depending on your machine type, either install it through pip (macOS and Linux) or download an appropriate version for your machine (Windows) from here. Then, navigate to the file location through the command line and use the following command to install the wheel file locally:
pip install <PyAudio_file_name.whl>
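To verify that the binding works, you can print the PortAudio version PyAudio was built against (using PyAudio's standard get_portaudio_version_text function):

python -c "import pyaudio; print(pyaudio.get_portaudio_version_text())"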
from BeatNet.BeatNet import BeatNet
estimator = BeatNet(1, mode='stream', inference_model='PF', plot=[], thread=False)
Output = estimator.process()
*In streaming use cases, feed the system with input that is as loud as possible to get the maximum streaming performance, given that all models are trained on datasets containing mastered songs.
from BeatNet.BeatNet import BeatNet
estimator = BeatNet(1, mode='realtime', inference_model='PF', plot=['beat_particles'], thread=False)
Output = estimator.process("audio file directory")
from BeatNet.BeatNet import BeatNet
estimator = BeatNet(1, mode='online', inference_model='PF', plot=['activations'], thread=False)
Output = estimator.process("audio file directory")
from BeatNet.BeatNet import BeatNet
estimator = BeatNet(1, mode='offline', inference_model='DBN', plot=[], thread=False)
Output = estimator.process("audio file directory")
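If you want to keep the offline results, the returned (num_beats, 2) array can be written out directly, e.g. with NumPy (a minimal sketch; the output file name is hypothetical):

import numpy as np

# Save as "<time> <beat_number>" lines, one beat per line
np.savetxt("song.beats", Output, fmt="%.3f %d")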
The official training pipeline is included in this repository. Training involves three steps: data preparation, training, and evaluation.
Install the required packages:
pip install BeatNet
Or install from source:
pip install -e .
Note on madmom compatibility: madmom 0.16.1 has compatibility issues with Python >= 3.10 and NumPy >= 1.24. If you encounter ImportError: cannot import name 'MutableSequence' from 'collections' or AttributeError: module 'numpy' has no attribute 'float', you can either (a) use Python 3.9, or (b) apply the following fixes to your madmom installation:
- In `madmom/processors.py`, change `from collections import MutableSequence` to `from collections.abc import MutableSequence`.
- For numpy alias errors in compiled Cython extensions, add this to your Python's `sitecustomize.py`:

import numpy as np
if not hasattr(np, 'float'): np.float = np.float64
if not hasattr(np, 'int'): np.int = np.int_

Organize your raw dataset with the following structure:
raw_datasets/
ballroom/
audio/ChaChaCha/track001.wav
audio/Waltz/track002.wav
...
annotations/track001.beats
annotations/track002.beats
...
gtzan/
audio/blues/track001.wav
...
annotations/track001.beats
...
The .beats annotation format is one line per beat: <time_in_seconds> <beat_number>, where beat_number == 1 indicates a downbeat.
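For illustration only, a few lines of a hypothetical .beats file for a track in 3/4 time could look like this:

0.520 1
1.120 2
1.720 3
2.320 1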
Extract features and annotations:
python -m BeatNet.prepare_data --config src/BeatNet/configs/default.yaml \
--raw_dir /path/to/raw_datasets \
--dataset BALLROOM GTZAN BEATLES CMR ROCK_CORPUS
This creates pickled per-track feature files under ./data/ (configurable via --data_dir).
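The exact layout of these pickled files is an implementation detail of the pipeline; as a generic sanity check after preparation, you can count and open them (the file extension and paths here are assumptions):

import glob
import pickle

files = glob.glob("data/**/*.pkl", recursive=True)
print(len(files), "prepared tracks")
with open(files[0], "rb") as f:
    item = pickle.load(f)
print(type(item))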
python -m BeatNet.train --config src/BeatNet/configs/default.yaml
Key configuration options (set in configs/default.yaml or as CLI overrides):
python -m BeatNet.train --config src/BeatNet/configs/default.yaml \
learning_rate=0.001 batch_size=128 device=cuda
| Parameter | Default | Description |
|---|---|---|
| `batch_size` | 200 | Training batch size |
| `learning_rate` | 5e-4 | Adam optimizer learning rate |
| `seq_len` | 400 | Training sequence length in frames (8 seconds at 50 fps) |
| `max_epochs` | 10000 | Maximum training epochs |
| `patience` | 20 | Early stopping patience (epochs) |
| `class_weights` | [50, 400, 5] | Cross-entropy weights for [beat, downbeat, non-beat] |
| `checkpoint_every` | 10 | Validate and save every N epochs |
| `val_inference` | DBN | Inference method for validation (DBN or PF) |
| `device` | cpu | Device (cpu, cuda, cuda:0, mps) |
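The defaults above map onto configs/default.yaml; as a hedged sketch (the actual file may contain additional keys), the corresponding entries would look like:

batch_size: 200
learning_rate: 0.0005  # i.e. 5e-4
seq_len: 400
max_epochs: 10000
patience: 20
class_weights: [50, 400, 5]  # [beat, downbeat, non-beat]
checkpoint_every: 10
val_inference: DBN
device: cpu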
Training outputs are saved to ./output/ (configurable via output_dir):
- best_model_weights.pt -- Best model weights (by validation beat F-measure), directly loadable by the inference code
- checkpoint_epoch_N.pt -- Full checkpoints (model + optimizer state) for resuming
- tensorboard/ -- TensorBoard training logs
To resume training from a checkpoint:
python -m BeatNet.train --config src/BeatNet/configs/default.yaml \
--resume output/checkpoint_epoch_100.pt
Monitor training progress with TensorBoard:
tensorboard --logdir output/tensorboard
The saved best_model_weights.pt is directly compatible with the inference code:
from BeatNet.BeatNet import BeatNet
import torch
estimator = BeatNet(1, mode='online', inference_model='PF', plot=[])
estimator.model.load_state_dict(
torch.load('output/best_model_weights.pt', map_location='cpu'), strict=False
)
output = estimator.process("audio_file.wav")

The repository includes a test suite that validates the full training pipeline using synthetic toy data:
python test/test_training.py
The test suite covers:
- Ground truth construction and annotation parsing
- Dataset loading in training mode (random crops) and validation mode (full tracks)
- Model forward pass shape and statelessness
- Training loop convergence (loss decreases over epochs)
- Validation pipeline (model -> DBN decoding -> F-measure evaluation)
- Weight save/load compatibility with inference code
- End-to-end pipeline (data creation -> dataset building -> training -> validation -> weight export)
src/BeatNet/
BeatNet.py # Main inference handler
model.py # BDA neural network (with train_forward for training)
log_spect.py # Log-spectrogram feature extraction
common.py # Feature module base class
particle_filtering_cascade.py # Particle filter inference
train.py # Training script (entry point)
dataset.py # PyTorch Dataset and data splitting
prepare_data.py # Data preparation (feature extraction + annotations)
configs/
default.yaml # Default training configuration
models/
model_1_weights.pt # Pre-trained weights (GTZAN)
model_2_weights.pt # Pre-trained weights (Ballroom)
model_3_weights.pt # Pre-trained weights (Rock Corpus)
test/
test_training.py # Training pipeline test suite
1: In this tutorial, we explain the BeatNet mechanism.
In order to demonstrate the performance of the system for different beat/downbeat tracking difficulties, here are three video demo examples:
1: Song difficulty: Easy
2: Song difficulty: Medium
3: Song difficulty: Veteran
For the input feature extraction and the raw state space generation, the Librosa and Madmom libraries are utilized, respectively. Many thanks for their great work. This work has been partially supported by the National Science Foundation grants 1846184 and DGE-1922591.
@inproceedings{heydari2021beatnet,
title={BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking},
author={Heydari, Mojtaba and Cwitkowitz, Frank and Duan, Zhiyao},
booktitle={22nd International Society for Music Information Retrieval Conference (ISMIR)},
year={2021}
}
@inproceedings{heydari2021don,
title={Don’t look back: An online beat tracking method using RNN and enhanced particle filtering},
author={Heydari, Mojtaba and Duan, Zhiyao},
booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={236--240},
year={2021},
organization={IEEE}
}



