diive is currently under active development with frequent updates.
diive is a Python library for time series processing, in particular of ecosystem data. It was originally developed
by the ETH Grassland Sciences group for Swiss FluxNet.
Recent updates: CHANGELOG · Recent releases: Releases
diive/
├── core/                       # Foundational utilities shared across the library
│   ├── base/                   # FlagBase: base class for quality and outlier flags
│   ├── dfun/                   # DataFrame helpers: stats, regression, bin fitting
│   ├── funcs/                  # Miscellaneous utility functions
│   ├── io/                     # File detection, reading (CSV, EddyPro, TOA5), parquet I/O
│   ├── ml/                     # MlRegressorGapFillingBase: base class for RF/XGBoost gap-filling
│   ├── plotting/               # Heatmaps, time series, scatter, histograms, ridge lines, cumulatives
│   ├── times/                  # Timestamp sanitization, frequency detection, vectorization, resampling
│   └── utils/                  # Helper utilities
│
└── pkgs/                       # Domain-specific algorithms
    ├── analyses/               # Correlation, GridAggregator, GapFinder, decoupling, quantiles
    ├── binary/                 # Binary-encoded value extraction
    ├── corrections/            # Offset, radiation, RH, wind direction corrections
    ├── createvar/              # DaytimeNighttimeFlag, VPD, ET, TimeSince, potential radiation
    ├── echires/                # High-resolution eddy covariance: FluxDetectionLimit, WindRotation2D
    ├── fits/                   # BinFitterCP
    ├── flux/                   # USTAR thresholds, self-heating correction, flux uncertainty
    ├── fluxprocessingchain/    # Orchestrated Level-2 through Level-4 flux workflows
    ├── formats/                # FLUXNET and EddyPro file format conversions
    ├── gapfilling/             # XGBoostTS, RandomForestTS, long-term multi-year gap-filling, FluxMDS, linear interpolation
    ├── outlierdetection/       # Hampel, z-score, LOF, absolute limits, stepwise detection
    └── qaqc/                   # FlagQCF, EddyPro flags, StepwiseMeteoScreeningDb
| Package | Key classes / functions | Description |
|---|---|---|
| `diive.core.base` | `FlagBase` | Base class for building quality and outlier flags; provides flag encoding, filtering, and visualization |
| `diive.core.ml` | `FeatureEngineer`, `MlRegressorGapFillingBase` | Standalone feature engineering (8-stage pipeline) and base class for ML gap-filling (RF, XGBoost); separate feature engineering from model training for better reusability |
| `diive.core.io` | `DataFileReader`, `MultiDataFileReader`, `ReadFileType`, `FileSplitter` | Read single or multiple instrument files (CSV, EddyPro, TOA5); detect file structure; split large files; load/save Parquet |
| `diive.core.plotting` | `HeatmapDateTime`, `HeatmapXYZ`, `HexbinPlot`, `TimeSeries`, `ScatterXY`, `HistogramPlot`, `DielCycle`, `RidgeLinePlot`, `CumulativeYear` | Comprehensive visualization suite covering heatmaps, time series, scatter, histograms, diurnal cycles, ridge lines, hexbin plots, and cumulative plots |
| `diive.core.times` | `TimestampSanitizer`, `DetectFrequency`, `vectorize_timestamps()`, `continuous_timestamp_freq()` | Sanitize and validate timestamps, detect/infer data frequency, vectorize time attributes, resample diel cycles |
| `diive.core.dfun` | `sstats()`, `fit_to_bins_linreg()`, `fit_to_bins_polyreg()` | DataFrame statistics, linear/polynomial bin fitting, regression utilities |
| `diive.pkgs.gapfilling` | `XGBoostTS`, `RandomForestTS`, `QuickFillRFTS`, `LongTermGapFillingRandomForestTS`, `LongTermGapFillingXGBoostTS`, `FluxMDS` | Fill time series gaps with XGBoost, Random Forest (standard and long-term multi-year), MDS, or linear interpolation |
| `diive.pkgs.outlierdetection` | `HampelDaytimeNighttime`, `zScore`, `zScoreDaytimeNighttime`, `LocalOutlierFactorAllData`, `AbsoluteLimits`, `AbsoluteLimitsDaytimeNighttime` | Detect and flag outliers using Hampel filter, z-score, LOF, absolute limits, local SD, manual removal, or stepwise combinations |
| `diive.pkgs.flux` | `FluxProcessingChain` | Post-process eddy covariance fluxes: Level-2 quality flags, storage correction, USTAR filtering, gap-filling (RF/XGBoost/MDS), self-heating correction |
| `diive.pkgs.fluxprocessingchain` | `FluxProcessingChain` | Orchestrate a complete Level-2 → Level-4 flux processing workflow in a single pipeline |
| `diive.pkgs.analyses` | `GapFinder`, `GridAggregator`, `daily_correlation()`, `SeasonalTrendDecomposition` | Locate data gaps, aggregate variables into 2-D grids, compute daily correlations, decoupling analysis, quantiles, seasonal-trend decomposition |
| `diive.pkgs.corrections` | `OffsetCorrection`, `WindDirectionOffset`, `SetToThreshold`, `SetToMissing` | Apply measurement offsets, correct wind directions, clamp values to thresholds, set periods to missing |
| `diive.pkgs.createvar` | `DaytimeNighttimeFlag`, `TimeSince`, `calc_vpd_from_ta_rh()`, `et_from_le()`, `potrad()` | Derive new variables: daytime/nighttime flags, VPD, ET, time-since-event, potential radiation |
| `diive.pkgs.qaqc` | `FlagQCF`, `StepwiseMeteoScreeningDb` | Manage FLUXNET quality control flags; apply stepwise meteorological screening |
| `diive.pkgs.echires` | `FluxDetectionLimit`, `WindRotation2D`, `MaxCovariance` | Process 20 Hz eddy covariance data: detection limits, 2-D wind rotation, maximum covariance lag |
| `diive.pkgs.formats` | `FormatEddyProFluxnetFileForUpload`, `FormatMeteoForEddyProFluxProcessing` | Convert EddyPro output to FLUXNET submission format; prepare meteorological data for EddyPro |
| `diive.pkgs.fits` | `BinFitterCP` | Fit data to bins using cumulative-probability approach |
- For many examples see notebooks here: Notebook overview
- More notebooks are added constantly.
- Daily correlation: calculate daily correlation between two time series · func: `daily_correlation()` (notebook example)
- Decoupling: investigate binned aggregates (median) of a variable z in binned classes of x and y (notebook example)
- Data gaps identification · class: `GapFinder` (notebook example)
- Grid aggregator: calculate z-aggregates in bins (classes) of x and y · class: `GridAggregator` (notebook example)
- Histogram calculation: calculate histogram from Series (notebook example)
- Optimum range: find x range for optimum y
- Percentiles: calculate percentiles 0-100 for series (notebook example)
- Seasonal-Trend Decomposition: separate time series into trend, seasonal, and residual components using STL (Seasonal-Trend Loess), classical, or harmonic methods · class: `SeasonalTrendDecomposition` (notebook example)
- Offset correction for measurement: correct measurement by offset in comparison to replicate · class: `OffsetCorrection` (notebook example)
- Offset correction radiation: correct nighttime offset of radiation data and set nighttime to zero
- Offset correction relative humidity: correct RH values > 100%
- Offset correction wind direction: correct wind directions by offset, calculated based on reference time period · class: `WindDirectionOffset` (notebook example)
- Set to threshold: set values above or below a threshold value to threshold value · class: `SetToThreshold`
- Set exact values to missing: set exact values to missing records · class: `SetToMissing` (notebook example)
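
The idea behind two of these corrections can be sketched in a few lines of plain Python. This is not the diive API: the function names and the `kind` parameter below are hypothetical, and the classes `WindDirectionOffset` and `SetToThreshold` additionally determine the offset from a reference period and work on time series objects.

```python
def apply_wind_direction_offset(wd_degrees, offset):
    """Shift wind directions by a fixed offset and wrap the result into [0, 360)."""
    return [(wd + offset) % 360.0 for wd in wd_degrees]


def set_to_threshold(values, threshold, kind="max"):
    """Replace values beyond the threshold with the threshold itself."""
    if kind == "max":  # cap values above the threshold
        return [min(v, threshold) for v in values]
    if kind == "min":  # raise values below the threshold
        return [max(v, threshold) for v in values]
    raise ValueError("kind must be 'max' or 'min'")


# 350° shifted by +15° wraps around north to 5°
wd_corrected = apply_wind_direction_offset([350.0, 10.0, 180.0], offset=15.0)
print(wd_corrected)  # [5.0, 25.0, 195.0]

# Relative humidity above 100 % is set to 100 %
rh_capped = set_to_threshold([98.5, 100.7, 101.2], threshold=100.0, kind="max")
print(rh_capped)  # [98.5, 100.0, 100.0]
```

The modulo operation is what keeps corrected wind directions on the compass circle; a negative offset works the same way.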
Functions to create various variables.
- Time since: calculate time since last occurrence, e.g. since last precipitation · class: `TimeSince` (notebook example)
- Daytime/nighttime flag: calculate daytime flag, nighttime flag and potential radiation from latitude and longitude · class: `DaytimeNighttimeFlag` (notebook example)
- Vapor pressure deficit: calculate VPD from air temperature and RH · func: `calc_vpd_from_ta_rh()` (notebook example)
- Calculate ET from LE: calculate evapotranspiration from latent heat flux · func: `et_from_le()` (notebook example)
- Calculate air temperature from sonic anemometer temperature · func: `air_temp_from_sonic_temp()` (notebook example)
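
As a rough illustration of two of these derived variables, the sketch below computes VPD and ET in plain Python. It is not the code behind `calc_vpd_from_ta_rh()` or `et_from_le()`. It assumes the Tetens-type saturation vapor pressure equation as used in FAO-56 and a constant latent heat of vaporization of 2.45 MJ kg-1; diive may use different formulas, constants and units.

```python
import math


def saturation_vapor_pressure_kpa(ta_celsius):
    """Saturation vapor pressure (kPa), Tetens-type equation as used in FAO-56 (assumed here)."""
    return 0.6108 * math.exp(17.27 * ta_celsius / (ta_celsius + 237.3))


def vpd_kpa(ta_celsius, rh_percent):
    """Vapor pressure deficit (kPa): saturation minus actual vapor pressure."""
    return saturation_vapor_pressure_kpa(ta_celsius) * (1.0 - rh_percent / 100.0)


def et_mm_per_hour(le_w_m2, latent_heat_j_per_kg=2.45e6):
    """Evapotranspiration (mm h-1) from latent heat flux (W m-2).

    W m-2 equals J s-1 m-2; dividing by J kg-1 gives kg m-2 s-1,
    and 1 kg of water per m2 corresponds to a 1 mm water column.
    """
    return le_w_m2 / latent_heat_j_per_kg * 3600.0


print(round(vpd_kpa(20.0, 50.0), 3))   # about 1.17 kPa at 20 °C and 50 % RH
print(round(et_mm_per_hour(245.0), 3))  # 0.36 mm per hour
```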
- Flux detection limit: calculate flux detection limit from high-resolution data (20 Hz) · class: `FluxDetectionLimit`
- Maximum covariance: find maximum covariance between turbulent wind and scalar · class: `MaxCovariance`
- Turbulence: wind rotation to calculate turbulent departures of wind components and scalar (e.g. CO2) · class: `WindRotation2D`
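
The maximum covariance search can be pictured as follows: shift the scalar against the vertical wind by a range of lags and keep the lag with the largest absolute covariance. The sketch below is plain Python with synthetic data and hypothetical function names; it is not the `MaxCovariance` class and skips steps such as wind rotation and detrending.

```python
import random
from statistics import fmean


def covariance_at_lag(w, c, lag):
    """Covariance between w[t] and c[t + lag], using only the overlapping records."""
    if lag >= 0:
        w_part, c_part = w[:len(w) - lag], c[lag:]
    else:
        w_part, c_part = w[-lag:], c[:len(c) + lag]
    w_mean, c_mean = fmean(w_part), fmean(c_part)
    return fmean([(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w_part, c_part)])


def max_covariance_lag(w, c, max_lag):
    """Return the lag (in records) with the largest absolute covariance."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: abs(covariance_at_lag(w, c, lag)))


# Synthetic example: the scalar is the wind signal delayed by 3 records
rng = random.Random(42)
w = [rng.gauss(0.0, 1.0) for _ in range(500)]
c = [0.0, 0.0, 0.0] + w[:-3]

best_lag = max_covariance_lag(w, c, max_lag=10)
print(best_lag)  # 3
```

With 20 Hz data, a lag of 3 records would correspond to 0.15 s.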
Input/output functions.
- Detect files: detect expected and unexpected (irregular) files in a list of files · class: `FileDetector`
- Split files: split multiple files into smaller parts and export them as (compressed) CSV files · class: `FileSplitter`
- Read single data files: read file using parameters · class: `DataFileReader` (notebook example)
- Read single data files: read file using pre-defined filetypes · class: `ReadFileType` (notebook example)
- Read multiple data files: read files using pre-defined filetype · class: `MultiDataFileReader` (notebook example)
- Bin fitter · class: `BinFitterCP` (notebook example)
Functions specifically for eddy covariance flux data.
- Flux processing chain · class: `FluxProcessingChain` (notebook example)
  - The notebook example shows the application of:
    - Post-processing of eddy covariance flux data
    - Level-2 quality flags
    - Level-3.1 storage correction
    - Level-3.2 outlier removal
    - Level-3.3: USTAR filtering using constant thresholds
    - Level-4.1: gap-filling using long-term random forest, XGBoost, and/or MDS
  - For info about the Swiss FluxNet flux levels, see here.
- **Quick flux processing chain** (notebook example)
- Flux detection limit: calculate flux detection limit from high-resolution eddy covariance data · class: `FluxDetectionLimit` (notebook example)
- Self-heating correction for open-path IRGA NEE fluxes:
  - create scaling factors table and apply to correct open-path NEE fluxes during a time period of parallel measurements (notebook example)
  - apply previously created scaling factors table to long-term open-path NEE flux data, outside the time period of parallel measurements (notebook example)
- USTAR threshold scenarios: display data availability under different USTAR threshold scenarios
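
To make the USTAR threshold scenarios more concrete, the following sketch computes the fraction of records that remain available under several constant thresholds. It is plain Python, not diive code; the function name, the toy values and the choice to keep records where USTAR equals the threshold are assumptions for illustration.

```python
def availability_under_thresholds(ustar, thresholds):
    """Fraction of records with USTAR at or above each threshold (missing USTAR counts as rejected)."""
    n = len(ustar)
    return {
        thr: sum(1 for u in ustar if u is not None and u >= thr) / n
        for thr in thresholds
    }


# Toy USTAR values in m s-1
ustar = [0.05, 0.12, 0.30, 0.08, 0.22, 0.41, 0.15, 0.02]
availability = availability_under_thresholds(ustar, thresholds=[0.05, 0.10, 0.20])
for thr, fraction in availability.items():
    print(f"USTAR >= {thr:.2f}: {fraction:.1%} of records retained")
```

A higher threshold rejects more low-turbulence records, so data availability drops from 87.5% to 37.5% across the three scenarios in this toy example.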
Format data to specific formats.
- Format: convert EddyPro fluxnet output files for upload to FLUXNET database · class: `FormatEddyProFluxnetFileForUpload` (notebook example)
- Parquet files: load and save parquet files · funcs: `load_parquet()`, `save_parquet()` (notebook example)
Fill gaps in time series with various methods.
Feature Engineering (v0.91.0) · class: `FeatureEngineer`

- Standalone 8-stage feature engineering pipeline (composable, reusable across models)
  - Stage 1: Lagged features from past and future values
  - Stage 2: Rolling statistics (mean, std, median, min, max, quartiles)
  - Stage 3: Temporal differencing (1st and 2nd order momentum)
  - Stage 4: Exponential Moving Average (EMA) with recent-value emphasis
  - Stage 5: Polynomial expansion (squared, cubed terms)
  - Stage 6: STL decomposition (trend, seasonal, residual components)
  - Stage 7: Timestamp vectorization (season, month, hour, etc.)
  - Stage 8: Continuous record numbering for trend detection
- Pre-engineer features once, reuse across multiple models (RF + XGB simultaneously)
- Independent testing and debugging of feature engineering
- XGBoostTS · class: `XGBoostTS` (notebook example (minimal), notebook example (more extensive))
  - Use `FeatureEngineer` to create features, pass pre-engineered data to XGBoostTS
- RandomForestTS · class: `RandomForestTS` (notebook example)
  - Use `FeatureEngineer` to create features, pass pre-engineered data to RandomForestTS
- Long-term gap-filling using RandomForestTS · class: `LongTermGapFillingRandomForestTS` (notebook example)
- Long-term gap-filling using XGBoostTS · class: `LongTermGapFillingXGBoostTS` (for multi-year data with USTAR scenario support)
- Linear interpolation · func: `linear_interpolation()` (notebook example)
- Quick random forest gap-filling · class: `QuickFillRFTS` (notebook example)
- MDS gap-filling of ecosystem fluxes · class: `FluxMDS` (notebook example), approach by Reichstein et al., 2005
- FluxProcessingChain examples for CO2 half-hourly flux (NEE) gap-filling:
  - Both Random Forest and XGBoost examples are fully activated and comprehensively documented
  - Optimized feature engineering for diurnal photosynthetic patterns (lag, rolling, EMA, STL decomposition)
  - Feature reduction enabled by default (SHAP-based selection reduces ~45-50 features to ~10-20)
  - Hyperparameters tuned for ecosystem flux data with detailed tuning guidance
  - Model comparison code to select best algorithm for your site
  - See `diive/pkgs/fluxprocessingchain/fluxprocessingchain.py` for detailed examples (~100 lines each)
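
The first three feature engineering stages listed above (lags, rolling statistics, differencing) can be sketched without any library. The code below is not the `FeatureEngineer` class and does not reproduce its parameters or output column names; the helper functions and feature names such as `TA_lag1` are hypothetical and only show what kind of columns such a pipeline derives from one driver variable.

```python
from statistics import fmean


def lagged(values, lag):
    """Shift a series by `lag` records: positive looks into the past, negative into the future."""
    n = len(values)
    return [values[i - lag] if 0 <= i - lag < n else None for i in range(n)]


def rolling_mean(values, window):
    """Trailing mean over the current and the previous window-1 records; None until the window is full."""
    return [
        fmean(values[i - window + 1:i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def first_difference(values):
    """Change from the previous record; the first record has no predecessor."""
    return [None] + [b - a for a, b in zip(values, values[1:])]


ta = [10.0, 12.0, 15.0, 14.0, 11.0]  # toy air temperature series
features = {
    "TA": ta,
    "TA_lag1": lagged(ta, 1),             # value one record earlier
    "TA_lead1": lagged(ta, -1),           # value one record later
    "TA_rollmean3": rolling_mean(ta, 3),  # trailing 3-record mean
    "TA_diff1": first_difference(ta),     # first-order difference
}
for name, column in features.items():
    print(name, column)
```

Records at the edges of the series have no lagged or rolling value and are left as `None` here; a gap-filling model needs a strategy for such incomplete feature rows.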
- Step-wise outlier detection: combine multiple outlier flags into a single overall flag
Create single outlier flags where 0=OK and 2=outlier.
- Absolute limits: define absolute limits · class: `AbsoluteLimits` (notebook example)
- Absolute limits daytime/nighttime: define absolute limits separately for daytime and nighttime data · class: `AbsoluteLimitsDaytimeNighttime` (notebook example)
- Hampel filter daytime/nighttime: apply the Hampel filter separately for daytime and nighttime data · class: `HampelDaytimeNighttime` (notebook example)
- Local standard deviation: identify outliers based on the local standard deviation from a running median (notebook example)
- Local outlier factor: identify outliers based on local outlier factor, across all data · class: `LocalOutlierFactorAllData` (notebook example)
- Local outlier factor daytime/nighttime: identify outliers based on local outlier factor, separately for daytime and nighttime (notebook example)
- Manual removal: remove time periods (from-to) or single records from time series (notebook example)
- Missing values: create a flag that indicates available and missing data in a time series · class: `MissingValues` (notebook example)
- Trimming: remove values below threshold and remove an equal number of records from the high end of the data (notebook example)
- z-score: identify outliers based on the z-score across all time series data · class: `zScore` (notebook example)
- z-score increments daytime/nighttime: identify outliers based on the z-score of double increments (notebook example)
- z-score daytime/nighttime: identify outliers based on the z-score, separately for daytime and nighttime · class: `zScoreDaytimeNighttime` (notebook example)
- z-score rolling: identify outliers based on the rolling z-score (notebook example)
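
The flag convention (0=OK, 2=outlier) can be illustrated with a simple z-score test over a whole series. This is a plain-Python sketch, not the `zScore` class; the function name and the threshold of 2.5 are chosen for the toy data only.

```python
from statistics import fmean, pstdev


def zscore_flag(values, threshold):
    """Return a flag per record: 0 = OK, 2 = outlier (absolute z-score above threshold)."""
    mean, sd = fmean(values), pstdev(values)
    if sd == 0:  # constant series: nothing can be flagged
        return [0] * len(values)
    return [2 if abs(v - mean) / sd > threshold else 0 for v in values]


series = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 50.0]
flags = zscore_flag(series, threshold=2.5)
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]

# Keep only records that passed the test
cleaned = [v for v, flag in zip(series, flags) if flag == 0]
```

A single extreme value inflates both mean and standard deviation, which limits how large its own z-score can become in a short series. This is one reason why median-based methods such as the Hampel filter, or rolling variants, are often more robust.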
- Cumulatives across all years for multiple variables · class: `Cumulative` (notebook example)
- Cumulatives per year · class: `CumulativeYear` (notebook example)
- Diel cycle per month · class: `DielCycle` (notebook example)
- Heatmap date/time: showing values (z) of time series as date (y) vs time (x) · class: `HeatmapDateTime` (notebook example)
- Heatmap year/month: plot monthly ranks across years · class: `HeatmapYearMonth` (notebook example)
- Heatmap XYZ: show z-values in bins of x and y; pairs naturally with `GridAggregator` · class: `HeatmapXYZ` (notebook example)
- Hexbin plot: aggregate flux values into 2D hexagonal bins of driver variables; supports percentile normalization and configurable aggregation functions · class: `HexbinPlot` (notebook example)
- Histogram: includes options to show z-score limits and to highlight the peak distribution bin · class: `HistogramPlot` (notebook example)
- Long-term anomalies: calculate and plot long-term anomaly for a variable, per year, compared to a reference period · class: `LongtermAnomaliesYear` (notebook example)
- Ridgeline plot: looks a bit like a landscape · class: `RidgeLinePlot` (notebook example)
- Time series plot: simple (interactive) time series plot · class: `TimeSeries` (notebook example)
- ScatterXY plot · class: `ScatterXY` (notebook example)
- Various classes to generate heatmaps, bar plots, time series plots and scatter plots, among others
- Stepwise MeteoScreening from database · class: `StepwiseMeteoScreeningDb` (notebook example)
- Diel cycle: calculate diel cycle per month · func: `diel_cycle()` (notebook example)
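
A diel cycle per month is the average value for each hour of the day, calculated separately for each month. The sketch below shows this grouping in plain Python; it is not the `diel_cycle()` function, and the helper name and toy data are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def diel_cycle_per_month(timestamps, values):
    """Return {(month, hour): mean value}, ignoring missing records."""
    groups = defaultdict(list)
    for ts, v in zip(timestamps, values):
        if v is not None:
            groups[(ts.month, ts.hour)].append(v)
    return {key: fmean(vals) for key, vals in groups.items()}


# Hourly toy data for January and February 2024: value = hour + 100 * month
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(60 * 24)]
values = [float(ts.hour + 100 * ts.month) for ts in timestamps]

diel = diel_cycle_per_month(timestamps, values)
print(diel[(1, 13)])  # mean at 13:00 in January: 113.0
print(diel[(2, 6)])   # mean at 06:00 in February: 206.0
```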
- Time series stats · func: `sstats()` (notebook example)
- Continuous timestamp: create continuous timestamp based on number of records in the file and the file duration · func: `continuous_timestamp_freq()`
- Time resolution: detect time resolution from data · class: `DetectFrequency` (notebook example)
- Timestamps: create and insert additional timestamps in various formats · class: `TimestampSanitizer`
- Vectorize timestamps: add date attributes as columns to dataframe, including sine/cosine variants for cyclical variables (e.g., day of year) · func: `vectorize_timestamps()` (notebook example)
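
The reason for sine/cosine variants of cyclical variables is that the raw day of year jumps from 365 to 1 at the turn of the year, although both days are neighbours in time. A minimal sketch in plain Python (not the `vectorize_timestamps()` function; the helper name and the year length of 365.25 days are assumptions):

```python
import math
from datetime import datetime


def cyclical_doy(ts, days_in_year=365.25):
    """Encode the day of year as a point (sin, cos) on the unit circle."""
    doy = ts.timetuple().tm_yday
    angle = 2.0 * math.pi * doy / days_in_year
    return math.sin(angle), math.cos(angle)


def distance(p, q):
    """Euclidean distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


dec31 = cyclical_doy(datetime(2023, 12, 31))  # day of year 365
jan01 = cyclical_doy(datetime(2024, 1, 1))    # day of year 1
jul01 = cyclical_doy(datetime(2024, 7, 1))    # day of year 183

print(round(distance(dec31, jan01), 3))  # small: the two days are neighbours
print(round(distance(jan01, jul01), 3))  # close to 2: opposite sides of the year
```

In the encoded space, 31 December and 1 January end up next to each other, which helps machine learning models treat the seasonal cycle as continuous.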
diive is currently developed using Python 3.11.
pip install diive
poetry add diive
Install directly from the .tar.gz file of the desired version:
pip install https://github.com/holukas/diive/archive/refs/tags/v0.76.2.tar.gz
One way to install and use diive with a specific Python version on a local machine:
- Install miniconda
- Start the `miniconda` prompt
- Create an environment named `diive-env` that contains Python 3.11: `conda create --name diive-env python=3.11`
- Activate the new environment: `conda activate diive-env`
- Install `diive` using pip: `pip install diive`
- To start JupyterLab, type `jupyter lab` in the prompt
