This repository contains a Python script and a configuration file to train, evaluate, and visualize a multilayer perceptron (MLP) that predicts future values of hydrological time series at the Geneve, Halle de l'Ile station.
The model learns to forecast a fixed number of hours into the future based on:
- Historical station measurements
- Time-of-day, day-of-month, and month features
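As a small illustration of the time features listed above, the sketch below derives an hour, day-of-month, and month value from a timestamp. The dictionary keys mirror the `time_features` entries in config.toml; the `time_features` function name and the use of raw integers (rather than some other encoding) are assumptions, since the README does not say how the script represents them.

```python
from datetime import datetime, timedelta


def time_features(timestamp):
    """Return the hour, day of month, and month of a timestamp as integers."""
    return {
        "Hour": timestamp.hour,
        "Day": timestamp.day,
        "Month": timestamp.month,
    }


# Hypothetical hourly timestamps; the real ones would come from the data file.
start = datetime(2021, 3, 31, 22)
rows = [time_features(start + timedelta(hours=h)) for h in range(4)]

for row in rows:
    print(row)
```

The example deliberately crosses a month boundary, so the `Day` and `Month` values change between the second and third rows.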
| File | Description |
|---|---|
| train_model.py | Main training, evaluation, and plotting script |
| utils.py | Routine functions |
| config.toml | All model, training, and data configuration parameters |
| environment.yml | Conda environment packages |
| README.md | This documentation |
The script performs the following steps:
- Reads hyperparameters and paths from config.toml
- Loads time-series data from an HDF5 file
- Builds an MLP model in PyTorch
- Trains the model (or loads an existing one)
- Predicts values n_hours_prediction hours into the future
- Evaluates performance on validation/test data
- Saves:
  - The trained model
  - Loss curves
  - Prediction vs truth plots
  - Diagnostic statistics
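The `patience` and `min_delta` entries in config.toml suggest that the training step uses early stopping on the validation loss. The sketch below shows one common way such a rule works; the `EarlyStopping` class is hypothetical and is not taken from train_model.py, and the exact criterion used by the script may differ.

```python
class EarlyStopping:
    """Stop training when the validation loss stops improving.

    An epoch counts as an improvement only if the loss drops by more
    than `min_delta` below the best value seen so far.
    """

    def __init__(self, patience=50, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, valid_loss):
        """Record one epoch's validation loss; return True to stop."""
        if valid_loss < self.best_loss - self.min_delta:
            self.best_loss = valid_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy validation losses, with a short patience so the effect is visible.
stopper = EarlyStopping(patience=3, min_delta=0.001)
losses = [1.0, 0.5, 0.4995, 0.4999, 0.4998, 0.3]

stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this toy run, the three losses after 0.5 improve by less than `min_delta`, so training stops before the final value of 0.3 is ever seen. A larger `patience` would trade longer training for a lower risk of stopping too early.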
The model predicts a vector of future values (e.g., the next 24 hours) for multiple hydrological stations simultaneously.
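To make the shape of such an output vector concrete, the sketch below assembles a target by concatenating the next `n_hours` values of each station. The `build_target` helper, the station ordering, and the flat layout are assumptions for illustration only; the README does not specify how train_model.py arranges its targets.

```python
def build_target(series_by_station, t, n_hours):
    """Concatenate the next n_hours values of every station after index t."""
    target = []
    for station_id in sorted(series_by_station):
        values = series_by_station[station_id]
        future = values[t + 1 : t + 1 + n_hours]
        if len(future) < n_hours:
            raise ValueError("not enough future data after index t")
        target.extend(future)
    return target


# Toy hourly series for two of the station IDs listed in config.toml.
series = {
    "2009": [10.0, 11.0, 12.0, 13.0, 14.0],
    "2606": [20.0, 21.0, 22.0, 23.0, 24.0],
}

target = build_target(series, t=1, n_hours=3)
print(target)
```

Under this layout the target length is the number of stations times `n_hours`, so five stations and a 24-hour horizon would give a vector of 120 values.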
[model]
dim_hidden = 1024
n_layers = 8
activation = "LeakyReLU"

[training]
epochs = 1000
batch_size = 512
learning_rate = 0.001
patience = 50
min_delta = 0.001
loss = "HuberLoss"
weight_decay = 0

[run]
seed = 2234
gpu = 1
data_fraction = 1
train_split = 0.8
valid_split = 0.1
n_hours_prediction = 24
read_model = 0
station_features = ["2009", "2606", "2174", "2170", "2027"]
time_features = ["Hour", "Day", "Month"]

[input_output]
input_data = "/path/to/data.hdf5"
output_folder = "/path/to/results"
model_name = "best_model"
overwrite = 1
n_check_plots = 100

The easiest way to get the code to work is to install a conda environment via
conda env create -f environment.yml
This will create a conda environment named torch, which you can activate by running
conda activate torch
Open the config.toml file and set:
input_data = "/absolute/path/to/your/data.hdf5"
output_folder = "/absolute/path/where/results/should/be/saved"
If you created the conda environment, activate it. Then, from the project directory, run:
python train_model.py config.toml
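For readers who want to inspect the configuration outside the script, the sketch below parses a shortened copy of config.toml with Python's standard `tomllib` module (available from Python 3.11). It is not the parsing code of train_model.py, which may use a different library; the `parse_config` helper is hypothetical, and treating the test fraction as whatever remains after `train_split` and `valid_split` is an assumption.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Shortened excerpt of config.toml, embedded so the example is self-contained.
EXAMPLE = """
[training]
epochs = 1000
patience = 50

[run]
train_split = 0.8
valid_split = 0.1
n_hours_prediction = 24
station_features = ["2009", "2606", "2174", "2170", "2027"]
"""


def parse_config(text):
    """Parse TOML text into a nested dictionary keyed by section name."""
    return tomllib.loads(text)


config = parse_config(EXAMPLE)
run = config["run"]

# Assumption: the data not used for training or validation is the test set.
test_split = round(1.0 - run["train_split"] - run["valid_split"], 10)

print(config["training"]["epochs"], run["n_hours_prediction"], test_split)
```

To read the real file instead of the embedded excerpt, open it in binary mode and pass the handle to `tomllib.load`.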
Once the script finishes, it will have created the following nested structure inside output_folder:
output_folder/
+-- config_used.toml
+-- best_model.pth
+-- feature_scaler.joblib
+-- target_scaler.joblib
|
+-- plots/
| +-- Performance.pdf
| +-- Target_vs_prediction.pdf
|
+-- std.txt
+-- mean_relative_difference_over_time.txt
+-- loss_values.txt
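A quick way to confirm that a run completed is to check that the files in the structure above exist. The sketch below does this with `pathlib`; the `expected_outputs` and `missing_outputs` helpers are hypothetical, and the assumption that the model file is named after `model_name` with a `.pth` suffix is inferred from the config and the listing, not confirmed by the script.

```python
from pathlib import Path


def expected_outputs(model_name="best_model"):
    """Relative paths of the files listed in the output structure."""
    return [
        "config_used.toml",
        f"{model_name}.pth",  # assumption: file name follows model_name
        "feature_scaler.joblib",
        "target_scaler.joblib",
        "plots/Performance.pdf",
        "plots/Target_vs_prediction.pdf",
        "std.txt",
        "mean_relative_difference_over_time.txt",
        "loss_values.txt",
    ]


def missing_outputs(output_folder, model_name="best_model"):
    """Return the expected files that are not present in output_folder."""
    root = Path(output_folder)
    return [
        rel for rel in expected_outputs(model_name)
        if not (root / rel).is_file()
    ]
```

Calling `missing_outputs("/path/to/results")` after a run should return an empty list if every file was written.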