DeepDEM

Satellite stereo DEM refinement using deep learning

Note

This files in this repository will be restructured as the workflow for running experiments with different models has changed, so multiple layers of folders should be removed.

Useful concepts to familiarize oneself with first

Demo notebook
Ames Stereo Pipeline (ASP) and very-high-resolution (VHR) satellite stereo processing
torchgeo
ResDepth
PyTorch / PyTorch Lightning

Instructions for training

Installation

Run mamba env create --file environment.yml
To run DeepDEM command line tools, the original ResDepth UNet model needs to be available to the module (see demo notebook), so please be sure to git clone the ResDepth repository and provide path to corresponding DeepDEM codebase!
Follow the ASP installation instructions to create a separate ASP installation (latest stable/nightly release; conda environment or download directly and add to path manually) as conda environment is not compatible

Preparing the input datasets

Follow the steps outlined in dataset_processing/baker_csm/baker_csm_processing.ipynb to start

TODO: Integrate revised scripts to work with new pairs

Training the network

Run train.py with modifications:

python train.py Run with LightningCLI (in progress, encouraged):
Check available parameter and model and dataset settings
Run with commands like python train_cli.py fit --data.train_directory="/path/to/training/data" --data.val_directory="/path/to/validation/data" --model.normalization="meanstd" ...

Inference and evaluation

Run inference from a checkpoint:

python inference.py tb_logs/resdepth_torchgeo/version_101/checkpoints/last.ckpt meanstd inference_v101_last_training_on_SCG.tif (This may change to model output postprocessing / statistics and figures script in combination with the LightningCLI predict subcommand)

Evaluation (besides inspection with QGIS or preferred raster workflow)

Evaluation notebook that generates stats and figures coming soon

Overview of key files

Data directory (outside the repository) contains
- IMPORTANT NOTE: Used Symlinks to mosaic, usgs_all616_laz_filtered_dem_mask_nlcd_rock_exclude_glaciers, shashank_data, other data files that are too large to live in the repository!
- Experiment logs & model checkpoints (tb_logs/)
.gitignore excludes most types geospatial files from showing up in git but tracking some of these is important, for example reference DEM rasters used in stereo processing are included.
configs/ should store multiple experiment definitions that will be run & saved in logging folder
train_cli.py and train.py are how to launch training runs
- train_cli.py with default arguments python train_cli.py fit --print_config is not as helpful as looking at the example YAML file for Mt Baker training/validation split in configs/
resdepth_lightning_module.py wraps the model, optimizer, training and validation steps
tgdsm_lightning_data_module.py wraps the training and validation DataLoaders
torchgeo_dataset.py is not required moving forward, but is base dataset implementation to feed stacks of aligned rasters to PyTorch
- This implementation could be modified to remove hardcoded filenames for the different layers
scg_stereo.sh and stereo notebook- changing to generic stereo script or stick with notebook workflow?
TODO Dataset inspection notebook revised check_and_plot_tiles.ipynb / visualize_torchgeo_datasets.ipynb
environment.yml for future compatibility, keep versions as up-to-date as possible for dependencies like PyTorch, TorchGeo, Lightning, etc.

Suggestions for a fresh start

With all the updates to torchgeo and pytorch-lightning that appeared since the beginning of the project it is now easier to deal with handling stacks of multiple datasets and also dealing with many possible configuration parameters, etc.
For multiple datasets and combinations of layers it will be important to pull that sort of configuration out of code files and into the YAML configs. Modules will have to be rewritten to normalize the correct layer given its index in the tensor and desired method/parameters, etc.

How do I ...?

Start a training run

Add a new dataset

Find your lidar or other desired training target (assuming point cloud format, but workflow should be compatible with rasters). For Mount Baker 2015 we downloaded, filtered, and merged the USGS 1km x 1km point clouds (USGS Entwine is likely a simpler source), and with South Cascade 2019 the starting point was a single point cloud.
Set up stereo directory with L1B images and camera model files (e.g. DigitalGlobe .r100.xml) files.

Train a model

Current way:
Edit the code and set up the desired dataset, parameters to use, model, and so on, then:
python train.py path/to/previous/checkpoint.ckpt
New and better way in progress to pull configuration choices out of the code:
python train_cli.py fit -c path/to/config.yaml
Define the configuration file as seen in configs/example_config.yml and/or a combination

View logs in tensorboard

For experiments <= 145 generated with train.py: use tensorboard serve --logdir path/to/data/torchgeo_experiments/tb_logs Experiments with > 145 will likely be saved in lightning_logs

Run trained model on some new input

Find checkpoints here on Google Drive TODO paste new link
Note that these were trained for a specific set of inputs (in order) and so serve as a starting point. It might be possible to copy weights & add channels

Evaluate stereo processing and refinement products in QGIS?

Download the desired layers and set up appearance under Symbology
A helpful way to determine which areas were used for training and which were validation/test patches: use a shapefile geojson (similar to those provided by USGS for their lidar tiles), or create a binary mask from a mosaiced validation dataset raster (using gdal_merge.py or similar approach) to overlay with different colors.
For input/output DEMs and DEM differences, set resampling to Cubic for viewing purposes
Hillshade - choose between GDAL (gdaldem hillshade) and QGIS choice
For difference maps: in Symbology set Min and Max e.g. -1 m and +1 m, Color ramp red to blue
Orthoimages may need to be stretched / range limited depending on the goal.
Triangulation error with a small maximum value is helpful to indicate hotspots of larger intersection errors
Copy and Paste Styles as desired to get consistent appearance & scales across layers
It is useful to include other layers for context like the rasterized lidar timestamps and land cover classes

More:

Compile statistics about output
Change configuration of parameters/data/model/etc.
Log metrics, or add my own?
Add a new model, input, loss function, etc.
Deal with temporal offset with my training data?
...

Link to writeup and roadmap

... coming soon ...

Name		Name	Last commit message	Last commit date
parent directory ..
ResDepth		ResDepth
configs		configs
dataset_processing		dataset_processing
new_dir/DeepDEM/scripts		new_dir/DeepDEM/scripts
notebooks		notebooks
README.md		README.md
environment.yml		environment.yml
environment_torchgeo041_no_gpu.yml		environment_torchgeo041_no_gpu.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

DeepDEM

Note

Useful concepts to familiarize oneself with first

Instructions for training

Installation

Preparing the input datasets

Training the network

Inference and evaluation

Overview of key files

Suggestions for a fresh start

How do I ...?

Start a training run

Add a new dataset

Train a model

View logs in tensorboard

Run trained model on some new input

Evaluate stereo processing and refinement products in QGIS?

Link to writeup and roadmap

FilesExpand file tree

archive

Directory actions

More options

Directory actions

More options

Latest commit

History

archive

Folders and files

parent directory

README.md

DeepDEM

Note

Useful concepts to familiarize oneself with first

Instructions for training

Installation

Preparing the input datasets

Training the network

Inference and evaluation

Overview of key files

Suggestions for a fresh start

How do I ...?

Start a training run

Add a new dataset

Train a model

View logs in tensorboard

Run trained model on some new input

Evaluate stereo processing and refinement products in QGIS?

Link to writeup and roadmap