# Satellite stereo DEM refinement using deep learning
The files in this repository will be restructured: the workflow for running experiments with different models has changed, so several layers of nested folders will be removed.
- Demo notebook
- Ames Stereo Pipeline (ASP) and very-high-resolution (VHR) satellite stereo processing
- torchgeo
- ResDepth
- PyTorch / PyTorch Lightning
- Run `mamba env create --file environment.yml`
- To run the DeepDEM command-line tools, the original ResDepth UNet model must be available to the module (see the demo notebook), so be sure to `git clone` the ResDepth repository and provide its path to the corresponding DeepDEM codebase!
- Follow the ASP installation instructions to create a separate ASP installation (latest stable/nightly release, either as its own conda environment or downloaded directly and added to your PATH manually), since ASP is not compatible with the DeepDEM conda environment.
Follow the steps outlined in `dataset_processing/baker_csm/baker_csm_processing.ipynb` to get started.
TODO: Integrate revised scripts to work with new pairs
Run `train.py` after making modifications in the code:

`python train.py`

Run with LightningCLI (in progress, encouraged):

- Check the available parameters and the model and dataset settings
- Run with commands like:

`python train_cli.py fit --data.train_directory="/path/to/training/data" --data.val_directory="/path/to/validation/data" --model.normalization="meanstd" ...`
Run inference from a checkpoint:

`python inference.py tb_logs/resdepth_torchgeo/version_101/checkpoints/last.ckpt meanstd inference_v101_last_training_on_SCG.tif`

(This may change to a model-output postprocessing / statistics-and-figures script in combination with the LightningCLI `predict` subcommand.)
Evaluation (besides inspection with QGIS or your preferred raster workflow)
- Evaluation notebook that generates stats and figures coming soon
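Until then, basic accuracy statistics for a DEM difference map can be computed directly. A minimal standard-library sketch (the function name and choice of statistics are illustrative, not part of the repository):

```python
import math
import statistics

def dem_difference_stats(diff):
    """Summarize a list of DEM-minus-reference differences (meters).

    Returns median, NMAD (a robust stand-in for std dev, defined as
    1.4826 * median(|d - median(d)|)), mean, and RMSE.
    """
    diff = [d for d in diff if math.isfinite(d)]  # drop nodata/NaN cells
    med = statistics.median(diff)
    nmad = 1.4826 * statistics.median(abs(d - med) for d in diff)
    mean = statistics.fmean(diff)
    rmse = math.sqrt(statistics.fmean(d * d for d in diff))
    return {"median": med, "nmad": nmad, "mean": mean, "rmse": rmse}
```

Median and NMAD are preferred over mean/RMSE when outliers (e.g. blunders over glaciers or vegetation) are present.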
- Data directory (outside the repository) contains:
  - IMPORTANT NOTE: symlinks are used for `mosaic`, `usgs_all616_laz_filtered_dem_mask_nlcd_rock_exclude_glaciers`, `shashank_data`, and other data files that are too large to live in the repository!
  - Experiment logs & model checkpoints (`tb_logs/`)
- `.gitignore` excludes most types of geospatial files from showing up in git, but tracking some of them is important; for example, reference DEM rasters used in stereo processing are included.
- `configs/` should store multiple experiment definitions that will be run & saved in the logging folder.
- `train_cli.py` and `train.py` are how to launch training runs.
  - `train_cli.py` with default arguments: `python train_cli.py fit --print_config` is not as helpful as looking at the example YAML file for the Mt Baker training/validation split in `configs/`.
- `resdepth_lightning_module.py` wraps the model, optimizer, and training and validation steps.
- `tgdsm_lightning_data_module.py` wraps the training and validation DataLoaders.
- `torchgeo_dataset.py` is not required moving forward, but is the base dataset implementation that feeds stacks of aligned rasters to PyTorch.
  - This implementation could be modified to remove hardcoded filenames for the different layers.
- `scg_stereo.sh` and the stereo notebook: change to a generic stereo script, or stick with the notebook workflow?
- TODO: revise the dataset inspection notebooks (`check_and_plot_tiles.ipynb` / `visualize_torchgeo_datasets.ipynb`)
- `environment.yml`: for future compatibility, keep versions as up-to-date as possible for dependencies like PyTorch, TorchGeo, Lightning, etc.
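As noted above, the dataset currently hardcodes the filenames of the different layers. A hypothetical sketch of driving the layer stack from configuration instead (layer names and ordering here are made up for illustration), producing the `(channels, height, width)` layout PyTorch expects:

```python
import numpy as np

# Hypothetical config entry: which layers to stack, and in what order.
# In the repository these names are currently hardcoded in torchgeo_dataset.py.
LAYER_ORDER = ["asp_dem", "ortho_left", "ortho_right", "triangulation_error"]

def stack_layers(layers, order=LAYER_ORDER):
    """Stack aligned 2D rasters into a (C, H, W) float32 array.

    `layers` maps layer name -> 2D array; all layers must share a grid.
    """
    shapes = {layers[name].shape for name in order}
    if len(shapes) != 1:
        raise ValueError(f"Layers are not aligned: {shapes}")
    return np.stack([layers[name] for name in order]).astype(np.float32)
```

With this shape, adding or reordering input layers becomes a config change rather than a code edit.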
- With the updates to torchgeo and pytorch-lightning released since the beginning of the project, it is now easier to handle stacks of multiple datasets and the many possible configuration parameters.
- For multiple datasets and combinations of layers, it will be important to pull that sort of configuration out of the code files and into the YAML configs. Modules will have to be rewritten to normalize the correct layer given its index in the tensor and the desired method/parameters.
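A sketch of what that per-layer normalization might look like once it lives in configuration (the config schema and method names here are assumptions, not the repository's actual format):

```python
import numpy as np

# Hypothetical per-layer normalization spec, as it might appear after being
# moved from code into a YAML config (keys are channel indices in the stack).
NORM_CONFIG = {
    0: {"method": "meanstd"},                            # DEM: standardize
    1: {"method": "minmax", "min": 0.0, "max": 2047.0},  # 11-bit ortho image
}

def normalize_layer(x, spec):
    if spec["method"] == "meanstd":
        std = x.std()
        return (x - x.mean()) / (std if std > 0 else 1.0)
    if spec["method"] == "minmax":
        return (x - spec["min"]) / (spec["max"] - spec["min"])
    raise ValueError(f"Unknown normalization method: {spec['method']}")

def normalize_stack(stack, config=NORM_CONFIG):
    """Apply the configured normalization to each channel of a (C, H, W) stack."""
    return np.stack([normalize_layer(stack[i], config[i])
                     for i in range(stack.shape[0])])
```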
- Find your lidar or other desired training target (a point cloud format is assumed, but the workflow should be compatible with rasters). For Mount Baker 2015 we downloaded, filtered, and merged the USGS 1 km x 1 km point clouds (USGS Entwine is likely a simpler source); for South Cascade 2019 the starting point was a single point cloud.
- Set up a stereo directory with L1B images and camera model files (e.g. DigitalGlobe `.r100.xml` files).
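The point-cloud-to-raster step above can be illustrated with a minimal gridding sketch (pure Python and illustrative only; in practice tools like ASP's `point2dem` or PDAL handle this with proper filtering):

```python
from collections import defaultdict
from statistics import median

def grid_points(points, cell_size):
    """Bin (x, y, z) lidar points into square cells; per-cell median elevation.

    Returns {(col, row): z} as a sparse DEM; empty cells are simply absent.
    """
    bins = defaultdict(list)
    for x, y, z in points:
        bins[(int(x // cell_size), int(y // cell_size))].append(z)
    return {cell: median(zs) for cell, zs in bins.items()}
```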
- Current way:
  - Edit the code to set up the desired dataset, parameters, model, and so on, then:

    `python train.py path/to/previous/checkpoint.ckpt`

- New and better way, in progress, which pulls configuration choices out of the code:
  - Define the configuration file as seen in `configs/example_config.yml`, and/or combine it with command-line arguments, then:

    `python train_cli.py fit -c path/to/config.yaml`
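A hypothetical shape for such a config file (illustrative only; the keys mirror the `--data.*`/`--model.*` override syntax, and the real schema is whatever `configs/example_config.yml` defines):

```yaml
# Illustrative only -- see configs/example_config.yml for the real schema
data:
  train_directory: /path/to/training/data
  val_directory: /path/to/validation/data
model:
  normalization: meanstd
trainer:
  max_epochs: 100
```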
For experiments <= 145 generated with `train.py`: use `tensorboard serve --logdir path/to/data/torchgeo_experiments/tb_logs`

Experiments > 145 will likely be saved in `lightning_logs`
- Find checkpoints here on Google Drive TODO paste new link
- Note that these checkpoints were trained for a specific set of inputs (in a specific order), and so serve as a starting point. It might be possible to copy weights & add channels.
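The "copy weights & add channels" idea can be sketched at the array level (hypothetical helper, not part of the repository; a real version would edit the checkpoint's state dict): extra input channels are appended to the first convolution's weight tensor, zero-initialized so the pretrained behavior is preserved until training updates them.

```python
import numpy as np

def expand_input_channels(weight, n_new):
    """Append zero-initialized input channels to a conv weight array.

    `weight` has shape (out_channels, in_channels, kH, kW); zeros for the
    new channels mean the layer's output is initially unchanged.
    """
    out_c, in_c, kh, kw = weight.shape
    extra = np.zeros((out_c, n_new, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, extra], axis=1)
```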
- Download the desired layers and set up their appearance under Symbology
- A helpful way to determine which areas were used for training and which were validation/test patches: use a shapefile or geojson (similar to those provided by USGS for their lidar tiles), or create a binary mask from a mosaicked validation dataset raster (using `gdal_merge.py` or a similar approach) to overlay with different colors.
- For input/output DEMs and DEM differences, set resampling to Cubic for viewing purposes
- Hillshade: choose between GDAL (`gdaldem hillshade`) and the QGIS option
- For difference maps: in Symbology, set Min and Max (e.g. -1 m and +1 m) and a red-to-blue color ramp
- Orthoimages may need to be stretched / range limited depending on the goal.
- A triangulation-error layer with a small maximum value is helpful for indicating hotspots of larger intersection errors
- Copy and Paste Styles as desired to get consistent appearance & scales across layers
- It is useful to include other layers for context like the rasterized lidar timestamps and land cover classes
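For reference, both hillshade tools mentioned above compute a Horn-style hillshade. A compact sketch of the computation (aspect and axis-direction conventions vary slightly between implementations, so treat this as illustrative; defaults match `gdaldem hillshade`'s 315° azimuth / 45° altitude):

```python
import numpy as np

def hillshade(dem, cellsize=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Horn-style hillshade on a 2D DEM array, scaled to 0-255."""
    az = np.radians(360.0 - azimuth_deg + 90.0)  # compass -> math convention
    alt = np.radians(altitude_deg)
    dz_dy, dz_dx = np.gradient(dem, cellsize)    # per-axis elevation gradients
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    aspect = np.arctan2(-dz_dx, dz_dy)
    shaded = (np.sin(alt) * np.cos(slope)
              + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(255.0 * shaded, 0, 255).astype(np.uint8)
```

On flat terrain this reduces to `255 * sin(altitude)`, i.e. a uniform gray.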
More:
- Compile statistics about output
- Change configuration of parameters/data/model/etc.
- Log metrics, or add my own?
- Add a new model, input, loss function, etc.
- Deal with the temporal offset in my training data?
- ...
- ... coming soon ...