Read the complete Ultralytics YOLOv5 🚀 documentation here.
Set up the coding environment by following the steps below.
```bash
git clone https://github.com/amanbasu/tree-throw-yolov5.git
pip install -r requirements.txt
```

```
tree-throw-yolov5
├── config-train.json       # single place to store training script parameters
├── config-inference.json   # configurations used during inference
├── train.py
├── val.py
├── data                    # folder to store data used for training/inference
│   ├── dem_files           # create one folder per run/use-case
│   │   ├── in2017_01251380_12_0_0.tif
│   │   ├── ...
├── models                  # different model configurations for YOLO
│   ├── yolov5l.yaml
│   ├── yolov5m.yaml
│   ├── yolov5s.yaml
│   ├── ...
├── runs                    # stores training/inference results of YOLO
│   ├── train
│   │   ├── yolov5
│   │   │   ├── results.csv # training history / metric evolution
│   │   │   ├── opt.yaml    # model hyperparameters and training arguments
│   │   │   ├── weights
│   │   │   │   ├── best.pt # checkpoint of the best model
│   ├── val
│   │   ├── yolov5
│   │   │   ├── metrics.txt # best validation metrics when you test the model
│   │   │   ├── labels      # all model predictions as txt files
│   │   │   │   ├── in2017_01251380_12_0_0.txt
│   │   │   │   ├── ...
├── scripts                 # stores all the custom code for tree-throw
│   ├── generate_images.py
│   ├── generate_strided_labels.py
│   ├── rename_labels.py
│   ├── to_yolo_format.py
│   ├── box_to_coords.py
```

- `config-train.json`: contains the training configurations.
  - `test_images`: the images that should belong to the test set; the remaining images are randomly split into training and validation sets.
  - `size`: size of each crop that the model can handle (defaults to 400).
  - `stride`: stride of the crops (defaults to 200).
  - `image_path`: folder path that stores all the images.
  - `dataset_path`: folder path that stores the files for YOLO.

  (Note: make sure that `dataset_path` doesn't contain 'images' or 'labels' in its absolute path, or it will interfere with `utils/dataloaders.py` while reading the images/labels.)
- Label images using Roboflow.
a. Generate high-pass crops from your images.

```bash
python generate_images.py --config config-train.json --save_jpg
```

(save slope or msrm instead of hpass by simply changing the channel index in the code)

b. Upload these images to https://roboflow.com and label them.

c. Export the images after labeling and put them in the `labels/` folder under `dataset_path`.

d. Rename the label filenames to maintain consistency.

```bash
python rename_labels.py --config config-train.json
```

(the filename changes from `in2017_01251380_12_0_1200_jpg.rf.744492ae1ec8a75db8501c71d139e30d.txt` to `in2017_01251380_12_0_1200.txt`)
- Prepare data for YOLO.
a. Generate labels for the overlapping crops by combining adjacent labels.

```bash
python generate_strided_labels.py --config config-train.json
```

b. Generate tif images to be used as input by the model.

```bash
python generate_images.py --config config-train.json --train
```

c. Convert to a format accepted by YOLO.

```bash
python to_yolo_format.py --config config-train.json --train
```

(splits the data into train/valid/test sets and prepares them to be used by YOLO)

Note: this script also prints out the path of a yaml file, which you should copy for the next step.
- Train YOLO.
a. Copy the yaml file path from the previous step.

b. Train YOLO.

```bash
python train.py --img 400 --batch 64 --epochs 600 --data <yaml-path> --name yolov5 --augment --cache --weights '' --cfg yolov5m.yaml
```

(change `--data` to the yaml path you copied and `--name` to the folder where you want to save the weights)

Check out the docs for multi-GPU training.
- Test YOLO.
Run this once the training ends.

```bash
python val.py --weights runs/train/yolov5/weights/best.pt --data <yaml-path> --img 400 --name yolov5 --save-txt --task test --conf-thres 0.1 --single-cls --save-conf
```

(change `--data` to the yaml path you copied earlier)
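For intuition, the `size`/`stride` cropping performed during data preparation works like a sliding window over the raster. Below is a minimal sketch of how crop origins can be computed; it is illustrative only (the clamping of the last window to the image edge is an assumption, not necessarily what `generate_images.py` does):

```python
def crop_origins(height, width, size=400, stride=200):
    """Return the top-left (row, col) of every size x size crop,
    stepping by `stride` pixels; the final window is clamped so the
    whole image is covered."""
    rows = list(range(0, max(height - size, 0) + 1, stride))
    cols = list(range(0, max(width - size, 0) + 1, stride))
    # clamp a final window to the image edge if it isn't covered yet
    if rows[-1] + size < height:
        rows.append(height - size)
    if cols[-1] + size < width:
        cols.append(width - size)
    return [(r, c) for r in rows for c in cols]

# a 1000x1000 raster with size=400, stride=200 yields origins 0, 200, 400, 600
print(crop_origins(1000, 1000)[:3])  # → [(0, 0), (0, 200), (0, 400)]
```

With `stride` smaller than `size` the crops overlap, which is why `generate_strided_labels.py` has to merge adjacent labels.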
Example:

`config-train.json`

```json
{
    "test_images": [
        "in2017_01251385_12",
        "in2017_01251420_12",
        "in2017_01251435_12",
        "in2017_01601290_12",
        "in2017_01601300_12",
        "in2018_30901370_12"
    ],
    "size": 400,
    "stride": 200,
    "image_path": "data/dem_files/",
    "dataset_path": "dataset-train/"
}
```

```bash
#!/bin/bash
cd scripts
python generate_strided_labels.py --config config-train.json
python generate_images.py --config config-train.json --train
python to_yolo_format.py --config config-train.json --train
cd ..
python -m torch.distributed.run --nproc_per_node 4 train.py --img 400 --batch 256 --epochs 600 --data dataset-train/TreeThrow.yaml --name yolov5 --augment --cache --weights '' --cfg yolov5m.yaml --device 0,1,2,3
python val.py --weights runs/train/yolov5/weights/best.pt --data dataset-train/TreeThrow.yaml --img 400 --name yolov5 --save-txt --task test --conf-thres 0.1 --single-cls --save-conf
```

Now that you have a trained model, you can use it to predict tree throws.
- `config-inference.json`: contains the inference configurations.
  - `size`: size of each crop that the model can handle.
  - `image_path`: folder path that stores all the inference images.
  - `dataset_path`: folder path that stores the files for YOLO.
- Prepare data for YOLO.
a. Generate tif images to be used as input by the model.

```bash
python generate_images.py --config config-inference.json
```

b. Convert to YOLO format.

```bash
python to_yolo_format.py --config config-inference.json
```

(prepares the test images to be used by YOLO)

Note: this script also prints out the path of a yaml file, which you should copy for the next step.
- Infer through YOLO.
```bash
python val.py --weights <model-weight> --data <yaml-path> --img 400 --name <folder-name> --save-txt --task test --conf-thres 0.1 --single-cls --save-conf
```

(change `--data` to the yaml path you copied earlier, `--weights` to the path of the model weights, and `--name` to the folder where you want to save the labels)

The labels will be saved at `runs/val/<folder-name>/labels/`.
- Post-process the labels.
Consolidate the labels for all the crops and convert relative coordinates to absolute coordinates using the geo-tagged source file.

```bash
python box_to_coords.py --config config-inference.json --one_file
```

(use `--save_hpass` to save a geo-referenced hpass image of the source file; use `--one_file` to save all predictions in a single file)
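The coordinate conversion done here can be sketched as follows. This is not the actual `box_to_coords.py` code; it is a minimal illustration that assumes a north-up raster with square pixels described by a GDAL-style geotransform (`origin_x`, `origin_y`, `pixel_size` are hypothetical parameter names):

```python
def yolo_box_to_geo(xc, yc, crop_col, crop_row, crop_size,
                    origin_x, origin_y, pixel_size):
    """Convert a YOLO box center (normalized x/y relative to one crop)
    to absolute geo-coordinates."""
    # pixel coordinates of the box center within the full source image
    px = crop_col + xc * crop_size
    py = crop_row + yc * crop_size
    # map to geo-coordinates; y decreases as the row index grows
    geo_x = origin_x + px * pixel_size
    geo_y = origin_y - py * pixel_size
    return geo_x, geo_y

# box centered in a 400 px crop whose top-left pixel is (col=800, row=400),
# on a raster with 1.5 m pixels anchored at (500000, 4300000)
print(yolo_box_to_geo(0.5, 0.5, 800, 400, 400, 500000.0, 4300000.0, 1.5))
```

The box width/height can be converted the same way by scaling by `crop_size * pixel_size`.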
Example:

`config-inference.json`

```json
{
    "size": 400,
    "image_path": "data/BrownCounty2017/",
    "dataset_path": "dataset-brown-county/",
    "pred_path": "runs/val/brown_county_test/labels",
    "conf_thres": 0.32
}
```

```bash
#!/bin/bash
cd scripts
python generate_images.py --config config-inference.json
python to_yolo_format.py --config config-inference.json
cd ..
python val.py --weights runs/train/yolov5/weights/best.pt --data dataset-brown-county/TreeThrow.yaml --img 400 --name brown_county_test --save-txt --task test --conf-thres 0.1 --single-cls --save-conf
cd scripts
python box_to_coords.py --config config-inference.json --one_file
```

Change model hyperparameters
Important script arguments (train.py):
- `--img`: input image size
- `--batch`: batch size
- `--epochs`: number of training epochs
- `--cfg`: path to the model configuration file (yolov5m.yaml performed the best)
- `--weights`: path to a pre-trained weights file (use '' to train from scratch)
- `--data`: path to the yaml file used for reading the data
- `--hyp`: path to the hyperparameter file (defaults to data/hyps/hyp.tree-throw.yaml)
- `--resume`: resume training from the last checkpoint if it stopped before the specified number of epochs
- `--evolve`: used for hyperparameter optimization/tuning
- `--cache`: increases training speed by caching data
- `--optimizer`: select between 'SGD', 'Adam', and 'AdamW'
- `--augment`: use data augmentation while training
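For reference, a YOLOv5 hyperparameter file is a flat yaml of named values. The fragment below shows a few standard keys only as an illustration; the actual values in `data/hyps/hyp.tree-throw.yaml` for this project may differ:

```yaml
# illustrative subset of a YOLOv5 hyperparameter file
lr0: 0.01            # initial learning rate
lrf: 0.01            # final learning-rate fraction (OneCycle schedule)
momentum: 0.937      # SGD momentum
weight_decay: 0.0005 # optimizer weight decay
box: 0.05            # box regression loss gain
cls: 0.5             # classification loss gain
obj: 1.0             # objectness loss gain
fliplr: 0.5          # horizontal flip probability
mosaic: 1.0          # mosaic augmentation probability
```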
Important script arguments (val.py):
- `--weights`: path to the best model weights
- `--data`: path to the yaml file used for reading the data
- `--img`: input image size
- `--save-txt`: save the predicted labels in txt files
- `--conf-thres`: confidence threshold; predictions with confidence below this value are ignored
- `--iou-thres`: IoU threshold for non-max suppression; if two predictions overlap with IoU > iou-thres, the one with the lower confidence score is ignored
- `--single-cls`: treat the problem as a single-class problem
- `--save-conf`: save confidence scores along with the boxes
- `--max-det`: maximum number of detections per image (defaults to 300)
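The effect of `--iou-thres` can be illustrated with a minimal greedy non-max-suppression sketch. This is plain Python for clarity, not YOLOv5's actual (vectorized) implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: keep the highest-scoring box, drop any box whose
    IoU with an already-kept box exceeds iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # → [0, 2]: the overlapping lower-score box is dropped
```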
The medium-sized YOLOv5 model (yolov5m) worked best when trained from scratch. YOLOv8 models could not be trained well from scratch; therefore, only pre-trained YOLOv8 models are compared below.
| Model | Precision | Recall | F1 | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| YOLOv5 | | | | | |
| YOLOv5s | 0.831 | 0.820 | 0.826 | 0.895 | 0.443 |
| YOLOv5s (scratch) | 0.839 | 0.815 | 0.827 | 0.902 | 0.446 |
| YOLOv5m | 0.854 | 0.820 | 0.837 | 0.907 | 0.459 |
| YOLOv5m (scratch) | 0.855 | 0.830 | 0.843 | 0.910 | 0.461 |
| YOLOv5l | 0.853 | 0.823 | 0.838 | 0.909 | 0.458 |
| YOLOv5l (scratch) | 0.853 | 0.822 | 0.837 | 0.910 | 0.459 |
| YOLOv8 | | | | | |
| YOLOv8s | 0.779 | 0.798 | 0.788 | 0.817 | 0.433 |
| YOLOv8m | 0.774 | 0.801 | 0.787 | 0.815 | 0.434 |
| YOLOv8l | 0.767 | 0.790 | 0.778 | 0.807 | 0.430 |
| YOLOv8x | 0.766 | 0.821 | 0.792 | 0.811 | 0.434 |
Augmentations used for training (read more about them at https://albumentations.ai):
- Blur: prob 0.1
- ToGray: prob 0.1 (merges all channel data)
- CLAHE: prob 0.1 (applies Contrast Limited Adaptive Histogram Equalization to the image)
- RandomBrightnessContrast: prob 0.1
- RandomGamma: prob 0.1
- HorizontalFlip: prob 0.25
- VerticalFlip: prob 0.25
- PixelDropout: prob 0.1 (zeros out random pixels)
- RandomRotate90: prob 0.25
- Sharpen: prob 0.1
- GaussNoise: prob 0.1
- ISONoise: prob 0.1
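To make two of these transforms concrete, here is a small numpy sketch that mimics what PixelDropout and HorizontalFlip do to an image array (illustrative only; the actual augmentations come from the albumentations library):

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_dropout(img, dropout_prob=0.01):
    """Zero out random pixels, mimicking albumentations' PixelDropout."""
    mask = rng.random(img.shape[:2]) < dropout_prob  # per-pixel dropout mask
    out = img.copy()
    out[mask] = 0  # a 2D boolean mask zeroes the pixel across all channels
    return out

def horizontal_flip(img):
    """Mirror the image left-right, as HorizontalFlip does."""
    return img[:, ::-1]

img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
flipped = horizontal_flip(img)
assert (flipped[:, 0] == img[:, -1]).all()  # first column is the old last column
```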
Note: no image normalization is done inside YOLOv5; however, we normalize all images to the 0-255 range before saving them to tif (see scripts/generate_images.py).
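The 0-255 normalization can be sketched as a simple per-channel min-max rescale. This is an illustration under that assumption; see `scripts/generate_images.py` for the actual code:

```python
import numpy as np

def normalize_to_uint8(channel):
    """Linearly rescale a float raster channel to the 0-255 range."""
    lo, hi = float(channel.min()), float(channel.max())
    if hi == lo:  # constant channel: avoid divide-by-zero
        return np.zeros(channel.shape, dtype=np.uint8)
    scaled = (channel - lo) / (hi - lo) * 255.0
    return scaled.astype(np.uint8)  # truncates fractional values

dem = np.array([[-2.0, 0.0], [1.0, 2.0]])
print(normalize_to_uint8(dem))  # → [[0, 127], [191, 255]]
```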
- `read_tif()`: function added to read the tif files. It reads the high-pass, slope, and msrm rasters as three separate channels.
- `img2label_paths()`: reads the images and then tries to find the corresponding label file. If the label file is not found, it is assumed that there are no boxes.
- `class LoadImagesAndLabels`: loads the images and labels and checks that they are properly formatted. Some of the code has been removed to support tif files.
- `check_cache_ram()`: commented out the code that loads images and checks the cache size; a constant size of n * 480000 is used instead. This doesn't affect anything, it just avoids unnecessary exceptions.
- `__getitem__()`: commented out `augment_hsv()` as we don't have RGB channels. Flip up-down and left-right is controlled in utils/augmentations.py.
- `load_image()`: removed the OpenCV code that deals with RGB images.
- `verify_image_label()`: commented out the code that verifies RGB images.
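This is also why `dataset_path` must not contain 'images' or 'labels' in its path: YOLOv5 derives each label path from its image path by substring substitution, roughly as in this simplified sketch (the real function in utils/dataloaders.py is slightly more careful about which occurrence it replaces):

```python
import os

def img2label_paths(img_paths):
    """Derive label file paths from image paths by swapping the
    /images/ path segment for /labels/ and the extension for .txt
    (simplified sketch of YOLOv5's behaviour)."""
    sa = os.sep + 'images' + os.sep
    sb = os.sep + 'labels' + os.sep
    return [os.path.splitext(p.replace(sa, sb, 1))[0] + '.txt'
            for p in img_paths]

print(img2label_paths(['/data/images/in2017_01251380_12_0_0.tif']))
```

If the dataset root itself contained an `images` segment, the substitution would fire on the wrong part of the path and the labels would never be found.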
| Input rasters | Precision | Recall | F1 | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| hpass | 0.791 | 0.746 | 0.767 | 0.836 | 0.381 |
| slope | 0.783 | 0.761 | 0.772 | 0.834 | 0.391 |
| msrm | 0.731 | 0.711 | 0.721 | 0.766 | 0.317 |
| hpass + slope | 0.847 | 0.803 | 0.825 | 0.895 | 0.435 |
| slope + msrm | 0.811 | 0.779 | 0.795 | 0.863 | 0.386 |
| hpass + msrm | 0.687 | 0.722 | 0.704 | 0.735 | 0.300 |
| hpass + slope + msrm | 0.855 | 0.830 | 0.843 | 0.910 | 0.461 |
This table shows the performance of the YOLOv5m model when trained on different input rasters. All models were trained from scratch for 600 epochs with a batch size of 256 and default hyperparameters. To disable a channel, simply zero out the corresponding slice of the array in the `read_tif()` function. Example:

```python
import tifffile

# returns only the hpass channel
def read_tif(filename):
    # channel 0 - hpass, 1 - slope, 2 - msrm
    im = tifffile.imread(filename)
    im[:, :, 1:] = 0  # zero out the slope and msrm channels
    return im
```

Contains the `fitness()` function used for evaluating the model's performance. The default one works fine for us.
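The default fitness is a weighted sum of the four validation metrics, dominated by mAP50-95. A sketch of the idea (the 0/0/0.1/0.9 weights follow YOLOv5's defaults; the scalar form below is a simplification of the array-based original):

```python
def fitness(precision, recall, map50, map50_95):
    """Weighted combination of metrics used to select best.pt;
    YOLOv5's default weights ignore P/R and favour mAP50-95."""
    w = (0.0, 0.0, 0.1, 0.9)
    return (w[0] * precision + w[1] * recall
            + w[2] * map50 + w[3] * map50_95)

# the from-scratch YOLOv5m numbers from the results table above
print(round(fitness(0.855, 0.830, 0.910, 0.461), 4))  # → 0.5059
```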
Contains the code for all the loss functions; it is interesting to see how they are implemented.
