CUDA out of memory when running train.sh #8

@rfuruta

Description

Hi, when I try to run train.sh on two GPUs (TITAN RTX) with 24GB VRAM, I get the following errors:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 23.65 GiB total capacity; 21.76 GiB already allocated; 24.75 MiB free; 21.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
RuntimeError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 23.65 GiB total capacity; 21.18 GiB already allocated; 144.31 MiB free; 21.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried to reduce the GPU memory usage by changing some hyperparameters in v1-finetune_cocogit.yaml such as data.params.batch_size->1, data.params.num_workers->1, and model.params.first_stage_config.params.ddconfig.resolution->64, but the above errors still occur.
Is there any way to further reduce the GPU memory usage to run train.sh? Also, how much VRAM is required to run it with default hyperparameters?
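The OOM message itself suggests setting `max_split_size_mb` to reduce allocator fragmentation. A minimal sketch of applying that hint before launching training (the value 128 is an assumption, not something from this repo; tune it per the PyTorch memory-management docs):

```shell
# Cap the CUDA caching allocator's split size, as suggested by the OOM
# message, to reduce fragmentation. 128 MiB is a guessed starting point.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then launch training as usual, e.g.:
# bash train.sh
```

Note this only mitigates fragmentation; it does not shrink the model's actual footprint, so if allocated memory is already near 24 GB, further measures (e.g. mixed precision or gradient checkpointing, if the training code supports them) would still be needed.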
