Hi, when I try to run `train.sh` on two GPUs (TITAN RTX, 24 GB VRAM each), I get the following errors:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 23.65 GiB total capacity; 21.76 GiB already allocated; 24.75 MiB free; 21.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
RuntimeError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 23.65 GiB total capacity; 21.18 GiB already allocated; 144.31 MiB free; 21.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
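(The message suggests setting `max_split_size_mb`; as I understand it, that would be set via the environment variable before launching — a sketch, where 128 is an arbitrary example value rather than a recommendation:)

```sh
# Fragmentation workaround suggested by the error message itself.
# max_split_size_mb:128 is just an example value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
bash train.sh
```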
I tried to reduce GPU memory usage by changing some hyperparameters in `v1-finetune_cocogit.yaml`, such as `data.params.batch_size` -> 1, `data.params.num_workers` -> 1, and `model.params.first_stage_config.params.ddconfig.resolution` -> 64, but the errors above still occur.
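Concretely, those overrides correspond to the following entries in `v1-finetune_cocogit.yaml` (nesting inferred from the dotted key names above; all surrounding keys are omitted):

```yaml
data:
  params:
    batch_size: 1    # reduced from the default
    num_workers: 1   # reduced from the default
model:
  params:
    first_stage_config:
      params:
        ddconfig:
          resolution: 64   # reduced from the default
```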
Is there any way to further reduce the GPU memory usage to run train.sh? Also, how much VRAM is required to run it with default hyperparameters?