
Error in running pretrain because of torch.distributed #26

@tinaboya2023

Description


Hi,
I installed the environment with the following configuration:
python: 3.8
PyTorch/CUDA install command: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
GPU: 1× GeForce RTX 3090 (24 GB VRAM)

I'm trying to run pretraining with the following command:
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

but I encounter an error.

Could you help me resolve this problem?
Is this error caused by using only 1 GPU?
Do I need to change the initial value of some parameter (like local_rank)?
Could the error be due to a lack of GPU memory?
It is very important to me to solve this problem.
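As context for my local_rank question: my understanding (an assumption on my part, not something from the TAP repo) is that torch.distributed.launch mainly exports rank/world-size environment variables and passes --local_rank to the script, which then calls init_process_group. A minimal single-process sanity check of that handshake, runnable even on CPU with the gloo backend, would be:

```python
# Minimal single-process torch.distributed sanity check (my sketch, not the
# repo's code). Emulates what torch.distributed.launch exports for one process.
import os
import torch.distributed as dist

# The launcher normally sets these; with nproc_per_node=1 they are trivial.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29512"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"

# "gloo" works without a GPU; the real run would use "nccl" on CUDA.
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_world_size())  # expect 1 for a single-process group
dist.destroy_process_group()
```

If this snippet initializes cleanly but the actual run still fails, the problem is presumably in the NCCL/CUDA side or in the training config rather than in the process-group setup itself.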
