check whether nvidia-smi/rocm-smi command is available before trying to run it in get_gpu_info#4131

Merged
branfosj merged 1 commit into easybuilders:develop from Flamefire:gpu_info-fix on Dec 4, 2022


@Flamefire
Contributor

Currently the function calls `run_cmd`, which raises on error, AND then checks the exit code. This is redundant, and it causes a log message `"ERROR EasyBuild crashed with an error ..."` to be logged on failure. That message is confusing because a failure here is expected: it is VERY unlikely that both `nvidia-smi` and `rocm-smi` are on the system.
So check for existence of each command first, and suppress the output and error checking of `run_cmd`.
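The pattern described above can be sketched with the Python standard library alone. This is a minimal illustration, not the code merged in this PR: `shutil.which` and `subprocess.run` stand in for EasyBuild's own helpers, the names `query_tool` and `get_gpu_info_sketch` are hypothetical, and the command-line arguments passed to `nvidia-smi` and `rocm-smi` are assumptions chosen for illustration.

```python
import shutil
import subprocess


def query_tool(cmd_name, args):
    """Return the stripped stdout of the command, or None if it is missing or fails."""
    # Check availability first, so that a missing tool is not treated as an error.
    path = shutil.which(cmd_name)
    if path is None:
        return None
    try:
        # check=False: look at the exit code ourselves instead of raising on failure.
        res = subprocess.run([path, *args], stdout=subprocess.PIPE,
                             stderr=subprocess.DEVNULL, text=True, check=False)
    except OSError:
        return None
    if res.returncode != 0:
        return None
    return res.stdout.strip()


def get_gpu_info_sketch():
    """Collect GPU info from whichever vendor tool is present (hypothetical helper)."""
    info = {}
    # Arguments below are illustrative assumptions, not taken from the PR.
    nvidia = query_tool("nvidia-smi", ["--query-gpu=gpu_name,driver_version",
                                       "--format=csv,noheader"])
    if nvidia:
        info["NVIDIA"] = nvidia.splitlines()
    amd = query_tool("rocm-smi", ["--showdriverversion", "--csv"])
    if amd:
        info["AMD"] = amd.splitlines()
    return info
```

With this shape, a system that has only one of the two tools (or neither) yields a partial or empty dictionary, and nothing is logged as a crash for the tool that is absent.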

@branfosj branfosj added this to the next release (4.7.0) milestone Dec 4, 2022
@branfosj
Member

branfosj commented Dec 4, 2022

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 7150262 into easybuilders:develop Dec 4, 2022
@Flamefire Flamefire deleted the gpu_info-fix branch December 4, 2022 11:37
@boegel boegel changed the title from "Avoid exception in get_gpu_info" to "check whether nvidia-smi/rocm-smi command is available before trying to run it in get_gpu_info" Dec 6, 2022