
add support for collecting GPU info (via nvidia-smi), and include it in --show-system-info and test report#3851

Merged
boegel merged 7 commits into easybuilders:develop from branfosj:gpuinfo on Sep 29, 2021

Conversation

@branfosj (Member) commented Sep 28, 2021

closes #3825

Add GPU info to:

  • eb --show-system-info
  • the PR post for a test report

Only NVIDIA GPUs on Linux are supported, since those are the only GPUs I have access to.
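A minimal sketch of how the collected output could be turned into per-GPU counts, assuming the list comes from nvidia-smi --query-gpu=name,driver_version --format=csv,noheader (one "name, driver_version" line per GPU). The helper name parse_nvidia_smi_output and the sample output are hypothetical, not taken from this PR; identical lines are counted, so several cards of the same model and driver collapse into one entry.

```python
from typing import Dict


def parse_nvidia_smi_output(out: str) -> Dict[str, Dict[str, int]]:
    """Count identical GPU description lines reported by nvidia-smi.

    Hypothetical helper: 'out' is assumed to be the text printed by
    'nvidia-smi --query-gpu=name,driver_version --format=csv,noheader',
    i.e. one 'name, driver_version' line per GPU.
    """
    gpu_info: Dict[str, Dict[str, int]] = {}
    for line in out.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, if any
        nvidia_gpu_info = gpu_info.setdefault('NVIDIA', {})
        nvidia_gpu_info[line] = nvidia_gpu_info.get(line, 0) + 1
    return gpu_info


# Made-up sample output for two identical cards.
sample = "Tesla V100-SXM2-16GB, 450.80.02\nTesla V100-SXM2-16GB, 450.80.02\n"
print(parse_nvidia_smi_output(sample))
```

Counting whole lines (rather than splitting name and driver version apart) keeps the result readable as-is in a report while still showing how many GPUs of each kind are present.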

@boegelbot

This comment has been minimized.

@boegel boegel added this to the next release (4.5.0?) milestone Sep 28, 2021
@boegel boegel changed the title GPU Info add support for collecting GPU info (via nvidia-smi), and include it in --show-system-info and test report Sep 28, 2021
Comment thread easybuild/tools/systemtools.py Outdated
            gpu_info['NVIDIA'][line] += 1
        else:
            gpu_info['NVIDIA'] = {}
            gpu_info['NVIDIA'][line] = 1
Member


setdefault is your friend here; there's no need for the if/else:

for line in out.strip().split('\n'):
    nvidia_gpu_info = gpu_info.setdefault('NVIDIA', {})
    nvidia_gpu_info.setdefault(line, 0)
    nvidia_gpu_info[line] += 1
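For comparison, the same counting can be written with the standard library's collections.Counter. This is a swapped-in alternative, not what was proposed in the review; the setdefault form keeps the plain nested-dict structure directly. The value of out below is made-up sample output.

```python
from collections import Counter

# Made-up sample of nvidia-smi output: one line per GPU.
out = "NVIDIA A100-SXM4-40GB, 470.57.02\nNVIDIA A100-SXM4-40GB, 470.57.02\nTesla T4, 470.57.02\n"

# setdefault pattern: the nested dict and each counter are created on first use.
gpu_info = {}
for line in out.strip().split('\n'):
    nvidia_gpu_info = gpu_info.setdefault('NVIDIA', {})
    nvidia_gpu_info.setdefault(line, 0)
    nvidia_gpu_info[line] += 1

# Equivalent counting with Counter, converted back to a plain dict.
counted = {'NVIDIA': dict(Counter(out.strip().split('\n')))}

assert gpu_info == counted
print(gpu_info)
```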

Member Author


Thanks. Done in 2e34420

Comment thread easybuild/tools/systemtools.py
Comment thread easybuild/tools/systemtools.py Outdated
            gpu_info['NVIDIA'] = {}
            gpu_info['NVIDIA'][line] = 1
    except Exception:
        _log.debug("No NVIDIA GPUs detected")
Member


Log which exception was hit?

except Exception as err:
    _log.debug("Exception was raised when running nvidia-smi: %s", err)
    _log.info("No NVIDIA GPUs detected")
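A sketch of the suggested logging in context, assuming Python's subprocess module stands in for EasyBuild's own command runner. The helper query_nvidia_gpus and the NVIDIA_SMI_CMD constant are hypothetical; any failure (nvidia-smi missing, non-zero exit) is logged with the exception text at debug level and then treated as "no GPUs".

```python
import logging
import subprocess
from typing import List, Sequence

_log = logging.getLogger(__name__)

# Assumed query; nvidia-smi prints one "name, driver_version" line per GPU.
NVIDIA_SMI_CMD = ('nvidia-smi', '--query-gpu=name,driver_version', '--format=csv,noheader')


def query_nvidia_gpus(cmd: Sequence[str] = NVIDIA_SMI_CMD) -> List[str]:
    """Return one line per detected GPU, or [] if running the command fails.

    Hypothetical helper; subprocess stands in for EasyBuild's command runner.
    """
    try:
        res = subprocess.run(list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                             universal_newlines=True, check=True)
    except Exception as err:  # e.g. FileNotFoundError or CalledProcessError
        _log.debug("Exception was raised when running nvidia-smi: %s", err)
        _log.info("No NVIDIA GPUs detected")
        return []
    # Drop empty lines so that empty output also means "no GPUs".
    return [line for line in res.stdout.strip().splitlines() if line.strip()]
```

For example, query_nvidia_gpus(['this-command-does-not-exist']) returns an empty list, and the FileNotFoundError message ends up in the debug log, which is what makes the reason for "no GPUs" traceable afterwards.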

Member Author


Done in 2e34420

Comment thread easybuild/tools/systemtools.py Outdated
    except Exception:
        _log.debug("No NVIDIA GPUs detected")
else:
    _log.debug("Only know how to get GPU info on Linux")
Member


_log.info("Only know how to get GPU info on Linux, assuming no GPUs are present")
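A sketch of the OS guard with the suggested info-level message, assuming platform.system() is used to detect Linux (EasyBuild has its own OS-detection helpers, which this does not reproduce). collect_gpu_info and its query parameter are hypothetical; passing the query in as a callable keeps the function testable on machines without GPUs.

```python
import logging
import platform
from typing import Callable, Dict, List, Optional

_log = logging.getLogger(__name__)


def collect_gpu_info(query: Callable[[], List[str]],
                     os_name: Optional[str] = None) -> Dict[str, Dict[str, int]]:
    """Collect NVIDIA GPU info on Linux only; elsewhere assume no GPUs.

    Hypothetical helper: 'query' returns one line per GPU (e.g. a wrapper
    around nvidia-smi) and 'os_name' defaults to platform.system().
    """
    if os_name is None:
        os_name = platform.system()

    gpu_info: Dict[str, Dict[str, int]] = {}
    if os_name == 'Linux':
        for line in query():
            nvidia_gpu_info = gpu_info.setdefault('NVIDIA', {})
            nvidia_gpu_info[line] = nvidia_gpu_info.get(line, 0) + 1
    else:
        _log.info("Only know how to get GPU info on Linux, assuming no GPUs are present")
    return gpu_info


# Usage with a stand-in query (made-up GPU line) on a non-Linux OS name:
print(collect_gpu_info(lambda: ['Tesla T4, 470.57.02'], os_name='Darwin'))  # {}
```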

Member Author


Comment thread easybuild/tools/systemtools.py
@branfosj
Member Author

Example PR test message at easybuilders/easybuild-easyconfigs#14069 (comment)
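The collected counts still have to be rendered as text for --show-system-info and the test report. A sketch with a hypothetical layout follows; the real format is the one shown in the linked example, not necessarily this one, and format_gpu_info is a made-up name.

```python
from typing import Dict, List


def format_gpu_info(gpu_info: Dict[str, Dict[str, int]]) -> List[str]:
    """Render collected GPU info as indented text lines (hypothetical layout)."""
    if not gpu_info:
        return ["* GPU: (none detected)"]
    lines = ["* GPU:"]
    for vendor in sorted(gpu_info):
        lines.append("  -> %s" % vendor)
        # One line per distinct "name, driver_version" entry, prefixed by its count.
        for desc, count in sorted(gpu_info[vendor].items()):
            lines.append("    -> %sx %s" % (count, desc))
    return lines


print('\n'.join(format_gpu_info({'NVIDIA': {'Tesla V100-SXM2-16GB, 450.80.02': 4}})))
```

Sorting vendors and entries keeps the output stable between runs, which makes reports from different systems easier to compare.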

Comment thread easybuild/tools/testing.py Outdated
Member

@boegel boegel left a comment


Some minor additional suggested changes in branfosj#4

With that included, this is good to go imho.

Unless we want to go through the effort of extending the framework tests with a fake nvidia-smi command that produces output to be processed by get_gpu_info?
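A sketch of the fake nvidia-smi idea using only the standard library, not EasyBuild's test framework. It assumes a POSIX system with /bin/sh and a temporary directory that allows executing files; the fake script ignores its arguments, prints made-up output, and that output is then counted per line. count_gpus_with_fake_nvidia_smi and FAKE_OUTPUT are hypothetical names.

```python
import os
import stat
import subprocess
import tempfile
from collections import Counter

# Made-up output the fake nvidia-smi will print (one line per GPU).
FAKE_OUTPUT = "Tesla T4, 470.57.02\nTesla T4, 470.57.02"


def count_gpus_with_fake_nvidia_smi() -> dict:
    """Run GPU detection against a fake 'nvidia-smi' placed first in $PATH.

    Assumes a POSIX system with /bin/sh and an executable temp directory;
    the fake script ignores its arguments and just prints FAKE_OUTPUT.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        fake = os.path.join(tmpdir, 'nvidia-smi')
        with open(fake, 'w') as handle:
            handle.write("#!/bin/sh\ncat <<'EOF'\n%s\nEOF\n" % FAKE_OUTPUT)
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)

        # Prepend the temp dir so the fake shadows any real nvidia-smi.
        env = dict(os.environ, PATH=tmpdir + os.pathsep + os.environ.get('PATH', ''))
        res = subprocess.run(['nvidia-smi', '--query-gpu=name,driver_version', '--format=csv,noheader'],
                             stdout=subprocess.PIPE, universal_newlines=True, env=env, check=True)

    # Count identical lines, as the GPU info collection does.
    return {'NVIDIA': dict(Counter(res.stdout.strip().splitlines()))}


print(count_gpus_with_fake_nvidia_smi())
```

Because the temporary directory is prepended to PATH only in the child process's environment, the test process's own environment is left unchanged and the fake is removed together with the directory.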

tweak GPU part of output produced by --show-system-info
@boegel
Member

boegel commented Sep 29, 2021

Thanks a lot @branfosj!

@boegel boegel merged commit b276975 into easybuilders:develop Sep 29, 2021
@branfosj branfosj deleted the gpuinfo branch September 29, 2021 20:48


Development

Successfully merging this pull request may close these issues.

Report available GPUs and NVIDIA driver version in PR test reports

4 participants