Handle failure of running nvidia-smi in TensorFlow tests #2506

Merged
ocaisa merged 3 commits into easybuilders:develop from Flamefire:fix-tf-wo-cuda
Jul 7, 2021

@Flamefire
Contributor

Correctly fall back to skipping GPU tests

@ocaisa
Member

ocaisa commented Jul 6, 2021

@surak Can you check if this works for you?

@surak
Contributor

surak commented Jul 6, 2021

> @surak Can you check if this works for you?

Testing now; TF takes a while to install on JUWELS.

@surak
Contributor

surak commented Jul 6, 2021

This test passes now

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS TensorFlow-2.4.1-fosscuda-2019b-Python-3.7.4.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusa4 - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), Python 2.7.5
See https://gist.github.com/16fade7ef357a2359942023d97895b94 for a full test report.

Member

@ocaisa ocaisa left a comment

Code changes are fine, but it takes a bit of digging to figure out why this was the necessary change. My digging said it was to avoid https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/tools/run.py#L574. I think this warrants a comment, and I would still probably explicitly include log_all=False even though this is the default, since that also features in the relevant line.

@ocaisa
Member

ocaisa commented Jul 7, 2021

I guess this is a very niche case where nvidia-smi is available but there are no visible GPUs
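
For illustration, the fallback behaviour discussed in this thread could be sketched roughly as below. This is not the easyblock's actual code: it uses Python's standard `subprocess` module rather than EasyBuild's `run_cmd`, and the function names `count_visible_gpus` and `select_test_mode` are hypothetical. The only point is that a missing `nvidia-smi` and an `nvidia-smi` that exits with an error are both treated as "no GPUs" instead of aborting.

```python
import subprocess


def count_visible_gpus(cmd=("nvidia-smi", "--list-gpus")):
    """Return the number of GPUs reported by `cmd`, or 0 if the query fails.

    Hypothetical helper: one non-empty output line is counted as one GPU.
    """
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # The command is not installed or not executable: no GPUs to test on.
        return 0
    if proc.returncode != 0:
        # The tool exists but failed (e.g. no visible devices): also no GPUs.
        return 0
    return sum(1 for line in proc.stdout.splitlines() if line.strip())


def select_test_mode(num_gpus):
    """Pick which tests to run; GPU tests are skipped when no GPU is visible."""
    return "gpu" if num_gpus > 0 else "cpu-only"


if __name__ == "__main__":
    gpus = count_visible_gpus()
    print("visible GPUs:", gpus, "-> running", select_test_mode(gpus), "tests")
```

The key design choice in this sketch is that a non-zero exit code is handled by the caller as an ordinary result rather than being escalated to a fatal error, which matches the niche case above where `nvidia-smi` is present but cannot see any device.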

@Flamefire
Contributor Author

@ocaisa Makes sense. Added.

@ocaisa ocaisa enabled auto-merge July 7, 2021 09:15
@ocaisa ocaisa merged commit 2c963a7 into easybuilders:develop Jul 7, 2021
@Flamefire Flamefire deleted the fix-tf-wo-cuda branch July 7, 2021 10:19
@migueldiascosta migueldiascosta added this to the next release (4.4.2?) milestone Jul 29, 2021
4 participants