Edit: Added another fix for an x86-only binary Python package that cannot be built from source because of a cyclic dependency: tensorflow/tensorflow#56636
Also, the libclang Python package is essentially a binary package and hence may be missing on some architectures (like PPC), so remove that too, as it is not actually required (yet): tensorflow/tensorflow@c211472
Test report by @Flamefire SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
taurusi8008 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/ce65fa04f5441106b533b5314d7a95eb for a full test report.
Flamefire changed the title from "TensorFlow: Exclude (flaky) fault_tolerance_test" to "TensorFlow: Exclude (flaky) fault_tolerance_test and fix non-x86 build" on Jul 25, 2022
Test report by @Flamefire SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
taurusa11 - Linux CentOS Linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), 3 x NVIDIA GeForce GTX 1080 Ti, 460.32.03, Python 2.7.5
See https://gist.github.com/762339f9cca2ed1fd13f59c6c9235dde for a full test report.
boegel changed the title from "TensorFlow: Exclude (flaky) fault_tolerance_test and fix non-x86 build" to "exclude (flaky) fault_tolerance_test and fix non-x86 build for TensorFlow 2.7.1" on Aug 3, 2022
Test report by @boegel FAILED
Build succeeded for 3 out of 5 (2 easyconfigs in total)
fair-mastodon-c6g-2xlarge-0001 - Linux Rocky Linux 8.5, AArch64, ARM UNKNOWN (graviton2), Python 3.6.8
See https://gist.github.com/0f433c570e879e352ed0e088075b4853 for a full test report.
edit: this test was mostly out of curiosity, since the PR title mentioned non-x86; it's not a blocker for this PR (since TensorFlow doesn't even seem to build on aarch64...)
Test report by @boegel SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3307.joltik.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.73.08, Python 3.6.8
See https://gist.github.com/3c7a276f15326617902afa333888dad1 for a full test report.
(created using eb --new-pr)
This test fails for me on an AMD EPYC system with 8 A100 GPUs. Both the CUDA and non-CUDA ECs fail. TF 2.6.0 is fine.
See tensorflow/tensorflow#56717 for the upstream issue and tensorflow/tensorflow@c08fda5 which disables the test with commit message
Hence I think this is safe to disable.
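For readers unfamiliar with how a single flaky test can be left out of a Bazel test run, here is a minimal sketch. It is not taken from this PR or from the TensorFlow easyblock; it only illustrates Bazel's negative target patterns (a leading `-` after the `--` separator). The helper names and the target labels are hypothetical, chosen purely for illustration.

```python
# Minimal sketch (assumption: not the mechanism used by this PR).
# Bazel accepts negative target patterns: a target prefixed with "-"
# is subtracted from the set selected by the preceding patterns.


def bazel_test_targets(include, exclude):
    """Return Bazel target patterns, with excluded targets as negative patterns."""
    return list(include) + ["-" + target for target in exclude]


def bazel_test_command(include, exclude):
    """Build the argument list for a `bazel test` call that skips `exclude`."""
    # "--" ends option parsing, so patterns starting with "-" are not
    # mistaken for command-line flags.
    return ["bazel", "test", "--"] + bazel_test_targets(include, exclude)


# Hypothetical labels: the real package path of fault_tolerance_test
# is not stated in this thread.
INCLUDE = ["//tensorflow/python/..."]
EXCLUDE = ["//tensorflow/python/hypothetical:fault_tolerance_test"]

cmd = bazel_test_command(INCLUDE, EXCLUDE)
print(" ".join(cmd))
```

Keeping the exclusions in a separate list makes it easy to drop an entry again once the upstream issue is resolved.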