Skip to content

enhance TensorFlow easyblock to support installing TF 1.14.0 with CUDA and MPI support#1811

Merged
akesandgren merged 3 commits intoeasybuilders:developfrom
boegel:TF114_strip_CUDA_cuDNN_versions
Sep 26, 2019
Merged

enhance TensorFlow easyblock to support installing TF 1.14.0 with CUDA and MPI support#1811
akesandgren merged 3 commits intoeasybuilders:developfrom
boegel:TF114_strip_CUDA_cuDNN_versions

Conversation

@boegel
Copy link
Copy Markdown
Member

@boegel boegel commented Sep 20, 2019

This fixes the following issue when trying to install TensorFlow 1.14.0 with CUDA 10.1.243:

Could not find any cuda.h matching version '10.1.243' in any subdirectory

Changes were made in TensorFlow 1.14.0 that require both $TF_CUDA_VERSION and $TF_CUDNN_VERSION to only include the correct number of digits, so 10.1 rather than 10.1.243 for CUDA.
In older TensorFlow versions, the value specified in $TF_CUDA_VERSION was stripped down first to just <major>.<minor>, but this got lost in TF 1.14.0.
For cuDNN, it expects <major>.<minor>.<patch> (so 3, not 4 as in the cuDNN version we use).

It turns out that we've been doing it incorrectly all along, since even with these changes installing TensorFlow-1.8.0-fosscuda-2017b-Python-2.7.14.eb still works fine...

…w easyblock to issue with installation of TensorFlow 1.14.0 with CUDA support
@boegel boegel added the bug fix label Sep 20, 2019
@boegel boegel added this to the next release milestone Sep 20, 2019
@boegel boegel requested a review from akesandgren September 20, 2019 11:52
@boegel boegel changed the title strip CUDA & cuDNN versions to expected number of digits in TensorFlow easyblock to issue with installation of TensorFlow 1.14.0 with CUDA support enhance TensorFlow easyblock to support installing TF 1.14.0 with CUDA and MPI support Sep 21, 2019
})

# for recent TensorFlow versions, $TF_CUDA_PATHS and $TF_CUBLAS_VERSION must also be set
if LooseVersion(self.version) > LooseVersion('1.14'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more intuitive for me:

if LooseVersion(self.version) >= LooseVersion('1.14.0'):

Other than that lgtm!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damianam Good point here actually, this should be >= to avoid confusion/surprises...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #1816

Copy link
Copy Markdown
Contributor

@akesandgren akesandgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@akesandgren akesandgren dismissed damianam’s stale review September 26, 2019 13:41

Code does the right thing as is. Dismissing so it can be merged.

@akesandgren
Copy link
Copy Markdown
Contributor

Going in, thanks @boegel!

@akesandgren akesandgren merged commit 60ef323 into easybuilders:develop Sep 26, 2019
@boegel boegel deleted the TF114_strip_CUDA_cuDNN_versions branch September 26, 2019 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants