Multiple updates to Tensorflow easyblock #1453

Merged
boegel merged 7 commits into easybuilders:develop from akesandgren:update-tensorflow
Aug 15, 2018

@akesandgren
Contributor

@akesandgren akesandgren commented Jul 6, 2018

  • Make cuDNN a strict requirement when CUDA is enabled.
  • Handle building -rc versions of TensorFlow.
  • mkl-dnn can be built with cuDNN enabled without problems.
  • Enable mkl-dnn by default.
  • Add support for v1.10
  • Add TensorRT support (WIP)
  • Add NCCL support (WIP)

Solves issue #1445
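
The thread does not show how the easyblock handles -rc versions, so the following is only a minimal sketch of one way to separate a release-candidate suffix from the numeric part before comparing versions. The helper names `split_rc_version` and `is_at_least` are hypothetical and are not taken from the easyblock.

```python
import re


def split_rc_version(version):
    """Split a version such as '1.10.0-rc1' into ((1, 10, 0), 'rc1').

    Hypothetical helper for illustration; not the easyblock's own code.
    """
    match = re.match(r'^(\d+(?:\.\d+)*)(?:-?(rc\d+))?$', version)
    if match is None:
        raise ValueError("Unrecognised version string: %s" % version)
    numbers = tuple(int(part) for part in match.group(1).split('.'))
    # group(2) is None for a plain release without an -rc suffix
    return numbers, match.group(2)


def is_at_least(version, minimum):
    """Compare only the numeric part, ignoring any -rc suffix."""
    numbers, _ = split_rc_version(version)
    return numbers >= tuple(minimum)
```

With this approach a release candidate such as `1.10.0-rc1` is treated like `1.10.0` for feature checks, which is a design assumption of the sketch rather than something the PR states.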

# enable mkl-dnn by default, but only if cuDNN is not listed as dependency
if self.cfg['with_mkl_dnn'] is None and get_software_root('cuDNN') is None:
    self.log.info("Enabling use of mkl-dnn since cuDNN is not listed as dependency")
    self.cfg['with_mkl_dnn'] = True

Member

@akesandgren Can you clarify this? Can both be used together? There must have been a reason why we did this?

Contributor Author

We did this because I originally thought there would be a problem, but I later tried it and the two can be compiled in at the same time.
So it's basically removing a bit of code that shouldn't have been there in the first place.
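
As a rough illustration of the behaviour described in this PR (cuDNN strictly required when CUDA is enabled, and mkl-dnn enabled by default whether or not cuDNN is present), here is a standalone sketch. It does not use the EasyBuild API: `resolve_build_options`, the `dep_roots` dictionary, and the use of `ValueError` are assumptions made for the example.

```python
def resolve_build_options(dep_roots, with_mkl_dnn=None):
    """Decide which optional backends to enable.

    dep_roots: hypothetical mapping of dependency name -> install prefix,
    standing in for lookups of the listed dependencies.
    with_mkl_dnn: None means "not set by the user".
    """
    cuda_root = dep_roots.get('CUDA')
    cudnn_root = dep_roots.get('cuDNN')

    # cuDNN is a strict requirement as soon as CUDA is enabled
    if cuda_root and not cudnn_root:
        raise ValueError("cuDNN must be listed as a dependency when CUDA is enabled")

    # mkl-dnn is on by default, independent of cuDNN
    if with_mkl_dnn is None:
        with_mkl_dnn = True

    return {
        'cuda': bool(cuda_root),
        'cudnn': bool(cudnn_root),
        'mkl_dnn': with_mkl_dnn,
    }
```

An explicit `with_mkl_dnn=False` still wins over the default, so a user can opt out.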

@boegel boegel changed the title Multiple updates to tensorflow easyblock Multiple updates to Tensorflow easyblock Jul 7, 2018
@akesandgren
Contributor Author

Verified to work with old TensorFlow-1.5.0-foss-2017b-Python-3.6.3.eb

This might really be a bug in protobuf-python,
protocolbuffers/protobuf#1296

@akesandgren
Contributor Author

Tested with TensorFlow-1.5.0-foss-2017b-Python-3.6.3.eb and easybuilders/easybuild-easyconfigs#6677 (TensorFlow-1.10.0-fosscuda-2018b-Python-2.7.15.eb)

        'NCCL_INSTALL_PATH': nccl_root,
    })
else:
    nccl_version = '1.3'  # Use simple downloadable version

Member

@akesandgren Where did you get this 1.3 from? Isn't this something that will vary across TF versions?

Contributor Author

1.3 is the fallback version used when no external NCCL v2.x is installed (v2.x requires a manual download).
1.3 can be downloaded automatically and, I think, every TF version from 1.4 onwards mentions it as the default when no external NCCL is available.

But from TF 1.10 they have changed the configure Q&A, so the version must now be specified explicitly.
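
The fallback described above can be sketched roughly as follows. This is a standalone illustration rather than the easyblock's code: the function name `nccl_config_vars` is hypothetical, and `TF_NCCL_VERSION` is assumed here as the variable that carries the version, since only `NCCL_INSTALL_PATH` appears in the diff excerpt.

```python
def nccl_config_vars(nccl_root=None, nccl_version=None):
    """Return the NCCL-related configure variables.

    nccl_root / nccl_version describe an external NCCL install, if any.
    Without one, fall back to the auto-downloadable 1.3.
    """
    if nccl_root:
        if nccl_version is None:
            raise ValueError("An external NCCL needs an explicit version")
        return {
            'TF_NCCL_VERSION': nccl_version,  # assumed variable name
            'NCCL_INSTALL_PATH': nccl_root,
        }
    # No external NCCL: use the simple downloadable version
    return {'TF_NCCL_VERSION': '1.3'}
```

Always returning an explicit version fits the remark that from TF 1.10 the version can no longer be left unspecified.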

@boegel
Member

boegel commented Aug 13, 2018

@akesandgren I've tested this with a whole bunch of existing TensorFlow easyconfigs, basically all that i) use this easyblock and ii) don't use CUDA (since we don't have a GPU system that I can test on), i.e. all of these:

  • TensorFlow-1.4.0-foss-2017b-Python-3.6.3.eb
  • TensorFlow-1.4.0-intel-2017b-Python-3.6.3.eb
  • TensorFlow-1.4.1-foss-2017b-Python-3.6.3.eb
  • TensorFlow-1.5.0-foss-2017b-Python-3.6.3.eb
  • TensorFlow-1.5.0-intel-2017b-Python-3.6.3.eb
  • TensorFlow-1.6.0-foss-2018a-Python-3.6.4.eb
  • TensorFlow-1.6.0-intel-2018a-Python-3.6.4.eb
  • TensorFlow-1.7.0-foss-2018a-Python-3.6.4.eb
  • TensorFlow-1.8.0-foss-2018a-Python-3.6.4.eb
  • TensorFlow-1.8.0-intel-2018a-Python-3.6.4.eb

Didn't see any problems, so unless you're planning further changes, this looks good to go?

@akesandgren
Contributor Author

No more changes in sight, good to go from my side.

@boegel boegel merged commit c7013c5 into easybuilders:develop Aug 15, 2018
@akesandgren akesandgren deleted the update-tensorflow branch August 15, 2018 19:25