Skip to content

TensorFlow 1.6.0 fix: use the absolute path for the C compiler when compiling with GPU support#1386

Merged
boegel merged 2 commits intoeasybuilders:developfrom
smoors:tensorflow
Mar 19, 2018
Merged

TensorFlow 1.6.0 fix: use the absolute path for the C compiler when compiling with GPU support#1386
boegel merged 2 commits intoeasybuilders:developfrom
smoors:tensorflow

Conversation

@smoors
Copy link
Copy Markdown
Contributor

@smoors smoors commented Mar 14, 2018

This is needed for the bazel build command with the option --config=cuda
Inspiration for this fix was found here:
tensorflow/tensorflow#14380

@boegel
Copy link
Copy Markdown
Member

boegel commented Mar 14, 2018

Hmm, I think this reverts something that @akesandgren did as a part of the sprint during the EasyBuild User Meeting in Amsterdam (see commit history in #1287)...

Why didn't this pop up earlier? Is this specific to TF 1.6.0?

@boegel boegel added the bug fix label Mar 14, 2018
@boegel boegel added this to the 3.6.0 milestone Mar 14, 2018
@boegel
Copy link
Copy Markdown
Member

boegel commented Mar 14, 2018

Also, what happens when $CC actually points to a wrapper for the compiler, for example when --use-ccache or --rpath is used?

@akesandgren
Copy link
Copy Markdown
Contributor

It doesn't touch the things i added as far as i can see.

I don't really see how doing this change would make a difference.

What is the actual problem or error output from not doing this?

@smoors
Copy link
Copy Markdown
Contributor Author

smoors commented Mar 14, 2018

@akesandgren this is the error I get.
it complains about some "missing" header files, but they are present.
however, the /apps/brussel dir is a symlink.
according to a commenter on the link I posted above, "It usually boils down to crosstool referencing files via symlink, while compilers resolve the symlinks when outputting the .d files and vice versa."
this fix was not necessary with TensorFlow 1.5.0. it could be due to a different Bazel version (0.11.0 vs. 0.7.0).
this is all a great mystery to me but as Ward said, "with Tensorflow, you can expect anything".

ERROR: /tmp/smoors/eb-8JIy9O/tmp0ArazE-bazel-build/external/flatbuffers/BUILD:52:1: undeclared inclusion(s) in rule '@flatbuffers//:flatc_library':
this rule is missing dependency declarations for the following files included by 'external/flatbuffers/src/util.cpp':
  '/apps/brussel/CO7/ivybridge-ib/software/GCCcore/6.4.0/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include/stdarg.h'
  '/apps/brussel/CO7/ivybridge-ib/software/GCCcore/6.4.0/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include/stddef.h'
  '/apps/brussel/CO7/ivybridge-ib/software/GCCcore/6.4.0/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include/stdint.h'
  '/apps/brussel/CO7/ivybridge-ib/software/GCCcore/6.4.0/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include-fixed/limits.h'
  '/apps/brussel/CO7/ivybridge-ib/software/GCCcore/6.4.0/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include-fixed/syslimits.h'^M
Target //tensorflow/tools/pip_package:build_pip_package failed to build

@smoors
Copy link
Copy Markdown
Contributor Author

smoors commented Mar 14, 2018

@boegel concerning compiler wrappers, it would indeed be good if someone tested that.

@akesandgren
Copy link
Copy Markdown
Contributor

There might be another way to solve this, but i have to look at that later today.

@boegel
Copy link
Copy Markdown
Member

boegel commented Mar 15, 2018

@smoors I ran into similar issues while developing the TensorFlow easyblock, but I didn't see these problems anymore with the easyblock we ended up with in #1287...

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
config_env_vars.update({
'CUDA_TOOLKIT_PATH': cuda_root,
'GCC_HOST_COMPILER_PATH': which(os.getenv('CC')),
'GCC_HOST_COMPILER_PATH': os.path.realpath(which(os.getenv('CC'))),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smoors Please use the resolve_path function from filetools (which has error handling)

@akesandgren
Copy link
Copy Markdown
Contributor

Doing a build of 1.6.0 with goolfc, will report back.

@akesandgren
Copy link
Copy Markdown
Contributor

I have a different change that at least solved the problem on our system.
@smoors can you please try with this fix instead?

diff -u /hpc2n/eb/software/Core/EasyBuild/3.5.3/lib/python2.7/site-packages/easybuild_easyblocks-3.5.3-py2.7.egg/easybuild/easyblocks/t/tensorflow.py /hpc2n/eb/custom/easyblocks/tensorflow.py 
--- /hpc2n/eb/software/Core/EasyBuild/3.5.3/lib/python2.7/site-packages/easybuild_easyblocks-3.5.3-py2.7.egg/easybuild/easyblocks/t/tensorflow.py       2018-03-08 08:23:43.000000000 +0100
+++ /hpc2n/eb/custom/easyblocks/tensorflow.py   2018-03-15 14:04:20.000000000 +0100
@@ -191,10 +191,11 @@
 
         # fix hardcoded locations of compilers & tools
         cxx_inc_dir_lines = '\n'.join(r'cxx_builtin_include_directory: "%s"' % resolve_path(p) for p in inc_paths)
+        cxx_inc_dir_lines_no_resolv_path = '\n'.join(r'cxx_builtin_include_directory: "%s"' % p for p in inc_paths)
         regex_subs = [
             (r'-B/usr/bin/', '-B%s/ %s' % (binutils_bin, ' '.join('-L%s/' % p for p in lib_paths))),
             (r'(cxx_builtin_include_directory:).*', ''),
-            (r'^toolchain {', 'toolchain {\n' + cxx_inc_dir_lines),
+            (r'^toolchain {', 'toolchain {\n' + cxx_inc_dir_lines + cxx_inc_dir_lines_no_resolv_path),
         ]
         for tool in ['ar', 'cpp', 'dwp', 'gcc', 'gcov', 'ld', 'nm', 'objcopy', 'objdump', 'strip']:
             path = which(tool)

@boegel
Copy link
Copy Markdown
Member

boegel commented Mar 15, 2018

@akesandgren You should probably make that cxx_inc_dir_lines + '\n' + cxx_inc_dir_lines_no_resolv_path to avoid glueing the last & first entry together?

'\n'.join will only add newlines in between element, not at the end

@smoors
Copy link
Copy Markdown
Contributor Author

smoors commented Mar 15, 2018

@akesandgren ok, will test.
do you think it is really necessary to have both the "resolved" and "unresolved" paths?

@akesandgren
Copy link
Copy Markdown
Contributor

@boegel forgot that (again...)
@smoors Yes it is. The resolved paths was what we added to get 1.5.0 building. Apparently something changed so we now need both.

@smoors
Copy link
Copy Markdown
Contributor Author

smoors commented Mar 16, 2018

the solution of @akesandgren seems to work and is probably safer wrt wrappers, so let's use it.

@boegel
Copy link
Copy Markdown
Member

boegel commented Mar 19, 2018

Thanks @smoors and @akesandgren!

@boegel boegel merged commit 6003720 into easybuilders:develop Mar 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants