
always add distinct_host_configuration=false to build command for TensorFlow #2459

Merged
boegel merged 1 commit into easybuilders:develop from Flamefire:tf_optarch
Jun 8, 2021

@Flamefire
Contributor

This fixes a failure when using optarch=False due to missing CPATH and friends while compiling e.g. protobuf files.

More info:
We pass env vars such as CPATH explicitly because Bazel clears the whole environment. If you use distinct_host_configuration=true, Bazel has two environments: one for host compilation and one for target compilation. We only pass the variables to the target environment, so the host environment is empty and the compiler can't find the protobuf headers because CPATH is not set.

So optarch for TF is broken, and likely always has been, because of that flag. We can either remove the check, which means optarch can now be used (although only for architectures lower than that of the build node), or "fix" it by also passing the env vars to the Bazel host environment.
But I think we don't support building for higher architectures anyway, since e.g. all tests would fail to execute. Hence IMO we can remove the if and be done.
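
A minimal sketch of the idea, not the actual easyblock code: the function name `bazel_build_options` and the example path are hypothetical, and the snippet assumes only the Bazel options `--action_env` and `--distinct_host_configuration` discussed here. It forwards selected environment variables to Bazel actions and always disables the distinct host configuration, so host and target compilation share one environment.

```python
import os


def bazel_build_options(env_vars, environ=None):
    """Return Bazel options that forward env vars and share the host/target config.

    Hypothetical helper for illustration only; the real easyblock differs.
    """
    environ = os.environ if environ is None else environ
    options = []
    for name in env_vars:
        value = environ.get(name)
        if value:
            # Bazel clears the environment, so each variable is passed explicitly.
            options.append('--action_env=%s=%s' % (name, value))
    # Added unconditionally, so the host build does not get a separate (empty) environment.
    options.append('--distinct_host_configuration=false')
    return options


if __name__ == '__main__':
    example_env = {'CPATH': '/opt/protobuf/include'}  # hypothetical path
    print(' '.join(bazel_build_options(['CPATH', 'LIBRARY_PATH'], example_env)))
```

Variables that are unset or empty are skipped, so the command line only carries values that actually exist in the build environment.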

@boegel
Member

boegel commented Jun 7, 2021

Without this change the installation of TensorFlow 2.5.0 is failing when I add 'optarch': False to toolchainopts, as follows:

In file included from ./tensorflow/python/framework/python_op_gen.h:22,
                 from tensorflow/python/framework/python_op_gen_main.cc:16:
bazel-out/host/bin/tensorflow/core/framework/op_def.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   10 | #include <google/protobuf/port_def.inc>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Reason is that the build environment for the host compilation is not properly set up unless --distinct_host_configuration=false is used...
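
For reference, a hedged sketch of the easyconfig setting that triggers the failure. Only the `'optarch': False` entry comes from this report; a real TensorFlow easyconfig would likely carry other toolchain options as well, which are omitted here.

```python
# Easyconfig fragment (easyconfigs use Python syntax).
# Setting optarch to False asks EasyBuild not to add architecture-specific
# optimization flags to the compiler commands.
toolchainopts = {'optarch': False}
```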

@boegel
Member

boegel commented Jun 7, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS TensorFlow-2.4.1-foss-2020b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3535.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/7d7ca644ca495f04e257a8caf5814f73 for a full test report.

@boegel
Member

boegel commented Jun 8, 2021

I also tested this with easybuilders/easybuild-easyconfigs#12906 using 'optarch': False in toolchainopts on a Cascade Lake system, which is enough to make the tests pass there, so this is good to go.

@boegel boegel merged commit b9fcc09 into easybuilders:develop Jun 8, 2021
@Flamefire Flamefire deleted the tf_optarch branch June 8, 2021 06:52
@boegel boegel changed the title Always add add distinct_host_configuration=false to TensorFlow compilation Always add distinct_host_configuration=false to TensorFlow compilation Jul 5, 2021
@boegel boegel changed the title Always add distinct_host_configuration=false to TensorFlow compilation always add distinct_host_configuration=false to build command for TensorFlow Jul 5, 2021