Skip to content

Add support for IntelMPI in TensorFlow.#1507

Merged
boegel merged 8 commits intoeasybuilders:developfrom
akesandgren:update-tensorflow-for-impi
Sep 19, 2018
Merged

Add support for IntelMPI in TensorFlow.#1507
boegel merged 8 commits intoeasybuilders:developfrom
akesandgren:update-tensorflow-for-impi

Conversation

@akesandgren
Copy link
Copy Markdown
Contributor

We need a separate wrappet for mpiicc (and probably for mpicxx if
using non-Intel compilers) because IntelMPI needs I_MPI_ROOT to be set.

This patch only adds support for IntelMPI with Intel compilers.

Also simplify PATH setting for the wrappers.

We need a separate wrappet for mpiicc (and probably for mpicxx if
using non-Intel compilers) because IntelMPI needs I_MPI_ROOT to be set.

This patch only adds support for IntelMPI with Intel compilers.

Also simplify PATH setting for the wrappers.
@akesandgren akesandgren added this to the 3.7.0 milestone Sep 13, 2018
@boegel
Copy link
Copy Markdown
Member

boegel commented Sep 13, 2018

@akesandgren How does the problem that you're fixing here manifest itself when building TF with intel toolchain?

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated

# patch configure.py (called by configure script) to avoid that Bazel abuses $HOME/.cache/bazel
# Should perhaps set --install_base and --output_user_root too
# bazel still uses $HOME/.cache/bazel for some things.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akesandgren open an issue for this, no point in keeping track of TODOs in comments (especially without mentioning TODO or FIXME ;-))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akesandgren Please remove these added comments and open an issue instead?

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
env.setvar('PATH', os.pathsep.join([icc_wrapper_dir, os.getenv('PATH')]))

if use_mpi:
mpiicc_wrapper_dir = os.path.join(tmpdir, 'bin2')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akesandgren Why create a separate wrapper dir?

I don't see the point of that, the icc and mpiicc wrappers can live in the same place?
Added benefit is that you only need to update $PATH once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper removes its own path from the environment so mpiicc won't be able to find the icc wrapper if they live in the same wrapper dir i think. Haven't properly tested that though.
But on the other hand it should inherit the env that mpiicc is using....

Need to test this...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, right, I forgot about that part... Better make that clear with a comment, and use a better name than bin2 ;)

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated

icc_wrapper_txt = INTEL_COMPILER_WRAPPER % {
'compiler_path': which('icc'),
'intel_mpi_root': '',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akesandgren I would use a more meaningful value here, like UNKNOWN or IRRELEVANT, in case someone inspects the generated wrapper?


export CPATH='%(cpath)s'
export INTEL_LICENSE_FILE='%(intel_license_file)s'
export I_MPI_ROOT='%(intel_mpi_root)s'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also include a comment line above:

# only relevant for MPI compiler wrapper (mpiicc), not for regular compiler (icc)
export I_MPI_ROOT='%(intel_mpi_root)s'

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
'compiler_path': which('mpiicc'),
'intel_mpi_root': os.getenv('I_MPI_ROOT'),
'cpath': os.getenv('CPATH'),
'intel_license_file': os.getenv('INTEL_LICENSE_FILE', os.getenv('LM_LICENSE_FILE')),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use these in two places, we should grab them only once from the environment above:

wrapper_cpath = os.getenv('CPATH')
wrapper_intel_licfile = os.getenv('INTEL_LICENSE_FILE', os.getenv('LM_LICENSE_FILE'))

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
}
mpiicc_wrapper = os.path.join(mpiicc_wrapper_dir, 'mpiicc')
write_file(mpiicc_wrapper, mpiicc_wrapper_txt)
env.setvar('PATH', os.pathsep.join([mpiicc_wrapper_dir, os.getenv('PATH')]))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed if we let the wrapper share the same dir

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
if use_mpi:
impi_root = get_software_root('impi')
if impi_root:
mpi_home = os.path.join(impi_root, 'intel64')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth adding some debug logging here (only under use_mpi):

self.log.debug("Derived value for MPI_HOME: %s", mpi_home)

@akesandgren
Copy link
Copy Markdown
Contributor Author

WIll take a look at the comments tomorrow.

@akesandgren
Copy link
Copy Markdown
Contributor Author

Without the mpiicc wrapper and setting of I_MPI_ROOT I get this when linking:
/hpc2n/eb/software/Compiler/GCCcore/7.3.0/binutils/2.30/bin/ld.gold: error: cannot find -lmpifort
/hpc2n/eb/software/Compiler/GCCcore/7.3.0/binutils/2.30/bin/ld.gold: error: cannot find -lmpigi

The reason for that is, that without I_MPI_ROOT defined mpiicc passes the wrong -L args down to icc.

IntelMPI compiler wrappers need a TF/Bazel wrapper even for non-intel
compiler based toolchains.

Add wrappers for C++ compiler and MPI wrappers.

We only need a single wrapper directory, if the main compiler is an
MPI compiler the actual compiler will be called dierectly from inside
the MPI compiler wrapper and thus the environment will remain correct.
@akesandgren
Copy link
Copy Markdown
Contributor Author

akesandgren commented Sep 14, 2018

Think about a helper func looking like this:

    def write_wrapper(self, tmpdir, compiler, i_mpi_root)
        """Helper function to write a compiler wrapper."""
        wrapper_dir = os.path.join(tmpdir, 'bin')
        wrapper_txt = INTEL_COMPILER_WRAPPER % {
            'compiler_path': which(compiler),
            'intel_mpi_root': i_mpi_root,
            'cpath': os.getenv('CPATH'),
            'intel_license_file': os.getenv('INTEL_LICENSE_FILE', os.getenv('LM_LICENSE_FILE'))
            'wrapper_dir': wrapper_dir,
        }
        wrapper = os.path.join(wrapper_dir, compiler)
        write_file(wrapper, wrapper_txt)
        if self.dry_run:
            self.dry_run_msg("Wrapper for '%s' was put in place: %s", compiler, wrapper)
        else:
            adjust_permissions(wrapper, stat.S_IXUSR)
            self.log.info("Using wrapper script for '%s': %s", compiler, which(compiler))

Should that be called as:

self.write_wrapper(tmpdir, compiler, 'NOT-USED-WITH-ICC')

or some other way? (Still learning python :-)

And should i make the configure_step tmpdir variable into self.tmpdir instead?
and/or the other ones too?

Would still need wrapper_dir available outside of the function to handle the PATH extension, so perhaps have wrapper_dir as param instead of tmpdir?

Comment thread easybuild/easyblocks/t/tensorflow.py Outdated
use_mpi = self.toolchain.options.get('usempi', False)
mpi_home = ''
if use_mpi:
impi_root = get_software_root('impi')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akesandgren Since you're using this in two places in configure_step, why not combine this with the exact same check above to put the mpiicc wrapper in place?

use_mpi = self.toolchain.options.get('usempi', False)
impi_root = get_software_root('impi')
mpi_home = ''
if use_mpi and impi_root:
    # put wrappers for Intel MPI compiler wrappers in place
    ...
    # set correct value for MPI_HOME
    mpi_home = os.path.join(impi_root, 'intel64')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@boegel
Copy link
Copy Markdown
Member

boegel commented Sep 19, 2018

Looks good, tested with several existing TensorFlow easyconfigs (none of which are using both intel toolchain & have usempi enabled though...), no regressions, so good to go.

@boegel boegel merged commit 3a4f33c into easybuilders:develop Sep 19, 2018
@akesandgren akesandgren deleted the update-tensorflow-for-impi branch September 19, 2018 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants