DOC: Simd optimization documentation#15551

Merged
seberg merged 18 commits into numpy:master from mattip:simd-optimization
Jul 12, 2020

Member

@mattip mattip commented Feb 12, 2020

@seiko2plus I reformatted your document and made it into a separate PR here as rst.

@mattip mattip force-pushed the simd-optimization branch from b27f6e5 to 20b9e23 June 19, 2020 12:47
@mattip mattip added the "triage review" label (Issue/PR to be discussed at the next triage meeting) Jul 1, 2020
@mattip mattip changed the title from "WIP: Simd optimization documentation" to "DOC: Simd optimization documentation" Jul 1, 2020
@seberg seberg self-assigned this Jul 1, 2020
@rossbar rossbar added the "triaged" label (Issue/PR that was discussed in a triage meeting) and removed the "triage review" label (Issue/PR to be discussed at the next triage meeting) Jul 3, 2020
Contributor

@rossbar rossbar left a comment

Thanks @seiko2plus and @mattip for putting this together - it's an interesting read!

Member

@seberg seberg left a comment

Thanks, mostly nitpicky stuff, so I think we can just fix it up slightly and merge, and then iterate. The first part explaining how to build (which is read by most users) seems pretty clear to me. When it comes to the implementation side and infrastructure it was much less clear to me to be honest.

If it helps a lot, I can try to make one type-edit style pass on it, similar to what Ross did. I had started before Ross did his pass, so it's possible some of the things do not apply anymore.

******************

NumPy provides a set of macros that define `Universal Intrinsics`_ to provide
abstract out typical platform-specific intrinsics so SIMD code needs to be
Member

"abstract versions of" or "abstractions of"?
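
For readers following the thread, the idea in the quoted paragraph (one set of macros standing in for platform-specific intrinsics) can be sketched roughly as below. This is a minimal illustration only: every `demo_*` name is hypothetical and is not one of NumPy's actual macros, and the scalar fallback exists just so the sketch compiles on any platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "universal" names mapped onto platform intrinsics.
 * These demo_* names are illustrative and are NOT NumPy's real API. */
#if defined(__SSE__)
    #include <xmmintrin.h>
    typedef __m128 demo_f32x4;
    #define demo_load_f32(ptr)       _mm_loadu_ps(ptr)
    #define demo_add_f32(a, b)       _mm_add_ps((a), (b))
    #define demo_store_f32(ptr, vec) _mm_storeu_ps((ptr), (vec))
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
    typedef float32x4_t demo_f32x4;
    #define demo_load_f32(ptr)       vld1q_f32(ptr)
    #define demo_add_f32(a, b)       vaddq_f32((a), (b))
    #define demo_store_f32(ptr, vec) vst1q_f32((ptr), (vec))
#else
    /* Scalar fallback so the sketch still builds without SIMD support. */
    typedef struct { float lane[4]; } demo_f32x4;
    static demo_f32x4 demo_load_f32(const float *ptr)
    {
        demo_f32x4 r;
        for (int i = 0; i < 4; ++i) r.lane[i] = ptr[i];
        return r;
    }
    static demo_f32x4 demo_add_f32(demo_f32x4 a, demo_f32x4 b)
    {
        demo_f32x4 r;
        for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }
    static void demo_store_f32(float *ptr, demo_f32x4 v)
    {
        for (int i = 0; i < 4; ++i) ptr[i] = v.lane[i];
    }
#endif

/* Written once against the demo_* names; the platform choice is made above. */
void demo_add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
    /* Process four lanes at a time. */
    for (; i + 4 <= n; i += 4) {
        demo_f32x4 va = demo_load_f32(a + i);
        demo_f32x4 vb = demo_load_f32(b + i);
        demo_store_f32(out + i, demo_add_f32(va, vb));
    }
    /* Handle the remaining elements one by one. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The loop body never mentions SSE or NEON directly, which is the property the documentation paragraph is describing.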

The default value for ``x86`` is ``max -xop -fma4`` which enables all CPU
features, except for AMD legacy features.

The command arguments are available in ``build``, ``build_clib``, ``build_ext``.
Member

Suggested change
- The command arguments are available in ``build``, ``build_clib``, ``build_ext``.
+ These two options are available in ``build``, ``build_clib``, ``build_ext``.

features, except for AMD legacy features.

The command arguments are available in ``build``, ``build_clib``, ``build_ext``.
if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of
Member

Suggested change
- if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of
+ If ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of

special options perform a series of procedures.


The following tables show the current supported optimizations sorted from the lowest to the highest interest.
Member

Suggested change
- The following tables show the current supported optimizations sorted from the lowest to the highest interest.
+ The following tables show the current supported optimizations sorted from lowest to highest.

:align: left

====================================== =======================================
For Arch Returns
Member

Suggested change
- For Arch Returns
+ For Architecture Returns

#undef NPY__CPU_DISPATCH_BASELINE_CALL
#undef NPY__CPU_DISPATCH_CALL
// nothing strange here, just a normal preprocessor callback
// enabled only if 'baseline' spesfied withiin the configration statments
Member

Suggested change
- // enabled only if 'baseline' spesfied withiin the configration statments
+ // enabled only if 'baseline' is specified in the configration statments

// the addtional optimizations, so it could be SSE42 or AVX512F
#define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
#endif
// Macro 'CURRENT_TARGET' adding the current target as suffux to the exported symbols,
Member

Suggested change
- // Macro 'CURRENT_TARGET' adding the current target as suffux to the exported symbols,
+ // Macro 'CURRENT_TARGET' adding the current target as suffix to the exported symbols,
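
The suffixing that the `CURRENT_TARGET` macro in the quoted diff performs (via `NPY_CAT` and `NPY__CPU_TARGET_CURRENT`) relies on two-level token pasting. A minimal standalone sketch of that technique follows. The `DEMO_*` macros and the `demo_whoami` function are hypothetical stand-ins, and the target is hard-coded here, whereas in the PR it would come from the build infrastructure.

```c
#include <assert.h>
#include <string.h>

/* Two-level concatenation: the outer macro lets its arguments expand
 * before the inner macro pastes them together. */
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

/* Hypothetical stand-in for the per-target definition the build would supply. */
#ifndef DEMO_TARGET_CURRENT
#define DEMO_TARGET_CURRENT AVX512F
#endif

/* Appends "_<target>" to a symbol: demo_whoami becomes demo_whoami_AVX512F. */
#define DEMO_CURRENT_TARGET(X) DEMO_CAT(DEMO_CAT(X, _), DEMO_TARGET_CURRENT)

/* Stringify after expansion, used only to show the generated name. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* Defines a function whose real name carries the target suffix. */
const char *DEMO_CURRENT_TARGET(demo_whoami)(void)
{
    return DEMO_STR(DEMO_CURRENT_TARGET(demo_whoami));
}

/* Confirms that the suffixed symbol exists under the expected name. */
void demo_self_check(void)
{
    assert(strcmp(demo_whoami_AVX512F(), "demo_whoami_AVX512F") == 0);
}
```

Compiling the same source once per target, each time with a different `DEMO_TARGET_CURRENT`, would give one uniquely named symbol per target without editing the source.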

// 'NPY__CPU_DISPATCH_BASELINE_CALL'.
// it highely recomaned to include the config header before exectuing
// the dispatching macros in case if there's another header in the scope.
#include "hello.dispatch.h"
Member

To be clear, this is the clean usage to include the dispatch header in the function scope?

I guess the comment means to point to it, but I find it a bit hard to understand.

Member

Yes, it's safe and clean to add it inside any level of scope, e.g. functions.
This header only contains two abstract C macros that are mainly used for determining the required optimizations from outside the dispatch-able sources.

see also npy_cpu_dispatch.h and

if t.dispatch is not None:
    for dname in t.dispatch:
        code2list.append(textwrap.dedent("""\
        #ifndef NPY_DISABLE_OPTIMIZATION
        #include "{dname}.dispatch.h"
        #endif
        NPY_CPU_DISPATCH_CALL_XB({name}_functions[{k}] = {tname}_{name})
        """).format(
            dname=dname, name=name, tname=tname, k=k
        ))

// However in this example, we just handle it manually.
void simd_whoami(const char *extra_info);
void simd_whoami_AVX512F(const char *extra_info);
void simd_whoami_SSE41(const char *extra_info);
Member

The simd_whoami function is just a dummy definition as an example?

Member

@seiko2plus seiko2plus Jul 12, 2020

Yes. What I was trying to do is explain what is happening under the hood, but now I feel I need to move these examples into an advanced topic and instead add examples of how to use the CPU dispatcher via the high-level macros in npy_cpu_dispatch.h, without giving too many details.
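
As a rough picture of the "under the hood" step discussed here, choosing at runtime among per-target copies such as `simd_whoami_AVX512F` and `simd_whoami_SSE41`, the following sketch uses a hand-written selector. It is an assumption-laden illustration: the `demo_*` names and the `demo_cpu_features` struct are hypothetical, the feature flags are passed in by the caller rather than detected, and it does not use the macros from npy_cpu_dispatch.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags; a real build would detect these at runtime. */
typedef struct {
    int has_sse41;
    int has_avx512f;
} demo_cpu_features;

typedef const char *(*demo_whoami_fn)(void);

/* One copy of the kernel per target. In a real build each copy would come
 * from the same dispatch-able source compiled with different flags. */
const char *demo_whoami(void)         { return "baseline"; }
const char *demo_whoami_SSE41(void)   { return "SSE41"; }
const char *demo_whoami_AVX512F(void) { return "AVX512F"; }

/* Returns the copy for the highest supported target, else the baseline. */
demo_whoami_fn demo_select_whoami(const demo_cpu_features *cpu)
{
    assert(cpu != NULL);
    if (cpu->has_avx512f) {
        return demo_whoami_AVX512F;
    }
    if (cpu->has_sse41) {
        return demo_whoami_SSE41;
    }
    return demo_whoami;
}
```

Targets are checked from highest to lowest, so the baseline copy is used only when no dispatched target is supported.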

Now assume you attached **hello.dispatch.c** to the source tree, then
the infrastructure should generate a temporary config header called
**hello.dispatch.h** that can be reached by any source in the source
tree, and it should contain the following code :
Member

Suggested change
- tree, and it should contain the following code :
+ tree, and will contain the following code:

@seiko2plus
Member

@seberg, I would like to merge this patch as-is, and then I'm going to release a series of pull requests to improve it. Can we just consider it a seed doc?

@seberg
Member

seberg commented Jul 12, 2020

OK, works for me, you can just use whatever suggestions in any followup. Let's put it in, thanks @seiko2plus.

@seberg seberg merged commit b234742 into numpy:master Jul 12, 2020
@mattip mattip added the "component: SIMD" label (Issues in SIMD (fast instruction sets) code or machinery) Jul 21, 2020
@mattip mattip deleted the simd-optimization branch November 2, 2020 08:28
Labels

04 - Documentation; component: SIMD (Issues in SIMD (fast instruction sets) code or machinery); triaged (Issue/PR that was discussed in a triage meeting)

4 participants