DOC: Simd optimization documentation #15551
rossbar
left a comment
Thanks @seiko2plus and @mattip for putting this together - it's an interesting read!
seberg
left a comment
Thanks, mostly nitpicky stuff, so I think we can just fix it up slightly and merge, and then iterate. The first part explaining how to build (which is read by most users) seems pretty clear to me. When it comes to the implementation side and infrastructure it was much less clear to me to be honest.
If it helps a lot, I can try to make one type-edit style pass on it, similar to what Ross did. I had started before Ross did his pass, so it's possible some of the things do not apply anymore.
| ****************** | ||
| NumPy provides a set of macros that define `Universal Intrinsics`_ to provide | ||
| abstract out typical platform-specific intrinsics so SIMD code needs to be |
"abstract versions of" or "abstractions of"?
| The default value for ``x86`` is ``max -xop -fma4`` which enables all CPU | ||
| features, except for AMD legacy features. | ||
| The command arguments are available in ``build``, ``build_clib``, ``build_ext``. |
| The command arguments are available in ``build``, ``build_clib``, ``build_ext``. | |
| These two options are available in ``build``, ``build_clib``, ``build_ext``. |
| features, except for AMD legacy features. | ||
| The command arguments are available in ``build``, ``build_clib``, ``build_ext``. | ||
| if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of |
| if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of | |
| If ``build_clib`` or ``build_ext`` are not specified by the user, the arguments of |
| special options perform a series of procedures. | ||
| The following tables show the current supported optimizations sorted from the lowest to the highest interest. |
| The following tables show the current supported optimizations sorted from the lowest to the highest interest. | |
| The following tables show the current supported optimizations sorted from lowest to highest. |
| :align: left | ||
| ====================================== ======================================= | ||
| For Arch Returns |
| For Arch Returns | |
| For Architecture Returns |
| #undef NPY__CPU_DISPATCH_BASELINE_CALL | ||
| #undef NPY__CPU_DISPATCH_CALL | ||
| // nothing strange here, just a normal preprocessor callback | ||
| // enabled only if 'baseline' spesfied withiin the configration statments |
| // enabled only if 'baseline' spesfied withiin the configration statments | |
| // enabled only if 'baseline' is specified in the configuration statements |
| // the additional optimizations, so it could be SSE42 or AVX512F | ||
| #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT) | ||
| #endif | ||
| // Macro 'CURRENT_TARGET' adding the current target as suffux to the exported symbols, |
| // Macro 'CURRENT_TARGET' adding the current target as suffux to the exported symbols, | |
| // Macro 'CURRENT_TARGET' adding the current target as suffix to the exported symbols, |
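As a rough illustration of the suffixing idea in the hunk above, here is a minimal, self-contained sketch of how two levels of token pasting can append a target name to an exported symbol. All `DEMO_*` names are hypothetical stand-ins and are not NumPy's definitions; in particular, `DEMO_TARGET` is assumed to be supplied per compilation unit by the build (for example with `-DDEMO_TARGET=AVX512F`) and defaults to `SSE41` here only so the sketch compiles on its own.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration; not NumPy's actual macros. */
#define DEMO_CAT_(a, b) a##b
/* The extra level lets macro arguments expand before they are pasted. */
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

/* Assumption: the build defines DEMO_TARGET once per compiled target. */
#ifndef DEMO_TARGET
#define DEMO_TARGET SSE41
#endif

/* Appends "_<target>" to X, e.g. simd_whoami -> simd_whoami_SSE41. */
#define DEMO_CURRENT_TARGET(X) DEMO_CAT(DEMO_CAT(X, _), DEMO_TARGET)

/* Stringify after expansion, so the suffixed name can be reported. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* With the default target this declares: const char *simd_whoami_SSE41(void) */
const char *DEMO_CURRENT_TARGET(simd_whoami)(void)
{
    return DEMO_STR(DEMO_CURRENT_TARGET(simd_whoami));
}
```

Compiling the same source once per target with a different `DEMO_TARGET` would yield one distinctly named symbol per target, which is what allows several variants of a function to be linked into one binary.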
| // 'NPY__CPU_DISPATCH_BASELINE_CALL'. | ||
| // it is highly recommended to include the config header before executing | ||
| // the dispatching macros, in case there's another header in the scope. | ||
| #include "hello.dispatch.h" |
To be clear, this is the clean usage to include the dispatch header in the function scope?
I guess the comment means to point to it, but I find it a bit hard to understand.
Yes, it's safe and clean to add it inside any level of scope, e.g. functions.
This header only contains two abstract C macros that are mainly used for determining the required optimizations from outside the dispatch-able sources.
See also npy_cpu_dispatch.h and
numpy/numpy/core/code_generators/generate_umath.py
Lines 1052 to 1061 in 72b05c0
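To make the "two abstract C macros" idea concrete, the following is a hedged sketch of a callback-macro pattern of that general shape. The `DEMO_*` names, the feature bits, and the two targets are assumptions chosen for illustration; they are not the contents of the generated header. The stand-in macros are defined at file scope only because a single snippet cannot include a separate file, while the callbacks are defined inside the function to mirror the function-scope usage discussed above.

```c
/* Hypothetical feature bits; a real build would query the CPU at runtime. */
enum { DEMO_HAS_SSE41 = 1 << 0, DEMO_HAS_AVX512F = 1 << 1 };

/* Stand-in for a generated config header (illustrative names only).
 * One macro invokes a callback for the baseline, the other invokes a
 * callback once per dispatched target, highest interest first. */
#define DEMO_DISPATCH_BASELINE_CALL(CB, ...) CB(__VA_ARGS__)
#define DEMO_DISPATCH_CALL(CHK, CB, ...) \
    CB(CHK(AVX512F), AVX512F, __VA_ARGS__) \
    CB(CHK(SSE41), SSE41, __VA_ARGS__)

/* Returns the name of the variant that would be selected for `features`. */
const char *demo_pick(unsigned features)
{
    /* Callbacks supplied by the caller, defined at function scope. */
    #define DEMO_CHK(FEATURE) (features & DEMO_HAS_##FEATURE)
    #define DEMO_CB(TESTED, TARGET, NAME) if (TESTED) return #NAME "_" #TARGET;
    #define DEMO_BASE_CB(NAME) return #NAME;

    /* Expands to one "if (...) return ...;" per dispatched target. */
    DEMO_DISPATCH_CALL(DEMO_CHK, DEMO_CB, simd_whoami)
    /* Falls back to the baseline when no target matched. */
    DEMO_DISPATCH_BASELINE_CALL(DEMO_BASE_CB, simd_whoami)

    #undef DEMO_CHK
    #undef DEMO_CB
    #undef DEMO_BASE_CB
}
```

The design point is that the header only lists targets; what happens for each target (a runtime check, a declaration, a call) is decided by whichever callbacks the including source passes in.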
| // However in this example, we just handle it manually. | ||
| void simd_whoami(const char *extra_info); | ||
| void simd_whoami_AVX512F(const char *extra_info); | ||
| void simd_whoami_SSE41(const char *extra_info); |
The simd_whoami function is just a dummy definition as an example?
Yes. What I was trying to do is explain what is happening under the hood, but now I feel like I need to move these examples into an advanced topic and instead add examples of how to use the CPU dispatcher via the high-level macros in npy_cpu_dispatch.h, without giving too many details.
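For readers following the "under the hood" discussion, a manual dispatch of this kind can be sketched roughly as below. This is a simplified illustration under stated assumptions: the `demo_*` functions, the `DEMO_CPU_*` feature mask, and the selection order are hypothetical, and the variants return strings rather than printing so the behaviour is easy to check. It is not the code from the patch.

```c
/* Hypothetical feature mask; real code would detect features at runtime. */
enum demo_cpu { DEMO_CPU_SSE41 = 1 << 0, DEMO_CPU_AVX512F = 1 << 1 };

/* One variant per target, named with the target as a suffix. */
const char *demo_whoami(void)         { return "baseline"; }
const char *demo_whoami_SSE41(void)   { return "SSE41"; }
const char *demo_whoami_AVX512F(void) { return "AVX512F"; }

typedef const char *(*demo_whoami_fn)(void);

/* Pick the highest-interest variant that the feature mask supports,
 * falling back to the baseline when none of the targets is available. */
demo_whoami_fn demo_select(unsigned features)
{
    if (features & DEMO_CPU_AVX512F) {
        return demo_whoami_AVX512F;
    }
    if (features & DEMO_CPU_SSE41) {
        return demo_whoami_SSE41;
    }
    return demo_whoami;
}
```

A caller would typically resolve the pointer once, for example `demo_whoami_fn fn = demo_select(mask);`, and then call `fn()` on the hot path.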
| Now assume you attached **hello.dispatch.c** to the source tree, then | ||
| the infrastructure should generate a temporary config header called | ||
| **hello.dispatch.h** that can be reached by any source in the source | ||
| tree, and it should contain the following code : |
| tree, and it should contain the following code : | |
| tree, and will contain the following code: |
@seberg, I would like to merge this patch as-is, and then I'm going to open a series of pull requests to improve it. Can we just consider it a seed doc?
OK, works for me, you can just use whatever suggestions in any followup. Let's put it in, thanks @seiko2plus.
@seiko2plus I reformatted your document and made it into a separate PR here as rst.