This repository was archived by the owner on Jan 12, 2026. It is now read-only.
Merged
Binary file added docs/sources/_images/DPEP.png
Binary file added docs/sources/_images/dpep-cores.png
Binary file added docs/sources/_images/dpep-ilp.png
Binary file added docs/sources/_images/dpep-simd.png
Binary file added docs/sources/_images/fp-cancellation.png
Binary file added docs/sources/_images/hetero-devices.png
Binary file added docs/sources/_images/kernel-queue-device.png
Binary file added docs/sources/_images/queue-exception1.png
Binary file added docs/sources/_images/queue-exception2.png
Binary file added docs/sources/_images/queue-exception3.png
39 changes: 39 additions & 0 deletions docs/sources/parallelism.rst
@@ -3,3 +3,42 @@

Parallelism in modern data parallel architectures
=================================================

Python is loved for its productivity and interactivity. But for computationally heavy code,
performance cannot be sacrificed. Intel and Python numerical
computing communities, such as `NumFOCUS <https://numfocus.org/>`_, have dedicated attention to
optimizing core numerical and data science packages to leverage the parallelism available in modern CPUs:

* **Multiple computational cores:** Several computational cores allow data to be processed concurrently.
  Compared to a single-core CPU, *N* cores can ideally process either *N* times more data in a fixed time, or
  reduce the computation time by a factor of *N* for a fixed amount of data.

.. image:: ./_images/dpep-cores.png
    :width: 600px
    :align: center
    :alt: Multiple CPU Cores
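
The multi-core idea above can be sketched in a few lines of Python. This is a minimal illustration of the decomposition only: the helper names ``split_into_chunks``, ``sum_of_squares``, and ``parallel_sum_of_squares`` are hypothetical and not part of any package discussed here. It uses threads from the standard library, and in standard CPython builds pure-Python code running in threads does not execute on several cores at once, so real speedups typically come from libraries that release the interpreter lock or from process pools.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous pieces of nearly equal size."""
    base, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional element.
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work performed independently by one worker on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers):
    """Give each worker one chunk, then combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


data = list(range(1_000))
result = parallel_sum_of_squares(data, n_workers=4)
```

With *N* workers each chunk holds roughly ``len(data) / N`` elements, which is where the ideal *N*-fold reduction in computation time comes from.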

* **SIMD parallelism:** SIMD (Single Instruction, Multiple Data) instructions
  perform the same operation on a vector of data elements at the same time. The number of elements in a vector is called the SIMD width.
  If the SIMD width is *K*, a single SIMD instruction can process *K* data elements in parallel.

  In the following diagram the SIMD width is 2, which means that a single instruction processes two elements simultaneously.
  Compared to regular instructions that process one element at a time, a 2-wide SIMD instruction processes
  twice as much data in a fixed time or, equivalently, processes a fixed amount of data twice as fast.

.. image:: ./_images/dpep-simd.png
    :width: 150px
    :align: center
    :alt: SIMD
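
The effect of the SIMD width can be modeled with a small pure-Python sketch. It does not issue real SIMD instructions; the hypothetical helper ``simd_add`` only counts how many instructions would be needed if each one handled ``width`` elements, which is the assumption stated in the text above.

```python
def simd_add(a, b, width):
    """Add two equal-length lists in groups of `width` elements.

    Returns the result and the number of modeled vector instructions.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result, instructions = [], 0
    for i in range(0, len(a), width):
        # One modeled SIMD instruction handles up to `width` lanes at once.
        result.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
        instructions += 1
    return result, instructions


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

# Regular (scalar) instructions: one element per instruction.
scalar_result, scalar_count = simd_add(a, b, width=1)
# 2-wide SIMD: two elements per instruction.
simd_result, simd_count = simd_add(a, b, width=2)
```

Both calls produce the same sums, but the 2-wide version needs half as many modeled instructions for the same eight elements.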

* **Instruction-Level Parallelism:** Modern superscalar processors, such as x86 CPUs, can execute data-independent
  instructions in parallel. In the following example, we compute :math:`a * b + (c - d)`.
  Operations :math:`*` and :math:`-` can be executed in parallel. The last instruction,
  :math:`+`, depends on the results of :math:`a * b` and :math:`c - d` and hence cannot be executed in parallel
  with :math:`*` and :math:`-`.

.. image:: ./_images/dpep-ilp.png
    :width: 150px
    :align: center
    :alt: Instruction-Level Parallelism
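
The dependency reasoning for :math:`a * b + (c - d)` can be written down as a tiny scheduling sketch. It is a simplified model, not a description of how any particular CPU schedules instructions: it assumes every operation takes one cycle, that there are enough execution units for all ready operations, and that the dependency graph has no cycles. The names ``dependencies`` and ``earliest_cycles`` are hypothetical.

```python
# Each operation lists the operations whose results it needs.
dependencies = {
    "mul": [],              # a * b
    "sub": [],              # c - d
    "add": ["mul", "sub"],  # (a * b) + (c - d)
}


def earliest_cycles(deps):
    """Return the earliest cycle in which each operation can start."""
    cycles = {}

    def visit(op):
        if op not in cycles:
            # An operation starts one cycle after its latest input is ready;
            # operations without inputs can start in cycle 0.
            cycles[op] = max((visit(d) + 1 for d in deps[op]), default=0)
        return cycles[op]

    for op in deps:
        visit(op)
    return cycles


schedule = earliest_cycles(dependencies)
total_cycles = max(schedule.values()) + 1
```

Under these assumptions ``mul`` and ``sub`` share cycle 0 and ``add`` runs in cycle 1, so the expression takes two cycles instead of the three needed when the operations run strictly one after another.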