diff --git a/docs/sources/_images/DPEP.png b/docs/sources/_images/DPEP.png
new file mode 100644
index 0000000..a6e2610
Binary files /dev/null and b/docs/sources/_images/DPEP.png differ
diff --git a/docs/sources/_images/dpep-cores.png b/docs/sources/_images/dpep-cores.png
new file mode 100644
index 0000000..4debbcb
Binary files /dev/null and b/docs/sources/_images/dpep-cores.png differ
diff --git a/docs/sources/_images/dpep-ilp.png b/docs/sources/_images/dpep-ilp.png
new file mode 100644
index 0000000..ddf8725
Binary files /dev/null and b/docs/sources/_images/dpep-ilp.png differ
diff --git a/docs/sources/_images/dpep-simd.png b/docs/sources/_images/dpep-simd.png
new file mode 100644
index 0000000..8f2e078
Binary files /dev/null and b/docs/sources/_images/dpep-simd.png differ
diff --git a/docs/sources/_images/fp-cancellation.png b/docs/sources/_images/fp-cancellation.png
new file mode 100644
index 0000000..6163146
Binary files /dev/null and b/docs/sources/_images/fp-cancellation.png differ
diff --git a/docs/sources/_images/hetero-devices.png b/docs/sources/_images/hetero-devices.png
new file mode 100644
index 0000000..3d1d935
Binary files /dev/null and b/docs/sources/_images/hetero-devices.png differ
diff --git a/docs/sources/_images/kernel-queue-device.png b/docs/sources/_images/kernel-queue-device.png
new file mode 100644
index 0000000..15482f5
Binary files /dev/null and b/docs/sources/_images/kernel-queue-device.png differ
diff --git a/docs/sources/_images/queue-exception1.png b/docs/sources/_images/queue-exception1.png
new file mode 100644
index 0000000..b707ab9
Binary files /dev/null and b/docs/sources/_images/queue-exception1.png differ
diff --git a/docs/sources/_images/queue-exception2.png b/docs/sources/_images/queue-exception2.png
new file mode 100644
index 0000000..003b06b
Binary files /dev/null and b/docs/sources/_images/queue-exception2.png differ
diff --git a/docs/sources/_images/queue-exception3.png b/docs/sources/_images/queue-exception3.png
new file mode 100644
index 0000000..630cd12
Binary files /dev/null and b/docs/sources/_images/queue-exception3.png differ
diff --git a/docs/sources/parallelism.rst b/docs/sources/parallelism.rst
index e1a5a00..f9f2ad1 100644
--- a/docs/sources/parallelism.rst
+++ b/docs/sources/parallelism.rst
@@ -3,3 +3,42 @@
 Parallelism in modern data parallel architectures
 =================================================
+
+Python is loved for its productivity and interactivity. But when it comes to
+computationally heavy code, performance cannot be compromised. Intel and the Python numerical
+computing communities, such as `NumFOCUS `_, have dedicated attention to
+optimizing core numerical and data science packages to leverage the parallelism available in modern CPUs:
+
+* **Multiple computational cores:** Several computational cores allow processing data concurrently.
+  Compared to a single-core CPU, *N* cores can ideally process *N* times more data in a fixed time, or
+  reduce the computation time by a factor of *N* for a fixed amount of data.
+
+.. image:: ./_images/dpep-cores.png
+   :width: 600px
+   :align: center
+   :alt: Multiple CPU Cores
+
+* **SIMD parallelism:** SIMD (Single Instruction, Multiple Data) instructions are a special type of instruction
+  that operate on all elements of a data vector at the same time. The number of elements in the vector is called the SIMD width.
+  If the SIMD width is *K*, a SIMD instruction can process *K* data elements in parallel.
+
+  In the following diagram the SIMD width is 2, which means that a single instruction processes two elements simultaneously.
+  Compared to regular instructions that process one element at a time, a 2-wide SIMD instruction processes
+  twice as much data in a fixed time or, equivalently, processes a fixed amount of data twice as fast.
+
+.. image:: ./_images/dpep-simd.png
+   :width: 150px
+   :align: center
+   :alt: SIMD
+
+* **Instruction-Level Parallelism:** Modern superscalar architectures, such as x86, can execute data-independent
+  instructions in parallel. In the following example, we compute :math:`a * b + (c - d)`.
+  Operations :math:`*` and :math:`-` can be executed in parallel, but the last instruction
+  :math:`+` depends on the results of :math:`a * b` and :math:`c - d` and hence cannot be executed in parallel
+  with :math:`*` and :math:`-`.
+
+.. image:: ./_images/dpep-ilp.png
+   :width: 150px
+   :align: center
+   :alt: Instruction-Level Parallelism
+
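
As a rough illustration of the first two levels of parallelism described in this patch, the following sketch evaluates the expression a * b + (c - d) from the instruction-level parallelism example element-wise on NumPy arrays, first in a single call and then split into chunks handled by a thread pool. The function names ``fused_expression`` and ``fused_expression_chunked``, the array size, and the worker count are hypothetical choices made for this example and are not part of the documented packages. Whether the array operations actually use SIMD instructions, and whether the threads give any speedup, depends on the NumPy build and the hardware.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fused_expression(a, b, c, d):
    """Compute a * b + (c - d) element-wise on whole arrays."""
    # NumPy runs each array operation in a compiled loop, which may use
    # SIMD instructions when the hardware and the NumPy build support them.
    return a * b + (c - d)


def fused_expression_chunked(a, b, c, d, n_workers=None):
    """Split the arrays into chunks and evaluate each chunk in its own thread."""
    n_workers = n_workers or os.cpu_count() or 1
    # np.array_split tolerates lengths that are not divisible by n_workers.
    chunks = zip(*(np.array_split(x, n_workers) for x in (a, b, c, d)))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda args: fused_expression(*args), chunks))
    return np.concatenate(parts)


# Hypothetical input data: four arrays of one million random values each.
rng = np.random.default_rng(seed=0)
a, b, c, d = (rng.random(1_000_000) for _ in range(4))

whole = fused_expression(a, b, c, d)
chunked = fused_expression_chunked(a, b, c, d, n_workers=4)
```

Both calls are expected to produce the same values, since every output element depends only on the matching input elements. The chunked variant only pays off when each chunk is large enough to outweigh the cost of scheduling the threads.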