forked from OpenMP/Examples
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathExamples_reduction.tex
More file actions
237 lines (178 loc) · 11.1 KB
/
Examples_reduction.tex
File metadata and controls
237 lines (178 loc) · 11.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
\pagebreak
\section{Reduction}
\label{sec:reduction}
This section covers ways to perform reductions in parallel, task, taskloop, and SIMD regions.
\subsection{The \code{reduction} Clause}
\label{subsec:reduction}
The following example demonstrates the \code{reduction} clause; note that some
reductions can be expressed in the loop in several ways, as shown for the \code{max}
and \code{min} reductions below:
\cexample{reduction}{1}
\pagebreak
\ffreeexample{reduction}{1}
A common implementation of the preceding example is to treat it as if it had been
written as follows:
\cexample{reduction}{2}
\fortranspecificstart
\ffreenexample{reduction}{2}
The following program is non-conforming because the reduction is on the
\emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable
named \code{MAX}.
\ffreenexample{reduction}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{REN}.
\ffreenexample{reduction}{4}
The following conforming program performs the reduction using
\plc{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{MIN}.
\ffreenexample{reduction}{5}
\fortranspecificend
\pagebreak
The following example is non-conforming because the initialization (\code{a =
0}) of the original list item \code{a} is not synchronized with the update of
\code{a} as a result of the reduction computation in the \code{for} loop. Therefore,
the example may print an incorrect value for \code{a}.
To avoid this problem, the initialization of the original list item \code{a}
should complete before any update of \code{a} as a result of the \code{reduction}
clause. This can be achieved by adding an explicit barrier after the assignment
\code{a = 0}, or by enclosing the assignment \code{a = 0} in a \code{single}
directive (which has an implied barrier), or by initializing \code{a} before
the start of the \code{parallel} region.
\cexample{reduction}{6}
\fexample{reduction}{6}
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification the explicit use of array section in the \code{reduction} clause in Fortran is not permitted. But this oversight will be fixed in the next release of the specification.
\cexample{reduction}{7}
\ffreeexample{reduction}{7}
\subsection{Task Reduction}
\label{subsec:task_reduction}
The following C/C++ and Fortran examples show how to implement
a task reduction over a linked list.
Task reductions are supported by the \code{task\_reduction} clause which can only be
applied to the \code{taskgroup} directive, and a \code{in\_reduction} clause
which can be applied to the \code{task} construct among others.
The \code{task\_reduction} clause on the \code{taskgroup} construct is used to
define the scope of a new reduction, and after the \code{taskgroup}
region the original variable will contain the final value of the reduction.
In the task-generating while loop the \code{in\_reduction} clause of the \code{task}
construct is used to specify that the task participates "in" the reduction.
Note: The \plc{res} variable is private in the \plc{linked\_list\_sum} routine
and is not required to be shared (as in the case of a \code{parallel} construct
reduction).
\cexample{task_reduction}{1}
\ffreeexample{task_reduction}{1}
\subsection{Taskloop Reduction}
\label{subsec:taskloop_reduction}
In the OpenMP 5.0 Specification the \code{taskloop} construct
was extended to include the reductions.
The following two examples show how to implement a reduction over an array
using taskloop reduction in two different ways.
In the first
example we apply the \code{reduction} clause to the \code{taskloop} construct. As it was
explained above in the task reduction examples, a reduction over tasks is
divided in two components: the scope of the reduction, which is defined by a
\code{taskgroup} region, and the tasks that participate in the reduction. In this
example, the \code{reduction} clause defines both semantics. First, it specifies that
the implicit \code{taskgroup} region associated with the \code{taskloop} construct is the scope of the
reduction, and second, it defines all tasks created by the \code{taskloop} construct as
participants of the reduction. About the first property, it is important to note
that if we add the \code{nogroup} clause to the \code{taskloop} construct the code will be
nonconforming, basically because we have a set of tasks that participate in a
reduction that has not been defined.
\cexample{taskloop_reduction}{1}
\ffreeexample{taskloop_reduction}{1}
%In the second example, we are computing exactly the same
%value but we do it in a very different way. The first thing that we do in the
%\plc{array\_sum} function is to create a \code{taskgroup} region that defines the scope of a
%new reduction using the \code{task\_reduction} clause.
%After that, we specify that a task and also the tasks generated
%by a taskloop will participate in that reduction using the \code{in\_reduction} clause
%on the \code{task} and \code{taskloop} constructs, respectively. Note that
%we also added the \code{nogroup} clause to the \code{taskloop} construct. This is allowed
%because what we are expressing with the \code{in\_reduction} clause is different
%from what we were expressing with the \code{reduction} clause. In one case we specify
%that the generated tasks will participate in a previously declared reduction
%(\code{in\_reduction} clause) whereas in the other case we specify that we want to
%create a new reduction and also that all tasks generated by the taskloop will
%participate on it.
The second example computes exactly the same value as in the preceding\plc{taskloop\_reduction.1} code section,
but in a very different way.
First, in the \plc{array\_sum} function a \code{taskgroup} region is created
that defines the scope of a new reduction using the \code{task\_reduction} clause.
After that, a task and also the tasks generated by a taskloop participate in
that reduction by using the \code{in\_reduction} clause on the \code{task}
and \code{taskloop} constructs, respectively.
Note that the \code{nogroup} clause was added to the \code{taskloop} construct.
This is allowed because what is expressed with the \code{in\_reduction} clause
is different from what is expressed with the \code{reduction} clause.
In one case the generated tasks are specified to participate in a previously
declared reduction (\code{in\_reduction} clause) whereas in the other case
creation of a new reduction is specified and also that all tasks generated
by the taskloop will participate on it.
\cexample{taskloop_reduction}{2}
\ffreeexample{taskloop_reduction}{2}
In the OpenMP 5.0 Specification, \code{reduction} clauses for the
\code{taskloop}~\code{ simd} construct were also added.
The examples below compare reductions for the \code{taskloop} and the \code{taskloop}~\code{simd} constructs.
These examples illustrate the use of \code{reduction} clauses within
"stand-alone" \code{taskloop} constructs, and the use of \code{in\_reduction} clauses for tasks of taskloops to participate
with other reductions within the scope of a parallel region.
\textbf{taskloop reductions:}
In the \plc{taskloop reductions} section of the example below,
\plc{taskloop 1} uses the \code{reduction} clause
in a \code{taskloop} construct for a sum reduction, accumulated in \plc{asum}.
The behavior is as though a \code{taskgroup} construct encloses the
taskloop region with a \code{task\_reduction} clause, and each taskloop
task has an \code{in\_reduction} clause with the specifications
of the \code{reduction} clause.
At the end of the taskloop region \plc{asum} contains the result of the reduction.
The next taskloop, \plc{taskloop 2}, illustrates the use of the
\code{in\_reduction} clause to participate in a previously defined
reduction scope of a \code{parallel} construct.
The task reductions of \plc{task 2} and \plc{taskloop 2} are combined
across the \code{taskloop} construct and the single \code{task} construct, as specified
in the \code{reduction(task,}~\code{+:asum)} clause of the \code{parallel} construct.
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
\textbf{taskloop simd reductions:}
Reductions for the \code{taskloop}~\code{simd} construct are shown in the second half of the code.
Since each component construct, \code{taskloop} and \code{simd},
can accept a reduction-type clause, the \code{taskloop}~\code{simd} construct
is a composite construct, and the specific application of the reduction clause is defined
within the \code{taskloop}~\code{simd} construct section of the OpenMP 5.0 Specification.
The code below illustrates use cases for these reductions.
In the \plc{taskloop simd reduction} section of the example below,
\plc{taskloop simd 3} uses the \code{reduction} clause
in a \code{taskloop}~\code{simd} construct for a sum reduction within a loop.
For this case a \code{reduction} clause is used, as one would use
for a \code{simd} construct.
The SIMD reductions of each task are combined, and the results of these tasks are further
combined just as in the \code{taskloop} construct with the \code{reduction} clause for \plc{taskloop 1}.
At the end of the taskloop region \plc{asum} contains the combined result of all reductions.
If a \code{taskloop}~\code{simd} construct is to participate in a previously defined
reduction scope, the reduction participation should be specified with
a \code{in\_reduction} clause, as shown in the \code{parallel} region enclosing
\plc{task 4} and \plc{taskloop simd 4} code sections.
Here the \code{taskloop}~\code{simd} construct's
\code{in\_reduction} clause specifies participation of the construct's tasks as
a task reduction within the scope of the parallel region.
That is, the results of each task of the \code{taskloop} construct component
contribute to the reduction in a broader level, just as in \plc{parallel reduction a} code section above.
Also, each \code{simd}-component construct
occurs as if it has a \code{reduction} clause, and the
SIMD results of each task are combined as though to form a single result for
each task (that participates in the \code{in\_reduction} clause).
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
%Just as in \plc{parallel reduction a} the
%\code{taskloop simd} construct reduction results are combined
%with the \code{task} construct reduction results
%as specified by the \code{in\_reduction} clause of the \code{task} construct
%and the \plc{task} reduction-modifier of the \code{reduction} clause of
%the \code{parallel} construct.
%At the end of the parallel region \plc{asum} contains the combined result of all reductions.
\cexample{taskloop_simd_reduction}{1}
\ffreeexample{taskloop_simd_reduction}{1}
% All other reductions