forked from OpenMP/Examples
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathExamples_target.tex
More file actions
145 lines (104 loc) · 6.04 KB
/
Examples_target.tex
File metadata and controls
145 lines (104 loc) · 6.04 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
\pagebreak
\section{\code{target} Construct}
\label{sec:target}
\subsection{\code{target} Construct on \code{parallel} Construct}
\label{subsec:target_parallel}
This following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped
to the target device.
\cexample[4.0]{target}{1}
\ffreeexample[4.0]{target}{1}
\subsection{\code{target} Construct with \code{map} Clause}
\label{subsec:target_map}
This following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the
target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to
the target device.
\cexample[4.0]{target}{2}
\ffreeexample[4.0]{target}{2}
\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
\label{subsec:target_map_tofrom}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, the \code{to} and \code{from}
map-types define the mapping between the original (host) data and the target (device)
data. The \code{to} map-type specifies that the data will only be read on the
device, and the \code{from} map-type specifies that the data will only be written
to on the device. By specifying a guaranteed access on the device, data transfers
can be reduced for the \code{target} region.
The \code{to} map-type indicates that at the start of the \code{target} region
the variables \plc{v1} and \plc{v2} are initialized with the values of the corresponding variables
on the host device, and at the end of the \code{target} region the variables
\plc{v1} and \plc{v2} are not assigned to their corresponding variables on the host device.
The \code{from} map-type indicates that at the start of the \code{target} region
the variable \plc{p} is not initialized with the value of the corresponding variable
on the host device, and at the end of the \code{target} region the variable \plc{p}
is assigned to the corresponding variable on the host device.
\cexample[4.0]{target}{3}
The \code{to} and \code{from} map-types allow programmers to optimize data
motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array
are not transferred to the device, only one-half of the data is moved, compared
to the default behavior of an implicit mapping.
\ffreeexample[4.0]{target}{3}
\subsection{\code{map} Clause with Array Sections}
\label{subsec:target_array_section}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, map-types are used to optimize
the mapping of variables to the target device. Because variables \plc{p}, \plc{v1} and \plc{v2} are
pointers, array section notation must be used to map the arrays. The notation \code{:N}
is equivalent to \code{0:N}.
\cexample[4.0]{target}{4}
\clearpage
In C, the length of the pointed-to array must be specified. In Fortran the extent
of the array is known and the length need not be specified. A section of the array
can be specified with the usual Fortran syntax, as shown in the following example.
The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.
\ffreeexample[4.0]{target}{4}
A more realistic situation in which an assumed-size array is passed to \code{vec\_mult}
requires that the length of the arrays be specified, because the compiler does
not know the size of the storage. A section of the array must be specified with
the usual Fortran syntax, as shown in the following example. The value 1 is assumed
for the lower bound for array section \plc{v2(:N)}.
\ffreeexample[4.0]{target}{4b}
\subsection{\code{target} Construct with \code{if} Clause}
\label{subsec:target_if}
The following example shows how the \code{target} construct offloads a code region
to a target device.
The \code{if} clause on the \code{target} construct indicates that if the variable
\plc{N} is smaller than a given threshold, then the \code{target} region will be executed
by the host device.
The \code{if} clause on the \code{parallel} construct indicates that if the
variable \plc{N} is smaller than a second threshold then the \code{parallel} region
is inactive.
\cexample[4.0]{target}{5}
\ffreeexample[4.0]{target}{5}
The following example is a modification of the above \plc{target.5} code to show the combined \code{target}
and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
clauses to specify the component directive to which it applies.
The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the
combined directive, and the \code{if} clause with the \code{parallel} modifier applies
to the \code{parallel} component of the combined directive.
\cexample[4.5]{target}{6}
\ffreeexample[4.5]{target}{6}
\subsection{target Reverse Offload}
\label{subsec:target_reverse_offload}
Beginning with OpenMP 5.0, implementations are allowed to
offload back to the host (reverse offload).
In the example below the \plc{error\_handler} function
is executed back on the host, if an erroneous value is
detected in the \plc{A} array on the device.
This is accomplished by specifying the \plc{device-modifier}
\code{ancestor} modifier, along with a device number of \code{1},
to indicate that the execution is to be performed on the
immediate parent (\plc{1st ancestor})-- the host.
The \code{requires} directive (another 5.0 feature)
uses the \code{reverse\_offload} clause to guarantee
that the reverse offload is implemented.
Note that the \code{declare target} directive uses the
\code{device\_type} clause (another 5.0 feature) to specify that
the \plc{error\_handler} function is compiled to
execute on the \plc{host} only. This ensures that no
attempt will be made to create a device version of the
function. This feature may be necessary if the function
exists in another compile unit.
\cexample[5.0]{target_reverse_offload}{7}
\ffreeexample[5.0]{target_reverse_offload}{7}