-
Notifications
You must be signed in to change notification settings - Fork 225
Expand file tree
/
Copy pathusdt.html
More file actions
275 lines (237 loc) · 10.4 KB
/
usdt.html
File metadata and controls
275 lines (237 loc) · 10.4 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
<b>User Defined Tracing - How it works</b>
<p>
USDT is a mechanism for user land applications to embed their
own probes into executables. For example, a Perl or Python
interpreter might use it to gain access to stack traces of applications
which are already started.
<p>
The goal of the DTrace team was near zero overhead when not invoked.
This works well - even commercial applications can embed
probes and not worry about performance or run time dependencies.
<p>
The steps involved in a USDT probe are as follows:
<ul>
<li>Add probe calls, via DTRACE_PROBE macros to the source code
of the application.</li>
<li>Compile code to object file (*.o)</li>
<li>Use dtrace command line tool to convert the object file. This involves
stubbing out the assembler function calls, and creating a table
in the ELF file enumerating the probes.
</li>
<li>Create (link) the application binary, with a special object file
(drti.o). drti.o runs before main() and takes the table
of probes, and lets the kernel know (via an ioctl() to the dtrace
driver) of the probes.</li>
<li>Run the application: drti.o takes control and issues the ioctl()
of the probes. Whilst the application is running, you can use "dtrace -l"
to see the probes. Probes are a function of the pid provider, so you
will see a new suite of probes for each process running with USDT, and
as many probes as there are DTRACE_PROBE calls in the source code.</li>
<li>Whilst the application is running, you can use dtrace to monitor
these probes at any granularity you like (eg all probes from the process,
or specific probes from all such processes).</li>
<li>When a dtrace monitors the probe, the site where the call instruction
is placed is modified and an INT3 (breakpoint instruction) is placed
at the site of what was the original CALL instruction. When the breakpoint
is hit, the dtrace driver takes control and actions the probe.
This is very similar to how a kernel FBT probe works, except the
breakpoint happened in user space. At the point of breakpoint execution,
any D script associated with the probe is invoked. The target application
is frozen until the D script completes, allowing it to take a
static snapshot of any details it likes. Typically, this might include
taking a user stack dump (ustack()).
</li>
<li>Terminating a dtrace which is probing the application will
remove the breakpoints and restore the NOP instructions.</li>
</ul>
<p>
<h2>Annotating a program with custom probes</h2>
<p>
DTrace provides a bunch of DTRACE_PROBEn() macros which allow
you to place probes in source code. The macros allow for zero to 4 arguments
to be passed. DTrace also allows you to provide application specific
macros. The following example shows using the raw macros.
<pre>
# include <stdio.h>
# include <sys/sdt.h>
int main(int argc, char **argv)
{
while (1) {
printf("here on line %d\n", __LINE__);
DTRACE_PROBE1(simple, saw__line, 0x1234);
printf("here on line %d\n", __LINE__);
DTRACE_PROBE1(simple, saw__word, 0x87654321);
printf("here on line %d\n", __LINE__);
DTRACE_PROBE1(simple, saw__word, 0xdeadbeef);
printf("here on line %d\n", __LINE__);
sleep(1);
}
}
</pre>
<p>
The DTRACE_PROBEx macros translate into a function call.
To gain near-zero overhead, during linking, the function call
is replaced by a series of NOP instructions. Heres an example disassembly
of the above
<pre>
#define DTRACE_PROBE1(provider, name, arg1) { \
extern void __dtrace_##provider##___##name(unsigned long); \
__dtrace_##provider##___##name((unsigned long)arg1); \
}
</pre>
The macro looks a little ugly, but its basically putting in a C
function call to a parameterised function name, e.g.
__dtrace__simple__saw__word, in the example above. Theres a number
of macros depending on the number of arguments (1..5).
In the example above, its as if we did something like:
<pre>
__dtrace__simple__saw__word(0x12345678);
</pre>
The function __dtrace__simple__saw__word never exists - its simply a place
holder.
<pre>
0000000000400e7c <main>:
400e7c: 55 push %rbp
400e7d: 48 89 e5 mov %rsp,%rbp
400e80: 48 83 ec 10 sub $0x10,%rsp
400e84: 89 7d fc mov %edi,0xfffffffffffffffc(%rbp)
400e87: 48 89 75 f0 mov %rsi,0xfffffffffffffff0(%rbp)
400e8b: be 07 00 00 00 mov $0x7,%esi
400e90: bf df 11 40 00 mov $0x4011df,%edi
400e95: b8 00 00 00 00 mov $0x0,%eax
400e9a: e8 01 f9 ff ff callq 4007a0 <printf@plt>
400e9f: bf 34 12 00 00 mov $0x1234,%edi
400ea4: 90 nop
400ea5: 90 nop
400ea6: 90 nop
400ea7: 90 nop
400ea8: 90 nop
400ea9: be 09 00 00 00 mov $0x9,%esi
400eae: bf df 11 40 00 mov $0x4011df,%edi
400eb3: b8 00 00 00 00 mov $0x0,%eax
400eb8: e8 e3 f8 ff ff callq 4007a0 <printf@plt>
400ebd: bf 21 43 65 87 mov $0x87654321,%edi
400ec2: 90 nop
400ec3: 90 nop
400ec4: 90 nop
400ec5: 90 nop
400ec6: 90 nop
400ec7: be 0b 00 00 00 mov $0xb,%esi
....
</pre>
<p>
When an application is built, dtrace is run on the
object files to rewrite the objects, stubbing out the calls
for probes, and creating a table in the executable of the
places where the stubs are located. (The code
for this is located in libdtrace/dt_link.c).
<p>
When the application is started up, a piece of code is
executed (before main() is called). [Code located in
libdtrace/drti.c]. This code looks at the current system,
to see if dtrace is loaded into the kernel and communicates
with the /dev/dtrace/helper driver to inform it that new
probes are available in this process.
<p>
Voila! We are done. Or nearly.
<p>
At this point, whilst the application is running, 'dtrace -l'
should reveal your new probes.
<b>The Kernel</b>
When a user elects to monitor the probe, the patched (NOP-ed) code
will be change into a call back into the kernel to notify the
function/probe is being invoked.
Cancellation of the probe will undo the patched code and we are done.
You can see these user level code segment modifications in the
example usdt/c/simple.c code. It computes a checksum of the main()
function and periodically dumps the checksum. If you attach to the
probes with dtrace, the checksum will change, and as you Ctrl-C dtrace,
the checksum will be restored.
<h2>Apple/Mac OS X</h2>
<p>
Apple has done some enhancements to dtrace, which are interesting and
simplify the mechanism for creating a USDT enabled application.
With Apple's dtrace mechanism, only two commands are needed to generate
a probable executable:
<pre>
$ dtrace -h -s simple_probes.d
$ gcc -I../../uts/common -o simple simple.c
</pre>
The dtrace command generates the header file, containing the probe
definitions, e.g.
<pre>
define SIMPLE_SAW_LINE(arg0) \
do { \
__asm__ volatile(".reference " SIMPLE_TYPEDEFS); \
__dtrace_probe$simple$saw__line$v1$696e74(arg0); \
__asm__ volatile(".reference " SIMPLE_STABILITY); \
} while (0)
</pre>
The use of ".reference" is interesting, since this is doing the work
of the traditional phase used to create an ELF object file with
the probe table information.
<h2>Issues</h2>
<p>
DTrace relies on the DTRACE_PROBE macros to create the assembler
sequence to call a dummy function. When a probe is activated, the
CALL instruction (which has now been NOP'ed) is replaced with a breakpoint
instruction. When the breakpoint is taken, all the registers and stack are
set up as if we were going to invoke the dummy function, and are available
as arguments to the probe (arg0, arg1, etc). When dtrace is not running,
there is some overhead to compute those arguments, even though a breakpoint
is not taken. So, it is best to keep the function arguments to be simple,
and not involve complex or time consuming calculations.
<p>
DTrace probes an IS_ENABLED macro to help reduce overhead. For each
probe you create, a macro is defined, allowing code like this:
<pre>
if (SIMPLE_ENTRY_ENABLED()) {
DTRACE_PROBE1(...);
}
</pre>
where the prefix "SIMPLE" is provided by the header definition for the
provider.
<h2>What people are using USDT for</h2>
<p>
The dtrace providers in the kernel are varied and powerful. But it
requires device driver and kernel internals knowledge to use or enhance
them. USDT has enormous potential for user land applications. Here are
some common examples, easily found on the web.
<h3>Perl, Ruby, Python, TCL, etc</h3>
<p>
Adding USDT probes to script languages like these, allow for performance
monitoring at the script level, rather than the implementation level.
Getting a stack trace, when a probe is hit in a Perl script, should
result in the Perl stack, not the virtual machine emulator, which is typically
unfathomable.
<h3>HTTP</h3>
<p>
There is probably a wealth of things which could be probed in a running
http server, e.g. raw requests, form submissions, and other things
which are HTTP server specific.
<p>
I have seen references to HTTP providers, which can be useful
<h2>References and Links</h2>
<ul>
<li>http://chrisa.github.com/blog/2011/12/04/libusdt-runtime-dtrace-providers/
I havent check this out, but the ability to dynamically create a probe point
at runtime is very useful, e.g. in a runtime interpreter or JIT type of
application.
</li>
<li>http://dtrace.org/blogs/ahl/2006/05/08/user-land-tracing-gets-better-and-better/
Adam is one of the original authors of dtrace, and reflects on some
enhancements to USDT (in the 2006/7 timeframe). Unfortunately, many
of the links referred to in this article are now defunct.
</li>
<li>
http://www.ibm.com/developerworks/aix/library/au-dtraceprobes.html?ca=drs-
Nice summary and examples of USDT, similar to this document.
</li>
<li>
http://prefetch.net/blog/index.php/2005/11/30/viewing-http-requests-with-dtrace-apache-top/
Interesting example of monitoring an Apache server to monitor HTTP requests.
</li>
<li>http://blogs.oracle.com/binujp/entry/dtrace_provider_for_python
Shows examples of using the Python provider to see into the execution of
a Python script.</li>
</ul>