FPSpy Tool
==========
Copyright (c) 2017 Peter A. Dinda. Please see the LICENSE file.
This is a tool for floating point exception interception and
statistics gathering that can run underneath existing, unmodified
binaries.
FPSpy is documented in
P. Dinda, A. Bernat, C. Hetland, Spying on the Floating Point Behavior
of Existing, Unmodified Scientific Applications, Proceedings of the
29th ACM Symposium on High-performance Parallel and Distributed
Computing (HPDC 2020), June, 2020.
You can also see the comments in src/fpspy.c for some details of how
this works and what it illustrates.
Building and Testing
--------------------
To build:
make
To test:
make test
Running
-------
You generally want your environment configured as follows:
export PATH=$FPSPY_DIR/bin/$ARCH:$FPSPY_DIR/scripts:$PATH
The code has two modes of operation:
- Aggregate mode simply captures the floating point exception state
at the beginning and end of the program. Since the exception state
is sticky, this tells us whether the program had one or more
occurrences of each of the possible exceptions.
- Individual mode captures individual floating point exceptions,
emulating the instructions that cause them.
The code can be run against a dynamically linked binary which crosses
the shared library boundary for the fe* library calls, which
manipulate the FPU behavior, and for the signal and sigaction system
calls.
To run against a binary:
LD_PRELOAD=fpspy.so [FPSPY_MODE=<mode>] [FPSPY_AGGRESSIVE=<yes|no>] exec.exe
The modes are "aggregate" and "individual" as noted above. If no
mode is given, aggregate mode is assumed.
Generally, FPSpy gets out of the way if the executable itself
attempts to manipulate the FPU signaling state via the fe* library
calls or the signal/sigaction system calls. By default, it is very
sensitive to this. If FPSPY_AGGRESSIVE is set, then it is less
sensitive, which means that more can be captured, but the execution
is more likely to be broken.
Additional environment variables
FPSPY_DISABLE_PTHREADS=yes (or DISABLE_PTHREADS=yes)
Do not trace newly created pthreads
You will also want to set this for any application which
does not dynamically link the pthread library. Otherwise startup
will fail when attempting to shim non-existent pthread functions.
FPSPY_MAXCOUNT=k
means that only the first k exceptions will be recorded
this only affects individual mode
k=-1 means that there is no limit to how many exceptions
will be recorded. By default, k is about 64,000.
FPSPY_SAMPLE=k
means that only every kth exception will be recorded
this only affects individual mode
FPSPY_EXCEPT_LIST=list
means that only the listed exceptions will be intercepted
this only affects individual mode
the comma-delimited list can include:
invalid (NAN)
denorm
divide (divide by zero)
overflow
underflow
precision (rounding)
FPSPY_POISSON=A:B
means that Poisson sampling will be used, with the ON period
chosen from an exponential distribution with mean A usec
and the OFF period chosen from an exponential distribution with mean
B seconds.
Time-based sampling and Poisson sampling model
FPSPY_SEED=n
means the internal random number generator for sampling
is seeded with value n
FPSPY_TIMER=real|virtual|prof (default real)
The virtual timer measures time by executed instructions; the real
timer measures real (wall-clock) time. That is, with FPSPY_POISSON=A:B
and FPSPY_TIMER=virtual, A and B are interpreted as time spent awake
(executing), instead of elapsed time. The prof timer measures virtual
time in both kernel and user space, and uses a signal the application
is unlikely to be using.
FPSPY_KICKSTART=y|n
If set to y, then FPSPY does not start on the initial process
until a SIGTRAP is delivered externally. Otherwise, it starts immediately.
To get a sense of how FPSPY_POISSON operates, you can
also run:
make test_sleepy (real time)
or
make test_dopey (virtual or profile time)
FPSPY_KERNEL=y|n (default n)
Attempt to use kernel support to make FP traps faster.
This is the same support as in FPVM and uses the same kernel module
Forced changes in floating point execution environment
FPSPY_FORCE_ROUNDING=positive|negative|zero|nearest[;daz][;ftz]
This forces rounding to operate in the noted way (the IEEE default is nearest).
If daz is included, all denormal inputs are treated as zeros [Intel specific].
If ftz is included, all denormal results are flushed to zero [Intel specific].
To get a sense of how FPSPY_FORCE_ROUNDING operates, you can
also run:
make test_rounding
Output and Analysis Scripts
---------------------------
FPSpy produces a trace for each thread.
In aggregate mode, a trace is a short, simple, human-readable file that
is self-explanatory.
In individual mode, a trace is a binary format file which may be huge.
We provide tools to display and analyze such traces.
In include and src:
libtrace.h / libtrace.c -> trace access from C via memory mapping
(trace shows up as a giant array of structs)
trace_print.c -> example use (just prints file in human-readable format)
In scripts:
parse_individual.pl -> trace_print in Perl
analyze_individual.pl -> create report from trace
extrace_fp_event_timestamps.pl -> create time series from trace