Linux perf is a performance analysis tool that uses hardware performance counters and kernel tracepoints to profile applications and the whole system. Unlike instrumentation-based profilers, which add overhead to the code they measure, perf relies on CPU hardware to count or sample events such as cycles, instructions, cache misses, and branch mispredictions, with little impact on program execution.
Perf helps identify CPU hotspots, memory access patterns, and system bottlenecks.
It's essential for optimizing performance-critical code, understanding where time
is spent, and validating optimization efforts. The tool works on any executable
without recompilation, though debug symbols (-g) improve output readability.
Record and analyze CPU samples:
$ perf record ./myprogram # record CPU samples
$ perf report # interactive report
$ perf report --stdio # text report
$ perf record -g ./myprogram # record with call graphs
$ perf report -g # show call graph in report
Common record options:
$ perf record -F 99 ./prog # sample at 99 Hz
$ perf record -p 1234 # attach to running process
$ perf record -a sleep 10 # system-wide for 10 seconds
$ perf record -o out.data ./prog # custom output file
Get summary statistics without recording samples:
$ perf stat ./myprogram
Performance counter stats for './myprogram':
1,234.56 msec task-clock
123 context-switches
1,000,000 cycles
800,000 instructions # 0.80 insn per cycle
50,000 cache-misses
10,000 branch-misses
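The "insn per cycle" annotation in the output above is just instructions divided by cycles. As a minimal sketch, the same arithmetic can be reproduced by hand; `ratio` is a hypothetical helper name, not part of perf, and it only strips the thousands separators before dividing.

```shell
# Hypothetical helper (not part of perf): ratio NUM DEN
# Accepts perf-style numbers with thousands separators, e.g. 1,000,000.
ratio() {
    # Strip thousands separators so "1,000,000" becomes "1000000".
    num=$(printf '%s' "$1" | tr -d ',')
    den=$(printf '%s' "$2" | tr -d ',')
    awk -v n="$num" -v d="$den" 'BEGIN {
        if (d + 0 == 0) { print "n/a"; exit }   # avoid division by zero
        printf "%.2f\n", n / d
    }'
}

ratio 800,000 1,000,000   # instructions / cycles  -> 0.80 (IPC)
ratio 1,000,000 800,000   # cycles / instructions  -> 1.25 (CPI)
```

The same helper works for any pair of counters taken from the same run, such as cache-misses over cache-references.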
$ perf stat -e cycles,instructions ./prog # specific events
$ perf stat -r 5 ./prog # run 5 times, show stats
$ perf stat -d ./prog # detailed stats
Real-time view of system or process hotspots:
$ perf top # system-wide live view
$ perf top -p 1234 # specific process
$ perf top -F 99 # sample at 99 Hz
$ perf top -g # show call graphs
$ perf top -ns comm,dso # sort by process and library
$ perf top -e cache-misses # profile cache misses
Profile specific hardware and software events:
# CPU events
$ perf stat -e cycles,instructions,branches,branch-misses ./prog
# Cache events
$ perf stat -e cache-references,cache-misses ./prog
$ perf stat -e L1-dcache-loads,L1-dcache-load-misses ./prog
$ perf stat -e LLC-loads,LLC-load-misses ./prog
# Memory events
$ perf stat -e page-faults,minor-faults,major-faults ./prog
# System calls
$ perf stat -e 'syscalls:sys_enter_*' ./prog
List available events:
$ perf list # all events
$ perf list hw # hardware events
$ perf list sw # software events
$ perf list cache # cache events
$ perf list tracepoint # kernel tracepoints
Understand where time is spent in the call hierarchy:
# Record with frame pointers (compile with -fno-omit-frame-pointer)
$ perf record -g ./myprogram
$ perf report -g
# Record with DWARF unwinding (works without frame pointers)
$ perf record --call-graph dwarf ./myprogram
# Record with LBR (Last Branch Record, Intel CPUs)
$ perf record --call-graph lbr ./myprogram
Flame graphs visualize profiling data as interactive SVGs. The y-axis shows stack depth, and the width of each frame is proportional to how often it appeared in the samples, which approximates time spent. Download the FlameGraph tools from https://github.com/brendangregg/FlameGraph.
$ perf record -g ./myprogram
$ perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
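To see what flamegraph.pl draws, it helps to look at the folded format that stackcollapse-perf.pl emits: one line per unique stack, frames separated by semicolons, followed by a sample count. The sketch below is a hypothetical helper (`leaf_share` is not part of perf or FlameGraph) that sums the counts per top-of-stack frame, roughly the "self" share that frame width encodes. The input lines are made up for illustration.

```shell
# Hypothetical helper: read folded stacks ("frame;frame;frame count") on
# stdin and print each leaf frame's share of all samples, largest first.
leaf_share() {
    awk '
        {
            count = $NF                     # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)      # drop the trailing count
            n = split(stack, frames, ";")
            leaf[frames[n]] += count        # credit samples to the top frame
            total += count
        }
        END {
            for (f in leaf) printf "%.1f%% %s\n", 100 * leaf[f] / total, f
        }' | sort -rn
}

# Made-up folded stacks, not real profiler output:
printf '%s\n' 'main;parse;read_line 30' \
              'main;compute;multiply 60' \
              'main;compute 10' |
    leaf_share
# 60.0% multiply
# 30.0% read_line
# 10.0% compute
```

With real data, the input would come from `perf script | stackcollapse-perf.pl` instead of the printf lines.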
# CPU flame graph
$ perf record -F 99 -g ./prog
$ perf script | stackcollapse-perf.pl | flamegraph.pl > cpu.svg
# Off-CPU flame graph (shows where program waits)
$ perf record -e sched:sched_switch -g ./prog
See which source lines consume the most cycles:
$ perf record ./myprogram
$ perf annotate # interactive annotation
$ perf annotate func_name # annotate specific function
$ perf annotate --stdio # text output
Source-level annotation requires debug symbols (-g); compiling with -fno-omit-frame-pointer is also recommended.
Profile the entire system to find bottlenecks:
$ perf top -a # live system-wide view
$ perf record -a -g sleep 30 # record system for 30 seconds
$ perf report
# Find which processes use most CPU
$ perf top -ns comm
# Trace context switches
$ perf record -e context-switches -a sleep 10
Trace specific events and system calls:
# Trace system calls
$ perf trace ./myprogram
$ perf trace -p 1234 # trace running process
# Trace specific syscalls
$ perf trace -e open,read,write ./prog
# Count syscalls
$ perf stat -e 'syscalls:sys_enter_*' ./prog
Compare performance between two runs:
$ perf record -o before.data ./prog_v1
$ perf record -o after.data ./prog_v2
$ perf diff before.data after.data
Find CPU hotspots:
$ perf record -g ./myprogram
$ perf report
# Look for functions with highest "Overhead" percentage
Diagnose cache performance:
$ perf stat -e cache-references,cache-misses,L1-dcache-load-misses ./prog
# High cache-miss ratio indicates poor memory access patterns
Profile a running server:
$ perf record -p $(pgrep myserver) -g sleep 30
$ perf report
Check if CPU-bound or I/O-bound:
$ perf stat ./myprogram
# Low instructions-per-cycle plus many context switches suggests I/O bound
# High instructions-per-cycle suggests CPU bound
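The check above can be scripted, since perf stat can emit machine-readable output with -x (field separator) and write it to a file with -o. The sketch below is a hypothetical helper (`stat_summary` is not part of perf) that assumes the counter value is in field 1 and the event name in field 3 of each CSV line; event names may carry suffixes or differ across CPUs, so the patterns may need adjusting. The sample lines are made up, reusing the counts from the example output earlier.

```shell
# Hypothetical helper: read `perf stat -x,` CSV on stdin and print IPC and
# the context-switch count. Assumes field 1 = value, field 3 = event name.
stat_summary() {
    awk -F, '
        $3 ~ /^cycles/           { cycles = $1 }
        $3 ~ /^instructions/     { insns  = $1 }
        $3 ~ /^context-switches/ { ctx    = $1 }
        END {
            if (cycles + 0 > 0) printf "ipc=%.2f", insns / cycles
            else                printf "ipc=n/a"   # counter missing or unsupported
            printf " context-switches=%d\n", ctx
        }'
}

# Made-up, trimmed sample lines (not real perf output):
printf '%s\n' '123,,context-switches,1234560000,100.00,,' \
              '1000000,,cycles,1234560000,100.00,,' \
              '800000,,instructions,1234560000,100.00,0.80,insn per cycle' |
    stat_summary
# ipc=0.80 context-switches=123

# Against a real run (requires perf):
#   perf stat -x, -o stats.csv -e cycles,instructions,context-switches ./myprogram
#   stat_summary < stats.csv
```

What counts as a "low" IPC or a "high" context-switch count depends on the workload and CPU, so the helper only reports the numbers and leaves the judgment to the reader.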