Perf

Table of Contents

Introduction
Basic Profiling
Perf Stat
Perf Top
Useful Events
Call Graphs
Flame Graphs
Annotate Source
System-Wide Analysis
Tracing
Comparing Runs
Common Workflows

Introduction

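Linux perf is a performance analysis tool that uses hardware performance counters, software events, and kernel tracepoints to profile applications and the whole system. In counting mode (perf stat), the CPU's performance monitoring unit counts events such as cycles, instructions, cache misses, and branch mispredictions with negligible overhead. In sampling mode (perf record, perf top), perf periodically records the instruction pointer and optionally the call stack, so its overhead grows with the sampling frequency but stays low at typical rates.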

Perf helps identify CPU hotspots, memory access patterns, and system bottlenecks. It is essential for optimizing performance-critical code, understanding where time is spent, and validating optimization efforts. It works on any executable without recompilation: an unstripped symbol table is enough for function names, while compiling with debug information (-g) lets perf map samples to source lines.

Basic Profiling

Record and analyze CPU samples:

$ perf record ./myprogram        # record CPU samples
$ perf report                    # interactive report
$ perf report --stdio            # text report

$ perf record -g ./myprogram     # record with call graphs
$ perf report -g                 # show call graph in report

Common record options:

$ perf record -F 99 ./prog       # sample at 99 Hz
$ perf record -p 1234            # attach to running process
$ perf record -a sleep 10        # system-wide for 10 seconds
$ perf record -o out.data ./prog # custom output file

Perf Stat

Count events over a whole run without recording samples (output abbreviated; values are illustrative):

$ perf stat ./myprogram
 Performance counter stats for './myprogram':
      1,234.56 msec task-clock
           123 context-switches
     1,000,000 cycles
       800,000 instructions  #  0.80 insn per cycle
        50,000 cache-misses
        10,000 branch-misses

$ perf stat -e cycles,instructions ./prog   # specific events
$ perf stat -r 5 ./prog                     # run 5 times, report mean and variance
$ perf stat -d ./prog                       # detailed: adds L1 and last-level cache events
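perf stat can also write machine-readable output with -x (field separator) and -o (output file), which makes ratios such as instructions per cycle easy to script. The sketch below uses a hypothetical helper, ipc_from_perf_csv, and assumes the default CSV layout in which field 1 is the counter value and field 3 is the event name; options such as -I or -A add leading fields and would break it, and hybrid CPUs that report names like cpu_core/cycles/ would need extra matching.

```shell
# Hypothetical helper: compute instructions per cycle from `perf stat -x,` output.
# Assumes field 1 = counter value and field 3 = event name (no -I/-A options).
ipc_from_perf_csv() {
    awk -F, '
        { ev = $3; sub(/:.*/, "", ev) }          # strip modifiers such as ":u"
        ev == "cycles"       { cycles = $1 }
        ev == "instructions" { insns = $1 }
        END {
            # "+ 0" forces a numeric test, so "<not counted>" counts as 0
            if (cycles + 0 > 0) printf "%.2f\n", insns / cycles
        }'
}
```

Usage, assuming ./myprogram exists:

$ perf stat -x, -e cycles,instructions -o stat.csv ./myprogram
$ ipc_from_perf_csv < stat.csv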

Perf Top

Real-time view of system or process hotspots:

$ perf top                       # system-wide live view
$ perf top -p 1234               # specific process
$ perf top -F 99                 # sample at 99 Hz
$ perf top -g                    # show call graphs
$ perf top -ns comm,dso          # show sample counts, sort by process and shared object
$ perf top -e cache-misses       # profile cache misses

Useful Events

Profile specific hardware and software events:

# CPU events
$ perf stat -e cycles,instructions,branches,branch-misses ./prog

# Cache events
$ perf stat -e cache-references,cache-misses ./prog
$ perf stat -e L1-dcache-loads,L1-dcache-load-misses ./prog
$ perf stat -e LLC-loads,LLC-load-misses ./prog

# Memory events
$ perf stat -e page-faults,minor-faults,major-faults ./prog

# System calls
$ perf stat -e 'syscalls:sys_enter_*' ./prog
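Raw cache counts are easier to compare as a miss percentage. A minimal sketch with a hypothetical helper, miss_pct, that takes the total and miss event names and reads perf stat CSV output; it assumes the same layout as above (value in field 1, event name in field 3).

```shell
# Hypothetical helper: print 100 * MISS_EVENT / TOTAL_EVENT from `perf stat -x,` output.
# usage: miss_pct TOTAL_EVENT MISS_EVENT < stat.csv
miss_pct() {
    awk -F, -v total_ev="$1" -v miss_ev="$2" '
        { ev = $3; sub(/:.*/, "", ev) }          # strip modifiers such as ":u"
        ev == total_ev { total = $1 }
        ev == miss_ev  { miss = $1 }
        END { if (total + 0 > 0) printf "%.2f%%\n", 100 * miss / total }'
}
```

Usage, for example with the L1 data-cache events:

$ perf stat -x, -e L1-dcache-loads,L1-dcache-load-misses -o l1.csv ./prog
$ miss_pct L1-dcache-loads L1-dcache-load-misses < l1.csv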

List available events:

$ perf list                      # all events
$ perf list hw                   # hardware events
$ perf list sw                   # software events
$ perf list cache                # cache events
$ perf list tracepoint           # kernel tracepoints

Call Graphs

Understand where time is spent in the call hierarchy:

# Frame-pointer unwinding (the default for -g; compile with -fno-omit-frame-pointer)
$ perf record -g ./myprogram
$ perf report -g

# Record with DWARF unwinding (works without frame pointers)
$ perf record --call-graph dwarf ./myprogram

# Record with LBR (Last Branch Record; recent Intel CPUs, user-space stacks only)
$ perf record --call-graph lbr ./myprogram

Flame Graphs

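Flame graphs visualize profiling data as interactive SVGs. The y-axis shows stack depth (root at the bottom), the x-axis spans the sampled population sorted alphabetically rather than by time, and each frame's width is proportional to the number of samples that contain it. Download the FlameGraph tools (stackcollapse-perf.pl, flamegraph.pl) from https://github.com/brendangregg/FlameGraph.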

$ perf record -g ./myprogram
$ perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg

# CPU flame graph
$ perf record -F 99 -g ./prog
$ perf script | stackcollapse-perf.pl | flamegraph.pl > cpu.svg
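stackcollapse-perf.pl emits folded stacks: one line per unique stack, frames from root to leaf joined by semicolons, then a space and the sample count. That format is easy to query directly. As a sketch (folded_self_time is a hypothetical helper, not part of the FlameGraph tools), summing counts by leaf frame gives each function's self time in samples:

```shell
# Hypothetical helper: sum samples by leaf (self) frame from folded stacks, top N.
# Input lines look like: "main;compute;inner_loop 42"
folded_self_time() {
    local n="${1:-10}"
    awk '{
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # strip the trailing " count"
        depth = split(stack, frames, ";")
        self[frames[depth]] += count     # attribute samples to the leaf frame
    } END {
        for (f in self) print self[f], f
    }' | sort -k1,1nr | head -n "$n"
}
```

Usage:

$ perf script | stackcollapse-perf.pl > out.folded
$ folded_self_time 10 < out.folded

Stripping only the trailing count keeps frame names that contain spaces (for example C++ template signatures) intact.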

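# Off-CPU stacks (where the program blocks); sched_switch samples count
# context switches, not time spent waiting, and may require root
$ perf record -e sched:sched_switch -g ./prog
$ perf script | stackcollapse-perf.pl | flamegraph.pl > offcpu.svg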

Annotate Source

See which instructions and source lines collect the most samples:

$ perf record ./myprogram
$ perf annotate                  # interactive annotation
$ perf annotate func_name        # annotate specific function
$ perf annotate --stdio          # text output

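Source lines are interleaved with the disassembly only when the binary has debug information (-g); without it, perf annotate shows disassembly alone. Frame pointers are not needed for annotation; they matter only for call graphs.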

System-Wide Analysis

Profile the entire system to find bottlenecks:

$ perf top -a                    # live system-wide view (already the default)
$ perf record -a -g sleep 30     # record system for 30 seconds
$ perf report

# Find which processes use most CPU
$ perf top -ns comm

# Sample context-switch events on all CPUs
$ perf record -e context-switches -a sleep 10

Tracing

Trace specific events and system calls:

# Trace system calls
$ perf trace ./myprogram
$ perf trace -p 1234             # trace running process

# Trace specific syscalls
$ perf trace -e openat,read,write ./prog   # modern glibc opens files via openat, not open

# Count syscalls
$ perf stat -e 'syscalls:sys_enter_*' ./prog
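Counting every syscalls:sys_enter_* tracepoint prints one line per system call, most of them zero. A hedged sketch that keeps only the non-zero counts and ranks them, assuming the same perf stat CSV layout (value in field 1, event name in field 3); top_syscalls is a hypothetical name.

```shell
# Hypothetical helper: rank non-zero syscall counts from `perf stat -x,` output.
# usage: top_syscalls [N] < sys.csv
top_syscalls() {
    local n="${1:-10}"
    awk -F, '
        $3 ~ /^syscalls:sys_enter_/ && $1 + 0 > 0 {
            name = $3
            sub(/^syscalls:sys_enter_/, "", name)   # keep just the syscall name
            print $1, name
        }' | sort -k1,1nr | head -n "$n"
}
```

Usage:

$ perf stat -x, -e 'syscalls:sys_enter_*' -o sys.csv ./prog
$ top_syscalls 5 < sys.csv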

Comparing Runs

Compare the profiles of two runs (for example, before and after a change); perf diff reports how each symbol's share of samples shifted:

$ perf record -o before.data ./prog_v1
$ perf record -o after.data ./prog_v2
$ perf diff before.data after.data
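For a single number summarizing the change, one option is to measure task-clock over repeated runs with perf stat and compare the means. A minimal sketch, assuming perf stat -r writes the mean into the value field (field 1) of its CSV output and the event name into field 3; task_clock_change is a hypothetical helper.

```shell
# Hypothetical helper: percentage change in task-clock between two
# `perf stat -x, -e task-clock -o FILE` outputs (before, then after).
task_clock_change() {
    awk -F, '
        FNR == 1 { file++ }                      # 1 = before, 2 = after
        $3 ~ /^task-clock/ { task_ms[file] = $1 }
        END {
            if (task_ms[1] + 0 > 0)
                printf "%+.1f%%\n", 100 * (task_ms[2] - task_ms[1]) / task_ms[1]
        }' "$1" "$2"
}
```

Usage:

$ perf stat -r 5 -x, -e task-clock -o before.csv ./prog_v1
$ perf stat -r 5 -x, -e task-clock -o after.csv ./prog_v2
$ task_clock_change before.csv after.csv

task-clock measures CPU time, so for programs that spend much of their time waiting, compare elapsed time instead.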

Common Workflows

Find CPU hotspots:

$ perf record -g ./myprogram
$ perf report
# Look for functions with highest "Overhead" percentage

Diagnose cache performance:

$ perf stat -e cache-references,cache-misses,L1-dcache-load-misses ./prog
# A high cache-misses / cache-references ratio suggests poor memory locality

Profile a running server:

$ perf record -p $(pgrep -d, myserver) -g sleep 30   # -d, joins multiple PIDs with commas
$ perf report

Check if CPU-bound or I/O-bound:

$ perf stat ./myprogram
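# task-clock well below elapsed time (low "CPUs utilized") plus many
# context switches: the program mostly waits (I/O, locks, sleeps)
# CPUs utilized near 1.0 per busy thread: CPU-bound
# Low IPC in a CPU-bound run points to stalls such as cache misses, not I/O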
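The utilization check can also be scripted. A sketch assuming your perf provides the duration_time tool event (check perf list) and, in CSV mode, reports task-clock in msec and duration_time in ns; cpus_utilized is a hypothetical helper, and field positions follow the same assumption as above.

```shell
# Hypothetical helper: CPUs utilized = CPU time / wall-clock time,
# from `perf stat -x, -e task-clock,duration_time` output.
cpus_utilized() {
    awk -F, '
        { ev = $3; sub(/:.*/, "", ev) }          # strip modifiers such as ":u"
        ev == "task-clock"    { task_ms = $1 }   # assumed unit: msec
        ev == "duration_time" { wall_ns = $1 }   # assumed unit: ns
        END {
            if (wall_ns + 0 > 0) printf "%.2f\n", task_ms * 1000000 / wall_ns
        }'
}
```

Usage:

$ perf stat -x, -e task-clock,duration_time -o util.csv ./myprogram
$ cpus_utilized < util.csv

A single-threaded program near 1.00 is CPU-bound; a value well below that means it spends much of its run waiting. Multi-threaded programs can exceed 1.00.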
