
Perf: Cache downcased header key in str_headers#3874

Merged
nateberkopec merged 1 commit into puma:main from hadrienblanc:perf/cache-header-key-downcase
Jan 28, 2026


@hadrienblanc (Contributor) commented Jan 27, 2026

Description

What original problem led to this PR?

PR #3704 added k.downcase calls in str_headers for Rack 3 compliance.
However, the case statement already calls k.downcase, so each header key
is now downcased twice instead of once.
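A simplified sketch of the before/after pattern. This is a hypothetical header loop, not Puma's actual str_headers: the method names str_headers_before and str_headers_after, the skipped content-length branch, and the use of the downcased key when writing the output line are all assumptions made for illustration.

```ruby
# frozen_string_literal: true

# Hypothetical, simplified header loop; NOT Puma's real str_headers.
# BEFORE: k.downcase is evaluated twice per header.
def str_headers_before(headers)
  out = +""
  headers.each do |k, v|
    case k.downcase                 # first downcase: dispatch on the key
    when "content-length"
      next                          # assumption: handled elsewhere, so skip it
    end
    out << k.downcase << ": " << v.to_s << "\r\n" # second downcase
  end
  out
end

# AFTER: the downcased key is computed once and reused.
def str_headers_after(headers)
  out = +""
  headers.each do |k, v|
    key = k.downcase                # single downcase, cached in a local
    case key
    when "content-length"
      next
    end
    out << key << ": " << v.to_s << "\r\n"
  end
  out
end

headers = { "Content-Type" => "text/plain", "X-Request-Id" => "abc", "Content-Length" => "5" }
puts str_headers_after(headers)
```

Both methods return the same string. The only difference is how many String objects downcase allocates per header.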

Are there related issues / prior discussions?

Follow-up optimization to PR #3704 (merged in v7.0.0) and issue #3250.

What alternatives have been tried?

Considered naming alternatives (k_lwr, lower_k) but settled on key
for readability.

Why do you make the choices you did? What are the tradeoffs?

  • Store the result of k.downcase in a local variable, key
  • No behavior change
  • downcase is called once per header instead of twice. GC.stat measurements and a micro-benchmark are below.

Benchmark

Memory allocations

# 10 headers × 1000 iterations
GC.stat[:total_allocated_objects]

BEFORE: 20,004 allocations
AFTER:  10,004 allocations
GAIN:   10,000 allocations (50%)

Each String#downcase call creates a new String object. With ~10 headers
per response, this saves ~10 String allocations per request.
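A rough way to reproduce the allocation comparison. It assumes ten made-up header names and a loop that does only the downcase work. The helpers loop_twice, loop_once, and allocations are hypothetical names. GC.stat(:total_allocated_objects) is a cumulative counter, so taking the difference around a block counts the objects allocated inside it. Exact totals vary with the Ruby version, so the sketch prints them rather than assuming the 20,004 / 10,004 figures above.

```ruby
# frozen_string_literal: true

# Ten made-up header names; NOT Puma's real fixtures.
HEADERS = (1..10).to_h { |i| ["X-Header-#{i}", "value"] }.freeze
ITERATIONS = 1000

# Hypothetical loop with the old shape: k.downcase runs twice per header.
def loop_twice(headers)
  headers.each do |k, _v|
    case k.downcase
    when "content-length" then nil
    end
    k.downcase # second, redundant downcase (result unused in this sketch)
  end
end

# Hypothetical loop with the new shape: one downcase, cached in `key`.
def loop_once(headers)
  headers.each do |k, _v|
    key = k.downcase
    case key
    when "content-length" then nil
    end
  end
end

# Objects allocated while the block runs, from the cumulative GC counter.
def allocations
  start = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - start
end

# Warm-up so one-time setup costs are not counted.
loop_twice(HEADERS)
loop_once(HEADERS)

twice = allocations { ITERATIONS.times { loop_twice(HEADERS) } }
once  = allocations { ITERATIONS.times { loop_once(HEADERS) } }

puts "BEFORE: #{twice} allocations"
puts "AFTER:  #{once} allocations"
puts "GAIN:   #{twice - once} allocations"
```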

Micro-benchmark

                         user     system      total        real
BEFORE (2 downcase)  0.595148   0.003618   0.598766 (  0.674s)
AFTER  (1 downcase)  0.442885   0.002580   0.445465 (  0.490s)

In my runs the header loop was faster (~27% lower real time in this instance), but I
wouldn't draw strong conclusions about the exact numbers.
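The timing table above has the shape of Ruby's Benchmark.bm output. The sketch below times the same two loop shapes with a monotonic clock instead, so it does not depend on the benchmark gem. The header names, iteration count, and method names (downcase_twice, downcase_once, measure) are placeholders. As noted above, single runs are noisy, so treat the printed percentage as indicative only.

```ruby
# frozen_string_literal: true

# Placeholder header set; the names are made up for this sketch.
HEADERS = (1..10).to_h { |i| ["X-Header-#{i}", "value"] }.freeze

# Hypothetical loop with the old shape: two downcase calls per header.
def downcase_twice(headers)
  headers.each do |k, _v|
    case k.downcase
    when "content-length" then nil
    end
    k.downcase
  end
end

# Hypothetical loop with the new shape: one downcase, cached in `key`.
def downcase_once(headers)
  headers.each do |k, _v|
    key = k.downcase
    case key
    when "content-length" then nil
    end
  end
end

# Elapsed wall-clock seconds for n runs of the block, via a monotonic clock.
def measure(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

N = 50_000
downcase_twice(HEADERS) # warm-up runs
downcase_once(HEADERS)

before = measure(N) { downcase_twice(HEADERS) }
after  = measure(N) { downcase_once(HEADERS) }

printf("BEFORE (2 downcase) %.3fs\n", before)
printf("AFTER  (1 downcase) %.3fs\n", after)
printf("change in real time: %+.1f%%\n", (after / before - 1) * 100)
```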


Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • All new and existing tests passed, including Rubocop.

Avoid calling k.downcase twice per header by caching the result.
Reduces String allocations by 50% in the header loop.

Follow-up to puma#3704.
@github-actions bot added the waiting-for-review label (Waiting on review from anyone) Jan 27, 2026
@nateberkopec (Member) commented

Hello, this is Codex speaking.

Benchmarks (Docker):

  • Built from tools/Dockerfile (ruby 4.0.1 aarch64, bundle exec rake compile)
  • Installed wrk in-container
  • Script: benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp (wrk -t8 -c16 -d10s)
  • Note: container-local patch to avoid TypeError in benchmarks/local/response_time_wrk.rb (replace @threads[/\d+\z/] || 5 with @threads || 5). No repo changes.

Avg RPS (mean across array/chunk/string/io)

Size (kB)  main avg   PR avg   delta
1          26,255     26,839   +2.2%
10         25,838     26,783   +3.7%
100        25,765     26,664   +3.5%
256        26,360     26,509   +0.6%
512        26,349     26,379   +0.1%
1024       26,053     26,478   +1.6%
2048       25,727     26,755   +4.0%
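The delta column matches the relative change of the PR average over the main average, (PR / main - 1) × 100, rounded to one decimal. Recomputing it from the listed averages reproduces all seven rows. Here is a small sketch of that check, with the values copied from the table. The AVG_RPS and delta_percent names are illustrative.

```ruby
# Avg RPS per response size in kB, copied from the table: [main, PR].
AVG_RPS = {
  1    => [26_255, 26_839],
  10   => [25_838, 26_783],
  100  => [25_765, 26_664],
  256  => [26_360, 26_509],
  512  => [26_349, 26_379],
  1024 => [26_053, 26_478],
  2048 => [25_727, 26_755],
}.freeze

# Relative change of PR over main, in percent, rounded to one decimal place.
def delta_percent(main, pr)
  ((pr.to_f / main - 1) * 100).round(1)
end

AVG_RPS.each do |size, (main, pr)|
  printf("%-6d %+.1f%%\n", size, delta_percent(main, pr))
end
```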

Latency notes

  • Median (50th percentile) latency slightly lower in most rows (~0.01–0.03 ms)
  • One outlier: the 256 kB chunked case had a max (100th percentile) latency spike in the PR run (31.88 ms vs 5.57 ms on main); other percentiles looked stable

@nateberkopec (Member) commented

Thanks for the optimization and the detailed context here!

@nateberkopec merged commit b7365d3 into puma:main Jan 28, 2026 (85 checks passed)