JRuby HTTP parser improvements #3838

Merged
MSP-Greg merged 13 commits into puma:main from headius:direct_parsing
Jan 31, 2026

Conversation

@headius
Contributor

@headius headius commented Dec 13, 2025

This PR contains improvements to the HTTP parser implementation for JRuby.

  • Pass bytes from the request handler through the parser loop and leaf methods, copying only when necessary.
  • Pre-allocate static header keys and avoid reconstructing them while preparing the request hash.
  • Use a perfect hash to retrieve pre-allocated header keys from incoming bytes.
  • Refactor parser classes to reduce pointer dereferences.
  • Eliminate dead code.
  • Intern unusual headers for future calls using zero-allocation "fake string" logic.

Contribution checklist:

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

As mentioned in the deleted comment, the parser should be able to
operate directly against the incoming bytes, rather than making a
copy. This commit does so, parsing directly against the incoming
bytes and passing them to the leaf methods, which make their own
copies of the fields and values as appropriate.
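The direct-parsing idea can be sketched in plain Ruby (hypothetical helper name; the actual extension walks a ByteList in Java):

```ruby
# Minimal sketch of parsing directly against the incoming bytes: the
# parser walks the raw buffer by offset, and only the leaf handler
# materializes copies of the field and value via byteslice.
def parse_header_line(buf, start)
  colon = buf.index(":", start) or return nil
  eol = buf.index("\r\n", colon) or return nil
  field = buf.byteslice(start, colon - start)             # copy made here,
  value = buf.byteslice(colon + 1, eol - colon - 1).strip # at the leaf
  [field, value, eol + 2]
end

field, value, pos = parse_header_line("Host: example.com\r\n", 0)
```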
@headius
Contributor Author

headius commented Dec 13, 2025

The first improvement, passing the read bytes through the parser without copying, is mostly done. I'm not quite clear on the life cycle of that buffer string, so my patch still makes defensive copies of the header values before putting them in the request hash. I have not looked at the C code to see if it is sharing the buffer contents with those strings. If anyone has insight here, I would appreciate it.

Next phase will be to pre-allocate the static header keys, similar to the C code, since the Java extension is currently recreating them every time.

I'm not sure I can keep these aggregate changes to less than 100 lines of modifications, so please let me know if that will be a problem for reviewing.

@headius
Contributor Author

headius commented Dec 13, 2025

Related to preallocating the header strings: #3825

I wonder what other optimizations over the years have only been applied to the C code. 🤔

@headius
Contributor Author

headius commented Dec 14, 2025

I don't love the linear search for HTTP envs but it matches the C ext now.

@nateberkopec
Member

Legend! Thank you @headius

Working to get away from storing another reference to the runtime.
While most headers are either pre-allocated or cached as fstrings,
header values and query strings must be allocated new for each
request. By marking the incoming buffer as shared and allocating
other strings as shared views into that buffer's bytes, we can
avoid copying those bytes again for each string as is done in
CRuby.

Marking the original buffer as shared does mean the request logic
will be forced to create a new ByteList and byte[] if further data
is read, but this is no worse (and probably better) than making N
copied slices of the buffer while parsing the request.

Because of this copy-on-write behavior, the entire incoming buffer
will be held in memory for the duration of the request, but most of
that data would remain in memory until processed anyway, and the many
copied slices could well occupy more total heap than a single large
shared buffer.
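The tradeoff can be illustrated with CRuby's copy-on-write substrings (an assumption for illustration only; JRuby shares bytes via ByteList instead):

```ruby
# Illustration only, assuming CRuby's copy-on-write substrings behave
# analogously to JRuby's shared ByteLists: slices are views into the
# buffer, and appending to the original forces one reallocation rather
# than one copy per slice.
buf = +("Content-Type: text/html\r\n" * 100)
slices = Array.new(100) { |i| buf.byteslice(i * 25, 25) }
buf << "trailing data" # original reallocates once; slices stay valid
```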
This removes the linear search for matching field names and
replaces it with a perfect hash based on the standard HTTP/1.1
field names (plus some additions) and the supported CGI variables.

The array of pre-allocated strings is passed through the parser
from Http11.

The EnvKey enum moves to its own top-level class, and now contains
a gperf-generated perfect hash function to quickly find or reject
incoming fields.

Along with the EnvKey changes, the snake-upcasing of incoming
fields is done lazily, and never writes to the original buffer.
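The shape of a gperf-style lookup can be sketched in Ruby (toy hash function and a made-up key set, not the generated one):

```ruby
# Toy sketch of the perfect-hash idea: derive a slot from a few
# characteristic bytes plus the length, then verify the single
# candidate against the actual name before accepting it.
ENV_KEYS = ["HTTP_HOST", "HTTP_ACCEPT", "CONTENT_LENGTH"].freeze
SLOTS = ENV_KEYS.each_with_object({}) do |k, slots|
  slots[[k.length, k.getbyte(0), k.getbyte(-1)]] = k
end.freeze

def find_env_key(name)
  candidate = SLOTS[[name.length, name.getbyte(0), name.getbyte(-1)]]
  candidate if candidate == name # a perfect hash still confirms the match
end
```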
I have reverted a previous commit and restored the JRuby-specific
assertions related to this change, and I believe the direct
modification of the read buffer is a bug, albeit not a very
important one since it only affects error messages for malformed
HTTP requests.
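The lazy snake-upcasing described above can be sketched as a pure function over the field bytes, leaving the read buffer untouched (hypothetical helper; note the real CGI mapping special-cases Content-Type and Content-Length, which lack the HTTP_ prefix):

```ruby
# Hypothetical sketch of lazy snake-upcasing: build the CGI-style key
# from the raw field bytes without writing back into the read buffer.
def snake_upcase(field)
  key = String.new(capacity: field.bytesize + 5)
  key << "HTTP_"
  field.each_byte do |b|
    # '-' becomes '_', lowercase letters are upcased, others pass through
    key << (b == 0x2D ? 0x5F : (b >= 0x61 && b <= 0x7A ? b - 32 : b))
  end
  key
end
```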
@headius
Contributor Author

headius commented Dec 17, 2025

I've completed most of the items I found. Performance is only marginally better, probably because the code generated by Ragel is just bad (I've run into similar performance limitations with the json library), but the memory throughput should be drastically reduced. A benchmark like the one linked from #3825 now only allocates objects for the benchmark itself, unusual headers, and values from the request (other than objects required for JRuby internals).

Benchmarks were run with a patched JRuby 10.1 that does not raise internal exceptions to wake a ConditionVariable and uses fast fixnum hashing. Numbers are peak performance seen after warmup stabilization.

Performance before this PR:

jruby 10.1.0.0-SNAPSHOT (3.4.5) 2025-12-15 8c2eca74b5 OpenJDK 64-Bit Server VM 25+36-LTS on 25+36-LTS +indy +jit [arm64-darwin]
Warming up --------------------------------------
               parse     7.103k i/100ms
Calculating -------------------------------------
               parse     70.925k (± 0.6%) i/s   (14.10 μs/i) -    355.150k in   5.007611s

After:

jruby 10.1.0.0-SNAPSHOT (3.4.5) 2025-12-15 8c2eca74b5 OpenJDK 64-Bit Server VM 25+36-LTS on 25+36-LTS +indy +jit [arm64-darwin]
Warming up --------------------------------------
               parse     7.401k i/100ms
Calculating -------------------------------------
               parse     73.935k (± 1.1%) i/s   (13.53 μs/i) -    370.050k in   5.005638s

The final item – using a "fake string" to cache and lookup additional header fstrings – is trickier to do in JRuby than CRuby, as the fstring cache does not currently provide a way to perform a lookup of fstrings using only an array of bytes, and any temporary strings used for this purpose must be isolated across threads. If the Http11 object is only ever used in a single thread, such a temporary string could be constructed there at the cost of an additional string per request. This may be worth it to avoid constantly creating new strings for the remaining less-common headers, such as those used in the #3825 benchmark.
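A thread-confined scratch-string probe of the kind described might look like this in plain Ruby (hypothetical names; the JRuby version would work against the fstring cache in Java):

```ruby
# Hedged sketch of the "fake string" lookup: reuse one thread-local
# scratch string to probe a cache, and only allocate a permanent
# frozen key on a miss.
HEADER_CACHE = {}

def intern_header(buf, start, len)
  probe = Thread.current[:header_probe] ||= String.new(capacity: 64)
  probe.replace(buf.byteslice(start, len)) # byteslice still copies here;
  cached = HEADER_CACHE[probe]             # the Java version avoids even that
  return cached if cached
  key = probe.dup.freeze
  HEADER_CACHE[key] = key
end
```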

@headius
Contributor Author

headius commented Dec 17, 2025

An experiment with interning all incoming headers yielded a small gain (maybe) but I'm not sure it's worth the exposure. Given a large enough request with bogus headers, memory could be made to increase rapidly. I believe this behavior of Puma may be exposing an issue with the fstring cache in CRuby 4.0, as it produces very erratic numbers for the request parsing benchmark:

ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse    27.569k i/100ms
Calculating -------------------------------------
               parse    201.399k (±12.0%) i/s    (4.97 μs/i) -      1.020M in   5.142527s
ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse    15.937k i/100ms
Calculating -------------------------------------
               parse    118.456k (±12.5%) i/s    (8.44 μs/i) -    589.669k in   5.075580s
ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse    10.524k i/100ms
Calculating -------------------------------------
               parse     90.644k (± 9.4%) i/s   (11.03 μs/i) -    452.532k in   5.054908s
ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse     8.775k i/100ms
Calculating -------------------------------------
               parse     83.366k (± 2.0%) i/s   (12.00 μs/i) -    421.200k in   5.054538s
ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse     8.055k i/100ms
Calculating -------------------------------------
               parse     72.597k (± 4.0%) i/s   (13.77 μs/i) -    362.475k in   5.001894s
ruby 4.0.0dev (2025-11-18T23:50:36Z master 32b8f97b34) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               parse     6.705k i/100ms
Calculating -------------------------------------
               parse    183.568k (±60.6%) i/s    (5.45 μs/i) -    549.810k in   5.004438s

I think I'm going to exclude that feature from the JRuby extension for now.

This is probably all I can do for the Ragel-based parser extension on JRuby. It's probably worth looking into a non-Ragel solution, especially for HTTP/1.1, which is quite easy to parse.
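For a sense of scale, the request line alone is simple enough that a hand-rolled (non-Ragel) parse is nearly trivial; a Ruby sketch:

```ruby
# Rough sketch of a hand-written HTTP/1.1 request-line parse, to
# illustrate why a non-Ragel parser is plausible. Header parsing and
# error handling are the harder parts a real parser must cover.
def parse_request_line(line)
  m = %r{\A([!#$%&'*+\-.^_`|~0-9A-Za-z]+) (\S+) HTTP/(\d)\.(\d)\r\n\z}.match(line)
  raise "malformed request line" unless m
  { method: m[1], target: m[2], version: "#{m[3]}.#{m[4]}" }
end

req = parse_request_line("GET /index.html HTTP/1.1\r\n")
```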

@headius
Contributor Author

headius commented Dec 18, 2025

@nateberkopec @schneems I'm not sure why that one JRuby timed out, nor why those unrelated jobs have been consistently failing.

The CRuby jobs have been failing since the first commit, and I did not touch any code used by the CRuby version of the gem. Is it some oddity with PR jobs?

I'll try to reproduce the hang locally but everything else passes and I don't know why my changes would have caused that sort of failure.

headius added a commit to headius/jruby that referenced this pull request Dec 18, 2025
Previously the long form of the FString was cached and returned
by longHashCode, but that value is actually just an int and can
also be returned by hashCode. We can also pre-allocate and cache
the fixnum version of the hash so it does not have to be created
each time it is used from Ruby.

Found while working on performance optimizations for Puma in
puma/puma#3838.
headius added a commit to headius/jruby that referenced this pull request Dec 18, 2025
When creating an FString for the first time, we should never trust
that the incoming byte[] or ByteList are now ours to own. The
caller might still have a reference to them that they continue to
modify, resulting in a zombie FString that can't be found or that
has incorrect contents.

This came up while trying to implement the "fake string"
optimization for uncached headers in puma/puma#3838. Holding a
reference to a RubyString and its ByteList that could be updated
in-place and then used to cache new headers led to those cached
FStrings being modified directly.
@MSP-Greg
Member

The CRuby jobs have been failing since the first commit, and I did not touch any code used by the CRuby version of the gem. Is it some oddity with PR jobs?

Disregard the CRuby 'head' jobs. I hoped to have time for a better look at the issue. In the meantime, I'll post a fix so CI passes.

@MSP-Greg
Member

@headius

Well, it didn't turn out quite as I hoped, but the CI tests seem to be much better now...

@headius headius marked this pull request as ready for review January 12, 2026 18:41
@headius
Contributor Author

headius commented Jan 12, 2026

I've merged in recent changes from main to clean up CI. I'm satisfied with this round of changes and it generally performs much better on JRuby now than on CRuby.

@headius
Contributor Author

headius commented Jan 12, 2026

CI is green except for two truffleruby-head builds that are unrelated to my changes.

github-actions bot added the waiting-for-review (Waiting on review from anyone) label Jan 12, 2026
@MSP-Greg MSP-Greg merged commit dc947d9 into puma:main Jan 31, 2026
69 of 71 checks passed
@nateberkopec
Member

Thanks again @headius

