Conversation
As mentioned in the deleted comment, the parser should be able to operate directly against the incoming bytes, rather than making a copy. This commit does so, parsing directly against the incoming bytes and passing them to the leaf methods, which make their own copies of the fields and values as appropriate.
|
The first improvement, passing the read bytes through the parser without copying, is mostly done. I'm not quite clear on the life cycle of that buffer string, so my patch still makes defensive copies of the header values before putting them in the request hash. I have not looked at the C code to see if it is sharing the buffer contents with those strings. If anyone has insight here, I would appreciate it. Next phase will be to pre-allocate the static header keys, similar to the C code, since the Java extension is currently recreating them every time. I'm not sure I can keep these aggregate changes to less than 100 lines of modifications, so please let me know if that will be a problem for reviewing. |
|
Related to preallocating the header strings: #3825 I wonder what other optimizations over the years have only been applied to the C code. 🤔 |
|
I don't love the linear search for HTTP envs but it matches the C ext now. |
|
Legend! Thank you @headius |
Working to get away from storing another reference to the runtime.
While most headers are either pre-allocated or cached as fstrings, header values and query strings must be allocated new for each request. By marking the incoming buffer as shared and allocating other strings as shared views into that buffer's bytes, we can avoid copying those bytes again for each string as is done in CRuby. Marking the original buffer as shared does mean the request logic will be forced to create a new ByteList and byte[] if further data is read, but this is no worse (and probably better) than makiong N copied slices of the buffer while parsing the request. Because of this copy-on-write behavior, the entirei incoming buffer will be held in memory for the duration of the request, but most of that data would be in memory until processed anyway, potentially with a larger actual heap size than just a single large shared buffer.
This removes the linear search for matching field names and replaces it with a perfect hash based on the standard HTTP/1.1 field names (plus some additions) and the supported CGI variables. The array of pre-allocated strings is passed through the parser from Http11. The EnvKey enum moves to its own top-level class, and now contains a gperf-generated perfect hash function to quickly find or reject incoming fields. Along with the EnvKey changes, the snake-upcasing of incoming fields is done lazily, and never writes to the original buffer. I have reverted a previous commit and restored the JRuby-specific assertions related to this change, and I believe the direct modification of the read buffer is a bug, albeit not a very important one since it only affects error messages for malformed HTTP requests.
63ccfec to
356a7a7
Compare
|
I've completed most of the items I found. Performance is only marginally better, probably because the code generated by Ragel is just bad (I've run into similar performance limitations with the json library), but the memory throughput should be drastically reduced. A benchmark like the one linked from #3825 now only allocates objects for the benchmark itself, unusual headers, and values from the request (other than objects required for JRuby internals). Benchmark were run with a patched JRuby 10.1 that does not raise internal exceptions to wake a ConditionVariable and uses fast fixnum hashing. Numbers are peak performance seen after warmup stabilization. Performance before this PR: After: The final item – using a "fake string" to cache and lookup additional header fstrings – is trickier to do in JRuby than CRuby, as the fstring cache does not currently provide a way to perform a lookup of fstrings using only an array of bytes, and any temporary strings used for this purpose must be isolated across threads. If the Http11 object is only ever used in a single thread, such a temporary string could be constructed there at the cost of an additional string per request. This may be worth it to avoid constantly creating new strings for the remaining less-common headers, such as those used in the #3825 benchmark. |
|
An experiment with interning all incoming headers yielded a small gain (maybe) but I'm not sure it's worth the exposure. Given a large enough request with bogus headers, memory could be made to increase rapidly. I believe this behavior of Puma may be exposing an issue with the fstring cache in CRuby 4.0, as it produces very erratic numbers for the request parsing benchmark: I think I'm going to exclude that feature from the JRuby extension for now. This is probably all I can do for the Ragel-based parser extension on JRuby. It's probably worth looking into a non-Ragel solution, especially for HTTP/1.1 which is quite easy to parse. |
|
@nateberkopec @schneems I'm not sure why that one JRuby timed out, nor why those unrelated jobs have been consistently failing. The CRuby jobs have been failing since the first commit, and I did not touch any code used by the CRuby version of the gem. Is it some oddity with PR jobs? I'll try to reproduce the hang locally but everything else passes and I don't know why my changes would have caused that sort of failure. |
Previously the long form of the FString was cached and returned by longHashCode, but that value is actually just an int and can also be returned by hashCode. We can also pre-allocate and cache the fixnum version of the hash so it does not have to be created each time it is used from Ruby. Found while working on performance optimizations for Puma in puma/puma#3838.
When creating an FString for the first time, we should never trust that the incoming byte[] or ByteList are now ours to own. The caller might still have a reference to them that they continue to modify, resulting in a zombie FString that can't be found or that has incorrect contents. This came up while trying to implement the "fake string" optimization for uncached headers in puma/puma#3838. Holding a reference to a RubyString and its ByteList that could be updated in-place and then used to caches new headers led to those cached FStrings being modified directly.
Disregard the CRuby 'head' jobs. I hoped to have time for a better look at the issue. In the meantime, I'll post a fix so CI passes. |
|
Well, I didn't turn out quite as I hoped, but the CI tests seem to be much better now... |
|
I've merged in recent changes from main to clean up CI. I'm satisfied with this round of changes and it generally performs much better on JRuby now than on CRuby. |
|
CI is green except for two truffleruby-head builds that are unrelated to my changes. |
|
Thanks again @headius |
This PR will contain improvements for the HTTP parser implementation for JRuby.
Intern unusual headers for future calls using zero-allocation "fake string" logicContribution checklist:
[ci skip]to the title of the PR.#issue" to the PR description or my commit messages.