Add a new API to update the min & max thread count dynamically #3658
Conversation
How would one use this? Could you show an example dummy app using this?

As @dentarg mentioned, an example would be helpful. Maybe also change the method name. Re an example, it's not clear how to use this with several workers running (just a quick look)...
(force-pushed from 27e8607 to 1a3a682)
Thanks for your replies, @dentarg and @MSP-Greg! And apologies for the long delay. I have updated the method name to `update_thread_pool_min_max`. With regard to how we use this method, we set up a timer that periodically calculates the optimal worker count based on the metrics we collect:

```ruby
class OurAwesomeGem
  ...
  def start_timer
    @timer ||= Thread.new do
      loop do
        avg_duration, avg_io, avg_gvl = average_metrics_from_gvl_timing
        max = calculate_optimal_worker_count(avg_duration, avg_io, avg_gvl)
        # `@server` is a `Puma::Server` object. `min` could always be 1, so this is setting `max` only.
        @server.update_thread_pool_min_max(max: max)
        sleep interval
      end
    end
  end
  ...
end
```

In order to grab the `Puma::Server` object, we use a plugin:

```ruby
# our_awesome_gem/lib/puma/plugin/our_awesome_gem.rb, so this could be loaded with
# `--plugin our_awesome_gem`:
Puma::Plugin.create do
  def start(launcher)
    launcher.config.configure do |user_dsl, *|
      if launcher.config.options[:workers] > 0
        # Cluster mode:
        user_dsl.on_worker_boot do
          OurAwesomeGem.start_timer
        end
        user_dsl.on_thread_start do
          OurAwesomeGem.puma_server ||= Puma::Server.current
        end
      else
        # Single mode:
        user_dsl.on_booted do
          OurAwesomeGem.puma_server = Puma::Server.current
          OurAwesomeGem.start_timer
        end
      end
    end
  end
end
```

The second commit in this PR, 1a3a682, has been added to make this work. Another thing I noticed was that Puma's lifecycle events and hooks seem to have been added as people needed them, and there seems to be room for improvement in the API design. I wonder if we could think about how to make these events and hooks more predictable, regardless of cluster/single mode. Issues like sidekiq/sidekiq#6661 have already been reported, so it would be great if I could get some feedback on this.
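For reference, a plugin like the one above could presumably also be activated from a Puma config file rather than the CLI flag (the gem name is carried over from the hypothetical example above):

```ruby
# config/puma.rb -- equivalent to passing `--plugin our_awesome_gem` on the CLI,
# assuming the gem's puma/plugin/our_awesome_gem.rb file is on the load path.
plugin :our_awesome_gem

workers 2
threads 1, 5
```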
(force-pushed from 05cc3de to 7384c68)
Not sure if there was a reason for not doing so; if not, please rebase? I can have a look tomorrow. We may need an 'integration' test for this, especially if a clustered test is done...
(force-pushed from b0a1985 to 16a26f2)
Good day. By using the following, each request is sent, the response is generated, and the 'client' reads it; then the next request is sent. So it's not filling up the server with a 'backlog' of requests:

```ruby
10.times { send_http_read_response(GET_11) }
```

The following does create a 'backlog' and the test passes. It creates the backlog with a small delay in the app:

```ruby
def test_update_thread_pool_min_max
  @app = ->(env) do
    sleep 0.001
    [200, {}, [env['rack.url_scheme']]]
  end

  server_run(min_threads: 1, max_threads: 1, auto_trim_time: 1)
  assert_equal 1, @server.stats[:running]

  @server.update_thread_pool_min_max(min: 3, max: 3)

  # By making multiple requests, we can ensure that the pool is filled up to the max.
  req_ary = send_http_array 20
  resp_ary = req_ary.map { |req| req.read_response }

  assert_equal 3, @pool.max
  assert_equal 3, @pool.min
  assert_equal 3, @server.max_threads
  assert_equal 3, @server.min_threads
  assert_equal 3, @pool&.spawned
end
```

EDIT: I should probably have more coffee, but if one increases the ...
@yuki24 and @nateberkopec I pushed the test fix, CI is passing. Question, and this pertains to the PR, and maybe Puma: what happens if a ...

EDIT: also, what happens if ...
@MSP-Greg Thank you for reviewing! Really appreciated the test fix ❤️

I just merged your suggestion. I also think we may want to raise an exception in both cases. At the same time, I don't think I fully understand the risk of raising an exception from within a web server; it's probably safer to just warn than to raise. What do you think?
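As a sketch of the warn-rather-than-raise option being discussed, validation could clamp bad values and log a warning instead of raising inside a running server. The method name and messages below are hypothetical, not Puma's actual API:

```ruby
# Hypothetical input validation for a dynamic thread-pool update: clamp
# out-of-range values and warn, rather than raising from inside the server.
def sanitize_min_max(min:, max:)
  if min < 0
    warn "update_thread_pool_min_max: min #{min} is negative, using 0"
    min = 0
  end
  if max < min
    warn "update_thread_pool_min_max: max #{max} < min #{min}, using #{min}"
    max = min
  end
  [min, max]
end
```

With this shape, a caller passing `min: -1, max: 5` gets `[0, 5]` back and the server keeps running, at the cost of silently correcting a likely caller bug.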
(force-pushed from 13a0d74 to 450e9b5)
(force-pushed from 4841059 to 5ec8349)
There are proposals to make thread counts dynamic, RE #3658. If this happens we need to pull in the current value, and cannot rely on it being set once.

* Move cluster logic to its own class

  Simplify the calculation. This is an alternative to #3723. Puma 6.6.1 behavior is documented here: https://gist.github.com/schneems/cef38e9448dfb72943d13050a7da0869

  This changes the Puma 7 behavior to:
  - Start sleeping the event loop before being overloaded (the prior check included the ready @todo queue in `pool.busy_threads` and would only fire then); this is closer to Puma 6.6.1 behavior.
  - Sleep a consistent proportional amount (closer to Puma 7 behavior), versus Puma 6, which used a static value.
  - Restrict sleeping behavior to 2 or more workers; in Puma 7 it was accidentally enabled for clusters with only 1 worker.

  Fixes #3740. The `wait_for_less_busy_worker` value is now treated as a maximum value for the sleep calculation. It also now respects a value of 0 to mean "don't sleep at all", which was the prior behavior of that value.

* Whitespace

* Remove unused method

  The `busy_threads` call is now accessed directly through the thread pool and only used in one place.

* Add docs about server <-> reactor relationship

  The code doesn't make it clear who is calling whom when looking at it in isolation.

* Make maximum 25x thread count and allow for overage

  Puma can take in more requests than it has threads. This change preserves the same logic as before, but it doesn't hit maximum sleep until it is at 25x the number of max threads. That means that if a server was using 5 threads, before it would hit a 0.005 sleep if those five threads were busy; now if (only) five threads are busy it will sleep 0.0002 seconds.

* Move order of comparison

  If busy threads is zero we don't need to calculate the percentage.

* Refactor out clamp

  Because we're clamped at the numerator, the division will naturally tend towards 1.0. We don't need a second clamp on the result. This allows us to remove an intermediate variable and multiply directly in the return calculation.

* Lazy max_threads

  There are proposals to make thread counts dynamic, RE #3658. If this happens we need to pull in the current value, and cannot rely on it being set once.

* Avoid re-checking the `max_delay` number every iteration

* Refactor out branching

  The equation already holds up the tested properties without needing to special-case when busy threads is zero. Removes a branching conditional.

* Push worker logic into delay class

  Whether the calculation should be performed based on worker count can be decided once and not repeated on every iteration.

* Optimize case where no threads are busy

  Moves the condition from checking whether the delay is zero to checking earlier whether the input is zero. This prevents some calculations and preserves the same number of conditionals.
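A rough sketch of the proportional delay these commits describe. The method name is hypothetical and this is not Puma's actual code; the 25x factor, the clamp at the numerator side, the zero-input early return, and the 0.0002 s example are taken from the commit messages above:

```ruby
# Sketch of the proportional less-busy-worker sleep: the delay grows linearly
# with busy threads and reaches max_delay once busy threads hit 25x the
# current max thread count. max_delay of 0 means "don't sleep at all".
def less_busy_delay(busy_threads, max_threads, max_delay)
  return 0 if busy_threads.zero? || max_delay.zero?

  ratio = busy_threads / (25.0 * max_threads)
  ratio = 1.0 if ratio > 1.0 # clamp at the numerator; the result needs no second clamp
  max_delay * ratio
end
```

With `max_threads = 5` and `max_delay = 0.005`, five busy threads yield a 0.0002 s sleep, matching the example in the commit message, and the delay saturates at 0.005 s only at 125 busy (queued) requests.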
(force-pushed from 8d75bbe to cbe7fd0)
(force-pushed from cbe7fd0 to a9da027)
(force-pushed from a9da027 to bc71679)
nateberkopec left a comment:
Two issues block this change for typical usage: (1) `ServerPluginControl` is created before `server.thread_pool` is assigned, so defaulting min/max via `@thread_pool` raises in `before_thread_start`; (2) `update_thread_pool_min_max` rejects `min == 0`, which is the default config.
For posterity: codex 5.2 xhigh found those two issues, I watched it verify them manually, and I attest they are correct.
(force-pushed from bc71679 to 1b510ab)
(force-pushed from 02581a7 to cf6dcd4)
schneems left a comment:
No major issues. Made some suggestions; please apply before merging or suggest alternatives.
(force-pushed from c8ccef5 to f15cf94)
Addressed the comments and rebased against ...
Description
@nateberkopec and I are working on something that automatically identifies the optimal worker count on the fly and updates the thread count accordingly and dynamically. The algorithm we are actively working on will be proprietary, but we think that an API to dynamically change the thread count could be part of open source Puma, hence the PR here.
I have looked into Puma's source code, and noticed that this isn't extremely complicated, as Puma's thread pool already looks at `min` and `max` to not allocate too many threads and to trim unused threads dynamically. What this means is that in order to achieve what we want to do, we can simply set a new number for `min` and/or `max`.

This PR includes a simple change where `#update_worker_count` was added to do that. The accessors for `min` and `max` have also been added to the `ThreadPool`. I know this may seem a bit scary, but that's basically what dynamic update means, so I thought it might be fine. Please let me know what you all think.
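To illustrate why updating `min`/`max` is enough, here is a toy model (not Puma's real `ThreadPool`) in which the spawn and trim decisions consult the accessors on every call, so new bounds take effect on the next decision:

```ruby
# Toy pool: because each decision re-reads @min/@max, writing new values
# through the accessors is all a dynamic update needs to do.
class ToyPool
  attr_accessor :min, :max
  attr_reader :spawned

  def initialize(min, max)
    @min, @max, @spawned = min, max, min
  end

  # Called when work arrives: grow only while under @max.
  def maybe_spawn
    @spawned += 1 if @spawned < @max
  end

  # Called by the auto-trimmer: shrink only while over @min.
  def maybe_trim
    @spawned -= 1 if @spawned > @min
  end
end

pool = ToyPool.new(1, 1)
pool.maybe_spawn            # no-op: already at max
pool.min = 3                # dynamic update, analogous to the PR's accessors
pool.max = 3
3.times { pool.maybe_spawn }
pool.spawned                # grows to the new max of 3
```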
Your checklist for this pull request
- Added `[ci skip]` to the title of the PR.
- Added "Fixes #issue" to the PR description or my commit messages.