perf: improve signing latency by collecting lazy signatures under lock#6911
PastaPastaPasta wants to merge 1 commit into dashpay:develop
Conversation
✅ No Merge Conflicts Detected: This PR currently has no conflicts with other open PRs.
Walkthrough

TryRecoverSig in src/llmq/signing_shares.cpp was refactored to reduce the critical-section duration. Under the lock, the code now collects CBLSLazySignature wrappers and signer IDs (and an isSingleNode flag) instead of materializing full signatures. After releasing the lock, the lazy signatures are materialized (Get) and used for either single-node or multi-node recovery; the function also returns early if there are insufficient lazy signatures for recovery. Comments were updated to reflect lazy collection and post-lock materialization.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes are localized and follow a consistent pattern (defer expensive BLS Get work outside the lock) but require careful verification of concurrency semantics, correct handling of the single-node vs multi-node branches, and that no race conditions or behavioral regressions were introduced.
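For readers outside the codebase, the pattern the walkthrough describes can be sketched in a few lines. This is a minimal stand-in, not the real TryRecoverSig: `LazySig`, `RecoverPattern`, and the global state are illustrative substitutes for CBLSLazySignature and the signing-shares `cs` lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for CBLSLazySignature: copying it is cheap, while
// Get() is the expensive materialization step in the real code.
struct LazySig {
    std::string serialized;                         // cheap to copy
    std::string Get() const { return serialized; }  // expensive in the real code
};

std::mutex cs;                  // plays the role of the signing_shares cs lock
std::vector<LazySig> g_shares;  // shared state guarded by cs

// Collect cheap lazy copies under the lock, materialize after releasing it.
std::vector<std::string> RecoverPattern()
{
    std::vector<LazySig> lazySignatures;
    {
        std::lock_guard<std::mutex> lock(cs);
        lazySignatures = g_shares;  // only cheap copies inside the critical section
    }
    std::vector<std::string> sigSharesForRecovery;
    sigSharesForRecovery.reserve(lazySignatures.size());
    for (const auto& lazySig : lazySignatures) {
        sigSharesForRecovery.emplace_back(lazySig.Get());  // expensive, but outside lock
    }
    return sigSharesForRecovery;
}
```

The total amount of work is unchanged; only where the expensive part runs relative to the lock moves.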
Force-pushed 1662e97 to 282ca7d
```cpp
std::vector<CBLSSignature> sigSharesForRecovery;
sigSharesForRecovery.reserve(lazySignatures.size());
for (const auto& lazySig : lazySignatures) {
    sigSharesForRecovery.emplace_back(lazySig.Get()); // Expensive, but outside lock
}
```
Is a copy of a BLS signature indeed that expensive?
Maybe there's a bug in our BLS wrapper which calls "is-valid" or "serialize&deserialize" in each copy constructor?
Just copying ~500 bytes is not that expensive; I think we should improve the performance of our BLS wrapper rather than do this refactoring... Do I miss anything?
It's not a copy; look into the implementation of .Get()
See: Lines 488 to 508 in a488c8d
I'd love ways to increase performance here, but it is quite expensive as is.
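For readers following the thread, a simplified model of why Get() is expensive: the lazy wrapper holds the compressed bytes and only decompresses to a curve point on first access, caching the result. The names and the "decompression" here are illustrative stand-ins (the real cost is BLS point deserialization plus validity checking), not the actual CBLSLazyWrapper API.

```cpp
#include <array>
#include <cassert>
#include <optional>

struct Point { unsigned long long x; };  // stand-in for bls::G2Element

// Simplified model of a lazy BLS wrapper: the 96-byte compressed form is
// cheap to copy; materialization happens once, on first Get(), and is cached.
struct LazyWrapper {
    std::array<unsigned char, 96> buf{};  // compressed bytes, cheap to copy
    mutable std::optional<Point> cached;  // materialized point, filled lazily
    mutable int decompressions = 0;       // instrumented for this example

    const Point& Get() const {
        if (!cached) {
            // Stand-in for the expensive deserialize-and-validate work.
            unsigned long long acc = 0;
            for (auto b : buf) acc = acc * 31 + b;
            cached = Point{acc};
            ++decompressions;
        }
        return *cached;
    }
};
```

So copying an unmaterialized wrapper never pays the decompression cost; only the first Get() does, which is why collecting copies under the lock and calling Get() afterwards moves the dominant cost out of the critical section.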
This pull request has conflicts, please rebase.
Force-pushed 282ca7d to 8922fb1
thepastaclaw left a comment
Code Review
Sound performance optimization. Both agents correctly confirmed no correctness, race, or consensus-safety issues. The two substantive findings are minor naming/comment clarity nitpicks on newly introduced code.
Reviewed commit: 8922fb1
💬 2 nitpick(s)
```cpp
// Collect lazy signatures (cheap copy) under lock, then materialize outside lock
std::vector<CBLSLazySignature> lazySignatures;
std::vector<CBLSId> idsForRecovery;
bool isSingleNode = false;
```
💬 Nitpick: isSingleNode should be isSingleMember for consistency
The codebase consistently uses 'member' terminology: is_single_member() (params.h:124), getQuorumMember(), quorum.members. The new variable isSingleNode breaks this convention. The pre-existing log strings ("single-node signature") use 'node' but those predate this PR; the new variable should align with the API it guards (is_single_member()).
💡 Suggested change
```diff
-bool isSingleNode = false;
+bool isSingleMember = false;
+// ... (also update line 622: isSingleMember = true; and line 644: if (isSingleMember))
```
source: ['claude']
```cpp
}

std::vector<CBLSSignature> sigSharesForRecovery;
// Collect lazy signatures (cheap copy) under lock, then materialize outside lock
```
💬 Nitpick: "cheap copy" is imprecise — state what it avoids
CBLSLazyWrapper's copy constructor acquires a mutex on the source and deep-copies a 96-byte array plus potentially a full bls::G2Element (if already materialized). It IS much cheaper than Get() on the hot path (unmaterialized objects skip BLS point decompression), but "cheap" without qualification could mislead a reader into thinking it's trivial. The same comment pattern appears at lines 620 and 633.
💡 Suggested change
```diff
-// Collect lazy signatures (cheap copy) under lock, then materialize outside lock
+// Collect lazy signatures (avoids BLS deserialization) under lock, then materialize outside lock
```
source: ['claude']
```cpp
    idsForRecovery.emplace_back(quorum.members[sigShare.getQuorumMember()]->proTxHash);
}
// Materialize signatures outside the critical section (expensive BLS operations)
if (isSingleNode) {
```
```diff
-if (isSingleNode) {
+if (quorum.params.is_single_member()) {
```
and drop isSingleNode
Issue being fixed or feature implemented
see benchmark code here: https://gist.github.com/PastaPastaPasta/01e2e3a22c7dbacf957a2b3222b1457c
LLMQSigShares_LockOccupancy_Inside
LLMQSigShares_LockOccupancy_Outside
LLMQSigShares_MaterializeInsideLock
LLMQSigShares_MaterializeOutsideLock

This appears to show that calling .Get() inside the lock is very expensive, and that overall performance may actually improve by a few percentage points as well. The main gain here is reducing the time spent inside the signing_shares.cpp cs lock to minimize overall contention.
AI interpretation
Benchmark Interpretation
The benchmarks demonstrate that performing CBLSLazySignature::Get() calls inside the cs lock is extremely expensive, and moving that work outside of the critical section nearly eliminates lock contention:
Interpretation:
Total signing work remains roughly constant, but the time spent holding the lock drops from ~3.8 ms per 64-share recovery to ~0.00015 ms. This drastically reduces contention in signing_shares.cpp, improving throughput and tail latency under concurrent load.
In multi-threaded real-world conditions, this means:
• Fewer stalls on cs
• Faster signature recovery for all participants
• Lower p95/p99 signing latency in high-load LLMQ signing scenarios
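The lock-occupancy effect described above can be sketched with a toy benchmark. Deterministic busy-work stands in for BLS materialization, and all names here are illustrative; the real numbers come from the gist linked in the PR description, not from this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <numeric>
#include <utility>
#include <vector>

// Stand-in for the expensive CBLSLazySignature::Get(): burn deterministic CPU
// work so the two lock-hold times are comparable.
static std::uint64_t Materialize(std::uint64_t seed)
{
    std::uint64_t acc = seed;
    for (int i = 0; i < 200000; ++i) {
        acc = acc * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return acc;
}

std::mutex cs;
using Clock = std::chrono::steady_clock;

// Returns (checksum of recovered values, nanoseconds the lock was held).
static std::pair<std::uint64_t, long long> RecoverInsideLock(const std::vector<std::uint64_t>& shares)
{
    std::uint64_t sum = 0;
    auto t0 = Clock::now();
    {
        std::lock_guard<std::mutex> lock(cs);
        for (auto s : shares) sum += Materialize(s);  // expensive work under lock
    }
    auto held = std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - t0).count();
    return {sum, held};
}

static std::pair<std::uint64_t, long long> RecoverOutsideLock(const std::vector<std::uint64_t>& shares)
{
    std::vector<std::uint64_t> lazyCopies;
    auto t0 = Clock::now();
    {
        std::lock_guard<std::mutex> lock(cs);
        lazyCopies = shares;                          // cheap copy under lock
    }
    auto held = std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - t0).count();
    std::uint64_t sum = 0;
    for (auto s : lazyCopies) sum += Materialize(s);  // expensive work after unlock
    return {sum, held};
}
```

Both variants produce the same result; only the lock-held duration differs, which is exactly the Inside vs Outside occupancy split the benchmarks measure.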
What was done?
Collect cheap copies of the lazy signature wrappers inside the lock, perform the expensive BLS deserialization outside the lock.
How Has This Been Tested?
Builds; benched. This isn't super easy to detect in proper integration-level benchmarks because only 1 of 15 nodes is the "recovery" member, so contention in this spot kinda gets overshadowed by all the other typical contention over this cs. The data I do have shows the number of contentions over this lock isn't too bad, but the time spent in contention when there is contention is pretty bad, averaging 500μs (0.5ms).
Breaking Changes
None
Checklist:
Go over all the following points, and put an `x` in all the boxes that apply.