-
Notifications
You must be signed in to change notification settings - Fork 1.2k
perf: improve signing latency by collecting lazy signatures under lock #6911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
perf: improve signing latency by collecting lazy signatures under lock #6911
Conversation
|
WalkthroughTryRecoverSig in src/llmq/signing_shares.cpp was refactored to reduce the critical-section duration. Under the lock the code now collects CBLSLazySignature wrappers and signer IDs (and an isSingleNode flag) instead of materializing full signatures. After releasing the lock, the lazy signatures are materialized (Get) and used for either single-node or multi-node recovery; the function also returns early if there are insufficient lazy signatures for recovery. Comments were updated to reflect lazy collection and post-lock materialization. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes The changes are localized and follow a consistent pattern (defer expensive BLS/Get work outside the lock) but require careful verification of concurrency semantics, correct handling of single-node vs multi-node branches, and that no race conditions or behavioral regressions were introduced. Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1662e97 to
282ca7d
Compare
| std::vector<CBLSSignature> sigSharesForRecovery; | ||
| sigSharesForRecovery.reserve(lazySignatures.size()); | ||
| for (const auto& lazySig : lazySignatures) { | ||
| sigSharesForRecovery.emplace_back(lazySig.Get()); // Expensive, but outside lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copy of bls signature indeed that expensive?
Maybe there's a bug in our bls's wrapper which calls "is-valid" or "serialize&deserialize" for each constructor copy?
Just copy of 500bytes are not that expensive; I think we should improve performance of our bls-wrapper rather than do this refactoring... Do I miss anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a copy; look into the implementation of .Get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
Lines 488 to 508 in a488c8d
| const BLSObject& Get() const | |
| { | |
| std::unique_lock<std::mutex> l(mutex); | |
| static BLSObject invalidObj; | |
| if (!bufValid && !objInitialized) { | |
| return invalidObj; | |
| } | |
| if (!objInitialized) { | |
| obj.SetBytes(vecBytes, bufLegacyScheme); | |
| if (!obj.IsValid()) { | |
| bufValid = false; | |
| return invalidObj; | |
| } | |
| if (!obj.CheckMalleable(vecBytes, bufLegacyScheme)) { | |
| bufValid = false; | |
| return invalidObj; | |
| } | |
| objInitialized = true; | |
| } | |
| return obj; | |
| } |
I'd love ways to increase performance here, but it is quite expensive as is.
|
This pull request has conflicts, please rebase. |
Issue being fixed or feature implemented
see benchmark code here: https://gist.github.com/PastaPastaPasta/01e2e3a22c7dbacf957a2b3222b1457c
LLMQSigShares_LockOccupancy_InsideLLMQSigShares_LockOccupancy_OutsideLLMQSigShares_MaterializeInsideLockLLMQSigShares_MaterializeOutsideLockthis appears to show that calling the .Get inside the lock is very expensive, and that the impact to overall performance may actually improve a few percentage points as well. The main gain here is reducing the time spent inside of signing_shares.cpp cs lock to minimize overall contention.
AI interpretation
Benchmark Interpretation
The benchmarks demonstrate that performing CBLSLazySignature::Get() calls inside the cs lock is extremely expensive, and moving that work outside of the critical section nearly eliminates lock contention:
Interpretation:
Total signing work remains roughly constant, but the time spent holding the lock drops from ~3.8 ms per 64-share recovery to ~0.00015 ms. This drastically reduces contention in signing_shares.cpp, improving throughput and tail latency under concurrent load.
In multi-threaded real-world conditions, this means:
• Fewer stalls on cs
• Faster signature recovery for all participants
• Lower p95/p99 signing latency in high-load LLMQ signing scenarios
What was done?
Collect the pointers inside the lock, perform expensive BLS serialization outside the lock.
How Has This Been Tested?
builds; benched; this isn't super easy to detect in proper integration level benchmarks because only 1 of 15 nodes is the "recovery" member, so contention in this spot kinda gets overshadowed by all the other typical contention over this cs. The data I do have, shows the number of contentions over this lock isn't too bad, but the time spent in contention when there is contention is pretty bad, averaging 500μs (0.5ms).
Breaking Changes
None
Checklist:
Go over all the following points, and put an
xin all the boxes that apply.