Skip to content

Conversation

@PastaPastaPasta
Copy link
Member

@PastaPastaPasta PastaPastaPasta commented Oct 21, 2025

Issue being fixed or feature implemented

see benchmark code here: https://gist.github.com/PastaPastaPasta/01e2e3a22c7dbacf957a2b3222b1457c

ns/sigs sigs/s err% total benchmark
59,266.86 16,872.84 0.0% 0.76 LLMQSigShares_LockOccupancy_Inside
2.43 411,787,414.75 0.0% 0.00 LLMQSigShares_LockOccupancy_Outside
60,162.66 16,621.60 0.0% 0.77 LLMQSigShares_MaterializeInsideLock
58,093.54 17,213.62 0.0% 0.74 LLMQSigShares_MaterializeOutsideLock

this appears to show that calling the .Get inside the lock is very expensive, and that the impact to overall performance may actually improve a few percentage points as well. The main gain here is reducing the time spent inside of signing_shares.cpp cs lock to minimize overall contention.

AI interpretation

Benchmark Interpretation

The benchmarks demonstrate that performing CBLSLazySignature::Get() calls inside the cs lock is extremely expensive, and moving that work outside of the critical section nearly eliminates lock contention:

Metric Inside Lock Outside Lock Improvement
Lock occupancy (ns/sig) 59,266.86 2.43 ≈ 24,000× reduction
Total time per sig (ns/sig) 60,162.66 58,093.54 ~3–4% faster overall

Interpretation:
Total signing work remains roughly constant, but the time spent holding the lock drops from ~3.8 ms per 64-share recovery to ~0.00015 ms. This drastically reduces contention in signing_shares.cpp, improving throughput and tail latency under concurrent load.

In multi-threaded real-world conditions, this means:
• Fewer stalls on cs
• Faster signature recovery for all participants
• Lower p95/p99 signing latency in high-load LLMQ signing scenarios

What was done?

Collect the pointers inside the lock, perform expensive BLS serialization outside the lock.

How Has This Been Tested?

builds; benched; this isn't super easy to detect in proper integration level benchmarks because only 1 of 15 nodes is the "recovery" member, so contention in this spot kinda gets overshadowed by all the other typical contention over this cs. The data I do have, shows the number of contentions over this lock isn't too bad, but the time spent in contention when there is contention is pretty bad, averaging 500μs (0.5ms).

Breaking Changes

None

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@PastaPastaPasta PastaPastaPasta added this to the 23.1 milestone Oct 21, 2025
@github-actions
Copy link

github-actions bot commented Oct 21, 2025

⚠️ Potential Merge Conflicts Detected

This PR has potential conflicts with the following open PRs:

Please coordinate with the authors of these PRs to avoid merge conflicts.

@coderabbitai
Copy link

coderabbitai bot commented Oct 21, 2025

Walkthrough

TryRecoverSig in src/llmq/signing_shares.cpp was refactored to reduce the critical-section duration. Under the lock the code now collects CBLSLazySignature wrappers and signer IDs (and an isSingleNode flag) instead of materializing full signatures. After releasing the lock, the lazy signatures are materialized (Get) and used for either single-node or multi-node recovery; the function also returns early if there are insufficient lazy signatures for recovery. Comments were updated to reflect lazy collection and post-lock materialization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes are localized and follow a consistent pattern (defer expensive BLS/Get work outside the lock) but require careful verification of concurrency semantics, correct handling of single-node vs multi-node branches, and that no race conditions or behavioral regressions were introduced.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "perf: improve signing latency by collecting lazy signatures under lock" directly aligns with the main change described in the raw summary. The title clearly identifies the type of change (performance improvement), the benefit (improved signing latency), and the key mechanism (collecting lazy signatures under lock). It is specific and descriptive without being verbose, accurately reflecting the core optimization: moving expensive BLS signature materialization outside the critical section to reduce lock contention in signing_shares.cpp.
Description Check ✅ Passed The PR description is directly related to the changeset and provides substantial context for the optimization. It explains the performance problem with benchmark data, describes the solution approach (collect pointers under lock, materialize outside lock), details the testing methodology, and confirms no breaking changes. The description connects the technical change to the measured performance impact and addresses both the why and how of the modification, making it clear and relevant to the actual code changes in signing_shares.cpp.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1662e97 and 282ca7d.

📒 Files selected for processing (1)
  • src/llmq/signing_shares.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/llmq/signing_shares.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: mac-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: Lint / Run linters

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

std::vector<CBLSSignature> sigSharesForRecovery;
sigSharesForRecovery.reserve(lazySignatures.size());
for (const auto& lazySig : lazySignatures) {
sigSharesForRecovery.emplace_back(lazySig.Get()); // Expensive, but outside lock
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is copy of bls signature indeed that expensive?

Maybe there's a bug in our bls's wrapper which calls "is-valid" or "serialize&deserialize" for each constructor copy?

Just copy of 500bytes are not that expensive; I think we should improve performance of our bls-wrapper rather than do this refactoring... Do I miss anything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a copy; look into the implementation of .Get()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See:

dash/src/bls/bls.h

Lines 488 to 508 in a488c8d

const BLSObject& Get() const
{
std::unique_lock<std::mutex> l(mutex);
static BLSObject invalidObj;
if (!bufValid && !objInitialized) {
return invalidObj;
}
if (!objInitialized) {
obj.SetBytes(vecBytes, bufLegacyScheme);
if (!obj.IsValid()) {
bufValid = false;
return invalidObj;
}
if (!obj.CheckMalleable(vecBytes, bufLegacyScheme)) {
bufValid = false;
return invalidObj;
}
objInitialized = true;
}
return obj;
}

I'd love ways to increase performance here, but it is quite expensive as is.

@github-actions
Copy link

This pull request has conflicts, please rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants