
perf: using asynchronous worker to validate BLS signatures in quorum commitments #6692


Open: wants to merge 7 commits into base: develop from perf-bls-parallel

Conversation

knst (Collaborator) commented May 28, 2025

Issue being fixed or feature implemented

During block validation, quorum commitments are processed in a single thread.
While some blocks have up to 32 commitments (blocks that contain rotation quorum commitments), each quorum commitment has hundreds of public keys to validate and 2 signatures (the quorum signature and the aggregated members signature). This takes up to 30% of total indexing time and up to 1 second for heavy blocks.

What was done?

CCheckQueue, which is used for validating ECDSA signatures, is now also used for validating the BLS signatures in quorum commitments.
The quorum signature and the members signatures are now validated simultaneously, which yields a performance improvement even for blocks that have only 1 commitment.
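In outline, the new flow looks like the sketch below, assembled from the diff hunks quoted later in this review; the loop and the Wait() block are verbatim from the PR, while the queue_control construction is an assumption based on the m_bls_queue member described in the walkthrough:

// Sketch: queue one BlsCheck per signature and let CCheckQueue
// worker threads verify them in parallel.
CCheckQueueControl<utils::BlsCheck> queue_control(&m_bls_queue);

for (const auto& [_, qc] : qcs) {
    if (qc.IsNull()) continue;
    const auto* pQuorumBaseBlockIndex = m_chainstate.m_blockman.LookupBlockIndex(qc.quorumHash);
    qc.VerifySignatureAsync(m_dmnman, m_qsnapman, pQuorumBaseBlockIndex, &queue_control);
}

if (!queue_control.Wait()) {
    // at least one check failed
    return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-qc-invalid");
}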

How Has This Been Tested?

Invalidated + reconsidered 15k blocks (~1 month's worth).
This PR makes validation of quorum commitments 3.5x faster; overall indexing is 25% faster on my 12-core environment.

PR:

2025-05-28T10:17:56Z [bench]         - m_qblockman: 0.03ms [28.90s]
2025-05-28T10:17:56Z [bench]   - Connect total: 9.01ms [184.16s (11.86ms/blk)]
2025-05-28T10:17:56Z [bench] - Connect block: 9.21ms [190.33s (12.25ms/blk)]

develop:

2025-05-22T18:39:44Z [bench]         - m_qblockman: 0.03ms [96.90s]
2025-05-22T18:39:44Z [bench]   - Connect total: 9.31ms [252.80s (16.28ms/blk)]
2025-05-22T18:39:44Z [bench] - Connect block: 9.50ms [258.90s (16.67ms/blk)]

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone

@knst knst force-pushed the perf-bls-parallel branch from ec3a9d4 to 8b6414d on May 28, 2025 11:53
coderabbitai bot commented May 28, 2025


Walkthrough

The changes introduce asynchronous BLS signature verification for quorum commitments by adding a BlsCheck struct that encapsulates signature verification data and logic. A parallelized check queue (m_bls_queue) is added to the CQuorumBlockProcessor class to process BLS signature checks concurrently, with worker threads managed in the constructor and destructor. The signature verification logic is refactored into a new method, VerifySignatureAsync, in the CFinalCommitment class, supporting both synchronous and asynchronous modes. A new command-line argument -parbls is introduced to configure the number of BLS verification threads. These updates separate signature verification from commitment processing and centralize verification logic for improved concurrency.
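For reference, a hedged sketch of the BlsCheck struct implied by this walkthrough; the member names match the operator() quoted later in this thread, and the constructor arguments match the emplace_back calls shown below, but the exact storage and copy semantics are assumptions:

// Sketch of the check object executed by CCheckQueue workers.
// Field names follow the operator() quoted later; value-vs-reference
// storage here is an assumption, not taken from the PR.
struct BlsCheck {
    CBLSSignature m_sig;
    std::vector<CBLSPublicKey> m_pubkeys;
    uint256 m_msg_hash;
    std::string m_id_string;

    BlsCheck(const CBLSSignature& sig, const std::vector<CBLSPublicKey>& pubkeys,
             const uint256& msg_hash, const std::string& id_string) :
        m_sig(sig), m_pubkeys(pubkeys), m_msg_hash(msg_hash), m_id_string(id_string)
    {
    }

    bool operator()(); // single-key or aggregated verification; quoted in full later in this thread
};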


@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/llmq/blockprocessor.cpp (1)

208-220: Consider adding error logging when signature verification fails.

The asynchronous signature verification implementation is correct and follows proper patterns. However, when queue_control.Wait() returns false, it would be helpful to log which commitment(s) failed verification for debugging purposes.

Consider adding a log message before returning false:

 if (!queue_control.Wait()) {
     // at least one check failed
+    LogPrintf("[ProcessBlock] BLS signature verification failed for block at height %d\n", pindex->nHeight);
     return false;
 }
src/llmq/commitment.cpp (1)

31-95: Consider refactoring for improved modularity.

The implementation correctly handles both synchronous and asynchronous BLS signature verification. However, the method is quite long (64 lines) and handles multiple responsibilities.

Consider extracting helper methods to improve readability and maintainability:

bool CFinalCommitment::VerifySignatureAsync(CDeterministicMNManager& dmnman, CQuorumSnapshotManager& qsnapman,
                                           gsl::not_null<const CBlockIndex*> pQuorumBaseBlockIndex,
                                           CCheckQueueControl<utils::BlsCheck>* queue_control) const
{
    auto members = utils::GetAllQuorumMembers(llmqType, dmnman, qsnapman, pQuorumBaseBlockIndex);
    const auto& llmq_params_opt = Params().GetLLMQ(llmqType);
    if (!llmq_params_opt.has_value()) {
        LogPrint(BCLog::LLMQ, "CFinalCommitment -- q[%s] invalid llmqType=%d\n", quorumHash.ToString(),
                 ToUnderlying(llmqType));
        return false;
    }
    const auto& llmq_params = llmq_params_opt.value();

    uint256 commitmentHash = BuildCommitmentHash(llmq_params.type, quorumHash, validMembers, quorumPublicKey,
                                                 quorumVvecHash);
    LogMemberDetails(members, commitmentHash);

    if (!VerifyMemberSignatures(llmq_params, members, commitmentHash, queue_control)) {
        return false;
    }

    if (!VerifyQuorumSignature(commitmentHash, queue_control)) {
        return false;
    }

    return true;
}

private:
bool CFinalCommitment::VerifyMemberSignatures(const Consensus::LLMQParams& llmq_params,
                                             const std::vector<CDeterministicMNCPtr>& members,
                                             const uint256& commitmentHash,
                                             CCheckQueueControl<utils::BlsCheck>* queue_control) const
{
    // Implementation for member signature verification
}

bool CFinalCommitment::VerifyQuorumSignature(const uint256& commitmentHash,
                                            CCheckQueueControl<utils::BlsCheck>* queue_control) const
{
    // Implementation for quorum signature verification
}
🛑 Comments failed to post (1)
src/llmq/commitment.h (1)

34-37: ⚠️ Potential issue

Fix namespace formatting issue identified by pipeline.

The nested namespace declaration needs proper formatting according to clang-format requirements.

Apply this diff to fix the formatting:

-namespace utils
-{
-struct BlsCheck;
-} // namespace utils
+namespace utils {
+struct BlsCheck;
+} // namespace utils

knst (Collaborator, Author) commented May 28, 2025

just to be sure that CCheckQueue works at all (and doesn't just return true), I modified the code a bit and it failed as expected:

diff --git a/src/llmq/commitment.cpp b/src/llmq/commitment.cpp
index c2cbe9b35c..b242b4c988 100644
--- a/src/llmq/commitment.cpp
+++ b/src/llmq/commitment.cpp
@@ -69,7 +69,12 @@ bool CFinalCommitment::VerifySignatureAsync(CDeterministicMNManager& dmnman, CQu
             strprintf("CFinalCommitment -- q[%s] invalid aggregated members signature", quorumHash.ToString())};
         if (queue_control) {
             std::vector<utils::BlsCheck> vChecks;
-            vChecks.emplace_back(membersSig, memberPubKeys, commitmentHash, members_id_string);
+            static int counter{0};
+            if (++counter == 42) {
+                vChecks.emplace_back(quorumSig, memberPubKeys, commitmentHash, members_id_string);
+            } else {
+                vChecks.emplace_back(membersSig, memberPubKeys, commitmentHash, members_id_string);
+            }
             queue_control->Add(vChecks);
         } else {
             if (!membersSig.VerifySecureAggregated(memberPubKeys, commitmentHash)) {

error:

2025-05-28T08:25:41Z [ProcessBlock] h[2270346] numCommitmentsRequired[32] numCommitmentsInNewBlock[32]
2025-05-28T08:25:41Z CDeterministicMNList::PoSePunish -- punished MN a5a25d4c35bce6be47a5b0bf8c82b44944eda4aefbac048172217d10187ec7ba, penalty 2039->3122 (max=3122)
2025-05-28T08:25:41Z CDeterministicMNList::PoSePunish -- banned MN a5a25d4c35bce6be47a5b0bf8c82b44944eda4aefbac048172217d10187ec7ba at height 2270346
2025-05-28T08:23:57Z ERROR: ConnectBlock(DASH): ProcessSpecialTxsInBlock for block 00000000000000253bbf25fe6552038e2358378103b9d3075015228a4455462b failed with Valid
2025-05-28T08:23:57Z ERROR: ConnectTip: ConnectBlock 00000000000000253bbf25fe6552038e2358378103b9d3075015228a4455462b failed, Valid

the state is expected not to be set; same issue on develop, because:

    if (!qc.Verify(m_dmnman, m_qsnapman, pQuorumBaseBlockIndex, /*checkSigs=*/fBLSChecks)) {        
        LogPrint(BCLog::LLMQ, "CQuorumBlockProcessor::%s height=%d, type=%d, quorumIndex=%d, quorumHash=%s, signers=%s, validMembers=%d, quorumPublicKey=%s qc verify failed.\n", __func__,                                                                                        
                 nHeight, ToUnderlying(qc.llmqType), qc.quorumIndex, quorumHash.ToString(), qc.CountSigners(), qc.CountValidMembers(), qc.quorumPublicKey.ToString());
        return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-qc-invalid");
    }

@knst knst added this to the 23 milestone May 28, 2025
@knst knst marked this pull request as draft June 1, 2025 10:28
@knst knst force-pushed the perf-bls-parallel branch from 8b6414d to da36faf on June 17, 2025 09:06
@knst knst marked this pull request as ready for review June 17, 2025 09:28
@knst knst requested review from UdjinM6 and PastaPastaPasta June 17, 2025 09:29
knst (Collaborator, Author) commented Jun 17, 2025

@knst knst force-pushed the perf-bls-parallel branch from 8b6414d to da36faf June 17, 2025 16:06

I squashed the commits together because they changed the same pieces of code several times and moved code between files.

Also see #6724 and #6692 (comment) for extra tests done for this PR.

@UdjinM6 please help with review; the PR is ready now (assuming CI will succeed)

PastaPastaPasta added a commit that referenced this pull request Jul 4, 2025
c9ef70a tests: add is_mature for quorum generation logs (Konstantin Akimov)
59060b5 fmt: order imports and fix gap in feature_llmq_dkgerrors.py (Konstantin Akimov)
58377f8 test: added functional tests for invalid CQuorumCommitment (Konstantin Akimov)
bb0b8b0 test: add serialization/deserialization of CFinalCommitmentPayload (Konstantin Akimov)

Pull request description:

  ## Issue being fixed or feature implemented
  As I noticed while implementing #6692, if BlsChecker works incorrectly it won't be caught by unit or functional tests. See also #6692 (comment) for how 6692 has been tested without this PR.

  ## What was done?
  This PR introduces new functional tests to validate that `llmqType`, `membersSig`, `quorumSig` and `quorumPublicKey` are indeed validated by Dash Core as part of consensus.

  ## How Has This Been Tested?
  See changes in `feature_llmq_dkgerrors.py`

  ## Breaking Changes
  N/A

  ## Checklist:
  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [ ] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone

ACKs for top commit:
  kwvg:
    utACK c9ef70a
  UdjinM6:
    utACK c9ef70a

Tree-SHA512: ad61f8c845f6681765105224b2a84e0b206791e2c9a786433b9aa91018ab44c1fa764528196fd079f42f08a55794756ba8c9249c6eb10af6fe97c33fa4757f44
This pull request has conflicts, please rebase.

knst added 2 commits July 24, 2025 23:07
It introduces a new command-line argument -parbls to set the number of parallel threads for BLS validation

The new parallel BlsChecker asynchronously validates quorumSig and membersSig in Quorum Commitments
This pull request has conflicts, please rebase.

github-actions bot commented Jul 24, 2025

⚠️ Potential Merge Conflicts Detected

This PR has potential conflicts with the following open PRs:

Please coordinate with the authors of these PRs to avoid merge conflicts.

@knst knst force-pushed the perf-bls-parallel branch from e2cce9d to 3aec7a5 on July 24, 2025 16:16
UdjinM6 previously approved these changes Jul 24, 2025

@UdjinM6 UdjinM6 left a comment

I see ~15% speedup for ~20k blocks on mainnet on m1pro (7 threads). Good job! 👍

light ACK 3aec7a5

@knst knst marked this pull request as draft July 24, 2025 16:51
@knst knst marked this pull request as ready for review July 25, 2025 05:34
knst (Collaborator, Author) commented Jul 25, 2025

I fixed a small bug which caused functional tests to fail after rebasing on top of #6724

see c047c58

@knst knst requested a review from UdjinM6 July 25, 2025 05:36
UdjinM6 previously approved these changes Jul 25, 2025

@UdjinM6 UdjinM6 left a comment

utACK c047c58

Comment on lines +223 to +226
if (!queue_control.Wait()) {
// at least one check failed
return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-qc-invalid");
}
Member:

Suggested change
if (!queue_control.Wait()) {
// at least one check failed
return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-qc-invalid");
}
if (!queue_control.Wait()) {
LogPrint(BCLog::LLMQ, "CQuorumBlockProcessor: BLS verification failed for block %s\n", blockHash.ToString());
return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-qc-invalid");
}

Member:

What is that log? Seems very non-clear / non-detailed

knst (Collaborator, Author):

See the caller, which creates an object of BlsCheck:

        std::string members_id_string{
            strprintf("CFinalCommitment -- q[%s] invalid aggregated members signature", quorumHash.ToString())};
        if (queue_control) {
            std::vector<utils::BlsCheck> vChecks;
            vChecks.emplace_back(membersSig, memberPubKeys, commitmentHash, members_id_string);
            queue_control->Add(vChecks);
        } else {
            if (!membersSig.VerifySecureAggregated(memberPubKeys, commitmentHash)) {
                LogPrint(BCLog::LLMQ, "%s\n", members_id_string);
                return false;
            }
        }
(and a 2nd one below)

So, either "CFinalCommitment -- q[%s] invalid aggregated members signature" or "CFinalCommitment -- q[%s] invalid quorum signature" will be logged.

Comment on lines +217 to +221
for (const auto& [_, qc] : qcs) {
if (qc.IsNull()) continue;
const auto* pQuorumBaseBlockIndex = m_chainstate.m_blockman.LookupBlockIndex(qc.quorumHash);
qc.VerifySignatureAsync(m_dmnman, m_qsnapman, pQuorumBaseBlockIndex, &queue_control);
}
Member:

I think I recall in the early days of testing and benchmarking BLS, we found it to be more efficient to aggregate then verify, rather than verify asynchronously or something like that. Did you investigate if this was possible?

knst (Collaborator, Author):

it may be faster, but I can't be 100% sure that I won't introduce a security issue and that my solution would be safe enough

Comment on lines +61 to +67
std::vector<CBLSPublicKey> memberPubKeys;
for (const auto i : irange::range(members.size())) {
if (!signers[i]) {
continue;
}
memberPubKeys.emplace_back(members[i]->pdmnState->pubKeyOperator.Get());
}
Member:

we should probably reserve here; we know that for a qc to be valid, we need at least 80% of members to sign (llmq_50_60 has min size 40). As such, we can pretty safely reserve the entire size. For LLMQ_400 this may reduce the number of allocations from ~10-ish reallocations to just a single one.
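A minimal sketch of that suggestion applied to the loop quoted above; the only new line is the reserve call:

std::vector<CBLSPublicKey> memberPubKeys;
memberPubKeys.reserve(members.size()); // most members sign a valid qc, so one allocation covers it
for (const auto i : irange::range(members.size())) {
    if (!signers[i]) {
        continue;
    }
    memberPubKeys.emplace_back(members[i]->pdmnState->pubKeyOperator.Get());
}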

knst (Collaborator, Author):

Please notice that this code is not new, but moved in commit 219f223

I may implement this optimization in follow-up PR

@@ -24,6 +24,11 @@ enum class QvvecSyncMode {
OnlyIfTypeMember = 1,
};

/** Maximum number of dedicated BLS-checking threads allowed */
static const int MAX_BLSCHECK_THREADS = 31;
Member:

this seems excessive :D any testing that shows we need so many max threads?

knst (Collaborator, Author):

Actually, I think 31 may not be a big enough value, because the typical number of commitments in one block is 0, 2 or 32.
1 commitment requires 2 BLS validations, and these validations take different amounts of time: one is faster (the quorum sig, because there is only 1 public key and no aggregation), the other is slower (the members sig, because multiple public keys are aggregated).

So, the optimum is somewhere between 32 and 64 extra threads, but I don't have a machine with 64 threads to run tests :)

I have chosen 31 as a "sane" value, but it's probably not the best one. Any recommendations?
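For context, a hedged sketch of how such a -parbls value would presumably be clamped, assuming it mirrors the existing -par script-check handling; DEFAULT_BLSCHECK_THREADS is a hypothetical name, not taken from the PR:

// Assumption: -parbls is parsed and clamped like -par; values <= 0
// mean "leave that many cores free", capped at MAX_BLSCHECK_THREADS.
int bls_threads = static_cast<int>(gArgs.GetIntArg("-parbls", DEFAULT_BLSCHECK_THREADS));
if (bls_threads <= 0) {
    bls_threads += GetNumCores();
}
bls_threads = std::clamp<int>(bls_threads, 1, MAX_BLSCHECK_THREADS);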

Comment on lines 943 to 960
bool BlsCheck::operator()()
{
if (m_pubkeys.size() > 1) {
if (!m_sig.VerifySecureAggregated(m_pubkeys, m_msg_hash)) {
LogPrint(BCLog::LLMQ, "%s\n", m_id_string);
return false;
}
} else if (m_pubkeys.size() == 1) {
if (!m_sig.VerifyInsecure(m_pubkeys.back(), m_msg_hash)) {
LogPrint(BCLog::LLMQ, "%s\n", m_id_string);
return false;
}
} else {
// It is supposed to be at least one public key!
return false;
}
return true;
}
Member:

I would think we could return an enum or std::expected to represent the error state if something goes wrong?

knst (Collaborator, Author):

VerifyInsecure and VerifySecureAggregated return bool, not std::expected; std::expected won't add any extra value here

Member:

Why not? there are three error paths here - "aggregated verification failed" - "verification failed" - "no public key set"

knst (Collaborator, Author):

I'd prefer to put an exception here then, because this return false is more like an assert.

knst (Collaborator, Author):

logically, we should never get to this } else {; I added return false just in case of any further misuse of BlsCheck by some possible new code
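For comparison, the enum-style alternative discussed above could look like this sketch; the names are hypothetical, not from the PR:

// Hypothetical return type naming the three error paths mentioned above.
enum class BlsCheckResult {
    OK,
    AGGREGATED_VERIFY_FAILED, // VerifySecureAggregated returned false
    SINGLE_VERIFY_FAILED,     // VerifyInsecure returned false
    NO_PUBKEYS,               // misuse: no public key was supplied
};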

Blocks can currently have 0, 2 or 32 commitments; further benchmarking
is needed to find the balance point, but likely it's somewhere
between 32 and 64, because each quorum commitment has 2 BLS signatures
@knst knst force-pushed the perf-bls-parallel branch from 9a6385b to 6906aa2 on August 4, 2025 17:28
@knst knst requested a review from UdjinM6 August 7, 2025 08:42