@PastaPastaPasta PastaPastaPasta commented Nov 21, 2025

Issue being fixed or feature implemented

  • Problem: After we process a chain lock, which makes a recSig irrelevant, TruncateRecoveredSig removes the id/signHash indexes (rs_r/rs_s) for that recovered signature. A very late sigShare for that session then sees HasRecoveredSigForId return false, recreates the session, and eventually times out, even though the session had already finished.
  • Behavior change: Keep the rs_s entry (signHash -> 1) when truncating recovered sigs so HasRecoveredSigForSession(signHash) stays true after truncation. Cleanup still removes the remaining keys when appropriate.
  • Handling late shares: CSigSharesManager::ProcessMessageSigShare now also checks HasRecoveredSigForSession and logs/drops sigShares for sessions already completed, preventing “zombie” sessions/timeouts.
  • Notes: This leaves the recSig hash path (rs_h) unchanged; full deletions (conflict/cleanup) still remove all indexes. No extra caching insertions to keep behavior parallel to rs_h.
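
The retention rule in the bullets above can be illustrated with a toy model. This is a minimal sketch, not the code in src/llmq/signing.cpp: `ToyRecoveredSigsDb`, `RecSigEntry`, `RemoveFully`, and the string-keyed map are hypothetical stand-ins, and only the key prefixes (rs_r, rs_s, rs_h) and the two `HasRecoveredSigFor...` names come from the description.

```cpp
#include <map>
#include <string>

// Hypothetical record; the real code uses uint256 values, not strings.
struct RecSigEntry {
    std::string id;
    std::string signHash;
    std::string hash;
};

// Toy in-memory stand-in for the recovered-sig DB (assumption, not the real class).
class ToyRecoveredSigsDb {
public:
    void Write(const RecSigEntry& e) {
        kv_["rs_r" + e.id] = e.hash;      // id -> recovered sig
        kv_["rs_s" + e.signHash] = "1";   // signHash -> 1 (session marker)
        kv_["rs_h" + e.hash] = e.id;      // hash -> id
    }

    // Truncation: drop the id index but keep the rs_h and rs_s markers.
    void Truncate(const RecSigEntry& e) { Remove(e, /*deleteHashKey=*/false); }

    // Full deletion (conflict/cleanup): drop every index.
    void RemoveFully(const RecSigEntry& e) { Remove(e, /*deleteHashKey=*/true); }

    bool HasRecoveredSigForId(const std::string& id) const {
        return kv_.count("rs_r" + id) > 0;
    }
    bool HasRecoveredSigForSession(const std::string& signHash) const {
        return kv_.count("rs_s" + signHash) > 0;
    }

private:
    void Remove(const RecSigEntry& e, bool deleteHashKey) {
        kv_.erase("rs_r" + e.id);
        if (deleteHashKey) {
            kv_.erase("rs_h" + e.hash);
            kv_.erase("rs_s" + e.signHash);
        }
    }

    std::map<std::string, std::string> kv_;
};
```

In this model, after `Truncate` the id lookup returns false while the session lookup still returns true, which is the property the late-share check relies on.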

Before this change, on testnet I saw a "late sigShare" arrive for roughly 1 in 5 recSigs, resulting in wasted processing and bookkeeping. I don't expect a significant performance boost from this change (perhaps a moderate one), but it will make the logs much clearer about when a session timeout is genuine.

What seemed to be happening was that one masternode peer was very slow or behind, so it sent us its sigShares minutes late.

How Has This Been Tested?

Breaking Changes

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@PastaPastaPasta PastaPastaPasta added this to the 23.1 milestone Nov 21, 2025

github-actions bot commented Nov 21, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.


coderabbitai bot commented Nov 21, 2025

Walkthrough

The changes make recovered-signature cleanup conditional and add a pre-lock early rejection in sigShare processing. RemoveRecoveredSig now erases the k4 (sign-key) DB entry and invalidates hasSigForSessionCache only when deleteHashKey is true. The TruncateRecoveredSig comment is updated to state that both the hash and signHash DB entries are left behind. In ProcessMessageSigShare, signHash and an alreadyRecovered check are computed before taking the lock; incoming sigShares for already-recovered IDs or sessions are dropped early, and logging is updated to use the local signHash.
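
The coupling between the DB erase and the cache invalidation can be sketched as follows. This is an illustrative model only: `ToySessionIndex` and its members are hypothetical, and the real hasSigForSessionCache is not a plain `std::unordered_map`. The point it shows is that the cached answer is dropped only on the path that also erases the DB entry, so a retained entry keeps answering true.

```cpp
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical read-through cache in front of a toy signHash index.
class ToySessionIndex {
public:
    void Write(const std::string& signHash) {
        db_.insert(signHash);
        cache_[signHash] = true;  // keep the cache consistent with the DB
    }

    bool HasRecoveredSigForSession(const std::string& signHash) {
        auto it = cache_.find(signHash);
        if (it != cache_.end()) {
            return it->second;  // cached answer, DB is not consulted
        }
        const bool found = db_.count(signHash) > 0;
        cache_[signHash] = found;
        return found;
    }

    // Mirrors the conditional cleanup: erase and invalidate together,
    // and only when deleteHashKey is true.
    void Remove(const std::string& signHash, bool deleteHashKey) {
        if (deleteHashKey) {
            db_.erase(signHash);
            cache_.erase(signHash);  // otherwise a stale "true" would survive
        }
    }

private:
    std::set<std::string> db_;
    std::unordered_map<std::string, bool> cache_;
};
```

If the erase happened without the matching invalidation, the lookup would keep returning the cached true for a session that no longer exists in the DB, which is why the two steps sit behind the same condition in this sketch.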

Sequence Diagram(s)

sequenceDiagram
    participant Peer
    participant SigShareHandler
    participant DB as Cache/DB
    participant Lock as CriticalSection

    Peer->>SigShareHandler: sends sigShare

    rect rgba(120,180,240,0.12)
    Note over SigShareHandler: Pre-lock checks
    SigShareHandler->>SigShareHandler: compute signHash
    SigShareHandler->>DB: query alreadyRecovered(sig id / session)
    DB-->>SigShareHandler: alreadyRecovered? (yes/no)
    alt alreadyRecovered == yes
        SigShareHandler->>SigShareHandler: log and drop sigShare
        SigShareHandler-->>Peer: return
    end
    end

    rect rgba(160,220,160,0.12)
    Note over Lock: Critical section
    SigShareHandler->>Lock: acquire
    alt alreadyRecovered inside lock
        SigShareHandler->>SigShareHandler: return early
    else
        SigShareHandler->>DB: has sigShare key?
        alt duplicate
            SigShareHandler->>SigShareHandler: skip duplicate
        else
            SigShareHandler->>SigShareHandler: store/process sigShare
        end
    end
    SigShareHandler->>Lock: release
    end
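
One way to read the diagram above is sketched below. It is a simplified model under stated assumptions, not the real CSigSharesManager::ProcessMessageSigShare: `ToySigShareHandler`, `ToySigShare`, `ShareResult`, `MarkRecovered`, and `TruncateId` are hypothetical names, and signHash is assumed to be precomputed rather than derived from the share.

```cpp
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified sigShare; signHash is assumed precomputed.
struct ToySigShare {
    std::string id;
    std::string signHash;
    int member;
};

enum class ShareResult { DroppedRecovered, SkippedDuplicate, Stored };

class ToySigShareHandler {
public:
    void MarkRecovered(const std::string& id, const std::string& signHash) {
        std::lock_guard<std::mutex> guard(dbMutex_);
        recoveredIds_.insert(id);
        recoveredSessions_.insert(signHash);
    }

    // Models truncation: the id index goes away, the session marker stays.
    void TruncateId(const std::string& id) {
        std::lock_guard<std::mutex> guard(dbMutex_);
        recoveredIds_.erase(id);
    }

    ShareResult Process(const ToySigShare& share) {
        // Pre-lock check: query recovered state without holding the share lock.
        const bool alreadyRecovered = IsRecovered(share);

        std::lock_guard<std::mutex> guard(sharesMutex_);
        if (alreadyRecovered) {
            return ShareResult::DroppedRecovered;  // late share for a finished session
        }
        if (!shares_.insert(std::make_pair(share.signHash, share.member)).second) {
            return ShareResult::SkippedDuplicate;
        }
        return ShareResult::Stored;
    }

private:
    bool IsRecovered(const ToySigShare& share) {
        std::lock_guard<std::mutex> guard(dbMutex_);
        return recoveredIds_.count(share.id) > 0 ||
               recoveredSessions_.count(share.signHash) > 0;
    }

    std::mutex dbMutex_;      // guards the toy recovered-sig state
    std::mutex sharesMutex_;  // guards the toy sigShare store
    std::set<std::string> recoveredIds_;
    std::set<std::string> recoveredSessions_;
    std::set<std::pair<std::string, int>> shares_;
};
```

Because the session marker survives `TruncateId`, a share that arrives after truncation is still dropped instead of creating a new session. The pre-lock result can be stale by the time the lock is taken; in this sketch that only means a share is stored that a later check would have to discard.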

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Files to focus on:
    • src/llmq/signing.cpp — verify conditional erase and cache invalidation moved correctly and that DB state remains consistent when deleteHashKey is false.
    • src/llmq/signing_shares.cpp — confirm pre-lock signHash and alreadyRecovered checks are race-safe and logging change uses the local variable correctly.
    • src/llmq/signing.h — ensure comment update accurately reflects behavior and doesn't mislead about cleanup timing.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| Title check | ✅ Passed | The title clearly summarizes the main changes: retaining signHash marker during truncation and dropping late sigShares, matching the core objectives of the PR. |
| Description check | ✅ Passed | The description is directly related to the changeset, clearly explaining the problem, solution, and implementation details for handling late sigShares and signHash retention. |
📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between d15ce14 and 8f6f075.

📒 Files selected for processing (3)
  • src/llmq/signing.cpp (2 hunks)
  • src/llmq/signing.h (1 hunks)
  • src/llmq/signing_shares.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/llmq/signing.cpp
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.
Learnt from: knst
Repo: dashpay/dash PR: 6692
File: src/llmq/blockprocessor.cpp:217-224
Timestamp: 2025-08-19T14:57:31.801Z
Learning: In PR #6692, knst acknowledged a null pointer dereference issue in ProcessBlock() method where LookupBlockIndex may return nullptr but is passed to gsl::not_null, and created follow-up PR #6789 to address it, consistent with avoiding scope creep in performance-focused PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
📚 Learning: 2025-07-29T14:32:48.369Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.

Applied to files:

  • src/llmq/signing.h
  • src/llmq/signing_shares.cpp
📚 Learning: 2025-10-05T20:38:28.457Z
Learnt from: knst
Repo: dashpay/dash PR: 6871
File: contrib/guix/libexec/build.sh:358-360
Timestamp: 2025-10-05T20:38:28.457Z
Learning: In the Dash repository, when backporting code from Bitcoin Core, typos and minor issues in comments should be kept as-is to reduce merge conflicts in future backports, even if they remain unfixed in Bitcoin Core's master branch.

Applied to files:

  • src/llmq/signing_shares.cpp
🪛 GitHub Actions: Clang Diff Format Check
src/llmq/signing_shares.cpp

[error] 527-532: Clang-format differences found. Run 'clang-format-diff.py -p1' or 'clang-format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: Lint / Run linters
🔇 Additional comments (5)
src/llmq/signing.h (1)

207-209: LGTM! Clear documentation of the truncation behavior.

The updated comment accurately reflects that both hash and signHash DB entries are preserved during truncation, which is essential for preventing late sigShare sessions from being recreated while the signature has already been recovered.

src/llmq/signing_shares.cpp (4)

506-508: Good optimization: pre-lock computation for early rejection.

Computing signHash and alreadyRecovered before acquiring the lock enables early rejection of sigShares for already-completed sessions without lock contention. While the check could theoretically become stale (TOCTOU), this is acceptable as it's a best-effort optimization: false negatives only cause unnecessary work that will be caught by later checks in the pipeline.


513-520: LGTM! Correctly prevents zombie sessions from late sigShares.

The early-exit logic properly drops sigShares for sessions that have already been recovered, addressing the core timing issue described in the PR objectives. The log message provides good visibility for debugging late sigShare occurrences.


522-524: LGTM! Correctly preserves duplicate-skipping behavior.

The duplicate check is properly ordered after the recovery check, maintaining the original logic while adding the new early-rejection path for recovered sessions.


530-531: LGTM! Good reuse of computed signHash.

Using the local signHash variable instead of calling GetSignHash() again is a sensible optimization that avoids redundant computation.


…sigShares

  - Problem: After we process a recovered signature, TruncateRecoveredSig removes the id/signHash indexes (rs_r/rs_s). A very late sigShare for
    that session then sees HasRecoveredSigForId as false, recreates a session, and eventually times out, even though we already finished the
    session.
  - Behavior change: Keep the rs_s entry (signHash -> 1) when truncating recovered sigs so HasRecoveredSigForSession(signHash) stays true after
    truncation. Cleanup still removes the remaining keys when appropriate.
  - Handling late shares: CSigSharesManager::ProcessMessageSigShare now also checks HasRecoveredSigForSession and logs/drops sigShares for sessions
    already completed, preventing “zombie” sessions/timeouts.
  - Notes: This leaves the recSig hash path (rs_h) unchanged; full deletions (conflict/cleanup) still remove all indexes. No extra caching
    insertions to keep behavior parallel to rs_h.