Skip to content

crypto: assume true 64-bit target, add benchmarks for constituent hashes, apply stronger optimizations #6796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

kwvg
Copy link
Collaborator

@kwvg kwvg commented Aug 2, 2025

Additional Information

  • In order to build this PR locally, you need to run git clean -dfx src/crypto/ and ./autogen.sh as source files are renamed from C to C++, which causes problems with building due to residual files that aren't always removed by make clean (see below).

    Error:
    make[2]: *** No rule to make target 'crypto/x11/blake.c', needed by 'crypto/x11/libdashconsensus_la-blake.lo'.  Stop.
    make[2]: *** Waiting for unfinished jobs....
    

    This step will need to be repeated when switching back to another branch due to the same problem inverted.

  • sphlib requires platforms to be SPH_64 in order to build variants like Blake512 (source) and Bmw512 (source). While other variants have alternate implementations for non-SPH_64 platforms like Groestl (source), non-SPH_64 platforms cannot build Blake512 or Bmw512.

    As X11 (and by extension, Dash) requires both, we can safely assume that Dash Core doesn't support non-SPH_64 targets and can remove fallback code.

  • To inform decisions when optimizing X11, this pull request introduces benchmarks for all constituent hash algorithms across the same set of data sizes (32b, 80b, 128b, 512b, 1024b, 2048b and 1M), all tests can be run with bench_dash --filter="Pow(.*)" and can be filtered by data size with bench_dash --filter="Pow(.*)1024b(.*)" (see below)

    Benchmarks:
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                1.60 |      625,125,455.46 |    0.3% |           25.60 |            7.10 |  3.603 |           0.13 |    0.0% |      0.01 | `Pow_Blake512_1024b`
    |                0.75 |    1,333,435,941.89 |    0.2% |           17.06 |            3.33 |  5.124 |           0.12 |    0.0% |      0.01 | `Pow_Bmw512_1024b`
    |                6.24 |      160,132,129.57 |    0.0% |          118.36 |           27.74 |  4.267 |           0.25 |    0.4% |      0.01 | `Pow_Cubehash512_1024b`
    |               14.64 |       68,308,839.95 |    0.0% |          261.17 |           65.02 |  4.017 |           1.63 |    0.6% |      0.01 | `Pow_Echo512_1024b`
    |                8.72 |      114,678,066.51 |    0.1% |          169.16 |           38.73 |  4.368 |           0.25 |    0.4% |      0.01 | `Pow_Groestl512_1024b`
    |                8.87 |      112,774,410.39 |    0.1% |          153.89 |           39.38 |  3.907 |           0.15 |    0.6% |      0.01 | `Pow_Jh512_1024b`
    |                3.79 |      264,078,320.88 |    0.1% |           81.14 |           16.81 |  4.826 |           0.23 |    0.0% |      0.01 | `Pow_Keccak512_1024b`
    |                6.87 |      145,603,041.03 |    0.1% |          131.55 |           30.46 |  4.319 |           1.11 |    0.1% |      0.01 | `Pow_Luffa512_1024b`
    |                7.54 |      132,578,659.24 |    0.1% |          119.56 |           33.48 |  3.572 |           0.26 |    0.4% |      0.01 | `Pow_Shavite512_1024b`
    |                7.24 |      138,072,734.74 |    0.5% |          129.58 |           32.09 |  4.037 |           2.38 |    0.0% |      0.01 | `Pow_Simd512_1024b`
    |                1.17 |      857,609,484.77 |    0.0% |           20.67 |            5.18 |  3.992 |           0.15 |    0.0% |      0.01 | `Pow_Skein512_1024b`
    |               11.45 |       87,351,960.89 |    0.0% |          195.23 |           50.85 |  3.840 |           1.30 |    0.1% |      0.01 | `Pow_X11_1024b`
    
    • While functionality is pass/fail (testable with genesis block validation), that assertion is made before any potential dispatcher can be called, which means the default (i.e. software) implementation is used, not validating the alternative backend's correctness. To mitigate this gap, the X11 benchmark will initialize a TestingSetup, which will retrigger genesis validation in the benchmark, well after the dispatcher can be called.

      This should allow for pass/fail validation before printing benchmark values of an otherwise potentially incorrect implementation.

  • This pull request also makes changes that result in performance impact, the baseline measurement is given below

    Build parameters: LDFLAGS="-Wl,--as-needed -Wl,-O2" CC=clang-16 CXX=clang++-16 ./configure --prefix=$(pwd)/depends/x86_64-linux-gnu --enable-reduce-exports --disable-tests --disable-gui-tests --disable-fuzz --disable-fuzz-binary --disable-ccache --disable-maintainer-mode --disable-dependency-tracking --without-gui

    Baseline benchmarks (489c5f0):
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              321.49 |        3,110,502.97 |    0.1% |        5,527.82 |        1,427.78 |  3.872 |          38.97 |    0.1% |      0.01 | `X11_0032b_single`
    |              128.63 |        7,774,397.65 |    0.1% |        2,211.03 |          571.22 |  3.871 |          15.56 |    0.1% |      0.01 | `X11_0080b_single`
    |               81.97 |       12,199,203.11 |    0.1% |        1,404.79 |          364.00 |  3.859 |           9.80 |    0.1% |      0.01 | `X11_0128b_single`
    |               21.55 |       46,394,313.29 |    0.1% |          368.03 |           95.73 |  3.844 |           2.51 |    0.1% |      0.01 | `X11_0512b_single`
    |               11.48 |       87,085,396.93 |    0.1% |          195.23 |           51.00 |  3.828 |           1.30 |    0.1% |      0.01 | `X11_1024b_single`
    |                1.43 |      700,791,473.89 |    0.1% |           22.61 |            6.33 |  3.572 |           0.09 |    0.0% |      0.02 | `X11_1M`
    |                6.45 |      155,097,633.83 |    0.0% |          108.83 |           28.64 |  3.801 |           0.69 |    0.1% |      0.01 | `X11_2048b_single`
    

    Enablement of small footprint for JH, CubeHash (~6-7% improvement)

    Benchmarks (941948e):
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              298.22 |        3,353,250.39 |    0.0% |        5,177.04 |        1,324.33 |  3.909 |          46.22 |    0.1% |      0.01 | `Pow_X11_0032b`
    |              119.23 |        8,387,348.51 |    0.0% |        2,070.72 |          529.55 |  3.910 |          18.46 |    0.1% |      0.01 | `Pow_X11_0080b`
    |               76.04 |       13,151,349.99 |    0.0% |        1,317.10 |          337.72 |  3.900 |          11.61 |    0.1% |      0.01 | `Pow_X11_0128b`
    |               20.05 |       49,883,086.52 |    0.0% |          346.10 |           89.04 |  3.887 |           2.97 |    0.1% |      0.01 | `Pow_X11_0512b`
    |               10.71 |       93,395,759.32 |    0.0% |          184.27 |           47.56 |  3.875 |           1.53 |    0.1% |      0.01 | `Pow_X11_1024b`
    |                1.38 |      723,729,348.38 |    0.2% |           22.60 |            6.14 |  3.684 |           0.09 |    0.0% |      0.02 | `Pow_X11_1M`
    |                6.04 |      165,472,841.88 |    0.0% |          103.35 |           26.84 |  3.851 |           0.81 |    0.1% |      0.01 | `Pow_X11_2048b`
    

Breaking Changes

None expected.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation (note: N/A)
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@kwvg kwvg added this to the 23 milestone Aug 2, 2025
Copy link

github-actions bot commented Aug 2, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

kwvg added 5 commits August 12, 2025 11:10
This is to help us inform our decisions when deciding which code to
remove and if our optimizations work as intended and where.
This is valuable when we start optimization work as genesis block
validation provides an immediate pass/fail and since we are sufficiently
further along in execution, we can be reasonably assured the dispatcher
(when implemented), has been called and the benchmark is done against
any potential accelerated implementation.
@kwvg kwvg changed the title crypto: assume true 64-bit target, add benchmarks for constituent hashes, implement AES-NI backend and dispatcher for Echo512 and Shavite512 crypto: assume true 64-bit target, add benchmarks for constituent hashes, apply stronger optimizations Aug 12, 2025
kwvg added 5 commits August 13, 2025 15:42
A lot of the operations defined in `sph_types` are already available to
us from the standard library or upstream. Let's just make them aliases
to what we already have and actively maintain.
Reduces code length without compromising on performance (as they're eval
at compile time) while giving us reusable portions to generate other
tables.
Alongside availing more aggressive optimization if the compiler supports
it, Cubehash and JH perform better with their small footprint variants
so let's default to those variants. We aren't enabling _all_ small
footprint variants as algorithms like Groestl perform significantly
worse.
@kwvg kwvg marked this pull request as ready for review August 13, 2025 20:07
Copy link

coderabbitai bot commented Aug 13, 2025

Walkthrough

Adds SPHLIB_CXXFLAGS with optional -O3 in configure and exports it. Introduces a new static SPH crypto library, wiring SPH X11/512-bit sources into the build and consensus, with SPH-specific CPP/CXX flags. Replaces aes_helper.c with a new header-only constexpr AES helper. Converts multiple X11 algorithms (blake, bmw, cubehash, echo, groestl, jh, keccak, luffa, shavite, simd, skein) to typed context APIs, removes extern "C" guards, and consolidates to 64-bit paths. Overhauls sph_types.h to fixed-width types and std rotations. Adds a new pow_hash benchmark file, updates bench sources, and trims X11 benchmarks from crypto_hash. Updates lint exclusions.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/crypto/x11/sph_cubehash.h (1)

76-114: Action required: document vendored edits to src/crypto/x11/sph_cubehash.h and confirm C-linkage safety

Short verification summary:

  • I inspected src/crypto/x11/sph_cubehash.h (no extern "C" present).
  • No .c translation units include any sph_*.h headers; sph headers are only included by C++ TUs.
  • No legacy void* prototypes for the sph_*512 APIs were found.

Files to note:

  • src/crypto/x11/sph_cubehash.h (vendored header — changed linkage/signature surface)
  • Consumers found: src/crypto/x11/cubehash.cpp (implementation), src/hash_x11.h (uses ctx & calls), src/bench/pow_hash.cpp

Required actions (pick one):

  • Add a vendor/patch-notes entry documenting why this vendored file was edited and the intended vendor-sync strategy; OR
  • If editing vendor code is disallowed, revert the change and implement any C++-specific API surface outside the vendored tree; OR
  • If C translation units need to consume these APIs now or later, provide a non-vendor C-compatible wrapper (reintroducing extern "C") rather than modifying the vendored header.

Reason: edits touch a vendored path — even though current usage is C++-only, the repo policy and future C TU usage require explicit documentation or a safer wrapper approach.

♻️ Duplicate comments (4)
src/crypto/x11/sph_shavite.h (1)

99-116: close/addbits_and_close prototypes updated correctly

The change to sph_shavite512_context* aligns with implementation and documented reinit behavior post-close.

The same vendored-path and extern "C" considerations from src/crypto/x11/sph_cubehash.h apply here.

src/crypto/x11/sph_bmw.h (1)

97-114: Finalization prototypes updated to typed contexts

sph_bmw512_close and addbits_and_close now take sph_bmw512_context*, matching the implementation and other 512-bit APIs.

Vendored-path and extern "C" linkage notes from the CubeHash review apply here as well.

src/crypto/x11/sph_simd.h (1)

97-114: close and addbits_and_close updated consistently

The finalization functions now use sph_simd512_context*. Behavior matches implementation delegating from close->addbits_and_close.

Same vendor/linkage caveats as noted previously.

src/crypto/x11/sph_skein.h (1)

101-117: Finalization functions updated to typed context pointer

sph_skein512_close and addbits_and_close use sph_skein512_context*, aligning with implementation and other algorithms’ 512-bit API.

Vendor policy and extern "C" considerations from earlier comments apply here too.

🧹 Nitpick comments (9)
test/lint/lint-include-guards.py (1)

29-29: Broadened exclusion to src/crypto/x11: verify we’re not masking real issues

Excluding the entire directory will skip include-guard checks for all X11 headers. If this is temporary to accommodate in-flight refactors, consider tracking a follow-up to narrow the scope later or to fix guards and re-enable checks.

If you want to tighten semantics to true “prefix” matching (and avoid accidental substring matches), refactor the filter to use startswith() against normalized paths. I can draft a patch if desired.

test/lint/lint-whitespace.py (1)

21-21: Exclude src/crypto/x11 from whitespace lint: sanity-check necessity and duration

This aligns with the include-guard exclusion. Please confirm this is intended (e.g., to land a large vendor-ish import or transitional formatting) and plan to re-enable once stable, to keep lint coverage high.

src/bench/pow_hash.cpp (3)

47-47: Pass the hash buffer, not the array address.

You currently pass &hash (type uint8_t ()[64]) to the close functions. While it implicitly converts to void, it’s non-idiomatic and can be confusing. Pass hash (decays to uint8_t*) or &hash[0] for clarity.

Apply this diff to all close calls:

-        sph_blake512_close(&ctx, &hash);
+        sph_blake512_close(&ctx, hash);
-        sph_bmw512_close(&ctx, &hash);
+        sph_bmw512_close(&ctx, hash);
-        sph_cubehash512_close(&ctx, &hash);
+        sph_cubehash512_close(&ctx, hash);
-        sph_echo512_close(&ctx, &hash);
+        sph_echo512_close(&ctx, hash);
-        sph_groestl512_close(&ctx, &hash);
+        sph_groestl512_close(&ctx, hash);
-        sph_jh512_close(&ctx, &hash);
+        sph_jh512_close(&ctx, hash);
-        sph_keccak512_close(&ctx, &hash);
+        sph_keccak512_close(&ctx, hash);
-        sph_luffa512_close(&ctx, &hash);
+        sph_luffa512_close(&ctx, hash);
-        sph_shavite512_close(&ctx, &hash);
+        sph_shavite512_close(&ctx, hash);
-        sph_simd512_close(&ctx, &hash);
+        sph_simd512_close(&ctx, hash);
-        sph_skein512_close(&ctx, &hash);
+        sph_skein512_close(&ctx, hash);

Also applies to: 59-59, 71-71, 83-83, 95-95, 107-107, 119-119, 131-131, 143-143, 155-155, 167-167


39-169: Optional: reduce duplication with a small generic helper.

All Pow_*512 functions share the same pattern. A templated helper taking init/update/close callables (or function pointers) plus context type would reduce boilerplate and make future additions less error-prone, without affecting benchmark fidelity.


29-37: Optional: guard against over-optimization in microbench.

Writes to a local buffer via external functions are unlikely to be optimized away, but for belt-and-suspenders, consider calling ancl::nanobench::doNotOptimizeAway(hash) (or the bench helper, if exposed) after the run to pin the side effect.

src/crypto/x11/groestl.cpp (1)

2432-2459: Nit: [[maybe_unused]] on ‘buf’ while it’s used.

buf is assigned and subsequently used (for address derivation earlier). Either remove [[maybe_unused]] or drop the unused assignment entirely to avoid confusion. Given this is vendored, consider leaving as-is unless you’re already touching this file.

src/crypto/x11/jh.cpp (1)

1-1: Update the file header comment to reflect the C++ file.

The file identifier comment still references jh.c but this is now a C++ file (jh.cpp).

-/* $Id: jh.c 255 2011-06-07 19:50:20Z tp $ */
+/* $Id: jh.cpp 255 2011-06-07 19:50:20Z tp $ */
src/crypto/x11/aes_helper.h (1)

1-1: Update the file header comment to reflect the header file.

The file identifier comment references aes_helper.c but this is a header file (aes_helper.h).

-/* $Id: aes_helper.c 220 2010-06-09 09:21:50Z tp $ */
+/* $Id: aes_helper.h 220 2010-06-09 09:21:50Z tp $ */
src/crypto/x11/sph_types.h (1)

389-394: “Aligned” helpers no longer differ from unaligned implementations

The aligned variants currently call the same Write*/Read* routines as the unaligned versions (which use memcpy internally and don’t depend on alignment). That’s fine for correctness, but the docstrings still state alignment requirements. Consider updating the comments to avoid confusion.

Also applies to: 414-418, 440-444, 465-468, 490-494, 515-518, 540-544, 565-568

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 489c5f0 and 941948e.

📒 Files selected for processing (34)
  • configure.ac (2 hunks)
  • src/Makefile.am (3 hunks)
  • src/Makefile.bench.include (1 hunks)
  • src/bench/crypto_hash.cpp (0 hunks)
  • src/bench/pow_hash.cpp (1 hunks)
  • src/crypto/common.h (1 hunks)
  • src/crypto/x11/aes_helper.c (0 hunks)
  • src/crypto/x11/aes_helper.h (1 hunks)
  • src/crypto/x11/blake.cpp (2 hunks)
  • src/crypto/x11/bmw.cpp (1 hunks)
  • src/crypto/x11/cubehash.cpp (1 hunks)
  • src/crypto/x11/echo.cpp (2 hunks)
  • src/crypto/x11/groestl.cpp (2 hunks)
  • src/crypto/x11/jh.c (0 hunks)
  • src/crypto/x11/jh.cpp (1 hunks)
  • src/crypto/x11/keccak.cpp (1 hunks)
  • src/crypto/x11/luffa.cpp (7 hunks)
  • src/crypto/x11/shavite.cpp (2 hunks)
  • src/crypto/x11/simd.cpp (2 hunks)
  • src/crypto/x11/skein.cpp (1 hunks)
  • src/crypto/x11/sph_blake.h (4 hunks)
  • src/crypto/x11/sph_bmw.h (4 hunks)
  • src/crypto/x11/sph_cubehash.h (4 hunks)
  • src/crypto/x11/sph_echo.h (4 hunks)
  • src/crypto/x11/sph_groestl.h (4 hunks)
  • src/crypto/x11/sph_jh.h (4 hunks)
  • src/crypto/x11/sph_keccak.h (4 hunks)
  • src/crypto/x11/sph_luffa.h (4 hunks)
  • src/crypto/x11/sph_shavite.h (4 hunks)
  • src/crypto/x11/sph_simd.h (4 hunks)
  • src/crypto/x11/sph_skein.h (4 hunks)
  • src/crypto/x11/sph_types.h (19 hunks)
  • test/lint/lint-include-guards.py (1 hunks)
  • test/lint/lint-whitespace.py (1 hunks)
💤 Files with no reviewable changes (3)
  • src/bench/crypto_hash.cpp
  • src/crypto/x11/jh.c
  • src/crypto/x11/aes_helper.c
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/crypto/common.h
  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/sph_keccak.h
  • src/bench/pow_hash.cpp
  • src/crypto/x11/skein.cpp
  • src/crypto/x11/cubehash.cpp
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/jh.cpp
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/luffa.cpp
  • src/crypto/x11/sph_groestl.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/bmw.cpp
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/simd.cpp
  • src/crypto/x11/shavite.cpp
  • src/crypto/x11/keccak.cpp
  • src/crypto/x11/groestl.cpp
  • src/crypto/x11/sph_types.h
  • src/crypto/x11/blake.cpp
  • src/crypto/x11/echo.cpp
src/crypto/{ctaes,x11}/**

📄 CodeRabbit Inference Engine (CLAUDE.md)

Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11

Files:

  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/sph_keccak.h
  • src/crypto/x11/skein.cpp
  • src/crypto/x11/cubehash.cpp
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/jh.cpp
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/luffa.cpp
  • src/crypto/x11/sph_groestl.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/bmw.cpp
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/simd.cpp
  • src/crypto/x11/shavite.cpp
  • src/crypto/x11/keccak.cpp
  • src/crypto/x11/groestl.cpp
  • src/crypto/x11/sph_types.h
  • src/crypto/x11/blake.cpp
  • src/crypto/x11/echo.cpp
src/bench/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

Performance benchmarks should be placed in src/bench/ and use nanobench

Files:

  • src/bench/pow_hash.cpp
🧠 Learnings (2)
📚 Learning: 2025-07-20T18:42:49.794Z
Learnt from: CR
PR: dashpay/dash#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/bench/**/*.{cpp,h,cc,cxx,hpp} : Performance benchmarks should be placed in src/bench/ and use nanobench

Applied to files:

  • src/Makefile.bench.include
📚 Learning: 2025-07-23T09:30:34.631Z
Learnt from: kwvg
PR: dashpay/dash#6761
File: src/chainlock/signing.h:5-6
Timestamp: 2025-07-23T09:30:34.631Z
Learning: Dash Core uses BITCOIN_ prefix for header guards as the standard convention, inherited from Bitcoin Core. Only a few BLS-specific files in src/bls/ use DASH_ prefix. The vast majority of files (385+) use BITCOIN_ prefix.

Applied to files:

  • src/Makefile.am
🧬 Code Graph Analysis (13)
src/crypto/x11/sph_cubehash.h (1)
src/crypto/x11/cubehash.cpp (6)
  • sph_cubehash512_init (561-565)
  • sph_cubehash512_init (562-562)
  • sph_cubehash512 (568-572)
  • sph_cubehash512 (569-569)
  • sph_cubehash512_close (575-579)
  • sph_cubehash512_close (576-576)
src/crypto/x11/sph_simd.h (1)
src/crypto/x11/simd.cpp (6)
  • sph_simd512_init (1089-1093)
  • sph_simd512_init (1090-1090)
  • sph_simd512 (1095-1099)
  • sph_simd512 (1096-1096)
  • sph_simd512_close (1101-1105)
  • sph_simd512_close (1102-1102)
src/crypto/x11/sph_luffa.h (1)
src/crypto/x11/luffa.cpp (6)
  • sph_luffa512_init (688-696)
  • sph_luffa512_init (689-689)
  • sph_luffa512 (699-703)
  • sph_luffa512 (700-700)
  • sph_luffa512_close (706-710)
  • sph_luffa512_close (707-707)
src/crypto/x11/sph_bmw.h (1)
src/crypto/x11/bmw.cpp (6)
  • sph_bmw512_init (489-493)
  • sph_bmw512_init (490-490)
  • sph_bmw512 (496-500)
  • sph_bmw512 (497-497)
  • sph_bmw512_close (503-507)
  • sph_bmw512_close (504-504)
src/crypto/x11/sph_keccak.h (1)
src/crypto/x11/keccak.cpp (5)
  • sph_keccak512_init (1016-1019)
  • sph_keccak512 (1022-1026)
  • sph_keccak512 (1023-1023)
  • sph_keccak512_close (1029-1033)
  • sph_keccak512_close (1030-1030)
src/bench/pow_hash.cpp (2)
src/hash_x11.h (1)
  • HashX11 (26-90)
src/crypto/x11/keccak.cpp (1)
  • sph_keccak512_init (1016-1019)
src/crypto/x11/sph_shavite.h (1)
src/crypto/x11/shavite.cpp (6)
  • sph_shavite512_init (932-936)
  • sph_shavite512_init (933-933)
  • sph_shavite512 (939-943)
  • sph_shavite512 (940-940)
  • sph_shavite512_close (946-951)
  • sph_shavite512_close (947-947)
src/crypto/x11/sph_skein.h (1)
src/crypto/x11/skein.cpp (6)
  • sph_skein512_init (673-677)
  • sph_skein512_init (674-674)
  • sph_skein512 (680-684)
  • sph_skein512 (681-681)
  • sph_skein512_close (687-691)
  • sph_skein512_close (688-688)
src/crypto/x11/sph_groestl.h (1)
src/crypto/x11/groestl.cpp (6)
  • sph_groestl512_init (2462-2466)
  • sph_groestl512_init (2463-2463)
  • sph_groestl512 (2469-2473)
  • sph_groestl512 (2470-2470)
  • sph_groestl512_close (2476-2480)
  • sph_groestl512_close (2477-2477)
src/crypto/x11/sph_blake.h (1)
src/crypto/x11/blake.cpp (6)
  • sph_blake512_init (609-613)
  • sph_blake512_init (610-610)
  • sph_blake512 (616-620)
  • sph_blake512 (617-617)
  • sph_blake512_close (623-627)
  • sph_blake512_close (624-624)
src/crypto/x11/sph_echo.h (1)
src/crypto/x11/echo.cpp (6)
  • sph_echo512_init (406-410)
  • sph_echo512_init (407-407)
  • sph_echo512 (413-417)
  • sph_echo512 (414-414)
  • sph_echo512_close (420-424)
  • sph_echo512_close (421-421)
src/crypto/x11/sph_jh.h (1)
src/crypto/x11/jh.cpp (6)
  • sph_jh512_init (467-471)
  • sph_jh512_init (468-468)
  • sph_jh512 (474-478)
  • sph_jh512 (475-475)
  • sph_jh512_close (481-485)
  • sph_jh512_close (482-482)
src/crypto/x11/sph_types.h (2)
src/chainlock/signing.h (1)
  • int32_t (28-87)
src/crypto/common.h (12)
  • WriteBE16 (73-77)
  • ReadBE16 (52-57)
  • WriteLE16 (34-38)
  • ReadLE16 (13-18)
  • WriteBE32 (79-83)
  • ReadBE32 (59-64)
  • WriteLE32 (40-44)
  • ReadLE32 (20-25)
  • WriteBE64 (85-89)
  • ReadBE64 (66-71)
  • WriteLE64 (46-50)
  • ReadLE64 (27-32)
🔇 Additional comments (95)
src/crypto/common.h (1)

73-78: Add BE16 writer: consistent, correct, and matches existing pattern

Implementation mirrors the LE/BE 32/64 helpers (htobe16_internal + memcpy), which is the established, alias-safe pattern in this header. Good addition.

src/Makefile.bench.include (1)

46-46: Confirmed: bench/bench_dash links SPH via LIBBITCOIN_CRYPTO — no change required

Short summary: pow_hash.cpp is built into bench/bench_dash and bench/bench_dash is linked against $(LIBBITCOIN_CRYPTO). src/Makefile.am adds crypto/libbitcoin_crypto_sph.la to LIBBITCOIN_CRYPTO and the sph lib includes SPH sources/flags, so SPH symbols are transitively available to the bench binary.

Relevant locations:

  • src/Makefile.bench.include
    • bench_bench_dash_SOURCES includes bench/pow_hash.cpp
    • bench_bench_dash_LDADD includes $(LIBBITCOIN_CRYPTO)
  • src/Makefile.am
    • LIBBITCOIN_CRYPTO_SPH = crypto/libbitcoin_crypto_sph.la
    • LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SPH)
    • crypto_libbitcoin_crypto_sph_la_CXXFLAGS includes $(SPHLIB_CXXFLAGS) and the sph sources/headers are listed

Result: no action required; SPH objects are already linked into the bench target.

configure.ac (1)

1880-1881: AC_SUBST(SPHLIB_CXXFLAGS) export: good plumbing

This ensures the per-SPH flags propagate to generated Makefiles. Combined with the guarded -O3, this gives fine-grained control without impacting global CXXFLAGS.

src/crypto/x11/sph_cubehash.h (2)

97-114: close/addbits_and_close prototypes updated consistently

The close and addbits_and_close prototypes now use sph_cubehash512_context*, consistent with the rest of the API and the implementation. Reinitialization semantics via the close path match the implementation.


76-86: Typed context migration for CubeHash-512 API looks good

I searched the repo — header, definitions, and callers have been updated to use sph_cubehash512_context*; no legacy prototypes using void *cc remain.

Files of interest:

  • src/crypto/x11/sph_cubehash.h — typedef/prototypes updated (lines: 67, 76, 86, 97, 112-113).
  • src/crypto/x11/cubehash.cpp — function definitions use sph_cubehash512_context *cc (lines: 562, 569, 576, 583).
  • Call sites:
    • src/hash_x11.h — ctx declaration and calls (lines: 36, 73–75).
    • src/bench/pow_hash.cpp — ctx and calls (lines: 65, 69–71).

No further changes required.

src/crypto/x11/sph_shavite.h (1)

78-89: Typed-context migration for SHAvite-512 is consistent — no action required

Header, implementation and callers all use sph_shavite512_context*; verified locations:

  • src/crypto/x11/sph_shavite.h — declarations use sph_shavite512_context*
  • src/crypto/x11/shavite.cpp — implementations match the header
  • src/hash_x11.h and src/bench/pow_hash.cpp — callers updated to pass &ctx_shavite

No fixes needed.

src/crypto/x11/sph_bmw.h (1)

76-87: LGTM — BMW-512 uses typed context pointer; no action required

Verified that the header, implementation, and callers are consistent: the API uses sph_bmw512_context* and all uses match.

  • src/crypto/x11/sph_bmw.h — typedef + prototypes (sph_bmw512_init, sph_bmw512, sph_bmw512_close, sph_bmw512_addbits_and_close)
  • src/crypto/x11/bmw.cpp — matching function definitions (wrap bmw64_* implementations)
  • src/hash_x11.h — caller usage (sph_bmw512_init(&ctx_bmw), sph_bmw512(...), sph_bmw512_close(...))
  • src/bench/pow_hash.cpp — bench usage (sph_bmw512_context ctx; sph_bmw512_init(&ctx); ...)
src/crypto/x11/sph_simd.h (1)

76-87: SIMD-512 typed context parameterization looks correct — no outliers found

Declaration and implementation use sph_simd512_context* and the callsites pass a pointer (e.g., &ctx_simd, &ctx, cc).

Files/locations checked:

  • src/crypto/x11/sph_simd.h:76 — declaration: void sph_simd512_init(sph_simd512_context *cc);
  • src/crypto/x11/simd.cpp:1090 — implementation signature
  • src/crypto/x11/simd.cpp:1111 — call: sph_simd512_init(cc);
  • src/hash_x11.h:81 — call: sph_simd512_init(&ctx_simd);
  • src/bench/pow_hash.cpp:153 — call: sph_simd512_init(&ctx);
src/crypto/x11/sph_skein.h (1)

80-90: Approve — Skein-512 typed context adoption is consistent and callers pass addresses

Verified with the provided grep output: declaration and implementation use sph_skein512_context* and init callers pass the address of context variables, so no changes needed.

  • src/crypto/x11/sph_skein.h:80 — declaration (init) and update prototype present
  • src/crypto/x11/skein.cpp:674 — init signature; 698 — internal call
  • src/hash_x11.h:57 — sph_skein512_init(&ctx_skein);
  • src/bench/pow_hash.cpp:165 — sph_skein512_init(&ctx);
src/crypto/x11/sph_luffa.h (3)

62-62: Typed context API for Luffa-512 looks correct and consistent.

Signatures now use sph_luffa512_context* and align with the corresponding implementation in luffa.cpp. The change removes void* ambiguity and is a net improvement.

Also applies to: 72-72, 83-83, 99-99


74-84: Double-check doc promise: “context is automatically reinitialized.”

Per luffa.cpp, sph_luffa512_close delegates to addbits_and_close, and addbits_and_close reinitializes the context. This matches the header’s contract. Please ensure all other 512-bit alg headers maintain the same documented behavior for consistency.

Also applies to: 86-99


39-41: C linkage removed — verify no C translation units include this header.

With extern "C" removed, these symbols will have C++ linkage. Confirm that all inclusion sites compile as C++ (which is the case across this PR).

src/crypto/x11/sph_blake.h (3)

78-78: Typed context API for BLAKE-512 is consistent with blake.cpp.

The updated prototypes match the implementation (blake.cpp lines 608–626), and removing void* improves type safety.

Also applies to: 88-88, 99-99, 115-115


46-47: SPH_64 gating removed — confirm only 64-bit targets build these headers.

Given PR scope (assume true 64-bit target), defining SPH_SIZE_blake512 unconditionally is fine. Just ensure all build configs set/assume 64-bit, since BLAKE-512 X11 path depends on it.


40-41: C linkage removed — verify all inclusions compile as C++.

As with other sph_* headers in this PR, ensure there are no remaining C-object files depending on C linkage for these names.

src/bench/pow_hash.cpp (1)

31-37: No action needed — bench target already links test/util (LIBTEST_UTIL)

Verified: src/Makefile.bench.include wires $(LIBTEST_UTIL) into bench_bench_dash_LDADD, and test/util sources are present, so instantiating TestingSetup in bench files is already supported.

Files to note:

  • src/Makefile.bench.include — bench_bench_dash_LDADD includes $(LIBTEST_UTIL) (around lines 61–66).
  • src/Makefile.test_util.include — contains test/util sources/headers (around lines 21–23).
  • src/bench/pow_hash.cpp — calls MakeNoLogFileContext() (lines 31–33).
src/crypto/x11/skein.cpp (1)

674-699: Typed context migration for Skein-512 is correct; reinit on close is preserved.

The new signatures match the header, and addbits_and_close reinitializes via sph_skein512_init(cc), preserving prior semantics.

src/crypto/x11/groestl.cpp (1)

2463-2487: Typed context 512-bit API looks good and matches the new headers.

The 64-bit-only pathway and sph_groestl512_* functions align with the PR’s 64-bit assumption and header changes.

src/crypto/x11/shavite.cpp (5)

933-936: API migration to typed context pointer looks correct (init)

sph_shavite512_init now strongly types the context. This improves type safety and matches the broader refactor. No issues spotted.


940-943: Typed update signature is consistent and correct

sph_shavite512 takes a typed context and forwards to shavite_big_core as expected. Looks good.


947-951: Close + reinit flow preserved with typed context

sph_shavite512_close forwards to shavite_big_close and reinitializes with IV512. Behavior matches typical SPH semantics. Good change.


955-959: Addbits-and-close API matches typed migration and resets state

Signature and flow are consistent with the refactor. Good to go.


60-60: aes_helper.h defines AES_ROUND_NOKEY_LE and is header-only-safe

Verified: AES_ROUND_NOKEY_LE is present as an inline function and there are no obvious non-inline function definitions in the header, so including it from multiple TUs should not cause ODR/link errors.

  • File: src/crypto/x11/aes_helper.h — inline definition at lines 148–151:
    inline void AES_ROUND_NOKEY_LE(const sph_u32 x0, const sph_u32 x1, const sph_u32 x2, const sph_u32 x3,
    sph_u32& y0, sph_u32& y1, sph_u32& y2, sph_u32& y3)
    { return AES_ROUND_LE(x0, x1, x2, x3, /k0=/0, /k1=/0, /k2=/0, /k3=/0, y0, y1, y2, y3); }
src/crypto/x11/sph_echo.h (4)

79-79: Strongly-typed echo512 context in the public API is a solid improvement

The move from void* to sph_echo512_context* improves compile-time safety for callers. No concerns here.


89-89: Update API uses typed context for processing — consistent with implementation

The data processing function signature aligns with echo.cpp usage. Looks correct.


100-100: Close API now typed — consistent with bench and impl call sites

Signature is consistent with the implementation shown in echo.cpp. Good.


115-116: Addbits-and-close API typed; C++ linkage is safe — no in-repo C consumers found

Signature is correct. I searched the repo: there are no extern "C" wrappers under src/crypto/x11 and no C (.c) translation units include sph_*.h, so the SPH headers having C++ linkage should not break in-repo C consumers.

  • File under review

    • src/crypto/x11/sph_echo.h (lines 115–116):
      void sph_echo512_addbits_and_close(
      sph_echo512_context *cc, unsigned ub, unsigned n, void *dst);
  • Verification summary

    • repo-wide search found extern "C" occurrences in other areas (configure.ac, src/stacktraces.cpp, third-party headers), but none in src/crypto/x11.
    • No .c files include sph_.h (no matches for #include "sph_.h" in .c files).

Conclusion: no fix required here; resolving this review comment.

src/crypto/x11/bmw.cpp (5)

483-483: Use of static_cast for out pointer is appropriate in C++

Casting void* to unsigned char* with static_cast keeps the implementation idiomatic for C++. No issues.


490-493: bmw512_init: typed context pointer and IV initialization look correct

Forwarding to bmw64_init(cc, IV512) with a typed context aligns with the refactor.


497-500: bmw512 update: signature migration is correct

bmw64(cc, data, len) is unmodified; using sph_bmw512_context* is consistent. LGTM.


504-507: bmw512_close forwards to addbits-and-close — clean and reduces duplication

Pattern is consistent across other algorithms. Looks good.


511-515: bmw512_addbits-and-close: typed signature and reinit behavior are correct

Closes, outputs, then re-initializes state — expected SPH behavior. No issues.

src/crypto/x11/cubehash.cpp (5)

555-555: C++-style cast for out pointer is correct

Switching to static_cast<unsigned char*> is the right choice here. No functional change otherwise.


562-565: cubehash512_init: typed context pointer migration looks good

Initialization path unchanged and correct under the new signature.


569-572: cubehash512 update: signature migration consistent

cubehash_core call is unchanged; typed context improves safety. No issues.


576-579: cubehash512_close: delegates to addbits-and-close — consistent pattern

Refactor reduces duplication. LGTM.


583-587: cubehash512_addbits-and-close: typed context and reinit pattern are correct

The function resets state after finalization as expected. Looks good.

src/crypto/x11/sph_jh.h (4)

78-78: Typed JH-512 init API improves type safety

The updated signature aligns with jh.cpp implementation. No issues.


88-88: Typed JH-512 update signature is consistent with implementation

Forwarding to jh_core with a typed context, as in jh.cpp, looks correct.


99-99: Typed JH-512 close signature is correct and aligns with jh.cpp

Matches jh_close(cc, 0, 0, dst, 16, IV512) pattern noted in the implementation.


115-116: sph_jh512: ABI & 64-bit-layout — verification

Short summary: I checked the repo. Prototypes are typed (sph_jh512_context*), no void*-based sph_jh512 prototypes remain, and all in-repo callsites are C++ — but the header has no extern "C" guard and the implementation operates on H.wide (sph_u64), so the context layout assumes 64-bit. If you have external C consumers or need 32-bit compatibility, this is a potential ABI break.

Findings / locations:

  • Declaration / context:
    • src/crypto/x11/sph_jh.h — typed prototypes (sph_jh512_init, sph_jh512, sph_jh512_close, sph_jh512_addbits_and_close) and sph_jh512_context typedef (uses union { sph_u64 wide[16]; sph_u32 narrow[32]; } and sph_u64 block_count).
  • Definitions:
    • src/crypto/x11/jh.cpp — definitions for sph_jh512_* (uses H.wide in READ_STATE/WRITE_STATE and jh_close/jh_init).
  • Call sites (all in-repo, C++):
    • src/hash_x11.h — sph_jh512_init(&ctx_jh), sph_jh512(...), sph_jh512_close(&ctx_jh, static_cast<void*>(&hash[4])).
    • src/bench/pow_hash.cpp — sph_jh512_init(&ctx); sph_jh512(...); sph_jh512_close(&ctx, &hash).
  • Search: no matches for void-based sph_jh512_* prototypes were found in the repository.

What to do (brief):

  • If there are any external C consumers, re‑introduce extern "C" around the declarations or provide a C wrapper to preserve the C ABI.
  • If 32-bit builds must be supported, verify/restore a 32-bit-compatible context layout or add compatibility shims/document that this API now requires 64-bit contexts.

Tag:

src/crypto/x11/keccak.cpp (5)

1016-1019: Typed Keccak-512 init: API migration looks correct.

Switching to sph_keccak512_context* aligns with the new header and the 64-bit-only path.


1023-1026: Typed Keccak-512 update: OK.

Directly forwarding to keccak_core with lim=72 is consistent with Keccak-512 rate.


1030-1033: Typed Keccak-512 close: OK.

Delegation to addbits-and-close preserves behavior.


1037-1040: Typed Keccak-512 addbits-and-close: OK.

Matches the DEFCLOSE(64,72) implementation.


1016-1040: No C translation units call the 512 APIs — removing extern "C" wrappers is safe

Search of .c translation units found no includes of sph_keccak|groestl|echo|luffa headers and no calls to sph_*512 symbols. Only occurrence is a comment in src/crypto/x11/aes_helper.h referencing aes_helper.c (harmless).

  • Verified: no .c files reference sph_keccak/groestl/echo/luffa 512 APIs.
  • Location of C++ definitions: src/crypto/x11/keccak.cpp (sph_keccak512, sph_keccak512_close, sph_keccak512_addbits_and_close).
  • Action: no repo changes required. If external C consumers exist outside this repository, update them or restore a C-compatible header/extern "C" wrapper as needed.
src/crypto/x11/sph_groestl.h (5)

78-79: Typed Groestl-512 init prototype: OK.

Matches the updated implementation in groestl.cpp.


88-89: Typed Groestl-512 update prototype: OK.

Signature and constness are appropriate.


99-100: Typed Groestl-512 close prototype: OK.


114-116: Typed Groestl-512 addbits-and-close prototype: OK.

Consistent with the implementation.


78-116: Verify header/API changes don’t break ABI/usage — no issues found

Quick summary:

  • No C translation units include src/crypto/x11/sph_groestl.h — the header is only included from C++ sources (src/crypto/x11/groestl.cpp, src/hash_x11.h, src/bench/pow_hash.cpp).
  • The groestl context now uses a single 64-bit field (sph_u64 count) in src/crypto/x11/sph_groestl.h, and src/crypto/x11/groestl.cpp consistently uses sc->count / SPH_T64; there are no references to count_low/count_high in the Groestl implementation.
  • count_low/count_high remain in the unrelated SIMD code (src/crypto/x11/sph_simd.h and src/crypto/x11/simd.cpp).

Conclusion: removing C linkage and switching to a single 64-bit count for Groestl does not break usage in this repository — no changes required.

src/crypto/x11/echo.cpp (6)

416-417: C++-safe cast for data pointer: LGTM.

static_cast to const unsigned char* avoids C-style casts and is correct for buffer arithmetic.


407-410: Typed Echo-512 init: OK.

Aligned with sph_echo.h.


414-417: Typed Echo-512 update: OK.

Correct forwarding to echo_big_core.


421-424: Typed Echo-512 close: OK.


428-431: Typed Echo-512 addbits-and-close: OK.


52-52: Switch to header-only AES helper: OK.

aes_helper.h defines functions as constexpr inline / inline and there's no aes_helper.c present, so including the .h from multiple TUs should not cause multiple-definition issues.

  • src/crypto/x11/aes_helper.h — inline defs at lines 57 (pack_le), 65 (gf8_mul2), 87 (gf8_mul), 137 (AES_ROUND_LE), 148 (AES_ROUND_NOKEY_LE)
  • No src/crypto/x11/aes_helper.c found
  • Header included at src/crypto/x11/echo.cpp:52 and src/crypto/x11/shavite.cpp:60
src/crypto/x11/sph_keccak.h (5)

77-78: Typed Keccak-512 init prototype: OK.


87-88: Typed Keccak-512 update prototype: OK.


98-99: Typed Keccak-512 close prototype: OK.


113-115: Typed Keccak-512 addbits-and-close prototype: OK.


77-115: Confirm C linkage not required — manual verification needed

I ran a search for .c files including "sph_keccak.h" or referencing sph_keccak512 symbols; no matches were found in .c files. Absence of matches in the repo isn't definitive — please confirm no C code or external C consumers depend on these headers/symbols.

Files to check:

  • src/crypto/x11/sph_keccak.h (removed extern "C" guards)

If any C consumers exist, reintroduce extern "C" around the public declarations in that header.

src/crypto/x11/luffa.cpp (8)

73-103: Guarding non-parallel RC tables: OK.

Conditionally excluding RC00/RC14 paths is consistent with forcing the parallel variant.


123-154: Guarding additional non-parallel RC tables: OK.

RC20/RC34 are now compiled out under the forced parallel mode.


652-652: C++-safe cast in luffa5_close: OK.

static_cast<unsigned char*> is the right fix in C++.


689-696: Typed Luffa-512 init: OK.

Initialization logic unchanged, just typed context.


700-703: Typed Luffa-512 update: OK.


707-710: Typed Luffa-512 close: OK.


714-718: Typed Luffa-512 addbits-and-close: OK.

Re-init after close preserves reusability semantics.


38-38: Force-enable SPH_LUFFA_PARALLEL — confirm intent

The file hard-defines the parallel path and still contains excluded scalar branches, so the non-parallel implementation will be removed at compile time.

  • src/crypto/x11/luffa.cpp
    • Line ~38: #define SPH_LUFFA_PARALLEL 1
    • Lines 73 and 123: #if !SPH_LUFFA_PARALLEL (scalar code paths that will be excluded)

Please verify that no other files or build configurations expect the scalar path and that all target platforms/benchmarks support the parallel-only implementation.

src/crypto/x11/jh.cpp (4)

392-397: LGTM! Clean initialization implementation.

The function correctly initializes the JH context with the provided IV, resets the pointer and block counter.


399-436: LGTM! Well-structured core processing function.

The buffering logic correctly handles partial blocks and processes complete blocks with the compression function. The block counter increment is properly placed after processing.


438-464: LGTM! Correct finalization with proper padding.

The close function correctly handles padding, appends the bit length in big-endian format, and produces the final hash output from the upper half of the state as per JH-512 specification.


467-492: LGTM! Clean public API implementation.

The public API functions properly wrap the internal functions with typed context pointers, following the consistent pattern across the SPH library.

src/crypto/x11/simd.cpp (4)

1029-1034: LGTM! Consistent API update for typed contexts.

The function signature correctly uses typed sph_simd_big_context* instead of void*, and properly updates all field references from sc-> to cc->.


1037-1056: LGTM! Consistent context usage throughout update function.

All references to context fields have been properly updated to use cc->.


1070-1087: LGTM! Proper type safety with static_cast.

The finalize function correctly uses static_cast<unsigned char*> for the destination pointer, improving type safety.


1089-1112: LGTM! Clean public API with typed contexts.

The public 512-bit SIMD functions now use typed sph_simd512_context* instead of void*, following the consistent pattern across the SPH library and improving type safety.

src/crypto/x11/blake.cpp (3)

42-44: LGTM! Simplified 64-bit configuration.

The conditional compilation for SPH_COMPACT_BLAKE_64 has been simplified to only depend on SPH_SMALL_FOOTPRINT_BLAKE, removing the complex SPH_64 dependency check as the code now assumes 64-bit targets.


603-603: LGTM! Improved type safety with static_cast.

The use of static_cast<unsigned char*> for the destination pointer improves type safety.


609-635: LGTM! Consistent typed context API.

The public Blake-512 functions now use typed sph_blake512_context* instead of void*, following the consistent pattern across the SPH library and improving type safety.

src/Makefile.am (3)

76-77: LGTM! Proper integration of SPH crypto library.

The SPH crypto library is correctly added as a separate static library component.


723-756: LGTM! Well-structured SPH library configuration.

The SPH crypto library build configuration is properly set up with:

  • Static library configuration consistent with other crypto libraries
  • Optional O3 optimization via SPHLIB_CXXFLAGS
  • Small footprint flags for CubeHash and JH to optimize binary size
  • Complete list of SPH sources for X11 hashing

1022-1022: LGTM! Correct inclusion of SPH sources in consensus library.

The SPH crypto sources are properly included in the dash consensus library to ensure X11 hashing is available for consensus validation.

src/crypto/x11/aes_helper.h (5)

55-93: LGTM! Clean constexpr implementation of GF(2^8) arithmetic.

The GF(2^8) arithmetic utilities are well-implemented using modern C++ constexpr features, with proper compile-time table generation for multiplication operations.


96-110: LGTM! Correct AES S-box generation.

The constexpr S-box generation correctly implements the AES S-box using the affine transformation over GF(2^8).


117-134: LGTM! Efficient T-table implementation with validation.

The AES T-tables are correctly generated for optimized round computations, and the static_assert provides compile-time validation of the table values.


137-152: LGTM! Clean inline AES round helpers.

The AES round helper functions are properly implemented as inline functions for efficient code generation. The implementation correctly uses the precomputed T-tables for fast AES round evaluation.


46-47: Header guard OK — consistent with other SPH headers

The header guard in src/crypto/x11/aes_helper.h is SPH_AES_HELPER_H__, matching the SPH_H_ pattern used across other SPH headers. No change required.

  • Confirmed: src/crypto/x11/aes_helper.h — SPH_AES_HELPER_H__ — OK
  • Files checked (examples): src/crypto/x11/sph_types.h, sph_skein.h, sph_simd.h, sph_shavite.h, sph_luffa.h, sph_keccak.h, sph_jh.h
src/crypto/x11/sph_types.h (6)

59-62: Fixed-width type aliases look good

Switching to explicit fixed-width aliases reduces ambiguity and matches the project’s C++20 guidelines.


70-73: Good move to std::rotl/std::rotr

Using std::rotl/rotr guarantees well-defined rotation semantics across compilers and architectures.


75-75: Confirm ALWAYS_INLINE semantics across compilers

SPH_INLINE now aliases ALWAYS_INLINE. Ensure ALWAYS_INLINE expands to include “inline” (or equivalent) on all supported toolchains to preserve inlining of these tiny helpers. If ALWAYS_INLINE already contains inline/__forceinline attributes, ignore.


295-305: Byte-swap consolidation is correct

Delegating to internal_bswap_32/64 centralizes platform specifics and should be optimal with modern compilers.

Also applies to: 313-317


64-69: SPH_C32/64 braced-init and SPH_T32/T64 — verification complete, no action required

Short summary:

  • I inspected src/crypto/x11. SPH_C32/64 are defined as list-initialization (sph_u32{x}/sph_u64{x}) and SPH_T32/T64 are identity macros.
  • I searched usages across the x11 code: SPH_C32/64 are used almost exclusively with integer literals, and SPH_T32/T64 are applied to values already of the correct width or to expressions explicitly cast to sph_u32/sph_u64.

Key locations checked:

  • Definition: src/crypto/x11/sph_types.h (SPH_C32/64 and SPH_T32/T64).
  • Literal-heavy uses (no narrowing risk): src/crypto/x11/luffa.cpp, keccak.cpp, blake.cpp, skein.cpp, bmw.cpp, groestl.cpp, jh.cpp (many SPH_C32/SPH_C64 calls with hex literals).
  • SPH_T32/T64 applied to correctly-typed values or explicit casts: e.g. luffa.cpp (lines ~244–247: SPH_T32((sph_u32)(u)) / (u>>32)), shavite.cpp (~872–878: SPH_T32 on sc->count*), skein.cpp (~279–307: SPH_T64 on 64-bit additions).

Conclusion:

  • No call sites requiring implicit truncation were found, and no non-literal/wider arguments to SPH_C32/64 were detected in the x11 sources scanned. No behavioral change or narrowing error is expected from the current definitions.

55-57: C++20 header usage — verified (no C / pre-C++20 TUs include sph_types.h)

Brief: I searched the repo for direct includers and build settings. sph_types.h is only included by C++ headers in crypto/x11 and the build is configured for C++20.

Files referencing sph_types.h (direct includes):

  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/sph_keccak.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/sph_groestl.h
    (also src/crypto/x11/sph_types.h itself)

Build-system evidence C++20 is enabled:

  • CMakeLists.txt:11 — set(CMAKE_CXX_STANDARD 20)
  • depends/Makefile:52 — CXX_STANDARD ?= c++20

Conclusion: no C or pre-C++20 translation units directly include this header and the project defaults to C++20. No change required.

Comment on lines +381 to +383
dnl If not debugging, enable more aggressive optimizations for sphlib sources
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Respect user overrides when adding -O3 to SPHLIB_CXXFLAGS

Currently, -O3 is appended whenever available, even if SPHLIB_CXXFLAGS is pre-set by the user. Guard the optimization so users can override it by setting SPHLIB_CXXFLAGS in the environment.

Apply:

-  dnl If not debugging, enable more aggressive optimizations for sphlib sources
-  AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
+  dnl If not debugging, enable more aggressive optimizations for sphlib sources
+  dnl Respect user overrides: only add -O3 if SPHLIB_CXXFLAGS is unset
+  if test -z "$SPHLIB_CXXFLAGS"; then
+    AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
+  fi

Optional: expose SPHLIB_CXXFLAGS in help and make intent clearer:

  • Add near other ARG_VARs:
    AC_ARG_VAR([SPHLIB_CXXFLAGS], [Extra C++ flags applied only to SPH sources (default: try -O3 if supported when not debugging)])
  • Print SPHLIB_CXXFLAGS in the summary to ease diagnostics.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dnl If not debugging, enable more aggressive optimizations for sphlib sources
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
dnl If not debugging, enable more aggressive optimizations for sphlib sources
dnl Respect user overrides: only add -O3 if SPHLIB_CXXFLAGS is unset
if test -z "$SPHLIB_CXXFLAGS"; then
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
fi

Comment on lines 327 to 331
sph_enc16be(void *dst, unsigned val)
{
((unsigned char *)dst)[0] = (val >> 8);
((unsigned char *)dst)[1] = val;
WriteBE16(static_cast<unsigned char*>(dst), val);
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

16-bit encode/decode: prefer uint16_t for clarity and to avoid unintended narrowing

These APIs take/return “unsigned” but call Read/WriteBE/LE16 which operate on uint16_t. While implicit narrowing to 16 bits is likely intentional, the signatures could be clearer and safer if they used uint16_t consistently. At minimum, add a short note that values are truncated to 16 bits.

Also, please double-check the return type of ReadLE16/ReadBE16 in src/crypto/common.h. The provided snippet for ReadLE16 shows a void return, which would be a compile error. Validate both are declared to return uint16_t:

Also applies to: 339-343, 353-356, 365-368


🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail
rg -nA3 'Read(BE|LE)16\s*\(' src/crypto/common.h

Length of output: 297


🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail
echo "---- sph_types.h (around reported lines) ----"
sed -n '320,380p' src/crypto/x11/sph_types.h || true

echo
echo "---- grep for 16-bit sph helpers in repo ----"
rg -n --hidden --no-ignore 'sph_enc16|sph_dec16' src || true

echo
echo "---- common.h ReadLE16/ReadBE16 declarations ----"
sed -n '1,120p' src/crypto/common.h || true

Length of output: 4337


Prefer uint16_t for the 16-bit sph enc/dec helpers (avoid implicit truncation)

Verified: ReadLE16/ReadBE16 in src/crypto/common.h return uint16_t (and WriteLE16/WriteBE16 take uint16_t), so the sph helpers using plain unsigned cause implicit narrowing. Change the API to use uint16_t or at minimum document truncation.

Files/locations to update:

  • src/crypto/x11/sph_types.h
    • sph_enc16be — lines ~327-331: change parameter type to uint16_t
    • sph_dec16be — lines ~339-343: change return type to uint16_t
    • sph_enc16le — lines ~353-356: change parameter type to uint16_t
    • sph_dec16le — lines ~365-368: change return type to uint16_t
  • Call site to verify after change:
    • src/crypto/x11/echo.cpp:396 — sph_enc16le(buf + (sizeof sc->buf) - 18, out_size_w32 << 5); ensure the value fits in 16 bits or perform an explicit cast/truncation.

Suggested replacements (minimal):

  • sph_enc16be:
    static SPH_INLINE void
    sph_enc16be(void dst, uint16_t val)
    {
    WriteBE16(static_cast<unsigned char
    >(dst), val);
    }

  • sph_dec16be:
    static SPH_INLINE uint16_t
    sph_dec16be(const void src)
    {
    return ReadBE16(static_cast<const unsigned char
    >(src));
    }

  • sph_enc16le:
    static SPH_INLINE void
    sph_enc16le(void dst, uint16_t val)
    {
    WriteLE16(static_cast<unsigned char
    >(dst), val);
    }

  • sph_dec16le:
    static SPH_INLINE uint16_t
    sph_dec16le(const void src)
    {
    return ReadLE16(static_cast<const unsigned char
    >(src));
    }

Reason: using uint16_t matches the common.h helpers, makes intent clear, and prevents accidental promotion/narrowing surprises. Please apply the changes and review call sites that pass wider integer expressions.

@UdjinM6
Copy link

UdjinM6 commented Aug 14, 2025

macbook m1pro, --enable-debug --enable-crash-hooks --enable-werror

develop:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              247.60 |        4,038,748.96 |    1.2% |      0.01 | `X11_0032b_single`
|               99.31 |       10,069,225.93 |    1.6% |      0.01 | `X11_0080b_single`
|               64.64 |       15,470,418.64 |    0.8% |      0.01 | `X11_0128b_single`
|               16.65 |       60,077,527.50 |    0.1% |      0.01 | `X11_0512b_single`
|                9.20 |      108,738,453.20 |    2.4% |      0.01 | `X11_1024b_single`
|                1.33 |      751,103,182.80 |    1.4% |      0.01 | `X11_1M`
|                5.14 |      194,458,106.14 |    0.3% |      0.01 | `X11_2048b_single`

this PR:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              454.24 |        2,201,461.69 |    0.0% |      0.01 | `Pow_X11_0032b`
|              236.19 |        4,233,832.86 |    5.4% |      0.01 | :wavy_dash: `Pow_X11_0080b` (Unstable with ~49.0 iters. Increase `minEpochIterations` to e.g. 490)
|              177.41 |        5,636,699.32 |    2.9% |      0.01 | `Pow_X11_0128b`
|               46.88 |       21,329,210.67 |    0.4% |      0.01 | `Pow_X11_0512b`
|               25.31 |       39,504,360.41 |    1.1% |      0.01 | `Pow_X11_1024b`
|                2.45 |      408,712,441.37 |   12.8% |      0.03 | :wavy_dash: `Pow_X11_1M` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|               12.17 |       82,186,375.17 |   13.8% |      0.01 | :wavy_dash: `Pow_X11_2048b` (Unstable with ~35.5 iters. Increase `minEpochIterations` to e.g. 355)

2-3x slower for me 🤷‍♂️

@kwvg kwvg marked this pull request as draft August 14, 2025 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants