Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use embedded broadcast for avx512 constants. #139

Closed
wants to merge 4 commits into from

Conversation

Shark64
Copy link
Contributor

@Shark64 Shark64 commented Apr 11, 2024

As asked in the closed pull request (#123) i've reworked the patch to just two commits: the first one is to replicate the constants using embedded broadcast for EVEX-coded instructions. This saves quite a lot of file size, especially for SHA512.
The second patch is for SM3: basically i've tried to use VPTERNLOG wherever possibile, removing a couple of instructions from the critical path in each macro.
On my system (Linux_x86-64, RocketLake CPU) `make test' passes all the tests without any errors, but I haven't tested it on other platforms.

Use embedded broadcast for constants & VPTERNLOG for 3-way bitwise
logical operations.
@pablodelara
Copy link
Contributor

Hi @Shark64. There is only one commit with all the changes proposed. Could you have multiple commits or better, multiple PRs to review this independently? Thanks! (also, the sign-off line should be at the end of the commit)

@Shark64
Copy link
Contributor Author

Shark64 commented May 8, 2024

Hi, ok, you mean a different PR for each hashing function? I made just one commit because it's the same kind of change for all the hashes, but i can make a separate PR for each one if you prefer.
Thanks!

@pablodelara
Copy link
Contributor

No need to do it for every hash. What I meant is that you are also doing more than using broadcasts.
For instance, in SM3, you are also adding ternary logic. You are also replacing some instructions with others or using different addressing modes.

@Shark64
Copy link
Contributor Author

Shark64 commented May 13, 2024

Oh yeah now i understand what you meant. I did that because they seemed small enough changes that a different pull request and commit would clutter the git log; Do you prefer a separate pull request for them? I'll try to see if i can revert them in this branch, i'm not that expert with git wizardry but i'll give it a try ;)

@pablodelara
Copy link
Contributor

Oh yeah now i understand what you meant. I did that because they seemed small enough changes that a different pull request and commit would clutter the git log; Do you prefer a separate pull request for them? I'll try to see if i can revert them in this branch, i'm not that expert with git wizardry but i'll give it a try ;)

Hi @Shark64. Right, "git add -p" is your friend :) You can rebase your branch on top of latest code and add your changes one by one, keeping only the ones for the broadcast first, create a commit and then same for the other changes.
I'll submit comments on this.

We have pushed a fix for the clang-format, so no need for your other commit and we are going to push changes on sm3_mb, so I suggest waiting for this change (will be done shortly).

@@ -354,7 +354,7 @@ func(mh_sha1_murmur3_x64_128_block_avx512)
VMOVPS HH3, [mh_digests_p + 64*3]
VMOVPS HH4, [mh_digests_p + 64*4]
;a mask used to transform to big-endian data
vmovdqa64 SHUF_MASK, [PSHUFFLE_BYTE_FLIP_MASK]
vbroadcasti32x4 SHUF_MASK, [PSHUFFLE_BYTE_FLIP_MASK]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these broadcasts can be part of the first commit.

@@ -219,7 +219,7 @@ section .text
;; H becomes T2, then add T1 for A
;; D becomes D + T1 for E

vpaddd T1, H, TMP3 ; T1 = H + Kt
vpaddd T1, H, [TBL + ((%%ROUND)*4)]{1to16} ; T1 = H + Kt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you better leave this out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but I think that we need this to make broadcast work, otherwise we'll end loading the wrong constants (after removing the duplicated values in the .data section).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are right, sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries :D

@@ -341,7 +341,7 @@ section .text
%define %%WT %1
%define %%OFFSET %2
mov inp0, [IN + (%%OFFSET*8)]
vmovups %%WT, [inp0+IDX]
vmovdqu64 %%WT, [inp0+IDX]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be another commits, turning vmovups into vmovdqu64

vmovups W13,[inp5+IDX]
vmovups W14,[inp6+IDX]
vmovups W15,[inp7+IDX]
lea IDX, [IN] ;; save the base address so the next load have 8bits offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this as it is for now.

@@ -162,6 +162,9 @@ FIELD _rsp, 8, 8
%define %%t0 %17
%define %%t1 %18

; keep the transpose masks in registers
vmovdqa32 TMP2, [PSHUFFLE_TRANSPOSE16_MASK1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid potential conflicts, I would pass these TMP2 and TMP3 as parameters of the macro

@@ -335,7 +334,7 @@ func(mh_sha256_block_avx512)

vmovdqa32 TMP3, [TBL] ; First K
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be needed then?

@pablodelara
Copy link
Contributor

Hi @Shark64. We are pretty almost at code freeze now. OK to leave it for next release?

@Shark64
Copy link
Contributor Author

Shark64 commented May 23, 2024

Hi @pablodelara, I was trying yesterday to rebase the code as you asked. Is the best way to just rebase my branch to the lastest code, save my patches somwhere else and then reapply them one by one? I ask because i ended up with having both my old commits and the new duplicate ones in the `git log', and i wasn't sure if it's the correct way of doing it. Thanks, sorry for the delay

@pablodelara
Copy link
Contributor

Hi @pablodelara, I was trying yesterday to rebase the code as you asked. Is the best way to just rebase my branch to the lastest code, save my patches somwhere else and then reapply them one by one? I ask because i ended up with having both my old commits and the new duplicate ones in the `git log', and i wasn't sure if it's the correct way of doing it. Thanks, sorry for the delay

Yes, that's what I do. Apply them one by one and resolve the conflicts.

@Shark64
Copy link
Contributor Author

Shark64 commented May 23, 2024

Yes, that's what I do. Apply them one by one and resolve the conflicts.
Ok, I was wondering if there was some kind of git trick that i was missing ;) .
I'll give it a try later today, just the changes to use embedded broadcast for avx512 in a single commit, right?

@pablodelara
Copy link
Contributor

Yes, that's what I do. Apply them one by one and resolve the conflicts. Ok, I was wondering if there was some kind of git trick that i was missing ;) . I'll give it a try later today, just the changes to use embedded broadcast for avx512 in a single commit, right?

I am sure there are other tricks, but I don't know them :) Yes, those for now, thanks!

@Shark64
Copy link
Contributor Author

Shark64 commented May 23, 2024

I made another branch with just the embedded broadcast changes, this way the git log looks less polluted to me. If it's ok with you I'll open another pull request to merge that?

@pablodelara
Copy link
Contributor

Closing this PR, in favour of the new one.

@Shark64 Shark64 deleted the embedded_broadcast2 branch May 24, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants