Use VPTERNLOG for 3-operands boolean functions. #148

Closed

Shark64
Contributor

@Shark64 Shark64 commented May 24, 2024

Here are the changes for SM3. I've switched the boolean macros to vpternlog, and I also use it for the 3-way XOR in the body.
Two more minor changes ;) : there's no need to "jump the jmp" at the end of the main loop; just loop back if count != 0.
Also, a couple of vprold reg1, reg1, IMM instructions immediately followed by vmovups reg2, reg1 can be encoded simply as vprold reg2, reg1, IMM.
On my PC this made sm3_mb_vs_ossl_perf go from ~4.4 GB/s to ~5.1 GB/s :)
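
For readers unfamiliar with vpternlog: the instruction takes an 8-bit immediate that is simply the truth table of the desired 3-input boolean function. The Python sketch below is illustrative and not taken from the patch. It derives the immediates for a 3-way XOR and for the majority and choose forms that SM3's FF/GG functions use, and models the instruction bit by bit. The helper names (imm8, ternlog, xor3, majority, choose) are hypothetical, and the exact macros changed in this PR are not shown in the thread.

```python
# Truth-table columns for the three operands, in the bit order vpternlog uses:
# the immediate is indexed by (a_bit << 2) | (b_bit << 1) | c_bit.
A, B, C = 0xF0, 0xCC, 0xAA

def xor3(x, y, z):
    # 3-way XOR (SM3 FF/GG for the first 16 rounds)
    return x ^ y ^ z

def majority(x, y, z):
    # (X & Y) | (X & Z) | (Y & Z), SM3 FF for rounds 16..63
    return (x & y) | (x & z) | (y & z)

def choose(x, y, z):
    # (X & Y) | (~X & Z), SM3 GG for rounds 16..63
    return (x & y) | (~x & z)

def imm8(f):
    """Evaluate f on the truth-table columns to get the vpternlog immediate."""
    return f(A, B, C) & 0xFF

def ternlog(imm, a, b, c, width=32):
    """Bitwise model of one vpternlogd lane (an illustrative helper, not a real API)."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm >> idx) & 1) << i
    return out

if __name__ == "__main__":
    for name, f in (("xor3", xor3), ("majority", majority), ("choose", choose)):
        print(f"{name}: imm8 = 0x{imm8(f):02X}")
    # xor3: imm8 = 0x96
    # majority: imm8 = 0xE8
    # choose: imm8 = 0xCA
```

Each function then costs a single instruction instead of a chain of two to four and/or/xor/andn operations, which is presumably where part of the reported speedup comes from.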

@pablodelara
Contributor

Which CPU are you using? That throughput is pretty high! :)

@Shark64
Contributor Author

Shark64 commented May 24, 2024

Rocket Lake, an i7-11700K. Perhaps it's the low-latency DDR4 that helps more than the CPU core itself.

@pablodelara
Contributor

No, these tests use warm data. You must be using turbo boost, so your CPU frequency goes up to 5 GHz.

@Shark64
Contributor Author

Shark64 commented May 24, 2024

Yeah, you're right. I hadn't checked the frequency in turbostat, but now I see that a single core goes up to 5.1 GHz for a brief time. So it's the CPU after all :)

@pablodelara
Contributor

Code is now merged, thanks for the work @Shark64!