Begin transition to making a release version #325

Merged 14 commits on Mar 24, 2025

gwoltman
Collaborator

Made some existing #define options available as -use options instead. Feel free to come up with better names for these options.
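For readers unfamiliar with the pattern, here is a minimal sketch of how a `-use` style list could be mapped onto compile-time defines. It is an illustration under assumptions, not the project's actual parsing code: the function name `use_to_build_options` is hypothetical, the comma-separated `KEY=VALUE` syntax is assumed, and the value `0` is assumed to mean "off". The option names `MIDDLE_LDS_TRANSPOSE` and `ZEROHACK_H` are taken from the commit messages in this PR, and `-D` is the standard OpenCL compiler option for predefining a macro.

```python
def use_to_build_options(use_arg: str) -> str:
    """Map a '-use' style list (assumed syntax: 'A=1,B') to '-D' compiler options.

    Hypothetical helper for illustration only; not taken from the project source.
    """
    defines = []
    for item in use_arg.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        key, sep, value = item.partition("=")
        key = key.strip()
        # A bare name is treated like '-D NAME', which defines the macro as 1.
        value = value.strip() if sep else "1"
        if not key.isidentifier():
            raise ValueError(f"not a valid macro name: {key!r}")
        defines.append(f"-D{key}={value}")
    return " ".join(defines)


# Example: turn LDS transpose off and enable the ZEROHACK_H code path.
print(use_to_build_options("MIDDLE_LDS_TRANSPOSE=0,ZEROHACK_H=1"))
# prints "-DMIDDLE_LDS_TRANSPOSE=0 -DZEROHACK_H=1"
```

The point of routing such settings through run-time options rather than hard-coded `#define` lines is that users can try a setting on their own GPU without editing and rebuilding the source.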

gwoltman added 14 commits March 18, 2025 01:05
…xposed MIDDLE_LDS_TRANSPOSE setting with -use.

Intel Battlemage reportedly prefers LDS transpose off.
…g every 100K or 1M iterations.

I did not add this to the --help output in case you want to keep the option hidden.
Added -use ZEROHACK_H=1 in tailSquare to mirror the code in carryFused.
This is the best setting on Intel, AMD, and nVidia (at least until the next rocm optimizer change :)
Wrote alternate chainMul8 that uses fewer F64 ops (faster on low DP GPUs) but has worse roundoff error.
We will need data from some of these GPUs to decide if this chainMul8 version should be made an official FFT spec option.
Cleaned up terminology in math.cl csq and ccube macros.  Eliminated FancyUpdate macros.
There may be slight improvement in Z values.
This version uses fewer F64 ops, but is slower on Radeon 7 -- probably the rocm optimizer acting up.
New version is disabled.  I'll ask some users to see if it will be beneficial on other GPUs.
@preda preda merged commit d12753e into preda:master Mar 24, 2025
7 checks passed