Optimized PTX IntrinsicMath implementation to use LibDevice. #1151
Depends on #1148.
Restructured ILGPU and ILGPU.Algorithms, moving the `IntrinsicMath` implementations into ILGPU itself.

`XMath` has other functions that are not part of `IntrinsicMath`, so it will stay as-is for now.

`CLMath` in ILGPU.Algorithms was only needed to support `Rcp` and `Log(x,y)`. These have been moved into ILGPU, and `CLMath` has been removed.

`PTXMath` in ILGPU.Algorithms provided a number of math functions using Cordic implementations. Now that pre-generated LibDevice is available in ILGPU, all the `IntrinsicMath` functions have been switched to call LibDevice for Cuda GPUs. The pre-generated LibDevice PTX code only works on `>= SM_60`, so the Cordic functions in ILGPU.Algorithms have been modified to only register for `< SM_60`. Otherwise, they are no longer used.

Unit tests for `IntrinsicMath` have not been implemented. These are currently running via the ILGPU.Algorithms unit tests.
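
For reviewers, the SM_60 split described above can be pictured roughly as follows. This is only an illustrative sketch, not the code in this PR: `MathBackend`, `MathBackendSelector`, and the `major * 10 + minor` encoding of the SM version are hypothetical names and assumptions, and none of them are ILGPU APIs.

```csharp
using System;

// Hypothetical sketch: these types are NOT part of ILGPU or ILGPU.Algorithms.
public enum MathBackend { LibDevice, Cordic }

public static class MathBackendSelector
{
    // Per the PR description, the pre-generated LibDevice PTX needs SM_60 or newer.
    private const int MinLibDeviceSm = 60;

    // Assumption: smVersion encodes the compute capability as major * 10 + minor,
    // e.g. 61 for SM_61.
    public static MathBackend Select(int smVersion) =>
        smVersion >= MinLibDeviceSm ? MathBackend.LibDevice : MathBackend.Cordic;
}

public static class Program
{
    public static void Main()
    {
        // Show which backend each example architecture would be routed to.
        foreach (var sm in new[] { 52, 60, 75 })
            Console.WriteLine($"SM_{sm}: {MathBackendSelector.Select(sm)}");
    }
}
```

In this sketch, only architectures below SM_60 fall back to the Cordic path, which mirrors the registration rule stated above.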