antonfirsov changed the title from "Pre-duplicate kernels in ResizeKernelMap for faster FMA convolution" to "Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution" on Jan 21, 2021.
As @saucecontrol pointed out in his comment, we can get rid of `VPERMS` in the following code: ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs, lines 104 to 112 in e2211c3.
If FMA is detected, we should allocate a 4x buffer and do the duplication in `ResizeKernelMap.Calculate`, which should be much cheaper than doing it in every convolution: ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs, lines 115 to 120 in e2211c3.
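To make the idea concrete, here is a minimal scalar sketch of what pre-duplication could look like. It is not the ImageSharp implementation: `KernelDuplicationSketch`, `Duplicate4x` and `Convolve` are hypothetical names used only for illustration, and the sketch uses `System.Numerics.Vector4` instead of FMA intrinsics so that it runs on any hardware. The assumption is that each pixel is a `Vector4`, so a weight repeated four times lines up with the four channels without any per-pixel permute.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of ImageSharp.
public static class KernelDuplicationSketch
{
    // Expands each weight into four consecutive copies,
    // e.g. [a, b] becomes [a, a, a, a, b, b, b, b].
    // This would run once, when the kernel map is calculated.
    public static float[] Duplicate4x(ReadOnlySpan<float> weights)
    {
        var result = new float[weights.Length * 4];
        for (int i = 0; i < weights.Length; i++)
        {
            result.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return result;
    }

    // Convolves a row of pixels with the pre-duplicated weights.
    // Each step reads four already-broadcast floats, so no shuffle
    // is needed inside the loop.
    public static Vector4 Convolve(ReadOnlySpan<Vector4> row, ReadOnlySpan<float> duplicated)
    {
        Vector4 sum = Vector4.Zero;
        for (int i = 0; i < row.Length; i++)
        {
            int o = i * 4;
            var w = new Vector4(duplicated[o], duplicated[o + 1], duplicated[o + 2], duplicated[o + 3]);
            sum += row[i] * w;
        }

        return sum;
    }
}
```

In an FMA path, the eight floats covering two consecutive pixels could presumably be loaded straight into a `Vector256<float>` and passed to `Fma.MultiplyAdd`. The trade-off is four times the kernel memory in exchange for removing the broadcast from every convolution.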