Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution #1515

antonfirsov · 2021-01-21T10:13:02Z

As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS instructions in the following code:

ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs

Lines 104 to 112 in e2211c3

    
    result256_0 = Fma.MultiplyAdd(
        Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
        Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
        result256_0);

    result256_1 = Fma.MultiplyAdd(
        Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
        Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
        result256_1);

If FMA is detected, we should allocate a 4x larger buffer and do the duplication in ResizeKernelMap.Calculate, which should be much cheaper than doing it in every convolution:

ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs

Lines 115 to 120 in e2211c3

    
    public static ResizeKernelMap Calculate<TResampler>(
        in TResampler sampler,
        int destinationSize,
        int sourceSize,
        MemoryAllocator memoryAllocator)
        where TResampler : struct, IResampler
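
To make the proposal concrete, here is a minimal sketch of the pre-duplication idea, not the actual ImageSharp implementation. `KernelDuplicationSketch`, `DuplicateWeights`, and `Convolve` are hypothetical names invented for illustration. It assumes each kernel weight is stored four times in a row, so that one `Vector4` pixel lines up with one group of identical weights and no per-iteration permute is needed. It uses portable `System.Numerics.Vector4` arithmetic rather than the `Fma`/`Avx2` intrinsics; a real FMA path would presumably load `Vector256<float>` values straight from the duplicated buffer.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical illustration only; none of these names exist in ImageSharp.
public static class KernelDuplicationSketch
{
    // Expands [w0, w1, ...] into [w0, w0, w0, w0, w1, w1, w1, w1, ...].
    // This is the one-time cost paid when the kernel map is calculated,
    // and it is why the buffer has to be 4x larger.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        var duplicated = new float[weights.Length * 4];
        for (int i = 0; i < weights.Length; i++)
        {
            duplicated.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return duplicated;
    }

    // Convolves one row of pixels with the pre-duplicated weights.
    // Each group of four floats is reinterpreted as a Vector4, so the
    // multiply needs no broadcast or permute inside the loop.
    public static Vector4 Convolve(ReadOnlySpan<Vector4> pixels, ReadOnlySpan<float> duplicated)
    {
        ReadOnlySpan<Vector4> weights = MemoryMarshal.Cast<float, Vector4>(duplicated);
        Vector4 sum = Vector4.Zero;
        for (int i = 0; i < pixels.Length; i++)
        {
            sum += pixels[i] * weights[i];
        }

        return sum;
    }
}
```

The trade-off is memory for instructions: the kernel buffer grows fourfold, but the duplication happens once per kernel map instead of once per convolved row.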


antonfirsov added needs triage area:performance and removed needs triage labels Jan 21, 2021

antonfirsov added this to the Future milestone Jan 21, 2021

antonfirsov added the up-for-grabs label Jan 21, 2021

antonfirsov changed the title ~~Pre-duplicate kernels in ResizeKernelMap for faster FMA convolution~~ Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution Jan 21, 2021

antonfirsov mentioned this issue Jan 28, 2021

Speed improvements to resize convolution (no vpermps w/ FMA) #1518

Closed

4 tasks

JimBobSquarePants linked a pull request Aug 15, 2024 that will close this issue

WIP - Speed improvements to resize convolution (no vpermps w/ FMA) #2793

Draft

4 tasks

lizard-boy linked a pull request Nov 1, 2024 that will close this issue

WIP - Speed improvements to resize convolution (no vpermps w/ FMA) grepdemos/ImageSharp#3

Open

4 tasks
