Description
Hi Roman, sorry if this is a naïve question. I have added a small class to the FIR benchmark from jatinchowdhury18's repo; you can find the issue here: New promising benchmark using Fastor C++
Fastor outperforms the other inner_product implementations except for small kernel sizes, and I'm sure I'm not using Fastor correctly. In the main processing loop (over the sample buffer), I can't call Fastor::inner directly with a subview like this:
```cpp
buffer[n] = Fastor::inner(z(Fastor::seq(zPtr, zPtr + N)), h);
```

where `N` is the FIR order (a template parameter), `h` is the tensor of FIR coefficients (the impulse response), `z` is a double-buffered state tensor holding the delayed input samples (the z^-N terms of the FIR difference equation), and `buffer` is the sample buffer that receives the result of the discrete convolution.
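For readers unfamiliar with the double-buffer trick described above, here is a minimal sketch in plain C++. It uses `std::inner_product` instead of Fastor, so it says nothing about how Fastor handles views. The class name `DoubleBufferFIR` and its `process` method are hypothetical, and the window layout (newest sample at `z[zPtr]`, pointer decrementing each sample) is an assumption that may differ from the benchmark's actual code. Each input sample is written to both `z[zPtr]` and `z[zPtr + N]`, so the N most recent samples always sit contiguously in `z[zPtr .. zPtr + N)` and each output is a single dot product with `h`.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical plain-C++ double-buffer FIR (an illustration, not Fastor code).
// Assumes the newest sample sits at z[zPtr] and pairs with h[0].
template <std::size_t N>
class DoubleBufferFIR {
public:
    explicit DoubleBufferFIR(const std::array<float, N>& coeffs) : h(coeffs) {}

    // Filters buffer in place; state persists across calls.
    void process(float* buffer, std::size_t numSamples) {
        for (std::size_t n = 0; n < numSamples; ++n) {
            // Write the sample twice so the window z[zPtr .. zPtr + N) is contiguous.
            z[zPtr] = buffer[n];
            z[zPtr + N] = buffer[n];
            // z[zPtr + k] holds x[n - k], so this computes sum_k h[k] * x[n - k],
            // reading the window in place with no intermediate copy.
            buffer[n] = std::inner_product(h.begin(), h.end(), z.begin() + zPtr, 0.0f);
            // Move the write position back by one, wrapping from 0 to N - 1.
            zPtr = (zPtr == 0) ? N - 1 : zPtr - 1;
        }
    }

private:
    std::array<float, N> h;          // FIR coefficients (impulse response)
    std::array<float, 2 * N> z{};    // double-buffered state, zero-initialized
    std::size_t zPtr = 0;            // start of the current window in z
};
```

Feeding a unit impulse through `DoubleBufferFIR<4>` with coefficients `{1, 2, 3, 4}` returns the coefficients themselves followed by zeros, which is a quick sanity check of the window ordering.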
To make it compile, I have to convert the subview into a `Fastor::Tensor` first:
```cpp
Fastor::Tensor<float, N> zn = z(Fastor::seq(zPtr, zPtr + N, 1));
buffer[n] = Fastor::inner(zn, h);
```

Even though this method outperforms the others for kernels larger than 32 taps (the benchmark uses powers of 2), I'm fairly sure the assignment in the main loop is a bottleneck for smaller kernels.
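To make the suspected cost of the workaround concrete, here is a hedged sketch contrasting two ways of computing the same output sample: materializing the N-sample window in a fixed-size container before the dot product (as the `zn` assignment above does) versus reading the window in place. It uses `std::array` as a stand-in for `Fastor::Tensor<float, N>`, so it only models the extra copy, not Fastor's expression templates; `dotWithCopy` and `dotInPlace` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical helpers (not Fastor): both compute sum_k h[k] * z[zPtr + k].

// Models the workaround: copy the window into a fixed-size array first
// (N extra loads and N extra stores per output sample), then take the dot product.
template <std::size_t N>
float dotWithCopy(const float* z, std::size_t zPtr, const std::array<float, N>& h) {
    std::array<float, N> window;
    std::copy(z + zPtr, z + zPtr + N, window.begin());
    return std::inner_product(window.begin(), window.end(), h.begin(), 0.0f);
}

// Reads the window directly from the state buffer, with no intermediate copy.
template <std::size_t N>
float dotInPlace(const float* z, std::size_t zPtr, const std::array<float, N>& h) {
    return std::inner_product(h.begin(), h.end(), z + zPtr, 0.0f);
}
```

Both functions return the same value; the copying version simply does N loads and N stores more per output sample. Whether that overhead dominates for small kernels, or whether the compiler removes it, is something only profiling the actual build can tell.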
Why can't I call Fastor::inner directly with the subview? What is wrong with my code?
Thank you very much for your answer and your time!