-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solving normalisation problems #84
Conversation
…le factor after convolution. - Adding .idea to the .gitignore
…nserving constant Q factor) windows given by a tuple. - Information added to the documentation regarding the possibility of using the gaussian window and the need of give the attenuation.
I also change the behavior of get_windows to handle gaussian window. |
That's interesting, I didn't know that. At that time, I was implmenting nnAudio by keeping the output as close to librosa as possible. I ran a quick test comparing your pull request with the librosa output and existing nnAudio output and it looks like this Am I missing anything? Here's the script for my quick test, you may want to change some of the paths.
|
When I test the Gaussian window, the result is different from librosa. I am using |
Indeed, your normalization coincides with the librosa one! |
I see, then it does seem that librosa is having some issue. Given that there will be people switching from librosa to nnAudio and want to keep the same result, how about adding an argument to allow the user to use the librosa version of normalisation or your normalisation? Probably in the |
- Normalization is a parameter of the forward method of CQT1992v2. The options are 'librosa' to fit the normalization of librosa, 'convolutional' to preserve the inequalities of the CQT viewed as a wavelet transform, and 'wrap' to wrap positive and negative frequencies into the positive ones.
I have commited the inclusion of the parameter for the normalization; I hope it works for you. |
I noticed that my code may raise errors because I added the frequencies as an output of the create_cqt_kernels function. I have two options:
|
You are right, adding this cqt frequencies as an output and store as the I think that is just a trivial modification, I can do the rest. Thanks for your contribution! |
I solved a normalisation problem in the CQT:
The essential inequality of convolution is that for all p in [1, \infty] you have
|| f * g ||_p <= || f ||_p · || g ||_1 ( and, because of the commutativity of the convolution, you also have || f * g ||_p = || g * f ||_p <= || f ||_1 · || g ||_p )
Since CQT is convolution based, we shall keep this inequality.
However, even if in the code there was indeed a normalisation of the kernels, after, during the convolution, there was another scaling factor (torch.sqrt(self.lenghts.view(-1,1))) dependent on the length of the window. I think this is not correct.
I added another scaling factor of 2, instead: the idea is that CQT only cares about positive frequencies so a sinus of amplitude 1 will have 0.5 in the bin of its frequency. It will be nice to conserve the equality (i.e.: having 1 in its bin frequency) so I added the scaling factor 2. This is, in a sense like combining positive and negative frequencies' contribution.
I hope this will be useful.