Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solving normalisation problems #84

Merged
merged 5 commits into from
Feb 28, 2021
Merged

Conversation

Manza12
Copy link
Contributor

@Manza12 Manza12 commented Feb 18, 2021

I solved a normalisation problem in the CQT:
The essential inequality of convolution is that for all p in [1, \infty] you have
|| f * g ||_p <= || f ||_p · || g ||_1 ( and, because of the commutativity of the convolution, you also have || f * g ||_p = || g * f ||_p <= || f ||_1 · || g ||_p )
Since CQT is convolution based, we shall keep this inequality.
However, even if in the code there was indeed a normalisation of the kernels, after, during the convolution, there was another scaling factor (torch.sqrt(self.lenghts.view(-1,1))) dependent on the length of the window. I think this is not correct.
I added another scaling factor of 2, instead: the idea is that CQT only cares about positive frequencies so a sinus of amplitude 1 will have 0.5 in the bin of its frequency. It will be nice to conserve the equality (i.e.: having 1 in its bin frequency) so I added the scaling factor 2. This is, in a sense like combining positive and negative frequencies' contribution.
I hope this will be useful.

…le factor after convolution.

- Adding .idea to the .gitignore
…nserving constant Q factor) windows given by a tuple.

- Information added to the documentation regarding the possibility of using the gaussian window and the need of give the attenuation.
@Manza12
Copy link
Contributor Author

Manza12 commented Feb 18, 2021

I also change the behavior of get_windows to handle gaussian window.
And I added the frequencies of the central bins as an attribute of the CQT object.

@KinWaiCheuk
Copy link
Owner

I solved a normalisation problem in the CQT:
The essential inequality of convolution is that for all p in [1, \infty] you have
|| f * g ||_p <= || f ||_p · || g ||_1 ( and, because of the commutativity of the convolution, you also have || f * g ||_p = || g * f ||_p <= || f ||_1 · || g ||_p )
Since CQT is convolution based, we shall keep this inequality.
However, even if in the code there was indeed a normalisation of the kernels, after, during the convolution, there was another scaling factor (torch.sqrt(self.lenghts.view(-1,1))) dependent on the length of the window. I think this is not correct.
I added another scaling factor of 2, instead: the idea is that CQT only cares about positive frequencies so a sinus of amplitude 1 will have 0.5 in the bin of its frequency. It will be nice to conserve the equality (i.e.: having 1 in its bin frequency) so I added the scaling factor 2. This is, in a sense like combining positive and negative frequencies' contribution.
I hope this will be useful.

That's interesting, I didn't know that. At that time, I was implmenting nnAudio by keeping the output as close to librosa as possible.

I ran a quick test comparing your pull request with the librosa output and existing nnAudio output and it looks like this

image

Am I missing anything?

Here's the script for my quick test, you may want to change some of the paths.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import chirp, sweep_poly
from librosa import cqt,stft, note_to_hz, pseudo_cqt
from librosa.feature import melspectrogram
import sys
from scipy.io import wavfile
from librosa import amplitude_to_db
from nnAudio import Spectrogram # Importing nnAudio 0.2.1

sys.path.insert(0,'../nnAudio_new_version/Installation/nnAudio') # path to the files of this pull request
import Spectrogram as Spectrogram2 # importing the pull request version
import torch
import torch.nn as nn

cmap = 'magma'
fmin = note_to_hz('A1') # for CQT use

fs = 44100
t = 10
f0 = 55
f1 = 22050

# parameters for CQT
bins_per_octave = 24
n_bins = int(bins_per_octave*7.5)

epsilon = 1e-5 # for taking log
device = 'cpu'

# Getting a log sine sweep

s = np.linspace(0,t, fs*t)

x1 = chirp(s, f0, t, f1, method='logarithmic')
x2 = chirp(s, f0/2**(4/12), t, f1/2**(4/12), method='logarithmic')
x=(x1+x2)/2*np.sqrt(1/2)
x = x.astype(dtype=np.float32)

plt.figure(figsize=(6,1.5))
plt.plot(x[:10000])
plt.axis('off')

# CQT

fmin = note_to_hz('A2') # for CQT use
r=2
bins_per_octave = 12*r
n_bins = 80*r-1
CQT_layer_current = Spectrogram.CQT1992v2(fs, fmin=fmin, n_bins=n_bins, bins_per_octave=bins_per_octave)
CQT_layer_new = Spectrogram2.CQT1992v2(fs, fmin=fmin, n_bins=n_bins, bins_per_octave=bins_per_octave)

cqt_x = CQT_layer_current(torch.tensor(x[None, None, :]), output_format='Magnitude')
cqt_x_new = CQT_layer_new(torch.tensor(x[None, None, :]), output_format='Magnitude')
output_lib = cqt(x, sr=fs, fmin=fmin, n_bins=n_bins, bins_per_octave=bins_per_octave)

fig , ax = plt.subplots(1,2, dpi=200, figsize=(10,5))

im1 = ax[0].imshow(20*np.log10(cqt_x[0]), aspect='auto', origin='lower', cmap=cmap, vmax=30, vmin=-100)
im2 = ax[1].imshow(20*np.log10(abs(output_lib)), aspect='auto', origin='lower', cmap=cmap, vmax=30, vmin=-100)
ax[1].set_yticks([])

ax[1].set_yticks([])
ax[0].title.set_text("nnAudio")
ax[1].title.set_text("Librosa")
fig.subplots_adjust(wspace = 0.05)
# fig.colorbar(im1,ax=ax[0])

timestep = 600
plt.plot(amplitude_to_db(cqt_x[0][:,timestep].cpu().numpy()))
plt.plot(amplitude_to_db(abs(output_lib)[:,timestep]), '--')
plt.plot(amplitude_to_db(cqt_x_new[0][:,timestep].cpu().numpy()))
plt.legend(['nnAudio 0.2.1','librosa', 'pull request'])
plt.title('Spectral magnitude slicing at time step 450')

@KinWaiCheuk
Copy link
Owner

I also change the behavior of get_windows to handle gaussian window.

When I test the Gaussian window, the result is different from librosa. I am using window=('gaussian',51).
Since I haven't used this kind of window function before, I have no idea which output (the librosa one or the nnAudio) is the correct one.

I believe you are more knowledgable regarding this issue
index

@Manza12
Copy link
Contributor Author

Manza12 commented Feb 23, 2021

Indeed, your normalization coincides with the librosa one!
However, I have checked the librosa reference for the CQT and in the second part (signal model) I found the CQT original definition.
They don't say anything about normalization, but if we admit that the windows are normalized in the L1 space, the librosa results doesn't hold: I attach a pdf with the computations in order to prove that the magnitude of the CQT coefficients should be lower than 1 if the signal is comprised between -1 and 1 (which is, I think, the case in the example). This means that the CQT should be inferior than 0 in dB scale. However, in the plot it appears to be some coefficients with magnitude greater than 0 in dB scale.

@KinWaiCheuk
Copy link
Owner

This means that the CQT should be inferior than 0 in dB scale. However, in the plot it appears to be some coefficients with magnitude greater than 0 in dB scale.

I see, then it does seem that librosa is having some issue. Given that there will be people switching from librosa to nnAudio and want to keep the same result, how about adding an argument to allow the user to use the librosa version of normalisation or your normalisation?

Probably in the forward function inside CQT1992v2, we add an argument called something like librosa_output=True (or you can come up a better name) to let the user to choose what normalisation they want. We will need to explain what librosa_output is for in the doc string.

- Normalization is a parameter of the forward method of CQT1992v2. The options are 'librosa' to fit the normalization of librosa, 'convolutional' to preserve the inequalities of the CQT viewed as a wavelet transform, and 'wrap' to wrap positive and negative frequencies into the positive ones.
@Manza12
Copy link
Contributor Author

Manza12 commented Feb 25, 2021

I have commited the inclusion of the parameter for the normalization; I hope it works for you.
I also added the filter scale factor (parameter from librosa CQT and requested in the issue #54).

@Manza12
Copy link
Contributor Author

Manza12 commented Feb 25, 2021

I noticed that my code may raise errors because I added the frequencies as an output of the create_cqt_kernels function. I have two options:

  • either I remove this output
  • or I can add this output to the other CQT's
    Please tell me what do you prefer (this was actually a minor change I did in the previous commits that we haven't discussed).
    The point of adding it to the CQT objects is that then we can recover the frequencies as an atribute of the CQT layer, instead of recalculating them from librosa.cqt_frequencies.

@KinWaiCheuk
Copy link
Owner

I noticed that my code may raise errors because I added the frequencies as an output of the create_cqt_kernels function. I have two options:

  • either I remove this output
  • or I can add this output to the other CQT's
    Please tell me what do you prefer (this was actually a minor change I did in the previous commits that we haven't discussed).
    The point of adding it to the CQT objects is that then we can recover the frequencies as an atribute of the CQT layer, instead of recalculating them from librosa.cqt_frequencies.

You are right, adding this cqt frequencies as an output and store as the CQT class attribute will come in handy in the future. So, let's keep this feature. i.e. we will need to change other CQT classes to prevent the error raised.

I think that is just a trivial modification, I can do the rest. Thanks for your contribution!

@KinWaiCheuk KinWaiCheuk merged commit 4cd2748 into KinWaiCheuk:master Feb 28, 2021
@KinWaiCheuk KinWaiCheuk linked an issue Apr 8, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is there a param named filter scale factor in nnAudio ???
2 participants