
Error occurs when executing 'enhanced_speech = tg(noisy_speech)' #101

Open
Yizai30 opened this issue Dec 10, 2023 · 3 comments
Yizai30 commented Dec 10, 2023

Traceback (most recent call last):
  File "D:\work_directory\Anti-Fraud\audios\scripts\use_noisereduce.py", line 23, in <module>
    enhanced_speech = tg(noisy_speech)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\noisereduce\torchgate\torchgate.py", line 216, in forward
    raise Exception(f"x must be bigger than {self.win_length * 2}")
Exception: x must be bigger than 2048
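A minimal sketch of why this fires (the 2048 comes from the traceback's `self.win_length * 2` check, so `win_length` is 1024 here): TorchGate expects input shaped `[batch, audio_length]` and checks the *last* dimension, so an array shaped `(513024, 2)` fails because its last dimension is the 2-channel axis, not the sample axis.

```python
# Sketch of the length check raised in torchgate.py (win_length inferred
# from the traceback: 2 * 1024 = 2048). Not the library's actual code.
def check_input_length(shape, win_length=1024):
    """Return True if the last dimension is long enough for TorchGate."""
    return shape[-1] > win_length * 2

# (513024, 2) fails: the last dim (channels) is 2, but TorchGate
# expects [batch, audio_length], i.e. samples on the last axis.
assert check_input_length((513024, 2)) is False
assert check_input_length((2, 513024)) is True
```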

How can I get past this? I'd appreciate any help.

Yizai30 (Author) commented Dec 10, 2023

In my case, the input audio data has shape (513024, 2), and I solved it by swapping the two dimensions before processing, then swapping them back after processing.

import numpy as np
import torch

# swap dimensions 0 and 1: [samples, channels] -> [channels, samples]
print(data.shape)
data = np.swapaxes(data, 0, 1)
print(data.shape)

noisy_speech = torch.from_numpy(data)
noisy_speech = noisy_speech.float().to(device)

# speech processing
enhanced_speech = tg(noisy_speech)

# swap the dimensions back: [channels, samples] -> [samples, channels]
print(enhanced_speech.shape)
enhanced_speech = torch.transpose(enhanced_speech, 0, 1)
print(enhanced_speech.shape)
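As a numpy-only sketch of the round trip (no torch needed): for a 2-D array, `np.swapaxes(a, 0, 1)` and `torch.transpose(t, 0, 1)` are both plain transposes, so swapping twice restores the original layout.

```python
import numpy as np

# Stereo audio as loaded from disk: [samples, channels].
data = np.zeros((513024, 2), dtype=np.float32)

# To [channels, samples], i.e. TorchGate's expected [batch, audio_length].
swapped = np.swapaxes(data, 0, 1)
assert swapped.shape == (2, 513024)

# Swapping again restores the original layout exactly.
restored = np.swapaxes(swapped, 0, 1)
assert restored.shape == (513024, 2)
assert np.shares_memory(restored, data)  # swapaxes returns a view, no copy
```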

Additionally, I've run into another issue: the enhanced output sounds as if it were randomly generated, mixed with traces of the speaker's original voice.
This warning appears in my console:

UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ..\aten\src\ATen\native\Convolution.cpp:1009.)
  conv1d(

How do I fix it?

nuniz (Collaborator) commented Dec 10, 2023

Hi @Yizai30,

Just wanted to let you know that the input format for this function is [batch, audio_length]. For an example, check out this notebook.

We're also aware of the warning you encountered. This is caused by using "same" padding with an even kernel size, please see this issue.

We're working on a fix for this in a future release, but in the meantime, you can adjust the size of the smoothing filter using the freq_mask_smooth_hz and time_mask_smooth_ms parameters.

For nonstationary gating, ensure the n_movemean_nonstationary parameter is set to an odd value.
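One way to act on that advice (a hypothetical helper, not part of the library) is to round any kernel-size-like parameter up to the next odd value before passing it, since conv1d only emits this warning for even kernel lengths with padding='same':

```python
def to_odd(n: int) -> int:
    """Round an even kernel length up to the next odd value (hypothetical helper)."""
    return n if n % 2 == 1 else n + 1

# e.g. pick n_movemean_nonstationary = to_odd(20) -> 21 so the
# nonstationary moving-mean kernel is odd, as suggested above.
```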

@nuniz nuniz self-assigned this Dec 11, 2023
@grzegorz700

I've found one solution/workaround to the problem of the shape not matching after applying noisereduce (a consequence of the UserWarning: Using padding='same' ... above). To get the exact same shape after running the algorithm:

import torch.nn.functional as F
from noisereduce.torchgate import TorchGate

def audio_padding_before_stft(audio_tensor, hop_length, mode='constant'):
    # pad the signal up to the next multiple of hop_length,
    # split evenly between the left and right ends
    pad_amount = (hop_length - (audio_tensor.size(-1) % hop_length)) % hop_length
    if pad_amount > 0:
        pad_left = pad_amount // 2
        pad_right = pad_amount - pad_left
        audio_tensor = F.pad(audio_tensor, (pad_left, pad_right), mode=mode)
    return audio_tensor


audio_tensor, sr = ...
tg = TorchGate(sr, ...)
audio_tensor = audio_padding_before_stft(audio_tensor, tg.hop_length)

I'm not sure which padding mode is best, but the candidates are 'constant' (used here) and 'reflect' (the default in stft).
The user warning won't disappear, but this gives the expected shape after processing.
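To sanity-check the arithmetic in `audio_padding_before_stft` without torch, here is the same pad computation in pure Python (the function name is mine, not from the library):

```python
def pad_amounts(length: int, hop_length: int):
    """(left, right) padding that brings length up to a multiple of hop_length."""
    pad = (hop_length - (length % hop_length)) % hop_length
    left = pad // 2
    return left, pad - left

assert pad_amounts(513024, 256) == (0, 0)    # already a multiple of 256
assert pad_amounts(513000, 256) == (12, 12)  # needs 24 extra samples
```

The outer `% hop_length` matters: it maps the already-aligned case from `hop_length` back to zero, so no padding is added when none is needed.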
