Error occurs when executing 'enhanced_speech = tg(noisy_speech)' #101

Yizai30 · 2023-12-10T09:39:19Z

```
Traceback (most recent call last):
  File "D:\work_directory\Anti-Fraud\audios\scripts\use_noisereduce.py", line 23, in <module>
    enhanced_speech = tg(noisy_speech)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\noisereduce\torchgate\torchgate.py", line 216, in forward
    raise Exception(f"x must be bigger than {self.win_length * 2}")
Exception: x must be bigger than 2048
```

How can I get past this? I'd appreciate any help.

Yizai30 · 2023-12-10T10:40:22Z

In my case, the input audio data has shape (513024, 2). I solved it by swapping the two dimensions before processing and swapping them back afterwards:

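```python
import numpy as np
import torch

# data: NumPy array of shape (num_samples, num_channels), here (513024, 2)
# tg: a TorchGate instance; device: the torch device tg runs on

# Swap dimensions 0 and 1: (513024, 2) -> (2, 513024)
print(data.shape)
data = np.swapaxes(data, 0, 1)
print(data.shape)

noisy_speech = torch.from_numpy(data)
noisy_speech = noisy_speech.float().to(device)

# Speech processing
enhanced_speech = tg(noisy_speech)

# Swap the dimensions back: (2, N) -> (N, 2)
print(enhanced_speech.shape)
enhanced_speech = torch.transpose(enhanced_speech, 0, 1)
print(enhanced_speech.shape)
```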

I've also run into another issue: the output sounds as if it were randomly generated, with only some of the speaker's original voice coming through.
I also get this warning in the console:

```
UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ..\aten\src\ATen\native\Convolution.cpp:1009.)
  conv1d(
```

How do I fix it?

nuniz · 2023-12-10T20:05:25Z

Hi @Yizai30,

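Just wanted to let you know that the input format for this function is [batch, audio_length], so an array of shape (513024, 2) is read as 513024 clips of 2 samples each, which fails the length check. For an example, check out this notebook.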
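A minimal NumPy sketch of getting audio into that layout before converting it to a tensor. It assumes the array is either 1-D mono or 2-D channels-last (num_samples, num_channels), which is the layout `soundfile.read` returns for multichannel files, and treats each channel as one batch row. `to_batch_first` is a hypothetical helper; its length check only mirrors the `x must be bigger than 2048` error from the traceback, with `win_length=1024` assumed from that message.

```python
import numpy as np


def to_batch_first(audio: np.ndarray, win_length: int = 1024) -> np.ndarray:
    """Hypothetical helper: reshape audio to the [batch, audio_length] layout.

    Assumes 1-D mono audio or 2-D channels-last audio (num_samples, num_channels).
    Each channel becomes one row of the batch.
    """
    if audio.ndim == 1:
        batch = audio[np.newaxis, :]  # (N,) -> (1, N)
    elif audio.ndim == 2:
        batch = np.ascontiguousarray(audio.T)  # (N, C) -> (C, N)
    else:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")

    # Mirrors the "x must be bigger than 2 * win_length" check from the traceback
    if batch.shape[-1] < 2 * win_length:
        raise ValueError(
            f"audio_length {batch.shape[-1]} is shorter than 2 * win_length = {2 * win_length}"
        )
    return batch.astype(np.float32)


stereo = np.zeros((513024, 2))
print(to_batch_first(stereo).shape)
```

The result can be passed through `torch.from_numpy(...)` and, if channels-last audio is needed for saving, transposed back with `.T` after denoising.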

We're also aware of the warning you encountered. It is caused by using "same" padding with an even kernel size; please see this issue.

We're working on a fix for this in a future release, but in the meantime, you can adjust the size of the smoothing filter using the freq_mask_smooth_hz and time_mask_smooth_ms parameters.

For nonstationary gating, ensure the n_movemean_nonstationary parameter is set to an odd value.
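To see why an odd value helps: "same" padding needs `dilation * (kernel_size - 1)` samples of total padding for the output to keep the input length, and that total splits evenly between both ends only when it is even. The NumPy sketch below splits it with the smaller half on the left, which is how PyTorch appears to do it (treat that as an assumption about PyTorch internals), and uses it for a length-preserving moving average, assuming `n_movemean_nonstationary` sets the length of such a moving-average kernel applied through `conv1d`. `same_padding` and `moving_average_same` are hypothetical names.

```python
import numpy as np


def same_padding(kernel_size: int, dilation: int = 1) -> tuple[int, int]:
    """Split the total 'same' padding into (left, right), with left = total // 2."""
    total = dilation * (kernel_size - 1)
    left = total // 2
    return left, total - left


def moving_average_same(x: np.ndarray, window: int) -> np.ndarray:
    """Moving average whose output has the same length as x (zero padding)."""
    left, right = same_padding(window)
    padded = np.pad(x, (left, right))
    kernel = np.ones(window) / window
    # 'valid' on the padded signal yields exactly len(x) outputs
    return np.convolve(padded, kernel, mode="valid")


print(same_padding(20))  # even window: padding is asymmetric
print(same_padding(21))  # odd window: padding is symmetric
```

With an even window the two ends receive different amounts of padding, which is the case PyTorch warns about because it has to build an extra zero-padded copy of the input.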

grzegorz700 · 2024-08-14T09:49:07Z

I've found a workaround for the output shape not matching the input shape after applying noisereduce (a side effect of the `UserWarning: Using padding='same' ...`). To get exactly the expected shape, pad the audio to a multiple of the hop length before running the algorithm:

```python
import torch.nn.functional as F
from noisereduce.torchgate import TorchGate


def audio_padding_before_stft(audio_tensor, hop_length, mode='constant'):
    # Pad the last axis so its length is a multiple of hop_length,
    # splitting the padding between the left and right ends
    pad_amount = (hop_length - (audio_tensor.size(-1) % hop_length)) % hop_length
    if pad_amount > 0:
        pad_left = pad_amount // 2
        pad_right = pad_amount - pad_left
        audio_tensor = F.pad(audio_tensor, (pad_left, pad_right), mode=mode)
    return audio_tensor


audio_tensor, sr = ...
tg = TorchGate(sr, ...)
audio_tensor = audio_padding_before_stft(audio_tensor, tg.hop_length)
```

I'm not sure which padding mode is best; I considered `constant` (used here) and `reflect` (the default in `torch.stft`).
The UserWarning still appears, but the output now has the expected shape.
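If the original length is needed afterwards, one option is to remember how much padding was added and crop it off the denoised output. A minimal NumPy sketch, assuming (as reported above) that the output keeps the padded length; `pad_to_hop_multiple` and `crop_padding` are hypothetical helpers, and the crop also works on torch tensors because it only slices the last axis.

```python
import numpy as np


def pad_to_hop_multiple(audio: np.ndarray, hop_length: int):
    """Pad the last axis to a multiple of hop_length; return (padded, (left, right))."""
    pad_amount = (hop_length - audio.shape[-1] % hop_length) % hop_length
    left = pad_amount // 2
    right = pad_amount - left
    # Only the last axis (time) is padded; batch/channel axes are untouched
    pad_width = [(0, 0)] * (audio.ndim - 1) + [(left, right)]
    return np.pad(audio, pad_width, mode="constant"), (left, right)


def crop_padding(audio, pads):
    """Undo pad_to_hop_multiple on the last axis (NumPy arrays or torch tensors)."""
    left, right = pads
    return audio[..., left:audio.shape[-1] - right]


x = np.ones((2, 1000))
padded, pads = pad_to_hop_multiple(x, 256)
# denoised = tg(torch.from_numpy(padded).float())  # denoising step, not run here
print(padded.shape, crop_padding(padded, pads).shape)
```

Slicing up to `audio.shape[-1] - right` rather than `-right` keeps the crop correct when no padding was added.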

nuniz self-assigned this Dec 11, 2023
