Error occurs when executing 'enhanced_speech = tg(noisy_speech)' #101
Comments
In my case, the input audio data has shape (513024, 2). I solved the error by swapping the two dimensions before processing and swapping them back afterwards.
# swap dimension 0 and 1
print(data.shape)
data = np.swapaxes(data, 0, 1)
print(data.shape)
noisy_speech = torch.from_numpy(data)
noisy_speech = noisy_speech.float().to(device)
# speech processing
enhanced_speech = tg(noisy_speech)
# swap dimension back
print(enhanced_speech.shape)
enhanced_speech = torch.transpose(enhanced_speech, 0, 1)
print(enhanced_speech.shape)
Additionally, I've run into another issue: the output sounds as if it were randomly generated, with only some of the speaker's original voice mixed in. I also see this warning:
UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ..\aten\src\ATen\native\Convolution.cpp:1009.)
conv1d(
How do I fix it?
Hi @Yizai30, just wanted to let you know that the input format for this function is [batch, audio_length]. For an example, check out this notebook. We're also aware of the warning you encountered. It is caused by using "same" padding with an even kernel size; please see this issue. We're working on a fix for a future release, but in the meantime you can adjust the size of the smoothing filter using the freq_mask_smooth_hz and time_mask_smooth_ms parameters. For nonstationary gating, ensure the n_movemean_nonstationary parameter is set to an odd value.
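The arithmetic behind that warning can be sketched as follows. This assumes stride 1 and the usual rule that "same" padding needs dilation * (kernel_size - 1) samples in total, split as evenly as possible between the two sides. `same_padding_split` is a hypothetical helper for illustration only; it is not part of noisereduce or PyTorch, and it does not show how the smoothing parameters map to kernel sizes.

```python
from typing import Tuple


def same_padding_split(kernel_size: int, dilation: int = 1) -> Tuple[int, int]:
    """Return (left, right) padding that keeps the length unchanged at stride 1.

    Hypothetical helper: illustrates the arithmetic only.
    """
    total = dilation * (kernel_size - 1)  # total samples of padding needed
    left = total // 2                     # as even a split as possible
    right = total - left
    return left, right


# An even kernel with odd dilation gives an odd total, so the split is uneven.
for k in (4, 5):
    left, right = same_padding_split(k)
    kind = "symmetric" if left == right else "asymmetric"
    print(f"kernel_size={k}: left={left}, right={right} ({kind})")
```

With an odd kernel size the total is even and the padding is symmetric, which is presumably why an odd n_movemean_nonstationary avoids the warning.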
I've found one solution/workaround to the problem of not matching the shape after applying noisereduce (implications of
import torch.nn.functional as F

def audio_padding_before_stft(audio_tensor, hop_length, mode='constant'):
    # Pad the last dimension so its length becomes a multiple of hop_length.
    pad_amount = (hop_length - (audio_tensor.size(-1) % hop_length)) % hop_length
    if pad_amount > 0:
        # Split the padding as evenly as possible between both ends.
        pad_left = pad_amount // 2
        pad_right = pad_amount - pad_left
        audio_tensor = F.pad(audio_tensor, (pad_left, pad_right), mode=mode)
    return audio_tensor
audio_tensor, sr = ...
tg = TorchGate(sr, ...)
audio_tensor = audio_padding_before_stft(audio_tensor, tg.hop_length)
I'm not sure which padding mode is best, but the candidates I have in mind are constant (used here) and reflect (the default in stft).
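To make the padding arithmetic concrete, here is a small worked sketch in plain Python that mirrors the formula above without touching any tensors. The hop length of 256 is an assumed value for illustration, not one taken from the thread, and `pad_amounts` is a hypothetical helper name.

```python
def pad_amounts(length: int, hop_length: int):
    """Return (pad_left, pad_right) so that length becomes a multiple of hop_length."""
    pad_amount = (hop_length - length % hop_length) % hop_length
    pad_left = pad_amount // 2
    pad_right = pad_amount - pad_left
    return pad_left, pad_right


hop_length = 256  # assumed value, for illustration only
for length in (513024, 513000):
    left, right = pad_amounts(length, hop_length)
    padded = length + left + right
    print(f"length={length}: pad=({left}, {right}), padded={padded}, "
          f"remainder={padded % hop_length}")
```

A length that is already a multiple of the hop length gets no padding; otherwise the signal is extended to the next multiple, with any odd sample going to the right side.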
Traceback (most recent call last):
  File "D:\work_directory\Anti-Fraud\audios\scripts\use_noisereduce.py", line 23, in <module>
    enhanced_speech = tg(noisy_speech)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\environment\Python\3.10.11\lib\site-packages\noisereduce\torchgate\torchgate.py", line 216, in forward
    raise Exception(f"x must be bigger than {self.win_length * 2}")
Exception: x must be bigger than 2048
How can I get past this error? I'd appreciate any help.
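A minimal sketch of why this exception can fire on audio that is long enough: it assumes the check compares the last dimension against win_length * 2, as the message suggests, and that win_length is 1024 (inferred from the 2048 in the message). `diagnose_input_shape` is a hypothetical helper, not part of noisereduce.

```python
def diagnose_input_shape(shape, win_length=1024):
    """Classify a shape against the assumed minimum-length check.

    Returns "ok", "transpose" (samples appear to be on the first axis),
    or "too short". Hypothetical helper for illustration only.
    """
    min_length = win_length * 2
    audio_length = shape[-1]  # the last axis is treated as audio_length
    if audio_length >= min_length:
        return "ok"
    if len(shape) == 2 and shape[0] >= min_length:
        # Looks like (samples, channels); the expected layout is [batch, audio_length].
        return "transpose"
    return "too short"


print(diagnose_input_shape((513024, 2)))  # transpose
print(diagnose_input_shape((2, 513024)))  # ok
```

An array of shape (513024, 2) is read as 513024 batch items of length 2, which is far below 2048, so transposing it to (2, 513024) is what resolves the error.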