
Applying noise reduction on pyaudio stream #44

Open
antondim opened this issue Sep 21, 2020 · 3 comments

antondim commented Sep 21, 2020

Hello,
First of all, thank you for your great module.

I'm trying to apply real-time noise reduction on incoming audio stream

Settings on stream opening:

  • Mono, sampling_rate = 16 kHz, frames_per_buffer = 16000

Settings on stream reading:

  • stream.read(16000, exception_on_overflow = False)

The problem I'm facing is that a periodic "fan spinning" sound appears when I apply noise reduction actively (chunk by chunk, in a loop), but this sound does not appear when I make a normal 5-second recording and apply noise reduction to the np.int16 array afterwards.

The difference is that in the first case (active-ish noise reduction) I append the sound data after noise reduction on each iteration, whereas in the second case I record for 5 seconds and THEN apply noise reduction to the whole set of data.

I'm uploading example wavs to give you a better perspective:

normal_case_wavs.zip
active_case_wavs.zip

P.S. I noticed that changing the number of frames I read from the stream buffer also changes the frequency of this sound. Could this be some kind of edge case where the sound marks the boundary of each "sound chunk" I process (and append to the list for the future .wav write)?
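If that guess is right, the numbers line up: a discontinuity at every chunk boundary repeats rate / frames_per_buffer times per second, so halving the read size should double the artifact rate. A quick sanity check, assuming the stream settings above:

```python
rate = 16000  # sampling rate from the stream settings above

# A discontinuity at every chunk boundary repeats rate / frames_per_buffer
# times per second, so the artifact rate should track the read size.
for frames_per_buffer in (16000, 8000, 4000):
    boundaries_per_second = rate / frames_per_buffer
    print(frames_per_buffer, boundaries_per_second)
```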

The crucial part of the code is here:

import numpy as np
import pyaudio
import noisereduce as nr

# `noise` is the float noise clip, loaded elsewhere

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=16000)

noisy_frames = []
denoised_active_frames = []

for i in range(0, int(16000 / 16000 * 5)):  # 5 one-second chunks
    data = stream.read(16000, exception_on_overflow=False)
    sound_data_npint16 = np.frombuffer(data, dtype=np.int16)  # np.fromstring is deprecated
    noisy_frames.append(sound_data_npint16)

    sound_data_float = sound_data_npint16.astype(float) / 32768
    reduced_noise_float = nr.reduce_noise(audio_clip=sound_data_float, noise_clip=noise, verbose=False, n_fft=4096, n_std_thresh=1, pad_clipping=True)  # tried both pad_clipping=True/False
    reduced_noise_npint16 = (np.iinfo(np.int16).max * reduced_noise_float).astype(np.int16)

    denoised_active_frames.append(reduced_noise_npint16)

total_noisy_frames = np.hstack(noisy_frames)  # noisy frames gathered
total_noisy_frames_float = total_noisy_frames.astype(float) / 32768
reduced_total_noisy_frames_float = nr.reduce_noise(audio_clip=total_noisy_frames_float, noise_clip=noise, verbose=False, n_fft=4096, n_std_thresh=1, pad_clipping=True)
reduced_total_noisy_frames_npint16 = (np.iinfo(np.int16).max * reduced_total_noisy_frames_float).astype(np.int16)

# noisy wav comes from "noisy_frames"
# actively denoised wav comes from "denoised_active_frames"
# denoised wav after 5-second recording comes from "reduced_total_noisy_frames_npint16"
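One detail worth checking in the snippet: the int16 → float conversion divides by 32768, but the float → int16 conversion multiplies by np.iinfo(np.int16).max (32767), and any denoised sample at or above 1.0 will wrap around when cast to int16, which can itself produce periodic clicks. A minimal sketch of a symmetric, clipped round-trip (the helper names are my own, not part of noisereduce):

```python
import numpy as np

def int16_to_float(x: np.ndarray) -> np.ndarray:
    # Map int16 PCM into [-1.0, 1.0) with a single scale factor.
    return x.astype(np.float64) / 32768.0

def float_to_int16(x: np.ndarray) -> np.ndarray:
    # Clip before casting so values at or above 1.0 saturate instead of wrapping.
    return np.clip(x * 32768.0, -32768, 32767).astype(np.int16)

samples = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
roundtrip = float_to_int16(int16_to_float(samples))  # identical to samples
```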
@jonaslimads

Hello,

I did kind of a hack, but it seems to work:

  • I first saved the whole audio, cut just a noise sample from the whole audio with Audacity, and stored it as a WAV file.
  • Then before streaming I would load the noise sample and run reduce_noise against it. Something like:
import os
import wave

import numpy as np
import webrtcvad
from noisereduce import reduce_noise

# (methods of a streaming transcriber class; vad_aggressiveness etc. are set elsewhere)
    def __init__(self, output_to_file=True):
        self.vad = webrtcvad.Vad(int(self.vad_aggressiveness))
        self.deepspeech_model = self.load_deepspeech_model()
        self.noise_sample_data = self.load_noise_sample_data()

    def reduce_audio_noise(self, data: bytes) -> bytes:
        np_data = np.frombuffer(data, np.int16) / 1.0  # int16 -> float
        reduced_noise_data = reduce_noise(audio_clip=np_data, noise_clip=self.noise_sample_data)
        return reduced_noise_data.astype(np.int16).tobytes()

    def load_noise_sample_data(self) -> np.ndarray:
        path = os.path.join(os.path.dirname(__file__), "../../../assets/deepspeech/noise_sample.wav")
        with wave.open(path, "rb") as wf:
            frames = wf.getnframes()
            return np.frombuffer(wf.readframes(frames), np.int16) / 1.0

I would just stream the bytes returned from self.reduce_audio_noise(_bytes).

Of course, the noise sample is pretty limited because it only recognizes one pattern of noise.

I hope that can help.
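For anyone adapting this, the bytes-in/bytes-out shape of reduce_audio_noise above is easy to test in isolation; here is a stand-alone sketch where the identity lambda stands in for noisereduce's reduce_noise (my own helper, not part of any library):

```python
import numpy as np

def denoise_bytes(data: bytes, denoise=lambda x: x) -> bytes:
    # Same pattern as reduce_audio_noise above: raw int16 PCM bytes in,
    # a float array through the denoiser, int16 PCM bytes back out.
    np_data = np.frombuffer(data, np.int16) / 1.0  # int16 -> float64
    return denoise(np_data).astype(np.int16).tobytes()

chunk = np.array([0, 100, -100, 32767], dtype=np.int16).tobytes()
out = denoise_bytes(chunk)  # identity "denoiser": bytes pass through unchanged
```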

@yujie-tao

Hello! I am also facing a similar issue with "fan spinning" artifacts when trying to apply noisereduce to real-time microphone input. Curious if people have figured out a workaround?

@DanTremonti

@yujie-tao I faced the same issue and found a workaround: by streaming the audio as overlapping chunks, I was able to reduce the artifact significantly. One side effect is increased latency.
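For anyone else landing here, one way to sketch that overlapping-chunk idea is to process each chunk together with the tail of the previous one and crossfade the shared region, smoothing chunk seams at the cost of `overlap` samples of extra latency. This is my own sketch, not @DanTremonti's code; `process` stands in for a per-chunk call such as nr.reduce_noise:

```python
import numpy as np

def stream_denoise(chunks, process, overlap):
    # Process each chunk with `overlap` samples of context from the previous
    # chunk, then crossfade the shared region with linear ramps so the seam
    # between chunks is smoothed. Every chunk must be longer than `overlap`.
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = []
    prev_tail = None       # last `overlap` raw samples of the previous chunk
    prev_proc_tail = None  # processed version of that tail
    for chunk in chunks:
        if prev_tail is None:
            proc = process(chunk)
            out.append(proc[:-overlap])
        else:
            proc = process(np.concatenate([prev_tail, chunk]))
            seam = fade_out * prev_proc_tail + fade_in * proc[:overlap]
            out.append(seam)
            out.append(proc[overlap:-overlap])
        prev_tail = chunk[-overlap:]
        prev_proc_tail = proc[-overlap:]
    out.append(prev_proc_tail)  # flush the final tail
    return np.concatenate(out)
```

With an identity `process` this reconstructs the input exactly; with a real denoiser the crossfade hides the differing edge estimates at chunk boundaries that otherwise show up as a periodic artifact.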
