imagine a UX flow that makes sense #13

Open
sirdarckcat opened this issue May 20, 2023 · 1 comment

Comments

@sirdarckcat (Owner)

I imagine having the user press and hold record ⏺️, stopping the recording on release, would be a good way to collect both the recording and the phone's position. An alternative is to let the user tap once to start and again to stop (like a normal video recording).

Either way, it's important to record the phone's location during capture, or to warn the user if they move too much.
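
A minimal sketch of the movement warning, assuming the app already collects accelerometer readings during the recording (the accel_samples array, the moved_too_much helper, and the threshold are illustrative, not a real device API):

import numpy as np

def moved_too_much(accel_samples, threshold=0.5):
    # accel_samples: (N, 3) array of accelerometer readings (m/s^2).
    # A stationary phone should stay close to its mean reading, so we
    # flag the capture when any sample deviates beyond the threshold.
    deviation = np.linalg.norm(accel_samples - accel_samples.mean(axis=0), axis=1)
    return deviation.max() > threshold

# e.g. show the "you moved too much" warning when this returns True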

Then let the user hear the recording back, and provide a UI with two sliders:

  • one slider for volume: we preselect the average amplitude as the default, letting the user adjust the threshold to the point where the background noise is gone but the sound they want to locate is still audible
  • another, two-marker slider for time (the beginning and end of the sound); we can use the volume selection to derive a sensible default range (see the sketch after this list)
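
A minimal sketch of how those slider defaults could be derived, assuming a mono recording already loaded with scipy (variable names are illustrative):

import numpy as np
import scipy.io.wavfile as wav

sample_rate, audio = wav.read('input.wav')  # assumes mono
amplitude = np.abs(audio.astype(np.float64))

# Default volume threshold: the average amplitude of the recording
default_threshold = amplitude.mean()

# Default time range: first and last samples above that threshold,
# in seconds, to preposition the two-marker time slider
above = np.nonzero(amplitude > default_threshold)[0]
if above.size:
    start_s, end_s = above[0] / sample_rate, above[-1] / sample_rate
else:
    start_s, end_s = 0.0, len(audio) / sample_rate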

In addition, it may make sense to let users further narrow the sound down by frequency range, since they may know it. The UI could mirror the time slider (a min and a max frequency), perhaps with a spectrogram visualization of the captured sound (and a way to listen back to the recording with only the selected frequencies).
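
A minimal sketch of the spectrogram visualization using scipy and matplotlib (the f_min/f_max selection values are illustrative slider positions):

import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wav
from scipy.signal import spectrogram

sample_rate, audio = wav.read('input.wav')  # assumes mono
f, t, Sxx = spectrogram(audio.astype(np.float64), fs=sample_rate)

# Log scale makes quiet components visible; +1e-10 avoids log(0)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-10))
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

# Overlay the user's current min/max frequency selection
f_min, f_max = 1000, 2000  # illustrative slider values
plt.axhline(f_min, color='w')
plt.axhline(f_max, color='w')
plt.show()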

@sirdarckcat (Owner, Author)

import numpy as np
import scipy.io.wavfile as wav

# Read the WAV file; work in floating point so the FFT/IFFT
# round trip does not truncate into the integer sample buffer
sample_rate, audio_data = wav.read('input.wav')
audio = audio_data.astype(np.float64)

# Define the window size and hop size (a non-overlapping hop keeps
# the write-back simple; overlap would need overlap-add blending)
window_size = 1024
hop_size = window_size

# Define the frequency range to attenuate (e.g., 1000-2000 Hz)
lower_freq = 1000
upper_freq = 2000

# Find the corresponding rfft bin indices for this window size
lower_index = int(lower_freq * window_size / sample_rate)
upper_index = int(upper_freq * window_size / sample_rate)

# Apply spectral gating window by window
for i in range(0, len(audio) - window_size + 1, hop_size):
    window = audio[i:i + window_size]
    # rfft keeps the complex spectrum (magnitude and phase) of the
    # real signal, so the inverse transform reconstructs it exactly
    window_freq = np.fft.rfft(window)

    # Check if energy is present in the specified range
    if np.abs(window_freq[lower_index:upper_index]).sum() > 0:
        # Attenuate or suppress the frequencies within the range
        window_freq[lower_index:upper_index] *= 0.1  # adjust the attenuation factor as needed

    # Replace the original window with the filtered window
    audio[i:i + window_size] = np.fft.irfft(window_freq, n=window_size)

# Write the modified audio to a new WAV file
wav.write('output.wav', sample_rate, audio.astype(np.int16))
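
The snippet above attenuates the selected band; the "listen back with only the selected frequencies" idea from the first comment would be the inverse. A minimal sketch of that preview, done with a single FFT over the whole signal rather than per window (the band edges are illustrative slider values):

import numpy as np
import scipy.io.wavfile as wav

sample_rate, audio_data = wav.read('input.wav')
spectrum = np.fft.rfft(audio_data.astype(np.float64))
freqs = np.fft.rfftfreq(len(audio_data), d=1.0 / sample_rate)

# Zero every bin outside the selected band, then invert
band = (freqs >= 1000) & (freqs <= 2000)  # illustrative selection
spectrum[~band] = 0.0
preview = np.fft.irfft(spectrum, n=len(audio_data))

wav.write('preview.wav', sample_rate, preview.astype(np.int16))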
