Help with audio drops #33

Open
zygmund2000 opened this issue Feb 6, 2024 · 20 comments

Comments

@zygmund2000

zygmund2000 commented Feb 6, 2024

Hi,

Somebody uses this library to stream audio from a live FM tuner. In this case the MP3 fallback is used, but in all browsers the sound regularly drops out for a few milliseconds, as if samples were missing.
Do you have any idea how to improve this, such as increasing buffer sizes? It probably occurs on the client side, but I'm not sure.

@JoJoBond
Owner

JoJoBond commented Feb 6, 2024

You can tweak the MinDecodeFrames setting that is set in the DefaultSettings method inside 3las.formatreader.ts/3las.formatreader.js.
Don't go below 3, but you can try higher values.

@zygmund2000
Author

Thanks, it's much better now with a value of 50: the interval between drops is longer, but they still occur. Is there maybe something else to try?

@JoJoBond
Owner

JoJoBond commented Feb 6, 2024

Not really anything that comes to mind. The fallback solution is not the greatest. It's best if the client supports WebRTC. The fallback was what I initially started with, but it's super hard to account for all variations on how mp3 is decoded on each client device. Have you tested using WAV fallback only?

@zygmund2000
Author

I don't know why only the fallback works. I tried WebRTC but got no sound. By the way, the WAV fallback is a PCM stream, yes? Then it needs a high bit rate if I want stereo sound. MP3 should be the best solution, because other streams like Icecast play with no drops. Longer latency is acceptable, of course; 0.5 s, for example, is possible and enough.
In this file I see many parameters, like speed or sample rate compensation. I don't understand them, so I'm adjusting blind.

@JoJoBond
Owner

JoJoBond commented Feb 6, 2024

Yes, WAV would be raw PCM. MP3 is optimal for low data rates but is difficult to handle at a low level. Speed compensation never really worked, so it's better to leave it disabled. Icecast uses HTTP long polling, so buffering is 100% up to the browser and you have no control over latency. I would highly recommend looking into why WebRTC isn't working properly. Maybe you just need a STUN/TURN server.

@zygmund2000
Author

zygmund2000 commented Feb 6, 2024

I'll try, but my target is stereo at a 48 kHz sample rate, and PCM is too much for that. Unfortunately, compression is necessary.
Actually, the interval between drops looks constant over time, and it doesn't sound browser dependent.
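
For a sense of scale, here is a rough sketch of the bandwidth gap between raw PCM and a compressed stream. The 16-bit sample width and the 192 kbit/s MP3 bitrate are assumptions chosen for illustration, not values confirmed for this setup.

```typescript
// Raw PCM bitrate in kbit/s: sample rate x channels x bits per sample.
function pcmBitrateKbps(sampleRateHz: number, channels: number, bitsPerSample: number): number {
    return (sampleRateHz * channels * bitsPerSample) / 1000;
}

// Assumption: 16-bit samples, stereo, 48 kHz.
const pcmKbps = pcmBitrateKbps(48000, 2, 16);
// Assumption: 192 kbit/s is used here only as a typical MP3 bitrate.
const mp3Kbps = 192;

console.log(`PCM: ${pcmKbps} kbit/s, MP3: ${mp3Kbps} kbit/s, ratio: ${pcmKbps / mp3Kbps}x`);
// prints "PCM: 1536 kbit/s, MP3: 192 kbit/s, ratio: 8x"
```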

@NoobishSVK

NoobishSVK commented Feb 7, 2024

Hello, since an FM webserver is mentioned, I believe it comes down to me.
I have modified your library and included it with my open-source project FM-DX-Webserver.
I have noticed this crackling mostly when using dshow. ALSA seems to work fine. However, for some reason, whenever I output the ffmpeg command to an mp3 file, the file seems to have no crackling issues.
The reason for disabling WebRTC was that it didn't seem to behave properly with port forwarding, so WS seemed to be a much easier option to deploy for clients who want to use it.

Here's the project:
https://github.com/NoobishSVK/fm-dx-webserver

Would be awesome if we could somehow fix this. Basically what I would like to achieve is an easy to deploy webserver with a low-latency non-crackling audio stream.

If you would like to see how it's being used, please check: http://xdr.noobish.eu:42069/

I was thinking of contacting you privately, but sadly you don't seem to have any contact details available on your GitHub profile, so I'm trying it this way.

@JoJoBond
Owner

JoJoBond commented Feb 7, 2024

Hey @NoobishSVK, that's a very cool application. From what I see, the crackling also occurs when using WebRTC, which is very unusual. I did test with dshow on my dev machine and it works fine, but of course, your mileage may vary. To rule out an issue with dshow, you could test using a clean audio file (48kHz, mono) as input into ffmpeg instead of a live capture.
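
One way to get such a clean input is to synthesize it. The following is a minimal sketch that generates a mono sine tone as signed 16-bit PCM samples. The function name `makeSineS16`, the 440 Hz frequency, and the half-scale amplitude are illustrative choices, not part of the library.

```typescript
// Generate a mono sine tone as signed 16-bit PCM samples.
// makeSineS16 is a hypothetical helper written for this example.
function makeSineS16(
    sampleRateHz: number,
    frequencyHz: number,
    seconds: number,
    amplitude = 0.5 // fraction of full scale, kept below 1 to avoid clipping
): Int16Array {
    const count = Math.round(sampleRateHz * seconds);
    const samples = new Int16Array(count);
    for (let i = 0; i < count; i++) {
        const value = Math.sin((2 * Math.PI * frequencyHz * i) / sampleRateHz);
        samples[i] = Math.round(value * amplitude * 32767);
    }
    return samples;
}

// One second of a 440 Hz tone at 48 kHz, mono.
const tone = makeSineS16(48000, 440, 1);
console.log(tone.length); // 48000
```

The underlying bytes (`tone.buffer`) could then be written to a file, for instance with Node's `fs.writeFileSync`, and read by ffmpeg as raw input using `-f s16le -ar 48000 -ac 1`. This assumes a little-endian platform, since `Int16Array` uses the host byte order.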

@NoobishSVK

NoobishSVK commented Feb 7, 2024

Thank you for your reply, we'll try fiddling around with dshow for a bit and see what the issue could be. I am currently passing a 192k stereo stream via audio/mpeg to the client on my server (every person who uses the server can change that between Mono/Stereo and 128/192k). Also, here's a link to a server that runs on ALSA, if it helps with anything: https://konrad.fmdx.pl/tuner/

By the way, if you don't mind, we'd love to welcome you to our Discord, if you use it (the link is in the project description). This way we can communicate faster.

@TGCFabian

Hi Jojo, as a user of FM-DX-Webserver, I thought I'd pop in too.

I've been noticing the popping and trying to get rid of it myself, but couldn't really find an easy way.
One major thing: when I forced my browser to use WAV, it all sounded fine.
When I use MP3, it starts popping again.

I've tried mono, stereo, 16-bit, 24-bit, 48 kHz, the lower 44.1 kHz (even lower when forced through ffmpeg), and nothing changed.
MP3 pops, WAV is perfectly stable.
I've tested this with an mp3 file as requested (a single tone at 440 Hz), with the same behaviour as before.

To rule out the final ffmpeg command that converts from s16le to libmp3lame, I recorded the data to an mp3 file.
This file sounded clean, just like the WAV stream from before.

From my understanding, it has to be something on the client's playback side.

Hope this lil rant of tests helps ^^

@JoJoBond
Owner

JoJoBond commented Feb 7, 2024

@TGCFabian Thanks for the feedback. I have a suspicion about what could be wrong, but I will not get around to making changes until the weekend. I think that there is still some artifact in the first few samples of the second granule. We are already skipping the first granule because it's incomplete by mpeg design. But there seems to be an issue with some of the samples that theoretically should be ok. That would explain why increasing the MinDecodeFrames number decreases the frequency of the clicks. My idea would be to leave out one additional frame after decoding and keep it for the next decode block. But, as I said, I will not get to it until the weekend.
Either way, it would still be better if WebRTC were used. The networking might be a hassle to set up, but once it's working it's just so much better.

@NoobishSVK

Absolutely not a problem, we can just try to implement it right away. Websocket seemed to be way easier to deploy and works more reliably, as many people have different installations of the webserver and I wanted to keep it as simple as possible (my goal for the future is for users to be able to literally open the packaged binary and run it right away). With WebRTC (I think I mentioned it previously) we had issues with port forwarding and especially with audio delay, which seemed to be worse when using free STUN servers (such as Google's).
Your solution with the frame stuff sounds very good and I can't wait! I would love to help, but sadly I am not experienced with encoding/decoding media files.
Thank you for your hard work!

@JoJoBond
Owner

JoJoBond commented Feb 8, 2024

Can you please try a little experiment.
Try changing the line 213 in 3las.formatreader.mpeg.js from
extractSampleOffset = Math.floor((decodedData.length - extractSampleCount) / 2);
to
extractSampleOffset = decodedData.length - extractSampleCount;
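
To make the difference between the two lines concrete, here is a small sketch of both offset calculations side by side. The function name, the `ExtractMode` type, and the sample counts are invented for illustration. It assumes the decoder returned at least as many samples as expected.

```typescript
// "center" mirrors the original line, "tail" mirrors the proposed change.
type ExtractMode = "center" | "tail";

// Hypothetical helper; assumes decodedLength >= extractSampleCount.
function computeExtractOffset(
    decodedLength: number,
    extractSampleCount: number,
    mode: ExtractMode
): number {
    const surplus = decodedLength - extractSampleCount;
    // center: split the surplus evenly between front and back.
    // tail: drop all surplus samples from the front.
    return mode === "center" ? Math.floor(surplus / 2) : surplus;
}

// Made-up numbers: the decoder returned 1000 samples, 900 were expected.
const centerOffset = computeExtractOffset(1000, 900, "center");
const tailOffset = computeExtractOffset(1000, 900, "tail");

console.log(`center: samples [${centerOffset}, ${centerOffset + 900})`); // center: samples [50, 950)
console.log(`tail:   samples [${tailOffset}, ${tailOffset + 900})`);     // tail:   samples [100, 1000)
```

With the original line, any surplus is discarded equally from both ends. With the changed line, it is discarded entirely from the start of the decoded block.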

@TGCFabian

> Can you please try a little experiment.
> Try changing the line 213 in 3las.formatreader.mpeg.js from
> extractSampleOffset = Math.floor((decodedData.length - extractSampleCount) / 2);
> to
> extractSampleOffset = decodedData.length - extractSampleCount;

This seemed to have worked, we'll monitor it closely!
Thanks for the quick look!
We'll report back in case something happens ^^

@NoobishSVK

I have definitely noticed a big improvement as well + other users who have tested this code change reported that there's no crackling anymore.

@NoobishSVK

Okay, an update - it seems like the sound drops out very frequently with this change on iPhones using Safari. We have compared servers that run with this fix and without it and it seems like only the servers with the fix have this issue.

I could technically fix it by making a toggle or not enabling this fix for iPhones, however if possible, it would be nice to find a better solution.

@JoJoBond
Owner

There is no good way to determine how the decoder handles the first and last granule. We only know the playback time that we would expect and the one that we actually get. If we get more samples than expected, we don't know where the samples were added. It could be at the front, at the back, or both. Can you tell what values they get for expectedTotalPlayTime and decodedData.duration inside of the OnDecodeSuccess callback?
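
As a rough sketch of what those two values reveal, the difference between them can be converted into a sample count. The helper name and the example numbers are assumptions, and both times are assumed to be in seconds. The result only says how many extra samples there are, not where the decoder put them.

```typescript
// Hypothetical helper: how many more samples were decoded than expected.
// Assumes both durations are expressed in seconds.
function surplusSamples(
    expectedTotalPlayTime: number,
    decodedDuration: number,
    sampleRateHz: number
): number {
    return Math.round((decodedDuration - expectedTotalPlayTime) * sampleRateHz);
}

// Made-up example at 48 kHz: 0.384 s expected, 0.408 s decoded.
const extra = surplusSamples(0.384, 0.408, 48000);
console.log(extra); // 1152, i.e. one MPEG-1 Layer III frame's worth of samples
```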

@JoJoBond
Owner

JoJoBond commented Feb 11, 2024

Would you kindly test with commit bb9b4ba

@NoobishSVK

From my quick testing (I asked 2 iPhone users) it seems to be fine: no crackling and no sound drops anymore. Looks great!
Also, one more question regarding this MPEG formatreader while we're at it: is there a specific reason why the frames are decoded in batches of 17 on Android phones? Is it performance related? I was thinking of lowering it, since phones are way more powerful nowadays.

@JoJoBond
Owner

As far as I remember, it would cause problems below 17 on some devices. Feel free to test with lower values. The theoretical limit would be 2, but I don't recommend going below 3. As for what it does: it defines how many mpeg frames are decoded in one batch. But one frame of each batch has to be reused in the next batch, because of the way mpeg is encoded. So lower values are a bit inefficient, because you get two frames' worth of data for the cost of three. If you use 17 frames per batch, you get 16 frames' worth of data for the cost of 17, and you also have fewer decoder calls. The downside of using more frames per batch is the increased latency. Though with the default 333ms buffer, it shouldn't matter (see InitialBufferLength in Fallback_Settings).
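
The trade-off described above can be put into numbers with a short sketch. It assumes one frame per batch is reused, as stated, and 1152 samples per frame, which holds for MPEG-1 Layer III. The helper `batchStats` and the 48 kHz sample rate are illustrative assumptions.

```typescript
// Samples per MPEG-1 Layer III frame (sample rates 32, 44.1, and 48 kHz).
const SAMPLES_PER_FRAME = 1152;

interface BatchStats {
    usableFrames: number; // frames of new audio gained per batch
    efficiency: number;   // usable frames divided by decoded frames
    batchAudioMs: number; // milliseconds of new audio per batch
}

// Hypothetical helper: assumes exactly one frame per batch is reused.
function batchStats(framesPerBatch: number, sampleRateHz: number): BatchStats {
    if (framesPerBatch < 2) {
        throw new RangeError("a batch needs at least 2 frames");
    }
    const usableFrames = framesPerBatch - 1;
    const frameMs = (SAMPLES_PER_FRAME * 1000) / sampleRateHz;
    return {
        usableFrames,
        efficiency: usableFrames / framesPerBatch,
        batchAudioMs: usableFrames * frameMs,
    };
}

for (const frames of [3, 17]) {
    const s = batchStats(frames, 48000);
    console.log(
        `${frames} frames: ${s.usableFrames} usable, ` +
        `${(s.efficiency * 100).toFixed(1)}% efficient, ${s.batchAudioMs} ms per batch`
    );
}
```

At 48 kHz this gives 2 usable frames (66.7%) and 48 ms of audio per batch for a value of 3, and 16 usable frames (94.1%) and 384 ms per batch for a value of 17.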


4 participants