Speed up librespot by using flexible download strategies #630
Replies: 8 comments
-
I just did some digging into the chunks that are requested but not reported as completed in the log I posted above. What happens is that those chunks are still transmitted to the client. However, AudioFileStreaming isn't expecting them any more, so the data isn't stored and must be downloaded a second time. We therefore pay a double penalty: the second block (the one we actually want) is only received after the first block (which we no longer need) has been transmitted, and the data of the first block is discarded, which means we need to download it again later on. As a result, the example above actually loads 15 chunks (1920 KB, almost 2 MB!) before playback can commence.
-
Great work digging into the inner workings; honestly, I have never examined it in that much depth. It's surprising how much data is wasted during loads.
This indirectly relates to the transition to CDN for file delivery. We have a few things left to work out, but chunks are requested using the
This is indeed suboptimal. We should look at predicting chunks when trying to seek.
Agreed with regards to predicting timestamps. Unfortunately, Spotify uses VBR, which makes things slightly more challenging. Still, it should be possible even then to create a better solution than a blind bisect, whereby one does something like
This is to be expected. Mercury is very much fire-and-forget, whereby the server responds later with the response to your original request. I don't believe there's a way to cancel Mercury requests once sent, hence we should focus on optimising the calculation for requesting chunks, to minimise both the likelihood of requesting the wrong chunk and the number of requests.
It might be worth examining the non-compliant header Spotify uses in their files. It's possible that they use it to add data that offers insight into the file contents, which would aid in chunk prediction and in turn increase loading speed. In particular, I'm wondering if there is some sort of average-bitrate and bitrate-variance information that one could use to influence how large a chunk request to make whilst seeking, though I suspect that this is not the case.
-
This bisection search is very good compared to the alternative (a linear scan of the whole file), often taking just a couple of reads to locate the correct position in a file gigabytes in size. But the truly obsessive can outperform bisection on average by using the local bitrate to pick a better target than the halfway point used in a bisection search (the secant method). Similar to this, though one would have to look at whether it's compatible with streaming file chunks from a practicality standpoint.
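For reference, a minimal sketch of what such a secant-style step could look like, assuming the seek loop keeps the timestamps it has already observed at both ends of the current byte interval (the function name and signature are illustrative, not existing librespot or lewton APIs):

```rust
/// Estimate the byte offset of `target_ts` inside [`lo_byte`, `hi_byte`] by
/// linear interpolation between the timestamps observed at the two boundaries
/// (a secant step). Falls back to the plain midpoint when the interval or the
/// timestamp span is degenerate.
fn secant_target(lo_byte: u64, lo_ts: f64, hi_byte: u64, hi_ts: f64, target_ts: f64) -> u64 {
    let span_ts = hi_ts - lo_ts;
    if hi_byte <= lo_byte + 2 || span_ts <= 0.0 {
        return lo_byte + (hi_byte - lo_byte) / 2;
    }
    let fraction = ((target_ts - lo_ts) / span_ts).clamp(0.0, 1.0);
    let guess = lo_byte as f64 + fraction * (hi_byte - lo_byte) as f64;
    // Keep the probe strictly inside the interval so the search always shrinks.
    guess.round().clamp((lo_byte + 1) as f64, (hi_byte - 1) as f64) as u64
}
```

On a roughly constant-bitrate file this lands close to the target within a probe or two; on heavily variable-bitrate material it degrades towards the plain bisection behaviour.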
-
The OGG files have a proprietary header at the start of the file that librespot skips because libogg and lewton fail to decode it.
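For illustration only (this is not the actual librespot code, and the names are made up): skipping such a header can be done with a thin wrapper that offsets every read and seek by the header length before handing the stream to the decoder.

```rust
use std::io::{Read, Result, Seek, SeekFrom};

/// Hides a fixed-size leading header from the decoder: every position the
/// decoder sees is shifted by `header_len` bytes relative to the real stream.
struct SkipHeader<T> {
    inner: T,
    header_len: u64,
}

impl<T: Seek> SkipHeader<T> {
    fn new(mut inner: T, header_len: u64) -> Result<Self> {
        // Start just past the header so the first read already yields OGG data.
        inner.seek(SeekFrom::Start(header_len))?;
        Ok(SkipHeader { inner, header_len })
    }
}

impl<T: Read> Read for SkipHeader<T> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.inner.read(buf)
    }
}

impl<T: Seek> Seek for SkipHeader<T> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        let shifted = match pos {
            // Absolute positions are relative to the end of the hidden header.
            SeekFrom::Start(offset) => SeekFrom::Start(offset + self.header_len),
            other => other,
        };
        Ok(self.inner.seek(shifted)?.saturating_sub(self.header_len))
    }
}
```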
-
I've done some work on an overhaul of how files are downloaded. The download code can now handle completely arbitrary download ranges and also deal with multiple requests in parallel. Note that parallel requests still mean that the data comes back in sequence, but it saves on round-trip time if multiple requests can be in flight. The code also makes data available as soon as it arrives, which means the player can use partial responses. Starting a file from the beginning is almost instantaneous: as soon as the first data comes in, playback can commence.

For seek(), the code currently loads 16 KB ranges. Loading plus the bisect search on the same song as in my initial post takes 3.3 seconds, which is quite an improvement given that it took 11 seconds before. I think this is about as good as it gets while using bisects; any improvement over that would need changes to the search algorithm in some form. The code then uses larger ranges to download the bulk of the file.

The code is still a bit rough at the moment and requires some cleanup. It is also prone to buffer underruns shortly after playback starts when the bulk download doesn't come up to speed quickly enough, because I haven't put a proper heuristic in yet. Once I've cleaned things up a bit, I'll start a pull request so people can test it.

NB: does anyone know an alternative to RangeSet from the range_map crate (http://jneem.github.io/range-map/range_map/struct.RangeSet.html)? I'm using it at the moment to keep track of which parts of the file are available, but the API of RangeSet doesn't appear to be fully thought through (or it was designed for another purpose), which makes things rather clumsy.
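Not an answer to the RangeSet question, but for a sense of scale: keeping track of which byte ranges have arrived doesn't need much machinery, and a hand-rolled structure along these lines (a hypothetical sketch, not what the branch currently uses) may already be enough:

```rust
/// Minimal tracker for which byte ranges of a file have been received.
/// Ranges are kept sorted, non-overlapping and coalesced on insert.
#[derive(Default)]
struct DownloadedRanges {
    ranges: Vec<(u64, u64)>, // half-open intervals [start, end)
}

impl DownloadedRanges {
    /// Record that bytes [start, end) have arrived, merging with any
    /// overlapping or adjacent ranges.
    fn add(&mut self, start: u64, end: u64) {
        let mut merged = (start, end);
        let mut out = Vec::with_capacity(self.ranges.len() + 1);
        for &(s, e) in &self.ranges {
            if e < merged.0 || s > merged.1 {
                out.push((s, e)); // disjoint: keep as-is
            } else {
                merged = (merged.0.min(s), merged.1.max(e)); // overlapping/adjacent: merge
            }
        }
        out.push(merged);
        out.sort_unstable();
        self.ranges = out;
    }

    /// True if every byte of [start, end) is already available.
    fn contains(&self, start: u64, end: u64) -> bool {
        self.ranges.iter().any(|&(s, e)| s <= start && end <= e)
    }
}
```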
-
How's this going @kaymes? This is a great write-up and is 100% worth pursuing. If you'd like to open a draft PR, it could be worth getting some more eyes on it before you get a chance to tidy up. I must admit I get a bit impatient when seeking with librespot, and I'd be happy to poke my nose in here and try to help :)
-
It was going quite well until I got busy with other things and forgot about this. I just had a look at the state of the code, merged it with the current dev branch (it was a bit messy after all this time) and also managed to find the bug that was stopping the show. I'll put up a pull request now for others to test.
-
PR #393 implements items 1-4 of my list of ideas above (technically, we're not really doing 2; we're doing something better instead). That's as far as I see it as feasible at this stage. Please test it and provide feedback.
-
Users with a slower internet connection report long load times of 10 seconds or more until playback commences, whereas the official Spotify client does the job in less than a second.
In PR #297 (Improve playback speed), I mentioned that I would like to further speed up the start of playback in librespot to make it comparable to the official client.
I'm opening this issue to discuss ideas and possibilities for speeding this up, and also to ask some questions about the parts of the librespot code I would like to mess around with.
Background information:
I should probably explain a little about how librespot opens a stream.
On the lowest level, the stream gets divided into chunks of 128 KB. Whenever the vorbis player attempts a read access, the required chunk is downloaded. If no chunk is currently being requested, the next missing chunk after the last one that was read from is downloaded. It is only possible to read from chunks that have been downloaded in full.
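In pseudocode-ish Rust (simplified, with hypothetical helpers rather than the real fetch.rs API), the mapping from a read to a chunk looks roughly like this:

```rust
const CHUNK_SIZE: usize = 128 * 1024; // the fixed chunk size described above

/// Which chunk must be fully downloaded before a read at `position` can be served.
fn chunk_index(position: usize) -> usize {
    position / CHUNK_SIZE
}

/// How many bytes of a read starting at `position` can come from that chunk
/// alone; reads are truncated at the chunk boundary.
fn readable_len(position: usize, requested: usize) -> usize {
    let offset_in_chunk = position % CHUNK_SIZE;
    requested.min(CHUNK_SIZE - offset_in_chunk)
}
```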
When doing a seek() over the audio stream, the vorbis code does a bisect. That is, it checks the timestamp at the middle of the seek interval and then decides which half of the interval to search next, until it finds the correct location.
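Schematically (a hypothetical helper just to make the access pattern explicit, not the actual lewton/libogg code):

```rust
/// Probe the timestamp in the middle of the byte interval and keep the half
/// that still contains the target, until the interval is small enough to scan
/// linearly. Every probe can force another chunk download.
fn bisect_seek<F>(mut lo: u64, mut hi: u64, target_ts: f64, granularity: u64, timestamp_at: F) -> u64
where
    F: Fn(u64) -> f64,
{
    while hi - lo > granularity {
        let mid = lo + (hi - lo) / 2;
        if timestamp_at(mid) <= target_ts {
            lo = mid; // the target lies in the upper half
        } else {
            hi = mid; // the target lies in the lower half
        }
    }
    lo
}
```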
Note: This division into 128 KB chunks is librespot's decision. The protocol allows arbitrary chunk sizes to be downloaded. As an experiment, I lowered the chunk size to 10 KB and the load time was almost halved. However, I fear such small chunks can easily lead to throughput problems. And there's also the possibility that Spotify starts to hate us for hammering their servers with too many small requests.
Ideas:
I have some ideas for how to speed things up. I'm not necessarily planning to pursue all of them, but I reckon ideas 1) and 2) are pretty much the first step: I think they have the biggest impact and are pretty much a prerequisite for the others.
1) The first and basic idea for all that follows is that the fixed chunk size should be done away with. Instead, blocks of varying size should be downloaded. This could be implemented by making the internal chunk size very small (e.g. 1 KB) and then downloading a range of chunks when requesting data from the server. This allows us to implement more flexible download strategies.
2) One idea would be to start with a small number of chunks when beginning to read the file and then incrementally increase the number of chunks read per request (see the sketch after this list). This caters for quick reads during seek() and at the start of playback, whereas the bulk of the file is downloaded with larger requests that are better for throughput.
3) Partially downloaded requests could be made available. If we download a range of chunks, we could make the partial range that has been received available for reading while the rest is still trickling through the line.
4) The download heuristic could be made aware of whether an access is due to a seek or due to playback, and optimise its strategy accordingly. E.g. during a seek it doesn't make much sense to read ahead to the next chunk because we're jumping around in a random-access pattern, whereas in playback mode read-ahead is awesome.
5) Vorbis seek could be improved. Instead of doing a blind bisect, we could try to predict where the correct timestamp is. If Spotify songs are encoded with a constant bitrate (are they?), it should be possible to predict a timestamp's location in the file reasonably well and jump directly to the position we're interested in (or at least to somewhere close by, and then fine-tune with only a few further reads).
6) Vorbis' seek could be made aware of which chunks have been downloaded already. The seek() could then be optimised to first exploit data that has already been downloaded before requesting new chunks. This could be useful together with idea 5) if the jump prediction isn't too accurate.
7) If I saw it correctly, the connection to the Spotify server is some sort of multiplexing of multiple requests. Thus, we could download multiple chunks in parallel. This might be an option when we have sufficient bandwidth but long ping times.
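To make idea 2) concrete, here is a minimal sketch of such a ramp-up heuristic (all numbers, names, and thresholds are made up for illustration; this is not code from the PR):

```rust
/// Start with small requests so the first bytes arrive quickly, then double
/// the request size up to a cap that is good for sustained throughput.
struct RequestSizer {
    next_len: usize,
}

impl RequestSizer {
    const INITIAL: usize = 16 * 1024; // small first request for low latency
    const MAX: usize = 512 * 1024;    // large steady-state requests for throughput

    fn new() -> Self {
        RequestSizer { next_len: Self::INITIAL }
    }

    /// Length of the next range to request; grows geometrically.
    fn next_request_len(&mut self) -> usize {
        let len = self.next_len;
        self.next_len = (self.next_len * 2).min(Self::MAX);
        len
    }

    /// A seek jumps to a cold part of the file, so start small again.
    fn reset_after_seek(&mut self) {
        self.next_len = Self::INITIAL;
    }
}
```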
Experiments:
I made a special build of librespot with additional log messages to see what is happening during the load phase. I manually removed log lines to keep only the ones that show how the file is loaded and accessed. The log messages are timestamped by piping them through "ts".
The test below was done on an ADSL line (18396 Kbps downlink) in Sydney, Australia, and it took 11 seconds to load the track. On the same connection, the official client typically starts playback for similar tasks in less than a second; rarely does it take 1-2 seconds. (Times for the official client are estimated.)
The song I'm loading here is 5:14 minutes long and the standard 160K quality is used. About the first 10-11 seconds of the song were played on my phone before librespot was selected as the device, so librespot needs to search for the correct starting point before playback. (The optimisation from PR #297, which avoids seek() when playing from the beginning, doesn't apply here.)
The song is split into 48 of the 128 KB chunks, numbered from 0 to 47.
What can be seen is that it takes about 1.6 seconds to download a chunk. Seven chunks are downloaded, which is 896 KB of data, before playback can commence. And pretty much all of that time goes into downloading those chunks.
The "requesting chunk" messages are created at the first line of fetch.rs::request_chunk().
The "chunk...complete" messages are created by fetch.rs::AudioFileFetch::poll() just before the chunk is marked as complete in the bitmap.
The "reading at" lines are created by AudioFileStreaming::read() before accessing the bitmap. The position is the absolute position in the file, the chunk is the chunk number the position is in, the offset is the position inside the chunk and length is the value of the variable "len" (=requested length to read but potentially truncated at chunk boundaries).
What can be seen is that at first the header of the file is examined (a bunch of reads in chunk 0). Then the vorbis seek starts its bisect and reads in blocks 46, 23, 11, 5, 2, 1, 0, 1 until it finds what it is looking for.
It can also be seen that the code requests follow-up blocks; however, their download seems to get cancelled when another block is requested. I don't know what the performance penalty for requesting and cancelling a block is.
I also repeated this test for a build with a block size of 10 KB. A chunk downloads in about 0.3 seconds and the total time to load the file was about 6 seconds. I'm not showing the detailed log here since I don't think it adds much value.