CPU bottleneck when downloading at very high speeds #620
Comments
Very interesting write-up and investigation, personally I will go over it again out of pure interest. But in what way is this a real-world example? Why are you using legendary in what is essentially a datacenter environment? I am not trying to dismiss it or anything. I am just amazed.
That is my home computer. My Internet provider is Init7. This setup is not that unusual. Not common of course, but there are now a number of other places where you can get 10 or 25Gbps, in Europe, the US and Asia. Also, this CPU bottleneck is easier to hit on CPUs with fewer cores. I will submit a small patch (removing unnecessary memoryview copies) that helps a little (850MB/s -> 920MB/s).
This increases peak download speed from about 850MB/s to 960MB/s on my computer. derrod#620
In addition to the patch you submitted, have you tried But this whole downloader is kind of held together by hopes and dreams and is not the most efficient, so to actually hit 10 Gbps maybe it's time to Rewrite it in Rust™️ 😛
I tried |
Platform
Operating system and version: Linux 6.6, amd64
Legendary version (`legendary -V`):
First world problem, but I noticed that there is a CPU bottleneck when downloading big games on a very fast network connection.
Games can be big, and are often the biggest single download one would make.
Example here, downloading 90GB of Alan Wake 2.
My educated guess is that the issue is in OpenSSL. More on that later.
I cannot get more than about 850MB/s download speed,
while my 16-core Threadripper 2950X CPU struggles.
I do think this is because of OpenSSL, because I notice a pattern that I know OpenSSL follows when handling TLS reads.
It does a 5-byte TLS header read, then reads the number of bytes that the TLS header specifies.
That is not optimal, and it causes a lot of syscalls.
A more efficient way would be to read something like up to 128kB from the socket at once, then hand it to OpenSSL as data.
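To make the syscall difference concrete, here is a rough sketch that counts `recv()` calls for both read patterns. It does not use real TLS or OpenSSL: it sends fake records with a 5-byte header over a local `socket.socketpair()`, and every function name in it (`make_records`, `read_per_record`, `read_buffered`, `compare`) is made up for illustration.

```python
import socket
import struct
import threading

# Fake TLS-style record header: content type (1 byte), version (2), length (2).
HEADER = struct.Struct("!BHH")


def make_records(count, payload_len):
    """Build `count` fake records, each a 5-byte header plus a zero payload."""
    payload = bytes(payload_len)
    record = HEADER.pack(23, 0x0303, payload_len) + payload
    return record * count


def recv_exact(sock, n, calls):
    """Read exactly n bytes, counting every recv() call in calls[0]."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        calls[0] += 1
        if not chunk:
            raise EOFError("peer closed early")
        buf += chunk
    return buf


def read_per_record(sock, count):
    """Header read, then body read, for every record (the pattern described above)."""
    calls, total = [0], 0
    for _ in range(count):
        _, _, length = HEADER.unpack(recv_exact(sock, HEADER.size, calls))
        total += len(recv_exact(sock, length, calls))
    return total, calls[0]


def read_buffered(sock, count, chunk_size=128 * 1024):
    """Read up to chunk_size bytes per recv() and split records in user space."""
    calls, total, seen = 0, 0, 0
    buf = bytearray()
    while seen < count:
        data = sock.recv(chunk_size)
        calls += 1
        if not data:
            raise EOFError("peer closed early")
        buf += data
        pos = 0
        while len(buf) - pos >= HEADER.size:
            _, _, length = HEADER.unpack_from(buf, pos)
            if len(buf) - pos - HEADER.size < length:
                break  # incomplete record, wait for more data
            total += length
            pos += HEADER.size + length
            seen += 1
        del buf[:pos]  # drop the records that were consumed
    return total, calls


def compare(count=256, payload_len=1024):
    """Return [(payload_bytes, recv_calls)] for per-record and buffered reads."""
    results = []
    for reader in (read_per_record, read_buffered):
        a, b = socket.socketpair()
        with a, b:
            data = make_records(count, payload_len)
            writer = threading.Thread(target=a.sendall, args=(data,))
            writer.start()
            results.append(reader(b, count))
            writer.join()
    return results


if __name__ == "__main__":
    for name, (total, calls) in zip(("per-record", "buffered"), compare()):
        print(f"{name}: {total} payload bytes in {calls} recv() calls")
```

The per-record reader needs at least two `recv()` calls per record, while the buffered reader needs one call per chunk that the kernel hands over. With real TLS the decryption cost stays the same; only the number of syscalls changes.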
The per-record read pattern results in the CPU usage shown in this screenshot:
(Please ignore gnome-shell CPU usage, this is because I was taking a screenshot, and that causes a rise in CPU usage. During the actual download, gnome-shell sits idle.)
Disk IO performance is not a major problem. There are probably also various inefficiencies in how data is moved between the workers and the main thread, and in how things are done, but those are a secondary concern.
All tests were performed on tmpfs (256GB DDR4 RAM). A tmpfs write using `dd` with 128KiB blocks shows 3.4GB/s, and with 4MiB blocks 3.8GB/s. I cannot get more, not because of tmpfs or memory speed limitations, but because `dd` starts using 100% CPU for these 4MiB reads and writes.
I know you are using Python and the `requests` library (which are not the most efficient), but the ultimate issue is mostly in OpenSSL, from my experience.
I also noticed some suspicious code in the file worker:
This `tobytes()`, while not obviously documented, can and often will copy data, which is an unnecessary data copy in this case. Proof:
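The snippets originally posted here are not reproduced, so the following is only a minimal sketch of the same point, with made-up helper names (`tobytes_is_a_copy`, `write_view_directly`) and no connection to legendary's actual worker code. It shows that `memoryview.tobytes()` returns an independent `bytes` object, and that a binary file's `write()` takes a `memoryview` slice as-is.

```python
import tempfile


def tobytes_is_a_copy(size=1024):
    """Mutate the backing buffer after tobytes() and compare what each side sees."""
    backing = bytearray(size)
    view = memoryview(backing)
    snapshot = view.tobytes()  # allocates a new bytes object
    backing[0] = 0xFF          # visible through the view, not through the snapshot
    return view[0], snapshot[0]


def write_view_directly(data, offset, length):
    """Write a slice of `data` to a temporary file without converting it to bytes."""
    view = memoryview(data)[offset:offset + length]  # slicing a memoryview does not copy
    with tempfile.TemporaryFile() as f:
        written = f.write(view)  # binary files accept any bytes-like object
        f.seek(0)
        return written, f.read()


if __name__ == "__main__":
    print(tobytes_is_a_copy())
    print(write_view_directly(bytearray(b"0123456789"), 2, 4))
```

If the view and the snapshot shared memory, both values returned by `tobytes_is_a_copy()` would be the same; they differ because the snapshot was copied before the mutation.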
But `write` (on essentially all types of files) is perfectly happy to accept a memoryview, so there is no need to give it bytes.
Some machine specs:
on the host (AMD Ryzen Threadripper 2950X 16-Core Processor) with heroic / legendary,
on the router (AMD Ryzen 7 3700X 8-Core Processor).
FS S5860-20SQ Ethernet switch.