RawTask memory leak #636

Open
greatest-ape opened this issue Jan 6, 2024 · 5 comments

greatest-ape commented Jan 6, 2024

I've received a report of memory leaks in aquatic_ws. When running the application under heaptrack, it appears that the tasks spawned to handle connections are in fact never freed. After looking over my application code, I simply can't find a logic bug that would cause the tasks not to finish in the tested scenario. More memory is leaked when more connections are created, so this likely isn't explained by fixed overhead.

Connection tasks are spawned and detached in https://github.com/greatest-ape/aquatic/blob/2d959ee8fc4e8acdd4e345917a7d4770c6977b6f/crates/ws/src/workers/socket/mod.rs#L162. The tasks then spawn and await tasks of their own in https://github.com/greatest-ape/aquatic/blob/2d959ee8fc4e8acdd4e345917a7d4770c6977b6f/crates/ws/src/workers/socket/connection.rs#L197.

I've attached a screenshot of the heaptrack flamegraph for leaked memory. I can send the full heaptrack file (180 MB) if you're interested.

[Screenshot: heaptrack flamegraph of leaked memory (aquatic_ws_possible_mem_leak)]
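
For orientation, here is a minimal sketch of the spawning pattern described above, assuming glommio's `spawn_local`, `Task::detach`, and `glommio::net::TcpListener`. The names and bodies are hypothetical; the real code is in the linked aquatic source files.

```rust
// Hypothetical sketch of the pattern described above; not the actual
// aquatic_ws code.
async fn accept_loop(listener: glommio::net::TcpListener) {
    while let Ok(stream) = listener.accept().await {
        // One task per connection, detached so no handle is kept around.
        // The task is expected to free itself once the connection closes.
        glommio::spawn_local(handle_connection(stream)).detach();
    }
}

async fn handle_connection(stream: glommio::net::TcpStream) {
    // Serves the connection (in aquatic_ws this spawns and awaits reader
    // and writer tasks of its own), then runs cleanup and returns.
    let _ = stream;
}
```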

glommer (Collaborator) commented Jan 6, 2024

Definitely sounds like a leak.

A couple of questions: you don't keep a reference to the connection anywhere, do you? Could it be that this is just new connections being opened?

Assuming it is not, I wonder if there's a bug triggered by a race, where the future that is not ready is not canceled properly at the runtime level. Would you be able to provide a minimal reproducer?

greatest-ape (Author) commented Jan 6, 2024

I ran the application with heaptrack locally and waited for around a minute after quitting the load-test program that creates connections before quitting the tracker, so I have a hard time seeing how this could be due to new connections. If it’s about data still in kernel buffers or similar, the task should finish anyway once it notices that the client socket is closed. I can also confirm that the leak occurs even when the connection count reaches zero, i.e., the connection cleanup code is called as many times as connections are opened. In other words, the race future returns, but there is still task state that is not cleaned up, so it might very well be a bug related to that. Could it be related to the tasks being in different priority queues?

I do store task references in a sense, just not the TaskHandle. Since I want to do cleanup after connections are closed, I don’t want to force-quit the whole task, but send a message to it instead, so storing the TaskHandle doesn’t add much with the current design.

I’m unsure if I’ll be able to provide a minimal reproducer.
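
As a rough illustration of the message-based shutdown described two paragraphs up: the connection task races its normal work against a shutdown receiver and runs its own cleanup before returning, so no handle is needed to cancel it from outside. This is a hypothetical sketch assuming `futures_lite::future::race` and a `futures::channel::oneshot` channel, not the actual aquatic_ws code.

```rust
use futures::channel::oneshot;
use futures_lite::future::race;

// Hypothetical sketch, not the actual aquatic_ws code: instead of
// force-cancelling the task via its handle, a shutdown message is sent
// to it and the task does its own cleanup before finishing.
async fn run_connection(shutdown: oneshot::Receiver<()>) {
    race(
        async {
            // ... serve the connection until the peer disconnects ...
        },
        async {
            // Resolves when a shutdown message arrives (or the sender
            // is dropped).
            let _ = shutdown.await;
        },
    )
    .await;

    // Cleanup always runs here, regardless of which branch won the race.
    // ... remove the connection from shared state, update counters ...
}

// Usage: keep the sender instead of the task handle.
// let (tx, rx) = oneshot::channel();
// glommio::spawn_local(run_connection(rx)).detach();
// ... later, to shut the connection down: let _ = tx.send(());
```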

greatest-ape (Author)

The leak is “fixed” by awaiting race on plain async blocks instead of on spawned tasks: greatest-ape/aquatic@188da13

FYI running the tasks on the same queue didn’t fix the issue.
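
For illustration, a rough before/after sketch of that change, with hypothetical names and assuming glommio's `spawn_local` and `futures_lite::future::race` (the real change is in the commit linked above):

```rust
use futures_lite::future::race;

// Before: the reader and writer halves ran as separate spawned tasks and
// the parent raced their handles. Per the heaptrack data, RawTask
// allocations for these were never freed.
async fn handle_connection_before() {
    let reader = glommio::spawn_local(async { /* read loop */ });
    let writer = glommio::spawn_local(async { /* write loop */ });
    race(reader, writer).await;
}

// After: the same logic runs as plain async blocks inside the connection
// task itself, so no extra tasks are spawned and nothing is left behind.
async fn handle_connection_after() {
    let reader = async { /* read loop */ };
    let writer = async { /* write loop */ };
    race(reader, writer).await;
}
```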

vlovich (Contributor) commented Mar 13, 2024

@greatest-ape is this related to #448?

ozgrakkurt

Seems like this might be related? #671
