Stream concurrency for strfry router #127

Open
braydonf opened this issue Oct 10, 2024 · 2 comments

Comments

@braydonf
Contributor

I've been reading through the source of src/apps/mesh/cmd_router.cpp, and it appears to run entirely in a single thread with an event loop. I believe this can be an issue when running with hundreds of streams, and I would like to add some concurrency to the router.

Other sources, such as src/apps/relay/cmd_relay.cpp for strfry relay, use multiple ThreadPool instances to support concurrency.

Is this a good idea?

@hoytech
Owner

hoytech commented Oct 18, 2024

Good observation! Yes, you are correct that the router's websocket communication is done in a single thread. It isn't quite as sophisticated as the normal relay. However, note that it does offload some especially CPU-intensive tasks, like signature verification, to other threads, such as the WriterPipeline's validator thread.
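To illustrate the general pattern of a single I/O thread handing expensive work to a separate thread, here is a minimal sketch. It is not strfry's WriterPipeline: `ValidatorWorker` and `expensiveCheck` are hypothetical names, and the "check" is a placeholder for something like signature verification.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive check such as signature verification.
bool expensiveCheck(const std::string &msg) { return !msg.empty(); }

// Hypothetical worker: one background thread drains a FIFO queue so the
// submitting (I/O) thread never runs the expensive check itself.
class ValidatorWorker {
  public:
    ValidatorWorker() : worker([this] { run(); }) {}
    ~ValidatorWorker() { finish(); }

    // Called from the I/O thread: enqueue and return immediately.
    void submit(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            inbox.push(std::move(msg));
        }
        cv.notify_one();
    }

    // Stop accepting work, wait for the queue to drain, return accepted messages.
    std::vector<std::string> finish() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            done = true;
        }
        cv.notify_one();
        if (worker.joinable()) worker.join();
        return accepted;
    }

  private:
    void run() {
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            cv.wait(lock, [this] { return done || !inbox.empty(); });
            if (inbox.empty()) return; // done was set and the queue is drained
            std::string msg = std::move(inbox.front());
            inbox.pop();
            lock.unlock(); // do the expensive part without holding the lock
            bool ok = expensiveCheck(msg);
            lock.lock();
            if (ok) accepted.push_back(std::move(msg));
        }
    }

    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::string> inbox;
    std::vector<std::string> accepted;
    bool done = false;
    std::thread worker; // declared last so the other members exist before it starts
};
```

With this shape, the event-loop thread only pays for a queue push per message; the cost of the check lands on the worker thread.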

TBH I would not attempt to modify this architecture unless it obviously becomes a bottleneck. Note that another workaround is to run multiple router instances. You can have 2+ router processes running at the same time, each with its own set of streaming URLs, but all pointed at the same underlying DB. This is a manual way to do horizontal scaling if you have a very large number of streaming sources.

@braydonf
Contributor Author

I'm not entirely convinced it's a bottleneck yet. However, it appears that the need for the increased timeout may be due to it, as all the streams need to be added from the config before any of them can connect. Perhaps a break in the loop that adds streams from the config, so that it's non-blocking, could give other async events an opportunity to be processed?
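A rough sketch of what I mean by a break in the loop, using a toy event loop rather than the router's real one. `ToyLoop` and `addStreamsInBatches` are made-up names for illustration, and "adding" a stream here just appends to a log:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded event loop: a FIFO of callbacks. It only stands in
// for a real loop so the scheduling order can be observed.
struct ToyLoop {
    std::deque<std::function<void()>> tasks;

    void post(std::function<void()> fn) { tasks.push_back(std::move(fn)); }

    void run() {
        while (!tasks.empty()) {
            auto fn = std::move(tasks.front());
            tasks.pop_front();
            fn();
        }
    }
};

// Hypothetical helper: handles at most batchSize streams, then re-posts
// itself, so callbacks already queued on the loop run between batches.
void addStreamsInBatches(ToyLoop &loop, const std::vector<std::string> &urls,
                         std::size_t start, std::size_t batchSize,
                         std::vector<std::string> &log) {
    if (batchSize == 0) batchSize = 1; // avoid re-posting forever
    std::size_t end = std::min(urls.size(), start + batchSize);
    for (std::size_t i = start; i < end; i++) log.push_back("add " + urls[i]);
    if (end < urls.size()) {
        loop.post([&loop, &urls, end, batchSize, &log] {
            addStreamsInBatches(loop, urls, end, batchSize, log);
        });
    }
}
```

If the first batch and some other callback are both posted before `run()`, the other callback executes after the first batch instead of waiting for every stream to be added.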

I haven't closely monitored a long-lived process yet. It may not be a bottleneck once the streams are connected, since events are not frequent enough to block much. The very long timeout, though, might mean that a stream could go down for many minutes without reconnecting quickly, and miss some events. This isn't an issue as more relays support syncing with negentropy, because strfry sync can catch those events.

I've attempted a workaround using many separate strfry router processes, but I ran into a "too many open files" error. I'm assuming this comes from LMDB. Perhaps it could be worked around by raising the maximum number of open files in the OS? I think I was trying to start about 190 routers, one for each pubkey that I follow, each with multiple relays in one stream.
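If the error does come from the per-process file descriptor limit (an assumption on my part, I haven't confirmed the cause), the soft limit can be inspected and raised with the POSIX `getrlimit`/`setrlimit` calls. A minimal sketch, assuming Linux; `raiseOpenFileLimit` is a made-up helper name, not something in strfry:

```cpp
#include <sys/resource.h>

#include <cassert>

// Hypothetical helper: raise this process's soft RLIMIT_NOFILE up to its
// hard limit. Returns the resulting soft limit, or 0 if a call failed.
rlim_t raiseOpenFileLimit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    if (lim.rlim_cur < lim.rlim_max) {
        lim.rlim_cur = lim.rlim_max; // an unprivileged process may go up to the hard limit
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    }
    return lim.rlim_cur;
}
```

This only affects the calling process and its children. Raising the hard limit itself, or any system-wide cap, would need to be done in the OS configuration.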


2 participants