
run multiple server threads #10701

Draft
wants to merge 2 commits into base: master
Conversation

chris48s (Member) commented Nov 23, 2024

This PR may or may not be a good idea, but I'm throwing what I've done out here - partly for feedback, and partly because it might suddenly become a much better idea on Monday. I'll start by explaining why I did this:

Firstly
- Fly are bringing in throttling for applications using shared CPUs on Monday: https://community.fly.io/t/predictable-processor-performance/22259
- We are consistently running over the limit that is going to be imposed.
- I don't really have a clear view on how bad the performance impact of this will be. We'll find out on Monday. It may or may not be a big deal.

Secondly
- If we need more CPU resources on Fly, the cheapest way to get them is to switch from a shared-cpu-1x instance type (which gives you one shared vCPU) to a shared-cpu-2x (which gives you two shared vCPUs).
- A shared-cpu-1x with 512MB RAM is $3.19/month. These are what we currently run in production.
- A shared-cpu-2x with 512MB RAM is $3.89/month.
- If the only thing we need is access to more CPU time (and not more memory), that is way cheaper than running another instance to gain another CPU core.

Thirdly
- Shields is a node app, and currently it runs single-threaded.
- This is great if you want to scale out horizontally over lots of small machines.
- However, if we wanted to run on a shared-cpu-2x, our current architecture doesn't let us take advantage of the second core/vCPU on the same server.

So.. tying all those threads together: depending on how big the impact of being throttled is, it may become advantageous for us to run on a shared-cpu-2x instance type and take advantage of the second vCPU. This PR is a first bash at doing that.
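For context, this is roughly the shape of the approach, using Node's built-in cluster module (a minimal sketch with illustrative names and port, not the exact code in this PR):

```js
import cluster from 'cluster'
import http from 'http'
import os from 'os'

const numCPUs = os.cpus().length

if (cluster.isPrimary) {
  // The primary process forks one worker per vCPU. The workers all
  // share the same listening socket, so incoming connections are
  // load-balanced between them.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork()
  }
  cluster.on('exit', worker => {
    console.log(`worker ${worker.process.pid} exited, forking a replacement`)
    cluster.fork()
  })
} else {
  // Each worker is a separate process with its own event loop, so the
  // OS can run the workers on different cores at the same time.
  http.createServer((req, res) => res.end('ok\n')).listen(8080)
}
```

Strictly speaking this is multi-process rather than multi-threaded: cluster forks whole Node processes, which is also why it adds memory overhead (see note 2 below).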


Some notes on this, having actually had a go at it:

  1. As this PR stands, the application will run in multi-threaded mode locally (assuming your local machine has >1 core). This does mean startup and hot reloads take a bit longer. At the same time, if we're running multi-threaded in production, it's probably sensible to run multi-threaded in dev. We could also consider making it possible to override numCPUs with a setting so you can run single-threaded in dev (see the sketch after this list), or maybe make that the default. The local ergonomics of this would be a useful thing to get some feedback on.
  2. It seems that even if numCPUs is only one, use of cluster increases our memory overhead. I tried spinning up a shared-cpu-1x with 256MB RAM (which is our standard review app config) and the app immediately ran out of memory and crashed, so a review app would need 512MB with this. I don't really know what this would look like serving production traffic. I think we'd have to try it and see.
  3. I've had to make some trade-offs with regard to testing. I will call these out in inline comments.
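For point 1, one possible shape for that override (a sketch; WEB_CONCURRENCY is an illustrative variable name, not a setting this PR currently reads):

```js
import os from 'os'

// Hypothetical override: WEB_CONCURRENCY=1 would force single-threaded
// mode in dev; when unset, default to one worker per available core.
const numCPUs =
  parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length
```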

chris48s added the core (Server, BaseService, GitHub auth, Shared helpers) label Nov 23, 2024
github-actions bot (Contributor) commented Nov 23, 2024

Messages
📖 ✨ Thanks for your contribution to Shields, @chris48s!

Generated by 🚫 dangerJS against 00c4033

github-actions bot (Contributor) commented

🚀 Updated review app: https://pr-10701-badges-shields.fly.dev

Comment on lines -11 to +12

```diff
- serverModule = await import('./server.js')
+ server = await startServer()
```
chris48s (Member, Author) commented:

The core problem I ran into here is that cluster.fork() spawns a new worker process that executes the same script as the primary process. That didn't have the desired effect under test, because under test the primary process is the test runner.
I worked around this by:
a) exporting a function we can test from server.js
b) only starting in multi-threaded mode if we are not under test
This means the actual multi-threaded server startup is not under test, but I couldn't find any way around this. I think it is a trade-off we would have to accept if we go down this road.
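A sketch of the resulting shape, with illustrative names (startServer stands in for the exported function mentioned above; the NODE_CONFIG_ENV check appears in the diff below):

```js
// server.js: export a function the tests can call in-process
export async function startServer() {
  // ...create the server, bind the port, return the instance...
}

// entrypoint: only fork workers when we are not under test
import cluster from 'cluster'
import os from 'os'
import { startServer } from './server.js'

if (process.env.NODE_CONFIG_ENV !== 'test' && cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }
} else {
  // Workers, and the test runner, start a single server directly.
  await startServer()
}
```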

```js
}
if (process.argv[3]) {
  config.public.bind.address = process.argv[3]
if (process.env.NODE_CONFIG_ENV !== 'test') {
```
chris48s (Member, Author) commented:

Switching from a dynamic await import to a module-level import also meant that, under test, this code was picking up the test runner's args from:

```
run: npm run test:entrypoint -- --reporter json --reporter-option 'output=reports/entrypoint.json'
```
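To make the clash concrete (argv values illustrative): with a module-level import, the positional-argument parsing runs inside the test runner's process, where process.argv holds mocha's flags rather than a port and bind address:

```js
// Run directly:
//   node server.js 8080 0.0.0.0
//   process.argv[3] === '0.0.0.0'  -> used as the bind address
//
// Imported by the test runner, process.argv is mocha's invocation,
// e.g. ['/usr/bin/node', '.../mocha', '--reporter', 'json', ...],
// so process.argv[3] would be a reporter flag, not an address.
console.log(process.argv.slice(2))
```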

chris48s changed the title from "WIP run multiple server threads" to "run multiple server threads" on Nov 23, 2024
chris48s (Member, Author) commented Nov 26, 2024

This might not actually be a good idea. Reposting from the Discord thread:

Node is single-threaded, so my assumption was that if we deployed on a machine with 2 vCPUs, we would see one vCPU doing all the work and the other one sitting around doing virtually nothing (and we would need to do something like I've proposed in this PR to actually utilize 2 cores). It turns out this is wrong. With no code changes, we actually balance across 2 cores reasonably well.

[Screenshot_at_2024-11-26_19-40-35: CPU usage spread across both vCPUs]

Having done some reading and experimenting, it turns out the OS scheduler can distribute the work of a single-threaded application across multiple vCPUs. Even though Node itself uses a single thread for the main event loop, that thread can be moved between the available vCPUs by the scheduler.

My current thinking is that doing something like this might actually be counter-productive. What we're aiming for isn't to max out 2 cores; it is to spread our CPU usage sparingly across 2 cores (each of which has a restrictive usage quota). So what we're doing now (running single-threaded on a shared-cpu-2x) might be a better way to achieve that.
