Swarm balancing logic issues #389
Comments
Hi @fadenb, what you're saying is 100% reasonable - we just didn't have time to do that, since it would require additional complexity on the server side. If you can help with this feature, let us know - we'd be happy to have such a pull request.
Mine is doing the same thing:

`Jul 24 18:26:43 danserver petals[1297]: Jul 24 18:26:43.749 [INFO] Swarm balance quality: 62.8%`

I'm working around this with the arguments `--block_indices 28:60 --balance_quality 0.0`.

@borzunov, while creating tests to help build a better algorithm for choosing other blocks: are there any examples in tests/ of setting up several mock CPU servers that can talk to each other in a test swarm, with mock blocks? And should the method that chooses blocks always return sequential blocks?
Hi @iateadonut, yes, a server should host a set of sequential blocks. Re mock CPU servers, you can create a private swarm with a really small model like BLOOM-560m.
Is dht_utils.get_remote_module_infos() supposed to return information only about remote servers? When running several CPU servers on my localhost, it also returns my own server's information. I ask because block_selection._choose_best_start and block_selection.should_choose_other_blocks use throughputs derived from get_remote_module_infos(), and if get_remote_module_infos() returns a throughput that includes the server's own blocks, there are bound to be some problems.

Second, I'm writing unit tests for some of the block selection functions, including _choose_best_start and should_choose_other_blocks. I did not see either of those in the test suite and will add more as necessary while I work to figure this out.
Hi @iateadonut,
A good example of using this function is the source code of https://health.petals.dev - see the place where it is called.

Re tests for swarm balancing, they are indeed missing at the moment - I'd appreciate it if you added them in some form. Please note that our CI doesn't connect to the public swarm and instead launches a tiny isolated swarm with BLOOM-560m - you'd have to write your tests with this constraint in mind.
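As a starting point, a unit test of the start-selection heuristic might look roughly like this (a sketch only; it assumes `_choose_best_start` takes a per-block throughput array plus the number of blocks to host, and that the heuristic prefers the least-covered window):

```python
import numpy as np

from petals.server.block_selection import _choose_best_start


def test_choose_best_start_prefers_uncovered_blocks():
    # 24-block model: blocks 0-15 are well covered, blocks 16-23 have no coverage.
    throughputs = np.array([100.0] * 16 + [0.0] * 8)

    # A server able to host 8 blocks should pick the uncovered tail.
    assert _choose_best_start(throughputs, 8) == 16
```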
Thanks. Is there a method that gets only 'remote' module infos?
@iateadonut No, but you can filter out your local peer_id to keep only remote infos, like we do in
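In a test, that filtering could look roughly like this (a sketch; the helper name is hypothetical, and it assumes each module info exposes a `servers` mapping keyed by peer ID):

```python
def filter_remote_module_infos(module_infos, local_peer_id):
    """Drop the local server's entries so throughputs reflect only remote peers.

    Hypothetical helper: assumes each (non-None) module info exposes a
    ``servers`` mapping keyed by peer ID, as the DHT records do.
    """
    for info in module_infos:
        if info is None:
            continue
        info.servers = {
            peer_id: server
            for peer_id, server in info.servers.items()
            if peer_id != local_peer_id
        }
    return module_infos
```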
@fadenb @iateadonut For the record, another reason why downloading blocks is slow is that StableBeluga2 weights are distributed in float32 and Llama weights are distributed in float16, while we host them in 4-bit (nf4). This means that we download 8x/4x more data than necessary (same for disk space and disk-reading time). So an alternative is to implement functionality that allows downloading (or loading from disk) the model in nf4 right away. @mryab was working on this functionality for int8 in #273; we may need to revive this PR and prioritize this feature.
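To put rough numbers on the 8x/4x figure (back-of-the-envelope arithmetic only; the ~65B parameter count and ~0.5 bytes per parameter for nf4 are approximations):

```python
# The ratio only depends on bytes per parameter, not on the exact model size.
BYTES_FP32 = 4.0   # StableBeluga2 weights as distributed
BYTES_FP16 = 2.0   # Llama weights as distributed
BYTES_NF4 = 0.5    # roughly what the server keeps (4-bit weights + metadata)

print(f"float32 vs nf4: {BYTES_FP32 / BYTES_NF4:.0f}x more data")   # 8x
print(f"float16 vs nf4: {BYTES_FP16 / BYTES_NF4:.0f}x more data")   # 4x

# For a ~65B-parameter model that is ~260 GB in float32 vs roughly 32 GB in nf4.
params = 65e9
print(f"{params * BYTES_FP32 / 1e9:.0f} GB vs {params * BYTES_NF4 / 1e9:.0f} GB")
```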
I'm working now on creating a test for block_selection: the test above works, and for a simple mock of 2 servers both running blocks 1-16 of a 24-block model, it passes. I'm going to grab the current module_infos from the live server so I can mock its setup and see if I can find the problem.

Do you think we should move this block in https://github.com/bigscience-workshop/petals/blob/main/src/petals/server/block_selection.py to its own function (if necessary) for easier testing? If so, should it be called _new_throughput()?
|
@iateadonut Yes, you can extract it into a separate function if it's useful.
I have a set of module_infos that includes 80 block-server info dumps; it is used to mock this test: https://github.com/iateadonut/petals/blob/danO/tests/test_block_selection.py#L18

The throughput of the server looks like this, and the throughput of the server minus the local server looks like this. It yields Swarm balance quality: 47.7%, then it restarts the service (which holds 33 blocks) and starts again at the same place it started last time, at block 1.

I will do some more work on this this week. I wanted to share the throughput and modified throughput in case anything in them points to a solution I might not see so easily.
@borzunov Can you explain this: it looks like you're trying to check the new throughput on the swarm if the local server changes the blocks it serves AND all other servers change their blocks as well. Is that correct?

If so, I wonder whether this can work well in a live environment, where there are at least a few minutes between each time a given server runs should_choose_other_blocks. What do you think? Should we figure out a different way to find swarm balance quality? Any ideas?
@iateadonut, in this code, a server simulates what others would do if it moves. This is necessary so that we can know the final throughput it is possible to reach after moving. For example, imagine that we have 30 blocks and 3 servers hosting blocks 0:10. The total throughput is zero since nobody hosts blocks 20:30. If we only consider the throughput after the current server moves, then no server will ever move (since if anyone moves to 10:20, the total throughput will be still zero). So the servers simulate that if they move to 10:20, some other server is likely to move to 20:30, and we'll have non-zero throughput in the end. Then they can decide that moving is actually worth it. Please refer to a draft of our new paper to find details of how it works: https://openreview.net/pdf?id=HLQyRgRnoXo (pages 19-20, Appendices D-E) |
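To make the example concrete, here is a tiny standalone simulation of that reasoning (illustration only, not the production algorithm; throughput is modeled as the coverage of the least-covered block):

```python
def bottleneck_throughput(spans, total_blocks=30):
    """Swarm throughput is limited by the least-covered block."""
    coverage = [0.0] * total_blocks
    for start, end, throughput in spans:
        for block in range(start, end):
            coverage[block] += throughput
    return min(coverage)


# Three identical servers, all hosting blocks 0:10 of a 30-block model.
others = [(0, 10, 1.0), (0, 10, 1.0)]

print(bottleneck_throughput(others + [(0, 10, 1.0)]))   # 0.0 - blocks 10:30 uncovered
print(bottleneck_throughput(others + [(10, 20, 1.0)]))  # 0.0 - moving alone still leaves 20:30 empty

# Simulating that one of the other servers then moves to 20:30 as well:
print(bottleneck_throughput([(0, 10, 1.0), (10, 20, 1.0), (20, 30, 1.0)]))  # 1.0 - so moving is worth it
```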
I'm running some tests, and here's one thing I found. These logs come from should_choose_other_blocks, where it compares local_span.start == new_start at https://github.com/bigscience-workshop/petals/blame/063e94b4c8027e1e8d47061681007e9db292734f/src/petals/server/block_selection.py#L87, and they are just a few minutes apart.

I'm running more tests now so I can get timestamped module_infos logs to investigate further. My suspicion is that, when a single server decides to choose new blocks, by the time it actually does so, the best start block has changed. I'll be working to get real-time module_infos data to mock and test.
I think an easy way to solve this might be to recalculate throughputs 2 times after new_start = _choose_best_start(), in a loop that waits 1 minute between each calculation, and return False if new_start isn't the same after each calculation. I have a feeling there may be some problems with this, though: if the problem is two servers colliding, would each go through the process at the same time and end up with the same problem anyway? I'm testing this now on the live swarm to see if the bug crops up while running the server this way:
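Roughly along these lines (a sketch with hypothetical helper names, not the exact patch):

```python
import time

RECHECK_ATTEMPTS = 2
RECHECK_DELAY = 60  # seconds between DHT re-reads


def confirm_rebalance(local_start, num_blocks, get_throughputs, choose_best_start):
    """Agree to rebalance only if the suggested start stays stable across re-checks.

    ``get_throughputs`` is assumed to re-read the DHT and return a fresh
    per-block throughput array; ``choose_best_start`` mirrors
    block_selection._choose_best_start. Both are injected so the sketch can be
    unit-tested without a live swarm.
    """
    new_start = choose_best_start(get_throughputs(), num_blocks)
    if new_start == local_start:
        return False  # already in the best place, nothing to do

    for _ in range(RECHECK_ATTEMPTS):
        time.sleep(RECHECK_DELAY)
        if choose_best_start(get_throughputs(), num_blocks) != new_start:
            return False  # the target moved under us - skip this rebalance
    return True
```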
These are some logs I've taken from running the above within server.py ('-- start new_start'). You can see that it has been working well at preventing unnecessary restarts. The -- start new_start line in the logs comes from should_choose_other_blocks and shows the current start and the suggested new start.

It did fail a rebalancing here: it ended up rebalancing twice. I don't know why that happened, but otherwise this small change prevented unnecessary rebalancing at least 15 times over a few days. I'll continue to use this in the newest versions on my server and keep timestamped logs going forward.

I've created a pull request - let me know if there should be any changes or other ways to move forward.
Just updating with some more logs: `$ grep -E '--retry|choose_best' -B5 -A10 ./log-1693523246`
-- choose best blocks appears in the log and marks when choose_best_blocks is run, i.e. when the blocks are reloaded. As you can see, over 5 days of being continually online, this edit has stopped this server from unnecessarily reloading. The last time this happened was probably because the swarm balance had already improved.
Hey 👋,
I am opening this issue to discuss the current swarm balancing approach.
Recently I have seen that the public swarm hosting `enoch/llama-65b-hf` is unbalanced. This by itself is neither a surprise nor a problem: the issue is then remediated by servers loading other blocks. All good so far.
Today I noticed that my server is loading the same blocks it had before. As the loading process is quite slow (often around 10 minutes), this basically takes away the compute capacity of that server from the swarm for 10 minutes without providing any benefit.
A log excerpt might explain the situation better:
Notice that [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79] is loaded initially, and the exact same [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79] is loaded again to rebalance it.

While this is an extreme example of the problem, I have seen (more often) that parts of the block lists overlap. In such cases, the overlapping blocks are still loaded from scratch instead of being reused.
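To illustrate how little would actually need to be (re)loaded if the overlap were taken into account (a rough sketch, not Petals code):

```python
# Blocks hosted before rebalancing and blocks chosen afterwards (identical
# in the log excerpt above; the ranges are just for illustration).
old_blocks = set(range(60, 80))
new_blocks = set(range(60, 80))

reusable = old_blocks & new_blocks
to_load = new_blocks - old_blocks

print(f"{len(reusable)} blocks could stay in memory, {len(to_load)} actually need loading")
# -> "20 blocks could stay in memory, 0 actually need loading"
```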
Are there any obvious fixes for this behavior besides adjusting the `--balance_quality` setting or pinning blocks? Should we reorder the actions so that the new blocks are selected before the decision is made to unload the current ones?