-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid potential place where the world map could be modified after its iterator is created #8315
Conversation
…et that indicates the levels are being ticked.
Hello, any idea when this will be implemented? |
I don't know when it'll be implemented, whenever the Paper devs pick it up I suppose. It's not really an important fix though. All it does is avoid a potentially problematic situation where a world might be ticked after it is unloaded. Thanks to the Spigot "fix" mentioned above this PR might actually be redundant, though following #8300, I'm erring on the safe side of avoiding any potential unexpected problems. With or without this PR, the server will probably not crash. Since you're asking about it, I have to ask: did you encounter a concrete issue this PR would fix? It'd be nice to be able to point at a real situation this PR could measurably improve. |
I own a plugin by the name of CyberWorldReset, which resets worlds via command or scheduler/timer with a bunch of preset values. I recently had reports of an issue where the server would freeze/crash after a thread dump. This occurred when the world was created. Being a somewhat responsible dev, I decided to research this issue. After a few Google searches, I ran into Multiverse/Multiverse-Core#2800. It just so happened that you posted and referred to this PR. Since then, I have been waiting for this PR to be merged. Am I in the right place? |
The linked thread post is 100% irrelevant to this, the linked post is down to the server taking too long to tick due to loading a world due to sync thread loads which creating a world creates if you don't mitigate that; This PR solves an individual concern, out of the many dozen, in regards to unloading worlds mid-tick |
Hey, thanks for the reply. What should I do in this scenario? I really want to fix this issue. |
This doesn't prevent any CME, because the CME is already catch by the flag, this exception occur when you modify and iterate the world list in the same time. But in this case, even if a plugin load/unload a world into the process queue, there are no way to iterate while modify the list because the field is local and that the process queue is sync to main thread. In short the iteration of the list is already safe (see iterator#next/time loop) and happens later into the server tick. But it prevent well the unsafe unload which is a good point but not really related to the issue. |
If the server isn't crashing (it's still running once the error messages go away) then that is a different "problem". Creating a world can sometimes just be expensive and take a while. #8177 might be the PR you want to watch. I think the third point in the TODO list is most helpful. It sounds like there are plans to make chuck generation asynchronous to free the main thread of that burden, which should help world creation not stall the main thread. Right now, you can try increasing the settings.timeout-time value in your spigot.yml. From your log, it looks like it took 22 seconds to load the world, so if you increase timeout-time to more than that, you can at least avoid the watchdog errors. This won't make the world load any faster, but it should remove the error messages. |
I think I understand what you mean and I disagree, but correct me if what I'm about to say actually agrees with you. Multiverse/Multiverse-Core#2800 is an example that shows that a world unload in the process queue did cause a CME on older versions of Paper (1.19 build 91 at least). Something in their code triggered
which is because the levels LinkedHashMap was modified after it's iterator was created (it didn't matter that this was the first call: the iterator was created, the original map was modified, then the iterator was used, so it throws a CME). In the latest Paper build, this CME dose not happen because of Spigot's fix. The Even though the server dose not crash, #8300 established that Paper is wary of accepting Spigot's fix and just allowing a world to be ticked after it is unloaded. This PR should make it impossible for a plugin to unload a world after the iterator is created since it makes it so there is no time where the iterator is created and the flag is not set. Bringing it back to your comment, the flag dose not catch this CME, the Spigot fix dose. However, according to Paper this still doesn't make the iterator over the levels map safe, as they're not just going to use the Spigot fix for the CME. |
After relooking at this nevermind you've right the CME was masked by the spigot fix during my test, and the CME can occur when i have tried in the PlayerChatEvent. But this is still a niche case bc this event is deprecated (for it's main thread use) and it's strange to load/unload a world in this event |
What's the consensus on the necessity of this PR? I wouldn't guess pulling this is a question of whether or not Paper wants to align with Spigot on the issue of #8300 since either way this PR is equally useful/useless. Currently, Paper is not aligned, so this PR helps avoid the 'dangerous' situation of a world being ticked after its resources are closed by something in the process queue. What I got from #8300 is that Paper wants to be cautious here, and this PR helps Paper be cautious. If Paper (eventually) decides to align with Spigot, it would still be as easy as removing the single patch file. In the meantime, this PR would still help Paper be cautious while they figure out whether or not to align. Speaking of Paper deciding to align with Spigot, what would it take to address the concerns of this comment and finalize the decision there? Is it even possible to definitively prove loading/unloading worlds while they are being ticked safe or unsafe? The only way I can think of proving that is to test every conceivable situation or read every line of related code. As this comment describes, that probably isn't possible, and there's no guarantee that behavior would continue to hold in the future. Coming back to this issue, the logic for pulling this PR now assumes this PR is useful. As Lulu13022002 said, it might be strange to load/unload a world during the PlayerChatEvent or equivalent. I disagree, using Multiverse/Multiverse-Core#2800 as a counterexample to that. I think a good chat-based guided command system can be very intuitive to use, and it's reasonable that a plugin developer might try to create such a system to manage and unload worlds. As this comment mentions, other events could probably also be used to cause this dangerous state, and making this change would document that for plugin developers (I still think updating the documentation of Bukkit#createWorld and Bukkit#unloadWorld might be useful for plugin developers as well). What do other people think? Is this PR too niche to be pulled as Lulu13022002 suggests, or does it help further the cause of #8300 by being cautious with allowing worlds to tick after having their resources shut down? How does resolving Spigot alignment on this issue continue? Can a decision be made now or is there a concrete path forwards to make progress with a final decision? |
having had a bit of a deeper look, there is some good news here in that mojangs mailbox mess has allowed them to isolate the work between worlds, which means that there is much less concerns around unloading worlds mid tick as the chunk system is more contained as they can just shutdown the isolated mailbox rather than dealing with the global work queues we had to deal with; the fact that stuff is flushed out, things are unloaded, etc, appears to help mitigate some other further concerns having tested for a few minutes having disabled the check, even without these set of changes, it seems to cope; I'd imagine in part mostly for the improvements mojang has made over the past few dozen releases in isolating this mess. I think in part we should expose the iterating value and let plugin authors decide what to do, there's 0 promise ever that this will be safe, I feel iffy on this given that these types of issues will end up at our door before the plugin devs un/loading worlds in precarious places, which is ofc the biggest headache in this type of stuff |
The code in net.minecraft.server.MinecraftServer previously looked like this:
The flag added by Paper,
isIteratingOverLevels
, is used to preventCraftServer#createWorld
andCraftServer#unloadWorld
from adding or removing from the levels map while the levels map is being iterated through. However, in between the Iterator being created and the flag being set, the process queue and time updates are processed. The time updates are probably not an issue, but the code run by the process queue could potentially try to load or unload a world. As Multiverse/Multiverse-Core#2800 discovered, this causes a crash on older versions of Paper.With this PR, the code in net.minecraft.server.MinecraftServer now looks like this:
Simple fix,
Iterator iterator = this.getAllLevels().iterator();
was just moved down to after the processQueue and time updates. Now, there is no instance where the levels iterator has been created butisIteratingOverLevels
has been set. Following a minimal diff policy, I did not replace the Iterator+while loop with an enhanced for loop.A note on the importance of this change:
I said earlier that the current code had the potential to crash on old versions of Paper. The example from Multiverse was using Paper#91. On the current version of Paper, doing the same thing that the Multiverse user did would likely not cause a crash. This is because this problem was already "fixed" by SPIGOT-7089, which was resolved by making the levels map effectively copy-on-write, avoiding the ConcurrentModificationException that crashed the server. This fix has been in Paper since Paper#126.
However, following the precedence of #8300, Paper aims to avoid the unsafe operation of unloading a world while it is being ticked for its potential to cause crashes in the future. Without this PR, it is possible for a world to be unloaded by a Runnable in the processQueue, then ticked after it has been unloaded. To avoid this potential problem, this PR should be included.