
Import a large glb file (778MB) which contains 800 models will crash the editor. #93587

Open
AllenDang opened this issue Jun 25, 2024 · 21 comments

Comments

@AllenDang

Tested versions

4.2 stable

System information

macOS 14.5 - forward+ - godot 4.2 stable

Issue description

Importing a large glb file (778 MB) containing 800 models crashes the editor.

Steps to reproduce

  1. Create a new project.
  2. Drag and drop the large glb file into the editor.

Minimal reproduction project (MRP)

Here is the glb file
https://drive.google.com/file/d/1f74-29422AmZQJohng74ySdELGJptgSA/view?usp=sharing

@fire
Member

fire commented Jun 25, 2024

Can you check 4.3? The CoW (copy-on-write) data size limit was increased to a larger value there.

@Sluggernot

Tried on latest from GitHub (4 or 5 days ago). It hangs on import. Restarting the editor automatically restarts and re-hangs the import.
For some reason my Attach to Process is being disconnected, and reattaching it doesn't show me the call stack. (Mind currently blown.)
Just pulled latest and recompiling.

@lvcivs

lvcivs commented Jun 25, 2024

I tried this on 4.3.beta2.official and although it was very slow, it did eventually load after about 6 minutes (during the whole time it appeared stuck at 0%).
[screenshot: import progress stuck at 0%]

Opening the scene took a couple more minutes:
[screenshot: scene opened in the editor]
This was on Ubuntu 24.04. Edit: Godot uses about 9 GB of RAM with this scene open.

@AllenDang
Author

AllenDang commented Jun 25, 2024

@lvcivs I created this file just for testing purposes, to see how Godot handles it :P

@JekSun97

After transferring the model to Godot 4.3 beta2, it still didn't load for me; I waited 28 minutes, then closed it.
I also tested this in Blender 3.6.2: after 3 minutes, Blender closed itself, which didn't happen with Godot.

Godot v4.3.beta2 - Windows 10.0.19045 - Vulkan (Mobile) - dedicated Radeon RX 560 Series (Advanced Micro Devices, Inc.; 31.0.14001.45012) - Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (4 Threads)

@fire
Member

fire commented Jun 26, 2024

The next step is to get profiles of the load.

My recommendation is to use either https://github.com/mstange/samply or https://superluminal.eu/

@Sluggernot

Yes, I have been able to load the file. I did some quick benchmarking with Visual Studio and have a couple of very small efficiency improvements made locally. I need to benchmark before and after once I get some really good changes made to this.
The main finding is that _parse_meshes is the main function loading this file. My changes are to GenerateSharedVerticesIndexList and one small one to static SVec3 GetPosition().
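As a sketch of the pass-by-reference change described above (SVec3 and the functions here are simplified, illustrative stand-ins for the tangent-space code being discussed, not the actual source):

```cpp
// Illustrative stand-in for the SVec3 struct in the tangent-space code;
// layout and names are assumptions, not Godot's actual code.
struct SVec3 {
    float x, y, z;
};

// Before: the 12-byte struct is copied on every call.
static float length_sq_by_value(SVec3 v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// After: a const reference avoids the copy. Individually tiny, but in a
// function like GenerateSharedVerticesIndexList that touches every
// vertex of a very large mesh, the copies add up.
static float length_sq_by_ref(const SVec3 &v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}
```

For a struct this small, compilers can sometimes pass by value in registers just as cheaply, so measuring before and after (as done here) is the right call.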

@fire
Member

fire commented Jun 26, 2024

I will try to review any pull requests that can improve load times on the 777mb glb with nothing broken.

@Sluggernot

Oh... Nothing broken? Ah, never mind then.
Really, yes, my first challenge is proving that it is faster.
Thanks!

Sluggernot added a commit to Sluggernot/godot that referenced this issue Jun 26, 2024
Created in response to: godotengine#93587
Large glb file (778 MB) would hang, crash or load after extremely long time. This first set of optimizations focuses on sending SVec3 objects by reference instead of by value. Local tests in debug mode caused one iteration of GenerateSharedVerticesIndexList to go from 552ms to 470ms, on average. Unsure of the performance gains on release mode.
@Sluggernot

Ok, I didn't know GitHub would add these comments from my own fork because I referenced the issue in the description. I will avoid that in the future.

@zeux
Contributor

zeux commented Jun 28, 2024

Since I ended up looking into this a little bit, I'll share my findings in hopes that it will help.

Measured by clicking "Reimport" on the scene in an otherwise empty project, --verbose says import took 276 seconds (that's a little under 5 minutes).
Note that the scene has ~800 meshes that add up to ~39.3M triangles (~50k each, looks reasonably uniformly distributed). Overall I would have expected one mesh per scene here, but I'm not familiar with how Godot workflows work, and it's a good stress test regardless.

perf profile on Linux / editor build with default settings with -fno-omit-frame-pointer -- please note that timings add up to 45% (perf doesn't normalize them):

[screenshot: perf profile]

Renormalizing the percentages by dividing by 0.45, and focusing on significant underlying components, we get:

  • 5% scene save
  • 14% tangent space generation
  • 25% normal reprojection after LOD generation (raycasts)
  • 29% simplification (meshopt_simplify)
  • 24% the rest of generate_lods (it's inlined here so hard to see from the profile exactly)

In aggregate, LOD generation takes ~78% here, so definitely good to focus on that. When looking at something like a 5-minute import though, my expectations are usually that small gains are not terribly exciting, so something more significant needs to happen.

A note on the scale here: each mesh gets approximately 6 LOD levels generated. The work for meshopt_simplify scales with that; the work for normal reprojection scales with the total number of rays, which scales with the total number of triangles in all LODs, times the area factor - looks like we cast 16..64 rays which is a lot of rays :)
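To make that scale concrete, here is a minimal sketch of such a per-triangle ray budget. The 16..64 clamp matches the figures mentioned above; the area-based scaling factor is an illustrative assumption, not the importer's actual formula.

```cpp
// Illustrative sketch only: a per-triangle ray budget that grows with
// how much original surface area a LOD triangle covers, clamped to the
// 16..64 range mentioned above. Not Godot's actual code.
static unsigned int rays_for_triangle(float lod_triangle_area, float avg_source_triangle_area) {
    float scale = (avg_source_triangle_area > 0.0f)
            ? lod_triangle_area / avg_source_triangle_area
            : 1.0f;
    unsigned int rays = static_cast<unsigned int>(16.0f * scale);
    if (rays < 16) {
        rays = 16;
    }
    if (rays > 64) {
        rays = 64;
    }
    return rays;
}
```

Even at the minimum of 16 rays, the total ray count is proportional to the triangle count summed over all LOD levels, which is why the reprojection dominates the profile.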

If I were tackling this problem, I would entertain the following projects:

  1. For scenes with many large meshes like this, my first goal would be to process meshes in parallel. I'm not familiar with the details of ImporterMesh code but superficially nothing should prevent fully generating each mesh in parallel. Maybe that requires refactoring some of this code to actually be thread-safe. It would also require making sure that the dependent code is thread-safe internally - meshopt definitely is, I assume so is Embree, but some care would be required. That alone would probably get this to be under a minute on an 8-core system if we discount tangent space generation.

  2. I'm skeptical that tangent space generation is efficient here. For a sense of scale, meshopt_simplify does a fair bit more work per call, and it's called ~6 times per mesh here and still only takes twice as much time. I would assume tangent space generation has internal algorithmic inefficiencies and could be improved, but I haven't looked at that code myself.
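A minimal sketch of point (1), assuming a hypothetical generate_lods_for_mesh function and ignoring the thread-safety refactoring the comment says would be required:

```cpp
#include <future>
#include <vector>

// Hypothetical per-mesh data and LOD pass; names are illustrative, not
// the actual ImporterMesh API.
struct MeshData {
    int lods_generated = 0;
};

static void generate_lods_for_mesh(MeshData &mesh) {
    // Placeholder for simplification + tangent/normal processing.
    mesh.lods_generated = 6; // ~6 LOD levels per mesh, as in this scene
}

// Fan each mesh out to its own task; with ~800 independent meshes this
// workload is close to embarrassingly parallel.
static void generate_all_lods(std::vector<MeshData> &meshes) {
    std::vector<std::future<void>> tasks;
    tasks.reserve(meshes.size());
    for (MeshData &m : meshes) {
        tasks.push_back(std::async(std::launch::async, generate_lods_for_mesh, std::ref(m)));
    }
    for (std::future<void> &t : tasks) {
        t.get(); // wait for completion and propagate exceptions
    }
}
```

A real patch would bound concurrency with a worker pool (Godot has WorkerThreadPool for this) rather than launching one task per mesh, but the data dependency structure is the same.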

I would not advise trying to optimize the internals of meshopt_simplify (trust me...). Some small future performance improvements are planned here in meshoptimizer but largely speaking unless this runs into some edge case, which it doesn't look like it does to me, it should be very well tuned already. Same for Embree - I would assume it's impractical to optimize that to the degree that is relevant here. However:

  3. I would certainly think of, at the minimum, reducing the amount of requested work from both meshopt_simplify and Embree here. Notably, meshopt_simplify is called approximately 6 times per mesh here and is asked to generate larger and larger meshes. Because of this, it does more or less the same amount of work each time: simplifying the mesh 2x is almost the same effort as simplifying the mesh 10x (... well, not quite, but it gets there quickly). However, in LOD chain generation you can usually generate the LODs in the opposite direction: start by requesting a ~1.5x smaller mesh; if that target is reached, ask for a ~1.5x smaller mesh again, etc. I don't recall why the order here is reversed, but I would consider flipping it and simplifying from the last LOD. I don't think that's going to reduce the work here 6x, but I would expect something like a 3-4x improvement in the cost to call simplify.

  4. In a similar vein, casting 16-64 rays per triangle is a lot, especially for higher levels of detail. I would probably reduce this in general, or at least scale it down as the LOD levels get closer to the original mesh: in the limit, we're casting at least 16 rays per triangle here for something that only has 1.5x fewer triangles than the original mesh, and that just feels wasteful. This has a risk of reducing the quality of the resulting normals because there's a higher chance of missing the mesh or hitting a wrong triangle. Maybe ray casts here aren't the right fit, and averaging triangle normals from triangles that are in a bounding sphere of the generated triangle is better, but this brings me to my final point:

  5. We've already discussed this at some point in another issue, but overall I'm not 100% sure the current normal processing in the importer for LODs is generally beneficial. With the normal-aware simplifier and the recent fixes, generally speaking I'd expect decent normals to come out of the simplifier itself. Sometimes that's not the case, but I'm not sure the ray cast logic is perfect either, and it's just a lot of complexity to always keep in mind. I do think the reindexing that happens in this code is beneficial for some faceted meshes, though. So a good use of time would be to introduce an option for normal reprojection that disables the ray-cast-based normal recreation (I'd expect that alone cuts half of the overhead of LOD generation here), test the option in a release, then maybe default it to skip the normal recreation and see if this comes up.
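The reversed LOD chain idea can be sketched as follows. `simplify` here is a trivial stand-in for meshopt_simplify (the real API also takes vertex data, an error target, and options); only the call ordering and the shrinking targets are the point:

```cpp
#include <cstddef>
#include <vector>

using Indices = std::vector<unsigned int>;

// Trivial stand-in for meshopt_simplify: only models "give me a mesh
// with at most target_index_count indices".
static Indices simplify(const Indices &src, size_t target_index_count) {
    Indices dst = src;
    if (dst.size() > target_index_count) {
        dst.resize(target_index_count - target_index_count % 3); // whole triangles
    }
    return dst;
}

// Build the LOD chain forward: each level is simplified from the
// previous (already smaller) level with a ~1.5x reduction target,
// instead of re-simplifying the full-resolution mesh for every level.
static std::vector<Indices> build_lod_chain(const Indices &base, int levels) {
    std::vector<Indices> lods;
    const Indices *prev = &base;
    for (int i = 0; i < levels; i++) {
        size_t target = prev->size() * 2 / 3; // ~1.5x fewer indices per step
        target -= target % 3;
        if (target < 3) {
            break;
        }
        lods.push_back(simplify(*prev, target));
        prev = &lods.back();
    }
    return lods;
}
```

With a forward chain like this, each simplify call only has to remove about a third of the remaining triangles rather than processing the full-resolution mesh, which is where the estimated 3-4x cost reduction above would come from.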

Hopefully this is helpful :) I would be happy to discuss (3)/(5) further and/or maybe contribute a patch or two as I'm generally interested in making sure simplification integration is working well for Godot; I'll leave 1/2/4 to others if they are motivated to work on this.

@zeux
Contributor

zeux commented Jun 28, 2024

On "I'm not 100% sure the current normal processing in the importer for LODs is generally beneficial", I decided to do a quick comparison on the scene from this file. It turns out it's easy to disable the normal override; basically, one just needs to disable the ray caster creation (as mentioned earlier, I believe the current splitting logic to be generally beneficial for faceted meshes). I then looked at a few low LODs (where the risk of picking a bad normal due to ray casts is maximized) by tuning the LOD bias to a very small value.

On the left (yes, left, I double checked!) is the import without using the raycaster. On the right is current master (raycaster enabled). Both levels are at ~2200 triangles. I see somewhat similar issues on a few other models - this is not universal, this happened to be the first model I checked, and some models from this scene look about the same with or without the raycaster enabled. But this to me is strong evidence that raycaster should be optional, and probably opt-in.

[screenshots: import without the raycaster (left) vs. current master with the raycaster (right), both at ~2200 triangles]

I've switched to using a smaller version of the scene from the original post (that one has 800 meshes, but each mesh is duplicated 8 times; the deduplicated version has only 100 meshes, which is easier to work with and faster to reimport). Reimport takes 37 seconds on master and 22 seconds without the raycaster enabled.

@Sluggernot

Wow, well that is surprising. Are there any examples where the raycaster was better in visual fidelity? (I understand that's somewhat subjective, but your screenshot above feels fairly objective as to which is "better.")
I've been diving further into this section of code throughout the day, attempting to rally myself before trying multithreading. I really appreciate your write-up. This is absolutely great to see!

@fire
Member

fire commented Jun 28, 2024

As someone who works on this, I support changes that improve quality and performance. I can review and help test.

@Saul2022

After trying to import this glb file on an S23+ (mobile), it ends up crashing after some time, so this does not look to be the CoW's fault. I used #93064, as it is the fastest when loading big projects, along with the other PR; it still crashes on reimport.

@fire
Member

fire commented Aug 20, 2024

@Saul2022 does it also crash on your PC?

Edited:

I would expect something like 10-20 gigabytes of CPU RAM to be used, too.

@Saul2022

Saul2022 commented Aug 20, 2024

@Saul2022 does it also crash on your PC?

Can't test on PC, sorry; it's dead. Only a black screen, even though the power light works, so probably a screen issue.

Edit: Also tried without LODs or shadow mesh, with light baking enabled, by adjusting it in the import defaults, and it still crashes, so it's not the LODs.

@anderlli0053

anderlli0053 commented Aug 26, 2024

I've tried this with v4.3.stable.official [77dcf97] and this is the resulting Godot memory crash dump:

godot.exe.14296.zip

My specs:

specs

@fire
Member

fire commented Aug 26, 2024

I suspect that developers loading that 3D asset require more than 16 GB of RAM.

We can check how big the difference is. If the requirement is closer to 32 GB, then it's a lot harder to meet than something like 18 GB.

Godot Engine 4.3-stable

Edited:

I'll try to get a cpu usage chart via samply or https://superluminal.eu/ using a custom build of 4.3-stable

Edited:

  1. Download https://drive.google.com/file/d/1f74-29422AmZQJohng74ySdELGJptgSA/view?usp=sharing
  2. Apple M2 Pro with 32GB of ram.
  3. curl --proto '=https' --tlsv1.2 -LsSf https://github.com/mstange/samply/releases/download/samply-v0.12.0/samply-installer.sh | sh
  4. scons production=yes debug_symbols=yes @ https://github.com/godotengine/godot/releases/tag/4.3-stable
  5. ./bin/godot.macos.editor.arm64 #create a new-game-project
  6. rm -rf ~/Documents/new-game-project/.godot
  7. samply record ./bin/godot.macos.editor.arm64 -e --path ~/Documents/new-game-project/
  8. Drag asset gltf file.
  9. Open asset gltf file as a scene.
  10. Firefox Profiler with stack traces! https://share.firefox.dev/3AH8zLh
  11. I saw around 19 GB of max usage, but I don't have logging.

Godot Engine master

Edited:

  1. Download https://drive.google.com/file/d/1f74-29422AmZQJohng74ySdELGJptgSA/view?usp=sharing
  2. Apple M2 Pro with 32GB of ram.
  3. curl --proto '=https' --tlsv1.2 -LsSf https://github.com/mstange/samply/releases/download/samply-v0.12.0/samply-installer.sh | sh
  4. scons production=yes debug_symbols=yes @ db76de5
  5. ./bin/godot.macos.editor.arm64 #create a new-game-project
  6. rm -rf ~/Documents/new-game-project/.godot
  7. samply record ./bin/godot.macos.editor.arm64 -e --path ~/Documents/new-game-project/
  8. Drag asset gltf file.
  9. Open asset gltf file as a scene.
  10. Around 18 GB of max ram usage during import
  11. Around 9 GB when using the internal Godot Engine formats and loading the 3D asset in the editor.
  12. Firefox Profiler with stack traces! https://share.firefox.dev/3X4zE2k

Notes

  1. Disable normal raycaster for LOD generation by default #93727 is expected to reduce memory usage.
  2. We may be able to optimize import runtime by making GLTFDocument::_parse_image_save_image parallel @ db76de5

@fire
Member

fire commented Aug 26, 2024

What is the expected behaviour if we exceed the system's RAM (like 19 GB of usage on a 16 GB machine with 14 GB free)?

Edited:

Personally, I think requiring more RAM and crashing is expected on large datasets.

We can attempt to use less memory, but there will always be a dataset that exceeds a limit.

@Saul2022

We can attempt to use less memory, but there will always be a dataset that exceeds a limit.

Yeah, I guess. I tried with multithreaded import off, vsync disabled, and continuous update, but it still crashes. The image files did import, though, just not the glb scene. Maybe, to avoid crashing the engine, quit the import process before the crash happens and print an error message about there not being enough RAM to import the scene.
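One hedged way to implement that suggestion: estimate available memory up front and abort the import with an error instead of letting the process die. This sketch uses the Linux/glibc-specific sysconf(_SC_AVPHYS_PAGES); a real Godot patch would go through the OS abstraction, and the 25x-the-file-size factor is an assumption loosely based on the numbers reported in this thread (778 MB file, ~19 GB peak), not a measured constant.

```cpp
#include <cstdint>
#include <unistd.h>

// Estimate available physical memory. _SC_AVPHYS_PAGES is a glibc/Linux
// extension. Returns 0 if the value cannot be measured.
static uint64_t estimate_available_bytes() {
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    if (pages <= 0 || page_size <= 0) {
        return 0;
    }
    return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// Decide whether to even start the import. The default factor of 25 is
// an assumed heuristic, not a measured constant; dividing instead of
// multiplying avoids overflow for large files.
static bool can_afford_import(uint64_t file_size_bytes, uint64_t factor = 25) {
    uint64_t available = estimate_available_bytes();
    if (available == 0) {
        return true; // could not measure; don't block the user
    }
    return file_size_bytes <= available / factor;
}
```

The import could then print a clear "not enough RAM to import this scene" error and bail out early, rather than crashing partway through.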
