Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GLTF loading and animation performance regression #17535

Closed
aloucks opened this issue Jan 25, 2025 · 7 comments · Fixed by #17687
Closed

GLTF loading and animation performance regression #17535

aloucks opened this issue Jan 25, 2025 · 7 comments · Fixed by #17687
Labels
A-Animation Make things move and change over time A-ECS Entities, components, systems, and events A-Transform Translations, rotations and scales C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Critical This must be fixed immediately or contributors or users will be severely impacted P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Investigation This issue requires detective work to figure out what's going wrong
Milestone

Comments

@aloucks
Copy link
Contributor

aloucks commented Jan 25, 2025

Bevy version

main
deb135c

Relevant system information

$ cargo --version -vvv
cargo 1.83.0 (5ffbef321 2024-10-29)
release: 1.83.0
commit-hash: 5ffbef3211a8c378857905775a15c5b32a174d3b
commit-date: 2024-10-29
host: x86_64-pc-windows-msvc
libgit2: 1.8.1 (sys:0.19.0 vendored)
libcurl: 8.9.0-DEV (sys:0.4.74+curl-8.9.0 vendored ssl:Schannel)
os: Windows 10.0.22631 (Windows 11 Professional) [64-bit]

What you did

cargo run --example scene_viewer --features animation assets/models/UndeadMale/scourgemale.gltf

What went wrong

The gltf file never finishes loading when running with the dev profile (or at least it takes a very very long time - I've killed the process after a few minutes). It will eventually load when running it with --release but it's slow and the animation playback is very choppy.

Additional information

The file loads nearly instantly with v0.15.1 and animation playback is smooth, even with the dev profile.

UndeadMale.zip

@aloucks aloucks added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Jan 25, 2025
@rparrett rparrett added C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this! A-Animation Make things move and change over time S-Needs-Investigation This issue requires detective work to figure out what's going wrong and removed S-Needs-Triage This issue needs to be labelled labels Jan 25, 2025
@rparrett
Copy link
Contributor

Are you able to bisect?

@aloucks
Copy link
Contributor Author

aloucks commented Jan 25, 2025

Are you able to bisect?

The first bad commit is 21f1e30

@rparrett rparrett added this to the 0.16 milestone Jan 27, 2025
@rparrett
Copy link
Contributor

rparrett commented Jan 28, 2025

This also seems reproducible with many_foxes.

commit fps
21f1e30~1 300
21f1e30 90

This seems to be mostly from a particular parallel query operation in transform propagation.

par_for_each{query="(bevy_ecs::entity::Entity, &bevy_ecs::hierarchy::Children, bevy_ecs::change_detection::Ref<bevy_transform::components::transform::Transform>, &mut bevy_transform::components::global_transform::GlobalTransform)" filter="bevy_ecs::query::filter::Without<bevy_ecs::hierarchy::Parent>"}

Image

It's possible that profiling the original scene_viewer + undead male repro might yield something else. I haven't tried yet.

edit: Seeing most time spent in the same place with the single undead male.

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events A-Transform Translations, rotations and scales labels Jan 28, 2025
@cart
Copy link
Member

cart commented Jan 28, 2025

I was able to repo. This is the result of propagating over a million transforms per frame, when prior to #17398 we were propagating 28,000 per frame. We have the same number of transforms on both branches. This is actually caused by Children containing duplicates. I suspect that this is caused by buggy "scene spawn" code that spawns both a fully populated Children collection and a ChildOf, resulting in duplicates. This isn't possible via "normal" Rust flows, as you can't spawn a populated Children collection.

@cart
Copy link
Member

cart commented Jan 28, 2025

Looking at the source, Children is stored in the Scene's World, and the scene spawn logic duplicates that into the main World using FromReflect (which means the full Children are written on insert, bypassing the normal flow and resulting in duplicates).

@pcwalton
Copy link
Contributor

pcwalton commented Feb 2, 2025

If, like me, you're blocked because of this bug, the following diff will work around it:

diff --git a/crates/bevy_scene/src/scene.rs b/crates/bevy_scene/src/scene.rs
index 232acf345..df59a3bc0 100644
--- a/crates/bevy_scene/src/scene.rs
+++ b/crates/bevy_scene/src/scene.rs
@@ -1,7 +1,10 @@
+use core::any::TypeId;
+
 use crate::{DynamicScene, SceneSpawnError};
 use bevy_asset::Asset;
 use bevy_ecs::{
     entity::{hash_map::EntityHashMap, Entity, SceneEntityMapper},
+    hierarchy::Children,
     reflect::{AppTypeRegistry, ReflectComponent, ReflectMapEntities, ReflectResource},
     world::World,
 };
@@ -112,6 +115,10 @@ impl Scene {
                         .get_info(component_id)
                         .expect("component_ids in archetypes should have ComponentInfo");

+                    if component_info.type_id() == Some(TypeId::of::<Children>()) {
+                        continue;
+                    }
+
                     let registration = type_registry
                         .get(component_info.type_id().unwrap())
                         .ok_or_else(|| SceneSpawnError::UnregisteredType {

@rparrett rparrett added the P-Critical This must be fixed immediately or contributors or users will be severely impacted label Feb 2, 2025
@cart
Copy link
Member

cart commented Feb 3, 2025

As a heads up I'm working on a full fix, which is in the same category of thing as @pcwalton's fix above, just more generic (relies on expanding the "entity cloning" system so we can read "ignore" behavior here for all relationship targets). However in doing this, I identified various cases of tech debt that I figured we should pay the bill on. In addition to solving the clear logic/performance bug, this will also cut down on the boilerplate required to map entities for scene spawning and entity cloning (by making entity visiting a first class component behavior), improve the usability (and possibly performance) of EntityCloner, and open the doors to using EntityCloner infrastructure when spawning Scene(World), which should significantly improve scene spawning performance (although that bit will need to come later).

github-merge-queue bot pushed a commit that referenced this issue Feb 6, 2025
Fixes #17535

Bevy's approach to handling "entity mapping" during spawning and cloning
needs some work. The addition of
[Relations](#17398) both
[introduced a new "duplicate entities" bug when spawning scenes in the
scene system](#17535) and made the weaknesses of the current mapping
system exceedingly clear:

1. Entity mapping requires _a ton_ of boilerplate (implement or derive
VisitEntities and VisitEntitesMut, then register / reflect MapEntities).
Knowing the incantation is challenging and if you forget to do it in
part or in whole, spawning subtly breaks.
2. Entity mapping a spawned component in scenes incurs unnecessary
overhead: look up ReflectMapEntities, create a _brand new temporary
instance_ of the component using FromReflect, map the entities in that
instance, and then apply that on top of the actual component using
reflection. We can do much better.

Additionally, while our new [Entity cloning
system](#16132) is already pretty
great, it has some areas we can make better:

* It doesn't expose semantic info about the clone (ex: ignore or "clone
empty"), meaning we can't key off of that in places where it would be
useful, such as scene spawning. Rather than duplicating this info across
contexts, I think it makes more sense to add that info to the clone
system, especially given that we'd like to use cloning code in some of
our spawning scenarios.
* EntityCloner is currently built in a way that prioritizes a single
entity clone
* EntityCloner's recursive cloning is built to be done "inside out" in a
parallel context (queue commands that each have a clone of
EntityCloner). By making EntityCloner the orchestrator of the clone we
can remove internal arcs, improve the clarity of the code, make
EntityCloner mutable again, and simplify the builder code.
* EntityCloner does not currently take into account entity mapping. This
is necessary to do true "bullet proof" cloning, would allow us to unify
the per-component scene spawning and cloning UX, and ultimately would
allow us to use EntityCloner in place of raw reflection for scenes like
`Scene(World)` (which would give us a nice performance boost: fewer
archetype moves, less reflection overhead).

## Solution

### Improved Entity Mapping

First, components now have first-class "entity visiting and mapping"
behavior:

```rust
#[derive(Component, Reflect)]
#[reflect(Component)]
struct Inventory {
    size: usize,
    #[entities]
    items: Vec<Entity>,
}
```

Any field with the `#[entities]` annotation will be viewable and
mappable when cloning and spawning scenes.

Compare that to what was required before!

```rust
#[derive(Component, Reflect, VisitEntities, VisitEntitiesMut)]
#[reflect(Component, MapEntities)]
struct Inventory {
    #[visit_entities(ignore)]
    size: usize,
    items: Vec<Entity>,
}
```

Additionally, for relationships `#[entities]` is implied, meaning this
"just works" in scenes and cloning:

```rust
#[derive(Component, Reflect)]
#[relationship(relationship_target = Children)]
#[reflect(Component)]
struct ChildOf(pub Entity);
```

Note that Component _does not_ implement `VisitEntities` directly.
Instead, it has `Component::visit_entities` and
`Component::visit_entities_mut` methods. This is for a few reasons:

1. We cannot implement `VisitEntities for C: Component` because that
would conflict with our impl of VisitEntities for anything that
implements `IntoIterator<Item=Entity>`. Preserving that impl is more
important from a UX perspective.
2. We should not implement `Component: VisitEntities` VisitEntities in
the Component derive, as that would increase the burden of manual
Component trait implementors.
3. Making VisitEntitiesMut directly callable for components would make
it easy to invalidate invariants defined by a component author. By
putting it in the `Component` impl, we can make it harder to call
naturally / unavailable to autocomplete using `fn
visit_entities_mut(this: &mut Self, ...)`.

`ReflectComponent::apply_or_insert` is now
`ReflectComponent::apply_or_insert_mapped`. By moving mapping inside
this impl, we remove the need to go through the reflection system to do
entity mapping, meaning we no longer need to create a clone of the
target component, map the entities in that component, and patch those
values on top. This will make spawning mapped entities _much_ faster
(The default `Component::visit_entities_mut` impl is an inlined empty
function, so it will incur no overhead for unmapped entities).

### The Bug Fix

To solve #17535, spawning code now skips entities with the new
`ComponentCloneBehavior::Ignore` and
`ComponentCloneBehavior::RelationshipTarget` variants (note
RelationshipTarget is a temporary "workaround" variant that allows
scenes to skip these components. This is a temporary workaround that can
be removed as these cases should _really_ be using EntityCloner logic,
which should be done in a followup PR. When that is done,
`ComponentCloneBehavior::RelationshipTarget` can be merged into the
normal `ComponentCloneBehavior::Custom`).

### Improved Cloning

* `Option<ComponentCloneHandler>` has been replaced by
`ComponentCloneBehavior`, which encodes additional intent and context
(ex: `Default`, `Ignore`, `Custom`, `RelationshipTarget` (this last one
is temporary)).
* Global per-world entity cloning configuration has been removed. This
felt overly complicated, increased our API surface, and felt too
generic. Each clone context can have different requirements (ex: what a
user wants in a specific system, what a scene spawner wants, etc). I'd
prefer to see how far context-specific EntityCloners get us first.
* EntityCloner's internals have been reworked to remove Arcs and make it
mutable.
* EntityCloner is now directly stored on EntityClonerBuilder,
simplifying the code somewhat
* EntityCloner's "bundle scratch" pattern has been moved into the new
BundleScratch type, improving its usability and making it usable in
other contexts (such as future cross-world cloning code). Currently this
is still private, but with some higher level safe APIs it could be used
externally for making dynamic bundles
* EntityCloner's recursive cloning behavior has been "externalized". It
is now responsible for orchestrating recursive clones, meaning it no
longer needs to be sharable/clone-able across threads / read-only.
* EntityCloner now does entity mapping during clones, like scenes do.
This gives behavior parity and also makes it more generically useful.
* `RelatonshipTarget::RECURSIVE_SPAWN` is now
`RelationshipTarget::LINKED_SPAWN`, and this field is used when cloning
relationship targets to determine if cloning should happen recursively.
The new `LINKED_SPAWN` term was picked to make it more generically
applicable across spawning and cloning scenarios.

## Next Steps

* I think we should adapt EntityCloner to support cross world cloning. I
think this PR helps set the stage for that by making the internals
slightly more generalized. We could have a CrossWorldEntityCloner that
reuses a lot of this infrastructure.
* Once we support cross world cloning, we should use EntityCloner to
spawn `Scene(World)` scenes. This would yield significant performance
benefits (no archetype moves, less reflection overhead).

---------

Co-authored-by: eugineerd <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Feb 8, 2025
This PR makes Bevy keep entities in bins from frame to frame if they
haven't changed. This reduces the time spent in `queue_material_meshes`
and related functions to near zero for static geometry. This patch uses
the same change tick technique that #17567 uses to detect when meshes
have changed in such a way as to require re-binning.

In order to quickly find the relevant bin for an entity when that entity
has changed, we introduce a new type of cache, the *bin key cache*. This
cache stores a mapping from main world entity ID to cached bin key, as
well as the tick of the most recent change to the entity. As we iterate
through the visible entities in `queue_material_meshes`, we check the
cache to see whether the entity needs to be re-binned. If it doesn't,
then we mark it as clean in the `valid_cached_entity_bin_keys` bit set.
If it does, then we insert it into the correct bin, and then mark the
entity as clean. At the end, all entities not marked as clean are
removed from the bins.

This patch has a dramatic effect on the rendering performance of most
benchmarks, as it effectively eliminates `queue_material_meshes` from
the profile. Note, however, that it generally simultaneously regresses
`batch_and_prepare_binned_render_phase` by a bit (not by enough to
outweigh the win, however). I believe that's because, before this patch,
`queue_material_meshes` put the bins in the CPU cache for
`batch_and_prepare_binned_render_phase` to use, while with this patch,
`batch_and_prepare_binned_render_phase` must load the bins into the CPU
cache itself.

On Caldera, this reduces the time spent in `queue_material_meshes` from
5+ ms to 0.2ms-0.3ms. Note that benchmarking on that scene is very noisy
right now because of #17535.

![Screenshot 2025-02-05
153458](https://github.com/user-attachments/assets/e55f8134-b7e3-4b78-a5af-8d83e1e213b7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Animation Make things move and change over time A-ECS Entities, components, systems, and events A-Transform Translations, rotations and scales C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Critical This must be fixed immediately or contributors or users will be severely impacted P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Investigation This issue requires detective work to figure out what's going wrong
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants