-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Fixed cluster sampling issues for volumetric point and spot lights #23387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -254,7 +254,7 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> { | |
|
|
||
| // Calculate where we are in the ray. | ||
| let P_world = Ro_world + Rd_world * f32(step) * step_size_world; | ||
| let P_view = Rd_view * f32(step) * step_size_world; | ||
| let P_view = view_start_pos + Rd_view * f32(step) * step_size_world; | ||
|
|
||
| var density = density_factor; | ||
| #ifdef DENSITY_TEXTURE | ||
|
|
@@ -336,35 +336,36 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> { | |
| } | ||
|
|
||
| // Point lights and Spot lights | ||
| let view_z = view_start_pos.z; | ||
| let is_orthographic = view.clip_from_view[3].w == 1.0; | ||
| let cluster_index = clustering::view_fragment_cluster_index(frag_coord.xy, view_z, is_orthographic); | ||
| var clusterable_object_index_ranges = | ||
| clustering::unpack_clusterable_object_index_ranges(cluster_index); | ||
| for (var i: u32 = clusterable_object_index_ranges.first_point_light_index_offset; | ||
| i < clusterable_object_index_ranges.first_reflection_probe_index_offset; | ||
| i = i + 1u) { | ||
| let light_id = clustering::get_clusterable_object_id(i); | ||
| let light = &clustered_lights.data[light_id]; | ||
| if (((*light).flags & POINT_LIGHT_FLAGS_VOLUMETRIC_BIT) == 0) { | ||
| continue; | ||
| } | ||
|
|
||
| // Reset `background_alpha` for a new raymarch. | ||
| background_alpha = 1.0; | ||
| // Reset `background_alpha` for a new raymarch. | ||
| background_alpha = 1.0; | ||
|
|
||
| // Start raymarching. | ||
| for (var step = 0u; step < step_count; step += 1u) { | ||
| // As an optimization, break if we've gotten too dark. | ||
| if (background_alpha < 0.001) { | ||
| break; | ||
| } | ||
| // Start raymarching. | ||
| for (var step = 0u; step < step_count; step += 1u) { | ||
| // As an optimization, break if we've gotten too dark. | ||
| if (background_alpha < 0.001) { | ||
| break; | ||
| } | ||
|
|
||
| // Calculate where we are in the ray. | ||
| let P_world = Ro_world + Rd_world * f32(step) * step_size_world; | ||
| let P_view = Rd_view * f32(step) * step_size_world; | ||
| // Calculate where we are in the ray. | ||
| let P_world = Ro_world + Rd_world * f32(step) * step_size_world; | ||
| let P_view = view_start_pos + Rd_view * f32(step) * step_size_world; | ||
|
|
||
| var density = density_factor; | ||
| var density = density_factor; | ||
| var sample_color = vec3(0.0); | ||
|
|
||
| let cluster_index = clustering::view_fragment_cluster_index(frag_coord.xy, P_view.z, is_orthographic); | ||
| var clusterable_object_index_ranges = clustering::unpack_clusterable_object_index_ranges(cluster_index); | ||
| for (var i: u32 = clusterable_object_index_ranges.first_point_light_index_offset; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if flipping these loops around between for each cluster and for each step introduces any performance regression. while it's definitely the right approach, this requires recomputing the cluster each step meaning more work per fragment. I wonder how/if we can save this. Optimization could be left for later anyway in a later PR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for looking at this! You are right, there is likely some overhead related to finding the right cluster slice for every sample. I think a relatively easy optimization would be to cache the cluster-able object indices, computing them only once per cluster instead of for every sample. Then I re-compute the cluster index + object indices as soon as the ray exits the current cluster's z-boundary. I can look into it and try to add it to this PR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I looked into this and came up with a solution that avoids the (expensive) view -> cluster lookups altogether, but it requires some pre-computation:
Inside the
This way we only increase the Let me know what you think. Also if we want to go ahead with it, should it be part of this PR or a new one?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for looking into this - I think it's best if we keep this change in a separate PR and keep the scope of this one small. Since it's a visual bugfix first I'm going to approve this one knowing theres a way forward for performance.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. on the volumetric fog example on mac I see these numbers (doesn't have many lights) for a proper performance comparison I suggest a stress test volumetric fog with many lights in this case 32: I see a clear performance regression, took 2 measurements per branch this branch: 192.77ms 186.10ms
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Posted a branch might help with your testing https://github.com/mate-h/bevy/tree/m/volume-many-lights
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! Ill have a look at it. |
||
| i < clusterable_object_index_ranges.first_reflection_probe_index_offset; | ||
| i = i + 1u) | ||
| { | ||
| let light_id = clustering::get_clusterable_object_id(i); | ||
| let light = &clustered_lights.data[light_id]; | ||
| if (((*light).flags & POINT_LIGHT_FLAGS_VOLUMETRIC_BIT) == 0) { | ||
| continue; | ||
| } | ||
|
|
||
| let light_to_frag = (*light).position_radius.xyz - P_world; | ||
| let V = Rd_world; | ||
|
|
@@ -386,7 +387,6 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> { | |
| if ((*light).flags & POINT_LIGHT_FLAGS_SPOT_LIGHT_Y_NEGATIVE) != 0u { | ||
| spot_dir.y = -spot_dir.y; | ||
| } | ||
| let light_to_frag = (*light).position_radius.xyz - P_world; | ||
|
|
||
| // calculate attenuation based on filament formula https://google.github.io/filament/Filament.html#listing_glslpunctuallight | ||
| // spot_scale and spot_offset have been precomputed | ||
|
|
@@ -402,13 +402,6 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> { | |
| local_light_attenuation *= spot_attenuation * shadow; | ||
| } | ||
|
|
||
| // Calculate absorption (amount of light absorbed by the fog) and | ||
| // out-scattering (amount of light the fog scattered away). | ||
| let sample_attenuation = exp(-step_size_world * density * (absorption + scattering)); | ||
|
|
||
| // Process absorption and out-scattering. | ||
| background_alpha *= sample_attenuation; | ||
|
|
||
| let light_attenuation = exp(-density * bounding_radius * (absorption + scattering)); | ||
| let light_factors_per_step = fog_color * light_tint * light_attenuation * | ||
| scattering * density * step_size_world * light_intensity * exposure; | ||
|
|
@@ -418,9 +411,16 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> { | |
| let light_color_per_step = (*light).color_inverse_square_range.rgb * light_factors_per_step; | ||
|
|
||
| // Accumulate the light. | ||
| accumulated_color += light_color_per_step * local_light_attenuation * | ||
| background_alpha; | ||
| sample_color += light_color_per_step * local_light_attenuation; | ||
| } | ||
|
|
||
| // Calculate absorption (amount of light absorbed by the fog) and | ||
| // out-scattering (amount of light the fog scattered away). | ||
| let sample_attenuation = exp(-step_size_world * density * (absorption + scattering)); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks like the correct fix compared to the directional path above. Same discrete Beer–Lambert extinction per step as before, the atmosphere-side is more analytic along the actual path, which could be a separate improvement. |
||
|
|
||
| // Process absorption and out-scattering. | ||
| background_alpha *= sample_attenuation; | ||
| accumulated_color += sample_color * background_alpha; | ||
| } | ||
|
|
||
| // We're done! Return the color with alpha so it can be blended onto the | ||
|
|
@@ -444,7 +444,7 @@ fn fetch_point_shadow_without_normal(light_id: u32, frag_position: vec4<f32>, fr | |
| let offset_position = frag_position.xyz + depth_offset; | ||
|
|
||
| // similar largest-absolute-axis trick as above, but now with the offset fragment position | ||
| let frag_ls = offset_position.xyz - (*light).position_radius.xyz ; | ||
| let frag_ls = offset_position.xyz - (*light).position_radius.xyz; | ||
| let abs_position_ls = abs(frag_ls); | ||
| let major_axis_magnitude = max(abs_position_ls.x, max(abs_position_ls.y, abs_position_ls.z)); | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix has a similar theme to this PR #23406. Upon cross checking they are separate fixes, both needed. the other one is for fog AABB traversal / far-plane intersection.
this one fixes per-sample view position which affects both clustering and the shadow cascades sampled. so worth testing the shadows as well to see.