72 changes: 36 additions & 36 deletions crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl
@@ -254,7 +254,7 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {

// Calculate where we are in the ray.
let P_world = Ro_world + Rd_world * f32(step) * step_size_world;
let P_view = Rd_view * f32(step) * step_size_world;
let P_view = view_start_pos + Rd_view * f32(step) * step_size_world;
Contributor

This fix has a similar theme to PR #23406. On cross-checking, they are separate fixes, and both are needed: the other one addresses fog AABB traversal / far-plane intersection, while this one fixes the per-sample view position, which affects both clustering and which shadow cascade gets sampled. So it is worth testing shadows as well.
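A minimal sketch of what this one-line fix changes (Python scalar stand-in, not the shader itself; the function names are hypothetical and only mirror the shader variables):

```python
def p_view_fixed(view_start_pos, rd_view, t):
    """Per-sample view-space position after the fix: the ray's
    view-space start position is included in the sample position."""
    return tuple(o + d * t for o, d in zip(view_start_pos, rd_view))

def p_view_broken(rd_view, t):
    """Before the fix: the ray origin was implicitly the camera at
    (0, 0, 0), so the sample's view_z was wrong whenever the fog
    volume did not start at the camera."""
    return tuple(d * t for d in rd_view)

# A sample 5 units along a ray that enters the volume 10 units in front
# of the camera: the fixed version reports view_z = -15, the broken one
# reports -5, selecting the wrong cluster slice and shadow cascade.
```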


var density = density_factor;
#ifdef DENSITY_TEXTURE
@@ -336,35 +336,36 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
}

// Point lights and Spot lights
let view_z = view_start_pos.z;
let is_orthographic = view.clip_from_view[3].w == 1.0;
let cluster_index = clustering::view_fragment_cluster_index(frag_coord.xy, view_z, is_orthographic);
var clusterable_object_index_ranges =
clustering::unpack_clusterable_object_index_ranges(cluster_index);
for (var i: u32 = clusterable_object_index_ranges.first_point_light_index_offset;
i < clusterable_object_index_ranges.first_reflection_probe_index_offset;
i = i + 1u) {
let light_id = clustering::get_clusterable_object_id(i);
let light = &clustered_lights.data[light_id];
if (((*light).flags & POINT_LIGHT_FLAGS_VOLUMETRIC_BIT) == 0) {
continue;
}

// Reset `background_alpha` for a new raymarch.
background_alpha = 1.0;
// Reset `background_alpha` for a new raymarch.
background_alpha = 1.0;

// Start raymarching.
for (var step = 0u; step < step_count; step += 1u) {
// As an optimization, break if we've gotten too dark.
if (background_alpha < 0.001) {
break;
}
// Start raymarching.
for (var step = 0u; step < step_count; step += 1u) {
// As an optimization, break if we've gotten too dark.
if (background_alpha < 0.001) {
break;
}

// Calculate where we are in the ray.
let P_world = Ro_world + Rd_world * f32(step) * step_size_world;
let P_view = Rd_view * f32(step) * step_size_world;
// Calculate where we are in the ray.
let P_world = Ro_world + Rd_world * f32(step) * step_size_world;
let P_view = view_start_pos + Rd_view * f32(step) * step_size_world;

var density = density_factor;
var density = density_factor;
var sample_color = vec3(0.0);

let cluster_index = clustering::view_fragment_cluster_index(frag_coord.xy, P_view.z, is_orthographic);
var clusterable_object_index_ranges = clustering::unpack_clusterable_object_index_ranges(cluster_index);
for (var i: u32 = clusterable_object_index_ranges.first_point_light_index_offset;
Contributor

I wonder whether flipping these loops around, from for-each-cluster / for-each-step to for-each-step / for-each-cluster, introduces a performance regression. While it is definitely the right approach, it requires recomputing the cluster at each step, meaning more work per fragment. I wonder how/if we can avoid that. Optimization could be deferred to a later PR anyway.

Contributor Author

Thank you for looking at this! You are right: there is likely some overhead from finding the right cluster slice for every sample. A relatively easy optimization would be to cache the clusterable object indices, computing them only once per cluster instead of for every sample, and then recompute the cluster index plus object indices as soon as the ray exits the current cluster's z-boundary. I can look into it and try to add it to this PR.

Contributor Author

I looked into this and came up with a solution that avoids the (expensive) view -> cluster lookups altogether, but it requires some pre-computation:

  • One time: pre-compute a view_z_planes[] array storing the minimum view_z (entry plane) for every cluster z_slice. This buffer only needs to be updated when the projection matrix changes.
  • Bind the new view_z_planes[] array as a uniform / storage buffer.

Inside the volumetric_fog fragment shader:

  • Compute the x and y cluster indices once per fragment (trivial).
  • Compute the initial z_slice (usually 0, except when the camera is outside of the volume).
  • For every sample, check whether we crossed one or more z-planes: is P_view.z > view_z_planes[z_slice + 1] (the exit plane)? If yes:
    • Increase z_slice.
    • Re-compute the cluster index from the x, y, and z indices using the existing function: fragment_cluster_index(p: vec3<u32>, cluster_dimensions: vec4<u32>) -> u32
    • Fetch the new light indices using the updated cluster_index.

This way we only ever increase the z_slice index as we march, avoiding the expensive logarithmic view_z -> z_slice calculations completely.

Let me know what you think. Also, if we want to go ahead with it, should it be part of this PR or a new one?
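The proposed march over z-slices can be sketched in Python (the exponential split formula below is an assumption for illustration only; Bevy's actual cluster slicing may differ):

```python
import math

def precompute_view_z_planes(z_slices, near, far):
    """Entry-plane view depth for each cluster z-slice, using an
    exponential split (a common clustering scheme, assumed here)."""
    return [near * (far / near) ** (i / z_slices) for i in range(z_slices + 1)]

def raymarch_z_slices(sample_depths, planes):
    """Advance the z-slice index monotonically as the ray marches,
    instead of doing a logarithmic view_z -> slice lookup per sample.
    `sample_depths` must be non-decreasing along the ray."""
    z_slice = 0
    slices = []
    for depth in sample_depths:
        # Cross as many exit planes as needed for this sample.
        while z_slice + 1 < len(planes) - 1 and depth > planes[z_slice + 1]:
            z_slice += 1
        slices.append(z_slice)
    return slices
```

Because the depths only increase, the inner while loop runs a bounded total number of times over the whole march, so the per-sample cost is effectively constant.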

Contributor

Thanks for looking into this. I think it's best to keep that change in a separate PR and keep the scope of this one small. Since it's a visual bugfix first, I'm going to approve this one, knowing there's a way forward for performance.

Contributor

On the volumetric fog example on macOS I see these numbers (it doesn't have many lights):
this branch: 7.21ms
main: 7.37ms

For a proper performance comparison I suggest stress-testing volumetric fog with many lights, in this case 32.

I see a clear performance regression; I took two measurements per branch:

this branch: 192.77ms 186.10ms
main: 160ms 161.77ms

Contributor

Posted a branch that might help with your testing: https://github.com/mate-h/bevy/tree/m/volume-many-lights

Contributor Author

Thanks! I'll have a look at it.

i < clusterable_object_index_ranges.first_reflection_probe_index_offset;
i = i + 1u)
{
let light_id = clustering::get_clusterable_object_id(i);
let light = &clustered_lights.data[light_id];
if (((*light).flags & POINT_LIGHT_FLAGS_VOLUMETRIC_BIT) == 0) {
continue;
}

let light_to_frag = (*light).position_radius.xyz - P_world;
let V = Rd_world;
@@ -386,7 +387,6 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
if ((*light).flags & POINT_LIGHT_FLAGS_SPOT_LIGHT_Y_NEGATIVE) != 0u {
spot_dir.y = -spot_dir.y;
}
let light_to_frag = (*light).position_radius.xyz - P_world;

// calculate attenuation based on filament formula https://google.github.io/filament/Filament.html#listing_glslpunctuallight
// spot_scale and spot_offset have been precomputed
@@ -402,13 +402,6 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
local_light_attenuation *= spot_attenuation * shadow;
}

// Calculate absorption (amount of light absorbed by the fog) and
// out-scattering (amount of light the fog scattered away).
let sample_attenuation = exp(-step_size_world * density * (absorption + scattering));

// Process absorption and out-scattering.
background_alpha *= sample_attenuation;

let light_attenuation = exp(-density * bounding_radius * (absorption + scattering));
let light_factors_per_step = fog_color * light_tint * light_attenuation *
scattering * density * step_size_world * light_intensity * exposure;
@@ -418,9 +411,16 @@ fn fragment(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
let light_color_per_step = (*light).color_inverse_square_range.rgb * light_factors_per_step;

// Accumulate the light.
accumulated_color += light_color_per_step * local_light_attenuation *
background_alpha;
sample_color += light_color_per_step * local_light_attenuation;
}

// Calculate absorption (amount of light absorbed by the fog) and
// out-scattering (amount of light the fog scattered away).
let sample_attenuation = exp(-step_size_world * density * (absorption + scattering));
Contributor

This looks like the correct fix compared to the directional path above: the same discrete Beer–Lambert extinction per step as before. Making the atmosphere side more analytic along the actual path could be a separate improvement.
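The per-step Beer–Lambert accumulation that this hunk moves out of the light loop can be sketched as follows (Python scalar stand-in for the shader math; names mirror the shader's, and `light_per_step` stands in for the summed per-light contribution):

```python
import math

def integrate_fog(step_count, step_size, density, absorption, scattering,
                  light_per_step):
    """Discrete Beer-Lambert integration mirroring the shader loop:
    extinction is applied once per sample, after the light loop has
    summed all clustered-light contributions into sample_color."""
    background_alpha = 1.0
    accumulated_color = 0.0
    for _ in range(step_count):
        # As an optimization, break once we've gotten too dark.
        if background_alpha < 0.001:
            break
        sample_color = light_per_step  # sum over lights in the shader
        # Absorption plus out-scattering over this step.
        sample_attenuation = math.exp(
            -step_size * density * (absorption + scattering))
        background_alpha *= sample_attenuation
        accumulated_color += sample_color * background_alpha
    return accumulated_color, background_alpha
```

With zero density the attenuation is 1 and the light contributions simply sum; with positive density each successive sample is weighted by the transmittance accumulated so far, which is the behavior the moved code preserves.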


// Process absorption and out-scattering.
background_alpha *= sample_attenuation;
accumulated_color += sample_color * background_alpha;
}

// We're done! Return the color with alpha so it can be blended onto the
@@ -444,7 +444,7 @@ fn fetch_point_shadow_without_normal(light_id: u32, frag_position: vec4<f32>, fr
let offset_position = frag_position.xyz + depth_offset;

// similar largest-absolute-axis trick as above, but now with the offset fragment position
let frag_ls = offset_position.xyz - (*light).position_radius.xyz ;
let frag_ls = offset_position.xyz - (*light).position_radius.xyz;
let abs_position_ls = abs(frag_ls);
let major_axis_magnitude = max(abs_position_ls.x, max(abs_position_ls.y, abs_position_ls.z));
