Virtual shadowmap #681

Open
5 of 10 tasks
Try opened this issue Sep 5, 2024 · 19 comments

Owner

Try commented Sep 5, 2024

In general, I'm looking for a new, more sophisticated solution to replace the projective shadowmap.

Goals

  • scaling with resolution (faster on low-res rendering)
  • single solution for any GPU (unlike ray-tracing)
  • high (pixel-perfect) quality

Progress

  • DirectLighting (sun)
    • Planetary occlusion, for sun
  • Fog
    • Optimized fog
  • Point lights
  • Refactor/cleanup: rename *cluster_task into something readable
  • Overall optimization
    • small-core (512 threadgroup) support
Interesting to experiment with:
  • software RT (with meshlets binned by pages)
  • software pre-trace, to complement culling. Similar to RT, but without the ray-triangle test.

Known solutions

Not a lot actually...
Nanite (page 119): https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
Assassin's Creed (page 55): https://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
https://ktstephano.github.io/rendering/stratusgfx/svsm
https://www.cse.chalmers.se/~uffe/ClusteredWithShadows.pdf
https://www.gamedevs.org/uploads/efficient-shadows-from-many-lights.pdf

Not VSM, but close enough in concept:
http://lukaskalbertodt.github.io/2023/11/18/tiled-soft-shadow-volumes.html

Very first WIP

[image]
[image]

Some page data to illustrate:
[image]

Initial implementation

Decided to start with common parameters for now:

  • sunlight only
  • 4k x 4k page atlas
  • 128x128 page size: up to 1024 pages
  • Clipmaps + page table 64x64x32
  • Meshlets are duplicated once for every page they overlap
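The arithmetic behind these parameters can be sketched as follows. This is a minimal C++ illustration, not the engine's code: all constant and function names are hypothetical, and the page-table layout (64x64 virtual pages per clipmap level, 32 levels, stored level-major) is an assumption about what "64x64x32" means.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants matching the parameters listed above.
constexpr uint32_t kAtlasSize   = 4096;                        // 4k x 4k physical atlas, in texels
constexpr uint32_t kPageSize    = 128;                         // 128 x 128 texels per page
constexpr uint32_t kPagesPerRow = kAtlasSize / kPageSize;      // 32
constexpr uint32_t kMaxPages    = kPagesPerRow * kPagesPerRow; // 1024 physical pages

// Assumed page-table layout: 64 x 64 virtual pages per clipmap level, 32 levels.
constexpr uint32_t kTableSize = 64;
constexpr uint32_t kLevels    = 32;

struct Texel { uint32_t x, y; };

// Linear index of a virtual page (level-major, then row-major).
constexpr uint32_t pageTableIndex(uint32_t level, uint32_t px, uint32_t py) {
  return (level * kTableSize + py) * kTableSize + px;
}

// Top-left texel of a physical page inside the 4k atlas.
constexpr Texel physicalPageOrigin(uint32_t physPageId) {
  return { (physPageId % kPagesPerRow) * kPageSize,
           (physPageId / kPagesPerRow) * kPageSize };
}

// How many copies of a meshlet get drawn: one per virtual page its
// light-space bounds overlap. Bounds are inclusive texel coordinates
// within a single clipmap level.
constexpr uint32_t overlappedPageCount(Texel lo, Texel hi) {
  return (hi.x / kPageSize - lo.x / kPageSize + 1) *
         (hi.y / kPageSize - lo.y / kPageSize + 1);
}

// Last entry of the table is 64 * 64 * 32 - 1 under the assumed layout.
static_assert(pageTableIndex(kLevels - 1, kTableSize - 1, kTableSize - 1) ==
                  kTableSize * kTableSize * kLevels - 1,
              "page table indexing covers 64x64x32 entries");
```

For example, a meshlet whose bounds span texels 100..300 horizontally inside one row of pages overlaps three pages and is therefore drawn three times.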

Current considerations:

  1. Cluster culling: culling all clusters against all pages is expensive
    1.1 Coarse culling is not possible: we need to output the exact pageId for each visible meshlet
    1.2 Output size is not deterministic, and there is no good way to react to running out of memory
  2. Culling against the clipmap (HiZ-like) is easier than against each page individually
    2.1 We won't be able to use the HW rasterizer to output data (would need image atomics + image-less rendering)
    2.2 Image-less rendering is limited by maxViewportDimensions = 4k
  3. Software renderer?!
    3.1 An immediate one still requires atomics, and won't be better than a render-pass-based one
    3.2 A tile-based one could be an interesting take, but isn't viable without bindless
  4. Some complementary solution is required to work with volumetrics
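Points 1.1 and 1.2 can be illustrated with a CPU-side sketch of a per-meshlet culling step. It is hypothetical: the `requested` flag array, `DrawList`, and the fixed capacity are assumptions, not the engine's data structures. Each visible meshlet emits one (meshletId, pageId) pair per requested page it overlaps, into a buffer sized up front; the atomic counter keeps growing past capacity, so overflow can at least be detected, even though the dropped draws are lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kTable = 64; // virtual pages per side of one clipmap level (assumed)

struct PageRect { uint32_t x0, y0, x1, y1; }; // inclusive page coordinates
struct DrawCmd  { uint32_t meshletId, pageId; };

// Fixed-capacity output, like a GPU buffer allocated before culling runs.
struct DrawList {
  std::vector<DrawCmd>  cmds;
  std::atomic<uint32_t> count{0}; // may exceed capacity: that signals overflow

  explicit DrawList(uint32_t capacity) : cmds(capacity) {}

  bool push(DrawCmd c) {
    uint32_t i = count.fetch_add(1, std::memory_order_relaxed);
    if (i >= cmds.size())
      return false; // out of memory: this draw is dropped
    cmds[i] = c;
    return true;
  }
  uint32_t written() const {
    uint32_t n = count.load();
    return n < cmds.size() ? n : static_cast<uint32_t>(cmds.size());
  }
  bool overflowed() const { return count.load() > cmds.size(); }
};

// Emit the exact pageId for every requested page the meshlet overlaps.
void cullMeshlet(uint32_t meshletId, PageRect r,
                 const std::vector<uint8_t>& requested, DrawList& out) {
  for (uint32_t y = r.y0; y <= r.y1; ++y)
    for (uint32_t x = r.x0; x <= r.x1; ++x) {
      uint32_t pageId = y * kTable + x;
      if (requested[pageId])
        out.push({meshletId, pageId});
    }
}
```

The output size depends on how many pages each meshlet touches, which is only known after culling; that is exactly why the capacity has to be guessed in advance.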
Try added a commit that referenced this issue Sep 5, 2024

YALdysse commented Sep 6, 2024

@Try, does this mean OpenGothic will work on weaker graphics cards?

Owner Author

Try commented Sep 6, 2024

@YALdysse, when you say 'weaker' graphics cards, weaker than what? :)


YALdysse commented Sep 6, 2024


OpenGothic uses about 95% of my graphics card on average: AMD Radeon RX Vega 7 (CPU: AMD Ryzen 5 5500U).


YALdysse commented Sep 6, 2024


Unfortunately, it's not a stable 60 FPS, but it's playable.

Owner Author

Try commented Sep 6, 2024

OK, I'd rather not set any expectations, as virtual shadows are pretty much experimental tech.

AMD Radeon RX Vega 7

With Vega there are 2 major issues:

  • It's way too different from any other GPU, and it's hard to reason about what is going on inside this black box
  • It has no mesh shaders, so the engine has to fall back to non-indexed draws, which are about 3x more expensive
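As a rough illustration of what a non-indexed fallback implies (a generic sketch, not OpenGothic's code; `Vertex` and `deindex` are hypothetical names): without an index buffer every triangle corner carries its own copy of the vertex, so shared vertices are fetched and shaded once per use instead of once per unique vertex. The actual penalty depends on the mesh; the ~3x above is the author's figure for this engine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// Expand an indexed triangle list into a non-indexed one.
std::vector<Vertex> deindex(const std::vector<Vertex>& verts,
                            const std::vector<uint32_t>& indices) {
  std::vector<Vertex> out;
  out.reserve(indices.size());
  for (uint32_t i : indices)
    out.push_back(verts[i]); // each corner gets its own copy of the vertex
  return out;
}
```

A quad with 4 unique vertices and 6 indices becomes 6 vertices, each of which the vertex stage processes separately.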

Owner Author

Try commented Sep 9, 2024

Some numbers [RTX 3070]:

Projective shadowmap (current solution):

  • 7836 meshlets in the test scene (see the opening post)
  • 0.16ms

Virtual shadow map:

  • 408 pages
  • 43392 meshlets (shortened shadow range; meshlets are duplicated for each page)
  • Meshlets in one page: 111 = 27 for the paladin with gear + landscape
  • Render time of a single page: 0.12ms
    • most of the rendering time is PROP (pre-rasterization work in HW)

Numbers for one of the smaller pages, close to the camera:
[image]

Try added a commit that referenced this issue Sep 10, 2024
Try closed this as completed in c8c7ddc Sep 10, 2024
Try added a commit that referenced this issue Sep 10, 2024
Try reopened this Sep 10, 2024
Owner Author

Try commented Sep 10, 2024

Clip distances help with the fragment workload, bringing FPS to an adequate level (still slower than the regular SM)

[image]
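One way clip distances can cut fragment work, sketched under assumptions (the engine's shader isn't shown in this thread): if a page is drawn with a projection whose NDC range is larger than the page, the vertex shader can write four `gl_ClipDistance` values that are non-negative inside the page rectangle, and the hardware clips triangles to that rectangle before rasterization, so no fragments are generated outside the page. The C++ below only evaluates those four values; `PageNdc` and `pageClipDistances` are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Vec4    { float x, y, z, w; };              // clip-space position
struct PageNdc { float minX, minY, maxX, maxY; };  // page bounds in NDC (assumed)

// Values a vertex shader could write to gl_ClipDistance[0..3]: each is >= 0
// inside the page and < 0 outside. Multiplying the bounds by w keeps the
// test valid before the perspective divide (w == 1 for an orthographic sun).
std::array<float, 4> pageClipDistances(const Vec4& clip, const PageNdc& p) {
  return { clip.x - p.minX * clip.w,
           p.maxX * clip.w - clip.x,
           clip.y - p.minY * clip.w,
           p.maxY * clip.w - clip.y };
}
```

The design choice is to let fixed-function clipping discard the off-page part of each triangle instead of rasterizing it and rejecting fragments later.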

Owner Author

Try commented Sep 10, 2024

Some rendering examples:
[image]
[image]
[image]

Owner Author

Try commented Sep 10, 2024

City:
[image]

Ship:
[image]

Try added a commit that referenced this issue Sep 14, 2024
Try added a commit that referenced this issue Sep 15, 2024
Owner Author

Try commented Sep 15, 2024

Even with large pages, the numbers don't look good:

vsm.header.pageCount	604
vsm.header.meshletCount	91656 // total to draw for the VSM
vsm.header.counterM	10787 // total number of unique meshlets for the VSM, so the duplication factor is ~8.5x
vsm.header.counterV	 8733 // meshlets drawn by the good old shadowmap

non-empty shadow mips:  8
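The ratios implied by these counters can be recomputed directly. A small sketch; the field names follow the log above, but the `VsmHeader` struct and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Field names follow the logged counters; the struct itself is hypothetical.
struct VsmHeader {
  double pageCount, meshletCount, counterM, counterV;
};

// Draws per unique meshlet: how often a meshlet is duplicated across pages.
double duplicationFactor(const VsmHeader& h) { return h.meshletCount / h.counterM; }

// VSM draws relative to the projective shadowmap's draws.
double costVsProjective(const VsmHeader& h)  { return h.meshletCount / h.counterV; }

// Average number of meshlets drawn per allocated page.
double meshletsPerPage(const VsmHeader& h)   { return h.meshletCount / h.pageCount; }
```

With the numbers above (604, 91656, 10787, 8733) this gives about 8.5 draws per unique meshlet, about 10.5x the projective shadowmap's meshlet count, and roughly 152 meshlets per page.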

Testing area:
[image]

Unrelated, just cool screenshots:
[image]
[image]
[image]

Owner Author

Try commented Sep 15, 2024

Testing is now enabled via the command line: -vsm 1

Try added a commit that referenced this issue Sep 18, 2024
Try added a commit that referenced this issue Sep 18, 2024
Try added a commit that referenced this issue Sep 21, 2024
Try added a commit that referenced this issue Sep 22, 2024
Try added a commit that referenced this issue Sep 24, 2024
Try added a commit that referenced this issue Sep 26, 2024
Try added a commit that referenced this issue Sep 30, 2024
Try added a commit that referenced this issue Oct 1, 2024
Owner Author

Try commented Oct 1, 2024

some data on culling:

// baseline
vsm.header.pageCount	546
vsm.header.meshletCount	38316
vsm.header.counterM	13753
vsm.header.counterV	8733

// cull dummy tiles in larger page
vsm.header.pageCount	560
vsm.header.meshletCount	34514
vsm.header.counterM	0
vsm.header.counterV	8733

// hiz
vsm.header.pageCount	558
vsm.header.meshletCount	29547
vsm.header.counterM	0
vsm.header.counterV	8733

About a 23% reduction in meshlet count. Rendering now takes about 1ms.

vsm.header.counterV 8733

This is the number of meshlets for the projective shadowmap, so we are still drawing roughly 3.4x as many meshlets as the projective SM.
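The contribution of each culling stage can be checked from the three logged runs. A small sketch; the meshlet counts are copied from the log, while `Stage`, `kStages`, and `reduction` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Stage { const char* name; double meshletCount; };

// vsm.header.meshletCount for the three logged runs; counterV stayed at 8733.
constexpr Stage kStages[] = {
  {"baseline",                        38316},
  {"cull dummy tiles in larger page", 34514},
  {"hiz",                             29547},
};
constexpr double kProjectiveMeshlets = 8733; // counterV

// Fraction of meshlet draws removed when going from `before` to `after`.
constexpr double reduction(double before, double after) {
  return (before - after) / before;
}
```

Dummy-tile culling removes about 9.9% of the baseline draws, HiZ removes a further 14.4% of what remains, about 22.9% in total; the final count is about 3.38x the projective shadowmap's.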

Try added a commit that referenced this issue Oct 1, 2024
Owner Author

Try commented Oct 2, 2024

fog + vsm doesn't quite work (non-resident pages):
[image]
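One common mitigation for sampling non-resident pages is to walk toward coarser clipmap levels until a resident page is found. This is named here as a general alternative, not necessarily what was done in this engine. A minimal sketch, assuming a camera-centred clipmap where level L+1 covers twice the extent of level L, a 64x64x32 page table stored level-major, and sample positions given in level-0 page units relative to the clipmap centre; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kTable       = 64;
constexpr uint32_t kLevels      = 32;
constexpr uint32_t kNotResident = 0xFFFFFFFFu;

// Virtual page -> physical page id, or kNotResident.
struct PageTable {
  std::vector<uint32_t> entries =
      std::vector<uint32_t>(kTable * kTable * kLevels, kNotResident);

  uint32_t& at(uint32_t level, uint32_t x, uint32_t y) {
    return entries[(level * kTable + y) * kTable + x];
  }
  uint32_t at(uint32_t level, uint32_t x, uint32_t y) const {
    return entries[(level * kTable + y) * kTable + x];
  }
};

struct Lookup { uint32_t level, physPage; };

// (ux, uy): sample position relative to the clipmap centre, in level-0 page units.
std::optional<Lookup> lookupWithFallback(const PageTable& t, float ux, float uy,
                                         uint32_t startLevel) {
  for (uint32_t level = startLevel; level < kLevels; ++level) {
    float scale = std::ldexp(1.0f, -static_cast<int>(level)); // 1 / 2^level
    float px = ux * scale + kTable / 2;
    float py = uy * scale + kTable / 2;
    if (px < 0 || py < 0 || px >= kTable || py >= kTable)
      continue; // outside this level's coverage
    uint32_t id = t.at(level, static_cast<uint32_t>(px), static_cast<uint32_t>(py));
    if (id != kNotResident)
      return Lookup{level, id}; // coarser than requested, but resident
  }
  return std::nullopt; // nothing resident: the caller must pick a default
}
```

The trade-off is resolution for coverage: fog samples far from any visible surface get a blurrier shadow instead of garbage.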

Try added a commit that referenced this issue Oct 2, 2024
Owner Author

Try commented Oct 4, 2024

experimenting with vsm-fog:
[image]

Try added a commit that referenced this issue Oct 4, 2024
Try added a commit that referenced this issue Oct 5, 2024
Try added a commit that referenced this issue Oct 6, 2024
Try added a commit that referenced this issue Oct 8, 2024
Try pinned this issue Oct 10, 2024
Try added a commit that referenced this issue Oct 10, 2024
Try added a commit that referenced this issue Oct 13, 2024
Try added a commit that referenced this issue Oct 15, 2024
Owner Author

Try commented Oct 15, 2024

A case study of the draw amount in one frame.
(note: I prefer to measure the number of meshlets rather than draw time, as it doesn't depend on GPU model, temperature, etc.)

projectiveSM: 8.7k
virtualSM: 20.6k

Now the breakdown per clipmap level:

mip   all  land   obj
  0  1041   590   455
  1  1516   784   732
  2  1263   603   660
  -
  3  2152   996  1156
  4  2516  1238  1282
  5  4731  1016  3715
  6  3900   865  3035
  -
  7  3148   740  2408
  8   156    82    78


Fog adds roughly another 5k to mip 5 (and a bit to the others)
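For context on what a clipmap level is here: in a camera-centred clipmap each level covers twice the extent of the previous one, and a shadow receiver uses the finest level that still contains it. A sketch of that usual distance-based selection; the level-0 extent and the 32-level count (read from the 64x64x32 page table) are assumptions, and OpenGothic may select levels differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr float kBaseExtent = 32.f; // world units covered by level 0 (assumed)
constexpr int   kLevels     = 32;   // clipmap levels (assumed from the page table)

// Smallest level L whose extent kBaseExtent * 2^L still covers `distance`.
// Each step up doubles the covered extent and halves texel density.
int clipmapLevel(float distance) {
  float ratio = std::max(distance, 1e-6f) / kBaseExtent;
  int level = static_cast<int>(std::ceil(std::log2(ratio)));
  return std::clamp(level, 0, kLevels - 1);
}
```

Because every level has the same number of pages but covers a larger area, far levels gather many more objects per page, which matches how the "obj" column grows toward mips 5 to 7.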

Try added a commit that referenced this issue Oct 20, 2024
Try added a commit that referenced this issue Nov 4, 2024
Owner Author

Try commented Nov 4, 2024

vsm: epipolar fog in progress

Fog is back to reasonable timings: 1.51ms -> 0.57ms. However, this comes at the cost of some flicker that still needs to be worked on.
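Epipolar sampling, in its generic form (the thread doesn't show how the engine's variant is organised), places scattering samples along screen-space lines that radiate from the sun's projected position, computes in-scattering only at those samples, and interpolates the result to pixels. Light shafts run along these lines, so relatively few samples are needed. The sketch below only lays out the lines, assuming the sun projects inside the screen; `epipolarLineEnds` is a hypothetical name, and the off-screen-sun case, which needs extra clipping, is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Vec2 { float x, y; };

// Distance along direction d (from origin o) to the screen edge on one axis.
static float distToEdge(float o, float d) {
  if (d > 0) return ( 1.f - o) / d;
  if (d < 0) return (-1.f - o) / d;
  return std::numeric_limits<float>::infinity(); // parallel to this pair of edges
}

// Distribute `n` epipolar lines radially around the sun's screen position
// (NDC, assumed inside [-1,1]^2) and return where each line leaves the screen.
std::vector<Vec2> epipolarLineEnds(Vec2 sun, int n) {
  const float kTwoPi = 6.28318530718f;
  std::vector<Vec2> ends;
  ends.reserve(n);
  for (int i = 0; i < n; ++i) {
    float a = kTwoPi * static_cast<float>(i) / static_cast<float>(n);
    Vec2 d{std::cos(a), std::sin(a)};
    float t = std::min(distToEdge(sun.x, d.x), distToEdge(sun.y, d.y));
    ends.push_back({sun.x + d.x * t, sun.y + d.y * t});
  }
  return ends;
}
```

Samples would then be placed between `sun` and each end point; pixels between lines interpolate from the two nearest lines, which is also where undersampling and flicker can creep in.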

Try mentioned this issue Nov 10, 2024
Owner Author

Try commented Nov 10, 2024

L'Hiver: nice detail on individual rocks, ~30% GPU load, similar to the vanilla game.

[image]

Try added a commit that referenced this issue Nov 10, 2024
* split page listing and page alocation

#681

* fixup

* 8k vsm pool

* build

* new fog pages

#681

* fixup vms cluster culling

#681

* some tuning for vsm fog

#681

* Update vsm_cluster_task.comp

* epipolar fog initial

* vsm: epipolar fog in progress

#681

* fixup

* fixup

* fixup

* epipolar fog

* CI

* CI
Owner Author

Try commented Nov 25, 2024

Some progress on volumetric fog. I wouldn't call it solved yet, but it's pretty good already:
[image]

[image]

Undersampling artifacts on light shafts far from the camera:
[image]

Try added a commit that referenced this issue Nov 25, 2024
Owner Author

Try commented Dec 5, 2024

Omnidirectional light shadowing in progress
[image]
[image]
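A common way to shadow omnidirectional lights is to split the sphere around the light into six 90° frusta, one per cube face, and pick the face from the dominant axis of the light-to-point vector. The thread doesn't say how the VSM handles point lights, so treat this as an assumption; the sketch uses the Vulkan/OpenGL cube-face order (+X, -X, +Y, -Y, +Z, -Z), and `cubeFace` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Face index for direction (x, y, z) from the light to the shaded point:
// 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z. Ties go to X, then Y (a convention choice).
int cubeFace(float x, float y, float z) {
  float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
  if (ax >= ay && ax >= az) return x >= 0 ? 0 : 1;
  if (ay >= az)             return y >= 0 ? 2 : 3;
  return z >= 0 ? 4 : 5;
}
```

Each face can then be treated like a separate projective light with its own virtual pages.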

Try mentioned this issue Dec 12, 2024
Labels: None yet
Projects: None yet
Development: No branches or pull requests
2 participants