Threaded planet computation (and a few more speedups) #3847

gzotti · 2024-08-09T22:31:27Z

Description

This is a revamped version of #3794, which was too experimental and is hereby superseded. (The trial of using std::transform_reduce() proved useless.)

All Solar System object (SSO) computations have so far run on the main thread. This branch allows splitting the SSO position loop across several threads. Likewise, the far-to-near sorting can be split. This is useful when there are thousands of objects. Given the complexity of the drawing commands, there is not much use for more than 4 additional worker threads even for 25,000 objects. The number of additional threads can be configured in the GUI, or the extra threads can be disabled entirely.

Further, using the Intel VTune profiler, I have identified a series of sometimes surprising inefficiencies, which are addressed in the later commits of this PR.

With 27,701 SSOs and some default settings (Preetham atmosphere, a few gridlines, stars, landscape), this improves the frame rate from about 6 fps (V24.2) to about 18 fps.

A new feature made possible by this is a display of symbolic markers for minor bodies. These are plotted regardless of the accuracy of the orbital elements, and just provide an impression of the distribution of the minor bodies. Color coding is applied per pType. The colors are currently hardcoded. However, drawing that many objects is slow, despite added code similar to StarMgr's for mass drawing. I hope another developer can optimize StelPainter so that it can be re-used instead of frequently allocating and deleting buffers.

Fixes #1427 (not intended, just happened...)

Screenshots (if appropriate):

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
This change requires a documentation update
Housekeeping

How Has This Been Tested?

Load many SSOs with the SSEditor: at least the first 10,000 asteroids and the 1000comets.ini file. Then find the new options in the View/SSO and Config/Tools tabs.

Test Configuration:

Operating system: Win11 home on i7-12700H
Graphics Card: Geforce RTX 3070 Ti Laptop GPU

Will test on smaller notebook, older Win10 PC and RasPi4.

Checklist:

My code follows the code style of this project.
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (header file)
I have updated the respective chapter in the Stellarium User Guide
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

github-actions · 2024-08-09T22:32:04Z

Great PR! Please pay attention to the following items before merging:

Files matching src/**/*.cpp:

Are possibly unused includes removed?

Files matching guide/**:

Did you remember to update screenshots to match new updates?
Did you remember to grammar check in changed part of documentation?

This is an automatically generated QA checklist based on modified files.

gzotti · 2024-08-10T23:04:13Z

Test on a small notebook: 30 fps (V24.2) --> 45 fps. Quite a gain!
Test on RasPi4: Still only a black window, like the last time I tried (i.e. the black screen is unrelated to this branch). I can save screenshots, though. Sorry, I have no time to follow up on this.

alex-w · 2024-08-14T08:27:42Z

It's unfortunately not just the long ssystem_minor.ini. Does master run on your RPi4 with Ubuntu 24.4? I only see the black screen reported earlier. Activating wayland in raspi-config makes it only worse, Stellarium fails to launch. I have not tried RaspberryOS lately, though.

Hmm... please run via CLI:
MESA_GL_VERSION_OVERRIDE=3.0 MESA_GLSL_VERSION_OVERRIDE=130 QT_QPA_PLATFORM=xcb ./stellarium --opengl-compat

gzotti · 2024-08-14T08:36:43Z

Yes, right, I have just found that again this morning, and it works.
Compiling however had shown again the occurrence of "please build with -fPIC". I don't know where to configure this. CXX_OPTS? It is required for ShowMySky, which I have disabled for this test now. @10110111 ?
Also, on Linux/GCC there are still several compiler warnings which MSVC does not show. Do any of you care about those?

10110111 · 2024-08-14T09:53:48Z

Compiling however had shown again the occurrence of "please build with -fPIC". I don't know where to configure this. CXX_OPTS?

Ideally, Qt should provide this via their CMake config. I'm not sure what I'm doing wrong.

alex-w · 2024-08-15T04:19:12Z

RaspberryOS 64bit (10k+ solar system bodies, Qt6-based builds):
[master] Solar System Observer ~ 1.2 FPS
[this branch] Solar System Observer ~ 4.2-6.8 FPS (1 extra thread)

10110111 · 2024-08-18T21:10:25Z

Do the markers enabled by the new checkbox represent all the objects added as you described here? I can't reproduce any serious reduction of performance with 0 extra threads (Core i5-8265U with built-in GPU), and for some reason I only get the green markers but not the white ones closer to the center. What do these white markers represent?

gzotti · 2024-08-18T23:37:32Z

colorMap={
	{isAsteroid,     Vec3f(0.35, 0.35, 0.35)},  // gray
	{isPlutino,      Vec3f(1   , 1   , 0   )},  // yellow
	{isComet,        Vec3f(0.25, 0.75, 1   )},  // light blue
	{isDwarfPlanet,  Vec3f(1   , 1   , 1   )},  // pure white
	{isCubewano,     Vec3f(1   , 0   , 0.8 )},  // magenta
	{isSDO,          Vec3f(0.5 , 1   , 0.5 )},  // light green
	{isOCO,          Vec3f(0.75, 0.75, 1   )},  // pale blue
	{isSednoid,      Vec3f(0.75, 1   , 0.75)},  // pale green
	{isInterstellar, Vec3f(1   , 0.25, 0.25)},  // light red
	{isUNDEFINED,    Vec3f(1   , 0   , 0   )}}; // red

The thin white ones are the overwhelming number of "ordinary" asteroids, to be expected mostly in the main belt between Mars and Jupiter, and the Jupiter Trojans. Pure white is for the few dwarf planets. SDOs and Sednoids are green. Someone might elaborate on the actual differences, and propose better colors.

Did you load 15,000 objects or more to put a little stress on the SSO computation? (The logfile will likely show many complaints, but will give the number of loaded objects.) Maybe your CPU is very fast on 1 core, so the effect would cancel out?

10110111 · 2024-08-22T07:57:31Z

Did you load 15.000 objects or more to put a little stress to the SSO computation?

I followed your list of things to add, and now I get this in the log:

Loaded 6879 Solar System bodies from  "/home/ruslan/.stellarium/data/ssystem_minor.ini"
Solar System now has 6962 entries.

So apparently, I've missed something. How can I check what's missing?

gzotti · 2024-08-22T09:02:03Z

My final solar system had the first 10,000, the second 10,000, plus the "interesting" objects, in total about 27,000 objects. Indeed the loading was terribly slow, and during this time the program seemed unresponsive, but it had finished an hour later. (Maybe earlier, but I did something else. This would be a place for QFuture and progress bar stuff, but it is a rare call after all.)

Much of the overall speed gain in this (renewed) branch was found from studying profiling diagrams. I am new to this field, but was able to track down and remove a few expensive operations. As a general pattern, if you need GETSTELMODULE in a frequently run method (update, draw, ...), make it static. If you often need to retrieve strings and there is a chance they have not changed, cache them if possible. These fixes helped a lot.
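A minimal sketch of the "make it static" pattern, in plain C++ so it stands alone. The `Registry` class, its `find()` method, and the two `draw...()` functions are hypothetical stand-ins for a name-based module lookup and a frequently called method; they are not Stellarium's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical module type; stands in for a real module class.
struct Module { int value = 0; };

// Hypothetical registry with a name-based lookup. It counts lookups so the
// effect of caching can be observed.
class Registry {
public:
    static Registry& instance() { static Registry r; return r; }
    Module* find(const std::string& name) { ++lookups; return &modules[name]; }
    int lookups = 0;
private:
    std::map<std::string, Module> modules; // std::map keeps element addresses stable
};

// Looks the module up by name on every call (think: every frame).
int drawUncached()
{
    return Registry::instance().find("SolarSystem")->value;
}

// Looks the module up once; the function-local static keeps the pointer
// for all later calls.
int drawCached()
{
    static Module* const module = Registry::instance().find("SolarSystem");
    assert(module != nullptr);
    return module->value;
}
```

Calling `drawUncached()` three times performs three lookups, while any number of `drawCached()` calls performs only one. The cached pointer is only safe if the module outlives all callers and is never replaced.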

Comparing profile runs with 0 and with 4 extra threads, it indeed seems that much of the gain from running extra threads is lost to synchronization waits, which only appear at all when those threads are running. If somebody has an idea what to do here, it would clearly help. Still, I consistently see an improved frame rate with just a few extra threads.

However, I now think (also after talking to colleagues) that the frequent alloc/free in the many short-lived StelPainter instances may be what currently slows it down most when the object count is large. If we want a fluid SolarSystem with many objects, we should do something about the actual drawing: re-use StelPainter's internal buffers or whatever. Planet::draw() may need to be rearranged. Maybe using a "modern" (float) Z buffer can be an option; this would also fix the orbit overdrawing.
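To illustrate the buffer re-use idea in isolation, here is a small sketch in plain C++. `MarkerBatch` and `fillFrame()` are hypothetical names invented for this example; this is not StelPainter's interface, only the general pattern of keeping one long-lived vertex buffer instead of allocating a new one for every draw.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch of 2D marker vertices that lives across frames.
class MarkerBatch {
public:
    // Start a new frame: clear() drops the elements but keeps the allocation.
    void beginFrame() { vertices.clear(); }

    // Append one marker position (two floats per marker).
    void addMarker(float x, float y)
    {
        vertices.push_back(x);
        vertices.push_back(y);
    }

    std::size_t markerCount() const { return vertices.size() / 2; }
    std::size_t capacity() const { return vertices.capacity(); }
    const float* data() const { return vertices.data(); }

private:
    std::vector<float> vertices;
};

// Fills the batch with `count` dummy markers for one frame.
void fillFrame(MarkerBatch& batch, std::size_t count)
{
    batch.beginFrame();
    for (std::size_t i = 0; i < count; ++i)
        batch.addMarker(static_cast<float>(i), 0.0f);
    assert(batch.markerCount() == count);
}
```

After the first frame has grown the buffer, later frames with the same or a smaller marker count cause no further heap allocation, because `std::vector::clear()` leaves the capacity unchanged.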

10110111 · 2024-08-22T09:27:18Z

OK, I've redone the import of the 10,000 objects. Now it's saved and loads on subsequent runs too, yielding 16860 Solar System objects. My observations with different numbers of threads are as follows:

| Thread count | Frame rate (FPS) | CPU usage (%) |
|---|---|---|
| 0 | 5.3 | 100 |
| 1 | 5.6 | 110 |
| 2 | 5.7 | 113 |
| 3 | 5.9 | 116 |
| 7 | 5.8 | 115 |

So, something seems to be not quite working as expected. This is on Ubuntu 20.04 amd64, 4-core*HT = 8 hyperthreads Intel Core i5-8265U 1.60GHz, location set to the Solar System Observer, markers enabled, atmosphere and landscape disabled, orbits constantly displayed as in the screenshot in the OP.

gzotti · 2024-08-22T10:57:46Z

Hm, the low framerate replicates mine. You see the 10-15% gain with a few threads. Not sure why 7 is shown slightly worse than 3, but it shows that SSO computation is just a tiny fraction of the per-frame operations.

Here are the "flame graphs" from VTune made last week in a terrestrial scene without landscape, atmosphere or gridlines. Capture time was about 20 seconds. (Sorry for my shaky lines, mouse drawing is not my profession. I discovered the arrow tool only later...)

0 extra threads:

This shows the time spent in various methods stacked on top of the calling functions.
Most of the QOpen[GL*] functions shown to be called in the StelPainter sections are constructing/destructing the temporary structures.
Solar system positions make up about 20% of frame time. About 5% is the call to sort the planets by distance before drawing. A similar amount of time is spent in MinorPlanet::getEnglishName, but I think this one cannot be avoided. The time for computing magnitudes and extincted magnitudes has been largely reduced, and these are no longer labelled. Some frame sync (?) function is visible on the right.

4 extra threads:

The planet position computation fraction has apparently shortened, but some ugly new WaitForSingleObjectEx is visible in the lower left, which steals most of the gained time. I cannot say if this is caused by the necessary Mutex that was introduced, or whether there is some other unavoidable reason for this time loss. The Mutex is IIRC now only required in the JPL file reading functions; I have to run that again with VSOP. If it's the Mutex effect, it takes a disproportionate fraction of time, occurring only for 8 planets as opposed to ~15,000 KeplerOrbit runs. If you have other ideas what can cause this, I am willing to learn!

10110111 · 2024-08-22T13:26:52Z

Interesting. Now I've turned off everything: atmosphere, landscape, Milky Way, stars, planets, cardinal direction markers, and here's the result:

| Thread count | Frame rate (FPS) | CPU usage (%) |
|---|---|---|
| 0 | 20.5 | 100 |
| 1 | 30.1 | 165 |
| 2 | 34 | 204 |
| 3 | 37.7 | 234 |
| 4 | 33 | 207 |
| 5 | 34.5 | 216 |
| 6 | 35.6 | 225 |
| 7 | 33.5 | 216 |

This looks suspicious. I have thousands of objects that are recomputed on every frame, and still I don't get even 300% CPU usage on an 8-hyperthread CPU. Something seems not to be parallelized correctly. In particular, if all the objects are in the Solar System, I guess there should be some way to parallelize this much better. I'll try looking later into what's going on and maybe how to improve this.

Use config.ini:[devel]/compute_positions_algorithm (default: 2):
0 = old single-threaded 3-loop solution
1 = first QtConcurrent::blockingMap() 3-loop solution
2 = strided manual multithreading 3-loop solution
3 = blocked manual multithreading 1-loop solution
Solution 3 would be fastest, but has problems for moons, whose position depends on the parent planet's position, which may not have been computed yet.
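The difference between "strided" and "blocked" work assignment can be sketched in plain C++ with std::thread. This is only an illustration under stated assumptions: `Body`, `computePosition()`, `runOnThreads()`, `computeStrided()` and `computeBlocked()` are hypothetical names, the per-object work is a dummy, and the sketch ignores the moon/parent dependency mentioned above (every object here is independent).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Solar System object (not Stellarium's Planet class).
struct Body { double position = 0.0; };

// Hypothetical per-object work standing in for the real position computation.
inline void computePosition(Body& body, double jde) { body.position = 2.0 * jde; }

// Runs work(t) for t = 0..extraThreads: t = 0 on the calling thread, the rest
// on freshly started worker threads that are joined before returning.
template <typename Work>
void runOnThreads(unsigned extraThreads, Work work)
{
    std::vector<std::thread> workers;
    for (unsigned t = 1; t <= extraThreads; ++t)
        workers.emplace_back(work, static_cast<std::size_t>(t));
    work(std::size_t{0});
    for (std::thread& w : workers)
        w.join();
}

// Strided: thread t handles indices t, t+n, t+2n, ...
void computeStrided(std::vector<Body>& bodies, double jde, unsigned extraThreads)
{
    const std::size_t n = extraThreads + 1u; // the calling thread works too
    runOnThreads(extraThreads, [&bodies, jde, n](std::size_t t) {
        for (std::size_t i = t; i < bodies.size(); i += n)
            computePosition(bodies[i], jde);
    });
}

// Blocked: thread t handles one contiguous chunk of the index range.
void computeBlocked(std::vector<Body>& bodies, double jde, unsigned extraThreads)
{
    const std::size_t n = extraThreads + 1u;
    const std::size_t chunk = (bodies.size() + n - 1) / n; // ceiling division
    assert(chunk * n >= bodies.size());                    // chunks cover everything
    runOnThreads(extraThreads, [&bodies, jde, chunk](std::size_t t) {
        const std::size_t begin = std::min(t * chunk, bodies.size());
        const std::size_t end = std::min(begin + chunk, bodies.size());
        for (std::size_t i = begin; i < end; ++i)
            computePosition(bodies[i], jde);
    });
}
```

In both variants each thread writes only to its own elements, so no locking is needed for the vector itself. Any state shared inside the per-object computation would still need its own synchronization.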

gzotti · 2024-09-15T14:29:07Z

No idea why, but this rebase -i was difficult. Commit "fix crash" should have gone just after @10110111's last commit, but it wouldn't go. We can leave it as-is as documentation that there was a problem.

alex-w

Great work!

Earth axis should be fixed...

10110111 · 2024-09-15T16:42:54Z

Maybe this shouldn't go into 24.3 so late with this potential for breakages.

alex-w · 2024-09-15T16:51:49Z

Maybe this shouldn't go into 24.3 so late with this potential for breakages.

I fear this patch should be postponed to version 24.4

gzotti · 2024-09-15T20:02:37Z

I had not remembered that Ruslan had integrated the final call into his loop. Fixed that in the conservative paths. Now I can run RemoteSync'ed V24.2 and this. 135 vs 73 fps. As far as I can see, results with setting "2" are identical.

github-actions · 2024-09-16T14:19:43Z

Hello @gzotti!

Please check the fresh version (development snapshot) of Stellarium:
https://github.com/Stellarium/stellarium-data/releases/tag/weekly-snapshot

github-actions · 2024-09-22T14:46:58Z

Hello @gzotti!

Please check the latest stable version of Stellarium:
https://github.com/Stellarium/stellarium/releases/latest

gzotti added enhancement Improve existing functionality feature Entirely new feature importance: medium A bit annoying, minor miscalculation, but no crash labels Aug 9, 2024

gzotti added this to the 24.3 milestone Aug 9, 2024

gzotti self-assigned this Aug 9, 2024

github-actions bot requested review from 10110111 and alex-w August 9, 2024 22:31

gzotti mentioned this pull request Aug 9, 2024

Parallelize ephemeris computation. #3794

Closed

14 tasks

10110111 reviewed Aug 10, 2024

src/core/planetsephems/jpleph.cpp (outdated, resolved)

10110111 reviewed Aug 11, 2024

src/gui/StelGuiItems.cpp (outdated, resolved)

gzotti added 7 commits September 15, 2024 16:20

Reactivate blockwise manual thread assignment

a4dcab4

- also clean up docs - also improve setting thread number - also reactivate parallel sort

Improve marker mag threshold setting

248ba99

Fix missing old-fashioned storing of new settings

5700684

Rename button and action for minor body markers

93a011e

SUG: update details, new GUI figures

b69962f

Fix crash

edf0407

Thanks to Ruslan!

gzotti force-pushed the threaded-ephems branch from ffe6252 to 9c8a757 Compare September 15, 2024 14:23

gzotti marked this pull request as ready for review September 15, 2024 14:29

github-actions bot requested review from 10110111 and alex-w September 15, 2024 14:29

alex-w previously approved these changes Sep 15, 2024

alex-w self-requested a review September 15, 2024 15:08

Restore transformation matrix application in the conservative solutions

320b184

alex-w approved these changes Sep 16, 2024

gzotti merged commit 2411237 into master Sep 16, 2024
27 of 29 checks passed

gzotti deleted the threaded-ephems branch September 16, 2024 10:09

alex-w added the state: published The fix has been published for testing in weekly binary package label Sep 16, 2024

alex-w removed the state: published The fix has been published for testing in weekly binary package label Sep 22, 2024
