Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threaded planet computation (and a few more speedups) #3847

Merged
merged 32 commits into from
Sep 16, 2024
Merged

Conversation

gzotti
Copy link
Member

@gzotti gzotti commented Aug 9, 2024

Description

This is a revamped version of #3794 which was too experimental and is hereby superseded. (The attempted trial of using std::transform_reduce() showed useless.)

All SolarSystem object (SSO) computations so far were run on the main thread. This branch allows splitting the SSO position loop onto several threads. Likewise, the far-to-near sorting can be split. This is useful in case of thousands of objects. Given the complexity of drawing commands, there is not much use for more than 4 additional worker threads even for 25.000 objects. The additional threads can be configured or even suppressed in GUI.

Further, using the Intel VTune profiler, I have identified a series of sometimes surprising inefficiencies addressed in the later commits of this PR.

With 27.701 SSO objects with some default setting (Preetham atmosphere, a few gridlines, stars, landscape), this improves framerate from about 6 (V24.2) to about 18.

A new feature that is possible by this is a display of symbolic markers for minor bodies. These are plotted regardless of accuracy of orbital elements, and just provide an impression of distribution of the minor bodies. Color coding is applied per pType. The colors are currently hardcoded. However, drawing those many objects, despite adding code similar to StarMgr for mass drawing, is slow. I hope another developer can optimize StelPainter so it could be re-used instead of frequent buffer alloc/delete.

Fixes #1427 (not intended, just happened...)

Screenshots (if appropriate):

stellarium-003

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • This change requires a documentation update
  • Housekeeping

How Has This Been Tested?

Load many SSOs with the SSEditor. At least the first 10.000 asteroids and 1000comets.ini file. Then find new options in View/SSO and Config/Tools tabs.

Test Configuration:

  • Operating system: Win11 home on i7-12700H
  • Graphics Card: Geforce RTX 3070 Ti Laptop GPU

Will test on smaller notebook, older Win10 PC and RasPi4.

Checklist:

  • My code follows the code style of this project.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (header file)
  • I have updated the respective chapter in the Stellarium User Guide
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@gzotti gzotti added enhancement Improve existing functionality feature Entirely new feature importance: medium A bit annoying, minor miscalculation, but no crash labels Aug 9, 2024
@gzotti gzotti added this to the 24.3 milestone Aug 9, 2024
@gzotti gzotti self-assigned this Aug 9, 2024
Copy link

github-actions bot commented Aug 9, 2024

Great PR! Please pay attention to the following items before merging:

Files matching src/**/*.cpp:

  • Are possibly unused includes removed?

Files matching guide/**:

  • Did you remember to update screenshots to match new updates?
  • Did you remember to grammar check in changed part of documentation?

This is an automatically generated QA checklist based on modified files.

@gzotti gzotti mentioned this pull request Aug 9, 2024
14 tasks
10110111

This comment was marked as resolved.

@gzotti

This comment was marked as resolved.

@gzotti
Copy link
Member Author

gzotti commented Aug 10, 2024

Test on small notebook: 30fps (V24.2) --> 45fps. Quite a gain!
Test on Raspi4: Still only black window like last time I tried (i.e. black screen is unrelated). I can save screenshots, though. Sorry, I have no time to follow up on this.

src/gui/StelGuiItems.cpp Outdated Show resolved Hide resolved
@alex-w

This comment was marked as resolved.

@gzotti

This comment was marked as resolved.

@alex-w
Copy link
Member

alex-w commented Aug 14, 2024

It's unfortunately not just the long ssystem_minor.ini. Does master run on your RPi4 with Ubuntu 24.4? I only see the black screen reported earlier. Activating wayland in raspi-config makes it only worse, Stellarium fails to launch. I have not tried RaspberryOS lately, though.

Hmm... please run via CLI:
MESA_GL_VERSION_OVERRIDE=3.0 MESA_GLSL_VERSION_OVERRIDE=130 QT_QPA_PLATFORM=xcb ./stellarium --opengl-compat

@gzotti
Copy link
Member Author

gzotti commented Aug 14, 2024

Yes, right, I have just found that again this morning, and it works.
Compiling however had shown again the occurrence of "please build with -fPIC". I don't know where to configure this. CXX_OPTS? It is required for ShowMySky, which I have disabled for this test now. @10110111 ?
Also, on Linux/GCC there are still several compiler warnings which MSVC does not show. Does any of you care about those?

@10110111
Copy link
Contributor

Compiling however had shown again the occurrence of "please build with -fPIC". I don't know where to configure this. CXX_OPTS?

Ideally, Qt should provide this via their CMake config. I'm not sure what I'm doing wrong.

@alex-w
Copy link
Member

alex-w commented Aug 15, 2024

RaspberryOS 64bit (10k+ solar system bodies, Qt6-based builds):
[master] Solar System Observer ~ 1.2 FPS
[this branch] Solar System Observer ~ 4.2-6.8 FPS (1 extra thread)

@10110111
Copy link
Contributor

Do the markers enabled by the new checkbox represent all the objects added as you described here? I don't reproduce any serious reduction of performance with 0 extra threads (Core i5-8265U with built-in GPU), and for some reason I only get the green markers but not the white ones closer to the center. What do these white markers represent?

@gzotti
Copy link
Member Author

gzotti commented Aug 18, 2024

colorMap={
				{isAsteroid,     Vec3f(0.35, 0.35, .35 )},
				{isPlutino,      Vec3f(1   , 1   , 0   )},
				{isComet,        Vec3f(0.25, 0.75, 1   )},
				{isDwarfPlanet,  Vec3f(1   , 1   , 1   )},
				{isCubewano,     Vec3f(1   , 0   , 0.8 )},
				{isSDO,          Vec3f(0.5 , 1   , 0.5 )},
				{isOCO,          Vec3f(0.75, 0.75, 1   )},
				{isSednoid,      Vec3f(0.75, 1   , 0.75)},
				{isInterstellar, Vec3f(1   , 0.25, 0.25)},
				{isUNDEFINED,    Vec3f(1   , 0   , 0   )}};

The thin white ones are the overwhelming number of "ordinary" asteroids, to be expected mostly in the main belt between Mars and Jupiter, and the Jupiter Trojans. Pure white is for the few dwarf planets. SDOs and Sednoids are green. Someone might elaborate on the actual differences, and propose better colors.

Did you load 15.000 objects or more to put a little stress to the SSO computation? (The logfile will likely show many complaints, but will give the number of loaded objects. Maybe your CPU can go very fast with 1 core, so the effect would cancel out?

@10110111
Copy link
Contributor

Did you load 15.000 objects or more to put a little stress to the SSO computation?

I followed your list of things to add, and now I get this in the log:

Loaded 6879 Solar System bodies from  "/home/ruslan/.stellarium/data/ssystem_minor.ini"
Solar System now has 6962 entries.

So apparently, I've missed something. How can I check what's missing?

@alex-w

This comment was marked as resolved.

@10110111

This comment was marked as resolved.

@alex-w

This comment was marked as resolved.

@gzotti
Copy link
Member Author

gzotti commented Aug 22, 2024

My final solar system had the first 10.000, second 10.000, plus the "interesting" objects. in total about 27.000 objects. Indeed the loading was terribly slow, in this time the program seemed to be unresponsive, but it was finished an
hour later. (Maybe earlier, but I did something else. ---This would be a place for QFuture and progressbar stuff, but it is a rare call after all.)

Much of the overall speed gain in this (renewed) branch was found from studying profiling diagrams. I am new to this field, but was able to track down and remove a few expensive operations. As a general pattern, if you need GETSTELMODULE in a frequently run method (update, draw, ...), make it static. If you often need to retrieve strings and there is a chance they have not changed, cache them if possible. These fixes helped a lot.

Seeing profile runs with 0 and with 4 extra threads it seems indeed much of the gain from running extra threads is lost by the synchronizing waits which are only even shown when running those threads. If somebody has an idea what to do here, it would clearly help. Still, I consistently see improved framerate with just a few extra threads.

However, now I think (also talking to colleagues) the frequent alloc/free in the many short-lived StelPainter instances may be what currently slows it down most when the object count is large. If we want a fluent SolarSystem with many objects, we should do something about the actual drawing, re-use StelPainter's internal buffers or whatever. The Planet::draw() may need to be rearranged. Maybe using a "modern" (float) Z buffer can be an option, this would also fix the orbit overdrawing.

@10110111
Copy link
Contributor

OK, I've redone the import of the 10000 objects, now it's saved and loads on subsequent runs too, yielding 16860 Solar System objects. My observations with different number of threads are as follows:

Thread count Frame rate, FPS CPU usage, %
0 5.3 100
1 5.6 110
2 5.7 113
3 5.9 116
7 5.8 115

So, something seems to be not quite working as expected. This is on Ubuntu 20.04 amd64, 4-core*HT = 8 hyperthreads Intel Core i5-8265U 1.60GHz, location set to the Solar System Observer, markers enabled, atmosphere and landscape disabled, orbits constantly displayed as in the screenshot in the OP.

@gzotti
Copy link
Member Author

gzotti commented Aug 22, 2024

Hm, the low framerate replicates mine. You see the 10-15% gain with a few threads. Not sure why 7 is shown slightly worse than 3, but it shows that SSO computation is just a tiny fraction of the per-frame operations.

Here are the "flame graphs" from VTune made last week in a terrestrial scene without landscape, atmosphere or gridlines. Capture time was about 20 seconds. (Sorry for my shaky lines, mouse drawing is not my profession. I discovered the arrow tool only later...)

0 extra threads:

2024-08-22 12_16_16-Intel VTune Profiler
This shows the time spent in various methods stacked on top of the calling functions.
Most of the QOpen[GL*] functions shown to be called in the StelPainter sections are constructing/destructing the temporal structures.
Solar system positions make up about 20% of frame time. About 5% are the call to sort the planets by distance before drawing. A similar amount of time is spent with MinorPlanet::getEnglishName, but I think this one cannot be avoided. Time for computing magnitudes and extincted magnitudes have been largely reduced and are no longer labelled. Some frame sync (?) function is visible on the right.

4 extra threads:
2024-08-22 12_01_39-Intel VTune Profiler
The planet position computation fraction has apparently shortened, but some ugly new WaitForSingleObjectEx is visible in lower left which steals most of the gained time. I cannot say if this is caused by the necessary Mutex introduced, or whether there is some other unavoidable reason for this time loss. Mutex is IIRC now only required in the JPL file reading functions, I have to run that again with VSOP. If it's the Mutex effect, it takes a disproportional fraction of time, occurring only for 8 planets opposed to ~15.000 KeplerOrbit runs. If you have other ideas what can cause this, I am willing to learn!

@10110111
Copy link
Contributor

Interesting. Now I've turned off everything: atmosphere, landscape, Milky Way, stars, planets, cardinal direction markers, and here's the result:

Thread count Frame rate, FPS CPU usage, %
0 20.5 100
1 30.1 165
2 34 204
3 37.7 234
4 33 207
5 34.5 216
6 35.6 225
7 33.5 216

This looks suspicious. I have thousands of objects that are recomputed on every frame, and still I don't get even 300% CPU usage on a 8-hyperthread CPU. Something seems to not be parallelized correctly. Particularly, if all the objects are in the Solar System, I guess there should be some way to parallelize this much better. I'll try looking later into what's going on and maybe how to improve this.

- also clean up docs
- also improve setting thread number
- also reactivate parallel sort
Thanks to Ruslan!
Use config.ini:[devel]/compute_positions_algorithm (default:2):

0=old single-threaded 3-loop solution
1=first QtConcurrent::blockingMap() 3-loop solution
2=strided manual multithreading 3-loop solution
3=blocked manual multithreading 1-loop solution

Solution 3 would be fastest, but has problems for moons,
where position depends on the parent planet's positions
which may not have been computed yet.
@gzotti
Copy link
Member Author

gzotti commented Sep 15, 2024

No idea why, but this rebase -i was difficult. Commit "fix crash" should have gone just after @10110111 's last commit, but it wouldn't go. We can leave it as-is as document that there was a problem.

@gzotti gzotti marked this pull request as ready for review September 15, 2024 14:29
alex-w
alex-w previously approved these changes Sep 15, 2024
Copy link
Member

@alex-w alex-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@gzotti

This comment was marked as resolved.

@alex-w

This comment was marked as resolved.

@gzotti

This comment was marked as resolved.

@alex-w

This comment was marked as resolved.

@alex-w alex-w dismissed their stale review September 15, 2024 15:08

Earth axis should be fixed...

@alex-w alex-w self-requested a review September 15, 2024 15:08
@10110111
Copy link
Contributor

Maybe this shouldn't go into 24.3 so late with this potential for breakages.

@alex-w
Copy link
Member

alex-w commented Sep 15, 2024

Maybe this shouldn't go into 24.3 so late with this potential for breakages.

I fear this patch should be postponed to version 24.4

@gzotti
Copy link
Member Author

gzotti commented Sep 15, 2024

I had not remembered that Ruslan had integrated the final call into his loop. Fixed that in the conservative paths. Now I can run RemoteSync'ed V24.2 and this. 135 vs 73 fps. As far as I can see, result with setting "2" are identical.

@gzotti gzotti merged commit 2411237 into master Sep 16, 2024
27 of 29 checks passed
@gzotti gzotti deleted the threaded-ephems branch September 16, 2024 10:09
@alex-w alex-w added the state: published The fix has been published for testing in weekly binary package label Sep 16, 2024
Copy link

Hello @gzotti!

Please check the fresh version (development snapshot) of Stellarium:
https://github.com/Stellarium/stellarium-data/releases/tag/weekly-snapshot

@alex-w alex-w removed the state: published The fix has been published for testing in weekly binary package label Sep 22, 2024
Copy link

Hello @gzotti!

Please check the latest stable version of Stellarium:
https://github.com/Stellarium/stellarium/releases/latest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve existing functionality feature Entirely new feature importance: medium A bit annoying, minor miscalculation, but no crash
Development

Successfully merging this pull request may close these issues.

show comet locations regardless of magnitude
4 participants