Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update get_callback_id, register_callback #97

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mprather
Copy link
Contributor

This commit reflects 2 suggestions that are very closely related to dotnet/aspnetcore#40832. If the signatures within the callback_manager class are currently fixed, then we should consider the following changes:

  1. Remove the use of stringstream in get_callback_id. Instead, use std::to_string.
  2. Use unordered_map's emplace method instead of the insert method in register_callback.

I ran a very simplified mockup of the methods around get_callback_id and register_callback. The tests were run across ARM32 and x64, Windows and Linux, and using multiple language standards. The table also includes testing 3 different ways the callback map can be updated. This was run on my office hardware with compilations that represent real-world use cases.

Here's a table of the results. Note the make-pair column represents using the insert method. The last 6 columns are elapsed time in seconds.

run Platform arch compiler standard iterations to_string
+
make-pair
to_string
+
operator
to_string
+
emplace
stringstream
+
make-pair
stringstream
+
operator
stringstream
+
emplace
a rpi arm32 gcc 8 11 1,000,000 2.319322 2.801457 2.822827 5.603574 5.524647 5.537898
b rpi arm32 gcc 8 17 1,000,000 2.398186 2.884507 2.875789 5.531117 5.480857 5.475786
c ubuntu 20.04 x64 gcc 9 11 1,000,000 0.608750 0.737955 0.559779 0.826313 0.834720 0.822128
d ubuntu 20.04 x64 gcc 9 17 1,000,000 0.627507 0.744456 0.562259 0.817209 0.826818 0.812244
e windows x64 vs 16 14 1,000,000 0.381497 0.371777 0.349323 0.855587 0.888918 0.913379
f windows x64 vs 16 17 1,000,000 0.383457 0.362494 0.356750 0.873124 0.864294 0.892336
g windows x64 vs 17 14 1,000,000 0.363971 0.384013 0.360188 0.894622 0.916926 0.917526
h windows x64 vs 17 17 1,000,000 0.374675 0.385994 0.364625 0.897530 0.898167 0.895440

Using std::to_string in get_callback_id

If you look at the table above with the perspective of "how does utilizing std::to_string affect performance with respect to the original stringstream approach", then the following table is the % difference between the suggested std::to_string technique and the original stringstream implementation (larger is better).

run make-pair operator emplace
a 59% 49% 49%
b 57% 47% 47%
c 26% 12% 32%
d 23% 10% 31%
e 55% 58% 62%
f 56% 58% 60%
g 59% 58% 61%
h 58% 57% 59%

Replacing the stringstream conversion with std::to_string is faster in all cases, no matter what technique is chosen to update the callback map. The average gain is right around the 47.7% mark.

Updating the map in register_callback

In the register callback method, the question of how to add information to the map comes down to 3 possible options: insert with std::make_pair, index operator [], or emplace. This is a little harder to see but if you look at the relative ranking of the total execution time, the following table emerges (lower is better).

run to_string
+
make-pair
to_string
+
operator
to_string
+
emplace
stringstream
+
make-pair
stringstream
+
operator
stringstream
+
emplace
a 1* 2 3 3 1 2
b 1* 3 2 3 2 1
c 2 3 1 2 3 1
d 2 3 1 2 3 1
e 3 2 1 1 2 3
f 3 2 1 2 1 3
g 2 3 1 1 2 3
h 2 3 1 2 3 1
Average rank 2.0 2.6 1.4 2.0 2.1 1.9

In short, emplace tends to perform better in nearly all the tested platforms, whether or not the original stringstream or the suggested to_string approach is used.

I'm a little surprised that the original insert methodology (with make-pair) is somehow better on RPi (see asterisk) as we tend to use that platform quite frequently. Seems using emplace is worse in that case. However, without doing a bunch of preprocessor defines to swap out techniques based on platform, I think the greater good can be served by switching to emplace. If someone wants to help explore the RPi perf difference, that effort may help guide the implementation of a more tailored solution that fits more configurations. It could be an old compiler issue, ARM vs x64, ARM32-specific, etc.

Finally, even though emplace generally works better, the perf gain is small compared to the to_string change. This change, assuming to_string is accepted and the ARM32 results are omitted, yields a modest an average of 6.26% improvement over the original insert technique.

Summary:

Two small changes can help reduce time spent in creating and registering the callback id. The two changes are submitted for review as a unit but they could be taken individually if desired.

Replaced the stringstream technique in get_callback_id with std::to_string.
Changed to emplace to avoid extra work with insert.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant