-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Update libcxx and libcxxabi from LLVM 20.1.8 to 21.1.8 #26058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update libcxx and libcxxabi from LLVM 20.1.8 to 21.1.8 #26058
Conversation
Our prevous `__assertion_handler` was copied from `libcxx/vendor/llvm/default_assertion_handler.in`. This updates out `__assertion_handler` to the new `libcxx/vendor/llvm/default_assertion_handler.in`. For what this file is, see the PR description of emscripten-core#22994, with which the file was added.
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (12) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_cxx_ctors1.json: 149109 => 153388 [+4279 bytes / +2.87%] codesize/test_codesize_cxx_ctors2.json: 148513 => 152783 [+4270 bytes / +2.88%] codesize/test_codesize_cxx_except.json: 194578 => 199006 [+4428 bytes / +2.28%] codesize/test_codesize_cxx_except_wasm.json: 164093 => 168568 [+4475 bytes / +2.73%] codesize/test_codesize_cxx_except_wasm_legacy.json: 161751 => 166220 [+4469 bytes / +2.76%] codesize/test_codesize_cxx_lto.json: 125441 => 120672 [-4769 bytes / -3.80%] codesize/test_codesize_cxx_mangle.json: 258655 => 263309 [+4654 bytes / +1.80%] codesize/test_codesize_cxx_noexcept.json: 151526 => 155961 [+4435 bytes / +2.93%] codesize/test_codesize_cxx_wasmfs.json: 177088 => 181644 [+4556 bytes / +2.57%] codesize/test_codesize_files_wasmfs.json: 56055 => 56518 [+463 bytes / +0.83%] codesize/test_minimal_runtime_code_size_hello_embind.json: 15129 => 15201 [+72 bytes / +0.48%] codesize/test_minimal_runtime_code_size_hello_embind_val.json: 11858 => 11941 [+83 bytes / +0.70%] Average change: +1.58% (-3.80% - +2.93%) ```
sbc100
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mention in the PR title or description the version you are upgrading from (so it clear this is major update, not just a minor one).
As of LLVM 21 (llvm/llvm-project#122790), libcxx's `std::basic_string`'s constructor is marked as `_Nonnull`: https://github.com/llvm/llvm-project/blob/2078da43e25a4623cab2d0d60decddf709aaea28/libcxx/include/string#L1061 https://github.com/llvm/llvm-project/blob/2078da43e25a4623cab2d0d60decddf709aaea28/libcxx/include/__config#L1252 Comiling this with LLVM 21's libcxx errors out with: ```console ../emscripten/cache/ports/regal/regal-version_7/src/boost/boost/print/string_list.hpp:281:46: error: null passed to a callee that requires a non-null argument [-Werror,-Wnonnull] 281 | PushBack(*this).assign(string ? string : T(NULL)); | ^~~~ ../emscripten/cache/sysroot/include/unistd.h:33:14: note: expanded from macro 'NULL' 33 | #define NULL 0L ``` This suppresses the warnings.
These files use `exit`, which requires `#include <stdlib.h>`, but didn't have the include. These didn't error out because `libcxx/include/math.h` used to contain `#include <stdlib.h>` but not anymore. It was removed in llvm/llvm-project#139586.
This uses `exit`, which requires `#include <stdlib.h>`, but didn't have the include. This didn't error out because `libcxx/include/math.h` used to contain `#include <stdlib.h>` but not anymore. It was removed in llvm/llvm-project#139586.
libcxx used to have `module.modulemap` but it was changed to `module.modulemap.in` in llvm/llvm-project#134699. This recreates `module.modulemap` from that `module.modulemap.in`. Without this `other.test_cxx20_modules_std_headers` fails.
I updated the description. |
This increase seems to be mainly due to the changes of implementaion of `std::num_put` in `system/lib/libcxx/include/__locale_dir/num.h`. This now calls functions, namely `__to_chars_integral` and its helpers.
|
I guess the only blocker for this change now is the codesize changes? Could you rebase on top of the latest llvm 23 changes? |
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (1) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_cxx_lto.json: 125435 => 120666 [-4769 bytes / -3.80%] Average change: -3.80% (-3.80% - -3.80%) ```
Done |
|
Remaining issues:
Since this change is quite big I wonder if its worth splitting out the test/fetch changes and landing those first? |
I looked into them, and while there is no single reason behind the increase, the most influential thing is the one I described in
This PR only updates several directories in llvm-libc that libcxx depend on, and the updated files are newer than the current code. I think we can update them to tot eventually, but I don't see any downside in landing this first. Also, I'd suggest we discuss more on whether we actually need to follow tot. It's harder to maintain and extract our changes.
|
|
@kripken @juj WDYT this libc++ update? It comes with some minor codesize regressions of a few of Kb which for small programs could be a couple of percent I think. This time around there doesn't seem to be any one culprit here, just the many changes to upstream from the last 6 months. I'm tempted to just land this as is, since I don't think we want to be in the business of carrying major downstream patches (or even having to understand the inner workings of libc++ itself). I also don't think we want to be lagging in our libc++ version. |
|
For |
|
I agree with you two, the regressions seem spread out and not worth investigating in detail (unfortunately). |
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (1) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_cxx_lto.json: 125443 => 120674 [-4769 bytes / -3.80%] Average change: -3.80% (-3.80% - -3.80%) ```
|
Thanks! Can anyone LGTM this? |
Thanks for the heads-up. I agree the codesize regression is a bit annoying, but given that this is from a third-party library, it would be far more annoying to not go with the flow. So LGTM to land to keep things simple. It is great to have the code size tests that we can see regressions like this, and their impact. Thanks @aheejin for also investigating in that direction. Given that most of Emscripten system headers do not require/depend on libc++, the regression is not unconditional to all users. So if users are avoiding libc++, the issue does not occur. (The only exception here being wasmfs, sadly)
|
| @@ -110,6 +110,9 @@ def create(final): | |||
| '-Wno-unused-parameter', | |||
| '-Wno-nontrivial-memaccess', | |||
| '-fdelayed-template-parsing', | |||
| # src/boost/boost/print/string_list.hpp calls std::string(NULL), whose | |||
| # consturctor is declared _Nonnull | |||
| '-Wno-nonnull', | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what happens in practice when calling std::string(NULL) in the new libc++? That won't cause an immediate crash?
(consturctor -> constructor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec-wise, it's UB.
In practice, it depends on the hardening mode.
Hardening mode was added in LLVM 19, and it provides additional checks for UB. More you check you pay more in runtime performance. There are four modes:
emscripten/system/lib/libcxx/include/__config
Lines 48 to 68 in 6aa30fb
| // The library provides the macro `_LIBCPP_HARDENING_MODE` which can be set to one of the following values: | |
| // | |
| // - `_LIBCPP_HARDENING_MODE_NONE`; | |
| // - `_LIBCPP_HARDENING_MODE_FAST`; | |
| // - `_LIBCPP_HARDENING_MODE_EXTENSIVE`; | |
| // - `_LIBCPP_HARDENING_MODE_DEBUG`. | |
| // | |
| // These values have the following effects: | |
| // | |
| // - `_LIBCPP_HARDENING_MODE_NONE` -- sets the hardening mode to "none" which disables all runtime hardening checks; | |
| // | |
| // - `_LIBCPP_HARDENING_MODE_FAST` -- sets that hardening mode to "fast". The fast mode enables security-critical checks | |
| // that can be done with relatively little runtime overhead in constant time; | |
| // | |
| // - `_LIBCPP_HARDENING_MODE_EXTENSIVE` -- sets the hardening mode to "extensive". The extensive mode is a superset of | |
| // the fast mode that additionally enables checks that are relatively cheap and prevent common types of logic errors | |
| // but are not necessarily security-critical; | |
| // | |
| // - `_LIBCPP_HARDENING_MODE_DEBUG` -- sets the hardening mode to "debug". The debug mode is a superset of the extensive | |
| // mode and enables all checks available in the library, including internal assertions. Checks that are part of the | |
| // debug mode can be very expensive and thus the debug mode is intended to be used for testing, not in production. |
We disabled it when we updated to LLVM 19 not to pay runtime performance cost:
emscripten/system/lib/libcxx/include/__config_site
Lines 26 to 27 in 6aa30fb
| // Hardening | |
| #define _LIBCPP_HARDENING_MODE_DEFAULT _LIBCPP_HARDENING_MODE_NONE |
std::basic_string has this when the input is null:
emscripten/system/lib/libcxx/include/string
Line 1083 in 6aa30fb
| _LIBCPP_ASSERT_NON_NULL(__s != nullptr, "basic_string(const char*) detected nullptr"); |
But _LIBCPP_ASSERT_NON_NULL is enabled only when the hardening mode is _LIBCPP_HARDENING_MODE_EXTENSIVE or _LIBCPP_HARDENING_MODE_DEBUG. Because we currently use _LIBCPP_hardening_MODE_NONE, it does nothing:
emscripten/system/lib/libcxx/include/__assert
Line 104 in 6aa30fb
| # define _LIBCPP_ASSERT_NON_NULL(expression, message) ((void)0) |
So for us, it won't immediately crash and the program can malfunction.
It'd be better if we can avoid this. I considered emscripten-ports/regal#10, but @sbc100 suggested we probably would't want to maintain the port.
So this suppresses the warning. This at least doesn't make things worse, because that code was already there for years.
| 'emmalloc_O2': (['-sMALLOC=emmalloc', '-O2'], 125000), | ||
| 'dlmalloc_O2': (['-sMALLOC=dlmalloc', '-O2'], 132000), | ||
| 'mimalloc_O2': (['-sMALLOC=mimalloc', '-O2'], 180000), | ||
| 'emmalloc_O2': (['-sMALLOC=emmalloc', '-O2'], 130000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, emmalloc does not utilize libc++ - do you know why this change in size?
|
LGTM once we figure out why the malloc benchmark changed. |
sbc100
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, the answer is pretty obvious test_malloc_size uses hello_libcxx.cpp to measure codesize which includes libc++
|
Feel free to land this. Once we do we should probably cut a new release. |
|
Can we land this now? |
This updates libcxx and libcxxabi from 20.1.8 to LLVM 21.1.8:
https://github.com/llvm/llvm-project/releases/tag/llvmorg-21.1.8
This also updates LLVM's libc header directories that libcxx is dependent on:
Additional changes:
Copy
vendor/llvm/default_assertion_handler.into__assertion_handler: 987b28cOur previous
__assertion_handlerwas copied fromlibcxx/vendor/llvm/default_assertion_handler.in. This updates our__assertion_handlerto the newlibcxx/vendor/llvm/default_assertion_handler.in. For what this file is, see the PR description of Update libcxx and libcxxabi to LLVM 19.1.4 #22994, with which the file was added.Build regal with -Wno-nonnull: a1fcbfc and 566c822
After [libc++] Diagnose when nullptrs are passed to string APIs llvm/llvm-project#122790, libcxx's
std::basic_string's constructor is marked as_Nonnull. This suppresses the warnings.Add
module.modulemapto libcxx: 2a01e8blibcxx used to have
module.modulemapbut it was changed tomodule.modulemap.inin [libc++] Make __config_site modular llvm/llvm-project#134699. This recreatesmodule.modulemapfrom thatmodule.modulemap.in. Without thisother.test_cxx20_modules_std_headersfails.Increase size expectations of
test_malloc_size*: 09bc21eThis increase seems to be mainly due to the changes of implementation of
std::num_putinlibcxx/include/__locale_dir/num.hin [libc++] Granularize <locale> llvm/llvm-project#146650. This now calls other functions, namely__to_chars_integraland its helpers.