Staticlib hide internal symbols#155338
Conversation
|
rustbot has assigned @petrochenkov. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
|
This would also need to rename symbols to avoid conflicts between two rust staticlibs ending up getting linked together, right? |
Why exactly is that the case? |
ff707ad to
7ac49d1
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
My primary goal right now is to reduce binary size, so I don't have immediate plans to implement symbol renaming. This means that linking multiple Rust staticlibs together can still result in multiple definition errors. Would you like me to address that in this PR as well? It seems feasible to implement — for example, by rehashing symbols and updating their references accordingly. |
I previously assumed this symbol needed to remain externally visible to support scenarios requiring cross-language exception propagation. Do you think we should also set rust_eh_personality as hidden? |
|
If it isn't too hard it would be nice to do symbol renaming too. I think doing in-place modification isn't going to work for that though. Adding a unique suffix would require growing the size of the string table. |
|
Got it. I will first fix the |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
5e1c3a1 to
c7d4e98
Compare
|
@bors delegate=try |
|
✌️ @cezarbbb, you can now perform try builds on this pull request! You can now post |
|
@bors try |
This comment has been minimized.
This comment has been minimized.
`-Zstaticlib-hide-internal-symbols`: Hide non-exported internal symbols from staticlibs
|
@bors try jobs=x86_64-* |
This comment has been minimized.
This comment has been minimized.
786373a to
40c254b
Compare
-Zstaticlib-hide-internal-symbols and Zstaticlib-rename-internal-symbols: hide/rename internal symbols in staticlibs
This comment has been minimized.
This comment has been minimized.
40c254b to
54a7c4c
Compare
Yes, could you put the legend from #155338 (comment) into the PR description before the tables as well. |
54a7c4c to
303ab30
Compare
|
Enough for now. |
|
Reminder, once the PR becomes ready for a review, use |
|
@rustbot ready |
This comment has been minimized.
This comment has been minimized.
- Symbol collection directly utilizes `exported_symbols` for reverse matching - Using RmetaLink to accurately track archive content instead of only checking `.rcgu.o`
dca392e to
a9c6743
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
|
||
| ab.build(out_filename); | ||
| let exported_symbols = if sess.opts.unstable_opts.staticlib_hide_internal_symbols { | ||
| if !matches!(&*sess.target.archive_format, "gnu" | "bsd" | "darwin") { |
There was a problem hiding this comment.
I suspect the right condition here is "is the target using ELF or Mach-O" rather than something based on archive format?
We should be able to update ELF/Macho-O files in archives regardless of the archive's own format.
The object file format can be obtained from sess.target.binary_format.
| } | ||
| }; | ||
|
|
||
| let is_macho = self.sess.target.llvm_target.contains("-apple-macosx"); |
There was a problem hiding this comment.
| let is_macho = self.sess.target.llvm_target.contains("-apple-macosx"); | |
| let is_macho = self.sess.target.binary_format == BinaryFormat::MachO; |
Also, can change the is_macho checks into matches on sess.target.binary_format, everything except BinaryFormat::{Elf,MachO} should be unreachable!() here.
| if !is_rust_object || exported_symbols.is_none() { | ||
| Box::new(data) | ||
| } else if is_macho { | ||
| match macho_apply_hide(data, exported_symbols.as_ref().unwrap()) { |
There was a problem hiding this comment.
Nit: better use if let Some(exported_symbols) = exported_symbols or match exported_symbols to avoid the unwraps.
| } else if is_macho { | ||
| match macho_apply_hide(data, exported_symbols.as_ref().unwrap()) { | ||
| Cow::Borrowed(_) => Box::new(data), | ||
| Cow::Owned(v) => Box::new(v), |
There was a problem hiding this comment.
Isn't Cow itself AsRef<[u8]>?
Then it would be possible to box the whole Cow without matching.
| /// Offset of `st_other` in a 32-bit ELF symbol table entry. | ||
| const ELF_ST_OTHER_ELF32: usize = 13; | ||
| /// Offset of `n_type` in a Mach-O nlist entry (same for 32-bit and 64-bit). | ||
| const MACHO_N_TYPE_OFFSET: usize = 4; |
There was a problem hiding this comment.
Perhaps could use mem::offset_of!(...) here instead of hardcoding the numbers (if that's not too inconvenient).
| let sections = header.sections(endian, data).ok()?; | ||
| let symtab = sections.symbols(endian, data, elf::SHT_SYMTAB).ok()?; | ||
|
|
||
| if symtab.len() <= 1 { |
There was a problem hiding this comment.
This check is redundant and probably not useful for performance.
| let strings = symtab.strings(); | ||
| let mut patches = Vec::new(); | ||
|
|
||
| for (_, sym) in symtab.enumerate().skip(1) { |
There was a problem hiding this comment.
This null symbols skipping is likely also redundant and not useful for performance (also confuses the reader).
| continue; | ||
| } | ||
| let Ok(name_bytes) = sym.name(endian, strings) else { continue }; | ||
| let Ok(name) = std::str::from_utf8(name_bytes) else { continue }; |
There was a problem hiding this comment.
| let Ok(name) = std::str::from_utf8(name_bytes) else { continue }; | |
| let Ok(name) = str::from_utf8(name_bytes) else { continue }; |
| let Ok(name_bytes) = sym.name(endian, strings) else { continue }; | ||
| let Ok(name) = std::str::from_utf8(name_bytes) else { continue }; | ||
| if !exported.contains(name) { | ||
| let sym_addr = sym as *const Elf::Sym as *const u8 as usize; |
There was a problem hiding this comment.
| let sym_addr = sym as *const Elf::Sym as *const u8 as usize; | |
| let sym_addr = sym as *const Elf::Sym as usize; |
| let Ok(name) = std::str::from_utf8(name_bytes) else { continue }; | ||
| if !exported.contains(name) { | ||
| let sym_addr = sym as *const Elf::Sym as *const u8 as usize; | ||
| let offset = sym_addr.wrapping_sub(data_ptr) + st_other_offset; |
There was a problem hiding this comment.
| let offset = sym_addr.wrapping_sub(data_ptr) + st_other_offset; | |
| let offset = sym_addr - data_ptr + st_other_offset; |
Wrapping shouldn't happen here.
|
Most of the ELF comments apply to Mach-O too, as usual. |
View all comments
According to issue #104707, when building a staticlib, all Rust internal symbols — mangled symbols,
#[rustc_std_internal_symbol]items, allocator shims, etc. — leak out of the static archive. In contrast, cdylib correctly exports only#[no_mangle]symbols via a linker version script.-Zstaticlib-hide-internal-symbolsdirectly post-processes ELF object files in the archive: parsing theSHT_SYMTABsections and settingSTV_HIDDENvisibility on anyGLOBAL/WEAKdefined symbol that is not in the exported symbol set, without changing the binding. This is an in-place modification (only writing the st_other byte per matching entry), with zero overhead.Supported on ELF targets (Linux, BSD, etc.) and Apple targets (macOS, iOS, etc.). On unsupported targets (Windows), a warning is emitted and the flag has no effect.
Update: The rename counterpart (
-Zstaticlib-rename-internal-symbols) is in #156950.The test code are as follows:
1.a std rust staticlib:
1.b downstream c program:
The test results with different compiler flags(which might cause binary size reduction) are as follows:
1.c result with
-Zstaticlib-hide-internal-symbols1.d result with
-Zstaticlib-hide-internal-symbols + -Zstaticlib-rename-internal-symbols2.a no_std rust staticlib
2.b downstream c program
The test results with different compiler flags(which might cause binary size reduction) are as follows:
2.c result with
-Zstaticlib-hide-internal-symbols2.d result with
-Zstaticlib-hide-internal-symbols + -Zstaticlib-rename-internal-symbolsTest results show that this compiler option is beneficial for scenarios where LTO cannot be enabled.
r? @bjorn3 @petrochenkov