ELF: Overhaul metadata extraction #87285

Open
etcwilde wants to merge 6 commits into swiftlang:main from etcwilde:ewilde/metadata-extraction

@etcwilde
Member

The old static mirror metadata extraction mechanism relied on section headers to locate the metadata sections. That is fine for Mach-O binaries, where section headers are guaranteed to be included in the loaded image. ELF makes no such guarantee. As a workaround, the Image::scanELF machinery would add the entire file as its own segment:

// FIXME: ReflectionContext tries to read bits of the ELF structure that
// aren't normally mapped by a phdr. Until that's fixed,
// allow access to the whole file 1:1 in address space that isn't otherwise
// mapped.
Segments.push_back({HeaderAddress, O->getData()});

This worked when virtual addresses and file offsets lined up, but with lld on FreeBSD the two are not always the same, so we would fail to find the metadata even with the hack in place.
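
To illustrate why the 1:1 mapping breaks down, here is a minimal sketch (not code from this patch) of translating a virtual address to a file offset through the load segments. `LoadSegment`, `FileOffset`, `vaddrToFileOffset`, and the example layout are hypothetical; the field names only mirror the ELF program header fields.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the parts of a PT_LOAD program header used here.
// Illustrative only; a real reader would use Elf64_Phdr.
struct LoadSegment {
  uint64_t p_vaddr;  // virtual address of the segment, relative to the load base
  uint64_t p_offset; // offset of the segment's first byte in the file
  uint64_t p_filesz; // number of bytes of the segment backed by the file
};

struct FileOffset {
  bool found;
  uint64_t value;
};

// Map a virtual address to a file offset by finding the segment containing it.
constexpr FileOffset vaddrToFileOffset(const LoadSegment *segments,
                                       size_t count, uint64_t vaddr) {
  for (size_t i = 0; i < count; ++i) {
    const LoadSegment &s = segments[i];
    if (vaddr >= s.p_vaddr && vaddr - s.p_vaddr < s.p_filesz)
      return {true, s.p_offset + (vaddr - s.p_vaddr)};
  }
  return {false, 0};
}

// Hypothetical layout: the second segment's virtual address (0x2000) differs
// from its file offset (0x1000), as can happen with page-aligned segments.
constexpr LoadSegment kExampleSegments[] = {
    {0x0000, 0x0000, 0x1000},
    {0x2000, 0x1000, 0x0800},
};
```

With this layout, virtual address 0x2010 maps to file offset 0x1010, so reading the file as if it were mapped 1:1 would return the wrong bytes for anything in the second segment.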

The new design uses a table, stored as a note, which is guaranteed to get its own segment. In executables, the executable will be the first thing loaded, so the addresses of the section start/stop symbols can be pre-computed and populated into that table. With shared objects they cannot, so lld leaves the entries empty and emits the appropriate relocation data for the loader. The object memory reader previously only tracked relocations without applying them, so I've taught it to apply the relocations and read from the relocated memory, which sets up our table for static mirrors.
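
As a rough sketch of what "applying" a relocation means here (again, not the code in this patch), assume the table slots are filled by base-relative relocations, where the relocated value is the load base plus the addend. `TableImage`, `RelativeReloc`, `applyRelativeRelocs`, and the example values are hypothetical names introduced for illustration; a real reader would work from the ELF relocation records and check the relocation type.

```cpp
#include <cstddef>
#include <cstdint>

// A tiny stand-in for the bytes of the table, viewed as 64-bit slots.
constexpr size_t kTableWords = 4;
struct TableImage {
  uint64_t words[kTableWords];
};

// Simplified relocation record, assuming every entry is base-relative.
struct RelativeReloc {
  uint64_t r_offset; // byte offset of the slot to patch, relative to the table
  int64_t r_addend;  // image-relative address the slot should point at
};

// Return a relocated copy of the table: each patched slot becomes
// loadBase + addend. Misaligned or out-of-range entries are skipped.
constexpr TableImage applyRelativeRelocs(TableImage table, uint64_t loadBase,
                                         const RelativeReloc *relocs,
                                         size_t count) {
  for (size_t i = 0; i < count; ++i) {
    const uint64_t offset = relocs[i].r_offset;
    if (offset % sizeof(uint64_t) != 0 ||
        offset / sizeof(uint64_t) >= kTableWords)
      continue;
    table.words[offset / sizeof(uint64_t)] =
        loadBase + static_cast<uint64_t>(relocs[i].r_addend);
  }
  return table;
}

// Hypothetical relocations for one section's start and stop slots.
constexpr RelativeReloc kExampleRelocs[] = {{0, 0x100}, {8, 0x180}};
```

Working on a copy mirrors the idea of reading from relocated memory rather than from the raw file contents: the unrelocated slots stay zero, and the patched ones hold addresses that are meaningful for the chosen load base.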

The table is nearly identical to the MetadataSections table that the runtime loads. The main difference is that each entry holds start/stop pointers rather than a start pointer and a byte count. The loader can apply the relocations and give us the symbol address for each start/stop pointer, but it cannot do the subtraction needed to produce the byte count; that requires the additional constructor function.

For future work, we should be able to adjust the table that swift_addNewDSOImage consumes to take start/stop pointers instead of a start pointer and a byte count. Many of the metadata registration functions already add the count to the start pointer to get the end pointer. That means we create the constructor function to do the subtraction and get the byte count, then immediately invert it to get the end pointer again. Passing the relocated table directly into the runtime registration function should help reduce load times on ELF platforms.
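
The two range representations, and the round trip between them, can be sketched as follows. The type and function names (`StartStopRange`, `StartLengthRange`, `toStartLength`, `endOf`) are made up for this example and are not the runtime's actual declarations.

```cpp
#include <cstdint>

// What the note table stores per section: two relocatable addresses.
struct StartStopRange {
  uint64_t start;
  uint64_t stop;
};

// What the runtime-facing table stores per section: an address and a size.
struct StartLengthRange {
  uint64_t start;
  uint64_t length;
};

// The subtraction a relocation cannot express; today this is the kind of
// work done by the constructor function at load time.
constexpr StartLengthRange toStartLength(StartStopRange r) {
  return {r.start, r.stop - r.start};
}

// The addition a registration function performs to recover the end pointer.
constexpr uint64_t endOf(StartLengthRange r) { return r.start + r.length; }
```

For a section spanning 0x7100 to 0x7180, `toStartLength` yields a length of 0x80 and `endOf` turns that straight back into 0x7180, which is the redundant work that consuming start/stop pointers directly would avoid.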

Re-enables:

  • test/Reflection/typeref_decoding_imported.swift
  • test/Reflection/capture_descriptors.sil

Fixes: rdar://159139154

etcwilde and others added 6 commits February 16, 2026 11:22
Section headers aren't available in loaded ELF files. We will be adding
a special note segment to the ELF binaries containing a table of the
metadata sections. This is a fairly invasive change that will diverge
the two, so splitting it now to make it easier to target each
specifically.

This table contains the start/stop pointers for each of the metadata
sections. The table is in approximately the same format as what the
runtime consumes through swift_addNewDSOImage. The main difference is
whether the section ranges are represented as start/stop symbol pointer
or start pointer/byte count. The relocation machinery can handle the
start stop symbols, but cannot run the subtractions needed to compute
the byte counts, hence the constructor registration function that copies
everything over into a read/write table.

Also worth noting, Swift Testing needs to write a fixed-up image start
pointer back into the table, so that will need fixing before we start
plumbing in a read-only table.

As of right now, process launch goes through relocating, then
subtracting the start symbol from end symbol to get the byte count.
Then inside of the DSO processing, the registration functions for each
metadata type go and add the byte count to the start pointer to get the
end pointer. We should be able to improve the launch performance on ELF
(and likely COFF) processes by using the pre-computed start/stop tables
directly, though I don't have numbers as to what extent.

This wires the contents of the new table into the runtime loading
mechanism, demonstrating that things are generally working correctly.

Migrating ELF metadata reader from reading section headers to loading
the data from the metadata table. This prevents reading uninitialized
data and avoids crashes.

Extracting the runtime metadata requires reading the note table
containing the addresses of each section. In executables, this table is
pre-computed and is usable directly. When run against libraries, the
linker doesn't know where the image will be mapped into the process
address space, so it requires that relocations are applied in order to
be meaningful.
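
For readers unfamiliar with note records, the following is a minimal sketch of locating one note by name and type in a buffer of note data. It is not the reader added by this patch. It assumes little-endian fields and 4-byte padding of the name and descriptor; `findNote`, `NoteDesc`, and the example buffer are hypothetical, and the name and type values are taken from the review discussion of this PR.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t readLE32(const unsigned char *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

constexpr size_t alignUp4(size_t n) { return (n + 3) & ~size_t(3); }

constexpr bool bytesEqual(const unsigned char *a, const char *b, size_t n) {
  for (size_t i = 0; i < n; ++i)
    if (a[i] != static_cast<unsigned char>(b[i]))
      return false;
  return true;
}

struct NoteDesc {
  bool found;
  size_t offset; // offset of the descriptor within the buffer
  size_t size;   // descriptor size in bytes
};

// Walk the notes in [data, data + size) looking for a matching name and type.
// nameLen counts the terminating NUL, matching the namesz field.
constexpr NoteDesc findNote(const unsigned char *data, size_t size,
                            const char *name, size_t nameLen, uint32_t type) {
  size_t pos = 0;
  while (size - pos >= 12) { // 12 bytes: namesz, descsz, type
    const uint32_t namesz = readLE32(data + pos);
    const uint32_t descsz = readLE32(data + pos + 4);
    const uint32_t noteType = readLE32(data + pos + 8);
    const size_t nameOff = pos + 12;
    if (namesz > size - nameOff)
      break; // truncated name
    const size_t descOff = nameOff + alignUp4(namesz);
    if (descOff > size || descsz > size - descOff)
      break; // truncated descriptor
    if (noteType == type && namesz == nameLen &&
        bytesEqual(data + nameOff, name, nameLen))
      return {true, descOff, descsz};
    const size_t next = descOff + alignUp4(descsz);
    if (next > size)
      break;
    pos = next;
  }
  return {false, 0, 0};
}

// Hypothetical buffer: an unrelated note followed by a "swift6" note.
constexpr unsigned char kExampleNotes[] = {
    4, 0, 0, 0, 4, 0, 0, 0, 3, 0, 0, 0,             // namesz, descsz, type
    'G', 'N', 'U', 0,                               // name
    0xAA, 0xBB, 0xCC, 0xDD,                         // descriptor
    7, 0, 0, 0, 8, 0, 0, 0, 0x64, 0x6d, 0x35, 0x73, // namesz, descsz, type
    's', 'w', 'i', 'f', 't', '6', 0, 0,             // name plus 1 pad byte
    1, 2, 3, 4, 5, 6, 7, 8,                         // descriptor (table bytes)
};
```

In an executable, the descriptor bytes found this way could be interpreted directly; for a shared object they would only become meaningful after relocations have been applied.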

This teaches the ELF scanner to apply the dynamic relocations to the
memory, not just track them. ReflectionContext readELFSections can load
the table directly from the object file reader once relocations are
applied.

The changes to how metadata is extracted result in the metadata being
found more reliably, so these tests are no longer failing.

Fixes: rdar://159139154

Running git-clang-format over the whole patch set to clean things up.

@etcwilde
Member Author

@swift-ci please test

@etcwilde etcwilde moved this to In Progress in Swift on FreeBSD Feb 17, 2026
// header
.header =
{
.namesz = 7, // namesz: "swift6\0"
Contributor

You might want to pick a slightly more specific name in case we decide we want to do more with notes in the future.

Member Author

@etcwilde etcwilde Feb 17, 2026

That is what the note type is for. The type is namespaced to the name, so we can use any 4-byte number. I went with the bytes that spell out s5md, though little-endian byte order turns that into dm5s in the raw binary.
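
A small sketch of the byte-order effect described above, assuming the type value is built with 's' as the most significant byte. `fourCC`, `Bytes4`, `littleEndianBytes`, and `kExampleNoteType` are hypothetical helpers for illustration, not identifiers from the patch.

```cpp
#include <cstdint>

// Pack four characters into a 32-bit value, first character most significant.
constexpr uint32_t fourCC(char a, char b, char c, char d) {
  return (uint32_t(static_cast<unsigned char>(a)) << 24) |
         (uint32_t(static_cast<unsigned char>(b)) << 16) |
         (uint32_t(static_cast<unsigned char>(c)) << 8) |
         uint32_t(static_cast<unsigned char>(d));
}

struct Bytes4 {
  unsigned char b[4];
};

// The bytes as they would appear in a little-endian file, lowest byte first.
constexpr Bytes4 littleEndianBytes(uint32_t v) {
  return {{static_cast<unsigned char>(v & 0xff),
           static_cast<unsigned char>((v >> 8) & 0xff),
           static_cast<unsigned char>((v >> 16) & 0xff),
           static_cast<unsigned char>((v >> 24) & 0xff)}};
}

constexpr uint32_t kExampleNoteType = fourCC('s', '5', 'm', 'd');
```

Here `kExampleNoteType` is 0x73356d64, and its little-endian byte sequence is 'd', 'm', '5', 's', which is what a hex dump of the note would show.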

DECLARE_SWIFT_SECTION(swift5_accessible_functions)
DECLARE_SWIFT_SECTION(swift5_runtime_attributes)

DECLARE_SWIFT_SECTION(swift5_tests)
Contributor

We can mark this section as "maybe not present" too.

(const void *)swift::runtime::backtrace::_swift_backtrace_isThunkFunction;
#endif

// Create empty sections to ensure that the start/stop symbols are synthesized
Contributor

I haven't tried, but if we declare the symbols we add as weak, perhaps that would let us resolve the issue?

Labels

FreeBSD Platform: FreeBSD

Projects

Status: In Progress

2 participants