Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kpatch module build could fail when kernel 5.19+ contains dynamic symbols #1284

Open
sumanthkorikkar opened this issue Jul 29, 2022 · 21 comments
Assignees

Comments

@sumanthkorikkar
Copy link
Contributor

sumanthkorikkar commented Jul 29, 2022

Hi All,

when building kpatch module for 5.19+ kernel with -ffunction-sections,
the vmlinux build could fail during link stage.

Reason:
s390 kernel is built with -fPIE and for kpatch purpose built with ARCH_KCFLAGS "-ffunction-sections -fdata-sections"

Output:
ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280)
ld: final link failed: nonrepresentable section on output
BTF .btf.vmlinux.bin.o

In this scenario:

  1. gABI doesn't support dynamic symbols in output sections beyond 64k.
    Ref: binutils : check_dynsym (bfd *abfd, Elf_Internal_Sym *sym)
    https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=2b1450fa4e146936ba4fd6d02691a863f26a88b6;hb=HEAD#l10183

  2. s390 kernel
    readelf --dyn-syms vmlinux | wc
    1556

  3. x86 kernel doesn't seems to have dynamic symbols and hence does not create this problem.
    readelf --dyn-syms vmlinux | wc -l
    0

Possible fix:

  1. Provide the explicit TARGETS eg:
    TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch

  2. Change linker script like:

diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 2e526f11b91e..1d3d2d878acb 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -48,7 +48,7 @@ SECTIONS
                IRQENTRY_TEXT
                SOFTIRQENTRY_TEXT
                FTRACE_HOTPATCH_TRAMPOLINES_TEXT
-               *(.text.*_indirect_*)
+               *(.text.*)
                *(.gnu.warning)
                . = ALIGN(PAGE_SIZE);
                _etext = .;             /* End of text section */
  1. Create custom target in kernel top Makefile. This target would build only kernel objects without linking vmlinux target.

Question:
Could you please provide me suggestions, how this could be handled better in kpatch?

  • without generating vmlinux in (original build) and (patched build).

Thank you

Best Regards
Sumanth

@sumanthkorikkar sumanthkorikkar changed the title kpatch module build could fail when kernel 5.19+ contains dynamic symbols & built with -ffunction-sections kpatch module build could fail when kernel 5.19+ contains dynamic symbols Jul 29, 2022
@joe-lawrence
Copy link
Contributor

Hi @sumanthkorikkar, thanks for the detailed report.

Do you happen you know why s390 arch uses dynamic symbols while x86 does not?

Also, I thought (at one time) that kernel LTO efforts leveraged -ffunction-sections as well. I wonder if that project would eventually hit this limitation as well, assuming they are working with an arch that uses dynamic symbols. Perhaps they would be fellow travelers in this space and interested in supporting dynamic symbols in output sections beyond 64k.

As for possible workarounds, a few questions and ideas on proposed solutions:

  1. Provide the explicit TARGETS eg:
    TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch

This doesn't seem ideal as the user may not know exactly which target directories need to be rebuilt (ie, kpatch-build is doing that work for us).

  1. Change linker script [ ... filter (.text.indirect*) sections ... ]

I assume this would help as we recently added the external expoline requirement for kpatch? And then does it only buy us only a few less dynamic symbols?

  1. Create custom target in kernel top Makefile. This target would build only kernel objects without linking vmlinux target.

Well, we already slightly modify link-vmlinux.sh and Makefile.modfinal so this idea is not without precedent.

Maybe by modifying a recent top level Makefile like (untested):

diff --git a/Makefile b/Makefile
index 00fd80c5dd6e..6a9afdc3ee73 100644
--- a/Makefile
+++ b/Makefile
@@ -1844,6 +1844,9 @@ $(build-dirs): prepare
        single-build=$(if $(filter-out $@/, $(filter $@/%, $(KBUILD_SINGLE_TARGETS))),1) \
        need-builtin=1 need-modorder=1
 
+.PHONY: kpatch
+kpatch: $(build-dirs)
+
 clean-dirs := $(addprefix _clean_, $(clean-dirs))
 PHONY += $(clean-dirs) clean
 $(clean-dirs):

though adding anything as specific as that runs into code drift maintenance. (I can already see that build-dirs is relatively new and missing from older kernels.) Alternatively, I think the same is achievable by building the kernel with make */

In any case, we'd lose the ability to specify the targets on the kpatch-build command line.

@jpoimboe
Copy link
Member

jpoimboe commented Aug 6, 2022

Do you happen you know why s390 arch uses dynamic symbols while x86 does not?

I have the same question. There will probably be other features in the future which rely on -ffunction-sections, so if there's some way for the s390 kernel to avoid using dynamic symbols then that might be the best way to "fix" the issue.

@sumanthkorikkar
Copy link
Contributor Author

sumanthkorikkar commented Aug 11, 2022

Hi Joe, Josh,

Do you happen you know why s390 arch uses dynamic symbols while x86 does not?

Discussed this with the compiler team.

x86 kernel:

  • The decompressor is compiled with -fPIC flag and it is made relocatable.
  • When CONFIG_RELOCATABLE is selected, the LDFLAGS_vmlinux is set to --emit-relocs. This ensures that the relocs stays in the binary and then the x86 kernel adjusts the final load address
  • Ref:
    • a02150610776 ("x86, relocs: Move ELF relocation handling to C")
    • 968de4f02621 ("[PATCH] i386: Relocatable kernel support")

s390 kernel:

  • kernel is linked as PIE and contains the dynamic relocations so that it can be processed during the bootup.
    (kernel is considered similar to shared libaries)
  • Ref:
    • 805bc0bc238f ("s390/kernel: build a relocatable kernel")

As for possible workarounds, a few questions and ideas on proposed solutions:

  1. Provide the explicit TARGETS eg:
    TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch

This doesn't seem ideal as the user may not know exactly which target directories need to be rebuilt (ie, kpatch-build is doing that work for us).

Yes agree. Specifying the TARGETS would be just a temporary workaround

  1. Change linker script [ ... filter _(.text._indirect*) sections ... ]

I assume this would help as we recently added the external expoline requirement for kpatch? And then does it only buy us only a few less dynamic symbols?

With -ffunction-sections, each function would its own .text section. However, As per my understanding the
vmlinux which is created during kpatch build process does not matter. Individual object files would
still have separate text section for each function and kpatch build deals with only those.
Hence, combining all the .text sections during linking stage eliminates the ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280) alltogether. This could be one possible approach (A quick fix).

Let me know your thoughts.

  1. Create custom target in kernel top Makefile. This target would build only kernel objects without linking vmlinux target.

Well, we already slightly modify link-vmlinux.sh and Makefile.modfinal so this idea is not without precedent.

Maybe by modifying a recent top level Makefile like (untested):

diff --git a/Makefile b/Makefile
index 00fd80c5dd6e..6a9afdc3ee73 100644
--- a/Makefile
+++ b/Makefile
@@ -1844,6 +1844,9 @@ $(build-dirs): prepare
        single-build=$(if $(filter-out $@/, $(filter $@/%, $(KBUILD_SINGLE_TARGETS))),1) \
        need-builtin=1 need-modorder=1
 
+.PHONY: kpatch
+kpatch: $(build-dirs)
+
 clean-dirs := $(addprefix _clean_, $(clean-dirs))
 PHONY += $(clean-dirs) clean
 $(clean-dirs):

though adding anything as specific as that runs into code drift maintenance. (I can already see that build-dirs is relatively new and missing from older kernels.) Alternatively, I think the same is achievable by building the kernel with make */

In any case, we'd lose the ability to specify the targets on the kpatch-build command line.

I tried this patch and this works in normal scenario. However, module.patch failed, because it couldn't identify the nfsd/export.o (module) and only identified (af_netlink.o) kpatch_string as new function. Will check further.

Thanks

@jpoimboe
Copy link
Member

With -ffunction-sections, each function would its own .text section. However, As per my understanding the vmlinux which is created during kpatch build process does not matter. Individual object files would still have separate text section for each function and kpatch build deals with only those. Hence, combining all the .text sections during linking stage eliminates the ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280) alltogether. This could be one possible approach (A quick fix).

Let me know your thoughts.

This may not be a good long term solution. The kernel is moving towards enabling LTO, in which case kpatch-build will have to analyze vmlinux.o rather than individual translation units.

x86 also has recently added IBT, for which kpatch-build might also need to analyze vmlinux.o (not sure about this one yet).

Also, there are other features which use -ffunction-sections (fgkaslr, as one example).

So the s390 kernel needs to figure out a way to support >64k sections.

@jpoimboe
Copy link
Member

Would it be possible for s390 to use --emit-relocs?

@sumanthkorikkar
Copy link
Contributor Author

Hi Josh, Joe

Thank you for the inputs.

Agree. we would definitely like to have emit-relocs or similar support for s390 kernel in long term. But
this might take a while to support based on the complexities.

As a short term solution for s390 kpatch, Hence, It would be necessary to provide either
explicit TARGETS or making this change in the linker script.

@jpoimboe
Copy link
Member

@sumanthkorikkar

After looking at how x86 does it, converting s390 to --emit-relocs actually seems pretty straightforward. I made the following patch, it booted successfully with CONFIG_RANDOMIZE_BASE. I'll try to give it some more testing and post upstream.

s390-reloc.patch.txt

@jpoimboe
Copy link
Member

jpoimboe commented Aug 24, 2022

Hm, I just spotted an obvious bug in handle_relocs(), not sure how it's booting ;-)

EDIT: oops, accidentally tested the wrong kernel! Anyway the patch is rough, but you get the idea.

@jpoimboe
Copy link
Member

jpoimboe commented Aug 25, 2022

Here's a working version of the patch. I haven't tested it with 64k+ symbols and kpatch yet.

s390-reloc.patch.txt
.

@sumanthkorikkar
Copy link
Contributor Author

sumanthkorikkar commented Aug 26, 2022

Hi Josh, Thank you for the patch

Few things:

  • Option -mno-pic-data-is-text-relative would generate R_390_GOTENT. This should be handled in do_reloc().
  • Greater than 64k output sections works, as no dynamic symbols are present.
  • ARCH_KFLAGS+="-fPIC" should be added to s390 kpatch tools, As -mno-pic-data-is-text-relative can be used only with -fPIC.
    kpatch seems to work with these.

I am yet to understand, if other rela types (Other than R_390_64) needs offset adjustment if any.

Also, I will be on vacation for next 4 weeks.

@jpoimboe
Copy link
Member

  • Option -mno-pic-data-is-text-relative would generate R_390_GOTENT. This should be handled in do_reloc().

But that option is only used for the livepatch, for which do_reloc() doesn't run. Instead the module relocation code runs (apply_relocate_add() in arch/s390/kernel/module.c. So I don't see the need for R_390_GOTENT in do_reloc().

  • Greater than 64k output sections works, as no dynamic symbols are present.

  • ARCH_KFLAGS+="-fPIC" should be added to s390 kpatch tools, As -mno-pic-data-is-text-relative can be used only with -fPIC.
    kpatch seems to work with these.

Yes, I discovered that as well. In kpatch-build, ARCH_KCFLAGS needs -fPIC added (along with the existing -mno-pic-data-is-text-relative) to force the use of R_390_GOTENT for text accesses to global data.

@sumanthkorikkar
Copy link
Contributor Author

Hi @jpoimboe

I rebased your changes and tried testing it on v6.2. It looks promising to me. Could you please
send these changes across to the s390 mailing list for maintainers review.

Thanks a lot.

@joe-lawrence
Copy link
Contributor

Hi @jpoimboe , @sumanthkorikkar , we just hit this while rebasing the integration tests to v6.3. Shall we retry with the patch from Josh's Aug 25 comment or has their been any alternate solutions explored on the s390 mailing list? Thanks.

@sumanthkorikkar
Copy link
Contributor Author

Hi Joe, Josh,

I tried Josh Poimboeuf patch series on latest branch and added minor fixup on it.
It is currently under internal review. Will send the rebased Josh patch series to you both soon for your valuable feedback. Thank you Josh, Joe.

@joe-lawrence
Copy link
Contributor

Hi @sumanthkorikkar if you have a WIP, rebased version of the patch for 6.4 would you mind attaching here.. we can throw it into our internal tests at least to give it some runtime and maybe find subsequent kpatch-build issues for s390x. Thanks.

@sumanthkorikkar
Copy link
Contributor Author

sumanthkorikkar commented Jul 13, 2023

Hi Joe, Josh,

Attached rebased Josh-Poimboeuf patch series (master rebase) with fixup.
Rebased it to master from the following source: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=s390
Seems to work for gcc.
clang has few concerns which is under discussion.

Let me know, if this patch works for you.
Josh-Poimboeuf_series_emit_relocs_rebase_fixup.patch.txt

Thank you Joe & Josh

@github-actions
Copy link

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@github-actions github-actions bot added the stale label Aug 13, 2023
@sumanthkorikkar
Copy link
Contributor Author

In progress.

@github-actions github-actions bot removed the stale label Aug 14, 2023
@github-actions
Copy link

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@github-actions github-actions bot added the stale label Sep 13, 2023
@jpoimboe jpoimboe removed the stale label Sep 13, 2023
@jpoimboe jpoimboe self-assigned this Sep 13, 2023
@jpoimboe
Copy link
Member

@sumanthkorikkar sorry for not being communicative on this issue, it has been a busy time. I will also be out for another two weeks, feel free to keep pinging me after that :-)

@sumanthkorikkar
Copy link
Contributor Author

ok, Thanks Josh. Will do so.

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Feb 21, 2024
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux.  This has a few problems:

  - It uses dynamic symbols (.dynsym), for which the linker refuses to
    allow more than 64k sections.  This can break features which use
    '-ffunction-sections' and '-fdata-sections', including kpatch-build
    [1] and Function Granular KASLR.

  - It unnecessarily uses GOT relocations, adding an extra layer of
    indirection for many memory accesses.

Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.

This is done by first telling the linker to preserve all relocations
during the vmlinux link.  (Note this is harmless: they are later
stripped in the vmlinux.bin link.)

Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections.  The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.

(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'.  So Clang-compiled kernels have a GOT, which needs to
be adjusted.)

On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.

[1] dynup/kpatch#1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

Compiler consideration:

Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.

Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.

In addition to it:
move vmlinux.relocs to safe relocation

When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.

Compile warning fix from Sumanth Korikkar:

When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:

ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'

Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.

Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.

This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84 ("x86/build: Add asserts for unwanted sections")

[[email protected]: rebased Josh Poimboeuf patches and move
 vmlinux.relocs to safe location]
[[email protected]: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <[email protected]>
Acked-by: Vasily Gorbik <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Sumanth Korikkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants