Skip to content

lkl: Define symbols for string utilities #587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 6, 2025
Merged

Conversation

lrh2000
Copy link

@lrh2000 lrh2000 commented Apr 3, 2025

The generated lkl.o currently still directly references string utility symbols (e.g., memcpy and memset). This is because some kernel sources use things like __builtin_memcpy and the compiler generates a direct call to the memcpy method.

$ ld -o lkl lkl.o
[ .. ]
lkl.o: in function `virtblk_probe':
drivers/block/virtio_blk.c:1367:(.text+0xa6c6be): undefined reference to `memset'
lkl.o: in function `virtblk_name_format':
drivers/block/virtio_blk.c:1125:(.text+0xa6c97a): undefined reference to `memmove'
drivers/block/virtio_blk.c:1126:(.text+0xa6c99c): undefined reference to `memcpy'
[ .. ]

I suspect this isn't really expected, since LKL shouldn't reference the memcpy symbol directly, it should try to do so via host_ops first (i.e., make a call to lkl_ops->memcpy).

The reason is that we claim __HAVE_ARCH_MEMCPY, but we don't actually provide the memcpy symbol. We should do that. This commit follows much the same approach as the x86 architecture, see arch/x86/lib/memcpy_32.c for the reference.

@lrh2000
Copy link
Author

lrh2000 commented Apr 3, 2025

If this PR can be somehow accepted, we'll leave only a few undefined symbols in lkl.o:

> ld -o lkl lkl.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
ld: lkl.o: in function `thread_sched_jb':
arch/lkl/include/asm/sched.h:21:(.text+0x2ab3e): undefined reference to `lkl_bug'
ld: lkl.o: in function `threads_init':
arch/lkl/kernel/threads.c:204:(.text+0x2ae41): undefined reference to `lkl_printf'
ld: lkl.o: in function `thread_sched_jb':
arch/lkl/include/asm/sched.h:21:(.text+0x2be8a): undefined reference to `lkl_bug'
ld: lkl.o: in function `lkl_cpu_change_owner':
arch/lkl/kernel/cpu.c:90:(.text+0x2c3b8): undefined reference to `lkl_bug'
ld: lkl.o: in function `lkl_cpu_put':
arch/lkl/kernel/cpu.c:119:(.text+0x2c538): undefined reference to `lkl_bug'
ld: lkl.o: in function `thread_sched_jb':
arch/lkl/include/asm/sched.h:21:(.text+0x2c7b1): undefined reference to `lkl_bug'
ld: lkl.o: in function `lkl_cpu_put':
arch/lkl/kernel/cpu.c:131:(.text+0x2c7cb): undefined reference to `lkl_bug'
ld: lkl.o: in function `lkl_cleanup':
arch/lkl/kernel/init.c:15:(.text+0x2caca): undefined reference to `lkl_printf'
ld: lkl.o: in function `kasan_init':
arch/lkl/mm/kasan.c:30:(.text+0x2ce98): undefined reference to `lkl_printf'

lkl_printf and lkl_bug are currently defined in tools/lkl/lib/utils.c. We can solve this problem by moving the definitions to arch/lkl. Existing code in tools/lkl should be able to call printf directly from stdio.h instead of relying on the inefficient lkl_printf. I wonder if there is a reason for not doing this? (EDIT: host_ops->print can be something like lkl_test_log, so I guess lkl_printf exists for a reason. But lkl_printf can be exposed from arch/lkl and used in tools/lkl, which seems to make more sense. I will file a PR later).

My use case is to use lkl.o on a bare machine. There is no libc or anything like that. So having zero undefined symbols should help a lot.

Thank you for taking the time to read and review this PR. I'd love to hear your feedback.

{
return __memcpy(dest, src, count);
}
EXPORT_SYMBOL(memcpy);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this redirect already happen via the preprocessor using the #defines at https://github.com/lkl/linux/blob/master/arch/lkl/include/asm/string.h#L68 ? Are you building with or without CONFIG_KASAN?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this redirect already happen via the preprocessor using the #defines at https://github.com/lkl/linux/blob/master/arch/lkl/include/asm/string.h#L68 ?

There is code that uses __builtin_memcpy directly. Our #define memcpy will have no effect on such usage. The compiler will generate direct calls to memcpy, which is an undefined symbol because we never define it.

__builtin_memcpy(dest, src, len);

Are you building with or without CONFIG_KASAN?

I built without CONFIG_KASAN and found these undefined symbols.

My patch was tested with and without CONFIG_KASAN. It should work either way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I wasn't familiar with how the compiler handled __builtin_memcpy. This change looks reasonable to me, although can't we now drop those #defines?

Copy link
Author

@lrh2000 lrh2000 Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review too!

can't we now drop those #defines?

I think it's okay. The only difference is that if we use static inline void *__memcpy(..) and #define memcpy __memcpy then most of memcpy will be inline, but if we have extern void *memcpy(..) then memcpy won't be inline.

I don't think the difference really matters here. I chose an implementation that requires minimal changes (and preserves the original inline semantics). But if we prefer to move the __memcpy implementation from string.h to string.c and then remove the #define altogether, I think that's perfectly fine and will update the patch accordingly.

{
return __memcpy(dest, src, count);
}
EXPORT_SYMBOL(memcpy);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I wasn't familiar with how the compiler handled __builtin_memcpy. This change looks reasonable to me, although can't we now drop those #defines?

The generated lkl.o currently still directly references string utility
symbols (e.g., memcpy and memset). This is because some kernel sources
use things like __builtin_memcpy and the compiler generates a direct
call to the memcpy method.

	$ ld -o lkl.bin lkl.o
	[ .. ]
	lkl.o: in function `virtblk_probe':
	drivers/block/virtio_blk.c:1367:(.text+0xa6c6be): undefined reference to `memset'
	lkl.o: in function `virtblk_name_format':
	drivers/block/virtio_blk.c:1125:(.text+0xa6c97a): undefined reference to `memmove'
	drivers/block/virtio_blk.c:1126:(.text+0xa6c99c): undefined reference to `memcpy'
	[ .. ]

I suspect this isn't really expected, since LKL shouldn't reference the
memcpy symbol directly, it should try to do so via host_ops first (i.e.,
make a call to lkl_ops->memcpy).

The reason is that we claim __HAVE_ARCH_MEMCPY, but we don't actually
provide the memcpy symbol. We should do that. This commit follows much
the same approach as the x86 architecture, see arch/x86/lib/memcpy_32.c
for the reference.

Signed-off-by: Ruihan Li <[email protected]>
@lrh2000
Copy link
Author

lrh2000 commented Apr 6, 2025

v2:

v3:

  • It seems that different compiler versions used on the CI machine and my local machine cause different behavior. Make the following changes to fix the failed CI:
    #if !defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX)
    /*
    * If CONFIG_KASAN_GENERIC is on but CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX is
    * off, mm/kasan/shadow.c will define the kasan version memcpy and its friends
    * for us. We should not do anything, otherwise the symbols will conflict.
    */

Copy link
Member

@tavip tavip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @lrh2000 !

@tavip tavip merged commit b5fa7a2 into lkl:master Apr 6, 2025
14 checks passed
@lrh2000
Copy link
Author

lrh2000 commented Apr 6, 2025

Thanks for your review too @tavip !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants