
@etsal (Contributor) commented Dec 4, 2025

We currently attach kprobes along with the rest of the BPF programs in the scx_layered skeleton, but that can cause the kprobes to be attached before the scheduler's own struct_ops. This in turn opens a window during which tasks that use the GPU may trigger the kprobes and try to retrieve their task context before that context is created in .init(). The resulting spurious lookup failures falsely imply that the kprobe logic is flawed, and they prevent us from adding checks that would catch actual latent bugs.

Defer kprobe attach until the rest of the programs in the skeleton are ready. First, introduce a variant of the kprobe_enable function in scx_utils that loads the kprobe without attaching it. Then, modify scx_layered to attach the kprobes only after the scheduler is up and running.
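The ordering change can be sketched as a small Rust model. This is a hypothetical simulation, not the scx_utils or libbpf-rs API: `DeferredKprobe`, `Scheduler`, `attach_struct_ops`, `on_kprobe`, and the probe symbol `hypothetical_gpu_symbol` are illustrative names. It only shows why loading the kprobe early but attaching it after struct_ops closes the window in which a GPU task can hit the probe before its task context exists.

```rust
use std::collections::HashMap;

/// Attach state of a hypothetical kprobe program (illustration only).
#[derive(Debug, PartialEq)]
pub enum ProbeState {
    Loaded,
    Attached,
}

/// Hypothetical stand-in for a kprobe that can be loaded without being attached.
pub struct DeferredKprobe {
    pub symbol: String,
    pub state: ProbeState,
}

impl DeferredKprobe {
    /// Load step: the program exists, but it cannot fire yet.
    pub fn load(symbol: &str) -> Self {
        DeferredKprobe { symbol: symbol.to_string(), state: ProbeState::Loaded }
    }

    /// Attach step: from now on the probe runs when the kernel hits `symbol`.
    pub fn attach(&mut self) {
        self.state = ProbeState::Attached;
    }

    pub fn is_attached(&self) -> bool {
        self.state == ProbeState::Attached
    }
}

/// Hypothetical model of the scheduler's per-task context storage.
pub struct Scheduler {
    task_ctx: HashMap<u32, u64>, // pid -> placeholder task context
    pub lookup_failures: u32,
}

impl Scheduler {
    pub fn new() -> Self {
        Scheduler { task_ctx: HashMap::new(), lookup_failures: 0 }
    }

    /// Models attaching struct_ops: .init() creates a context for each task.
    pub fn attach_struct_ops(&mut self, pids: &[u32]) {
        for &pid in pids {
            self.task_ctx.insert(pid, 0);
        }
    }

    /// Models the kprobe handler running for a GPU-using task.
    pub fn on_kprobe(&mut self, probe: &DeferredKprobe, pid: u32) {
        if !probe.is_attached() {
            return; // a loaded-but-unattached probe never runs
        }
        if !self.task_ctx.contains_key(&pid) {
            self.lookup_failures += 1; // spurious failure caused by the race
        }
    }
}

/// Old ordering: the kprobe is attached together with the rest of the skeleton.
pub fn eager_attach(gpu_pid: u32) -> u32 {
    let mut sched = Scheduler::new();
    let mut probe = DeferredKprobe::load("hypothetical_gpu_symbol");
    probe.attach();
    // The GPU task hits the probe before struct_ops (and .init()) is up.
    sched.on_kprobe(&probe, gpu_pid);
    sched.attach_struct_ops(&[gpu_pid]);
    sched.on_kprobe(&probe, gpu_pid);
    sched.lookup_failures
}

/// New ordering: load early, attach only once the scheduler is running.
pub fn deferred_attach(gpu_pid: u32) -> u32 {
    let mut sched = Scheduler::new();
    let mut probe = DeferredKprobe::load("hypothetical_gpu_symbol");
    // The probe is loaded but not attached, so this event is never observed.
    sched.on_kprobe(&probe, gpu_pid);
    sched.attach_struct_ops(&[gpu_pid]);
    probe.attach();
    sched.on_kprobe(&probe, gpu_pid);
    sched.lookup_failures
}
```

In this model, `eager_attach(42)` records one lookup failure (the probe fires before .init() has created the context), while `deferred_attach(42)` records none. Splitting load from attach keeps the kprobe loaded alongside the rest of the skeleton and moves only the point at which it can start firing.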

@hodgesds (Contributor) left a comment


LGTM, the linter might need to be appeased.

@etsal force-pushed the layered-kprobe-race branch from ff67854 to 00ac500 on December 4, 2025 22:02
@etsal force-pushed the layered-kprobe-race branch from 00ac500 to b005d33 on December 4, 2025 22:02
