Skip to content

DNM: debugging only #26596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jul 9, 2025

Does this PR introduce a user-facing change?


@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 9, 2025
Copy link
Contributor

openshift-ci bot commented Jul 9, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Jul 9, 2025
Copy link
Contributor

openshift-ci bot commented Jul 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2025
@Luap99 Luap99 added the No New Tests Allow PR to proceed without adding regression tests label Jul 9, 2025
@Luap99 Luap99 force-pushed the ebpf-debug-output-flake branch 6 times, most recently from b58a100 to a0e53e4 Compare July 10, 2025 17:54
Luap99 added 6 commits July 10, 2025 20:47
io.ReadFull() already returns ErrUnexpectedEOF if there was a short read
so this check is redundant and can be dropped.

Signed-off-by: Paul Holzinger <[email protected]>
DemuxFrame() already returns a byte slice with the correct length so
this makes it simpler and the caller does not need to check this at all.

Signed-off-by: Paul Holzinger <[email protected]>
Do not ignore ErrUnexpectedEOF from DemuxHeader(), if we fail to parse
the header there must have been a clear protocal error between client
and server which should be reported and not silently ignored. I wonder
ig this might explain why we have missing remote exec/attach output
without any error, it is possible we are eating some internal errors due
this.

Commit ba8eba8 added the ErrUnexpectedEOF check but without any
explanation why that would be needed. The tests from that commit pass
without it locally but not in CI. With some debugging best I found the
issue is actually a test bug. The channel is not consumed until it is
closed which means the main test exists before the log reading goroutine
is done. And if the main test exists the first step it does is to kill
the podman service which then can trigger the ErrUnexpectedEOF server on
the still open http connection and thus the test case failed there.

Signed-off-by: Paul Holzinger <[email protected]>
ebpf hook to catch and log the writes to try to see where they disapear.

Signed-off-by: Paul Holzinger <[email protected]>
We don't need a tty here and then we are able to check for the
exact output.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 force-pushed the ebpf-debug-output-flake branch from a0e53e4 to c7d6e11 Compare July 10, 2025 18:47
@Luap99
Copy link
Member Author

Luap99 commented Jul 11, 2025

Ok I finally got the missing output flake, based on my ebpf script podman-remote never sees that output but the server did read it from conmon so at least I can be sure now that conmon did not loose the output somehow.

conmon 06:52:52.165469 4311     4302     exe          30 random-2-8ZjqXWQEm6lOHIdzrb9o
...
exit   06:52:52.199745 71666    71663    podman-remote 0 0

So somewhere between server writing and client reading the output goes missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. No New Tests Allow PR to proceed without adding regression tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant