
[BUG] Kernel oopses that happen during provisioning are marked as panic in the last task #138

Open
bgoncalv opened this issue Aug 6, 2021 · 8 comments

@bgoncalv

bgoncalv commented Aug 6, 2021

Describe the bug
This was initially reported on https://bugzilla.redhat.com/show_bug.cgi?id=1623729

If a kernel oops occurs during provisioning, it is reported as a kernel panic on the last Beaker task. This makes it confusing to understand why that task is shown as panic.

Version-Release number
28.2

To Reproduce
Steps to reproduce the behavior:

  1. Provision an OS that triggers a kernel oops during provisioning, but where provisioning still completes successfully
  2. Run several tasks
  3. The last task shows as panic because of the oops that happened during provisioning.

Actual behavior
The last task shows as panic

Expected behavior
The last task shouldn't show as panic

@mdujava
Member

mdujava commented Aug 9, 2021

Panic detection can be configured with PANIC_REGEX, which is set on a per-LC (lab controller) basis:

PANIC_REGEX = "Kernel panic|Oops[\s:[]|general protection fault(?! ip:)|general protection handler: wrong gs|\(XEN\) Panic|kernel BUG at .+:[0-9]+!"

and by default it includes Oops.
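To see what the default pattern catches, here is a minimal Python sketch that compiles the PANIC_REGEX value quoted above and runs it over a few sample console lines. It assumes the lab controller applies the pattern with something equivalent to Python's re.search on each console line; that is an assumption for illustration, and the sample lines are made up rather than taken from a real log.

```python
import re

# PANIC_REGEX value as quoted in this thread, split over two raw strings
# that concatenate back to the exact same pattern.
PANIC_REGEX = re.compile(
    r"Kernel panic|Oops[\s:[]|general protection fault(?! ip:)"
    r"|general protection handler: wrong gs|\(XEN\) Panic|kernel BUG at .+:[0-9]+!"
)


def looks_like_panic(console_line: str) -> bool:
    """Return True if any alternative of the pattern occurs anywhere in the line."""
    return PANIC_REGEX.search(console_line) is not None


if __name__ == "__main__":
    # Hypothetical console lines, for illustration only.
    samples = [
        "Kernel panic - not syncing: Fatal exception",
        "Oops: 0000 [#1] SMP",
        "kernel BUG at mm/slub.c:3901!",
        "traps: app[1234] general protection fault ip:7f00 sp:7ffd error:0",
        "Oopsie, a harmless message",
    ]
    for line in samples:
        print(looks_like_panic(line), line)
```

Note that "Oops" alone is not enough: it must be followed by whitespace, a colon or "[". The negative lookahead also skips the userspace form "general protection fault ip:...", while the kernel form "general protection fault:" still matches.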

You can disable panic detection altogether by including <watchdog panic="None"/> in your recipe.

@bgoncalv
Author

bgoncalv commented Aug 9, 2021

Note, I don't want to ignore Oops in general. My only problem is with Oopses that happen during provisioning and cause the last task in the recipe to be marked as Panic. That task shouldn't be marked as panic, as no Oops occurred while it was executing.

@StykMartin
Contributor

@bgoncalv is right. I'm aware of this behavior. The problem is that we currently can't mark a panic as having happened during installation, so most of the time it lands on the first task. But if we fail to detect it right away, it lands on another task (the one that was running when we found the panic).

Yeah. This needs to remain open, and we will need to find a better way to manage this.

@bgoncalv
Author

bgoncalv commented Aug 9, 2021

Thanks @StykMartin, do you have any suggestion for how we can work around this? For example, we could create a dummy task that triggers this panic detection and run it as the first task. Is there a way for a task to force the detection to run?

@StykMartin
Contributor

Hello @bgoncalv. My answer never made it to GitHub; I'm sorry about that. The situation is quite a bit more complicated, but I think we can reach some compromise if you still need it.

Panic detection runs in the background in each region. Detected panics are then proxied to the main server for processing. The main server checks the tasks assigned to the given recipe and iterates over them. If the iteration is exhausted without finding a running task, we mark the last item, because that is what ends up in the loop variable.
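The iteration described above can be sketched in Python roughly as follows. Task, its status strings and task_to_blame are hypothetical names for illustration only, not Beaker's actual code. The point is that after a for loop runs to completion without a break, the loop variable still refers to the last element, so a panic found while no task is running falls on the last task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a recipe task; only the fields needed here.
    name: str
    status: str  # e.g. "Waiting", "Running", "Completed"


def task_to_blame(tasks: List[Task]) -> Optional[Task]:
    """Return the task a detected panic would be attributed to.

    Iterate over the recipe's tasks looking for the running one. If the
    loop finishes without a break, `task` still holds the last item.
    """
    task = None
    for task in tasks:
        if task.status == "Running":
            break
    # No running task found: `task` is simply the last item iterated
    # (or None if the recipe has no tasks at all).
    return task


if __name__ == "__main__":
    # Provisioning: nothing has started yet, so the last task is blamed.
    waiting = [Task("task-1", "Waiting"), Task("task-2", "Waiting"), Task("task-3", "Waiting")]
    print(task_to_blame(waiting).name)  # task-3
```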

If you have any idea how we could make this saner for you, @bgoncalv, feel free to shoot.

@bgoncalv
Author

@StykMartin thanks for the reply. That feature would indeed be helpful for us. If panic detection always assigned panics detected during provisioning to the last task of the recipe, that would help us.

@StykMartin
Contributor

This is already what happens now. All panics are assigned to the last task if there is no running task, so if we haven't registered any task start (for example, during provisioning), we always mark the last task.
However, there is a catch. If restraint reports task n-2 as finished and task n-1 has not yet been reported as started, a panic in this window is also reported on the last task. But the chances are quite small that restraint will crash at a moment like this.
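A minimal sketch of that timing window, with hypothetical names (TaskTimes, running_at, blame) and made-up timestamps: a task counts as running only between its reported start and its reported finish, and anything outside those intervals falls back to the last task. That covers both provisioning (before the first reported start) and the short gap between one task's reported finish and the next task's reported start. This models the rule described in this thread, not Beaker's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskTimes:
    # Hypothetical record of when the harness reported a task's start/finish.
    name: str
    started: Optional[float] = None   # None: start not (yet) reported
    finished: Optional[float] = None  # None: finish not (yet) reported


def running_at(task: TaskTimes, t: float) -> bool:
    """True if `task` had a reported start by time t and no reported finish yet."""
    if task.started is None or task.started > t:
        return False
    return task.finished is None or t < task.finished


def blame(tasks: List[TaskTimes], panic_time: float) -> str:
    """Attribute a panic to the task running at panic_time, else to the last task."""
    for task in tasks:
        if running_at(task, panic_time):
            return task.name
    return tasks[-1].name


if __name__ == "__main__":
    recipe = [
        TaskTimes("task-1", 0, 10),
        TaskTimes("task-2", 10, 20),
        TaskTimes("task-3", 25),  # start reported 5 time units after task-2 finished
        TaskTimes("task-4"),      # never started
    ]
    print(blame(recipe, -5))  # task-4 (provisioning, nothing started yet)
    print(blame(recipe, 15))  # task-2 (genuinely running)
    print(blame(recipe, 22))  # task-4 (gap between task-2 finish and task-3 start)
    print(blame(recipe, 30))  # task-3
```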

@StykMartin
Copy link
Contributor

I would suggest you put dummy tasks at the end and then collect panics from there.
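One way to apply this suggestion mechanically is to append a trivial task to the end of every recipe before submitting the job. The sketch below does that with Python's xml.etree.ElementTree. The task name /hypothetical/panic-catcher is a placeholder, not a real Beaker task, so substitute a trivial task available in your own Beaker instance. Given the behavior described above, a panic detected while no task is running would then be marked on this dummy task rather than on a real test.

```python
import xml.etree.ElementTree as ET

# Placeholder name (hypothetical): replace with a trivial task from your Beaker instance.
DUMMY_TASK = "/hypothetical/panic-catcher"


def append_dummy_task(job_xml: str, task_name: str = DUMMY_TASK) -> str:
    """Insert a <task> right after the last existing <task> of every <recipe>."""
    root = ET.fromstring(job_xml)
    for recipe in root.iter("recipe"):
        new_task = ET.Element("task", name=task_name)
        tasks = recipe.findall("task")
        if tasks:
            # Keep any non-task children (placed after the tasks) in their original order.
            position = list(recipe).index(tasks[-1]) + 1
            recipe.insert(position, new_task)
        else:
            recipe.append(new_task)
    return ET.tostring(root, encoding="unicode")
```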

I will try to redesign this feature so that panics during provisioning are reported to a dedicated place instead of being marked on tasks.
