Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed exception level for new threads #232

Merged
merged 3 commits into from
Mar 29, 2017

Conversation

rafalcieslak
Copy link
Contributor

@rafalcieslak rafalcieslak commented Mar 27, 2017

This is a tiny change, but it fixes a huge bug.

To understand it, please have a look at the ending of kern_exc_leave:

# [...] Restoring state

# ... and finally status register (assume EXL bit is set)
LOAD_REG($k1, SR, $sp)
mtc0    $k1, C0_STATUS

# Restore stack frame.
move    $sp, $k0

sync
eret

If the assumption mentioned in comment is not met, that is: EXL bit of the status register stored in saved context is not set, then between the mtc0 instruction and eret an interrupt may arrive. This is unacceptable, because it would overwrite the just-restored context (in particular, it overwrites EPC address with the address of the move instruction, resulting in a endless loop over the last three instructions). That seems unlikely to happen, so here's a proof:

b3925e8, OVPsimp: test=all seed=823261445 repeat=5

EDIT: You can also see the problem in this build for #234.

Now typically the stored SR has EXL set, because usually we store the SR value in kern_exc_enter. But when a new context is prepared in ctx_init, the contents of SR are based on current SR state - so naturally it is likely the EXL bit is 0.

The reason we've not observed this issue until now is because this problem may only occur on the very first ctx_switch -> kern_exc_leave for a new thread, and it requires the interrupt to come at a very precise moment - which happens very rarely. So it only goes to show how much have our testing procedures improved that we are able to catch such an obscure problem.

This fix ensures SR EXL bit is set for all fresh contexts.

@cahirwpz
Copy link
Owner

Indeed capturing this bug shows how adequate is our testing framework, kudos for you!

While you identified and fixed a serious bug, your solution does not prevent some others weird scenarios to happen. Let's set up status register explicitly, instead of taking it's value from the caller context!

I looked briefly on C0.Status register documentation. I think we should set it up as follows:
{KSU = 0, ERL = 0, EXL = 1, IE = 1, NMI = 0, SR = 0, CU[0..3] = 0} ... and others zero as well. Interrupt configuration i.e. IM0-IM7 and BEV should be taken from caller's context. What do you think?

@cahirwpz cahirwpz merged commit b7ab8a4 into cahirwpz:master Mar 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants