powerpc: protect cpu offlining by RCU offline lock #2
base: rcu-dev
Master branch: 2f91146
2d20ef4 to 80fc02e
When going through the lazy-rcu work, I noticed that rcu_barrier_entrain() does not wake up the rcuog GP thread in any path after entraining. This means the GP thread may not be awakened soon (say there were no CBs in the cblist after entraining time).

Further, nothing appears to call the rcu_barrier callback directly when the ->cblist is empty. So if the IPI is delayed long enough for the ->cblist to become empty, and this turns out to be the last CPU holding, then nothing completes rcu_state.barrier_completion.

Fix both these issues.

A note on the wakeup: there are 3 cases AFAICS after the call to rcu_nocb_flush_bypass():

1. The rdp->cblist has pending CBs.
2. The rdp->cblist has all done CBs.
3. The rdp->cblist has no CBs at all (say the IPI took a long time to arrive and some other path dequeued them in the meanwhile).

For #3, entraining a CB is not needed and we should bail. For #1 and needed. But for #2 it is needed.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
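The three cblist cases and the completion problem described above can be sketched as a small userspace model. This is not kernel code: every `model_*` name, both structs, and the enum are hypothetical, and the "report directly when the list is empty" behaviour is only one reading of the commit message. The sketch shows why a CPU that bails in case 3 still has to be counted, since otherwise the last reporter never completes the barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum cblist_case {
    CBLIST_PENDING  = 1, /* case 1: some callbacks still await a grace period */
    CBLIST_ALL_DONE = 2, /* case 2: every queued callback is ready to invoke */
    CBLIST_EMPTY    = 3, /* case 3: nothing is queued at all */
};

struct model_cblist {
    int pending;         /* callbacks waiting for a grace period */
    int done;            /* callbacks ready to be invoked */
    bool barrier_queued; /* a barrier callback has been entrained */
};

struct model_barrier {
    atomic_int cpu_count; /* CPUs that have not yet reported */
    bool completed;       /* stands in for rcu_state.barrier_completion */
};

static enum cblist_case model_classify(const struct model_cblist *cl)
{
    if (cl->pending > 0)
        return CBLIST_PENDING;
    if (cl->done > 0)
        return CBLIST_ALL_DONE;
    return CBLIST_EMPTY;
}

/* Report one CPU; the last reporter completes the barrier. */
static void model_barrier_report(struct model_barrier *b)
{
    int old = atomic_fetch_sub(&b->cpu_count, 1);

    assert(old > 0); /* a CPU must never report twice */
    if (old == 1)
        b->completed = true;
}

/* Returns true if a barrier callback was entrained on this list. */
static bool model_barrier_entrain(struct model_barrier *b,
                                  struct model_cblist *cl)
{
    if (model_classify(cl) == CBLIST_EMPTY) {
        /* Case 3: bail, but still report so the barrier can complete. */
        model_barrier_report(b);
        return false;
    }
    /* Cases 1 and 2: the barrier callback waits behind queued callbacks. */
    cl->barrier_queued = true;
    return true;
}

/* Invoke everything on the list, including an entrained barrier callback. */
static void model_invoke_all(struct model_barrier *b, struct model_cblist *cl)
{
    cl->pending = 0;
    cl->done = 0;
    if (cl->barrier_queued) {
        cl->barrier_queued = false;
        model_barrier_report(b);
    }
}
```

With three modelled CPUs, one per case, the empty-list CPU reports immediately and the barrier completes only once the other two lists have been invoked. Dropping the `model_barrier_report()` call from the case 3 branch leaves the count stuck above zero, which is the hang the commit message describes.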
Master branch: 80fc02e
d3be018 to 424f7dd
d03a7bc to 80fc02e
Master branch: 80fc02e
424f7dd to 7edcef4
Master branch: 80fc02e
Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/rcu/list/?series=676744 (conflict)
80fc02e to 32fad12
Master branch: 32fad12
7edcef4 to 55fc136
Master branch: 32fad12
55fc136 to 98123ae
Master branch: 32fad12
98123ae to 876f407
Master branch: 32fad12
876f407 to 240daf4
0855818 to 5de7218
5de7218 to 5996f16
5996f16 to 8c85071
8c85071 to e0308f0
e0308f0 to 94ae7d3
94ae7d3 to 7d5f0ac
7d5f0ac to 46dc483
46dc483 to b807111
87e7989 to fa70e60
b2ce6cd to 53bd5c6
During CPU offlining, the subfunctions of xive_teardown_cpu() call __lock_acquire() when CONFIG_LOCKDEP=y. __lock_acquire() traverses an RCU-protected list, so "WARNING: suspicious RCU usage" is triggered.

Try to protect CPU offlining with the RCU offline lock.

Tested on a PPC VM at the Open Source Lab of Oregon State University (each round of tests takes about 19 hours to finish). The test results show that although "WARNING: suspicious RCU usage" is gone, there are more "BUG: soft lockup" reports than with the original kernel (10 vs 6), so I have added [RFC] to the subject line.

Signed-off-by: Zhouyi Zhou <[email protected]>
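The warning condition described in this commit message can be illustrated with a toy userspace model. It assumes (the message does not spell this out) that the traversal is flagged because it runs on a CPU that RCU no longer tracks as online. All `model_*` names and the struct are hypothetical and do not correspond to kernel symbols, and the sketch does not model the offline lock itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names do not correspond to kernel symbols. */
struct model_cpu {
    bool online_for_rcu; /* RCU still considers this CPU online */
    int read_nesting;    /* depth of read-side critical sections */
    int warnings;        /* number of "suspicious usage" reports */
};

static void model_read_lock(struct model_cpu *cpu)
{
    cpu->read_nesting++;
}

static void model_read_unlock(struct model_cpu *cpu)
{
    assert(cpu->read_nesting > 0); /* unlock must pair with a lock */
    cpu->read_nesting--;
}

/* Legal only inside a reader section on a CPU that RCU is tracking. */
static bool model_traversal_is_legal(const struct model_cpu *cpu)
{
    return cpu->online_for_rcu && cpu->read_nesting > 0;
}

/* Stand-in for walking an RCU-protected list with checking enabled. */
static void model_traverse_list(struct model_cpu *cpu)
{
    if (!model_traversal_is_legal(cpu))
        cpu->warnings++;
}
```

In this model a traversal made while the CPU is online and inside a reader section raises nothing, while the same traversal after `online_for_rcu` is cleared is counted as a warning even though the reader section is still held. That is the kind of late list walk the patch tries to keep out of the offlining path.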
53bd5c6 to 8697218
Pull request for series with
subject: powerpc: protect cpu offlining by RCU offline lock
version: 1
url: https://patchwork.kernel.org/project/rcu/list/?series=676744