Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zookeeper sequence node overflow causes perpetual re-election #730

Open
angxiang opened this issue Oct 9, 2023 · 7 comments · May be fixed by #731
Open

Zookeeper sequence node overflow causes perpetual re-election #730

angxiang opened this issue Oct 9, 2023 · 7 comments · May be fixed by #731

Comments

@angxiang
Copy link

angxiang commented Oct 9, 2023

Expected Behavior

When the sequence zookeeper node counter overflows (as documented in: https://zookeeper.apache.org/doc/r3.6.0/zookeeperProgrammers.html#Sequence+Nodes+--+Unique+Naming), it would be expected that there is one re-election from 2147483647 to -2147483648. Afterwards, incrementing in the negative space should be stable again and not causing unexpected re-elections.

Actual Behavior

When overflowing and reaching the negative space starting from -2147483648, we observed perpetual leader re-elections. With each new added sequence number, the new instance would register itself as the new leader.

When digging into the kazoo code, it seems to be a typing issue, as the sequence is matched using a regex but not cast to an integer, meaning checking for the smallest sequence number is a String comparison, i.e. "-2147483648" > "-2147483647" is True.

Snippet to Reproduce the Problem

N/A

Logs with logging in DEBUG mode

N/A

Specifications

  • Kazoo version: 2.9
  • Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f
  • Python version: 3.11
  • OS: SLES 12 SP2
angxiang added a commit to angxiang/kazoo that referenced this issue Oct 9, 2023
…rflow

Once sequence node is incremented beyond 2147483647, the sequence
overflows into the negative space. When getting predecessors, cast
the node sequence from String to int before comparing.

Closes python-zk#730
angxiang added a commit to angxiang/kazoo that referenced this issue Oct 9, 2023
…rflow

Once sequence node is incremented beyond 2147483647, the sequence
overflows into the negative space. When getting predecessors, cast
the node sequence from String to int before comparing.

Closes python-zk#730
angxiang added a commit to angxiang/kazoo that referenced this issue Oct 9, 2023
Once sequence node is incremented beyond 2147483647, the sequence
overflows into the negative space. When getting predecessors, cast
the node sequence from String to int before comparing.

Closes python-zk#730
angxiang added a commit to angxiang/kazoo that referenced this issue Oct 9, 2023
Once sequence node is incremented beyond 2147483647, the sequence
overflows into the negative space. When getting predecessors, cast
the node sequence from String to int before comparing.

Closes python-zk#730
@angxiang
Copy link
Author

angxiang commented Oct 9, 2023

I have created a pull request but it is currently marked with failing checks due to:

  • failing coverage: 96.65% (-0.09%) compared to 92bd0c2 - strange because the change is not adding any new lines, just modifying two existing ones
  • an in progress Test Python 3.10, ZK 3.4.14, which looks stuck at the time of writing this, as the test with tox has not been producing any output since 1h 20min

Please have a look at the PR anyway :) Thanks!

@ceache
Copy link
Contributor

ceache commented Oct 9, 2023 via email

@angxiang
Copy link
Author

angxiang commented Oct 9, 2023

Hi @ceache ,

thank you for the quick reply! I have a question regarding:

once the sequence numbers are negative, the
new contenders would always "win" since they would have a smaller integer
number than the last positive sequence

Please correct me if I am wrong, but this should only happen exactly once when the wrap around happens. Our use case in which we have observed the issue is electing a new leader instance. We hit the overflow case and observed that when >1 instance was running, they were stuck in a perpetual loop of "new leader is forcibly elected -> old leader is killed -> new instance comes online and registers as new leader".

The PR I opened should work as a bug fix for our case, because we would then have exactly one forced re-election when wrapping around from 2147483647 to -2147483648. But after that, with every increment the new negative sequence number only gets larger, and the smallest sequence number still remains at -2147483648. In a concrete example min(2147483646, 2147483647, -2147483648, -2147483647) = -2147483648.

Am I missing a different use case in which my change could become an issue?

@ceache
Copy link
Contributor

ceache commented Oct 9, 2023 via email

@angxiang
Copy link
Author

angxiang commented Oct 9, 2023

Hi,

yes, sorry for the confusing terminology, I should be saying "lock owner" 👍 I kept saying "leader election" because we use it to elect a leader in our system.
I think I understand what you mean, we may be looking at the problem from different angles?

If I am not mistaken, what you are saying is that my change does not produce the correct result in this critical section. With the example you gave above: A = 2147483646, B = 2147483647 and C = -2147483648, the correct owner of the lock is A, but the actual owner is C. Even continuing with this thought, if A is killed and there is a new instance A_new trying to acquire the lock, it would be A_new = -2147483647 and the lock owner would still be C, which would still be an incorrect state because now B should be the owner.

However my approach is from a user perspective, because my goal was to stabilise the whole system's state once this overflow is reached. Without the proposed change, the lock owner is perpetually changed as with every single new instance, the lock is handed over, since we are comparing Strings. This caused our system to become unusable a few days ago until we deleted and recreated the znode, thus resetting the sequence. The change would prevent what we encountered and prevent a major incident, even though the result of who gets to acquire the lock is technically incorrect while in this critical section.

I agree that the serial number arithmetics which you have linked would probably be the better and more correct solution. From our side, this is not a critical fix anymore as it took us 6 years to reach the point of overflow, so I expect that we will have another few years again before it becomes a problem. Nevertheless it could be a good intermediate fix, depending on how long / difficult it is to implement the sequence number arithmetics, as it does improve the situation, does not worsen it in any way and may prevent a major incident in someone else's system.

Best regards,
Angie

PS: There will be a delay in response until tomorrow as it is already past EOB in Central Europe, sorry about that! If there is a new reply, I will get back to you tomorrow 👋
PPS: issue #606 has no linked change, but I would be interested in seeing your draft! Is there a link to it?

@angxiang
Copy link
Author

Hi,

I have taken another look at this issue and have a few more questions.

A gets there first, the sequence number is 2147483646, and he obtains the
lock (enters critical section).
B, in turn, tries to acquire the lock, its sequence number is 2147483647,
it can't since A has a lower sequence number so it waits.
C also tries, its sequence number is now, -2147483648, it thinks it has the
lowest number so it "obtains" the lock.
We now have A and C in the critical section we were trying to protect.
[...]
But the issue, even if it is a "once in a wrap" issue, remains with two
contenders entering the critical section.

Just to confirm: Your issue with my change casting the sequence numbers to an integer is that the result as to who gets to acquire the lock is wrong: The sequence number that is still at 2147483646 should be the lock owner but -2147483648 is the actual lock owner. Or is there another issue that I am overlooking?

I have read the Sequence Number Arithmetic article that you linked, but if I am not mistaken, it doesn't describe a full solution either; it ends with:

All sequence number arithmetic must deal with "wrapping" of sequence numbers; the number 2N−1 is equidistant in both directions, in RFC 1982 sequence number terms. In our math, they are both considered to be "less than" each other:

distance1 = (signed)(0x8000 - 0x0)    == (signed)0x8000 == -32768 < 0
distance2 = (signed)(0x0    - 0x8000) == (signed)0x8000 == -32768 < 0

This is obviously true for any two sequence numbers with distance of 0x8000 between them.

And then does not propose a solution for this edge case. So in any case, even with sequence number arithmetics, we still have an edge case that produces strange results.

When looking at it from a different angle:

I have a proposal based on this comment taken from kazoo/recipe/lock.py

# Only consider contenders with a smaller sequence number.
# A contender with a smaller sequence number has a higher
# priority.

With the integer cast in my PR, this statement is mathematically correct when looking at the number in the lock contender name. From a human sense standpoint, I would expect 2147483647 to be followed by 2147483648, but this is not the case for integers constrained by 32 bit. Taking the string number and then casting to int still fulfils the "contract" stated in this comment.

What do you think?

Best regards,
Angie

@angxiang
Copy link
Author

angxiang commented Oct 8, 2024

Hello,

we have encountered this issue again. If my understanding is correct, the proposed pull request is not being accepted as it is behaving incorrectly during the overflow period, though it would be stable.

The outlined solution in https://en.wikipedia.org/wiki/Serial_number_arithmetic mentions a remaining edge case regarding equidistant numbers, as well as a difference in the implementation depending on the machine's integer size. We're unsure how zookeeper behaves on different machines (i.e. 32-bit vs 64-bit). Is there already a fix in the works on your end? You mentioned having a draft change in relation to #606, but it isn't linked there.

Best regards,
Angie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants