Should race condition be added as a reason for a signature counter not increasing? #2172

Open
zacknewman opened this issue Oct 1, 2024 · 5 comments · May be fixed by #2176

@zacknewman
Contributor

zacknewman commented Oct 1, 2024

Currently § 6.1.1. only states the following as reasons for why a signature counter does not increase:

> If either is non-zero, and the new signCount value is less than or equal to the stored value, a cloned authenticator may exist, or the authenticator may be malfunctioning.

However, an older response (older from the perspective of the authenticator) can be processed after a newer one: nothing guarantees that data a client sends first will be received, let alone processed, before data the same client sends later. This primarily affects passkey flows and not second-factor ones; since for the latter, RPs can either force at most one active ceremony per credential or use the signCount at the time the ceremony began to compare to.

Is this deemed too unlikely to warrant mention?

As an explicit example:

  1. User starts a passkey authentication ceremony and sends the updated signature counter, C1.
  2. Same user starts another passkey authentication ceremony and sends a newer signature counter, C2.
  3. Before the server receives the response containing C1, it receives and processes the response containing C2.
  4. Finally the server receives and processes the response containing C1, which is less than the current counter, C2.
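The reordering in this example can be reproduced with a minimal sketch. The `StoredCredential` record and the `process_assertion` function are hypothetical names introduced for illustration only; the comparison follows the § 6.1.1 rule quoted above, and the counter values 5, 6 and 7 are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class StoredCredential:
    """Hypothetical RP-side record for one credential."""
    sign_count: int = 0


def process_assertion(cred: StoredCredential, new_sign_count: int) -> bool:
    """Apply the counter rule quoted from § 6.1.1; True means the check passed."""
    if new_sign_count == 0 and cred.sign_count == 0:
        # Neither value is non-zero, so there is nothing to compare.
        return True
    if new_sign_count > cred.sign_count:
        cred.sign_count = new_sign_count
        return True
    # new_sign_count <= stored value: a clone, a malfunction, or mere reordering.
    return False


cred = StoredCredential(sign_count=5)
c1, c2 = 6, 7  # C1 is signed first, C2 second

# The responses reach the server in the opposite order from which they were signed.
results = [process_assertion(cred, count) for count in (c2, c1)]
print(results)          # [True, False]
print(cred.sign_count)  # 7
```

From the server's point of view, the late C1 response is indistinguishable from one produced by a cloned authenticator, which is why failing the ceremony remains the safe default.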

There are many reasons for why such a thing happens: BGP routes messed up, wonky load balancer, black hole causing time dilation[1]. The point is that there are technically legitimate reasons for a counter not increasing.

Footnotes

  1. I'm obviously being facetious about this one

@Firehed

Firehed commented Oct 1, 2024

I see no immediate harm in pointing it out, but I'm not sure how actionable it would be for any of the involved parties. Trying to differentiate it from a cloned or malfunctioning authenticator could well be impossible. It might be doable with associating counter data with challenges (I think this is what you are getting at) but such an implementation may be highly error-prone and someone that has cloned an authenticator may be able to exploit this. And even still, I think the appropriate thing to do is fail the ceremony, as you would have done previously.

In your example, the majority case (outside of application bugs) user experience would likely be "my sign-in request is hanging so I'll try again", which would tend to result in the C1 request/response getting ignored or aborted by the client - though I suppose it could set off some inappropriate alarm bells on the RP side.

> This primarily affects passkey flows and not second-factor ones; since for the latter, RPs can either force at most one active ceremony per credential or use the signCount at the time the ceremony began to compare to.

Can you help me understand how this would differ in practice? For conditional flows, the fact that the request could have started minutes or even hours prior to response processing shouldn't have incremented the counter until the user actually approves the request (and if that's not the case, I'd argue the authenticator is malfunctioning). I think this only creates problems if you're doing counter/challenge associations - effectively, trying to allow a counter rollback to go through under certain scenarios makes a common flow more likely to run into this problem in the first place.

So after thinking it through a bit, my feeling is "this is unlikely enough that it's safe to omit", but "call it out but still recommend failing the ceremony" also seems fine to me. It's also completely possible I'm missing something obvious!

@zacknewman
Contributor Author

zacknewman commented Oct 1, 2024

> I see no immediate harm in pointing it out, but I'm not sure how actionable it would be for any of the involved parties. Trying to differentiate it from a cloned or malfunctioning authenticator could well be impossible.

Indeed. I was not trying to imply this was actionable; merely stating non-malicious reasons for this scenario to occur. That same section states:

> Detecting a signature counter mismatch does not indicate whether the current operation was performed by a cloned authenticator or the original authenticator. Relying Parties should address this situation appropriately relative to their individual situations, i.e., their risk tolerance.

so an RP may want to account for these legitimate reasons in their risk tolerance based on whatever probabilities they ascribe.

> It might be doable with associating counter data with challenges (I think this is what you are getting at)

That is what I am getting at, and why I stated such a thing would only be possible for "second-factor" flows (i.e., more accurately, non-discoverable requests).

> but such an implementation may be highly error-prone and someone that has cloned an authenticator may be able to exploit this.

A careful RP could make this relatively error free. Depending on how the RP achieves this, a cloned authenticator could exploit this; however a "short" timeout makes this less of an issue.

> And even still, I think the appropriate thing to do is fail the ceremony, as you would have done previously.

Agreed. Again, I was not implying anything with this issue. I was merely pointing out "legitimate" reasons for a signature counter to not increase. As mentioned earlier, this is likely not actionable; therefore I would indeed fail the ceremony. All a user would have to do is re-try.

> In your example, the majority case (outside of application bugs) user experience would likely be "my sign-in request is hanging so I'll try again", which would tend to result in the C1 request/response getting ignored or aborted by the client - though I suppose it could set off some inappropriate alarm bells on the RP side.

Yep.

> Can you help me understand how this would differ in practice? For conditional flows, the fact that the request could have started minutes or even hours prior to response processing shouldn't have incremented the counter until the user actually approves the request (and if that's not the case, I'd argue the authenticator is malfunctioning). I think this only creates problems if you're doing counter/challenge associations - effectively, trying to allow a counter rollback to go through under certain scenarios makes a common flow more likely to run into this problem in the first place.

I'm guessing I shouldn't have used the adverb "primarily". It indeed may be the case that most RPs that use non-discoverable requests (i.e., relying on a non-empty PublicKeyCredentialRequestOptions.allowCredentials) are equally susceptible to this. What I was trying to say was that it's at least possible for an RP that uses non-discoverable requests to combat this. A couple of ways are the following:

  • Allow at most one active ceremony per Credential ID. This can be achieved several ways (e.g., a bit flag saved on the database which is using serializable transactions and perhaps additional exclusive locks to ensure a read does not occur while an update does). The RP only populates allowCredentials with the PublicKeyCredentialDescriptors that aren't associated with an active ceremony. This is the most foolproof but comes at the cost of UX since users should be able to start concurrent ceremonies with the same Credential ID.
  • As stated, the RP could associate the signature counter with the challenge/ceremony. The RP would allow a user to authenticate so long as the counter is larger than the counter at the time the ceremony started. Additionally the RP would not update the counter unless the saved counter is strictly less. Like you said, timeout duration is correlated with cloned authenticator risk.
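The second option above might look roughly like the following sketch. Everything here is an assumption made for illustration: the `Credential` and `Ceremony` records, the `begin` and `finish` helpers, and the 60-second `TIMEOUT_SECONDS` value are hypothetical, and signature verification is omitted entirely.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

TIMEOUT_SECONDS = 60.0  # assumed value; a shorter timeout narrows the window a clone could use


@dataclass
class Credential:
    """Hypothetical RP-side record for one credential."""
    sign_count: int = 0


@dataclass
class Ceremony:
    """Hypothetical per-challenge state for a non-discoverable request."""
    baseline: int  # stored counter at the time the ceremony began
    started_at: float = field(default_factory=time.monotonic)


def begin(cred: Credential) -> Ceremony:
    """Start a ceremony and remember the counter the credential had at that moment."""
    return Ceremony(baseline=cred.sign_count)


def finish(cred: Credential, ceremony: Ceremony, new_count: int,
           now: Optional[float] = None) -> bool:
    """Accept if the counter exceeds the ceremony's baseline and the ceremony is fresh."""
    now = time.monotonic() if now is None else now
    if now - ceremony.started_at > TIMEOUT_SECONDS:
        return False  # ceremony expired
    if new_count == 0 and ceremony.baseline == 0 and cred.sign_count == 0:
        return True  # authenticator does not use a counter
    if new_count <= ceremony.baseline:
        return False  # no progress since the ceremony began
    if cred.sign_count < new_count:
        cred.sign_count = new_count  # the stored counter only ever moves forward
    return True


cred = Credential(sign_count=5)
first, second = begin(cred), begin(cred)  # two concurrent ceremonies, baseline 5
ok_c2 = finish(cred, second, 7)  # C2 arrives first
ok_c1 = finish(cred, first, 6)   # C1 arrives late but still exceeds its baseline
print(ok_c2, ok_c1, cred.sign_count)  # True True 7
```

Note the trade-off raised in this thread: while a ceremony is still open, a cloned authenticator presenting any counter above the baseline would also pass, so this relaxation is only as safe as the timeout is short.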

@Firehed

Firehed commented Oct 1, 2024

Gotcha, thanks for all of the clarification! Under the context of "be aware this is a non-malicious scenario where it can occur, but probably still let it fail" this seems like a fine addition.

I do fear that if an RP attempts to permit such requests to go through anyway, a meddling party (though not necessarily one that could MITM things - once that's in play, basically all bets are off) might be able to create some sort of side-channel attack if the RP tries to detect and allow this. E.g. a bad actor on the same network could cause enough traffic to get request C1 to hang, then attempt some sort of replay attack.

To be clear, this fear is entirely based on a gut reaction, not any sort of actual cryptographic assessment. If challenges have a proper timeout, it seems entirely infeasible that the bad actor could do anything in the necessary time window (without nation-state resources, at least).

@nadalin added this to the L3-WD-02 milestone Oct 2, 2024

@zacknewman
Contributor Author

zacknewman commented Oct 2, 2024

> In your example, the majority case (outside of application bugs) user experience would likely be "my sign-in request is hanging so I'll try again", which would tend to result in the C1 request/response getting ignored or aborted by the client - though I suppose it could set off some inappropriate alarm bells on the RP side.

I think the most likely "real" scenario (don't misconstrue this as me stating this is "likely" in absolute terms) is using a roaming authenticator (e.g., a USB security key). I plug the key into my mobile device and authenticate. Before waiting for the process to complete, I unplug it and plug it into my laptop, where I authenticate again. For the reasons already mentioned, in addition to weaker resources on the phone, a congested mobile network, etc., the authentication succeeds on the laptop first. Shortly after, authentication finishes on the mobile device. Most users would probably wait for the process to complete before removing the authenticator, mind you, but it's an example. Perhaps I am self-hosting a password manager and my laptop is on the same LAN, while my mobile device is on mobile data. That slows the connection, especially since its traffic will likely pass through multiple firewalls that my laptop bypasses and travels within a VPN tunnel, slowing it further.

sbweeden added a commit to sbweeden/webauthn that referenced this issue Oct 3, 2024
@sbweeden
Contributor

sbweeden commented Oct 3, 2024

Submitted PR as discussed in WG call on 2024-10-02.
@zacknewman given you opened this issue, hope it covers what you were thinking.

4 participants