-
Notifications
You must be signed in to change notification settings - Fork 128
Ack Frequency Draft-11 #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
kazuho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR. Looks great!
lib/quicly.c
Outdated
|
|
||
| /* Question: Do we want to move scheduling/sending of ack frequency frames out of this function? */ | ||
| /* TODO: register an ack callback for the ack frequency frame. | ||
| * We don't have a way to handle the ack of the ack_frequency frame */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, considering that ACK_FREQUENCY is an optimization and that it is used only when the connection is in a stable state, maybe the current logic of sending an updated value every 4 PTO is fine?
It might be possible to detect the loss of an ACK_FREQUENCY frame and reset conn->egress.ack_frequency.update_at, but the question is if it is worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it's probably not worth doing just to reset conn->egress.ack_frequency.update_at.
I'm more concerned about the logic around the Probe Timeout Period https://datatracker.ietf.org/doc/html/draft-ietf-quic-ack-frequency-11#section-7. Specifically this part:
Until the packet carrying the ACK_FREQUENCY frame is acknowledged, the endpoint MUST use the greater of the current max_ack_delay and the value that is in flight when computing the PTO period. Doing so avoids spurious PTOs that can be caused by an update that decreases the peer's max_ack_delay.
The calculated max_ack_delay is essentially the max of the current max_ack_delay and all in flight max_ack_delays. If we could handle the ack of this frame we would be able to update the current state and reduce the list of in flight states.
For now it's not necessary as we don't change this value.
lib/quicly.c
Outdated
|
|
||
| // TODO: use received frame.max_ack_delay. We currently use a constant (25 ms) and | ||
| // ignore the value set our our transport parameter (see max_ack_delay field comment). | ||
| // Is there a reason for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because I am lazy; we do not care much about performance acting as a receiver compared to being a sender.
That said, it is better to have both sides implemented efficiently, so improvements here would be appreciated, either in this PR or elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to tackle this in a separate PR
4512f0f to
d54d3dc
Compare
d54d3dc to
205f7be
Compare
kazuho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will do?
Lets the caller find the next missing packet number in quicly ranges
205f7be to
288cc27
Compare
I think so! I just pushed two changes (dropping the "fix openssl warning" commit and adding the "always maintain smallest_unreported_missing"). Easier to view with |
288cc27 to
6775e4e
Compare
|
and I removed the questions from the TODOs commit, but feel free to drop that commit as well. |
kazuho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the efforts (including answering my stupid questions)! I think this is getting really good.
I could be missing something, but how do we follow the following statement in Section 6.2: When an ack-eliciting packet is received with a packet number less than Largest Acked, this still triggers an immediate acknowledgement in an effort to avoid the packet being spuriously declared lost?
I wonder if we need to track largest_acked. And assuming that is the case we might not need to retain smallest_unreported_missing as a state, because we can calculate smallest_unreported_missing using other states?
6775e4e to
c9a1951
Compare
I didn’t follow that text. I instead followed the text from https://github.com/quicwg/ack-frequency/pull/322/files. But with a semantically equivalent change. To inline the relevant parts:
And to quote the relevant parts of the code: uint64_t largest_pn_outside_reorder_window = largest_unacked - (uint64_t)reordering_threshold;
// ...
received_pn <= largest_pn_outside_reorder_windowAlso note our definition of /**
* largest unacked ack-eliciting packet number.
* After sending an ack this remains the same and represents the last
* largest unacked ack-eliciting packet number.
*/
uint64_t largest_unacked;Instead of comparing
Therefore we will always send an immediate ack if the received pn is outside the reordering window relative to largest ack. There is still the question if comparing against Does that make sense?
We could do instead track largest_acked instead of smallest_unreported_missing, but I think it would lead to calling The reason I went with the current design is that we can avoid calling
|
|
@MarcoPolo Thank you always for sharing the insights behind the well-thought design.
I see, makes perfect sense.
I think this is mostly true. When the packet(s) with largest PNs being received are non-ack-eliciting, I think we might not send ACKs when the spec requires us to. But senders would not expect ACKs triggered by non-ack-eliciting packets to drive loss recovery, so maybe that's fine? I know you have added tests to verify the behavior, I will look into them closely and add some to convince myself fully.
I agree that this is the concern. |
max_ack_delay = min_rtt * ctx->ack_frequency / 1024;
Previously we just used the largest seen packet number for calculating the reordering window. Instead, we should only use the largest seen ack-eliciting packet number.
This lets us avoid having to calculate a new value when receiving a ack_frequency_frame.
This happens when the largest seen packet number is non-ack-eliciting, and we have unacked ack-eliciting packet numbers.
c9a1951 to
7608cd7
Compare
My pleasure! Thank you for the patience here and sharing your context. I’ve learned a lot.
Right. I think my logic is sound, but my code doesn’t uphold this invariant:
There are cases where the largest_acked is larger than the largest unacked ack-eliciting packet number. For example, assume:
If we then receive non-ack-eliciting PN 4 and This can be fixed by storing instead the max of largest acked or largest unacked ack eliciting packet number. I did this in 7608cd7, and added a test case for this.
I’ve extended the tests a bit to add the ability to represent non-ack-eliciting packets and the time interval between packets.
Thinking about this more with the context of non-ack-eliciting packets, there is another tradeoff. If we store As a reminder, in the current design we need to find the next |
… loss detection, and it is uncertain what the optimal value is
…eordering_threshold is zero
MarcoPolo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the recent changes. They look good (sorry for the formatting and missing the conventions).
| * passed to quicly_loss_detect_loss too. */ | ||
| uint64_t max_ack_delay = conn->super.remote.transport_params.max_ack_delay * 1000; | ||
| uint64_t reordering_threshold = | ||
| conn->egress.loss.thresholds.use_packet_based ? QUICLY_LOSS_DEFAULT_PACKET_THRESHOLD : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a callout - prior to this PR, setting use_packet_based = 0 would still trigger an ack on reordering. This changes that behavior, but I think it makes sense because semantically disabling packet based behavior should also mean ignoring reordering. Perhaps the reason we didn’t do this before is because RFC 9000 didn’t allow it (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a close look at the changes.
Perhaps the reason we didn’t do this before is because RFC 9000 didn’t allow it (?)
I checked just now, and this was exactly the case! This call site was introduced by #319, and as the quicly_encode_ack_frequency_frame function in that PR indicates, the ignore_order flag was still missing in the specification. The encoding was updated in #491, but we failed to adjust this call site.
sorry for the formatting and missing the conventions
Not at all! It's a matter of taste of the maintainers.
This change updates the existing ack frequency code to be up to date with draft-11 of the spec.
The commits are meaningful, and I recommend reviewing this PR commit by commit.
The last commit contains some next steps/questions as TODOs. I wanted to sync up on that first before continuing. We may not want to include this commit in the final merge.
A couple other questions:
We have a fixed size of ack ranges we’ll track, and very large reordering thresholds can introduce many gaps. The issue with many gaps is that we could drop a range before acking it. However, I don’t think that’s actually an issue as we have the
QUICLY_NUM_ACK_BLOCKS_TO_INDUCE_ACKACKlogic that should report our acks when we have too many gaps.send_ack_at?I’m not sure there’s a security issue here since a peer could get the same high rate of ack behavior by setting the Ack-Eliciting threshold to 0 or by introducing gaps in packet numbers.
It must be at least our timer’s resolution (1ms?). But should it be higher? Do we need to be mindful of server load in what we decide here?
The first two commits fix some warnings. Happy to pull them out into a separate PR, just let me know.