Skip to content

Conversation

@MarcoPolo
Copy link
Contributor

This change updates the existing ack frequency code to be up to date with draft-11 of the spec.

The commits are meaningful, and I recommend reviewing this PR commit by commit.

The last commit contains some next steps/questions as TODOs. I wanted to sync up on that first before continuing. We may not want to include this commit in the final merge.

A couple other questions:

  • Do we want to limit the max reordering threshold we accept?

We have a fixed size of ack ranges we’ll track, and very large reordering thresholds can introduce many gaps. The issue with many gaps is that we could drop a range before acking it. However, I don’t think that’s actually an issue as we have the QUICLY_NUM_ACK_BLOCKS_TO_INDUCE_ACKACK logic that should report our acks when we have too many gaps.

  • Is there anything else we should do when handling an immediate ack besides setting send_ack_at?

I’m not sure there’s a security issue here since a peer could get the same high rate of ack behavior by setting the Ack-Eliciting threshold to 0 or by introducing gaps in packet numbers.

  • What should we advertise for QUICLY_TRANSPORT_PARAMETER_ID_MIN_ACK_DELAY?

It must be at least our timer’s resolution (1ms?). But should it be higher? Do we need to be mindful of server load in what we decide here?


The first two commits fix some warnings. Happy to pull them out into a separate PR, just let me know.

Copy link
Member

@kazuho kazuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. Looks great!

lib/quicly.c Outdated

/* Question: Do we want to move scheduling/sending of ack frequency frames out of this function? */
/* TODO: register an ack callback for the ack frequency frame.
* We don't have a way to handle the ack of the ack_frequency frame */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, considering that ACK_FREQUENCY is an optimization and that it is used only when the connection is in a stable state, maybe the current logic of sending an updated value every 4 PTO is fine?

It might be possible to detect the loss of an ACK_FREQUENCY frame and reset conn->egress.ack_frequency.update_at, but the question is if it is worth doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it's probably not worth doing just to reset conn->egress.ack_frequency.update_at.

I'm more concerned about the logic around the Probe Timeout Period https://datatracker.ietf.org/doc/html/draft-ietf-quic-ack-frequency-11#section-7. Specifically this part:

Until the packet carrying the ACK_FREQUENCY frame is acknowledged, the endpoint MUST use the greater of the current max_ack_delay and the value that is in flight when computing the PTO period. Doing so avoids spurious PTOs that can be caused by an update that decreases the peer's max_ack_delay.

The calculated max_ack_delay is essentially the max of the current max_ack_delay and all in flight max_ack_delays. If we could handle the ack of this frame we would be able to update the current state and reduce the list of in flight states.

For now it's not necessary as we don't change this value.

lib/quicly.c Outdated

// TODO: use received frame.max_ack_delay. We currently use a constant (25 ms) and
// ignore the value set our our transport parameter (see max_ack_delay field comment).
// Is there a reason for this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because I am lazy; we do not care much about performance acting as a receiver compared to being a sender.

That said, it is better to have both sides implemented efficiently, so improvements here would be appreciated, either in this PR or elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to tackle this in a separate PR

Copy link
Member

@kazuho kazuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will do?

@MarcoPolo
Copy link
Contributor Author

I think this will do?

I think so! I just pushed two changes (dropping the "fix openssl warning" commit and adding the "always maintain smallest_unreported_missing"). Easier to view with git range-diff as well.

git range-diff master 205f7be6b192 288cc27c2ae6

@MarcoPolo
Copy link
Contributor Author

and I removed the questions from the TODOs commit, but feel free to drop that commit as well.

Copy link
Member

@kazuho kazuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the efforts (including answering my stupid questions)! I think this is getting really good.

I could be missing something, but how do we follow the following statement in Section 6.2: When an ack-eliciting packet is received with a packet number less than Largest Acked, this still triggers an immediate acknowledgement in an effort to avoid the packet being spuriously declared lost?

I wonder if we need to track largest_acked. And assuming that is the case we might not need to retain smallest_unreported_missing as a state, because we can calculate smallest_unreported_missing using other states?

@MarcoPolo
Copy link
Contributor Author

I could be missing something, but how do we follow the following statement in Section 6.2: When an ack-eliciting packet is received with a packet number less than Largest Acked, this still triggers an immediate acknowledgement in an effort to avoid the packet being spuriously declared lost?

I didn’t follow that text. I instead followed the text from https://github.com/quicwg/ack-frequency/pull/322/files. But with a semantically equivalent change. To inline the relevant parts:

An endpoint that receives an ACK_FREQUENCY frame with a non-zero Reordering Threshold value SHOULD send an immediate ACK whenever it receives an ack-eliciting, out-of order packet whose packet number is outside the reordering window of the peer, i.e. when

  • the difference between the smallest Unreported Missing packet and the Largest Unacked packet is greater than or equal to the Reordering Threshold value; or
  • the received packet number is less than or equal to Largest Acked - Reordering Threshold.

And to quote the relevant parts of the code:

uint64_t largest_pn_outside_reorder_window = largest_unacked - (uint64_t)reordering_threshold;
// ...
received_pn <= largest_pn_outside_reorder_window

Also note our definition of largest_unacked:

/**
 * largest unacked ack-eliciting packet number.
 * After sending an ack this remains the same and represents the last
 * largest unacked ack-eliciting packet number.
 */
uint64_t largest_unacked;

Instead of comparing Largest Acked - Reordering Threshold, I’m comparing Largest Unacked - Reordering Threshold. But this is equivalent. The rough proof is:

  • largest_acked <= largest_unacked (by definition)
  • if received_pn <= largest_acked-reordering_threshold, then received_pn <= largest_unacked-reordering_threshold.

Therefore we will always send an immediate ack if the received pn is outside the reordering window relative to largest ack.

There is still the question if comparing against largest_unacked introduces an immediate ack that wouldn’t have happened if we compared against largest_acked. This would only happen if received_pn > largest_acked-reordering_threshold and received_pn <= largest_unacked-reordering_threshold. However, note that if the latter case is true, then it implies that the received_pn was missing when we received largest_unacked, which means that we would have sent an immediate ack (as largest_unacked - received_pn >= reordering_threshold) due to the first condition in the quoted spec (no extra immediate acks).

Does that make sense?


I wonder if we need to track largest_acked. And assuming that is the case we might not need to retain smallest_unreported_missing as a state, because we can calculate smallest_unreported_missing using other states?

We could do instead track largest_acked instead of smallest_unreported_missing, but I think it would lead to calling quicly_ranges_next_missing (and thus traversing the ranges) more often, as you’d need to call that method anytime you want to find the smallest_unreported_missing packet number.

The reason I went with the current design is that we can avoid calling quicly_ranges_next_missing in the following cases:

  • Receiving packets in order.
  • The received packet is not the smallest_unreported_missing and the smallest_unreported_missing is still within the reordering window.

@kazuho
Copy link
Member

kazuho commented Sep 17, 2025

@MarcoPolo Thank you always for sharing the insights behind the well-thought design.

I instead followed the text from https://github.com/quicwg/ack-frequency/pull/322/files.

I see, makes perfect sense.

Instead of comparing Largest Acked - Reordering Threshold, I’m comparing Largest Unacked - Reordering Threshold. But this is equivalent.

I think this is mostly true. When the packet(s) with largest PNs being received are non-ack-eliciting, I think we might not send ACKs when the spec requires us to. But senders would not expect ACKs triggered by non-ack-eliciting packets to drive loss recovery, so maybe that's fine?

I know you have added tests to verify the behavior, I will look into them closely and add some to convince myself fully.

We could do instead track largest_acked instead of smallest_unreported_missing, but I think it would lead to calling quicly_ranges_next_missing (and thus traversing the ranges) more often, as you’d need to call that method anytime you want to find the smallest_unreported_missing packet number.

I agree that this is the concern.

This lets us avoid having to calculate a new value when receiving a
ack_frequency_frame.
This happens when the largest seen packet number is non-ack-eliciting,
and we have unacked ack-eliciting packet numbers.
@MarcoPolo
Copy link
Contributor Author

@MarcoPolo Thank you always for sharing the insights behind the well-thought design.

My pleasure! Thank you for the patience here and sharing your context. I’ve learned a lot.

Instead of comparing Largest Acked - Reordering Threshold, I’m comparing Largest Unacked - Reordering Threshold. But this is equivalent.

I think this is mostly true. When the packet(s) with largest PNs being received are non-ack-eliciting, I think we might not send ACKs when the spec requires us to. But senders would not expect ACKs triggered by non-ack-eliciting packets to drive loss recovery, so maybe that’s fine?

Right. I think my logic is sound, but my code doesn’t uphold this invariant:

  • largest_acked <= largest_unacked (by definition)

There are cases where the largest_acked is larger than the largest unacked ack-eliciting packet number.

For example, assume:

  • we received ack-eliciting packets [0,1,3]
  • our reordering threshold is 2
  • ack eliciting threshold is 10.

If we then receive non-ack-eliciting PN 4 and QUICLY_DELAYED_ACK_TIMEOUT time passes, then we’ll send an ack with the largest acked being 4 and 2 can be declared lost.

This can be fixed by storing instead the max of largest acked or largest unacked ack eliciting packet number. I did this in 7608cd7, and added a test case for this.

I know you have added tests to verify the behavior, I will look into them closely and add some to convince myself fully.

I’ve extended the tests a bit to add the ability to represent non-ack-eliciting packets and the time interval between packets.

We could do instead track largest_acked instead of smallest_unreported_missing, but I think it would lead to calling quicly_ranges_next_missing (and thus traversing the ranges) more often, as you’d need to call that method anytime you want to find the smallest_unreported_missing packet number.

I agree that this is the concern.

Thinking about this more with the context of non-ack-eliciting packets, there is another tradeoff. If we store largest_acked and largest_unacked, then we wouldn’t need to calculate smallest_unreported_missing when receiving a non-ack-eliciting packet and thus could skip calling quicly_ranges_next_missing. If the application tends to handle many non-ack-eliciting packets (possibly the case with quicly on the server?), then it may be more efficient to store the largest_acked+largest_unacked. What do you think?

As a reminder, in the current design we need to find the next smallest_unreported_missing if the received PN is the smallest_unreported_missing regardless if the PN is ack-eliciting or not.

Copy link
Contributor Author

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the recent changes. They look good (sorry for the formatting and missing the conventions).

* passed to quicly_loss_detect_loss too. */
uint64_t max_ack_delay = conn->super.remote.transport_params.max_ack_delay * 1000;
uint64_t reordering_threshold =
conn->egress.loss.thresholds.use_packet_based ? QUICLY_LOSS_DEFAULT_PACKET_THRESHOLD : 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a callout - prior to this PR, setting use_packet_based = 0 would still trigger an ack on reordering. This changes that behavior, but I think it makes sense because semantically disabling packet based behavior should also mean ignoring reordering. Perhaps the reason we didn’t do this before is because RFC 9000 didn’t allow it (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking a close look at the changes.

Perhaps the reason we didn’t do this before is because RFC 9000 didn’t allow it (?)

I checked just now, and this was exactly the case! This call site was introduced by #319, and as the quicly_encode_ack_frequency_frame function in that PR indicates, the ignore_order flag was still missing in the specification. The encoding was updated in #491, but we failed to adjust this call site.

sorry for the formatting and missing the conventions

Not at all! It's a matter of taste of the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants