
Only mark timers to update #3

Open · wants to merge 2 commits into main
Conversation

@Hasan6979 commented Nov 28, 2024

Timers are updated on every single packet call, but their accuracy is limited to the 250 ms granularity at which currentTime is refreshed. So instead, just mark timers for update, and apply all the marked updates once in update_timers.

This PR is part 1 of several PRs to revamp the neptun noise logic. The next PR will remove the lock from peer.tunnel and introduce interior locks on the tunnel's members instead. On the hot path we only read the session, so there is no need to lock the entire tunnel for that. The other members of tunnel are either individually lockable or can be made atomic. Locking the timers internally is not possible, however: it could deadlock, because update_timers also tries to lock() the timers.
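As a rough sketch of the mark-then-apply idea: the names `TimerName`, `timers_to_update_mask`, `mark_timer_to_update`, and `tick_marked_timers` follow the diff in this PR, but the struct below is a simplified stand-in, not the real neptun code.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Simplified stand-in for the real TimerName enum in noise/timers.rs.
#[derive(Clone, Copy)]
#[repr(u16)]
enum TimerName {
    TimeLastPacketSent = 0,
    TimeLastPacketReceived = 1,
}

struct Timers {
    // One bit per timer; set on the packet hot path, drained periodically.
    timers_to_update_mask: AtomicU16,
}

impl Timers {
    fn new() -> Self {
        Timers { timers_to_update_mask: AtomicU16::new(0) }
    }

    // Hot path: no time lookup, no lock, just set one bit.
    fn mark_timer_to_update(&self, timer: TimerName) {
        self.timers_to_update_mask
            .fetch_or(1 << (timer as u16), Ordering::Relaxed);
    }

    // Periodic path (update_timers): atomically take and clear the mask,
    // then tick only the timers whose bit was set.
    fn tick_marked_timers(&self, tick: &mut impl FnMut(u16)) {
        let mask = self.timers_to_update_mask.swap(0, Ordering::Relaxed);
        for bit in 0..16 {
            if mask & (1 << bit) != 0 {
                tick(bit);
            }
        }
    }
}

fn main() {
    let timers = Timers::new();
    timers.mark_timer_to_update(TimerName::TimeLastPacketSent);
    timers.mark_timer_to_update(TimerName::TimeLastPacketReceived);

    let mut ticked = Vec::new();
    timers.tick_marked_timers(&mut |bit| ticked.push(bit));
    println!("{:?}", ticked); // prints [0, 1]
}
```

The point of the split is that the hot path pays only one relaxed `fetch_or`, while the actual time bookkeeping happens at the coarser update_timers cadence.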

@Hasan6979 Hasan6979 force-pushed the LLT-5805_update_timers_periodically branch from 5e1292c to 8092b80 Compare November 28, 2024 15:35
@tomaszklak (Contributor)

Please add some more detail to the second part:

This is also needed in order to have a lock free tunnel, because locking timers internally is not possible, otherwise it can lead to a deadlock as update_timers also tries to lock() the timers then.

I don't fully understand this.

Comment on lines +234 to +245
let timer_mask = self
.timers
.timers_to_update_mask
.load(std::sync::atomic::Ordering::Relaxed);
for timer_name in TimerName::VALUES {
if (timer_mask & (1 << (timer_name as u16))) != 0 {
self.timer_tick(timer_name);
}
}
// Reset all marked bits
self.timers
.timers_to_update_mask
.store(0, std::sync::atomic::Ordering::Relaxed);
Contributor

I think this will miss mark_timer_to_update calls that happen during the for loop. I think you want to call let timer_mask = self.timers.timers_to_update_mask.swap(0, Relaxed) in the beginning.

If I'm correct, please also add a test that could catch this issue.
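To make the suggested race concrete, here is a sketch (not the real neptun code) of why a separate load and store can lose a mark that arrives in between, while `swap` cannot. The closure argument stands in for a `mark_timer_to_update` call racing with the drain:

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Buggy drain: a mark set between the load and the store is wiped out.
fn drain_load_store(mask: &AtomicU16, racing_mark: impl FnOnce()) -> u16 {
    let snapshot = mask.load(Ordering::Relaxed);
    racing_mark(); // a mark_timer_to_update call during the for loop
    mask.store(0, Ordering::Relaxed); // clears the racing mark as well
    snapshot
}

// Fixed drain: swap takes the snapshot and clears the mask in one atomic
// step, so the racing mark lands in the already-cleared mask and survives
// until the next update_timers cycle.
fn drain_swap(mask: &AtomicU16, racing_mark: impl FnOnce()) -> u16 {
    let snapshot = mask.swap(0, Ordering::Relaxed);
    racing_mark();
    snapshot
}

fn main() {
    let lossy = AtomicU16::new(0b01);
    drain_load_store(&lossy, || { lossy.fetch_or(0b10, Ordering::Relaxed); });
    assert_eq!(lossy.load(Ordering::Relaxed), 0); // racing mark was lost

    let fixed = AtomicU16::new(0b01);
    drain_swap(&fixed, || { fixed.fetch_or(0b10, Ordering::Relaxed); });
    assert_eq!(fixed.load(Ordering::Relaxed), 0b10); // racing mark survives
}
```

Injecting the "concurrent" mark as a closure makes the interleaving deterministic, which is also roughly the shape a regression test for this could take.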

Author

Changed and added test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this test verifies the fix for the above issue. The issue happens only when the for loop (now in tick_marked_timers) runs in parallel with mark_timer_to_update. Also, reverting the swap change in the implementation doesn't break the test; please add a test that fails with the old version of the code but passes with the new one.

Author

Yes, but because it is difficult to run them in parallel deterministically enough to reproduce the issue, I decided to mark the timer after reading the bitmask instead. That timer should then be updated in the next cycle of update_timers, whereas in the previous version that update was missed.

neptun/src/noise/timers.rs (outdated)
@packgron (Contributor) left a comment

+1.0

my_tun.update_timers(&mut [0]);

// Only those timers marked should be updated
assert!(!my_tun.timers[TimerName::TimeLastDataPacketSent].is_zero());
Contributor

NIT: use assert_eq!(timers[x], timers[TimeCurrent])

@Hasan6979 Hasan6979 force-pushed the LLT-5805_update_timers_periodically branch from 4f1d22c to 4ba9f5e Compare December 9, 2024 13:36
@Hasan6979 commented Dec 9, 2024

Performance test from base

ID Role Interval Transfer Bitrate Retransmissions
5 Sender 0.00-120.00 sec 5.17 GBytes 370 Mbits/sec 40,779
5 Receiver 0.00-120.00 sec 5.17 GBytes 370 Mbits/sec N/A
7 Sender 0.00-120.00 sec 5.21 GBytes 373 Mbits/sec 42,817
7 Receiver 0.00-120.00 sec 5.21 GBytes 373 Mbits/sec N/A

Performance of current branch

ID Role Interval Transfer Bitrate Retransmissions
5 Sender 0.00-120.00 sec 5.24 GBytes 375 Mbits/sec 43,217
5 Receiver 0.00-120.00 sec 5.24 GBytes 375 Mbits/sec N/A
7 Sender 0.00-120.00 sec 5.08 GBytes 364 Mbits/sec 41,078
7 Receiver 0.00-120.00 sec 5.08 GBytes 364 Mbits/sec N/A

@jjanowsk (Collaborator)

I've done a ton of testing yesterday on my setup (two physical machines connected directly via cable). Unsurprisingly, the most consistent results I get are when measuring only one-way UDP traffic. On multiple 6 minute runs I got an average of 684 Mbps with this branch vs 662 Mbps with the main branch. That looks like a 3% improvement, which is nice.

@Hasan6979 (Author)

@jjanowsk I think this gain is good enough; I would expect only a marginal improvement, if any, from this change alone. Either way, this change is needed for future improvements.

@Hasan6979 Hasan6979 force-pushed the LLT-5805_update_timers_periodically branch from 4326a1f to e64f7d8 Compare December 18, 2024 08:56
@@ -82,6 +96,7 @@ pub struct Timers {
persistent_keepalive: usize,
/// Should this timer call reset rr function (if not a shared rr instance)
pub(super) should_reset_rr: bool,
timers_to_update_mask: AtomicU16,
@tomaszklak (Contributor) commented Dec 18, 2024

do we need AtomicU16 here? can we use instead the parking_lot::Mutex<u16> or even parking_lot::Mutex<[bool;8]>?

Contributor

So this is the main area that should bring the improvement; with a Mutex we would just be switching from one mutex to another. Now we fundamentally move away from any waiting on locks.

Contributor

This change is adding an AtomicU16, not removing any mutex.

Contributor

Oh right, the method takes &mut self, not &self. Then in reality we don't even need a Mutex or an atomic; a plain u16 would be enough.
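A sketch of the point being made here, assuming the mask is only ever touched through &mut self methods (a simplified stand-in, not the real neptun code): the exclusive borrow already rules out concurrent access, so a plain u16 suffices.

```rust
// If every caller goes through &mut self, the borrow checker guarantees
// exclusive access, so the mask needs no Mutex and no AtomicU16.
struct Timers {
    timers_to_update_mask: u16, // plain field instead of AtomicU16
}

impl Timers {
    fn mark_timer_to_update(&mut self, bit: u16) {
        self.timers_to_update_mask |= 1 << bit;
    }

    // Take the current mask and reset it to zero in one step.
    fn take_marked(&mut self) -> u16 {
        std::mem::take(&mut self.timers_to_update_mask)
    }
}

fn main() {
    let mut timers = Timers { timers_to_update_mask: 0 };
    timers.mark_timer_to_update(1);
    timers.mark_timer_to_update(3);
    assert_eq!(timers.take_marked(), 0b1010);
    assert_eq!(timers.take_marked(), 0);
}
```

Whether this applies depends on the follow-up lock-removal PR: once the hot path only holds a shared reference to the tunnel, the mask would again need to be atomic.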

@Hasan6979 Hasan6979 force-pushed the LLT-5805_update_timers_periodically branch from e64f7d8 to d4e74e9 Compare December 18, 2024 13:24
@jjanowsk (Collaborator)

More updates from the measurements on the CI. When run in the CI on 15 minute runs of single way UDP traffic, the gains are not significant. There is less than 0.5% of difference between main and this branch, sometimes with main being faster and sometimes this branch. In total I've run 4 pipelines like this.

@tomaszklak (Contributor)

There is less than 0.5% of difference between main and this branch, sometimes with main being faster and sometimes this branch.

In that case, maybe we can close this PR?
