close connections which violate policy after updates #772
Conversation
Mostly skimmed it so far but looks reasonable overall
src/proxy/connection_manager.rs
pub async fn drain(&self, c: &Connection) {
    match self.drains.clone().write().await.remove(c) {
        Some(cd) => {
            cd.tx.drain().await;
is the lock held here? If so, is drain() possibly blocking?
I don't think the write lock should be held here. remove totally removes a value from the HashMap, returning the owned value (not a reference), so there's no writing beyond the single remove and no reference to the lock required. Now you've got me doubting that understanding a bit though 😅
It's definitely possible to make that more explicit though if we want.
let mut w = self.drains.clone().write().await;
let cd = w.remove(c);
drop(w);
match cd {
    ...
}
The lock should be released when the value returned from write() is dropped. I'm just not sure if it drops when the match body (L67) is done or when the match check (L65) is done.
probably better to break it out like that to align with the standards we have for zt... prioritize understandability for our reviewers over the most idiomatic-looking rustlang
It turns out the lock is held through the duration of the match. This led to a deadlock in the track method, where the read lock was being held for the entire match and one branch was trying to get a write lock.
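A minimal sketch of that hazard and one way around it, assuming a tokio RwLock-wrapped HashMap roughly like the one used here (the key and value types are illustrative, not the actual ztunnel ones): the temporary read guard created in the match scrutinee lives for the whole match, so a write().await in one of the arms waits on a lock the same task already holds.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

// BUG: the read guard produced by `drains.read().await` lives until the end
// of the match, so the write().await in the None arm waits on ourselves.
async fn track_deadlocks(drains: Arc<RwLock<HashMap<u32, u32>>>, key: u32) {
    match drains.read().await.get(&key) {
        Some(count) => println!("already tracked: {count}"),
        None => {
            drains.write().await.insert(key, 1); // never completes
        }
    }
}

// Fix: copy what we need out of the map so the guard is dropped at the end
// of the let statement, before we branch and possibly take the write lock.
async fn track_fixed(drains: Arc<RwLock<HashMap<u32, u32>>>, key: u32) {
    let existing = drains.read().await.get(&key).copied();
    match existing {
        Some(count) => println!("already tracked: {count}"),
        None => {
            drains.write().await.insert(key, 1);
        }
    }
}
```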
Use of a central RWLocked connection tracker pretty much means by definition that we will be blocking on add/remove ops, I think.
If that's not acceptable, inverting this via channels and ditching the centralized locked conntrack is probably the only other way.
we can also explore using linked lists for less locking
src/proxy/inbound.rs
if !state.assert_rbac(&conn).await {
    connection_manager.drain(&conn).await;
This definitely needs some nice logging since it will be a pretty major "WTF just happened" in some cases
firm agree. TY
src/proxy/connection_manager.rs
pub async fn connections(&self) -> Vec<Connection> {
    // potentially large copy under read lock, could require optimization
    self.drains.read().await.keys().cloned().collect()
}
Can we just do the rbac check here to avoid copying the entire conntrack, returning it, and checking in the caller?
We could consider that but I was a little worried about interlocking. We'd need to await locking on the PolicyStore (likely multiple times) inside this lock if we assert_rbac in here I think.
We shouldn't have to lock state multiple times, once we get a change event we can (at least) clone it and provide it to the conn_manager to apply in an iterative loop.
In general, if we go with this approach, I think eventually assert_rbac should live in connection_manager.rs and not state, anyway, and we can probably avoid cloning the entire state to do what we want.
But, again, this is getting a bit in the weeds.
that's fair. we could do some reworking of how policy is applied and retained to ease or eliminate interlocking problems if need be. My thinking was sort of limited to not making those kinds of changes, which maybe is unnecessarily narrow
This is fine for now. If the centralized rwlock'd conntrack becomes a problem/bottleneck, we can just remove it.
src/proxy/connection_manager.rs
    rx
}

pub async fn drain(&self, c: &Connection) {
is there a graceful draining time?
That's still TBD
I think there are really 2 behaviors we want here (a rough sketch follows the list):
- drain: used when ztunnel is shutting down. sends goaway and continues to handle traffic
- close: used when auth policy changes and existing connections become denied. stop handling traffic
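A minimal sketch of how those two signals could be kept distinct, assuming a hypothetical per-connection watch channel; ConnectionSignal and TrackedConnection are illustrative names, not ztunnel's actual types:

```rust
use tokio::sync::watch;

/// Hypothetical signal delivered to the tasks serving a tracked connection.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ConnectionSignal {
    /// ztunnel is shutting down: send GOAWAY, keep serving in-flight traffic.
    Drain,
    /// policy now denies this connection: stop serving traffic immediately.
    Close,
}

struct TrackedConnection {
    tx: watch::Sender<Option<ConnectionSignal>>,
}

impl TrackedConnection {
    fn new() -> (Self, watch::Receiver<Option<ConnectionSignal>>) {
        let (tx, rx) = watch::channel(None);
        (Self { tx }, rx)
    }

    fn drain(&self) {
        // Receivers observe Drain and begin a graceful wind-down.
        let _ = self.tx.send(Some(ConnectionSignal::Drain));
    }

    fn close(&self) {
        // Receivers observe Close and abort their traffic loops.
        let _ = self.tx.send(Some(ConnectionSignal::Close));
    }
}
```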
@howardjohn and @kyessenov, does this approach seem reasonable?
I don't see any drain here. This is just local close?
As it sits now, even when policy changes from allow to deny we send a goaway and continue to handle traffic. We'd originally implemented drains this way because our only connection drains happened when ztunnel was being shut down, and in that case we decided that k8s killing the pod would be our drain timeout.
From an auth deny perspective this didn't seem correct though. I'm just about ready to push some new code which has both drain (for shutdown) and close (for auth) implemented for inbound.
Let's not conflate HBONE GOAWAY with application drain. HBONE GOAWAY is an infrastructure mechanism and should not be used for application intent - it's like reinstalling a cluster when redeploying an app.
For application policies, there's no drain or GOAWAY here AFAICT. You should just close the connection.
By close, I mean RST. I think we shouldn't do half close here.
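For a plain TCP socket, one common way to make close produce an RST instead of a graceful FIN/half-close is to set SO_LINGER to zero before dropping it. A minimal sketch assuming tokio's TcpStream; whether ztunnel would do this for HBONE inner streams is a separate question:

```rust
use std::time::Duration;
use tokio::net::TcpStream;

/// Abortively close a TCP connection: with a zero linger timeout, dropping
/// the socket emits an RST rather than performing the normal FIN handshake.
fn abort_connection(stream: TcpStream) -> std::io::Result<()> {
    stream.set_linger(Some(Duration::from_secs(0)))?;
    drop(stream);
    Ok(())
}
```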
In the latest commits proxy drain behavior is unchanged and all related tests are passing but during testing auth deny closes are (basically) immediate.
I agree that's the desired behavior: drain on upgrade/shutdown vs. immediate close on authz policy denies.
/retest
…changes Signed-off-by: Ian Rudie <[email protected]>
… which asserts policy against running connections Signed-off-by: ilrudie <[email protected]>
…ager tests Signed-off-by: ilrudie <[email protected]>
…ager, unit testing around the same Signed-off-by: ilrudie <[email protected]>
@howardjohn, @hzxuzhonghu and @kyessenov: if you get a chance to take another look and provide feedback, that would be greatly appreciated. TYIA
_ = policies_changed.changed() => {
    let connections = connection_manager.connections().await;
    for conn in connections {
        if !state.assert_rbac(&conn).await {
Sorry, I'm not a Rust expert - does this await individually or globally? Any time there's IO, you probably don't want to stall in a loop in a switch - you may overflow the pending events because IO is slow. It's best to fire-and-forget and then let someone reap the zombies.
This will await the individual assertion against the one connection, but await doesn't mean block (there is a specific block function if that's what you want though). If the Future being awaited is not ready, it relinquishes control and will be polled later by the executor of the task/thread.
Rust's Futures are somewhat unique in that they are lazy. If nothing drives them to completion they will not do any work, so we can't totally fire and forget. We can either await them and let the executor/callback model poll them, or we can collect all the futures and join them later. If we do neither, no work would happen.
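A minimal illustration of the two options, with a stand-in check() in place of the real state.assert_rbac call and join_all from the futures crate:

```rust
use futures::future::join_all;

// Stand-in for `state.assert_rbac(&conn).await`.
async fn check(conn_id: u32) -> bool {
    conn_id % 2 == 0
}

async fn sequential(conns: &[u32]) {
    // Each check is awaited before the next one starts; other tasks on the
    // runtime still make progress whenever a check yields.
    for &c in conns {
        if !check(c).await {
            println!("would close {c}");
        }
    }
}

async fn concurrent(conns: &[u32]) {
    // Futures are lazy: building them does nothing until they are polled,
    // so to run the checks together we collect them and join them.
    let results = join_all(conns.iter().map(|&c| check(c))).await;
    for (&c, ok) in conns.iter().zip(results) {
        if !ok {
            println!("would close {c}");
        }
    }
}
```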
I feel like you do want to block the loop, or at least debounce it, to avoid edge-triggered event spam in the runtime. But I have no idea how to express that in Rust. Maybe a separate actor that is "fast" in consuming policy events but "slow" in enacting them.
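One way that pattern is often written with tokio (a sketch, not what this PR implements): a dedicated task consumes a watch channel and, after each wake-up, sleeps briefly so any further updates landing in that window are coalesced into a single pass over the connections.

```rust
use std::time::Duration;
use tokio::sync::watch;
use tokio::time::sleep;

async fn policy_actor(mut policies_changed: watch::Receiver<()>) {
    loop {
        // "Fast" side: wake as soon as at least one policy change is seen.
        if policies_changed.changed().await.is_err() {
            return; // sender dropped, nothing left to watch
        }
        // Debounce: give further changes a short window to pile up.
        sleep(Duration::from_millis(250)).await;
        // Mark anything that arrived during the window as seen so this pass
        // covers it and we don't immediately wake again for it.
        let _ = policies_changed.borrow_and_update();

        // "Slow" side: re-assert policy against tracked connections here.
    }
}
```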
Yeah, I think having scoped events would be a solid optimization. IMO it would be better to keep this PR a little smaller and then iterate on some optimizations as we move forward but if folks strongly disagree we can start optimizing now.
The code you are commenting on is sort of that actor (in concept at least). It's being spun up in its own task so it should be (at least somewhat) independent of the tasks which are actually moving data.
If I understand the event spam, the concern is that we enqueue multiple times to trigger this code? Like while we are looping here, we get changed() multiple times stacked up?
If so, I think it will have at most one pending. That being said, you could still hit some spam I guess.
I've changed the implementation of sending to policy subscribers to be less frequent in cases where we've got batched updates. Should help
There can be a race between connection tracking and a policy update, like:
t0: a connection is established with the old rbac
t1: the rbac is updated
t2: the policy watcher runs drain here
t3: the connection established at t0 gets tracked
The connection actually should be closed. (One way to close that gap is sketched below.)
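A sketch of the usual fix under assumed names (not necessarily how the PR ends up resolving it): re-assert policy immediately after the connection is tracked, so an update that landed between the initial check and track() still takes effect.

```rust
use tokio::sync::watch;

// Illustrative stubs standing in for ztunnel's real types.
struct Conn;
struct State;
struct ConnManager;

impl State {
    async fn assert_rbac(&self, _c: &Conn) -> bool { true }
}
impl ConnManager {
    async fn track(&self, _c: &Conn) -> watch::Receiver<bool> {
        watch::channel(false).1
    }
    async fn close(&self, _c: &Conn) {}
}

async fn handle_inbound(state: &State, cm: &ConnManager, conn: Conn) {
    // Initial check (t0): the connection is admitted under the current policy.
    if !state.assert_rbac(&conn).await {
        return;
    }
    // Register with the manager so later policy updates can reach us (t3).
    let close_rx = cm.track(&conn).await;
    // Re-check after tracking: an update that raced in at t1/t2 is caught
    // either here or by the policy watcher, which now sees the tracked
    // connection.
    if !state.assert_rbac(&conn).await {
        cm.close(&conn).await;
        return;
    }
    // ... serve traffic, watching close_rx for a close signal ...
    let _ = close_rx;
}
```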
cc @stevenctl
src/identity/manager.rs
@@ -42,6 +42,21 @@ pub enum Identity {
    },
}

impl Ord for Identity {
    // Not sure this is a super legit compare but I think it should work for POC
given this is no longer a POC, should we do a more robust approach?
Good call. I do want to consider if this is adequate.
Using something like this would force us to consider how Identity::X should be compared to Identity::Spiffe if/when a new identity format is added.
impl Ord for Identity {
    fn cmp(&self, other: &Self) -> Ordering {
        let s = match self {
            Identity::Spiffe {
                trust_domain,
                namespace,
                service_account,
            } => trust_domain.to_owned() + namespace + service_account,
        };
        let o = match other {
            Identity::Spiffe {
                trust_domain,
                namespace,
                service_account,
            } => trust_domain.to_owned() + namespace + service_account,
        };
        s.cmp(&o)
    }
}
why do we actually need ord?
nm, they can be derived... that's the play
Ord is required for sorting, which is used in the testing.
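For reference, a sketch of the derived approach mentioned above; the derive requires every field type to implement Ord itself (fields shown as String here purely for illustration):

```rust
// Deriving the ordering traits yields a lexicographic comparison over the
// variants and their fields, and new variants are handled automatically.
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Identity {
    Spiffe {
        trust_domain: String,
        namespace: String,
        service_account: String,
    },
}
```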
src/proxy/connection_manager.rs
let cd = ConnectionDrain::new();
let rx = cd.rx.clone();
let mut drains = self.drains.write().await;
if let Some(w) = drains.remove(c) {
nit: del+mutate+insert can be done with code like:
map.entry("poneyland")
    .and_modify(|e| { *e += 1 })
    .or_insert(42);
maybe not with the return rx aspect though
modified rust playground example
Seems like it could work. TY
could very well be a step too far and difficult to grok but:
// register a connection with the manager and get a channel to receive on
pub async fn track(self, c: &Connection) -> drain::Watch {
    self.drains
        .write()
        .await
        .entry(c.to_owned())
        .and_modify(|cd| cd.count += 1)
        .or_insert(ConnectionDrain::new())
        .rx
        .clone()
}
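One small note on that sketch: or_insert builds its argument even when the entry already exists, so if ConnectionDrain::new() allocates a channel it may be worth the lazier std variant, which differs only in that one call:

```rust
self.drains
    .write()
    .await
    .entry(c.to_owned())
    .and_modify(|cd| cd.count += 1)
    .or_insert_with(ConnectionDrain::new)
    .rx
    .clone()
```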
}

#[derive(Clone)]
pub struct ConnectionManager {
We already have a "pool" tracking connections. It seems like, ideally, there would be a single data structure tracking connections. I guess Pool tracks outer HBONE connections while this tracks inner user connections, though. Also I don't see a feasible way to use 1 structure anyways, so this is mostly hypothetical.
It's a good thought. I couldn't work out how to do it either though.
src/proxy/connection_manager.rs
pub async fn release(self, c: &Connection) {
    let mut drains = self.drains.write().await;
    if let Some((k, v)) = drains.remove_entry(c) {
        if v.count.fetch_sub(1, std::sync::atomic::Ordering::SeqCst) > 1 {
nit: why do we need atomic operations if we have a write lock?
You're correct. I was gonna try something a little fancier and ended up going with this without removing the atomic. A lot of this logic may just be removed though if we want to switch off the drain crate onto a different type of channel, as Ben suggested elsewhere.
Should be better now. Atomic removed
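A sketch of what that simplification can look like, assuming the count is only ever touched while the write lock is held (the key type and field names here are illustrative, not ztunnel's actual ones):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

struct ConnectionDrain {
    // A plain integer is enough: every mutation happens under the write
    // lock, so there is no concurrent access for an atomic to guard against.
    count: usize,
}

struct ConnectionManager {
    drains: Arc<RwLock<HashMap<String, ConnectionDrain>>>,
}

impl ConnectionManager {
    async fn release(&self, c: &str) {
        let mut drains = self.drains.write().await;
        if let Some(mut cd) = drains.remove(c) {
            if cd.count > 1 {
                // Other tasks still reference this connection; put it back
                // with the decremented count.
                cd.count -= 1;
                drains.insert(c.to_string(), cd);
            }
            // else: last reference released, the entry stays removed.
        }
    }
}
```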
    }
}

// signal all connections listening to this channel to take action (typically terminate traffic)
comment seems confusing, or I don't understand it. Signal all connections? But the input is 1 connection.
Is Connection representing the outer hbone connection, and drain() waits until all inner connections close?
Potentially a single rbac::connection could have multiple connections handling traffic I think. In that case they all need to be closed and this code should handle the case if it arises.
src/state/policy.rs
@@ -57,12 +79,14 @@ impl PolicyStore {
            RbacScope::WorkloadSelector => {}
        }
        self.by_key.insert(key, rbac);
        self.notifier.send();
Probably we want to send at most 1 per xds push. That may modify many policies at once
I was going to look at a follow-up which adds batching and includes putting the scope onto the channel, so that receivers get less spam and can inspect the scope to determine whether they need to re-assert. If we think it's critical path I could include it, but if not, perhaps a TODO would be better so it's not just in my head.
added an impl for your comment. Not scoped but should reduce the number of notifications sent to policy subscribers.
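A sketch of the at-most-one-notification-per-push idea, using a tokio watch channel as the notifier and placeholder types for the push contents (illustrative only, not the exact change that landed):

```rust
use tokio::sync::watch;

struct PolicyStore {
    // Subscribers watch this channel; only the fact that something changed
    // matters, not how many individual policies did.
    notifier: watch::Sender<()>,
}

impl PolicyStore {
    fn insert_without_notify(&mut self, _key: String, _rbac: String) {
        // ... update by_key, scope indexes, etc. ...
    }

    // Apply a whole XDS push and then send a single change notification,
    // rather than notifying once per policy inserted.
    fn apply_push(&mut self, policies: Vec<(String, String)>) {
        for (key, rbac) in policies {
            self.insert_without_notify(key, rbac);
        }
        let _ = self.notifier.send(());
    }
}
```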
src/proxy/connection_manager.rs
// signal all connections listening to this channel to take action (typically terminate traffic)
async fn close(&self, c: &Connection) {
    if let Some(cd) = self.drains.clone().write().await.remove(c) {
remove clone
Good catch @hzxuzhonghu. Track used to be called with assert_rbac, but the design changed a little and now there is a significant gap for races to occur. I'll have to take a look at this in more detail.
added hold to consider reworking without drain::Watch and using a tokio channel instead
Some tests should be added; that could be in a follow-up.
/retest
Thanks @hzxuzhonghu, I've created an issue to track follow-up items (#798) but will stop making changes to this PR.
Not familiar with Rust, so I have a question: if the connection is closed properly after sending some packets, can the management program sense it? I don't seem to see the corresponding processing. @ilrudie
It's not checking the connections themselves, but we are presently using the drain crate, which waits for the handle(s) to be dropped when you signal. The handle drop should occur once we break out of looping on handling traffic for the connection.
related #311
Adds:
inbound and inbound_passthrough