
xDS: v1.66.2 and above break most xDS client gRPC requests #7691

Open
marcel808 opened this issue Oct 1, 2024 · 24 comments
Labels
Area: xDS · Status: Blocked · Type: Bug

Comments

@marcel808

marcel808 commented Oct 1, 2024

What version of gRPC are you using?

v1.65.0 works; v1.66.2 and v1.67.0 cause most xDS-based gRPC requests to fail with the error

rpc error: code = Unavailable desc = xds: error received from xDS stream: EOF

What version of Go are you using (go version)?

1.22.6

What operating system (Linux, Windows, …) and version?

Linux (Google GKE)

What did you do?

We have Go service pods that call out to other services using the istio agent (inject.istio.io/templates: grpc-agent), prefix the service URLs with "xds:///", and import _ "google.golang.org/grpc/xds".

istiod-1-22-4
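
For context, a minimal sketch of this setup (the import path and xds:/// scheme are as described above; the target name and credentials are illustrative placeholders, not our actual configuration):

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        _ "google.golang.org/grpc/xds" // registers the xds:/// resolver and xDS LB policies
    )

    func main() {
        // The xds:/// scheme hands name resolution and load balancing to the xDS
        // client, which reads the bootstrap file pointed to by GRPC_XDS_BOOTSTRAP
        // (the istio agent's unix:///etc/istio/proxy/XDS socket in this setup).
        conn, err := grpc.NewClient(
            "xds:///example.default.svc.cluster.local:50051", // hypothetical target
            grpc.WithTransportCredentials(insecure.NewCredentials()),
        )
        if err != nil {
            log.Fatalf("failed to create channel: %v", err)
        }
        defer conn.Close()
        // ... issue RPCs on conn as usual ...
    }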

What did you expect to see?

Successful gRPC requests, load balanced using xDS

What did you see instead?

90% gRPC failure rate with error: rpc error: code = Unavailable desc = xds: error received from xDS stream: EOF

@easwars
Contributor

easwars commented Oct 2, 2024

Would you be able to provide some logs for us, with the following env vars set: GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info?
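
If it helps, on GKE these are typically set on the workload's container spec; a minimal sketch (the env var names are gRPC's, the manifest fragment is generic Kubernetes and not specific to this deployment):

    env:
      - name: GRPC_GO_LOG_VERBOSITY_LEVEL  # verbose gRPC-Go logging, as requested above
        value: "99"
      - name: GRPC_GO_LOG_SEVERITY_LEVEL
        value: "info"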

Thanks.

@easwars
Contributor

easwars commented Oct 2, 2024

The error mentioned in the issue description happens when the connection to the management server fails. https://github.com/grpc/grpc-go/blob/ca4865d6dd6f3d8b77f1943ccfd6c9e78223912d/xds/internal/xdsclient/authority.go#L462C1-L463C1

@marcel808
Author

marcel808 commented Oct 3, 2024

I enabled the logging with grpc-go version 1.67.2; here are the logs leading up to the error:

2024/10/03 16:01:45 INFO: [core] [Channel #26 SubChannel #27]Subchannel created
2024/10/03 16:01:45 INFO: [core] [Channel #26]Channel Connectivity change to CONNECTING
2024/10/03 16:01:45 INFO: [core] [Channel #26]Channel exiting idle mode
2024/10/03 16:01:45 INFO: [xds] [xds-client 0xc000bd8190] [unix:///etc/istio/proxy/XDS] Created transport to server "unix:///etc/istio/proxy/XDS"
2024/10/03 16:01:45 INFO: [xds] [xds-client 0xc000bd8190] [unix:///etc/istio/proxy/XDS] New watch for type "ListenerResource", resource name "<redacted>.svc.cluster.local:50051"
2024/10/03 16:01:45 INFO: [xds] [xds-client 0xc000bd8190] [unix:///etc/istio/proxy/XDS] First watch for type "ListenerResource", resource name "<redacted>.svc.cluster.local:50051"
2024/10/03 16:01:45 INFO: [core] [Channel #24]Channel exiting idle mode
2024/10/03 16:01:45 INFO: [core] [Channel #26 SubChannel #27]Subchannel Connectivity change to CONNECTING
2024/10/03 16:01:45 INFO: [core] [Channel #26 SubChannel #27]Subchannel picks a new address "/etc/istio/proxy/XDS" to connect
2024/10/03 16:01:45 INFO: [core] original dial target is: "xds:///<redacted>.svc.cluster.local:50052"
2024/10/03 16:01:45 INFO: [core] [Channel #28]Channel created
2024/10/03 16:01:45 INFO: [core] [Channel #28]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"xds", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/<redacted>.svc.cluster.local:50052", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2024/10/03 16:01:45 INFO: [core] [Channel #28]Channel authority set to "<redacted>.svc.cluster.local:50052"
2024/10/03 16:01:45 INFO: [pick-first-lb] [pick-first-lb 0xc000c1e240] Received SubConn state update: 0xc000c1e2a0, {ConnectivityState:CONNECTING ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}
2024/10/03 16:01:45 INFO: [core] connect called on addrConn in non-idle state (CONNECTING); ignoring.
2024/10/03 16:01:45 INFO: [xds] [xds-resolver 0xc000a74fc0] Creating resolver for target: xds:///<redacted>.svc.cluster.local:50052
2024/10/03 16:01:45 INFO: [xds] [xds-bootstrap] Using bootstrap file with name "/etc/istio/proxy/grpc-bootstrap.json" from GRPC_XDS_BOOTSTRAP environment variable
2024/10/03 16:01:45 INFO: [core] [Channel #26 SubChannel #27]Subchannel Connectivity change to READY
2024/10/03 16:01:45 INFO: [pick-first-lb] [pick-first-lb 0xc000c1e240] Received SubConn state update: 0xc000c1e2a0, {ConnectivityState:READY ConnectionError:<nil> connectedAddress:{Addr:/etc/istio/proxy/XDS ServerName:localhost Attributes:0xc00021a7a8 BalancerAttributes:<nil> Metadata:<nil>}}
2024/10/03 16:01:45 INFO: [core] [Channel #26]Channel Connectivity change to READY
2024/10/03 16:01:45 INFO: [xds] [xds-client 0xc000bd8190] [unix:///etc/istio/proxy/XDS] ADS stream created
2024/10/03 16:01:45 WARNING: [xds] [xds-client 0xc0001d7b80] [unix:///etc/istio/proxy/XDS] ADS stream closed: EOF
2024/10/03 16:01:45 INFO: [xds] [xds-resolver 0xc00072b440] Received error for Listener resource "<redacted>.cluster.local:8666": xds: error received from xDS stream: EOF
2024/10/03 16:01:45 WARNING: [core] [Channel #1]ccResolverWrapper: reporting error to cc: xds: error received from xDS stream: EOF

@marcel808
Author

vs 1.65.0:

2024/10/03 15:21:58 INFO: [xds] [xds-client 0xc0007fee10] [unix:///etc/istio/proxy/XDS] ADS stream created
2024/10/03 15:21:58 INFO: [xds] [xds-client 0xc0007fee10] [unix:///etc/istio/proxy/XDS] ADS request sent: {
"node":  {
"id":  "sidecar~<redacted>.cluster.local",

@easwars
Contributor

easwars commented Oct 3, 2024

Thanks for the logs.

2024/10/03 16:01:45 WARNING: [xds] [xds-client 0xc0001d7b80] [unix:///etc/istio/proxy/XDS] ADS stream closed: EOF

The above line seems to indicate to me that the server is closing the stream for whatever reason. It is logged when the xDS client runs into an error when attempting to read from the ADS stream.

@easwars
Contributor

easwars commented Oct 3, 2024

Do your server logs contain anything useful?

@marcel808
Author

marcel808 commented Oct 3, 2024

Could this be related (I blanked out some of the IP parts)?

{"level":"warn","time":"2024-10-03T16:01:45.814940Z","scope":"ads","msg":"ADS: \"dd.dd.15.80:37586\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:45.820678Z","scope":"ads","msg":"ADS: \"dd.dd.15.79:57722\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:46.037482Z","scope":"ads","msg":"ADS: \"dd.dd.0.133:35194\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:46.426842Z","scope":"ads","msg":"ADS: \"dd.dd.4.251:50008\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:46.823614Z","scope":"ads","msg":"ADS: \"dd.dd.15.80:37612\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:47.610846Z","scope":"ads","msg":"ADS: \"dd.dd.8.49:60104\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:47.680695Z","scope":"ads","msg":"ADS: \"dd.dd.17.87:40414\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:47.894435Z","scope":"ads","msg":"ADS: \"dd.dd.0.133:35212\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:48.303335Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:56104\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:48.446480Z","scope":"ads","msg":"ADS: \"dd.dd.9.82:22297\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:48.598658Z","scope":"ads","msg":"ADS: \"dd.dd.15.79:55362\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:48.676814Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:56120\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:49.729190Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:56146\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:50.035842Z","scope":"ads","msg":"ADS: \"dd.dd.8.49:60114\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:50.945518Z","scope":"ads","msg":"ADS: \"dd.dd.0.133:36854\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:52.271491Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:56196\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:52.466689Z","scope":"ads","msg":"ADS: \"dd.dd.9.82:35128\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:54.651522Z","scope":"ads","msg":"ADS: \"dd.dd.15.79:55434\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:01:59.183774Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:41806\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:02:01.358069Z","scope":"ads","msg":"ADS: \"dd.dd.17.86:41826\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:02:01.572871Z","scope":"ads","msg":"ADS: \"dd.dd.17.87:33432\" exceeded rate limit: context canceled"}
{"level":"warn","time":"2024-10-03T16:02:01.891733Z","scope":"ads","msg":"ADS: \"dd.dd.0.133:56666\" exceeded rate limit: context canceled"}

@easwars
Contributor

easwars commented Oct 3, 2024

Interesting. Is this from your server?

But from the client logs, it seems like the client is not even sending one ADS request, right?

@marcel808
Author

Interesting. Is this from your server?

Yes, this is from kubectl logs -n istio-system istiod-....

But from the client logs, it seems like the client is not even sending one ADS request, right?

Seems like it, since the working version shows "ADS request sent". I'll ask our infrastructure team whether there are certain rate limits in play.
Not sure whether the rate limits are the cause of the EOF errors, but they occur around the same time.
Any idea how the changes post 1.65 could have affected this?

@easwars
Contributor

easwars commented Oct 3, 2024

Any idea how the changes post 1.65 could have affected this?

There were a few changes that could be of interest:

  • xds: implement ADS stream flow control mechanism #7458: This change causes the xDS client to block further reads on the ADS stream until the previous update is completely processed locally.
  • xds: add support for multiple xDS clients, for fallback #7347: This change causes gRPC to create one xDS client per gRPC dial target. Earlier, there used to be a single xDS client per gRPC binary. Now, if your binary creates multiple gRPC channels to different target URIs, you get multiple xDS clients, one for each target URI (see the sketch below).
    • This could cause more connections and more ADS streams on your server
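
For illustration, a minimal sketch of what the second change means for a binary with two distinct xds:/// targets (names and credentials are placeholders): before #7347 both channels shared one process-wide xDS client and one ADS stream; after it each channel gets its own xDS client and its own ADS stream, both presenting the same node ID from the bootstrap file.

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        _ "google.golang.org/grpc/xds"
    )

    func main() {
        creds := grpc.WithTransportCredentials(insecure.NewCredentials())

        // Both channels read the same bootstrap file (GRPC_XDS_BOOTSTRAP) and so
        // present the same node ID, but after the #7347 change each channel
        // creates its own xDS client and therefore its own ADS stream.
        connA, err := grpc.NewClient("xds:///service-a.default.svc.cluster.local:50051", creds)
        if err != nil {
            log.Fatalf("service-a channel: %v", err)
        }
        defer connA.Close()

        connB, err := grpc.NewClient("xds:///service-b.default.svc.cluster.local:50052", creds)
        if err != nil {
            log.Fatalf("service-b channel: %v", err)
        }
        defer connB.Close()
    }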

@marcel808
Author

Thanks for the reference. I'll also see if I can locate a metric here that shows the ADS request rate, to confirm the link to this issue.

@marcel808 changed the title from "xDS: v1.66.2 and above break all xDS client gRPC requests" to "xDS: v1.66.2 and above break most xDS client gRPC requests" Oct 4, 2024
@marcel808
Author

marcel808 commented Oct 4, 2024

Adjusted the title as there might be some pods that are able to make requests. I will gather more information on the gRPC service and client pod counts.

@marcel808
Author

marcel808 commented Oct 8, 2024

background

QA GKE cluster with a limited number of pods (to keep cost down). There are 3 pod/service types that were upgraded, each with a different number of service dependencies:

  • a: 4 deps
  • b: 2
  • c: 2

i.e. the grpc-go change to create separate xDS clients for each service dependency brings with it additional ADS request load on the istiod side.

client side observations

  • ADS requests (as logged by grpc-go) from the upgraded pods increased by 10x on a per-minute basis. The h/l/t prefixes identify different services; the 15:00 hour counts are from v1.65.0 and the 16:00 counts from v1.67.1. Note that the logged request counts don't seem to add up to the PILOT_MAX_REQUESTS_PER_SECOND limit of 100 (i.e. 6k/minute).

[image: per-minute ADS request counts by service, 15:00 (v1.65.0) vs 16:00 (v1.67.1)]

  • Nearly all client gRPC requests fail once the upgrade to grpc-go v1.67.1 is deployed (~12:00 ET / 16:00 UTC). See graph.

[image: client gRPC failure rate around the v1.67.1 rollout]

  • The services never seem to get into a stable state, i.e. the only way to recover is to roll the deploy back to v1.65.0.

istiod observations

  • Rate limit errors also happen before switching the service to v1.67.1, but they do increase. The per-minute rate is lower than the gRPC xDS stream EOF error rate on the client side:

[image: per-minute istiod ADS rate limit errors]

  • The (3) istiod pods did not show signs of being overloaded (CPU/memory-wise) during the time of the v1.67.1 change.

So in summary, the ADS rate limit errors logged on the istiod side don't seem to account for a near-100% failure rate of gRPC calls from xDS clients. Also, if the 100/s rate limit for ADS requests is temporarily exceeded (potentially due to the increased number of xDS clients from the post-1.65.0 changes), I would expect the services to eventually be able to recover (and get ADS responses). In this QA scenario they don't seem to be able to.

@easwars
Contributor

easwars commented Oct 8, 2024

Thanks for the detailed report.

So, if the stream errors seen by gRPC are not because of rate limit errors, do the istiod logs have anything useful about why it is closing streams? gRPC is seeing EOF on stream reads, so there is not anything else useful in the gRPC logs about why the stream failed.

Is there any way you can provide us with a repro that we can use on our side? I personally don't have much experience with configuring and running istiod.

@marcel808
Author

The istiod logs are not showing much besides rate limit warnings (no errors). What is interesting, though, is what's not logged: I see "ADS: new delta connection for node:..." for services that run a plain istio-proxy sidecar, but not for the service that has problems (it uses istio-proxy in agent mode).
I will pull the agent logs today to see the difference in output between 1.65.0 and 1.67, since the agent sits in between the service and istiod.

I will attempt to reproduce the problem with some dummy services/clients and minikube.

@easwars
Contributor

easwars commented Oct 10, 2024

ADS: new delta connection for node

That is interesting. gRPC does not support the delta variant of the xDS protocol. gRPC currently only supports the SotW variant. So, maybe the delta connection is originating from something else?

@marcel808
Author

marcel808 commented Oct 10, 2024

I compared the istio-proxy logs for agent vs "normal" mode and found this difference:

agent:
{"level":"info","time":"2024-10-10T12:47:42.542125Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":81}

"normal (and logged by istiod)":
{"level":"info","time":"2024-10-10T09:03:02.891659Z","scope":"xdsproxy","msg":"connected to delta upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":129}

Going to try running the service with a plain proxy and 1.67.1 and see what happens; at least it would narrow the problem down.

@marcel808
Author

marcel808 commented Oct 11, 2024

Switching to non-agent mode opened another can of worms, so I'm sticking with agent mode for now (this is also what we prefer, as these are very high-volume services that get bogged down by the overhead of the proxy, which uses more CPU than the service container itself while processing every byte going in and out).

The agent logs only these lines:

{"level":"info","time":"2024-10-11T14:47:00.012793Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":54}
{"level":"info","time":"2024-10-11T14:47:00.012957Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":53}
{"level":"warn","time":"2024-10-11T14:47:00.279688Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"warn","time":"2024-10-11T14:47:00.280400Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"info","time":"2024-10-11T14:47:00.286153Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":55}
{"level":"info","time":"2024-10-11T14:47:00.287524Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":56}
{"level":"warn","time":"2024-10-11T14:47:05.700289Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"warn","time":"2024-10-11T14:47:05.700923Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}

vs the v1.65.0 scenario shows more work being done:

{"level":"info","time":"2024-10-11T15:41:53.643040Z","scope":"xdsproxy","msg":"Initializing with upstream address \"istiod-1-22-4.istio-system.svc:15012\" and cluster \"Kubernetes\""}
{"level":"info","time":"2024-10-11T15:41:53.899503Z","scope":"cache","msg":"generated new workload certificate","resourceName":"default","latency":256031280,"ttl":86399100507759}
{"level":"info","time":"2024-10-11T15:41:53.899573Z","scope":"cache","msg":"Root cert has changed, start rotating root cert"}
{"level":"info","time":"2024-10-11T15:41:53.899600Z","scope":"cache","msg":"returned workload certificate from cache","ttl":86399100400169}
{"level":"info","time":"2024-10-11T15:41:53.899831Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399100169619}
{"level":"info","time":"2024-10-11T15:41:53.900279Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399099721839}
{"level":"info","time":"2024-10-11T15:41:56.077673Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":1}
{"level":"info","time":"2024-10-11T15:41:53.643040Z","scope":"xdsproxy","msg":"Initializing with upstream address \"istiod-1-22-4.istio-system.svc:15012\" and cluster \"Kubernetes\""}
{"level":"info","time":"2024-10-11T15:41:53.899503Z","scope":"cache","msg":"generated new workload certificate","resourceName":"default","latency":256031280,"ttl":86399100507759}
{"level":"info","time":"2024-10-11T15:41:53.899573Z","scope":"cache","msg":"Root cert has changed, start rotating root cert"}
{"level":"info","time":"2024-10-11T15:41:53.899600Z","scope":"cache","msg":"returned workload certificate from cache","ttl":86399100400169}
{"level":"info","time":"2024-10-11T15:41:53.899831Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399100169619}
{"level":"info","time":"2024-10-11T15:41:53.900279Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399099721839}

istio/istio#37152 has a comment on this:

I think the 'overlapping' message in Istiod is the key - if I remember correctly, it happens if 2 connections with the same node ID connect to Istiod - and istio believes it failed to detect the RST/FIN on the first connection (a client should have a single active XDS connection), so it closes it. And it gets into a loop.

Please make sure you didn't cut&paste a bootstrap file in 2 places. Normally the generated node id has a unique name, based on pod.

This sounds connected to the change to create a separate xDS client per service name connected to?

@marcel808
Author

One issue that might be happening is that istiod is not expecting multiple xDS connections with the same node id; in our case it's set via a bootstrap override:

            - name: ISTIO_BOOTSTRAP_OVERRIDE
              value: "/etc/istio/custom-bootstrap/custom_bootstrap.json"

with the node ID templated as:

    NODE_ID="sidecar~${INSTANCE_IP}~${POD_NAME}.${POD_NAMESPACE}~cluster.local"
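
For reference, every xDS client created in the pod reads the same bootstrap file (GRPC_XDS_BOOTSTRAP points at /etc/istio/proxy/grpc-bootstrap.json per the logs above) and therefore presents the same node id to istiod. A minimal sketch of the relevant parts of such a gRPC xDS bootstrap file (field values are illustrative, not copied from our setup):

    {
      "xds_servers": [
        {
          "server_uri": "unix:///etc/istio/proxy/XDS",
          "channel_creds": [{ "type": "insecure" }],
          "server_features": ["xds_v3"]
        }
      ],
      "node": {
        "id": "sidecar~10.0.0.1~mypod.mynamespace~cluster.local"
      }
    }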

@marcel808
Author

marcel808 commented Oct 11, 2024

In short, the istio-agent seems to be closing the xDS connections beyond the first one due to the single node_id used. #7347 requires quite some understanding of both grpc-go and istio xDS to know how multiple named clients would work, since the agent doesn't seem to be aware of the "name" part of the xDS clients.

This is the typical startup behavior of the istio agent, where the first xDS connection works but all subsequent attempts get closed (https://github.com/istio/istio/blob/270710c2ec6495770a6f30e6616011719e580162/pkg/istio-agent/xds_proxy.go#L255), hence the xDS EOF failures on gRPC client calls:

{"level":"info","time":"2024-10-11T17:57:26.160505Z","scope":"xdsproxy","msg":"Initializing with upstream address \"istiod-1-22-4.istio-system.svc:15012\" and cluster \"Kubernetes\""}
{"level":"info","time":"2024-10-11T17:57:26.427983Z","scope":"cache","msg":"generated new workload certificate","resourceName":"default","latency":267051261,"ttl":86399572022083}
{"level":"info","time":"2024-10-11T17:57:26.428066Z","scope":"cache","msg":"Root cert has changed, start rotating root cert"}
{"level":"info","time":"2024-10-11T17:57:26.428100Z","scope":"cache","msg":"returned workload certificate from cache","ttl":86399571901173}
{"level":"info","time":"2024-10-11T17:57:26.428366Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399571634504}
{"level":"info","time":"2024-10-11T17:57:26.428802Z","scope":"cache","msg":"returned workload trust anchor from cache","ttl":86399571198514}
{"level":"info","time":"2024-10-11T17:57:33.536595Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":1}
{"level":"warn","time":"2024-10-11T17:57:33.808855Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"warn","time":"2024-10-11T17:57:33.809498Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"info","time":"2024-10-11T17:57:33.815859Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":2}
{"level":"info","time":"2024-10-11T17:57:33.816460Z","scope":"xdsproxy","msg":"connected to upstream XDS server: istiod-1-22-4.istio-system.svc:15012","id":3}
{"level":"warn","time":"2024-10-11T17:57:34.817348Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}
{"level":"warn","time":"2024-10-11T17:57:34.818000Z","scope":"xdsproxy","msg":"registered overlapping stream; closing previous"}

@easwars
Contributor

easwars commented Oct 14, 2024

#7347 requires quite some understanding of both grpc-go and istio xDS as to know how multiple named clients would work, when the agent doesn't seem to be aware of the "name" part of the xDS clients.

The "name" is completely local to grpc and is not part of the xDS protocol. So, the name will not be communicated to the xDS peer.

Do you know why the istio-agent closes xDS clients beyond the first one? Can it be configured to allow multiple xDS clients with the same node ID? All xDS clients from the same gRPC binary should be using the same node ID.

@easwars removed their assignment Oct 14, 2024
@marcel808
Author

marcel808 commented Oct 14, 2024

Not sure; it might just be an assumption that a pod should only open one xDS connection during its lifetime. Opening a ticket on the istio project might clarify this or provide some recommendations, linking back to this ticket.

edit: istio/istio#53532

@easwars
Contributor

easwars commented Oct 14, 2024

Thanks for filing the issue with istio. We do feel that it is a bug on the istio side to close xDS connections from gRPC other than the first one. Let's see what they say.

@bozaro
Contributor

bozaro commented Nov 12, 2024

This issue is a show-stopper for the grpc-go upgrade. Is there a workaround?

@purnesh42H added the Area: xDS label Nov 18, 2024