Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: introduce pool to manage multiple xDS clients with same bootstrap content #7898

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Dec 4, 2024

Fixes: #7592

RELEASE NOTES:

  • xds: fixed a bug which sometimes prevented a server to be used with multiple clients with different configurations.

@purnesh42H purnesh42H added this to the 1.69 Release milestone Dec 4, 2024
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 202847e to b561f1c Compare December 4, 2024 20:23
Copy link

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 85.22727% with 13 lines in your changes missing coverage. Please review.

Project coverage is 81.97%. Comparing base (4c07bca) to head (43b394d).
Report is 14 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/xdsclient/pool.go 78.94% 8 Missing and 4 partials ⚠️
xds/internal/xdsclient/client_new.go 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7898      +/-   ##
==========================================
+ Coverage   81.84%   81.97%   +0.12%     
==========================================
  Files         377      378       +1     
  Lines       38120    38216      +96     
==========================================
+ Hits        31201    31326     +125     
+ Misses       5603     5578      -25     
+ Partials     1316     1312       -4     
Files with missing lines Coverage Δ
xds/internal/resolver/xds_resolver.go 79.38% <ø> (ø)
xds/internal/xdsclient/client_refcounted.go 83.33% <100.00%> (+1.28%) ⬆️
xds/internal/xdsclient/clientimpl_dump.go 100.00% <100.00%> (ø)
xds/server.go 82.94% <100.00%> (-0.10%) ⬇️
xds/server_options.go 100.00% <100.00%> (ø)
xds/internal/xdsclient/client_new.go 86.66% <66.66%> (+0.30%) ⬆️
xds/internal/xdsclient/pool.go 78.94% <78.94%> (ø)

... and 32 files with indirect coverage changes

@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch 2 times, most recently from 23023e9 to 12cb8a7 Compare December 4, 2024 21:16
@purnesh42H purnesh42H requested a review from easwars December 4, 2024 21:17
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 12cb8a7 to 8787929 Compare December 5, 2024 04:37
@purnesh42H purnesh42H modified the milestones: 1.69 Release, 1.70 Release Dec 5, 2024
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 8787929 to aa7e498 Compare December 5, 2024 16:30
var (
// DefaultPool is the default pool for xds clients. It is created at the
// init time.
DefaultPool *Pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the reason to export this DefaultPool variable? Was it to make testing easier? I see most reference of this variable is within the xdsclient package, with only one reference in xds/server_options.go.
Would it be possible to test via Public APIs instead?

Copy link
Contributor Author

@purnesh42H purnesh42H Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah server options have to set and get config for testing. I think we can provide the set and get method for default pool but then its equivalent of exporting a variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @danielzhaotongliu. We don't have to make this public. We should change the dial option and the server option to create a pool with the bootstrap config that was passed to it, and use that pool to create the xDS client. This is specified in the proposal in #7592 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just coming back to this comment. I think we would have to export the default pool from the xdsclient package or use the internal package trick, because we want the xDS resolver and the xDS server to be able to use the default pool when they want to create xDS clients in non-test environments.

The internal trick won't work once we move the xDS client out into a separate repo (as part of the generic xDS client work). So, we are only left with the option of exporting this.

Do you both have other ideas?

@@ -46,7 +46,8 @@ const Scheme = "xds"

// newBuilderWithConfigForTesting creates a new xds resolver builder using a
// specific xds bootstrap config, so tests can use multiple xds clients in
// different ClientConns at the same time.
// different ClientConns at the same time. It creates the new xds client int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo/nit: "It creates a new xds client in the default pool with the provided config."

// expected to invoke once they are done using the client. It is safe for the
// caller to invoke this close function multiple times.
func (p *Pool) NewClient(name string) (XDSClient, func(), error) {
config, err := bootstrap.GetConfiguration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the reason to not use the Pool.config field xDS bootstrap configuration as that was the config passed in/parsed/created during the NewPool function above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewClient is for the prod path which actually reads the bootstrap config from env vars and will only deal with default pool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the use case where users ONLY set bootstrap config from the SetFallbackBootstrapConfig API and do not set env var? For this case, when creating the xDS client, would it be created via this method? Would they not require p.config instead since that is the field set from the implementation below (line 133-143)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method must use the config field inside the Pool. There is no point in reading the bootstrap configuration from the env vars or the fallback bootstrap configuration. In fact, once we do that, we don't have a reason to have Pool.NewClientForTesting. Tests could as well use pool.NewClient.

What do you think?

NewClient is for the prod path which actually reads the bootstrap config from env vars and will only deal with default pool

The bootstrap env vars should be read when the default pool is created at init time. And if bootstrap env vars are not set, the default pool should have a nil config and if NewClient is called without calling SetFallbackBootstrapConfig, i.e. config is nil when NewClient is called, then NewClient should fail with a useful error message.

if err != nil {
return nil, nil, fmt.Errorf("xds: failed to get xDS bootstrap config: %v", err)
}
return newRefCounted(name, p, config, defaultWatchExpiryTimeout, defaultIdleChannelExpiryTimeout, backoff.DefaultExponential.Backoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding, would there be a need now/or in the future for users or anyone to customize the expiry timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only for testing. We usually provide the testing knobs set this values via methods but only for testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #7924, we no longer have the idle channel expiry timeout. We only have the watch expiry timeout. And so far, we haven't had any real usecase where a user wants to customize that expiry timeout.

if opts.Contents != nil {
config, err := bootstrap.NewConfigForTesting(opts.Contents)
if err != nil {
p.mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, where is this p.mu.Lock() called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good catch. That was supposed to be removed. I removed it.

@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 2ab7ba2 to 6e822a3 Compare December 6, 2024 09:42
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch 2 times, most recently from 4e62279 to 43b394d Compare December 6, 2024 18:47
Copy link
Contributor

@danielzhaotongliu danielzhaotongliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not be familiar enough with the Github testing process for gRPC Go, have we tested this end-to-end, or would the current tests suffice? I recall there are (PSM) interop tests, are those run post PR submission or manually triggered?

Are there any performance implications of this change?

// expected to invoke once they are done using the client. It is safe for the
// caller to invoke this close function multiple times.
func (p *Pool) NewClient(name string) (XDSClient, func(), error) {
config, err := bootstrap.GetConfiguration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the use case where users ONLY set bootstrap config from the SetFallbackBootstrapConfig API and do not set env var? For this case, when creating the xDS client, would it be created via this method? Would they not require p.config instead since that is the field set from the implementation below (line 133-143)?

)

var (
// DefaultPool is the default pool for xds clients. It is created at the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Here and everywhere in this PR, please replace occurrences of xds client(s) with xDS client(s). We've been trying to be consistent with xDS client(s) recently. So, let's stick to that. Thanks.

DefaultPool *Pool
)

// Pool represents pool of xds clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring should mention that the pool of clients share the same bootstrap configuration.

Comment on lines +38 to +45
mu sync.Mutex
clients map[string]*clientRefCounted
config *bootstrap.Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: mu should ideally only have to guard clients. But here, we need it to guard config as well since SetFallbackBootstrapConfig writes to config. Maybe a comment here to mention that?

var (
// DefaultPool is the default pool for xds clients. It is created at the
// init time.
DefaultPool *Pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @danielzhaotongliu. We don't have to make this public. We should change the dial option and the server option to create a pool with the bootstrap config that was passed to it, and use that pool to create the xDS client. This is specified in the proposal in #7592 (comment)

}

// NewPool creates a new xds client pool with the given bootstrap contents.
func NewPool(bootstrapContents []byte) (*Pool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to have this accept a *bootstrap.Config (which could optionally be nil) rather than having to parse the configuration here. With that change, this API will return a single value, *Pool, instead of two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we change this to accept a bootstrap.Config, we should also export bootstrap.newConfigFromContents. This would allow non-test usages for NewPool where the caller would use the exported bootstrap.newConfigFromContents to parse the JSON config and pass it to this function.

@@ -360,7 +360,7 @@ func (s) TestHandleListenerResponseFromManagementServer(t *testing.T) {
// involving receipt of an RDS response from the management server. The test
// verifies that the internal state of the xDS client (parsed resource and
// metadata) matches expectations.
func (s) TestHandleRouteConfigResponseFromManagementServer(t *testing.T) {
func TestHandleRouteConfigResponseFromManagementServer(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the s receiver being removed from this test?

WatchExpiryTimeout: defaultTestWatchExpiryTimeout,
Contents: bc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should be using a pool to create an xDS client in. In fact, once we delete these NewXxx functions which don't operate on a pool, the tests will be left with no other choice.

xds/server.go Outdated
@@ -93,12 +93,11 @@ func NewGRPCServer(opts ...grpc.ServerOption) (*GRPCServer, error) {
// simplifies the code by eliminating the need for a mutex to protect the
// xdsC and xdsClientClose fields.
newXDSClient := newXDSClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having this newXDSClient var which can be overridden by tests to customize xDS client creation, the xDS server (and possibly the xDS resolver) should contain an xdsClientPool as a global var which would be set to the default pool at init time and can be customized by tests.

@@ -121,14 +121,18 @@ func (s) TestNewServer_Success(t *testing.T) {
desc: "without_xds_creds",
serverOpts: []grpc.ServerOption{
grpc.Creds(insecure.NewCredentials()),
BootstrapContentsForTesting(generateBootstrapContents(t, uuid.NewString(), nonExistentManagementServer)),
func() grpc.ServerOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this func here?

// matched route configuration. After it transitions back into not ready
// (through an explicit LDS Resource Not Found), previously running RPC's
// should be gracefully closed and still work, and new RPC's should fail.
func (s) TestServingModeChanges_MultipleServers(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to test serving mode changes here. All we need to do here is to ensure that we can create two servers with two different bootstrap configurations. This would also mean that the two servers will request two different LDS resources from the management server, because the LDS resource name depends on the port on which the server is listening.

Then we could have two gRPC clients that talk to each of these servers and we ensure that RPCs work. We need to ensure that a gRPC client is actually talking to the server that it intends to talk to. This can be done with the Peer call option.

@easwars easwars assigned purnesh42H and unassigned easwars Dec 13, 2024
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch 3 times, most recently from ee9dcb0 to 6d704a9 Compare December 20, 2024 11:33
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 6d704a9 to 8cade2a Compare December 21, 2024 22:18
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch 3 times, most recently from 4b295c9 to 9588254 Compare December 21, 2024 23:04
@purnesh42H purnesh42H force-pushed the xdsclient-pool-per-bootstrap-content branch from 9588254 to 325def2 Compare December 21, 2024 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xds: v1.66.0 regression in xds.BootstrapContentsForTesting
3 participants