Skip to content

Conversation

vinothkumarr227
Copy link
Contributor

Fixes: #7049

RELEASE NOTES: None

Copy link

codecov bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.87%. Comparing base (7235bb7) to head (40a938e).
⚠️ Report is 35 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8602      +/-   ##
==========================================
+ Coverage   81.86%   81.87%   +0.01%     
==========================================
  Files         415      416       +1     
  Lines       40694    40789      +95     
==========================================
+ Hits        33316    33398      +82     
- Misses       5993     6022      +29     
+ Partials     1385     1369      -16     

see 57 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines 189 to 193
"",
"unix://a/b/c",
"unix://authority",
"unix-abstract://authority/a/b/c",
"unix-abstract://authority",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not remove these test cases. Since NewClient doesn't connect like Dial, we need to do the following to get an error instead:

  1. Call cc.Connect()
  2. Wait for TRANSIENT_FAILURE state using AwaitState. If using this function causes a circular dependency, copy it over to this file, adding a comment that mentioning the reason for the duplication.
  3. If context times out before TF state, fail the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using Idle because these malformed targets result in zero resolved addresses. The client never attempts a connection, so it stays IDLE and cannot reach TRANSIENT_FAILURE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an INFO log similar to the following from the unix resolver:

INFO clientconn.go:333 [core] [Channel #5] Channel failed to exit idle mode: invalid (non-empty) authority: authority (t=+1.01919ms)

Since this is the only indication that the connection attempt failed (no errors are returned), I think this should be logged as an ERROR. Presently, the method responsible for channelz trace event logging is logging everything at as an INFO log. We could have at accept the severity as an additional param to achieve this. Once this is an error log, we can assert that the error is logged similar to the following.

grpctest.ExpectError("Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII \"too_many_pings\"")

@dfawley do you think we should log the failure to exit idle mode in cc.Connect() as an ERROR instead of INFO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options:

  1. Change the unix name resolver not to error from Build when this happens. Instead, have it produce a ResolverError and otherwise do nothing.

  2. Handle resolver.Build errors so that all name resolvers that error during Build will instead effectively produce a ResolverError. This fixes that class of problem in the same way for all name resolvers.

Either way, RPCs will fail with that error message text, and we shouldn't need anything besides INFO (or even V2?) logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised a PR with the proposed change: #8643.

@vinothkumarr227 you can leave this test unchanged in this PR, I'll update the test at part of #8643.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After investigating, I confirmed that RPCs already fail with the correct error message. The connection is attempted when the stream creation function calls exitIdle, as seen here:

grpc-go/stream.go

Lines 180 to 186 in 2d92271

func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, method string, opts ...CallOption) (_ ClientStream, err error) {
// Start tracking the RPC for idleness purposes. This is where a stream is
// created for both streaming and unary RPCs, and hence is a good place to
// track active RPC count.
if err := cc.idlenessMgr.OnCallBegin(); err != nil {
return nil, err
}

Therefore, no changes are needed to propagate the error. While cc.Connect itself cannot return an error due to its signature, any resolver build failure is correctly surfaced to the user during the next RPC.


This test can be changes as follows to create a new stream and validate the error.

			cc, err := NewClient(target, WithTransportCredentials(insecure.NewCredentials()))
			if err != nil {
				t.Fatalf("NewClient(%q) failed: %v", target, err)
			}
			defer cc.Close()
			wantErrSubstr := "failed to exit idle mode"
			if _, err := cc.NewStream(ctx, &StreamDesc{}, "/my.service.v1.MyService/UnaryCall"); err == nil {
				t.Fatalf("NewStream() succeeded with target = %q, cc.parsedTarget = %+v, expected to fail", target, cc.parsedTarget)
			} else if !strings.Contains(err.Error(), wantErrSubstr) {
				t.Fatalf("NewStream() with target = %q returned unexpected error: got %v, want substring %q", target, err, wantErrSubstr)
			}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I have modified the code changes accordingly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the team meeting, we discussed this and decided that we should keep the origin test that uses Dial and add another test that uses NewClient. Can you please add the original test back, keeping the updated one also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@arjan-bal arjan-bal added the Type: Internal Cleanup Refactors, etc label Sep 25, 2025
@arjan-bal arjan-bal added this to the 1.77 Release milestone Sep 25, 2025
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
cc, err := NewClient(target, WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are changing the check condition here , is it intentional? If it is , the error message isn't correct. We are checking err!=nil which means NewClient failed and the error message says NewClient succeeded which is not right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

cc, err := Dial(test.target, WithTransportCredentials(insecure.NewCredentials()), WithContextDialer(dialer))
cc, err := NewClient(test.target, WithTransportCredentials(insecure.NewCredentials()), withDefaultScheme(defScheme), WithContextDialer(dialer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should override the default scheme to passthrough with new client because we want to check the behavior of parsed strings with defualt NewClient function with custom dialer. And the behavior might be different , because dns resolver also does its own parsing , so the results might be different. Correct me if I have understood incorrectly. But I think we should have a separate test for NewClient's default parsed address behavior with custom dialer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback, I used withDefaultScheme only in the specific tests where it’s needed, as suggested by Arjan. For checking the default NewClient behavior with a custom dialer, I’m not overriding the scheme. If you’d like a separate test specifically for NewClient's default behavior, let me know and I can add it.

if err != nil {
t.Fatalf("Dial failed. Err: %v", err)
}
client.Connect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we add this here? We did not change from Dial to NewClient in the test and Dial will connect automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change, I reverted it back.

Comment on lines 189 to 193
"",
"unix://a/b/c",
"unix://authority",
"unix-abstract://authority/a/b/c",
"unix-abstract://authority",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an INFO log similar to the following from the unix resolver:

INFO clientconn.go:333 [core] [Channel #5] Channel failed to exit idle mode: invalid (non-empty) authority: authority (t=+1.01919ms)

Since this is the only indication that the connection attempt failed (no errors are returned), I think this should be logged as an ERROR. Presently, the method responsible for channelz trace event logging is logging everything at as an INFO log. We could have at accept the severity as an additional param to achieve this. Once this is an error log, we can assert that the error is logged similar to the following.

grpctest.ExpectError("Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII \"too_many_pings\"")

@dfawley do you think we should log the failure to exit idle mode in cc.Connect() as an ERROR instead of INFO.

@arjan-bal arjan-bal assigned dfawley and vinothkumarr227 and unassigned arjan-bal Oct 6, 2025
@dfawley dfawley removed their assignment Oct 6, 2025
@vinothkumarr227 vinothkumarr227 removed their assignment Oct 14, 2025
@arjan-bal arjan-bal self-assigned this Oct 16, 2025
Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. Please address the pending comments and I'll add a second reviewer.

t.Fatalf("Dialing without a credential did not fail")
// Ensure the NewClient fails without credentials
if _, err := NewClient("fake"); err == nil {
t.Fatalf("grpc.NewClient without a credential did not fail")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please use NewClient instead of grpc.NewClient in the log, i.e. omit the package name. Change this throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 189 to 193
"",
"unix://a/b/c",
"unix://authority",
"unix-abstract://authority/a/b/c",
"unix-abstract://authority",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the team meeting, we discussed this and decided that we should keep the origin test that uses Dial and add another test that uses NewClient. Can you please add the original test back, keeping the updated one also?

@vinothkumarr227 vinothkumarr227 removed their assignment Oct 16, 2025
@arjan-bal arjan-bal self-assigned this Oct 17, 2025
noTSecStr := "no transport security set"
if _, err := NewClient("fake", internal.DisableGlobalDialOptions.(func() DialOption)()); !strings.Contains(fmt.Sprint(err), noTSecStr) {
t.Fatalf("grpc.NewClient received unexpected error: %v, want error containing \"%v\"", err, noTSecStr)
t.Fatalf("NewClient received unexpected error: %q, want error containing %q", err, noTSecStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format errors with %v, %q should be used for strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

noTSecStr := "no transport security set"
if _, err := NewClient("dns:///fake"); !strings.Contains(fmt.Sprint(err), noTSecStr) {
t.Fatalf("grpc.NewClient received unexpected error: %v, want error containing \"%v\"", err, noTSecStr)
t.Fatalf("NewClient received unexpected error: %q, want error containing %q", err, noTSecStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format errors with %v, %q should be used for strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vinothkumarr227 vinothkumarr227 removed their assignment Oct 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update docs and examples and tests to use NewClient instead of Dial

5 participants