Problem Description
I'm experiencing a recurring issue with long-lived gRPC connections being reset after periods of inactivity. The application initializes its gRPC clients during startup and stores them in a configuration map.
After approximately 38 minutes of inactivity, subsequent calls fail with:
UNAVAILABLE: io exception
Caused by: io.grpc.netty.shaded.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
16:26:53 - Successful gRPC call made
17:04:26 - Call fails with "Connection reset by peer" (after ~38 minutes of inactivity)
The gRPC client application and the gRPC server are both deployed on Kubernetes, in separate namespaces.
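My current suspicion is that some intermediate hop (kube-proxy conntrack, a firewall, or a load balancer) silently drops the idle TCP connection and the client only notices on the next call. For reference, this is roughly the channel configuration I have been considering as a sketch; keepAliveWithoutCalls and idleTimeout are assumptions on my part that I have not yet verified actually fix the problem:

// Sketch only: channel options I am considering for surviving idle periods (untested)
private ManagedChannel buildChannel(String host, int port) {
    return ManagedChannelBuilder.forAddress(host, port)
            .usePlaintext()
            .keepAliveTime(120, TimeUnit.SECONDS)    // send periodic HTTP/2 PING frames
            .keepAliveTimeout(60, TimeUnit.SECONDS)  // close the connection if a PING ack doesn't arrive in time
            .keepAliveWithoutCalls(true)             // also ping when no RPCs are in flight (the idle case here)
            .idleTimeout(5, TimeUnit.MINUTES)        // let the channel go IDLE and reconnect on the next call
            .build();
}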
Example code:
@Configuration
public class ServiceConfig {

    // Host/port lookups populated from application configuration (details omitted)
    private Map<String, String> hostMap = new HashMap<>();
    private Map<String, Integer> portMap = new HashMap<>();

    private Map<String, ServiceGrpcClient> clientMap = new HashMap<>();

    @PostConstruct
    public void init() {
        // Eagerly create one client per configured host at startup
        hostMap.forEach((key, host) -> {
            ServiceGrpcClient client = new ServiceGrpcClient(host, portMap.get(key));
            clientMap.put(key, client);
        });
    }

    public ServiceGrpcClient getClient(String key) {
        return clientMap.get(key);
    }
}
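For context on the eager-vs-lazy question at the end, this is roughly the lazy variant I have been weighing. It is only a sketch: the computeIfAbsent approach and the switch to ConcurrentHashMap are my own assumptions, not something in our current code:

// Sketch of a lazy alternative (not our current code)
private final Map<String, ServiceGrpcClient> clientMap = new ConcurrentHashMap<>();

public ServiceGrpcClient getClient(String key) {
    // Create the client (and its channel) on first use instead of at startup
    return clientMap.computeIfAbsent(key,
            k -> new ServiceGrpcClient(hostMap.get(k), portMap.get(k)));
}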
@Slf4j
@Component
public class GenericServiceClient {

    @Value("${grpc.service.host}")
    private String host;

    @Value("${grpc.service.port}")
    private Integer port;

    private String serverHost;
    private int targetPort;

    private ManagedChannel channel;
    private ServiceGrpc.ServiceBlockingStub blockingStub;

    public GenericServiceClient(String serverHost, int target) {
        this.serverHost = serverHost;
        this.targetPort = target;
        this.init();
    }

    public void init() {
        log.debug("Connecting to Service: {}, port: {}", this.serverHost, this.targetPort);
        this.blockingStub = ServiceGrpc.newBlockingStub(this.getManagedChannel());
    }

    private ManagedChannel getManagedChannel() {
        this.channel = ManagedChannelBuilder.forAddress(this.serverHost, this.targetPort)
                .usePlaintext()
                .keepAliveTime(120, TimeUnit.SECONDS)
                .keepAliveTimeout(60, TimeUnit.SECONDS)
                .build();
        return this.channel;
    }

    public ServiceGrpc.ServiceBlockingStub getBlockingStub() {
        return blockingStub;
    }

    // Other methods that interact with the service can be added here
}
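Related to the stale-connection question below, one idea I had is a method on GenericServiceClient that checks the channel's connectivity state before handing out the stub. I am not sure this is a recommended pattern, so treat it as a sketch of what I mean rather than something I have in production:

// Sketch: check connectivity state before using a cached stub (unsure if this is the right approach)
public ServiceGrpc.ServiceBlockingStub getHealthyStub() {
    ConnectivityState state = channel.getState(true); // true = ask the channel to reconnect if it is IDLE
    if (state == ConnectivityState.TRANSIENT_FAILURE || state == ConnectivityState.SHUTDOWN) {
        log.warn("Channel to {}:{} is {}, rebuilding it", serverHost, targetPort, state);
        channel.shutdownNow();
        init(); // rebuild the channel and the stub
    }
    return blockingStub;
}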
Is it a best practice to create gRPC clients at startup (eager loading), or should they be created lazily?
Are there best practices for handling idle connection timeouts in gRPC when clients are created at startup?
What are the recommended keepalive settings to prevent this issue?
Is there a way to detect stale connections before attempting to use them?
How do others handle this in production environments with firewalls and load balancers?