Running service occasionally reports “com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception” #3597

Closed
chengyouling opened this issue Jan 17, 2024 · 9 comments

@chengyouling

Main dependencies and versions:
spring-cloud-gateway-3.1.4
spring-cloud-starter-alibaba-nacos-discovery-2021.0.4
spring-cloud-starter-alibaba-nacos-config-2021.0.4
nacos-client-2.1.2
nacos-server-2.1.0

Scenario: the gateway service starts normally, but it occasionally throws com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception. Ports 8848 and 9848 are both working normally.

Full exception stack trace:
2024-01-15 08:02:52.739 ERROR {"appName":"bigdata-di","thread":"nacos-grpc-client-executor-myj-hngz-prod-cse-mdszh2.nacos.cse.com-31165","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfErrorEnabled","codeLine":"102"}|-> [1705274581653_198.19.131.62_3051]Request stream error, switch server,error={}
com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at com.alibaba.nacos.shaded.io.grpc.Status.asRuntimeException(Status.java:539)
at com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:563)
at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:744)
at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
at com.alibaba.nacos.shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at com.alibaba.nacos.shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 common frames omitted
2024-01-15 08:02:52.740 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Try to reconnect to a new server, server is not appointed, will choose a random server.
2024-01-15 08:02:52.742 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.remote.client.grpc.GrpcClient","methodName":"createNewManagedChannel","codeLine":"182"}|-> grpc client connection server:xxx.xxx.xx.xx ip,serverPort:9848,grpcTslConfig:{"sslProvider":"","enableTls":false,"mutualAuthEnable":false,"trustAll":false}
2024-01-15 08:02:52.887 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Success to connect a server [xxx.xxx.xx.xx:8848], connectionId = 1705276972764_198.19.130.14_1073

Request for help: I don't know where to start troubleshooting. Any guidance would be appreciated, thanks.

@yuluo-yx
Collaborator

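Could this be a network fluctuation? Check port 9848 on the Nacos side.

If this is a container deployment, try exposing ports 9848, 9849, and 8848 (Nacos uses port offsets of 1000 and 1001 from the main port).

Can you provide a demo that reproduces it? It is hard to reproduce locally, so I cannot verify the problem.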
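
The port check suggested above could be sketched roughly as follows. This is a minimal illustration, not part of Nacos: the class name `NacosPortCheck`, the default host, and the 2-second timeout are assumptions, and the offsets are the ones mentioned in this comment. It derives the two gRPC ports from the main port and tries a plain TCP connect to each. A successful connect only shows the port is reachable at that moment; it does not rule out intermittent resets.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NacosPortCheck {
    // Offsets taken from the comment above: main port + 1000 and + 1001.
    static final int GRPC_SDK_OFFSET = 1000;
    static final int GRPC_CLUSTER_OFFSET = 1001;

    static int grpcSdkPort(int mainPort) {
        return mainPort + GRPC_SDK_OFFSET;
    }

    static int grpcClusterPort(int mainPort) {
        return mainPort + GRPC_CLUSTER_OFFSET;
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real Nacos host and main port as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int mainPort = args.length > 1 ? Integer.parseInt(args[1]) : 8848;
        int[] ports = {mainPort, grpcSdkPort(mainPort), grpcClusterPort(mainPort)};
        for (int port : ports) {
            System.out.printf("%s:%d reachable=%b%n", host, port, canConnect(host, port, 2000));
        }
    }
}
```

Running it from inside the gateway container, rather than from a developer machine, exercises the same network path the client uses.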

@chengyouling
Author

chengyouling commented Jan 18, 2024

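It is a container deployment, and ports 9848, 9849, and 8848 are all exposed. It is indeed hard to reproduce locally, and the exception occurs only occasionally. What caught my attention is the "Caused by: java.io.IOException: Connection reset by peer" line, which suggests that either the server or the client actively closed the connection. Could a mismatch between client and server be the cause? Is there a compatibility problem between these two versions?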
nacos-client-2.1.2
nacos-server-2.1.0

@yuluo-yx
Collaborator

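> It is a container deployment, and ports 9848, 9849, and 8848 are all exposed. It is indeed hard to reproduce locally, and the exception occurs only occasionally. What caught my attention is the "Caused by: java.io.IOException: Connection reset by peer" line, which suggests that either the server or the client actively closed the connection. Could a mismatch between client and server be the cause? Is there a compatibility problem between these two versions? nacos-client-2.1.2 nacos-server-2.1.0

I am not sure whether nacos-client-2.1.2 and nacos-server-2.1.0 have a compatibility problem. You can refer to the component versions recommended by SCA (Spring Cloud Alibaba): https://github.com/alibaba/spring-cloud-alibaba/wiki/%E7%89%88%E6%9C%AC%E8%AF%B4%E6%98%8E

You could try excluding the existing nacos-client dependency from spring-cloud-starter-alibaba-nacos-discovery and adding one whose version matches nacos-server: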

<dependency>
	<groupId>com.alibaba.cloud</groupId>
	<artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
	<exclusions>
		<exclusion>
			<groupId>com.alibaba.nacos</groupId>
			<artifactId>nacos-client</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<dependency>
	<groupId>com.alibaba.nacos</groupId>
	<artifactId>nacos-client</artifactId>
	<version>${nacos.client}</version>
</dependency>

@ruansheng8
Collaborator

Does the traffic between the client and the server pass through Nginx or any other proxy?

@chengyouling
Author

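> Does the traffic between the client and the server pass through Nginx or any other proxy?

No. Everything connects directly to nacos-server through the SDK. The error is very hard to reproduce locally, but it shows up occasionally once the service is deployed in containers.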

@chengyouling
Author

@ruansheng8
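What is odd is that there is no pattern. Our other business components show no similar errors; only this component, which integrates spring-cloud-gateway, reports the error once or twice now and then. Could gRPC conflict with the gateway component in some way? Has a similar problem been reported before?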
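
One way to test the "no pattern" impression is to measure the time between consecutive errors in the log. The sketch below is only an illustration: the class name `ResetIntervals` is hypothetical, and it assumes every log line starts with a `yyyy-MM-dd HH:mm:ss.SSS` timestamp and that the error lines contain `Request stream error`, as in the log excerpt in the issue description. Whether regular gaps would point to any particular cause is itself an assumption that would need checking.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class ResetIntervals {
    // Timestamp layout assumed from the log excerpt, e.g. "2024-01-15 08:02:52.739".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final int TS_LENGTH = 23;
    private static final String MARKER = "Request stream error";

    // Collects the timestamps of all lines that contain the error marker.
    static List<LocalDateTime> errorTimes(List<String> lines) {
        List<LocalDateTime> times = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < TS_LENGTH || !line.contains(MARKER)) {
                continue;
            }
            try {
                times.add(LocalDateTime.parse(line.substring(0, TS_LENGTH), TS));
            } catch (DateTimeParseException e) {
                // Line does not start with a timestamp (e.g. a stack frame); skip it.
            }
        }
        return times;
    }

    // Computes the gap between each pair of consecutive timestamps.
    static List<Duration> gaps(List<LocalDateTime> times) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < times.size(); i++) {
            result.add(Duration.between(times.get(i - 1), times.get(i)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ResetIntervals <log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Duration gap : gaps(errorTimes(lines))) {
            System.out.println(gap);
        }
    }
}
```

Comparing the same gaps across the gateway and the other components that do not report the error may help narrow down whether the resets are specific to this service.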

@ruansheng8
Collaborator

@chengyouling You can use Maven Helper locally to check for dependency conflicts, or ask the Nacos community whether anyone has run into something similar.
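
As a runtime complement to Maven Helper, the sketch below lists every classpath location that provides a given class. It is a rough illustration under stated assumptions: the class name `ClasspathDuplicates` is hypothetical, and the default class names are examples only (the first is taken from the stack trace in this issue). It uses the standard `ClassLoader.getResources` call, so more than one URL for the same class suggests that two jars ship it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {
    // Converts "a.b.C" into the resource path "a/b/C.class".
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every classpath location that provides the given class.
    static List<URL> locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example class names only; pass the classes you suspect as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "com.alibaba.nacos.shaded.io.grpc.Status",
            "io.netty.channel.nio.NioEventLoop"
        };
        for (String name : names) {
            List<URL> locations = locate(name);
            System.out.println(name + " found in " + locations.size() + " location(s)");
            for (URL url : locations) {
                System.out.println("  " + url);
            }
        }
    }
}
```

The jar file name in each URL usually includes the artifact version, so the same output also shows which nacos-client version is actually loaded at runtime.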


This issue has been open 30 days with no activity. This will be closed in 7 days.

@liuyuchuan

nacos-client-2.1.0
nacos-server-2.0.3
I see the same problem with these versions.
