-
Client version?
Can you please post some logs showing this kind of behaviour? We tested connection loss in different scenarios; it would be helpful to understand why the client could not reconnect to the server.
-
Maybe we can add a callback with the status. Btw, you should have the messages in not-confirmed status in the
-
Thanks for responding so quickly. Here is the log
The lines prefixed with !!!! are written in the ConfirmationHandler of the producer when it fails. I haven't been able to find a terminating condition in the source code with an upper limit for how long it should stay in reconnecting "mode". When implementing a custom IReconnectStrategy, returning false in WhenDisconnected doesn't raise any exception or change any state that is readable from outside my consumer. IsOpen() always returns true, it seems.
The way I am testing is quite simple. I connect a producer and a consumer in the same app with more or less default settings (this log is with the default retry handling). Then I disconnect my laptop from WiFi. The application continues running, but fails silently. I would like it to throw an exception after a timeout, or to offer some other way to handle this scenario.
Specifically, we use Streams in our containers, and if we were able to add a heartbeat on the connection, our orchestrator would restart the container. But in this case, we had a service that ran over the weekend with a passing health check. In other words, I am looking for a property to query for the connection status.
As for the ConfirmationHandler, I did indeed find that for the publisher. I have yet to investigate whether we need to add custom logic around it. My initial survey seems to indicate that the publisher will eventually fail, but we cannot determine whether a message was actually sent other than by collecting confirmations in this handler. Out of the box, the service sends messages into oblivion for a short period until the failure is detected in the ConfirmationHandler. This might be a problem for Queue->Stream services where you'd want to ACK messages on the queue synchronously. This is currently solvable, but would require a custom implementation for every user as far as I can see.
My main problem right now is with the consumer.
Ideally, we would have a health check based on when we last received a message, but we have cases where messages are sporadic and it's hard to set a good timeout interval. Here, we would like to restart the container if the underlying stream connection had been in "retrying" for more than, say, 5 minutes.
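A minimal sketch of the kind of check described above, written in Python for brevity rather than in the client's C#. It assumes the client could call two hooks, `on_disconnected` and `on_reconnected`; both names, and the `ReconnectWatchdog` class itself, are hypothetical and not part of the library discussed in this thread.

```python
import time
from typing import Callable, Optional


class ReconnectWatchdog:
    """Tracks how long a connection has been continuously disconnected.

    Hypothetical helper: the hooks below are assumed to be called by the
    client (or by a custom reconnect strategy); they are not a real API.
    """

    def __init__(self, max_downtime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_downtime = max_downtime_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._down_since: Optional[float] = None  # None means connected

    def on_disconnected(self) -> None:
        # Keep the time of the first failure so that repeated retry
        # attempts do not reset the downtime counter.
        if self._down_since is None:
            self._down_since = self._clock()

    def on_reconnected(self) -> None:
        self._down_since = None

    def downtime_seconds(self) -> float:
        if self._down_since is None:
            return 0.0
        return self._clock() - self._down_since

    def is_healthy(self) -> bool:
        # Unhealthy only once the outage exceeds the configured limit.
        return self.downtime_seconds() <= self._max_downtime
```

A container health endpoint could then return a failure whenever `is_healthy()` is false, for example with `ReconnectWatchdog(max_downtime_seconds=300)` for the 5 minute limit, which lets the orchestrator restart the container without relying on message arrival times.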
-
I am personally open to many different ways of getting the information into the hands of the user. Technically, I could do it with reflection, but that is iffy. If both Producer and Consumer exposed a ConnectionState enum, e.g. { Initialized, Connected, Reconnecting, TimedOut, Closed }, that would solve it for me.
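To make the proposal concrete, here is a rough sketch in Python (not the client's C#) of what such a state could look like. The enum members mirror the ones listed above; the transition table and the `StatefulEndpoint` class are assumptions made for illustration only, and the real client might allow different transitions.

```python
from enum import Enum, auto


class ConnectionState(Enum):
    INITIALIZED = auto()
    CONNECTED = auto()
    RECONNECTING = auto()
    TIMED_OUT = auto()
    CLOSED = auto()


# Hypothetical transition table, chosen for illustration.
ALLOWED_TRANSITIONS = {
    ConnectionState.INITIALIZED: {ConnectionState.CONNECTED,
                                  ConnectionState.CLOSED},
    ConnectionState.CONNECTED: {ConnectionState.RECONNECTING,
                                ConnectionState.CLOSED},
    ConnectionState.RECONNECTING: {ConnectionState.CONNECTED,
                                   ConnectionState.TIMED_OUT,
                                   ConnectionState.CLOSED},
    ConnectionState.TIMED_OUT: {ConnectionState.CLOSED},
    ConnectionState.CLOSED: set(),
}


class StatefulEndpoint:
    """Stand-in for a producer or consumer that exposes its state."""

    def __init__(self) -> None:
        self._state = ConnectionState.INITIALIZED

    @property
    def state(self) -> ConnectionState:
        # Read-only view that user code (e.g. a health check) can poll.
        return self._state

    def transition(self, new_state: ConnectionState) -> None:
        # Reject moves that the table does not allow.
        if new_state not in ALLOWED_TRANSITIONS[self._state]:
            raise ValueError(
                f"cannot go from {self._state.name} to {new_state.name}")
        self._state = new_state
```

With something like this, a health check would only need to read `state` and react to `RECONNECTING` or `TIMED_OUT`, with no reflection on private members.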
-
Thank you for all the info.
-
So basically, you are looking for a way to understand the producer/consumer status.
-
Per conversation with @Zerpet and @acogoluegnes, the idea would be to implement a sort of event bus where the stream system receives general events. cc @TroelsL
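As a rough illustration of the event bus idea, and not a description of any planned design, the sketch below shows a tiny publish/subscribe mechanism in Python. The names `EventBus`, `StatusChanged`, and their fields are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass(frozen=True)
class StatusChanged:
    """Hypothetical event raised when a producer or consumer changes status."""
    source: str       # e.g. an identifier of the producer or consumer
    old_status: str
    new_status: str


class EventBus:
    """Minimal in-process bus: handlers are registered per event type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable[[Any], None]]] = \
            defaultdict(list)

    def subscribe(self, event_type: type,
                  handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Any) -> None:
        # Iterate over a copy so handlers may subscribe during dispatch.
        for handler in list(self._handlers.get(type(event), [])):
            handler(event)
```

User code would subscribe once at startup, for example `bus.subscribe(StatusChanged, on_status_changed)`, and decide there whether to log, alert, or fail a health check.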
-
General info: We made some decisions on the Producer and Consumer side to make it as reliable as possible. We expected that some use cases could not be covered. There are still raw-level classes where you can implement your own behaviour in case of disconnection, metadata update, message confirmation and consumer disconnection. The event bus idea is transparent to the current implementation. The best approach in this case is:
Users tend to suggest a solution for their own use case, but we have to think about the big picture. We want to analyze one problem at a time to see how to solve it. We will start with the upgrade scenario. Thank you
-
We tried different fail-over scenarios like:
Producers and Consumers can reconnect to the server. Let me add that we do these kinds of tests regularly. About:
Looking at the logs, it seems that the client can't actually look up the host
-
Just to be clear: the log you received is from an MRE, not our production system. We are still investigating what would cause our service to be unable to reconnect for several days. But that is not our main problem; our problem is not being able to detect this. I will do another test to double-check my statements regarding the IsOpen() method and the ReconnectStrategy. I will let you know of my findings. (And just to be clear: I have no expectation of dictating a solution here. I am only proposing examples of what would solve the issue we are seeing, in the hope of providing a bit more detail.)
-
@TroelsL please keep an eye on the 1.8 version. We added the
In the next PRs, we will add an event when the status changes, so you will be aware of what is happening.
-
Will close the issue.
-
Is your feature request related to a problem? Please describe.
We are using the client in production, and we ran into an issue where we have lost connectivity, but the reliable consumer keeps retrying indefinitely, and we have discovered no immediate way to detect that we have a serious problem.
We have looked at the source code and attempted to create a custom ReconnectStrategy, but haven't readily been able to find a good solution.
Please feel free to propose a user-space fix for this as well. It is quite critical to us that we can detect message loss. However, this is also a request to add documentation describing how to handle what I assume is a very common scenario. The answer may very well be that we should be using the RawProducer/Consumers instead of the Reliable ones.
Describe the solution you'd like
In the event of a longer disconnect from the server, I want to raise an exception or at least be able to get a status on the fact that I am not connected to any stream.
We tried producer.IsOpen(), !System.IsClosed, etc., but they all seem to show no sign of trouble. The private member Producer._inReconnection seems to contain some information.
Some advice on this would be greatly appreciated.
Describe alternatives you've considered
No response
Additional context
No response