Questions regarding streams #99

Closed
Isola92 opened this issue Oct 3, 2019 · 6 comments

Comments

Isola92 commented Oct 3, 2019

Hey!

We're currently having some issues with our backplane that we find hard to explain.

From time to time we see no clients receiving messages, and this is usually resolved by rebooting the client. I have a limited understanding of streams, but when testing locally it seems like I always need to restart the client after I restart the host; otherwise the messages pushed into the stream from ClientGrain won't reach OrleansHubLifetimeManager.

In our production-like environment we have two machines, each with one client and one host. We do rolling updates one machine at a time, restarting the host first and then the client. A similar issue has appeared now, which was again resolved by restarting the clients manually.
In another scenario the issue resolved itself after the machines were left idle for a while.
We use PubSubStore without persistence.

Any ideas or tips on how to approach this? I can find no errors or warnings in the logs, so I will activate the debug flag on these environments and see if it appears again tomorrow.

Thanks for a great product.

Kind regards,
Olof

@stephenlautier
Contributor

@Isola92 we were actually experiencing a similar issue today, to be honest.

We recently updated from Orleans 2.3.5 to 2.4.2 and that seemed to be the cause; we still don't understand exactly why, though, and it looked exactly like you described: the stream doesn't reach the OrleansHubLifetimeManager.

Can you specify which Orleans version you use?

Not sure if you've already enabled it, but to help with debugging, enable the "SignalR.Orleans": "Debug" log level.
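
For example, raising that category to debug from code could look roughly like this (a minimal sketch assuming the .NET generic host; the appsettings.json equivalent is putting "SignalR.Orleans": "Debug" under "Logging:LogLevel"):

```csharp
// Sketch only: enable debug-level logging for the SignalR.Orleans category.
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public static class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureLogging(logging =>
                // Emit debug logs for SignalR.Orleans only, keeping the default
                // level for everything else.
                logging.AddFilter("SignalR.Orleans", LogLevel.Debug))
            .Build()
            .Run();
}
```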
Also, have you been able to reproduce it locally, or on a cluster of one? I only managed to reproduce it on our servers with 2-9 silos.
One issue we found that was sometimes throwing errors is dotnet/orleans#5993, and there were also some etag mismatches, which was weird.


Isola92 commented Oct 4, 2019

Thank you for sharing your experience and the tip regarding the debug flag. I will update this issue as soon as we've got logs.

We also noticed the issue after upgrading to Orleans 2.4.2. Trying a downgrade now to see if it helps but will keep debugging this.

I have not managed to re-create it locally, except when restarting the Host without restarting the Client (with the OrleansHubLifetimeManager). But I think that's expected behaviour.

Kind regards,
Olof

@stephenlautier
Contributor

Would be happy to help if you have more info on why this is happening. For us it was almost certainly the Orleans 2.4.2 update, but I would like to know exactly why. In the coming days we will try to update again; we had issues on prod, so we had to downgrade Orleans for now.

> I have not managed to re-create it locally, except when restarting the Host without restarting the Client (with the OrleansHubLifetimeManager). But I think that's expected behaviour.

If you kill the OrleansHubLifetimeManager, it shouldn't behave the same way, as the connections (client-side) will get disconnected and then reconnect (if you have a reconnection strategy on the client).
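
For reference, a reconnection strategy on the .NET SignalR client can be as simple as the sketch below (illustrative only; the URL and retry delay are made up, and this is not SignalR.Orleans code):

```csharp
// Sketch: restart the hub connection whenever it is closed.
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR.Client;

public static class ReconnectingClient
{
    public static async Task<HubConnection> ConnectAsync(string url)
    {
        var connection = new HubConnectionBuilder()
            .WithUrl(url) // e.g. "https://localhost:5001/chathub" (illustrative)
            .Build();

        connection.Closed += async error =>
        {
            // Wait a bit, then reconnect after the host comes back up.
            await Task.Delay(TimeSpan.FromSeconds(5));
            await connection.StartAsync();
        };

        await connection.StartAsync();
        return connection;
    }
}
```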


Isola92 commented Oct 7, 2019

Hey again! I've done some more debugging and it seems like our current main issue is related to streams and not this library.

According to some posts I've found in the Orleans repository, this is supposed to work:

  • Set up one client connected to a cluster of two silos.
  • Restart one silo and wait for it to start up properly again.
  • Restart the other silo.

But when adding some code locally, I can see that _serverStream.GetAllSubscriptionHandles() returns an empty list after the restarts. The most common explanation for this appears to be a disconnect between the client and the cluster, but I can find no indication that this has happened.

Adding a call to SetupStreams() in OrleansHubLifetimeManager when this happens fixes the issue for us, but it's not very elegant since it treats the symptom rather than the underlying cause.
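
For context, the workaround amounts to something like the sketch below (a rough illustration against the Orleans streaming API, not SignalR.Orleans' own code; the class name, message type, and delegate are made up):

```csharp
// Sketch: if all subscription handles for the server stream were lost after the
// silo restarts, subscribe again so messages reach this server.
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

public class ServerStreamWatchdog
{
    private readonly IClusterClient _client;

    public ServerStreamWatchdog(IClusterClient client) => _client = client;

    public async Task EnsureSubscribedAsync(
        string providerName, Guid streamId, string streamNamespace,
        Func<string, StreamSequenceToken, Task> onMessage)
    {
        var stream = _client
            .GetStreamProvider(providerName)
            .GetStream<string>(streamId, streamNamespace);

        // After restarting both silos this comes back empty, even though the
        // client never observed a disconnect.
        var handles = await stream.GetAllSubscriptionHandles();
        if (handles.Count == 0)
        {
            // No live subscriptions left: subscribe from scratch, i.e. the
            // equivalent of re-running SetupStreams().
            await stream.SubscribeAsync(onMessage);
        }
    }
}
```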

This refresh behaviour can't really be triggered from outside of OrleansHubLifetimeManager from what I can see. Do you have any suggestions for a stream refresh strategy? If I can't find the underlying issue I might need to change this class manually.

This won't be a problem in general because we will restart clients after hosts during rolling updates but I want to be able to restart specific silos without worrying about this.

@stephenlautier
Contributor

Can you try using PubSubStore with persistence? From the flow you mentioned, I believe you need it for this to work 100%.

Basically, reliability has two parts: the PubSubStore and the SignalR.Orleans grain state. You need both persisted; otherwise, if the silo that gets restarted holds crucial data only in its in-memory grain state, that data is lost, e.g. the publishers/subscribers in the PubSubStore and the user/group connections and server ids in the grain state.
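
A minimal sketch of what persisting the PubSubStore could look like on the silo, assuming Azure Table storage (the provider choice, connection-string handling, and the extension method name are illustrative; the SignalR.Orleans grain state needs the same treatment under whatever storage-provider name the library expects):

```csharp
// Sketch: swap the in-memory "PubSubStore" for a persisted one so that
// publisher/subscriber registrations survive a silo restart.
using Orleans.Hosting;

public static class SiloStorageConfig
{
    public static ISiloHostBuilder AddPersistentPubSubStore(
        this ISiloHostBuilder silo, string connectionString)
    {
        // Instead of AddMemoryGrainStorage("PubSubStore"):
        return silo.AddAzureTableGrainStorage(
            "PubSubStore",
            options => options.ConnectionString = connectionString);
    }
}
```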


Isola92 commented Oct 8, 2019

It's worth trying, for sure. Thank you for the suggestion. I fully grasp the issue with the SignalR.Orleans grain state and why persistence is needed there. I'm just slightly confused by the stream documentation, which doesn't clearly state what works and what doesn't.

I'm worried about the performance impact of keeping the streams persistent. We previously had a large number of subscribers to one particular stream and it had a huge impact on performance. But it looks like this library will distribute the load across the streams a bit better. We will try it out and see.

I'm closing this issue. Thanks for all the help.

Kind regards,
Olof
