CasbahPersistenceReadJournaller robustness of live events under DB failure #157

Closed
scullxbones opened this issue Jun 27, 2017 · 5 comments

Comments

@scullxbones
Owner

Bringing discussion over from #156

This also seems to be the case for the read journaller: the code has no error handling, so it is not very resilient.

This line appears to be only executed once:

https://github.com/scullxbones/akka-persistence-mongo/blob/master/casbah/src/main/scala/akka/contrib/persistence/mongodb/CasbahPersistenceReadJournaller.scala#L137

So if the cursor fails it will never be recreated:

https://github.com/scullxbones/akka-persistence-mongo/blob/master/casbah/src/main/scala/akka/contrib/persistence/mongodb/CasbahPersistenceReadJournaller.scala#L122

And any new queries using the realtime stream will never receive new journal events.
This should be rebuilt in a way that would recover in case of a database connection drop.

@scullxbones any preference on how this could be implemented?

There are a few options: a simple single-threaded scheduled execution context that reschedules cursor creation after a delay on failure, an actor-based approach that keeps the state of the connection/cursor running, or something else along those lines.
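
To make the first option concrete, here is a rough sketch of the scheduled-retry idea using only the standard library. It is not code from this project: `ResilientTail`, `open`, `onEvent` and `retryDelay` are hypothetical names, `open` stands in for whatever creates the tailable cursor, and `onEvent` stands in for publishing to the live subscribers.

```scala
import java.util.concurrent.{Executors, RejectedExecutionException, ScheduledExecutorService, TimeUnit}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical helper (not part of akka-persistence-mongo): opens a cursor-like
// iterator, drains it, and schedules a fresh attempt after a delay whenever
// opening or reading fails or the iterator ends.
final class ResilientTail[A](
    open: () => Iterator[A],   // assumption: stands in for creating the tailable cursor
    onEvent: A => Unit,        // assumption: stands in for publishing to live subscribers
    retryDelay: FiniteDuration
) {
  // A single thread keeps cursor handling sequential, as in the first option above.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  @volatile private var stopped = false

  def start(): Unit = schedule(Duration.Zero)

  def stop(): Unit = {
    stopped = true
    scheduler.shutdownNow()
  }

  private def schedule(delay: FiniteDuration): Unit =
    if (!stopped) {
      val task = new Runnable { def run(): Unit = runOnce() }
      try scheduler.schedule(task, delay.toMillis, TimeUnit.MILLISECONDS)
      catch { case _: RejectedExecutionException => () } // stop() raced with us
    }

  private def runOnce(): Unit = {
    try {
      val cursor = open()
      while (!stopped && cursor.hasNext) onEvent(cursor.next())
    } catch {
      case NonFatal(_) => () // swallow the failure; a new cursor is opened on the next attempt
    }
    schedule(retryDelay)
  }
}
```

A fixed delay keeps the sketch short; a real implementation would probably want backoff, logging of the swallowed error, and some way to resume from the last seen event so nothing is skipped between cursors.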

@yahor-filipchyk

@scullxbones any update on this and related issues? Or do you have any tips on how to detect such DB failures so we can take some action (like restarting the service)?

@scullxbones
Owner Author

Hi @yahor-filipchyk - this ticket has totally stalled out since a year ago. I don't have a great detection approach, but I also don't see issues in production so I haven't had to develop one. That said I'm primarily using on-demand streams rather than live ones, so I'm not sure I'm really exercising the code with issues.

Is there anything you're specifically seeing that hasn't already been covered in this or #156?

@yahor-filipchyk

Sorry for the late response @scullxbones. #156 probably covers the issue. Just wanted to check if someone has any plans of fixing this (I don't know how you guys prioritize issues). It isn't bothering us all that much but when it happens it's quite frustrating. Maybe we'll invest some time in trying to contribute after all 😉

@scullxbones
Owner Author

Hi @yahor-filipchyk - I'm happy to review PRs, and to get this un-stalled. I'm leaning more and more toward simplifying the live streams, at the cost of more load on mongodb, via #199. I think that should eliminate this cluster of issues around the single shared tailing cursor.

@scullxbones
Owner Author

Fixed by #199
