Open
Description
Now that we have the topology API, here are the steps required towards automatic reconnection:
- add an `Option<Consumer>` to the wrapped `basic_consume`, set it to `None` from the public wrapper, store the `Option` in the state and, if we got `Some`, use it to restore everything into the given `Consumer`.
- introduce an `InternalTopology` that stores `Connection`/`Channel`/`Consumer` objects alongside the topology items
- add some conversion between `InternalTopology` and `Topology`, dropping the associated items
- change the topology methods to return the `InternalTopology`, and make the current one use that and convert to the public `Topology`
- in the same way, introduce some `restore_internal`, make `restore` use it, and pass the `Option`s stored in the `InternalTopology` to `basic_consume` and friends
- hook up `basic_get` in `InternalTopology` + `restore_internal`
- add an `Option<Channel>` to the channel creation to share internals with the `Channel` we want to restore; set it to `None`, but use it when finalizing if it's `Some`
- add an `Option<InternalTopology>`, set to `None`, to the connection process, and use it to `restore_internal` when it's `Some`
- detect network failure from the event loop and, instead of bubbling it up, call `topology_internal` and reinitiate the connection with `Some(InternalTopology)`
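A minimal sketch of the `Option<Consumer>` / `InternalTopology` pattern from the checklist, in plain Rust with no lapin dependency. Every type and function below (`Consumer`, `InternalTopology`, `basic_consume_internal`, `restore_internal`) is a hypothetical stand-in, not lapin's actual API, and the thread notes further down that the final implementation took a different approach. The idea it shows: restoring *into* an existing handle keeps every clone the user holds working, while converting to the public `Topology` drops the live objects.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical consumer handle. Clones share the same inner state, so
/// restoring into an existing handle is visible through every clone.
#[derive(Clone, Debug)]
pub struct Consumer {
    inner: Arc<Mutex<ConsumerState>>,
}

#[derive(Debug)]
struct ConsumerState {
    queue: String,
    tag: String,
    /// How many times this consumer has been (re)registered.
    generation: u32,
}

impl Consumer {
    pub fn queue(&self) -> String {
        self.inner.lock().unwrap().queue.clone()
    }
    pub fn tag(&self) -> String {
        self.inner.lock().unwrap().tag.clone()
    }
    pub fn generation(&self) -> u32 {
        self.inner.lock().unwrap().generation
    }
}

/// Wrapped variant: with `Some(existing)`, reuse its internals instead of
/// allocating a new consumer.
fn basic_consume_internal(queue: &str, tag: &str, existing: Option<Consumer>) -> Consumer {
    match existing {
        Some(consumer) => {
            {
                let mut state = consumer.inner.lock().unwrap();
                state.queue = queue.to_owned();
                state.tag = tag.to_owned();
                state.generation += 1;
            }
            consumer
        }
        None => Consumer {
            inner: Arc::new(Mutex::new(ConsumerState {
                queue: queue.to_owned(),
                tag: tag.to_owned(),
                generation: 1,
            })),
        },
    }
}

/// Public wrapper: always passes `None`.
pub fn basic_consume(queue: &str, tag: &str) -> Consumer {
    basic_consume_internal(queue, tag, None)
}

/// Public topology: a plain description, no live objects.
#[derive(Debug, PartialEq)]
pub struct Topology {
    pub consumers: Vec<(String, String)>, // (queue, consumer tag)
}

/// Internal topology: the same items plus the live objects to restore into.
pub struct InternalTopology {
    pub consumers: Vec<(String, String, Option<Consumer>)>,
}

impl From<InternalTopology> for Topology {
    fn from(internal: InternalTopology) -> Self {
        Topology {
            // Keep the description, drop the associated live objects.
            consumers: internal.consumers.into_iter().map(|(q, t, _)| (q, t)).collect(),
        }
    }
}

/// Replays the topology, handing each stored `Option<Consumer>` back down.
fn restore_internal(topology: InternalTopology) -> Vec<Consumer> {
    topology
        .consumers
        .into_iter()
        .map(|(queue, tag, existing)| basic_consume_internal(&queue, &tag, existing))
        .collect()
}
```

After `restore_internal`, the handle the user got from `basic_consume` reports a bumped `generation()` without the user having to fetch a new object.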
Activity
robo-corg commented on Mar 17, 2021
Would you be interested in a PR for this?
Keruspe commented on Mar 17, 2021
Sure.
Otherwise, I plan to work on this this summer once 2.0 is out.
kageru commented on Sep 19, 2022
Any update on this?
Automatic reconnects would be really useful for me. I’d even try to contribute if something specific is missing.
Ks89 commented on Dec 29, 2022
I'm also interested in this feature.
Keruspe commented on Dec 29, 2022
I'd be willing to take some sponsorship to work on this.
TroyKomodo commented on May 2, 2023
Something like this:
https://gist.github.com/TroyKomodo/0e746b9dd2b5e4618af2a1b92a6efaf9
carlhoerberg commented on May 24, 2024
We're willing to sponsor this; please email me at carl@cloudamqp.com.
Keruspe commented on Jul 11, 2024
Progress is being made on this front; an initial version should be coming this summer.
Keruspe commented on Aug 4, 2024
Small update on this front:
I've slightly reworked my approach for this now that I could actually spend time on this (thanks to @carlhoerberg and CloudAMQP support).
I fixed a few bugs in the TCP loop that will be required for this to work properly.
I'm working on handling AMQP "soft" errors (i.e. errors local to a single channel) so that I can first properly implement recovery of one channel and test it more easily.
Once channel recovery is done, I'll move on to AMQP "hard" errors, which are global to the connection, to ensure we properly recover all channels too.
The last step will then be to trigger the recovery for other errors too (such as TCP errors).
I will create the associated issues, but the Channel part (which is fundamental for the other parts to work properly) should be done before the end of summer. Issuing a passive queue declare for a non-existent queue on a channel will probably be the easiest way to test this, as it triggers a channel error (404 NOT_FOUND).
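A sketch of how the soft/hard split above could drive the choice of recovery. The reply-code classes come from the AMQP 0-9-1 specification (404 NOT_FOUND, which the passive declare of a missing queue produces, is a soft error; 504 CHANNEL_ERROR is a hard one). The `Failure` and `Recovery` types, and the rule that unknown codes escalate to a full connection recovery, are assumptions for illustration, not lapin's design.

```rust
/// Scope of an AMQP 0-9-1 error, following the spec's reply-code classes.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ErrorScope {
    /// "soft" error: the broker closes only the affected channel.
    Channel,
    /// "hard" error: the broker closes the whole connection.
    Connection,
}

/// Hypothetical failure events as seen by the I/O loop.
#[derive(Debug, Clone, Copy)]
pub enum Failure {
    ChannelClose { channel_id: u16, reply_code: u16 },
    ConnectionClose { reply_code: u16 },
    /// TCP-level error: the last layer to hook up.
    Network,
}

/// Hypothetical recovery plan.
#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    /// Reopen one channel and restore its state.
    Channel(u16),
    /// Reconnect, then restore every channel.
    Connection,
}

pub fn scope_of(reply_code: u16) -> Option<ErrorScope> {
    match reply_code {
        311 | 313 | 403 | 404 | 405 | 406 => Some(ErrorScope::Channel),
        320 | 402 | 501..=506 | 530 | 540 | 541 => Some(ErrorScope::Connection),
        _ => None,
    }
}

pub fn plan(failure: &Failure) -> Recovery {
    match *failure {
        Failure::ChannelClose { channel_id, reply_code } => match scope_of(reply_code) {
            Some(ErrorScope::Channel) => Recovery::Channel(channel_id),
            // Assumption: anything not known to be soft is handled conservatively.
            _ => Recovery::Connection,
        },
        Failure::ConnectionClose { .. } | Failure::Network => Recovery::Connection,
    }
}
```

Layering it this way means the channel path can be tested on its own (for example with the passive declare), before connection and network failures are routed through it.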
conioX commented on Oct 4, 2024
Any news about this?
Keruspe commented on Oct 12, 2024
I'm sorry about this; the last two months were a lot... rougher than anticipated. All current progress can be tracked in #416.
I'm still first focusing on channel recovery, and I'll get to connection recovery once this is stabilized.
Currently, the publishing part works pretty well and I'm confident in the implementation.
I want to hook up some topology recovery (re-creation of temporary queues and so on).
The consumer part is trickier, but parts of it are already there.
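A sketch of what topology replay could look like, assuming "temporary queues" means server-named queues (declared with an empty name, so the broker picks a fresh name each time, e.g. RabbitMQ's `amq.gen-` prefixed names). `Step`, `TopologyRecorder` and `replay` are hypothetical names, not lapin's API. The subtlety it shows: once a server-named queue is re-declared under a new name, the bindings and consumers recorded against the old name have to be rewritten before they are replayed.

```rust
use std::collections::HashMap;

/// One recorded topology operation (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum Step {
    /// `server_named`: declared with an empty name, so the broker chooses one.
    DeclareQueue { name: String, server_named: bool },
    Bind { queue: String, exchange: String, routing_key: String },
    Consume { queue: String, consumer_tag: String },
}

#[derive(Default)]
pub struct TopologyRecorder {
    steps: Vec<Step>,
}

impl TopologyRecorder {
    pub fn record(&mut self, step: Step) {
        self.steps.push(step);
    }

    pub fn steps(&self) -> &[Step] {
        &self.steps
    }

    /// Replays steps in recording order. `declare` sends queue.declare with
    /// the requested name and returns the name the broker reports; `apply`
    /// re-sends bindings and consumers. The log is updated in place so that
    /// a later recovery starts from the current names.
    pub fn replay(&mut self, mut declare: impl FnMut(&str) -> String, mut apply: impl FnMut(&Step)) {
        let mut renamed: HashMap<String, String> = HashMap::new();
        for step in self.steps.iter_mut() {
            match step {
                Step::DeclareQueue { name, server_named } => {
                    let requested = if *server_named { "" } else { name.as_str() };
                    let actual = declare(requested);
                    if actual != *name {
                        renamed.insert(name.clone(), actual.clone());
                        *name = actual;
                    }
                    continue; // declarations are sent through `declare` only
                }
                Step::Bind { queue, .. } | Step::Consume { queue, .. } => {
                    if let Some(new_name) = renamed.get(queue.as_str()) {
                        *queue = new_name.clone();
                    }
                }
            }
            apply(&*step);
        }
    }
}
```

Replaying in recording order keeps declarations ahead of the bindings and consumers that depend on them.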
Keruspe commented on Apr 19, 2025
I haven't posted an update here for quite some time, but things have moved forward a lot!
All the ongoing work has been merged as part of the latest 3.0 beta versions.
I'm still only targeting channel reconnection in case of a channel error as a first step... but that's actually most of the work, or at least most of the complexity, protocol-wise.
For the publishing case, I think we're good now.
For the receiving part, there are missing pieces in the consumer handling (some of the work is done, but the finishing touches needed to make it work transparently are missing). Basic-get should properly handle reconnection but needs confirmation.
Extensions such as confirm-select are properly supported.
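Part of what supporting confirm-select across recovery involves: delivery tags are scoped to a channel, so after the channel is reopened and confirm.select is sent again, the broker numbers publishes from 1 again, and messages that were never acked on the old channel have an unknown outcome. The `ConfirmTracker` below is a hypothetical sketch, not lapin's implementation; it hands those messages back in publish order so they can be republished, accepting possible duplicates. Nacks are left out for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical publisher-confirm bookkeeping for one channel.
#[derive(Default)]
pub struct ConfirmTracker {
    /// Last delivery tag handed out on the current channel.
    next_tag: u64,
    /// Delivery tag -> payload still waiting for a basic.ack.
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ConfirmTracker {
    /// Records a basic.publish in confirm mode and returns its delivery tag.
    pub fn published(&mut self, payload: Vec<u8>) -> u64 {
        self.next_tag += 1;
        self.pending.insert(self.next_tag, payload);
        self.next_tag
    }

    /// Handles basic.ack; with `multiple`, every tag up to and including
    /// `tag` is confirmed.
    pub fn acked(&mut self, tag: u64, multiple: bool) {
        if multiple {
            self.pending.retain(|&t, _| t > tag);
        } else {
            self.pending.remove(&tag);
        }
    }

    pub fn pending_count(&self) -> usize {
        self.pending.len()
    }

    /// Called once the channel has been reopened: tags restart from 1, and
    /// the unconfirmed payloads are returned in their original order.
    pub fn reset_for_recovery(&mut self) -> Vec<Vec<u8>> {
        self.next_tag = 0;
        std::mem::take(&mut self.pending).into_values().collect()
    }
}
```

Each payload returned by `reset_for_recovery` would be passed to `published` again as it is republished on the recovered channel, so the new tags line up with the broker's numbering.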
Automatic re-declaration of temporary queues is in progress locally. I want to finish addressing this point and have a first working implementation of consumers, then I'll release 3.0 final. The connection part will come after that.
Implementing this has also led me to fixing a few corner case bugs here and there which is good news for the overall stability.
I cannot promise any deadline, but I'm hoping for 3.0 in May.
Keruspe commented on Jun 4, 2025
Good news: I have consumers working properly!
It needs a little bit of cleanup, but a release is coming real soon.
sin-ack commented on Jul 15, 2025
Hi, are there any updates on this? Is the checklist in OP up-to-date? I'd like to help with any remaining missing features.
Keruspe commented on Jul 15, 2025
The checklist is partially outdated; we took a different approach in the end.
The end of the school year was pretty packed, but I should be able to allocate more time to this soon.
Basically, I have one bug that I'm currently troubleshooting for channel reconnection, then I'll be targeting connection reconnection. I really want channel reconnection to be battle tested before enabling the next layers as it's way easier to diagnose issues this way.
Connection recovery should be pretty fast, with all the tooling that got in for channel recovery.
Last step will be triggering it on network errors.
I'd say we're 80-85% of the way there. Connection recovery accounts for ~5%, and hooking up the network failures for ~10-15%.
sin-ack commented on Jul 15, 2025
Thanks for the update! I'd like to help get this ready as soon as possible, so my help offer still stands.
Keruspe commented on Jul 27, 2025
I'll update the checklist to better reflect what has been done and what remains, but basically, the AMQP connection part is done. I need to make one last tweak to the io_loop so it properly keeps going during reconnection (there's currently a hacky workaround for testing).
I think the TCP part should be mostly done before mid-August. I'll release a 3.1.0 at that point, with the full recovery/reconnection testable. It won't be considered production-ready for a while, though: there will inevitably be corner cases where things go south, since AMQP is stateful and really not designed with recovery in mind.
Keruspe commented on Jul 31, 2025
I'm keeping this issue open a little longer, but this is now testable as part of 3.1.0 with the unstable feature, as described in the README.