Possible relay discovery defect #2676

- **Version**: 1.9.2


- **Platform**: Darwin / Linux


- **Subsystem**: Circuit relay transport


#### Severity: High


#### Description:
I have a p2p network made up of relay and non-relay nodes. The non-relay nodes are configured to discover 3 relays, and I have tried this both with and without reservation concurrency configured. For some reason (connection issues or anything else; the cause is irrelevant to the main issue), connections between nodes and relays break, and after some time a node ends up disconnected from all of its relays.
 
After checking the logs and digging into the implementation of the circuit relay transport, I found that there is an existing relay discovery mechanism. In simple terms, it works like this:  
1. When a relay disconnects, compare the number of connected relays against `discoverRelays`
2. If not enough relays are connected, start the discovery process and _grab the discovery lock_ by setting a `running` flag on the `RelayDiscovery` instance
3. After discovering relays, try to connect to them until the number of connected relays reaches `discoverRelays`
4. Once enough relays are connected, release _the lock_, allowing discovery to run again the next time a relay disconnects
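
To make the four steps easier to discuss, here is a minimal model of the flow as I understand it. This is not the js-libp2p source: only `running` and `discoverRelays` come from the real code, while `RelayDiscoveryModel`, `discoverAndConnect` and the other names are made up for illustration, and the real implementation is asynchronous.

```typescript
// Simplified, synchronous model of the flow described above; NOT the js-libp2p source.
// All names except `running` and `discoverRelays` are hypothetical.
class RelayDiscoveryModel {
  running = false
  discoveryRuns = 0
  connected: Set<string>
  discoverRelays: number
  // Hypothetical hook: returns the relays a discovery run managed to connect to.
  discoverAndConnect: (needed: number) => string[]

  constructor (
    discoverRelays: number,
    initialRelays: string[],
    discoverAndConnect: (needed: number) => string[]
  ) {
    this.discoverRelays = discoverRelays
    this.connected = new Set(initialRelays)
    this.discoverAndConnect = discoverAndConnect
  }

  // Step 1: called whenever a relay disconnects.
  onRelayDisconnect (relay: string): void {
    this.connected.delete(relay)
    if (this.connected.size >= this.discoverRelays) return

    // Step 2: grab the "lock"; a discovery that is flagged as running is not restarted.
    if (this.running) return
    this.running = true
    this.discoveryRuns++

    // Step 3: connect to discovered relays until the target is reached.
    const needed = this.discoverRelays - this.connected.size
    for (const r of this.discoverAndConnect(needed)) {
      if (this.connected.size >= this.discoverRelays) break
      this.connected.add(r)
    }

    // Step 4: the "lock" is released only if the target was reached.
    if (this.connected.size >= this.discoverRelays) {
      this.running = false
    }
  }
}

// Happy path: every discovery run finds enough relays, so the flag is always released.
const model = new RelayDiscoveryModel(3, ['a', 'b', 'c'], () => ['d', 'e'])
model.onRelayDisconnect('a')
model.onRelayDisconnect('b')
```

In this happy path the flag is released after each run, so the second disconnect triggers a second discovery. The problem only appears when step 3 falls short of the target.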

I think there is a critical issue here. If, for any reason, discovery does not find enough relays, or we cannot connect to the ones it finds, the relay count stays below `discoverRelays` while discovery remains _locked_ and cannot run again. Over time the remaining relays disconnect one by one, and we never reconnect to any of them.

As an example:  
1. Suppose `discoverRelays` is 3
2. We connect to 3 relays
3. We disconnect from 2 of them for some reason
4. Discovery starts and finds 4 new relays
5. For some reason, we only manage to connect to 1 of the 4, leaving 1 new + 1 old connected relay. That is below the threshold of 3, so the lock is never released and discovery will not run again
6. The remaining relays keep disconnecting until none are connected
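
The same example can be replayed with a few lines of code. Again, this is only a sketch under my reading of the behaviour: `State`, `loseRelays` and the `connectable` parameter are hypothetical names, and the numbers simply mirror the six steps above.

```typescript
// Hypothetical simulation of the example above; not taken from js-libp2p.
interface State {
  connected: number
  running: boolean
  discoveryRuns: number
}

// Lose `count` relays at once, then apply the discovery rules:
// discovery only starts if the flag is free, and the flag is only
// cleared if the target number of relays is reached.
function loseRelays (s: State, count: number, target: number, connectable: number): State {
  const next: State = { ...s, connected: Math.max(0, s.connected - count) }
  if (next.connected >= target || next.running) return next

  next.running = true
  next.discoveryRuns++
  next.connected = Math.min(target, next.connected + connectable)
  if (next.connected >= target) next.running = false
  return next
}

const target = 3
let s: State = { connected: 3, running: false, discoveryRuns: 0 } // steps 1-2

// Steps 3-5: lose 2 relays; discovery finds 4 but only 1 connection succeeds.
s = loseRelays(s, 2, target, 1)
const afterFirst = s

// Step 6: further disconnects; 4 relays would be connectable, but discovery never runs.
s = loseRelays(s, 1, target, 4)
s = loseRelays(s, 1, target, 4)

console.log(afterFirst, s)
```

After the first round the model is left with 2 connected relays and `running` still set. The later disconnects never start a second discovery run, so the count drops to 0.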

I wonder why other approaches were not implemented, for example changing the condition under which the lock is released, or running discovery periodically.  
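
To illustrate what I mean by those two alternatives, here is a rough sketch. It is not a patch against js-libp2p: `RetryingDiscoveryModel`, `maybeDiscover`, `tryConnect` and `startPeriodicCheck` are hypothetical names, and the real code is asynchronous, so an actual fix would look different.

```typescript
// Hypothetical sketch of the two alternatives; not a patch against js-libp2p.
class RetryingDiscoveryModel {
  running = false
  connected: number
  readonly discoverRelays: number
  // Hypothetical hook: returns how many new relay connections succeeded.
  readonly tryConnect: (needed: number) => number

  constructor (discoverRelays: number, connected: number, tryConnect: (needed: number) => number) {
    this.discoverRelays = discoverRelays
    this.connected = connected
    this.tryConnect = tryConnect
  }

  // Alternative 1: always release the flag when a run ends,
  // even if the target number of relays was not reached.
  maybeDiscover (): void {
    if (this.running || this.connected >= this.discoverRelays) return
    this.running = true
    try {
      const needed = this.discoverRelays - this.connected
      this.connected += Math.min(needed, this.tryConnect(needed))
    } finally {
      this.running = false
    }
  }

  onRelayDisconnect (): void {
    this.connected = Math.max(0, this.connected - 1)
    this.maybeDiscover()
  }
}

// Alternative 2: re-check periodically so a missed target is retried
// without waiting for another disconnect. Returns a function that stops the timer.
function startPeriodicCheck (d: RetryingDiscoveryModel, intervalMs: number): () => void {
  const timer = setInterval(() => { d.maybeDiscover() }, intervalMs)
  return () => { clearInterval(timer) }
}

// The first attempt connects nothing; later attempts connect one relay each.
let attempts = 0
const retrying = new RetryingDiscoveryModel(3, 3, () => (attempts++ === 0 ? 0 : 1))
retrying.onRelayDisconnect() // run 1 fails, but the flag is released
retrying.onRelayDisconnect() // run 2 recovers one relay
retrying.maybeDiscover()     // what a periodic tick would do: run 3 reaches the target
```

With the flag released unconditionally, a failed run no longer blocks later ones, and a periodic call to `maybeDiscover` (via something like `startPeriodicCheck`) covers the case where no further disconnect arrives to trigger a retry.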




#### Steps to reproduce the error:

The scenario is described above. Some other configuration options may (or may not) affect it, though. For example, auto dial should be disabled.
