p2p: dead/solo subnets remedy (enhanced) #1949
base: stage
Conversation
network/peers/conn_manager.go
Outdated
var (
	TrimmedRecently = ttlcache.New(ttlcache.WithTTL[peer.ID, struct{}](30 * time.Minute))
)

func init() {
	go TrimmedRecently.Start() // start cleanup go-routine
}
I know global vars aren't the best way to do it, but I think we should leave it global for now and refactor it later (once this code has been running in production for a while and we've proven this solution really does help), since we might want/have to swap it out for something else.
network/discovery/dv5_service.go
Outdated
if len(topicPeers) >= 2 {
	continue // this topic has enough peers
}
This means we are looking (in discovery) for peers who help with either dead or solo subnets, without actually prioritizing the dead ones. Also, this kind of filtering is maybe too aggressive overall: in my testing I see that we aren't keeping at the MaxPeers limit for long, rather falling 10-20% below it and staying at that level, which probably means we can bump inboundLimitRatio up from 0.5 to let incoming connections fill in more spots (but all of that varies from run to run).

Obviously this ^ isn't the optimal way to do it, but in my testing on prod I found it to work reasonably well (with 60 and 90 MaxPeers), so we probably want to improve this, just maybe not in the upcoming release.
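The filter discussed here keeps only discovered peers that serve some subnet with fewer than 2 connected peers. A minimal sketch of that acceptance check, with hypothetical names and types (the PR's real code works on discovered peer records and subnet bitsets):

```go
package main

import "fmt"

// wantsPeer sketches the "selective discovery" filter: a discovered peer
// is interesting only if it serves at least one subnet we currently
// consider dead (0 connected peers) or solo (1 connected peer).
// topicPeerCount maps subnet index -> number of currently connected peers.
func wantsPeer(topicPeerCount map[int]int, candidateSubnets []int) bool {
	for _, s := range candidateSubnets {
		if topicPeerCount[s] < 2 { // dead (0) or solo (1) subnet
			return true
		}
	}
	return false
}

func main() {
	counts := map[int]int{0: 5, 1: 1, 2: 0} // subnet -> connected peers
	fmt.Println(wantsPeer(counts, []int{0}))    // serves only a healthy subnet
	fmt.Println(wantsPeer(counts, []int{0, 2})) // also serves a dead subnet
}
```

Note that a boolean accept/reject like this is exactly what the comment criticizes: it treats dead and solo subnets the same and discards every other candidate, which is what makes the filter aggressive.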
network/p2p/p2p_setup.go
Outdated
// inboundLimitRatio is the ratio of inbound connections to the total connections
// we allow (both inbound and outbound).
inboundLimitRatio = float64(0.75)
I'm testing now whether 0.75 is a good fit, but so far I found the value of 0.5 to be working well, and the value of 0.8 seems to be noticeably worse (e.g. taking 30 minutes instead of 15 to get rid of dead subnets).
I guess 0.75 is still too high: even with MaxPeers = 90 it can sometimes take up to 90 minutes (or even longer, if we are unlucky enough) to get rid of dead subnets. It's probably better to set inboundLimitRatio to 0.6 or even lower (the lower, the better).
I'd go with 0.5 if there aren't any reasons not to.

Another thing I've realized is happening: because we only trim once we've reached the MaxPeers limit, at some point we might stop trimming completely. For example, we've reached the maximum number of incoming connections we allow via the 0.5 ratio (say, 30), plus we've discovered some outgoing peers, but not enough to reach the MaxPeers limit. In other words, we've discovered everything we could (and we keep discovering, just at a very slow rate, since due to quite an aggressive filter there aren't many connections we are interested in), but that also means we've stopped re-cycling (trimming) our incoming connections periodically, because we don't trim incoming/outgoing connections separately. I've added a commit to address that (d230c4a), though it's a pretty "hacky/ad-hoc" way to do the job. Ideally (to simplify this) I think we probably want separate limits per incoming/outgoing connections and somewhat separate trimming logic, WDYT?
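The stall described above follows directly from the two rules being discussed; a tiny numeric sketch (hypothetical helper names, not the PR's code) makes it concrete:

```go
package main

import "fmt"

// inboundLimit caps inbound connections at a fraction of MaxPeers,
// mirroring inboundLimitRatio. shouldTrim mirrors the current design,
// where trimming fires only once total connections reach MaxPeers.
// Both names are illustrative, not taken from the PR.
func inboundLimit(maxPeers int, ratio float64) int {
	return int(float64(maxPeers) * ratio)
}

func shouldTrim(total, maxPeers int) bool {
	return total >= maxPeers
}

func main() {
	maxPeers := 60
	inbound := inboundLimit(maxPeers, 0.5)
	fmt.Println(inbound) // 30 inbound slots at ratio 0.5

	// Suppose discovery only ever finds 20 acceptable outbound peers:
	// 30 inbound + 20 outbound = 50 < MaxPeers, so the trim condition
	// never fires and inbound connections stop being rotated.
	fmt.Println(shouldTrim(inbound+20, maxPeers))
}
```

Separate inbound/outbound limits with their own trim triggers, as proposed, would keep inbound rotation going even when the outbound side can't fill its quota.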
network/discovery/dv5_service.go
Outdated
// withExtraFiltering specifies whether we want to additionally filter discovered peers,
// since operator node and boot node are using the same discovery code (`DiscV5Service`) we
// want to do this extra filtering for operator node only
withExtraFiltering bool
Bootnode doesn't pull peers out of discovery to filter them, so this is not needed.
Can you clarify why it needs the newDiscV5Service service then?
@iurii-ssv
Bootnode is still a regular node in terms of discovery: it has a routing table which collects the other peers it has had discovery interactions with, and it does "pull" peers from this table to supply to other peers that query it for network peers it knows. But those don't go through our filtering; our filtering is only used when pulling peers out of the routing table to be used for protocol connections (TCP).
Okay, removed withExtraFiltering then, so we are now using this func:

func (n *p2pNetwork) peerScore

for both:
- ranking "proposed" peers
- ranking peers when trimming (peer "protection")
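Sharing one scoring function for both jobs could look like the sketch below. Everything here is hypothetical (the weights, the shapes, and the standalone `peerScore` function; the PR's actual method hangs off `*p2pNetwork` and its scoring may differ); it only illustrates using the same ranking for proposals and for trim protection:

```go
package main

import (
	"fmt"
	"sort"
)

// peerScore ranks a peer by how much it helps our weak subnets:
// dead subnets (0 connected peers) weigh more than solo ones (1 peer).
// Weights and signature are illustrative assumptions, not the PR's code.
func peerScore(topicPeerCount map[int]int, peerSubnets []int) int {
	score := 0
	for _, s := range peerSubnets {
		switch topicPeerCount[s] {
		case 0:
			score += 2 // dead subnet: most valuable
		case 1:
			score += 1 // solo subnet
		}
	}
	return score
}

func main() {
	counts := map[int]int{0: 0, 1: 1, 2: 7} // subnet -> connected peers
	peers := map[string][]int{
		"a": {2},    // serves only a healthy subnet
		"b": {1},    // serves one solo subnet
		"c": {0, 1}, // serves a dead and a solo subnet
	}
	ids := []string{"a", "b", "c"}
	// Best-first ordering: the same ranking can pick which proposed peers
	// to connect to and which connected peers to protect from trimming.
	sort.Slice(ids, func(i, j int) bool {
		return peerScore(counts, peers[ids[i]]) > peerScore(counts, peers[ids[j]])
	})
	fmt.Println(ids)
}
```

One ranking function keeps the two decisions consistent: a peer valuable enough to dial is, by the same measure, valuable enough to keep.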
Replaces #1917, additionally improving a bunch of p2p-handling code; the main contributing factor is "selective discovery".