-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvements for crawling dht peers #59
base: master
Are you sure you want to change the base?
Conversation
Let's talk about MaxSearchQueries. How does it help you compared to using a low value for NumTargetPeers? It feels like during your experiments it would have been enough to set NumTargetPeers to 5 or so? Then set a RateLimit too? My concern is that the new features are overlapping with existing ones. PassiveMode seems redundant with RateLimit for example. |
With In theroy, if you add a torrent with 0 peers, the search continues infinityfly and send more and more get_peers request packets to each new learned node, till If you set
As I understand the code |
Torrent clients only need DHT to establish an initial connection to a
swarm. After that they exchange peer information directly and the DHT is
barely needed.
In my mind, you control outbound traffic using NumTargetPeers and inbound
traffic using RateLimit.
The DHT is doing what you're asking it to do, get to 5000 peers as fast as
possible. If you don't need that, set a lower number perhaps?
I still don't see how your proposed approach is better. In both cases you
need to choose values carefully, but for the existing approach the default
settings should work for most people.
…On Thu, Aug 9, 2018, 7:25 PM Michael ***@***.***> wrote:
With NumTargetPeers 5 you get 5 or maybe 20-30 peers after the first
batch of get_peers and then the search stops, because needMorePeers is
not triggered. There are no more peers learned until the program, which
uses the library sends a new PeersRequest(), even if there are 500 or
5000 possible peers out there. That should be sufficient for a common
torrent client, I guess.
In theroy, if you add a torrent with 0 peers, the search continues
infinityfly and send more and more get_peers request packets to each new
learned node, till RateLimit is reached and keeps filling up the whole
RateLimit. The only reason that is not happening in real-world is, that
some bogus nodes out there always send peers, no matter which hash you're
asking for, and so 5 is always reached. I've just tested a bunch of random
generated hashes and all of them immediately got peers. (I will investigate
that further and evaluate if it's possible to detect rogue nodes, for my
use case. I know there is bep42 in dht to do exactly that, but 98% of the
dht nodes out there don't support it. I guess I need to disclose the nodeip
or nodeid to the application for blacklisting them, based on the fact that
they deliver peers for more than x hashes.)
If you set NumTargetPeers 5000 and RateLimit 100 (default), the library
sends out an insane ammount of new get_peers packets but drops reply
packets above RateLimit, but keeps sending new requests. Each reply thats
sneaks thru the RateLimit with peers or nodes triggers new get_peers
packets sent out, because needMorePeers is not reached. After a while you
send out 100Mbits of get_peers packets and drop the replies. It's kind of a
DDoS for your own server, because each small get_peers triggers a bigger
response packet, filling up the network stack.
PassiveMode seems redundant with RateLimit for example.
As I understand the code RateLimit works on all incoming packets. It's
also dropping reply packets to send out requests from us. So if you set
RateLimit to 1, to process only 1 packet per second, dht would be real
slow, because dropping all the responses.. while setting PassivMode only
affects incoming queries but not replies.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#59 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAMY9uybe5GLmc5bvoK3hXJTiZHy0IwQks5uPO81gaJpZM4V1c2F>
.
|
let's say you want a minimum of 250 peers and use NumTargetPeers for what it seems to be. The library will running a udp flood till NumTargetPeers is satisfied - if there are not enought peers out there, which is in my opinion not the best use of network bandwidth. RateLimit only limits inbound traffic.
my parameters/features are all optional if not set. |
If you were to change the default behavior, what do you think it should be?
In other words, I think we should make things work out of the box without
having to change knobs. Any ideas for how we can do that?
…On Thu, Aug 9, 2018, 10:18 PM Michael ***@***.***> wrote:
In my mind, you control outbound traffic using NumTargetPeers and inbound
traffic using RateLimit.
let's say you want a minimum of 250 peers and use NumTargetPeers for what
it seems to be. The library will running a udp flood till NumTargetPeers is
satisfied - if there are not enought peers out there, which is in my
opinion not the best use of network bandwidth. RateLimit only limits
inbound traffic.
I still don't see how your proposed approach is better. In both cases you
need to choose values carefully, but for the existing approach the default
settings should work for most people.
my parameters/features are all optional if not set.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#59 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAMY9qsIbHQrPo1h_fOoBxS3Qm-o2u7sks5uPRfAgaJpZM4V1c2F>
.
|
just leave your default parameters like they are - what my branch already does. all my new parameters are default disabled. |
Can we reduce the number of changes and number of knobs from your proposed
change? The DHT Config is already quite large. Can we maybe go with one or
two new options?
I am not 100% against adding all your proposed changes - just trying to
find a middle ground to reduce the API surface.
The more knobs we have the harder it is for people to find what values work
well together.
Perhaps some of these things can be hidden constants, if they are so rarely
changed?
…On Thu, Aug 9, 2018, 10:27 PM Michael ***@***.***> wrote:
just leave your default parameters like they are - what my branch already
does. all my new parameters are default disabled.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#59 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAMY9rJBLddtWVz9Ts6JYKXRUr33pje6ks5uPRnNgaJpZM4V1c2F>
.
|
I need all the options and parameters adjustable for my program code. As mentioned before, you don't need to merge any of my changes, since it seems to work well for everybody else as it is since >3 years. I just wanted to discuss my extentions and publish them back to the community 😉 |
added few more well known
DHTRouters
added optional
PassiveMode
parameter, so the server doesn't responde to any incoming queries (get_peers, ping, etc)added optional
MaxSearchQueries
parameter to limit the number of generated get_peers queries per hash. If NumTargetPeers is set to ie 5000 the server will infinityfly create new get_peers requests, till NumTargetPeers is reached - which may create a lot of packets and traffic that can't be handled any more. Especial if the hash has only 5 peers out there, the server will keep sending new requests. WithMaxSearchQueries
set, this will hard limit the number of new get_peers queries per hash. The counter will reset (per hash) afterSearchCntExpire
(default 10 minutes).added an expiration timer for nodes. This will only take effect if
MaxNodes
limit is reached, and purge old (> CleanupPeriod) nodes without pendingQueries, till MaxNodes is reached again.replaced the old Ubuntu Iso fileHash in
tests
with a current one