Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvements for crawling dht peers #59

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

btkador
Copy link

@btkador btkador commented Aug 9, 2018

  • added few more well known DHTRouters

  • added optional PassiveMode parameter, so the server doesn't responde to any incoming queries (get_peers, ping, etc)

  • added optional MaxSearchQueries parameter to limit the number of generated get_peers queries per hash. If NumTargetPeers is set to ie 5000 the server will infinityfly create new get_peers requests, till NumTargetPeers is reached - which may create a lot of packets and traffic that can't be handled any more. Especial if the hash has only 5 peers out there, the server will keep sending new requests. With MaxSearchQueries set, this will hard limit the number of new get_peers queries per hash. The counter will reset (per hash) after SearchCntExpire (default 10 minutes).

  • added an expiration timer for nodes. This will only take effect if MaxNodes limit is reached, and purge old (> CleanupPeriod) nodes without pendingQueries, till MaxNodes is reached again.

  • replaced the old Ubuntu Iso fileHash in tests with a current one

@nictuku
Copy link
Owner

nictuku commented Aug 9, 2018

Let's talk about MaxSearchQueries. How does it help you compared to using a low value for NumTargetPeers? It feels like during your experiments it would have been enough to set NumTargetPeers to 5 or so? Then set a RateLimit too?

My concern is that the new features are overlapping with existing ones.

PassiveMode seems redundant with RateLimit for example.

@btkador
Copy link
Author

btkador commented Aug 10, 2018

With NumTargetPeers 5 you get 5 or maybe 20-30 peers after the first batch of get_peers and then the search stops, because needMorePeers is not triggered. There are no more peers learned until the program, which uses the library sends a new PeersRequest(), even if there are 500 or 5000 possible peers out there. That should be sufficient for a common torrent client, I guess.

In theroy, if you add a torrent with 0 peers, the search continues infinityfly and send more and more get_peers request packets to each new learned node, till RateLimit is reached and keeps filling up the whole RateLimit. The only reason that is not happening in real-world is, that some bogus nodes out there always send peers, no matter which hash you're asking for, and so 5 is always reached. I've just tested a bunch of random generated hashes and all of them immediately got peers. (I will investigate that further and evaluate if it's possible to detect rogue nodes, for my use case. I know there is bep42 in dht to do exactly that, but 98% of the dht nodes out there don't support it. I guess I need to disclose the nodeip or nodeid to the application for blacklisting them, based on the fact that they deliver peers for more than x hashes.)

If you set NumTargetPeers 5000 and RateLimit 100 (default), the library sends out an insane ammount of new get_peers packets but drops reply packets above RateLimit, but keeps sending new requests. Each reply thats sneaks thru the RateLimit with peers or nodes triggers new get_peers packets sent out, because needMorePeers is not reached. After a while you send out 100Mbits of get_peers packets and drop the replies. It's kind of a DDoS for your own server, because each small get_peers triggers a bigger response packet, filling up the network stack.

PassiveMode seems redundant with RateLimit for example.

As I understand the code RateLimit works on all incoming packets. It's also dropping reply packets to send out requests from us. So if you set RateLimit to 1, to process only 1 packet per second, dht would be real slow, because dropping all the responses.. while setting PassivMode only affects incoming queries but not replies.

@nictuku
Copy link
Owner

nictuku commented Aug 10, 2018 via email

@btkador
Copy link
Author

btkador commented Aug 10, 2018

In my mind, you control outbound traffic using NumTargetPeers and inbound
traffic using RateLimit.

let's say you want a minimum of 250 peers and use NumTargetPeers for what it seems to be. The library will running a udp flood till NumTargetPeers is satisfied - if there are not enought peers out there, which is in my opinion not the best use of network bandwidth. RateLimit only limits inbound traffic.

I still don't see how your proposed approach is better. In both cases you need to choose values carefully, but for the existing approach the default settings should work for most people.

my parameters/features are all optional if not set.

@nictuku
Copy link
Owner

nictuku commented Aug 10, 2018 via email

@btkador
Copy link
Author

btkador commented Aug 10, 2018

just leave your default parameters like they are - what my branch already does. all my new parameters are default disabled.

@nictuku
Copy link
Owner

nictuku commented Aug 10, 2018 via email

@btkador
Copy link
Author

btkador commented Aug 10, 2018

I need all the options and parameters adjustable for my program code.

As mentioned before, you don't need to merge any of my changes, since it seems to work well for everybody else as it is since >3 years. I just wanted to discuss my extentions and publish them back to the community 😉

ufukomer referenced this pull request in cenkalti/rain Nov 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants