proxy_pool

A proxy pool from which you can get an available HTTP proxy server.

When we run a crawler to collect data, we often get blocked by the target site. This module may help you get out of that trouble by supplying proxies to route requests through.

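import proxy_pool

# Listing page to start from, and the regular expressions used to scrape it.
start_page = 'http://www.xicidaili.com/nt/'
target_parttern = r'href=\"(\/nt\/\d+)\"\>'          # links to further listing pages
ip_parttern = r'\<td\>(\d+\.\d+\.\d+\.\d+)\<\/td\>'  # table cell holding an IP address
port_parttern = r'\<td\>(\d{2,5})\<\/td\>'           # table cell holding a port

# Collector can also be created without arguments: proxy_pool.Collector()
collector = proxy_pool.Collector(start_page=start_page,
                                 target_parttern=target_parttern,
                                 regex_obj={
                                     "IP": ip_parttern,
                                     "PORT": port_parttern
                                 })
collector.collect_proxies()  # init or update proxy info
collector.get_one_proxy()    # get the proxy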
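
To see what the three patterns above match, here is a minimal sketch that applies them with the standard `re` module. The HTML fragment and the `extract_proxies` helper are made up for illustration; this is not how `Collector` is necessarily implemented, only a way to check the patterns before passing them in.

```python
import re

# Patterns copied from the example above.
target_parttern = r'href=\"(\/nt\/\d+)\"\>'
ip_parttern = r'\<td\>(\d+\.\d+\.\d+\.\d+)\<\/td\>'
port_parttern = r'\<td\>(\d{2,5})\<\/td\>'

# Hypothetical HTML fragment, invented for this sketch only.
sample_html = """
<a href="/nt/2">2</a>
<tr><td>10.0.0.1</td><td>8080</td></tr>
<tr><td>10.0.0.2</td><td>3128</td></tr>
"""


def extract_proxies(html):
    """Pair each matched IP with the port matched at the same position."""
    ips = re.findall(ip_parttern, html)
    ports = re.findall(port_parttern, html)
    return list(zip(ips, ports))


next_pages = re.findall(target_parttern, sample_html)  # links to more listing pages
proxies = extract_proxies(sample_html)                 # (ip, port) string pairs

print(next_pages)
print(proxies)
```

Note that the port pattern matches any table cell containing two to five digits, so on a page with other numeric columns the IP and port lists may not line up; a stricter pattern would be needed in that case.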