wueason/proxy_pool

proxy_pool

A proxy pool from which you can get an available HTTP proxy server.

When we run a crawler to collect data, we often get blocked by the target site. This module may help you work around that.

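import proxy_pool

# Page to start crawling from, and the regexes used to parse it
start_page = 'http://www.xicidaili.com/nt/'
target_parttern = r'href=\"(\/nt\/\d+)\"\>'  # links to further listing pages
ip_parttern = r'\<td\>(\d+\.\d+\.\d+\.\d+)\<\/td\>'  # IPv4 address cell
port_parttern = r'\<td\>(\d{2,5})\<\/td\>'  # port cell

# Build a collector with custom settings ...
collector = proxy_pool.Collector(start_page=start_page,
                                 target_parttern=target_parttern,
                                 regex_obj={
                                     "IP": ip_parttern,
                                     "PORT": port_parttern
                                 })
# ... or call it with no arguments instead:
# collector = proxy_pool.Collector()

collector.collect_proxies()  # init or update proxy info
collector.get_one_proxy()  # get the proxy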
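
To see what the three patterns capture, the sketch below runs them with Python's standard `re` module against a small HTML fragment. The fragment (`sample_html`) and the way the matches are paired into `ip:port` strings are assumptions made for illustration. They are not taken from the module, and `Collector` may combine the matches differently.

```python
import re

# Patterns copied from the configuration above.
target_parttern = r'href=\"(\/nt\/\d+)\"\>'
ip_parttern = r'\<td\>(\d+\.\d+\.\d+\.\d+)\<\/td\>'
port_parttern = r'\<td\>(\d{2,5})\<\/td\>'

# Hypothetical page fragment, invented for this example.
sample_html = '''
<a href="/nt/2">2</a>
<tr><td>10.0.0.1</td><td>8080</td></tr>
<tr><td>10.0.0.2</td><td>3128</td></tr>
'''

# Links to further listing pages.
pages = re.findall(target_parttern, sample_html)

# IP and port cells, matched independently and paired by position
# (an assumption about how the results are combined).
ips = re.findall(ip_parttern, sample_html)
ports = re.findall(port_parttern, sample_html)
proxies = [f"{ip}:{port}" for ip, port in zip(ips, ports)]

print(pages)
print(proxies)
```

Note that the port pattern matches any table cell holding 2 to 5 digits, so on a page with other numeric columns the positional pairing above may need adjusting.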
