Can I search through all zipcodes or bounding boxes in the U.S.? #33

Open
Liviayi opened this issue May 27, 2018 · 7 comments

@Liviayi

Liviayi commented May 27, 2018

Thanks for the amazing project!! Is there a way for me to search over all zip codes in the U.S.? Or maybe divide the U.S. into several bounding boxes and search over all bounding boxes? It seems that your code is based on cities (regardless of whether the search is being done through bounding boxes, neighborhoods or zipcodes). Thank you very much.

@tjukanovt

Hi Liviayi!

I would probably handle this by pre-processing the bounding boxes, e.g. from shapefiles, into a local database or a CSV file, and then looping over them with the bounding box search option. In PostGIS you can find the bounding box of a geometry with the ST_Envelope function: https://postgis.net/docs/ST_Envelope.html
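
A minimal sketch of that pre-processing idea in plain Python, assuming the area outlines are already available as lists of (longitude, latitude) points. The names `envelope`, `write_boxes`, `read_boxes` and the sample areas are hypothetical and not part of this project; the `print` at the end stands in for whatever bounding box search call you actually use.

```python
import csv
import io

def envelope(points):
    """Return (min_lng, min_lat, max_lng, max_lat) for (lng, lat) pairs."""
    lngs, lats = zip(*points)
    return min(lngs), min(lats), max(lngs), max(lats)

def write_boxes(areas, out):
    """Write one CSV row per area: its name plus its bounding box."""
    writer = csv.writer(out)
    writer.writerow(["name", "min_lng", "min_lat", "max_lng", "max_lat"])
    for name, points in areas.items():
        writer.writerow([name, *envelope(points)])

def read_boxes(src):
    """Yield (name, bounding_box) tuples back from the CSV."""
    keys = ("min_lng", "min_lat", "max_lng", "max_lat")
    for row in csv.DictReader(src):
        yield row["name"], tuple(float(row[k]) for k in keys)

# Made-up outlines, for illustration only.
areas = {
    "area_a": [(-105.0, 39.0), (-104.5, 39.8), (-104.0, 39.2)],
    "area_b": [(-80.3, 25.7), (-80.1, 25.9), (-80.2, 25.6)],
}

buffer = io.StringIO()  # stands in for a real CSV file on disk
write_boxes(areas, buffer)
buffer.seek(0)
for name, box in read_boxes(buffer):
    print(name, box)  # placeholder: run the bounding box search here
```

With a real file you would open it with `newline=""`, as the `csv` module documentation recommends.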

@Liviayi
Author

Liviayi commented May 28, 2018

Thanks! That would be awesome. Is this difficult to implement? When do you plan to implement this?

@tjukanovt

I meant that this might be sensible to do as a separate pre-processing script rather than something implemented in this project... That's just my suggestion.

@Liviayi
Author

Liviayi commented May 28, 2018

I see. Thanks. In your code, is it possible to run a bounding box search using the bounding box of a state instead of a city?

@tomslee
Owner

tomslee commented May 29, 2018

Hi Liviayi: The word "city" appears in tables etc. purely for historical reasons. So long as you can get a bounding box and are prepared to be patient, you should be able to run it on other areas. I've run it on Switzerland and Sri Lanka, for example.

That said, please note the status message on https://github.com/tomslee/airbnb-data-collection. The script currently misses some listings because of changes on the Airbnb site (about 10% for some cities) and I am unlikely to fix that problem.

If you are lucky, the bounding box method gets about 8 or 9 new listings per request on average: let's be generous and say 10 (though I don't know if that would hold for an area the size of the USA). Each request takes a few seconds, so let's say you get 200 listings per minute. If there are about 650K listings in the US (http://www.businessinsider.com/airbnb-total-worldwide-listings-2017-8) then that's a little over 3,000 minutes, or roughly 50 hours. It may be possible! Caveat: you would also collect a lot of non-USA listings from southern Canada and northern Mexico, so maybe more like four days or so. I've started an exploratory run and I'll post a note when/if it finishes. Maybe this weekend.
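
The back-of-envelope estimate can be written out as a few lines of Python. The 20 requests per minute is an assumption (about 3 seconds per request), chosen because it is what 200 listings per minute implies at 10 listings per request; the other figures are the rough ones quoted above.

```python
listings_per_request = 10   # generous average from the estimate above
requests_per_minute = 20    # assumption: roughly 3 seconds per request
total_listings = 650_000    # approximate US listing count cited above

listings_per_minute = listings_per_request * requests_per_minute
minutes = total_listings / listings_per_minute
hours = minutes / 60

print(f"{listings_per_minute} listings/min, {minutes:.0f} minutes, {hours:.1f} hours")
# prints: 200 listings/min, 3250 minutes, 54.2 hours
```

That is before counting the extra listings picked up from outside the USA.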

There may be other questions about zoom levels and bounding boxes for boxes that big...
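
One possible way to sidestep those questions, along the lines of the original suggestion to divide the U.S. into several bounding boxes, is to split one large box into a grid of smaller tiles and search each in turn. This is only a sketch: `split_box` is a hypothetical helper, not something in this project, and the coordinates are a rough, illustrative box around the contiguous US rather than an authoritative boundary.

```python
def split_box(min_lng, min_lat, max_lng, max_lat, cols, rows):
    """Split a bounding box into cols x rows equal tiles (in degrees)."""
    width = (max_lng - min_lng) / cols
    height = (max_lat - min_lat) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                min_lng + c * width,
                min_lat + r * height,
                min_lng + (c + 1) * width,
                min_lat + (r + 1) * height,
            ))
    return tiles

# Rough illustrative box: west, south, east, north in degrees.
tiles = split_box(-125.0, 24.0, -66.0, 50.0, cols=4, rows=2)
for tile in tiles:
    print(tile)  # placeholder: run one bounding box search per tile
```

Listings that sit exactly on a shared tile edge could be returned twice, so results would need de-duplicating by listing ID.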

@Liviayi
Author

Liviayi commented May 29, 2018

Thanks so much for the detailed answer. I tried running a bounding box search around one particular (roughly rectangular) state and it worked well. It took about 2 hours. I did not use proxies, so my IP was occasionally blocked for five minutes or so partway through. The regular wait time was a fantastic idea.
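
For anyone scripting their own loop over boxes, the pacing idea can be sketched roughly like this. It is an illustration only, not this project's code: `paced`, the `fetch` callable (assumed to return a `(blocked, payload)` pair) and the default wait values are all hypothetical.

```python
import time

def paced(fetch, items, wait_seconds=3.0, blocked_wait_seconds=300.0,
          max_retries=3, sleep=time.sleep):
    """Call fetch(item) for each item, pausing between requests.

    If fetch reports a block, wait longer and retry the same item.
    An item still blocked after max_retries retries is skipped.
    """
    results = []
    for item in items:
        for _ in range(max_retries + 1):
            blocked, payload = fetch(item)
            if not blocked:
                results.append(payload)
                break
            sleep(blocked_wait_seconds)  # back off after a block
        sleep(wait_seconds)              # regular pause between items
    return results

# Demo with a fake fetch that is blocked once on item "b".
calls = []
def fake_fetch(item):
    calls.append(item)
    blocked = item == "b" and calls.count("b") == 1
    return blocked, item.upper()

waits = []
out = paced(fake_fetch, ["a", "b"], sleep=waits.append)  # record waits, don't sleep
print(out, waits)
```

Passing `sleep` as a parameter keeps the demo instant; in a real run you would leave it as `time.sleep`.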

That leads to my last question: what is the procedure for using proxy servers with your code? Do I just obtain a list of hosts/ports, and add them to the user.config file, and that's it?

@luisencalada

Hello all! I am getting this problem when searching by bbox: "Warning HTTP Status 400 from web site: IP address blocked.Waiting 1.0 minutes..." It seems my university IP is blocked. Do you have any recommendation for getting around this issue? When searching by zipcode or neighborhood, the process finishes but no data ends up in the DB. Thanks in advance! I am looking for data within the Lisbon boundaries.


4 participants