Search By City/State #20

Open
jasonkalmeida opened this issue Sep 19, 2019 · 26 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jasonkalmeida
Collaborator

This might be something we want to tackle a little further down the line, but it is an obvious feature enhancement that we should implement.

@jasonkalmeida jasonkalmeida added the enhancement New feature or request label Sep 19, 2019
@joegoldbeck

This would require calling out to a backend or another API right?

Looks like the mobilize event api only has zip code as an option

@jlev
Contributor

jlev commented Sep 26, 2019

Would also be great to enable the browser location api, so people on mobile won't have to type anything. I've done something like this for StoriesOfSolidarity.org, using the OSM Nominatim reverse geocode API to go from lat/lon to city/state (https://github.com/storiesofsolidarity/website-frontend/blob/master/app/scripts/views/storyPost.js#L118)

@Hucxley

Hucxley commented Sep 26, 2019

Actually, @joegoldbeck, the Mobilize event API has the lat/long in the location.location object. I learned about it when searching the site for events while texting people (it ALWAYS wants to return the lat/long in the query URL after you submit a search). Ref:

"location": { "venue": "", "address_lines": [ "204 E 13th St", "" ], "locality": "", "region": "", "postal_code": "10003", "location": { "latitude": 40.7322535, "longitude": -73.9874105 },

@joegoldbeck

Right, but you can’t pass in lat/lon, so to search by city/state we first need to do a lookup elsewhere from city/state -> zip

jlev added a commit that referenced this issue Sep 26, 2019
- browser geolocation on demand (requires https)
@jlev
Contributor

jlev commented Sep 26, 2019

Basic implementation of geolocation on demand, using Nominatim to look up zip from lat/lon (limiting accuracy to 3 decimal places for user privacy).
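A rough sketch of what that could look like, not the code from the referenced commit. It rounds coordinates to 3 decimal places (roughly 100 m) before building a Nominatim reverse-geocoding URL, then reads `address.postcode` from the response. The function names are made up for illustration, and `locateZip` assumes a browser with `navigator.geolocation` and `fetch`.

```javascript
// Round a coordinate to a fixed number of decimal places (3 places is roughly 100 m).
function roundCoord(value, places = 3) {
  const factor = 10 ** places;
  return Math.round(value * factor) / factor;
}

// Build a Nominatim reverse-geocoding URL from rounded coordinates.
function buildReverseUrl(lat, lon) {
  const params = new URLSearchParams({
    lat: String(roundCoord(lat)),
    lon: String(roundCoord(lon)),
    format: 'jsonv2',
    addressdetails: '1',
  });
  return `https://nominatim.openstreetmap.org/reverse?${params}`;
}

// Pull the postcode out of a reverse-geocoding response, or null if it is absent.
function zipFromReverse(body) {
  return (body && body.address && body.address.postcode) || null;
}

// Browser-only wiring (hypothetical): ask for the location on demand,
// then hand the zip (or null) to the caller.
function locateZip(onZip, onError) {
  navigator.geolocation.getCurrentPosition(async (pos) => {
    try {
      const res = await fetch(buildReverseUrl(pos.coords.latitude, pos.coords.longitude));
      onZip(zipFromReverse(await res.json()));
    } catch (err) {
      onError(err);
    }
  }, onError);
}
```

Rounding before the request means the full-precision position never leaves the browser.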

UX could be improved by showing spinner or progress indicator, as the location API takes a few seconds to return.

@jlev jlev self-assigned this Sep 26, 2019
@Hucxley

Hucxley commented Sep 28, 2019

@joegoldbeck I was looking at this again...

If we're getting a user's geolocation, couldn't we pull all (public) events for an organization on a regular basis, let's say hourly, and store them using the event ID as the key for the record along with the location.location data? After getting the user's location, we'd quickly loop through each event: if the INT value of the event's lat === INT of the user's lat, check whether INT of the event's long === INT of the user's long, and if both are true, push the event ID to an array of "possible" events to drill down closer to the user.

The other option is to fetch all events on a regular schedule and store the event IDs in arrays organized by INT values for lats and longs (creating an absolute maximum of 52 lat arrays and 113 long arrays, though practically much fewer since there certainly won't be an active event in every lat and long of US territory), with eventIDs pushed into the lat and long arrays that match the INT values of the lat and long of the event location. (For example, any events in Chattanooga would have their eventIDs in the lat: {35: [$eventIDs]} and long: {85: [$eventIDs]} entries in storage.)

Then, it would seem rather quick to check user's INT value of lat and long against the stored events:
if(docStore.lat.$intUserLat && docStore.long.$intUserLong) => return events that appear in both arrays
Then go back to our event storage dict, pull the event details that match the events from the step above, and drill down to the locations.
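A minimal sketch of the second option, under a few assumptions: events carry an `id` field and the `location.location` shape shown earlier in the thread, and "INT value" means truncation toward zero (so western longitudes keep their minus sign, e.g. -85 rather than 85). The names `buildDegreeIndex` and `candidateEventIds` are hypothetical.

```javascript
// Build two indexes: truncated latitude -> event ids, truncated longitude -> event ids.
function buildDegreeIndex(events) {
  const index = { lat: {}, long: {} };
  for (const event of events) {
    const point = event.location && event.location.location;
    if (!point) continue; // skip events without coordinates
    const latKey = Math.trunc(point.latitude);
    const longKey = Math.trunc(point.longitude);
    (index.lat[latKey] = index.lat[latKey] || []).push(event.id);
    (index.long[longKey] = index.long[longKey] || []).push(event.id);
  }
  return index;
}

// Return the ids that appear in both the user's latitude bucket and longitude bucket.
function candidateEventIds(index, userLat, userLong) {
  const latIds = index.lat[Math.trunc(userLat)] || [];
  const longIds = new Set(index.long[Math.trunc(userLong)] || []);
  return latIds.filter((id) => longIds.has(id));
}
```

One caveat with this bucketing: a user sitting near a degree boundary would miss nearby events that fall in the neighboring cell, so a real version would probably also check adjacent buckets.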

Maybe this is over-engineered. I don't know, I haven't written or tested any code for it, but I don't think we want to ping the mobilize API every time a user comes to our event map, especially in about 7 or 8 months from now. Instead, we'd probably be better served to update our stored events on a regular basis and query against our own storage instead.

@joegoldbeck

joegoldbeck commented Sep 28, 2019

@Hucxley I think you’re right that at some point we may want to add caching (you can just use an LRU cache; you don’t need to proactively fetch things. Happy to explain further in a PM), but as you suggest, that’s an optimization for later, in the case that we get to scale.

That doesn’t have much bearing on this ticket, though. The issue here is that we need to do a lookup on city/state in order to find the zip code to pass in. We could definitely cache that as well once we've implemented it, if warranted based on rate limits/cost (I’d file premature caching under over-engineering 😄)

But we’d need to implement it first 😅

@joegoldbeck

Speaking of which: @jlev, could we use nominatim to power a city/state -> zipcode lookup? (Assuming you haven’t already done that)

@Hucxley

Hucxley commented Sep 28, 2019

@joegoldbeck I guess I'm not following why we'd need to look up a zip code from the user's geolocation to pass to a query if we have a list of events (which contain the location.location data as well as other event details) AND the user's geolocation. If we want to limit the results to a radius around the user, that can still be done with event.location.location and the user's geolocation.
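For illustration, a radius filter along those lines could use the haversine great-circle distance. This is only a sketch: it assumes the events are already held locally with the `location.location` shape shown earlier, and the function names are invented here.

```javascript
const EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

// Great-circle distance between two lat/lon points, in miles (haversine formula).
function haversineMiles(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
}

// Keep only events whose coordinates fall within radiusMiles of the user.
function eventsWithinRadius(events, userLat, userLon, radiusMiles) {
  return events.filter((event) => {
    const point = event.location && event.location.location;
    return (
      point &&
      haversineMiles(userLat, userLon, point.latitude, point.longitude) <= radiusMiles
    );
  });
}
```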

I guess I'm looking at it from the perspective of a user who is looking for something near them, and automatically returning upcoming nearby events. I recognize there's also another use case of looking up events FOR someone else, for which the zip code + distance query would be the preferred approach. But those are separate user stories that don't have a lot of overlap.

Am I missing something super obvious here? I'm on Slack if you want to send me a PM over there.

@joegoldbeck

joegoldbeck commented Sep 28, 2019 via email

@joegoldbeck

joegoldbeck commented Sep 28, 2019

I believe the lat/lon lookup you're asking about has already been solved via a7f6df0 , so the geolocation api integration is 👌

You're definitely right that we could find a way to cache some of this intelligently (I think you have some good ideas there!), and almost certainly should at scale, but that's a later concern imo. We've gotta get this MVP out the door :). If we have the good problem of hitting scale (or the likelihood of being advertised in such a way that we expect to hit scale) we can deal with that then.

A maxim I tend to go by in software engineering (though occasionally to my detriment, admittedly) is "The future is uncertain and you will never know less than you know right now" (<--- strongly recommend POODR, which is the book this is from)

@Hucxley

Hucxley commented Sep 28, 2019

As in, say I want to type in Cambridge, MA, on my desktop, instead of 02139.

@joegoldbeck That's the part I was missing. I thought the point was to be able to determine city/state by using geolocation, then look up the zip code for that and pass it through to the query for events automatically. I may have aggregated some stuff from Slack/Tech Talk/Idea Board and read more into this issue than was intended. :headbonk

@jlev
Contributor

jlev commented Sep 28, 2019

Returning to the immediate need of a city/state lookup, this is something we can do with a real geocoder. But it's not a straightforward single call, because there are many zipcodes within each city, and most span boundaries. They're not even really geographic entities (see my dataset at us-zipcodes-congress for more on why this is tricky).

But we can do something clever to get an approximate answer. Use Nominatim to search for a city/state (eg https://nominatim.openstreetmap.org/details.php?place_id=198544636), take the centroid of that and reverse it (eg https://nominatim.openstreetmap.org/reverse.php?lat=42.37815&lon=-71.11222&zoom=13&format=html), getting a zipcode which we can send to Mobilize.
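Sketched out, the two-step lookup might look like the following. The assumptions: Nominatim's `/search` endpoint is queried with its structured `city`/`state` parameters, the first hit's `lat`/`lon` stands in for the centroid, and the reverse call uses `zoom=13` as in the example URL above. `fetchJson` is a hypothetical helper the caller supplies (for example a thin wrapper around `fetch`), and the function names are illustrative only.

```javascript
// Step 1: URL for a structured Nominatim search on city and state.
function buildCitySearchUrl(city, state) {
  const params = new URLSearchParams({
    city,
    state,
    countrycodes: 'us',
    format: 'jsonv2',
    limit: '1',
  });
  return `https://nominatim.openstreetmap.org/search?${params}`;
}

// Step 2: take the centroid of the first hit (Nominatim returns lat/lon as strings).
function centroidOf(searchResults) {
  if (!Array.isArray(searchResults) || searchResults.length === 0) return null;
  const { lat, lon } = searchResults[0];
  return { lat: Number(lat), lon: Number(lon) };
}

// Step 3: URL to reverse geocode that centroid.
function buildCentroidReverseUrl({ lat, lon }) {
  const params = new URLSearchParams({
    lat: String(lat),
    lon: String(lon),
    zoom: '13',
    format: 'jsonv2',
  });
  return `https://nominatim.openstreetmap.org/reverse?${params}`;
}

// Chain the steps; resolves to an approximate zip, or null if either step comes up empty.
async function approximateZip(city, state, fetchJson) {
  const center = centroidOf(await fetchJson(buildCitySearchUrl(city, state)));
  if (!center) return null;
  const place = await fetchJson(buildCentroidReverseUrl(center));
  return (place && place.address && place.address.postcode) || null;
}
```

The result is only one zip near the middle of the city, so it approximates "events near Cambridge" rather than covering every zip the city contains.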

In the future if we keep our own event store there are all sorts of clever ways we can search and sort. But in the short term, we are constrained to how Mobilize makes their data available, which is exclusively by zipcode.

@jlev
Contributor

jlev commented Sep 28, 2019

This will almost certainly hit Nominatim's usage limits, though. They request that users stay under 1 req/sec.
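If we did call Nominatim directly, one way to stay under that limit from a single client is to space requests at least a second apart. A small sketch, with hypothetical names and an injectable clock so it can be exercised without real waiting; note it only limits one browser tab, not all visitors combined.

```javascript
// Returns a function that, each time it is called, reserves the next free slot
// and reports how many milliseconds the caller should wait before sending.
function makeRateLimiter(minIntervalMs = 1000, now = Date.now) {
  let nextFreeAt = 0;
  return function reserveSlot() {
    const current = now();
    const sendAt = Math.max(current, nextFreeAt);
    nextFreeAt = sendAt + minIntervalMs;
    return sendAt - current;
  };
}

// Hypothetical wrapper: wait for the reserved slot, then perform the request.
function throttled(requestFn, reserveSlot) {
  return (...args) =>
    new Promise((resolve) => setTimeout(resolve, reserveSlot())).then(() =>
      requestFn(...args)
    );
}
```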

For scale, I'd suggest using Geocod.io, which will give back a list of zips that match a city in one shot. (eg https://api.geocod.io/v1.4/geocode?q=cambridge,%20ma&api_key=DEMO). It's free for 2,500 lookups per day, and $0.50 per thousand after that.
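As a sketch of how the client might consume that: the URL below follows the example above, while the response shape (a `results` array whose entries carry `address_components.zip`) is an assumption that should be checked against the Geocod.io docs. Function names are made up for illustration.

```javascript
// Build the geocode URL in the same form as the example above.
function buildGeocodioUrl(query, apiKey) {
  const params = new URLSearchParams({ q: query, api_key: apiKey });
  return `https://api.geocod.io/v1.4/geocode?${params}`;
}

// Collect the distinct zips from a response body.
// Assumed shape: { results: [{ address_components: { zip: "02139" } }, ...] }
function zipsFromGeocodio(body) {
  const zips = new Set();
  for (const result of (body && body.results) || []) {
    const zip = result.address_components && result.address_components.zip;
    if (zip) zips.add(zip);
  }
  return [...zips].sort();
}
```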

@jasonkalmeida
Collaborator Author

This is really interesting.

The count varies by definition, but overall there are ~35,000 cities in the US. Theoretically speaking, at 2,500 lookups per day, we could cache the entire country in the span of 2 weeks (35,000 / 2,500 = 14 days).

I'm not suggesting we do something that drastic, but we could probably cache a mapping of a lot of the major cities to their respective zip codes. If we want to do the super basic way on GH Pages, we could store it in a JSON object.

@joegoldbeck

That's a good point @jasonkalmeida! Looks like we can download city/state -> zip codes mappings from http://federalgovernmentzipcodes.us/download.html

Should be fairly easy to transform that into a JSON lookup table
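Roughly, the transform could look like this once the download has been parsed into rows. The row fields `zip`, `city`, and `state` are placeholders; the actual column names in that file would need to be checked. Zips are kept as strings so leading zeros survive.

```javascript
// Normalize a city/state pair into a lookup key such as "CAMBRIDGE, MA".
function lookupKey(city, state) {
  return `${city.trim().toUpperCase()}, ${state.trim().toUpperCase()}`;
}

// Turn parsed rows into { "CITY, ST": ["zip", ...] }.
// Each row is assumed to look like { zip: "02139", city: "Cambridge", state: "MA" }.
function buildZipLookup(rows) {
  const table = {};
  for (const { zip, city, state } of rows) {
    const key = lookupKey(city, state);
    (table[key] = table[key] || []).push(zip);
  }
  return table;
}
```

`JSON.stringify(buildZipLookup(rows))` would then give the file to ship.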

@jasonkalmeida
Collaborator Author

jasonkalmeida commented Sep 29, 2019

Do we want to do all cities via a file? Or a hybrid approach where we do a file for the top cities (effectively caching) and fall back to the Geocod.io API?

In addition to having a safe fallback for whatever reason, I just looked at that zip code file and it's 13 MB - I know this might be an over-optimization concern, but I just feel weird having the client download 13 MB+ on page load.
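To make the hybrid idea concrete, here is one possible shape for it: check a small bundled table of top cities first, and only call out to the API on a miss. The table format (`"CITY, ST"` keys mapping to arrays of zips) and the `fetchZipsFromApi` callback are assumptions for the sake of the sketch.

```javascript
// Normalize user input like "cambridge, ma" into the table's "CAMBRIDGE, MA" key format.
function normalizeQuery(query) {
  const [city = '', state = ''] = query.split(',');
  return `${city.trim().toUpperCase()}, ${state.trim().toUpperCase()}`;
}

// Look in the bundled table; return the zips on a hit, or null on a miss.
function cachedZips(query, table) {
  const hit = table[normalizeQuery(query)];
  return hit && hit.length > 0 ? hit : null;
}

// Hybrid lookup: bundled table first, API fallback second.
// fetchZipsFromApi is a hypothetical async function (e.g. a geocoder call).
async function resolveZips(query, table, fetchZipsFromApi) {
  return cachedZips(query, table) || fetchZipsFromApi(query);
}
```

With this split, the bundled file only needs the most-searched cities, which keeps the page-load cost well below the full 13 MB.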

@joegoldbeck

joegoldbeck commented Sep 29, 2019 via email

@joegoldbeck

Though, fwiw, that original file has a whole bunch of info we don't need. If we make it a simple lookup table, it's only 1.4 MB, which gzips down to 420KB (GH gzips assets by default).

Is that small enough for the page load do you think? Might be small enough that if we do it asynchronously, it would be fine.

zipcodelookuplist.json.zip
zipcodelookupsinglevalue.json.zip
zipcodelookupsinglevalueasint.json.zip

@jlev
Contributor

jlev commented Sep 29, 2019

Maybe it's my python background, but I hesitate to put calculations on the client that feel like they should be done on the server. Geocod.io has a good, fast, reliable API and it's free up to 2,500 lookups per day. We could also build a very simple lookup server and host it on something like Zeit for pennies (at least until we get to millions of requests per month) https://zeit.co/pricing/calculator

@jlev
Contributor

jlev commented Sep 29, 2019

FYI, that single value as int is truncating leading zeros, which will mess up the entire northeast. "02140" -> 2140.

@joegoldbeck

Oh totally! We could do the same on GCP (AWS, etc...) if we want a server.

I’m not familiar with Zeit! Would be happy to learn if you think it’s easier for this kind of thing

@joegoldbeck

joegoldbeck commented Sep 29, 2019

@jlev yes that was intentional to save space ;), since that was the issue under discussion. obviously we can add back in the zeros with an lpad, since we know zips are 5 digits.

Btw that’s only in the “...asint” file. I included a few files just as options to see different file sizes in case we wanted to go full client-side while getting it off the ground.
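For reference, the lpad step is a one-liner in JavaScript with `String.prototype.padStart` (the function name here is just for illustration):

```javascript
// Restore leading zeros on a zip that was stored as an integer.
function zipToString(zipAsInt) {
  return String(zipAsInt).padStart(5, '0');
}
```

So `zipToString(2140)` gives back `"02140"`.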

@jasonkalmeida
Collaborator Author

I personally do like the API as well - my suggestion of the hybrid mode where we cache the popular cities was mainly to reduce our API call usage for obvious cases that we can map with minimal effort.

jlev added a commit that referenced this issue Oct 9, 2019
- browser geolocation on demand (requires https)
@jasonkalmeida
Collaborator Author

@jlev - can we put a PR for this back up? I really liked the feature and think it would add value.

@joegoldbeck joegoldbeck added the help wanted Extra attention is needed label Dec 13, 2019
@joegoldbeck

Unassigning based on conversation with @jlev . Would be great to make this happen if anyone has the time!

@joegoldbeck joegoldbeck pinned this issue Dec 13, 2019
4 participants