Normalize Geodata Schema to avoid repeated lookups of same lat/lon #428
Comments
I do have an example where normalization would not be wanted: the geodata field contains place data such as business names, so a lookup in 2024 and one in 2030 would return different results. It's therefore not so bad to have this duplicated. Perhaps a cache would be warranted instead, and the queue processing could order entries by lat/lon to optimize cache usage.
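A rough sketch of that ordering idea, assuming `latitude`, `longitude` and a nullable `geodata` column on the `points` table (names are illustrative, not necessarily the actual schema):

```sql
-- Illustrative only: fetch points that still need reverse geocoding, ordered so that
-- nearby coordinates are processed back-to-back and a lat/lon-keyed cache stays warm.
SELECT id, latitude, longitude
FROM points
WHERE geodata IS NULL
ORDER BY ROUND(latitude::numeric, 3), ROUND(longitude::numeric, 3);
```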
For those importing millions of points, here's an optimization.
Hope this helps! Maybe add this as an FAQ, or add a helper to do all of the above...
@lindner thanks for diving into the topic! This issue is not missed or ignored; I'll give it more attention later, hopefully soon. Meanwhile, thank you for providing options and suggesting another approach!
I guess #699 is a first step at tackling this. |
Describe the bug
The Points table has denormalized `city`, `country` and `geodata` columns; normalize this.

Version
0.16.4
To Reproduce
Import lots of Points and notice that reverse geocoding takes a long time. Look at the schema and data to see a lot of duplication.
Expected behavior
The schema should be adjusted to use a single geodata/city/country record for a given lat/lon. Alternatively, the ReverseGeocode job could look for existing data from other Points to see whether the data has already been fetched.
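For the first option, a minimal sketch of what a normalized layout could look like (table and column names are made up for illustration, not the project's actual schema):

```sql
-- Illustrative only: keep one geocoding result per distinct coordinate and reference it
-- from points instead of duplicating city/country/geodata on every row.
CREATE TABLE reverse_geocoded_places (
  id      BIGSERIAL PRIMARY KEY,
  lat     NUMERIC(10, 6) NOT NULL,
  lon     NUMERIC(10, 6) NOT NULL,
  city    VARCHAR,
  country VARCHAR,
  geodata JSONB,
  UNIQUE (lat, lon)
);

ALTER TABLE points
  ADD COLUMN reverse_geocoded_place_id BIGINT
  REFERENCES reverse_geocoded_places (id);
```

The unique constraint on (lat, lon) is what lets a single fetched result be reused across points.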
Additional context
Here are some statistics from my import:
As a workaround, you can use this UPDATE SQL to populate null fields where the result has already been fetched for the same coordinates. It processes batches of 1000 rows, so you just keep re-running it until it updates 0 rows.
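A sketch of such a batched backfill (not the exact statement from this issue; PostgreSQL syntax, with assumed `latitude`, `longitude`, `city`, `country` and `geodata` columns — adjust to the real schema):

```sql
-- Illustrative only: copy already-fetched geocoding results onto points that share the
-- same coordinates but still have NULL geodata, roughly 1000 target rows per run.
-- If several source rows match a target, PostgreSQL applies one of them arbitrarily.
UPDATE points AS p
SET city    = src.city,
    country = src.country,
    geodata = src.geodata
FROM points AS src
WHERE src.latitude  = p.latitude
  AND src.longitude = p.longitude
  AND src.geodata IS NOT NULL
  AND p.geodata IS NULL
  AND p.id IN (
        -- only pick targets that actually have an already-geocoded twin
        SELECT p2.id
        FROM points AS p2
        JOIN points AS s2
          ON  s2.latitude  = p2.latitude
          AND s2.longitude = p2.longitude
          AND s2.geodata IS NOT NULL
        WHERE p2.geodata IS NULL
        LIMIT 1000
      );
```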