Skip to content

feat: Switch geodata providers #7393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 88 commits into
base: master
Choose a base branch
from

Conversation

camdecoster
Copy link
Contributor

@camdecoster camdecoster commented Mar 20, 2025

Description

Adds scripts to build topojson from UN sourced data and updates build process to run these scripts.

Closes #7334

Changes

  • Adds script for downloading shapefiles, geojson
  • Adds script for converting shapefiles, geojson into topojson
  • Updates topojson path references

Testing

  • Run npm run build_topojson and make sure the script completes successfully
  • Try loading the maps in ./dist/topojson in Mapshaper
  • Open the test dashboard
  • Try loading some of the geo plots and look for errors

Notes

  • The new maps are built from this UN data source for countries, coastlines, and lands layers. The oceans, lakes, rivers, and subunits layers are built from Natural Earth Data.
  • The current maps come from sane-topojson, which is entirely derived from Natural Earth Data
  • I haven't been able to verify the resolution of the data from the UN. There's a reference here to the "UNGeodata 25 million scale", but I haven't been able to confirm what that means. As such, the labels '50m' and '110m' probably aren't accurate. But they need to remain as they are to not break references.
  • Mapshaper can't handle antimeridian cutting, so features that cross the antimeridian can look weird (source)
  • I want to credit sane-topojson for some of the design of these scripts. I'm attempting to create a (mostly) drop in replacement, so it made sense to follow those conventions.

TODO

  • [ ] Clean up shapefiles after converting to topojson Saving everything into the build folder makes this unnecessary
  • Fix Antarctica maps (or just remove them)
  • Commit new maps to ./dist/topojson folder
  • Fix broken tests (I'm sure there will be some)
  • Remove log statements in process_geodata.mjs (or hide them behind flag)
  • Add the 'usa' map (though it will be pretty bland without states)
  • Remove sane-topojson package
  • Verify resolutions (see note above about resolutions)
  • Only include actual coastlines in coastlines
  • Make sure 110m maps don't have any unintended artifacts (like South America)
  • Add UN geodata archive to repo and only attempt download if file can't be found
  • Add markdown log file in draftlogs: 7393_feat.md
  • Remove unnecessary properties info from final maps
  • Fix ocean fill issues that appear in some geo plots (see here for example)
  • Fix Sudan not filling properly in choropleths (see here for example)
  • Lint files
  • Update test baseline images per PR changes

@gvwilson gvwilson added feature something new P1 needed for current cycle labels Mar 20, 2025
@camdecoster camdecoster changed the title Switch geodata providers feat: Switch geodata providers Mar 20, 2025
import config from './config.json' assert { type: 'json' };

try {
// Download data from UN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As these origin URLs may change (link rot), I worry that this piece of the script could become out of date quickly. I'm not sure there's too much we can do here but I might recommend committing the current source file (but not adding it to the dist/. Maybe someone else has a better idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. The script could be updated to look for the file first and only download if it doesn't exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan do you have an opinion or ideas on this? What's the ideal dev UX here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in 3b47c67 I added the archive from the UN.

@ndrezn ndrezn requested a review from emilykl March 20, 2025 17:03
@gvwilson
Copy link
Contributor

  1. Sync with this branch.
  2. npm i to install/update packages.
  3. npm run build_topojson to build JSON files.
  4. Go to mapshaper.org.
  5. Load europe_50m.json.
  6. Initially looks like the lakes are misplaced (for example, the Caspian Sea seems to overlap the Russian border in a weird way in the screenshot).
  7. On closer inspection, maybe this is valid: I'm unable to find a map showing how national borders cross the Caspian Sea, and Russia might very well have a claim to the northwest section.
  8. The three lakes in the north check out: one is in Sweden, the other two are near St. Petersburg.
    Screenshot 2025-03-25 at 9 58 13 AM

@gvwilson
Copy link
Contributor

Q for @camdecoster : did the old usa map have the states? as you say, it's not much use without that…

@gvwilson
Copy link
Contributor

Trying to get npm run test-jasmine to work with the new topojson files, it looks like I'm tripping over a configuration issue.

  1. Original error message complained about not being able to find ../../../dist/topojsonworld_110m.json (note the missing '/' between 'topojson' and 'world').
  2. I modified getTopojsonPath in src/lib/topojson_utils.js as follows to insert the '/' if required:
 topojsonUtils.getTopojsonPath = function(topojsonURL, topojsonName) {
-    return topojsonURL + topojsonName + '.json';
+    if (topojsonURL.endsWith('/')) {
+       return topojsonURL + topojsonName + '.json';
+    } else {
+       return topojsonURL + '/' + topojsonName + '.json';
+    }
 };
  1. I also modified test/jasmine/karma.conf.js as follows because we're getting the JSON files from the local dist directory not the installed sane-topojson package:
-var pathToSaneTopojsonDist = path.join(__dirname, '..', '..', 'node_modules', 'sane-topojson', 'dist');
+var pathToTopojsonDist = path.join(__dirname, '..', '..', 'dist', 'topojson');

// ... and further down, edit to match the new variable name

-        {pattern: pathToSaneTopojsonDist + '/**', included: false, watched: false, served: true}
+        {pattern: pathToTopojsonDist + '/**', included: false, watched: false, served: true}

This still produces error messages like the one shown below saying that the JSON files can't be found:

	Failed: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	Error: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	    at Object.<anonymous> (/Users/gvwilson/plotly/plotly.js/src/plots/geo/geo.js:137:35 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:147433:31)
	    at Object.event (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:504:42 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:4875:48)
	    at XMLHttpRequest.respond (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:1951:24 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:6326:30)
25 03 2025 10:40:03.170:WARN [web-server]: 404: /dist/topojson/world_110m.json
[note: the message above is then repeated several times]

I've tried variations on the path in test/jasmine/tests/geo_test.js specified by this line:

Plotly.setPlotConfig({ topojsonURL: '../../../dist/topojson' });

Adding an extra .. or removing one of the ones that's there doesn't affect the outcome.

@camdecoster
Copy link
Contributor Author

@gvwilson yes the old maps had 'subunits' layers for a few regions (USA and Brazil did for sure). The UN maps don't include that information. Thanks for looking into the tests. I'm working on an update to get the tests working but I haven't pushed that yet. I'll try to get that out later today.

@ndrezn
Copy link
Member

ndrezn commented Mar 25, 2025

@etpinard when you have a chance could you give this PR a look? Would love your eyes. Thank you!

In the short term what was the source for US states/Canadian provinces in the old dataset?

Refactor to simplify scripts

Switch to UN/NE geodata hybrid

Save UN geodata to archive

Remove extra info from topojson

Add centroids to geojson

Use 'simplify' to create 110m maps
@camdecoster
Copy link
Contributor Author

@archmoj I've implemented the changes that you suggested. Could you take another look? Regarding the US/Canada border, that's just how the geodata looks coming from the UN. Manipulating that should be possible, but I'd rather wait until after these maps land to look at adding that.

@camdecoster camdecoster requested a review from archmoj July 10, 2025 16:15
@camdecoster camdecoster requested a review from emilykl July 10, 2025 16:30
@archmoj
Copy link
Contributor

archmoj commented Jul 10, 2025

Files and size changes:
Should the new files have UN prefix to avoid overwriting on the CDN?

old dist/topojson/

   37K | africa_110m.json
  144K | africa_50m.json
   55K | asia_110m.json
  353K | asia_50m.json
   33K | europe_110m.json
  194K | europe_50m.json
   66K | north-america_110m.json
  980K | north-america_50m.json
   22K | south-america_110m.json
  165K | south-america_50m.json
   48K | usa_110m.json
  460K | usa_50m.json
  134K | world_110m.json
 1075K | world_50m.json
new topojson/dist

   37K | africa_110m.json
  120K | africa_50m.json
   15K | antarctica_110m.json
   41K | antarctica_50m.json
   82K | asia_110m.json
  382K | asia_50m.json
   39K | europe_110m.json
  198K | europe_50m.json
   75K | north-america_110m.json
  588K | north-america_50m.json
   46K | oceania_110m.json
  328K | oceania_50m.json
   23K | south-america_110m.json
  168K | south-america_50m.json
   70K | usa_110m.json
  322K | usa_50m.json
  288K | world_110m.json
 1655K | world_50m.json

@camdecoster
Copy link
Contributor Author

Files and size changes: Should the new files have UN prefix to avoid overwriting on the CDN?

The file names need to be the same for the lookup to work in the library. We could update the lookup function to add a suffix, but that seems like the wrong direction to go. Eventually, the new maps will be standard. For now, we could manually update the CDN to include the new maps with a suffix. We could announce the change and then eventually update the CDN to only include the new maps. Alternatively, we could include the old maps in a legacy folder.

@camdecoster camdecoster requested a review from archmoj July 10, 2025 21:14
@archmoj
Copy link
Contributor

archmoj commented Jul 11, 2025

In regards to new files:

antarctica_110m.json
antarctica_50m.json
oceania_110m.json
oceania_50m.json

would you consider adding new scopes for antarctica and oceania here?

exports.scopeDefaults = {
world: {
lonaxisRange: [-180, 180],
lataxisRange: [-90, 90],
projType: 'equirectangular',
projRotate: [0, 0, 0]
},
usa: {
lonaxisRange: [-180, -50],
lataxisRange: [15, 80],
projType: 'albers usa'
},
europe: {
lonaxisRange: [-30, 60],
lataxisRange: [30, 85],
projType: 'conic conformal',
projRotate: [15, 0, 0],
projParallels: [0, 60]
},
asia: {
lonaxisRange: [22, 160],
lataxisRange: [-15, 55],
projType: 'mercator',
projRotate: [0, 0, 0]
},
africa: {
lonaxisRange: [-30, 60],
lataxisRange: [-40, 40],
projType: 'mercator',
projRotate: [0, 0, 0]
},
'north america': {
lonaxisRange: [-180, -45],
lataxisRange: [5, 85],
projType: 'conic conformal',
projRotate: [-100, 0, 0],
projParallels: [29.5, 45.5]
},
'south america': {
lonaxisRange: [-100, -30],
lataxisRange: [-60, 15],
projType: 'mercator',
projRotate: [0, 0, 0]
}
};

// Remove country polygons with bad geometry
`-filter '!["ATA", "FJI", "RUS"].includes(iso3cd)' target=un_polygons`,
'-merge-layers target=un_polygons,antarctica,fiji,russia force name=all_features',
// Erase Caspian Sea
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain the need for this process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UN maps didn't show the outline of the Caspian sea in the country borders, so the script modifies the geodata here to cut the shape out. The NE maps have it cut out, so this changes matches what was provided previously. You can see the source data here without the sea visible in the countries layer (though you can see it in a water bodies layer):

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the note. How about modifying the comment to be

    // Subtract Caspian Sea

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. I'll be a bit more explicit too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Erase Caspian Sea
`-filter 'globalid === "{BBBEF27F-A6F4-4FBC-9729-77B3A8739409}"' target=all_features + name=caspian_sea`,
'-erase source=caspian_sea target=all_features',
// Update country codes for disputed territories at Egypt/Sudan border: https://en.wikipedia.org/wiki/Egypt%E2%80%93Sudan_border
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please clarify why this process is needed? Is the UN data wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are disputed territories. The NE maps don't have them separated out, but the UN maps do. This was causing problems with the script due to how the metadata was assigned. I opted to give them different country codes so that we wouldn't have to decide which country "owns" which territory. I looked into the UN policy on these territories and their position is essentially that they're not taking a position. Switching to the UN data is an attempt to use a trusted source. So, I'm trying to follow the UN lead with this. You can see the difference between the old maps and the new below:

Natural Earth UN
image image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider adding an image test for this case?

{
 "data": [
  {
   "type": "choropleth",
   "name": "Sudan",
   "locations": ["SDN"],
   "z": [1],
   "colorscale": [[0, "blue"], [1, "blue"]],
   "showscale": false,
   "showlegend": true
  },
  {
   "type": "choropleth",
   "name": "Egypt",
   "locations": ["EGY"],
   "z": [1],
   "colorscale": [[0, "orange"], [1, "orange"]],
   "showscale": false,
   "showlegend": true
  }
 ],
 "layout": {
  "geo": {
    "projection": {
        "rotation": {
            "lon": 30,
            "lat": 20
        },
        "scale": 7
    }
  },
  "width": 800,
  "height": 600
 }
}

Before:
newplot (5)

After:
newplot (6)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me. Thanks for the config!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@camdecoster camdecoster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you consider adding new scopes for antarctica and oceania here?

@archmoj I'll add the new scopes shortly.

// Erase Caspian Sea
`-filter 'globalid === "{BBBEF27F-A6F4-4FBC-9729-77B3A8739409}"' target=all_features + name=caspian_sea`,
'-erase source=caspian_sea target=all_features',
// Update country codes for disputed territories at Egypt/Sudan border: https://en.wikipedia.org/wiki/Egypt%E2%80%93Sudan_border
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are disputed territories. The NE maps don't have them separated out, but the UN maps do. This was causing problems with the script due to how the metadata was assigned. I opted to give them different country codes so that we wouldn't have to decide which country "owns" which territory. I looked into the UN policy on these territories and their position is essentially that they're not taking a position. Switching to the UN data is an attempt to use a trusted source. So, I'm trying to follow the UN lead with this. You can see the difference between the old maps and the new below:

Natural Earth UN
image image

// Remove country polygons with bad geometry
`-filter '!["ATA", "FJI", "RUS"].includes(iso3cd)' target=un_polygons`,
'-merge-layers target=un_polygons,antarctica,fiji,russia force name=all_features',
// Erase Caspian Sea
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UN maps didn't show the outline of the Caspian sea in the country borders, so the script modifies the geodata here to cut the shape out. The NE maps have it cut out, so this changes matches what was provided previously. You can see the source data here without the sea visible in the countries layer (though you can see it in a water bodies layer):

image

function getCentroid(feature) {
const { type } = feature.geometry;
const projection = geoIdentity();
const path = geoPath(projection);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also another variable called path in global scope.
Please rename this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 8944398.


exports.topojson = saneTopojson;
module.exports = {
topojson,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be

Suggested change
topojson,
topojson: topojson,

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of ES6, you can do this if the key and value variable have the same name (MDN source).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P1 needed for current cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use UN GeoJSON data for geo charts rather than Natural Earth
5 participants