The purpose of this repository is to convert OSM extracts to GeoJSON and then parse that data into tagged training data for supervised machine learning on address / POI parsing. Any OSM extract will work with the repo, but extracts larger than the ones used in step 1 may cause memory issues for one or more of the parsing scripts. The repository is part of an effort to build an open-source, end-to-end encrypted mapping app.
Install Osmosis: brew install osmosis
Install OSMConvert: brew install osmconvert
Install OSMtoGeoJSON: npm install -g osmtogeojson
The pipeline follows the general blueprint from this Medium article, with additional parsing to turn the GeoJSON into ML-usable training data stored as pickle files.
Download a pbf extract of OSM data, e.g. this extract of Quebec from Geofabrik, which we are using for Montreal.
For the latest OSM extract in one of the beta testing regions, run one of the following commands:
California
wget https://download.geofabrik.de/north-america/us/california-latest.osm.pbf -P osm_extracts
Georgia
wget https://download.geofabrik.de/north-america/us/georgia-latest.osm.pbf -P osm_extracts
New York
wget https://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf -O osm_extracts/new_york.osm.pbf
Quebec
wget https://download.geofabrik.de/north-america/canada/quebec-latest.osm.pbf -P osm_extracts
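The download commands above all follow the same URL pattern, so they can be scripted. The following is a minimal Python sketch, not part of the repo: the dictionary keys (`california`, `new_york`, ...) and the function names `extract_url` and `download_extract` are hypothetical, and only the Geofabrik URLs themselves come from the commands above.

```python
import os
import urllib.request

GEOFABRIK_BASE = "https://download.geofabrik.de"

# Geofabrik sub-paths for the four regions listed above.
# The dictionary keys are illustrative names, not identifiers used by the repo.
REGION_PATHS = {
    "california": "north-america/us/california",
    "georgia": "north-america/us/georgia",
    "new_york": "north-america/us/new-york",
    "quebec": "north-america/canada/quebec",
}


def extract_url(region: str) -> str:
    """Return the URL of the latest .osm.pbf extract for a known region."""
    return f"{GEOFABRIK_BASE}/{REGION_PATHS[region]}-latest.osm.pbf"


def download_extract(region: str, out_dir: str = "osm_extracts") -> str:
    """Download the extract into out_dir and return the local file path."""
    os.makedirs(out_dir, exist_ok=True)
    url = extract_url(region)
    # Keep the remote file name, as `wget -P osm_extracts` does.
    dest = os.path.join(out_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download_extract("quebec")` would fetch the same file as the Quebec `wget` command. Note that these extracts are large, so the download can take a while.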
python osm_to_json.py parseosm --region {REGION} --osm {BOOLEAN}
That's it, you are done.
Alternatively, you can run steps 2b to 5 one by one:
bash sh/osm_pbf_to_nodes_osm.sh -r $REGION
Input | Output |
---|---|
*.osm.pbf | *.nodes.osm |
bash sh/nodes_osm_to_poi_osm.sh -r $REGION
Input | Output |
---|---|
*.nodes.osm | *.poi.osm |
bash sh/poi_osm_to_poi_geojson.sh -r $REGION
Input | Output |
---|---|
*.poi.osm | *.poi.geojson |
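The three shell steps above have to run in order, since each consumes the previous step's output. A small wrapper along these lines could chain them and stop on the first failure. This is only a sketch: the script paths and the `-r` flag are taken from the commands above, while `stage_commands`, `run_stages`, and the `dry_run` option are hypothetical names introduced here, and the region value you pass must be one the scripts actually accept.

```python
import subprocess

# Shell scripts for the three conversion steps, in the order given above.
STAGE_SCRIPTS = [
    "sh/osm_pbf_to_nodes_osm.sh",    # *.osm.pbf   -> *.nodes.osm
    "sh/nodes_osm_to_poi_osm.sh",    # *.nodes.osm -> *.poi.osm
    "sh/poi_osm_to_poi_geojson.sh",  # *.poi.osm   -> *.poi.geojson
]


def stage_commands(region):
    """Build one `bash <script> -r <region>` command per stage."""
    return [["bash", script, "-r", region] for script in STAGE_SCRIPTS]


def run_stages(region, dry_run=False):
    """Run the stages in order; with dry_run=True only print the commands."""
    for cmd in stage_commands(region):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage fails,
            # so later stages never run on missing input.
            subprocess.run(cmd, check=True)
```

Running `run_stages(region, dry_run=True)` first is a cheap way to check the commands before starting a long conversion.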
python osm_to_json.py parseosm --region {REGION} --osm False
Input | Output |
---|---|
*.poi.geojson | *.osm.text.tags.coords.pkl |
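To give an idea of what this last step involves, here is a rough Python sketch of turning GeoJSON features into tagged examples and pickling them. It is not the code in `osm_to_json.py`: the function names, the choice of `addr:*` keys, and the `text` / `tags` / `coords` layout are assumptions guessed from the output file name, and it assumes the OSM tags appear directly as keys in each feature's `properties`.

```python
import pickle

# OSM address keys used as labels in this sketch (an assumed selection).
ADDRESS_KEYS = ["addr:housenumber", "addr:street", "addr:city", "addr:postcode"]


def feature_to_example(feature):
    """Turn one GeoJSON feature into a dict with text, tags and coords.

    Returns None when the feature has no name or address tags.
    """
    props = feature.get("properties") or {}
    tokens, labels = [], []
    for key in ["name"] + ADDRESS_KEYS:
        value = props.get(key)
        if not value:
            continue
        # Every whitespace-separated token inherits the label of its OSM key.
        for token in str(value).split():
            tokens.append(token)
            labels.append(key)
    if not tokens:
        return None
    geom = feature.get("geometry") or {}
    # Only Point geometries carry a single (lon, lat) pair.
    coords = tuple(geom["coordinates"]) if geom.get("type") == "Point" else None
    return {
        "text": " ".join(tokens),
        "tags": list(zip(tokens, labels)),
        "coords": coords,
    }


def save_examples(features, path):
    """Convert all usable features and write them to a pickle file."""
    examples = [ex for ex in map(feature_to_example, features) if ex is not None]
    with open(path, "wb") as fh:
        pickle.dump(examples, fh)
    return examples
```

Labelling per token keeps the example directly usable for sequence tagging, but the real scripts may tokenize and label differently.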