Home
- create the file /etc/apt/sources.list.d/nginx.list with the following content (replace codename with your Ubuntu release codename):
  deb http://nginx.org/packages/ubuntu/ codename nginx
  deb-src http://nginx.org/packages/ubuntu/ codename nginx
- as root, import the nginx signing key, update the package index and install the required packages:
  curl http://nginx.org/keys/nginx_signing.key | apt-key add -
  apt-get update
  apt-get install git tig gdal-bin libgdal-dev python-dev python-virtualenv build-essential libyaml-dev libspatialindex-dev postgresql-9.3-postgis-2.1 nginx uwsgi uwsgi-plugin-python
- in /etc/postgresql/9.3/main/postgresql.conf, update the following settings (shared_buffers and wal_buffers only take effect after PostgreSQL is restarted):
  shared_buffers = 512MB
  temp_buffers = 16MB
  work_mem = 32MB
  maintenance_work_mem = 128MB
  effective_cache_size = 1024MB
  wal_buffers = 16MB
  checkpoint_segments = 32
  checkpoint_completion_target = 0.3
  random_page_cost = 1.1 # AWS specific
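When postgresql.conf sets a parameter more than once, PostgreSQL uses the last occurrence, so a leftover duplicate line (for example an old checkpoint_segments entry) silently overrides an earlier edit. The following is a minimal Python sketch, not part of the project, that reports the effective values and any duplicated names. effective_settings is a hypothetical helper, and it assumes the simple "name = value" form with no '#' inside quoted values.

```python
def effective_settings(text):
    """Parse postgresql.conf-style text into (settings, duplicates).

    Later occurrences override earlier ones, as PostgreSQL does; any
    parameter set more than once is reported in `duplicates`.
    Simplifications: '#' always starts a comment (quoted values that
    contain '#' are not handled), and lines without '=' are skipped even
    though PostgreSQL also accepts the "name value" form.
    """
    settings = {}
    duplicates = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank, comment-only or unsupported line
        name, value = (part.strip() for part in line.split("=", 1))
        name = name.lower()  # parameter names are case-insensitive
        if name in settings:
            duplicates.add(name)
        settings[name] = value  # last occurrence wins
    return settings, duplicates


sample = """
checkpoint_segments = 16
wal_buffers = 16MB
checkpoint_segments = 32
random_page_cost = 1.1 # AWS specific
"""
settings, duplicates = effective_settings(sample)
print(settings["checkpoint_segments"], sorted(duplicates))  # 32 ['checkpoint_segments']
```

To check the real file, pass the contents of /etc/postgresql/9.3/main/postgresql.conf instead of the sample string.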
- as the postgres user, create an ubuntu superuser:
  sudo -u postgres createuser -s ubuntu
- clone the git repository
- initialize a Python virtual environment and activate it:
  virtualenv ~/posm_env
  source ~/posm_env/bin/activate
- install the packages manually (do not use pip install -r pip-requires.txt):
  pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11
- Ubuntu-specific fix to install GDAL in the virtual environment:
  pip install --no-install GDAL==1.10.0 && cd ~/posm_env/build/GDAL/ && python setup.py build_ext --include-dirs=/usr/include/gdal && pip install --no-download GDAL && cd -
- create the database and install the extensions:
  createdb posm
  psql -c 'create extension postgis;' posm
  psql -c 'create extension postgis_topology;' posm
- create the plpgsql functions:
  psql -f extractor/postgis_sql/proc_functions.sql posm
- in the posm/extractor directory, copy the template configuration files:
  cp admin_mapping.yaml.tmpl admin_mapping.yaml && cp settings.yaml.tmpl settings.yaml && cp admin_level_0.txt.tmpl admin_level_0.txt && cp admin_level_1.txt.tmpl admin_level_1.txt && cp admin_level_2.txt.tmpl admin_level_2.txt
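Running the cp chain a second time overwrites configuration files you have already edited. A minimal sketch of a re-run-safe alternative, assuming the five template names listed in the step above; copy_templates is a hypothetical helper, not part of the repository:

```python
import shutil
from pathlib import Path

# Configuration files that ship as "<name>.tmpl" templates in posm/extractor
TEMPLATES = (
    "admin_mapping.yaml",
    "settings.yaml",
    "admin_level_0.txt",
    "admin_level_1.txt",
    "admin_level_2.txt",
)


def copy_templates(directory=".", names=TEMPLATES):
    """Copy each <name>.tmpl to <name>, skipping files that already exist.

    Returns the names of the files that were actually created, so local
    edits (e.g. to settings.yaml) survive a re-run.
    """
    base = Path(directory)
    created = []
    for name in names:
        target = base / name
        if not target.exists():
            shutil.copyfile(base / (name + ".tmpl"), target)
            created.append(name)
    return created
```

Call it from posm/extractor (or pass that path as directory); an empty return value means every configuration file was already present.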
- in settings.yaml, set:
  - osm_data_file to an OSM data source file (*.pbf)
  - memory_limit to '1' on AWS
    - due to the OSM data size (~500 MB), the script requires ~3 GB of memory to store temporary OSM data, plus ~1.5 GB for the actual Python processing
    - as the server has 3.7 GB, we limit memory usage to 1 MB, which forces the script to store temporary OSM data on disk
    - if you have plenty of memory, as a rule of thumb you can set it to 3 times the size of the OSM .pbf dataset
    - also check the file configured as debug_file for messages like 'Not enough memory for temporary storage, ...' and increase the memory limit if needed
  - postgis to your PostGIS database identifier, e.g. "PG:dbname=posm", omitting other parameters
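The memory_limit reasoning above boils down to one check: if the temporary OSM data plus ~1.5 GB of Python processing fits in RAM, keep the temporary data in memory; otherwise set memory_limit to 1 and let the script spill to disk. A minimal sketch, assuming memory_limit is expressed in MB (as the "'1' means 1 MB" note implies); suggest_memory_limit is a hypothetical helper, not part of the repository:

```python
import math


def suggest_memory_limit(temp_data_mb, available_mb, processing_mb=1536):
    """Suggest a memory_limit value (assumed to be in MB) for settings.yaml.

    temp_data_mb:  estimated memory needed for temporary OSM data
                   (e.g. the 3x .pbf size rule of thumb, or a measured value)
    available_mb:  RAM available on the machine
    processing_mb: extra memory used by the Python processing itself
                   (~1.5 GB according to the notes)

    If everything fits in RAM, keep temporary data in memory; otherwise
    return 1 so the script stores temporary OSM data on disk.
    """
    if temp_data_mb + processing_mb <= available_mb:
        return math.ceil(temp_data_mb)
    return 1


# AWS case from the notes: ~3 GB of temporary data on a 3.7 GB server
print(suggest_memory_limit(3 * 1024, int(3.7 * 1024)))  # 1
# The same data on a machine with 16 GB of RAM
print(suggest_memory_limit(3 * 1024, 16 * 1024))  # 3072
```

The debug_file check still applies: if it reports 'Not enough memory for temporary storage', the estimate was too low.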
- to extract the admin levels from the OSM dataset and import them into the PostGIS database, run:
  python extract.py
- after the extraction finishes, the current directory will contain 6 new files named admin_[0,1,2]_[new|missing].txt
  - these files facilitate manual change tracking: files suffixed new contain all new osm_id records that are not present in the base admin_level_[0,1,2].txt files; similarly, files suffixed missing contain osm_id records that were present in the base files but are missing from the new OSM dataset
  - if you want to track changes, the base files need to be updated manually; however, base files are only useful when working with OSM data that covers the same area
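The new and missing files are set differences between the osm_ids in a base file and those in the current extract. A minimal sketch of that comparison, assuming the files hold one osm_id per line; read_ids and diff_ids are hypothetical helpers, not functions from extract.py:

```python
def read_ids(path):
    """Read one osm_id per line (assumed format), ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def diff_ids(base_ids, current_ids):
    """Compare the ids of a base file with the ids of the current extract.

    new:     ids in the current OSM dataset but not in the base file
    missing: ids in the base file that disappeared from the current dataset
    """
    new = current_ids - base_ids
    missing = base_ids - current_ids
    return new, missing


new, missing = diff_ids({"101", "202", "303"}, {"202", "303", "404"})
print(sorted(new), sorted(missing))  # ['404'] ['101']
```

Updating a base file manually amounts to replacing its contents with the current ids once the differences have been reviewed; after that, both sets are empty until the OSM data changes again.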
- the simplification workflow consists of:
  - geometry deconstruction - combines all admin_level geometries and creates an all_geom table of non-overlapping geometries, which are later used to create higher level topo geometries
    - with a perfect dataset, we would use admin_level_2 as the base topo geometry data and create admin_level_1 and admin_level_0 from it
    - in the real world, holes in admin_level_2 need to be filled using higher level geometries before the higher level topo geometries can be created
  - topology creation - creates the base level and higher level topo geometries using data from the all_geom table
  - geometry simplification - creates simple_admin_[0,1,2] tables containing topologically simplified geometries; it uses a tolerance parameter, the minimal distance between two nodes in decimal degrees
- the final results are the simple_admin_[0,1,2]_view database views, which contain the osm_id attribute, admin_level relationships, and natural and simplified geometries
- to run the topological simplification process in the database, execute:
  psql -f postgis_sql/simplify_admin_workflow.sql posm
- bear in mind that the only two parameters that can be changed manually are:
  - fill_holes BOOLEAN DEFAULT 't' for the deconstruct_geometry() function - setting it to false assumes perfect data with no holes
  - tolerance float DEFAULT 0.1 (degrees) for the create_simple_geoms() function - with the Earth's circumference at ~40000 km, 1 degree ~ 111 km, 0.1 of a degree ~ 11 km and 0.01 of a degree ~ 1 km (measured along the equator or a meridian)
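The degree-to-distance estimates follow from dividing the circumference by 360. They hold along the equator or a meridian; along a parallel, the distance covered by a degree of longitude shrinks with the cosine of the latitude, so the same tolerance is shorter on the ground far from the equator. A small sketch of the arithmetic (degrees_to_km is a hypothetical helper):

```python
import math

EARTH_CIRCUMFERENCE_KM = 40_000  # rough value used in the notes above


def degrees_to_km(tolerance_deg, latitude_deg=0.0):
    """Approximate ground distance covered by `tolerance_deg` degrees.

    With latitude_deg=0 this is the equator/meridian value used in the
    notes (1 degree ~ 40000 / 360 ~ 111 km). For other latitudes it gives
    the east-west distance along that parallel, scaled by cos(latitude).
    """
    km_per_degree = EARTH_CIRCUMFERENCE_KM / 360
    return tolerance_deg * km_per_degree * math.cos(math.radians(latitude_deg))


for tolerance in (1, 0.1, 0.01):
    print(tolerance, round(degrees_to_km(tolerance), 2))
# 1 111.11
# 0.1 11.11
# 0.01 1.11

# East-west distance of the default 0.1 degree tolerance at 35 degrees latitude
print(round(degrees_to_km(0.1, 35), 2))
```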
- to extract natural and simplified geometries from the database into a ZIP file of GeoJSON files (created in the current working directory), run:
  python generate_geojson.py --all --rm
- to extract one or more specific countries, specify them on the command line:
  python generate_geojson.py 88210 87565 --rm
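To check what ended up in the ZIP file, the archive can be inspected with the Python standard library. A minimal sketch, assuming every .geojson (or .json) member is a GeoJSON FeatureCollection; the archive and member names are not documented here, so summarize_geojson_zip is a hypothetical helper:

```python
import json
import zipfile


def summarize_geojson_zip(zip_path):
    """Return {member_name: feature_count} for GeoJSON members of a ZIP.

    Assumes each .geojson/.json member is a FeatureCollection; other
    members (if any) are ignored.
    """
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".geojson", ".json")):
                data = json.loads(archive.read(name).decode("utf-8"))
                summary[name] = len(data.get("features", []))
    return summary
```

Pass it the path of the ZIP file created in the working directory; a member with zero features is a quick hint that a country or admin level was not exported.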
- the extraction and simplification process depends only on the initial OSM dataset
- the database is flushed/purged on every process run
- the extraction and simplification process is CPU bound and can take quite some time
- on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.01 of a degree for the prepared Africa dataset:
  - a prepared dataset in this context is a stripped-down OSM dataset that contains only admin_level features
  - extract.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 s
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 2 h 20 min
  - generate_geojson.py ~ 2 min
- on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.1 of a degree for the prepared Africa dataset:
  - extract.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 s
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 1 h 30 min
  - generate_geojson.py ~ 2 min
- on an AWS instance provided by Nyaruka, using a tolerance of 0.01 of a degree for the prepared World dataset:
  - extract.py ~ 50 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 1 h 30 min
    - topology creation ~ 78 h 40 min
    - geometry simplification ~ more than 9 days (still running)
  - generate_geojson.py ~ not yet measured
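Adding up the step timings gives the end-to-end time of each run. A small sketch using only the numbers listed above for the two finished Africa runs (the World run is left out because its simplification step had not finished):

```python
from datetime import timedelta

# Step timings of the two complete Africa runs, as listed above, in order:
# extraction script, geometry deconstruction, topology creation,
# geometry simplification, generate_geojson.py
RUNS = {
    "Africa, tolerance 0.01": [
        timedelta(minutes=3),
        timedelta(seconds=80),
        timedelta(hours=1, minutes=50),
        timedelta(hours=2, minutes=20),
        timedelta(minutes=2),
    ],
    "Africa, tolerance 0.1": [
        timedelta(minutes=3),
        timedelta(seconds=80),
        timedelta(hours=1, minutes=50),
        timedelta(hours=1, minutes=30),
        timedelta(minutes=2),
    ],
}

for name, steps in RUNS.items():
    total = sum(steps, timedelta())
    # Share of topology creation + geometry simplification in the total
    topo_share = (steps[2] + steps[3]) / total
    print(f"{name}: {total} total, {topo_share:.0%} in topology steps")
# Africa, tolerance 0.01: 4:16:20 total, 98% in topology steps
# Africa, tolerance 0.1: 3:26:20 total, 97% in topology steps
```

Topology creation and geometry simplification dominate both runs, and the tolerance only changes the simplification step, which is why the coarser 0.1 tolerance saves about 50 minutes here.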