Dražen Odobašić edited this page May 6, 2014 · 21 revisions

Ubuntu 14.04 setup

Add additional software repositories

  • create a file /etc/apt/sources.list.d/nginx.list:
deb codename nginx
deb-src codename nginx
  • curl | apt-key add -
  • apt-get update

Installing base packages

  • apt-get install git tig gdal-bin libgdal-dev python-dev python-virtualenv build-essential libyaml-dev libspatialindex-dev postgresql-9.3-postgis-2.1 nginx uwsgi uwsgi-plugin-python

Configure PostgreSQL

  • in file /etc/postgresql/9.3/main/postgresql.conf update:
    • shared_buffers = 512MB
    • temp_buffers = 16MB
    • work_mem = 32MB
    • maintenance_work_mem = 128MB
    • effective_cache_size = 1024MB
    • wal_buffers = 16MB
    • checkpoint_segments = 32
    • checkpoint_completion_target = 0.3
    • random_page_cost = 1.1 # AWS specific
  • as the postgres user, create an ubuntu superuser
    • createuser -s ubuntu

Service setup

  • clone git repository
  • initialize a Python virtual environment: virtualenv ~/posm_env
  • manually install packages (do not use pip install -r pip-requires.txt)
    • pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11
    • Ubuntu-specific fix to install GDAL in the virtual environment
      • pip install --no-install GDAL==1.10.0 && cd ~/posm_env/build/GDAL/ && python build_ext --include-dirs=/usr/include/gdal && pip install --no-download GDAL && cd -

Extract configuration

  • create database and install extensions:

    • createdb posm
    • psql -c 'create extension postgis;' posm
    • psql -c 'create extension postgis_topology;' posm
  • create plpgsql functions:

    • psql -f extractor/postgis_sql/proc_functions.sql posm
  • in the posm/extractor directory, copy the template configuration files:
    • cp admin_mapping.yaml.tmpl admin_mapping.yaml
    • cp settings.yaml.tmpl settings.yaml
    • cp admin_level_0.txt.tmpl admin_level_0.txt
    • cp admin_level_1.txt.tmpl admin_level_1.txt
    • cp admin_level_2.txt.tmpl admin_level_2.txt

  • in settings.yaml set:

    • osm_data_file to an OSM data source file (*.pbf)
    • memory_limit to '1' for AWS
      • for an OSM dataset of ~500 MB, the script needs ~3 GB of memory to store temporary OSM data, plus another 1.5 GB for the actual Python processing
      • as the server has only 3.7 GB, we limit memory usage to 1 MB, which forces the script to store temporary OSM data on disk
      • if you have a lot of memory, as a rule of thumb you can set it to 3 times the size of the OSM .pbf dataset
      • also check the debug_file file for messages like 'Not enough memory for temporary storage, ...' and increase the memory limit if needed
    • postgis to your PostGIS database identifier, e.g. "PG:dbname=posm", omitting other parameters
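
The memory_limit reasoning above can be sketched as a small helper. This is only an illustration, not part of the extractor: the function name and its parameters are hypothetical, all sizes are assumed to be in MB, and the temporary-storage estimate has to be supplied by you (the text quotes ~3 GB for a ~500 MB dataset).

```python
def choose_memory_limit(pbf_size_mb, temp_storage_mb, total_ram_mb,
                        processing_mb=1500):
    """Suggest a memory_limit value (in MB) for settings.yaml.

    Hypothetical helper: temp_storage_mb is your own estimate of the memory
    needed for temporary OSM data, processing_mb the memory reserved for the
    Python processing itself (1.5 GB in the example above).
    """
    if temp_storage_mb + processing_mb > total_ram_mb:
        # Not enough RAM for both: force temporary OSM data onto the disk.
        return 1
    # Enough RAM: apply the rule of thumb of 3 times the .pbf size.
    return 3 * pbf_size_mb


# AWS case from the text: 3.7 GB of RAM is not enough for ~3 GB + 1.5 GB.
print(choose_memory_limit(500, 3000, 3700))    # 1
# A machine with 16 GB of RAM can keep the temporary data in memory.
print(choose_memory_limit(500, 3000, 16000))   # 1500
```

Whatever value you pick, the debug_file messages remain the real check: if the script reports that it ran out of temporary storage, raise the limit.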

Running the extraction

  • to extract admin_levels from the OSM dataset and import them into the PostGIS database, run:
    • python
  • after the extract finishes, the current directory will contain 6 new files, named admin_[0,1,2]_[new|missing].txt
    • these files facilitate manual change tracking: files suffixed new contain all new osm_id records that are not present in the base admin_level_[0,1,2].txt files, while files suffixed missing contain osm_id records that were present in the base files but are missing from the new OSM dataset
    • to track changes, the base files need to be updated manually; however, base files are only useful when working with OSM data that covers the same area
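
The new/missing bookkeeping described above amounts to two set differences. The sketch below only illustrates that idea and is not the extractor's actual code: the function name is hypothetical, and the sample numbers are made up for the example (two of them are borrowed from the command-line example later on this page; treating them as osm_id values is an assumption).

```python
def diff_osm_ids(base_ids, current_ids):
    """Return (new, missing) osm_id lists, both sorted.

    new     - ids in the current extract that are absent from the base file
    missing - ids in the base file that are absent from the current extract
    """
    base, current = set(base_ids), set(current_ids)
    return sorted(current - base), sorted(base - current)


# Hypothetical contents of a base file and of a fresh extract.
base_ids = [87565, 88210, 192798]
current_ids = [88210, 192798, 195290]

new, missing = diff_osm_ids(base_ids, current_ids)
print(new)      # [195290]
print(missing)  # [87565]
```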

Hierarchical Topological simplification

  • the simplification workflow consists of:
    • geometry deconstruction - a process that combines all admin_level geometries and creates an all_geom table of non-overlapping geometries, which are later used to create higher level topo geometries
      • in a perfect dataset we would use admin_level_2 as a base topo geometry data and create admin_level_1 and admin_level_0
      • in the real world, we need to fill in holes in admin_level_2 by using higher level geometries to later create higher level topo geometries
    • topology creation - a process that creates base_level and higher topo geometries, using data from the all_geom table
    • geometry simplification - creates simple_admin_[0,1,2] tables that contain topologically simplified geometries; it uses a tolerance parameter, the minimal distance between two nodes in decimal degrees
  • the final results are simple_admin_[0,1,2]_view database views that have osm_id attribute, admin_level relationships, natural and simplified geometries
  • to run the topological simplification process in the database, execute:
    • psql -f postgis_sql/simplify_admin_workflow.sql
    • bear in mind that only two parameters can be changed manually:
      • fill_holes BOOLEAN DEFAULT 't' for the deconstruct_geometry() function - setting it to false assumes perfect data with no holes
      • tolerance float DEFAULT 0.1 (of a degree) for the create_simple_geoms() function
        • with the Earth's circumference at ~40000 km, 1 degree ~ 111 km, 0.1 of a degree ~ 11 km, 0.01 of a degree ~ 1 km
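
The degree-to-kilometre arithmetic above can be reproduced with a few lines of Python. This is a rough sketch using the same ~40000 km circumference; the function name is hypothetical, and the figure holds along a meridian or at the equator (east-west distances per degree shrink towards the poles).

```python
EARTH_CIRCUMFERENCE_KM = 40000.0  # approximate value used in the text


def degrees_to_km(degrees):
    """Approximate ground distance covered by an angle given in degrees."""
    return degrees * EARTH_CIRCUMFERENCE_KM / 360.0


for tolerance in (1.0, 0.1, 0.01):
    print(tolerance, "degree ~", round(degrees_to_km(tolerance), 1), "km")
```

This gives a quick way to judge how coarse a given tolerance value is before starting a simplification run that may take hours.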

GeoJSON extraction

  • to extract natural and simplified geometries from the database as GeoJSON files packed into a ZIP file (created in the current working directory), run:
    • python --all --rm
  • to extract one or more specific countries, you can specify them on the command line
    • python 88210 87565 --rm

Final remarks

  • the extraction and simplification process depends only on the initial OSM dataset

  • the database will be flushed/purged on every process run

  • the extraction and simplification process is CPU bound and it can take quite some time

  • on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.01 of a degree for the prepared Africa dataset:

    • a prepared dataset in this context is a stripped-down OSM dataset that contains only admin_level features
    • ~ 3min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 80 sec
      • topology creation ~ 1h 50 min
      • geometry simplification ~ 2h 20min
    • ~ 2min
  • on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.1 of a degree for the prepared Africa dataset:

    • ~ 3min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 80 sec
      • topology creation ~ 1h 50 min
      • geometry simplification ~ 1h 30min
    • ~ 2 min
  • on the AWS instance provided by Nyaruka, using a tolerance of 0.01 of a degree for the prepared World dataset:

    • ~ 50min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 1h 30 min
      • topology creation ~ 78h 40 min
      • geometry simplification ~ more than 9 days (still running)
    • ~ ???