OSMNoteSync is a simple XML parser written in python that takes the daily note dump file from http://planet.openstreetmap.org/ and shoves the data into a simple postgres database so it can be queried.
It can also keep a database created with a daily dump file up to date using OSM API calls returning latest notes. In addition, the notes hidden by OSM moderators, which are not exposed in the above API call, can be pruned from the database by detecting missing note IDs in the daily dump, up to its generation date.
OSMNoteSync works with Python 3.6 or newer.
Aside from postgresql, OSMNoteSync depends on the python libraries psycopg2 and lxml. On Debian-based systems this means installing the python-psycopg2 and python-lxml packages.
If you are using pip
and virtualenv
, you can install all dependencies with pip install -r requirements.txt
.
If you want to parse the note file without first unzipping it, you will also need to install the bz2file library since the built in bz2 library can not handle multi-stream bzip files.
For building geometries, postgis
extension needs to be installed.
OSMNoteSync expects a postgres database to be set up for it. It can likely co-exist within another database if desired. Otherwise, As the postgres user execute:
createdb notes
It is easiest if your OS user has access to this database. I just created a user and made myself a superuser. Probably not best practices.
createuser <username>
TODO: unchecked!
sudo apt install sudo screen locate git tar unzip wget bzip2 apache2 python3-psycopg2 python3-yaml libpq-dev postgresql postgresql-contrib postgis postgresql-15-postgis-3 postgresql-15-postgis-3-scripts net-tools curl python3-full gcc libpython3.11-dev libxml2-dev libxslt-dev
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
sudo -u postgres -i
createuser youruseraccount
createdb -E UTF8 -O youruseraccount notes
psql
\c notes
CREATE EXTENSION postgis;
ALTER TABLE geometry_columns OWNER TO youruseraccount;
ALTER TABLE spatial_ref_sys OWNER TO youruseraccount;
\q
exit
The first time you run it, you will need to include the -c | --create option to create the table:
python osmnotesync.py -d <database> -c -g
The -g
| --geometry
argument is optional and builds point geometries for notes so that you can query which notes are within which areas.
The create function can be combined with the file option to immediately parse a file.
To parse a dump file, use the -f | --file option.
python osmnotesync.py -d <database> -g -f /tmp/planet-notes-latest.osm.bz2
If no other arguments are given, it will access postgres using the default settings of the postgres client, typically connecting on the unix socket as the current OS user. Use the --help
argument to see optional arguments for connecting to postgres.
Again, the -g
| --geometry
argument is optional. Either planet-notes-latest.osm.bz2 or nothing can be used to populate the database.
After you have parsed a daily dump file into the database, the database can be kept up to date using OSM API call that return latest N notes. Usually N is 200 except for the initial replication in which for safety we use 10000, the maximum number, so that we cover whatever has changed after downloading of the dump.
python osmnotesync.py -d <database> -r [-g]
Run this command as often as you wish to keep your database up to date with OSM.
Currently the only way to get rid of notes hidden by the OSM moderators is to scan the latest note dump for missing IDs and delete them from the DB. To account for the fact that moderators may have reopened notes after the dump generation time, OSMNoteSync intelligently skips notes with activity newer than dump generation time. This means it could take up to 24 h for a hidden note to be deleted from DB.
python osmnotesync.py -d <database> -D -f planet-notes-latest.osn.bz2 [-g]
In call_osmnotesync_replication.sh
there is a ready-made script to put inside a cron job. Redirect its output (>>
) to a log file if you wish.
- Prints a status message every 100,000 records.
- Takes 10-15 minutes to import the current dump on a decent home computer.
- I have commonly queried fields indexed. Depending on what you want to do, you may need more indexes.
OSMNoteSync populates two tables with the following structure:
note:
Primary table of all notes with the following columns:
id
: note IDcreated_at
: create timeclosed_at
: close timelat/lon
: latitude and longitude in decimal degreesgeom
: [optional] a postgis geometry column ofPoint
type (SRID: 4326) Note thatclosed_at
is null for open notes.
note_comment: Comments and other actions for all notes.
note_id
: foreign key to the ID of relevant noteaction
: one of:opened
,closed
,commented
,reopened
,hidden
time_stamp
: Time stamp of the actionuser_id
: Numerical user IDuser_name
: User nametxt
: Comment body Note that for actions other thanopened
andcommented
,txt
can be null. Alsohidden
notes appear in the DB only if they have beenreopened
by the moderator.
users: ID to user name mapping table.
id
: Numerical user IDname
: User name Note that anyuser_id
<->user_name
inconsistencies innote_comment
due to user name changes are fixed at each replication.
TODO
Copyright (C) 2012 Toby Murray, (C) 2023 Michał Brzozowski
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Affero General Public License for more details: http://www.gnu.org/licenses/agpl.txt