fix: Add more content on Overture data files
rsavoye committed Oct 23, 2023
1 parent 85e8b30 commit 09f109f
Showing 1 changed file with 56 additions and 14 deletions.
70 changes: 56 additions & 14 deletions docs/overture.md
@@ -18,7 +18,7 @@ time. Each file has features spread across the planet, instead of a
subset in a geographical region. If you wish to get all the data for a
region, you have to load all 120 files into a database.
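
Because features are not partitioned by region, selecting a region means scanning every file. The sketch below illustrates that idea with plain Python dicts standing in for rows already read from the data files. The helper names (`overlaps`, `features_in_region`) are hypothetical, and the layout of the `bbox` column (`minx`, `maxx`, `miny`, `maxy`) is an assumption, not something confirmed by this document.

```python
from typing import Iterable, Iterator


def overlaps(bbox: dict, region: dict) -> bool:
    """Return True if two boxes intersect (assumed keys: minx, maxx, miny, maxy)."""
    return (
        bbox["minx"] <= region["maxx"]
        and bbox["maxx"] >= region["minx"]
        and bbox["miny"] <= region["maxy"]
        and bbox["maxy"] >= region["miny"]
    )


def features_in_region(files: Iterable[Iterable[dict]], region: dict) -> Iterator[dict]:
    """Scan every file, since any file may hold features for the region."""
    for records in files:
        for record in records:
            if overlaps(record["bbox"], region):
                yield record
```

In practice each inner iterable would come from a parquet reader or a database query rather than an in-memory list.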

While the Overture recommends using [Amazon
While Overture recommends using [Amazon
Athena](https://aws.amazon.com/athena/) or [Microsoft
Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started-create-workspace),
you can also use a database.
@@ -30,28 +30,22 @@ import a parquet file into postgres. In these cases the database
schema will resemble the Overture schema. Since HOT maintains its own
database schema that is also optimized for query performance, you can
use the [importer](https://hotosm.github.io/osm-rawdata/importer/)
program to import into the Underpass schema.
program to import into the Underpass schema. The importer utility can
parse any of the data files that use the V2 schema into GeoJSON.

## Schema

There are two versions of the file schema. The original schema had
fewer columns, and each data type had a schema oriented towards
that data type. The new schema (Oct 2023) is larger, but all the data
types are supported in the same schema.

The schema used in the Overture data files is [documented here](
https://docs.overturemaps.org/reference). This document is just a
summary with some implementation details.

### Buildings

* id: tmp_[Giant HEX number]
* updatetime: The last time a feature was updated
* version: The version of the feature
* names: The names of the building
* height: The height of the feature in meters
* numfloors: The number of floors in the building
* class: The type of building, residential, commercial, etc...
* geometry: The feature geometry
* sources: A list of dataset sources with optional recordId
* level: This appears to be unused
* bbox: A bounding box of the feature

The current list of buildings datasets is:

* Austin Building Footprints Year 2013 2D Buildings
@@ -69,6 +63,51 @@ The current list of buildings datasets is:
* USGS Lidar
* Washington DC Open Data 3D Buildings

The Microsoft ML Buildings and OpenStreetMap data are available
elsewhere, and those copies are more up-to-date for global coverage.
All of the other datasets are US only at this time.

The primary columns of interest to OSM are the number of building
floors, the height in meters, and the name if it has one. These
columns are not set in all of the datasets, but where they exist, they
can be added to OSM during conflation.

Be warned that the USGS Lidar dataset has many poor building
geometries, so only its height column is useful, and only if it is
accurate.
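
The conflation idea above can be sketched as follows. This is a minimal illustration, not the importer's actual code. It assumes the record is a plain dict keyed by the Overture column names `height` and `numfloors`, plus a hypothetical flattened `name` string. The helpers `building_tags` and `conflate` are hypothetical, and the choice of the OSM keys `height`, `building:levels`, and `name` is an assumption about how the values would be tagged.

```python
def building_tags(record: dict) -> dict:
    """Build OSM-style tags from the columns that are set in a building record."""
    tags = {}
    height = record.get("height")
    if height is not None:
        # Height is in meters; drop a trailing ".0" for whole numbers.
        tags["height"] = f"{float(height):g}"
    floors = record.get("numfloors")
    if floors is not None:
        tags["building:levels"] = str(int(floors))
    name = record.get("name")  # hypothetical flattened name, see lead-in
    if name:
        tags["name"] = name
    return tags


def conflate(osm_tags: dict, record: dict) -> dict:
    """Add Overture-derived tags without overwriting tags already in OSM."""
    merged = dict(osm_tags)
    for key, value in building_tags(record).items():
        merged.setdefault(key, value)
    return merged
```

Existing OSM values are kept on purpose here, since the external datasets do not set these columns consistently and may be less accurate than a surveyed value.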

### Places

The *places* data files contain points of interest (POIs). They appear
to cover amenities, and contain tags related to that OSM category. This
dataset is from Meta, and the data appears to be derived from Facebook.

The columns that are of interest to OSM are:

* freeform - The address of the amenity, although the format is not
consistent
* socials - An array of social media links for this amenity.
* phone - The phone number if it has one
* websites - The website URL if it has one
* value - The name of the amenity if known
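
As a rough sketch of how the columns listed above might map onto OSM tags, assuming each record is a plain dict, that `websites` and `socials` are lists of URL strings, and that `freeform`, `phone`, and `value` are plain strings. The real files may nest these fields differently. The helper `place_tags` is hypothetical, and the OSM keys chosen (`name`, `addr:full`, `phone`, `website`, `contact:facebook`) are one possible mapping, not an established convention of the importer.

```python
def place_tags(record: dict) -> dict:
    """Map the place columns of interest to OSM-style tags, skipping empty ones."""
    tags = {}
    if record.get("value"):
        tags["name"] = record["value"]
    if record.get("freeform"):
        # The address format is not consistent, so keep it as one string.
        tags["addr:full"] = record["freeform"]
    if record.get("phone"):
        tags["phone"] = record["phone"]
    websites = record.get("websites") or []
    if websites:
        tags["website"] = websites[0]
    for url in record.get("socials") or []:
        if "facebook.com" in url:
            tags["contact:facebook"] = url
            break
    return tags
```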

### Highways

In the current highway *segment* data files, the only source is
OSM. In that case it's better to use up-to-date OSM data. It will be
interesting to see whether Overture imports the publicly available
highway datasets from the USGS or from some state governments, which
would be very useful.

Each feature in the Overture *segments* data files is equivalent to an
OSM way, with tags specific to that highway linestring. There are
separate data files for *connections*, which are equivalent to an OSM
relation.

### Admin Boundaries

The administrative boundaries data is only OSM data, so there is no
reason to care about these files.

# Special Columns

## names
@@ -81,6 +120,9 @@ a language value as well.
* alternate
* short

Each of these can have multiple values, each of which consists of a
value and the language.
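
A minimal sketch of flattening this structure into OSM-style name tags. It assumes the column is a dict of lists of `{"value": ..., "language": ...}` entries, that the categories not visible in this hunk are `common` and `official`, and that a language of `local` (or no language) marks the default name. The helper `name_tags` and the mapping to `name`, `official_name`, `alt_name`, and `short_name` are hypothetical.

```python
# Assumed mapping from Overture name categories to OSM keys.
NAME_KEYS = {
    "common": "name",
    "official": "official_name",
    "alternate": "alt_name",
    "short": "short_name",
}


def name_tags(names: dict, default_language: str = "local") -> dict:
    """Flatten the names column into tags, keeping the first value per key."""
    tags = {}
    for category, osm_key in NAME_KEYS.items():
        for entry in names.get(category) or []:
            value = entry.get("value")
            if not value:
                continue
            language = entry.get("language")
            if language and language != default_language:
                tags.setdefault(f"{osm_key}:{language}", value)
            else:
                tags.setdefault(osm_key, value)
    return tags
```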

## sources

The sources column is an array with two entries. The first entry is