
Commit 09f109f

committed
fix: Add more content on Overture data files
1 parent 85e8b30 commit 09f109f


1 file changed

+56
-14
lines changed

docs/overture.md

Lines changed: 56 additions & 14 deletions
@@ -18,7 +18,7 @@ time. Each file has features spread across the planet, instead of a
1818
subset in a geographical region. If you wish to get all the data for a
1919
region, you have to load all 120 files into a database.
2020

21-
While the Overture recommends using [Amazon
21+
While Overture recommends using [Amazon
2222
Athena](https://aws.amazon.com/athena/) or [Microsoft
2323
Synapse](https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started-create-workspace),
2424
you can also use a database.
@@ -30,28 +30,22 @@ import a parquet file into postgres. In these cases the database
3030
schema will resemble the Overture schema. Since HOT maintains its own
3131
database schema that is also optimized for query performance, you can
3232
use the [importer](https://hotosm.github.io/osm-rawdata/importer/)
33-
program to import into the Underpass schema.
33+
program to import into the Underpass schema. The importer utility can
34+
parse any of the data files that use the V2 schema into GeoJSON.
3435

3536
## Schema
3637

38+
There are two versions of the file schema. The original schema had
39+
fewer columns, and each data type had a schema oriented towards
40+
that data type. The new schema (Oct 2023) is larger, but all the data
41+
types are supported in the same schema.
42+
3743
The schema used in the Overture data files is [documented here](
3844
https://docs.overturemaps.org/reference). This document is just a
3945
summary with some implementation details.
4046

4147
### Buildings
4248

43-
* id: tmp_[Giant HEX number]
44-
* updatetime: The last time a feature was updated
45-
* version: The version of the feature
46-
* names: The names of the buiding
47-
* height: The heigth of the feature in meters
48-
* numfloors: The numbers of floors in the building
49-
* class: The type of building, residential, commericial, etc...
50-
* geometry: The feature geometry
51-
* sources: A list of dataset sources with optional recordId
52-
* level: This appears to be unused
53-
* bbox: A bounding box of the feature
54-
5549
The current list of buildings datasets is:
5650

5751
* Austin Building Footprints Year 2013 2D Buildings
@@ -69,6 +63,51 @@ The current list of buildings datasets is:
6963
* USGS Lidar
7064
* Washington DC Open Data 3D Buildings
7165

66+
The Microsoft ML Buildings and OpenStreetMap data are
67+
available elsewhere and are more up to date for global coverage. All
68+
of the other datasets are US only at this time.
69+
70+
The primary columns of interest to OSM are the number of building
71+
floors, the height in meters, and the name if it has one. These
72+
columns are not set in all of the datasets, but where they exist, they
73+
can be added to OSM during conflation.
74+
75+
As a warning, the USGS Lidar dataset has many really bad building
76+
geometries, so it's only the height column that is useful, if
77+
accurate.
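
As a rough illustration of how the building columns mentioned above could feed conflation, here is a minimal sketch. It assumes a row has already been loaded as a plain dict with `height` and `numfloors` keys (the column names from the earlier schema list) and a flat, hypothetical `name` key. The function name `building_to_osm_tags` is made up for this example and is not part of the importer.

```python
def building_to_osm_tags(row: dict) -> dict:
    """Map a few Overture building columns to OSM-style tags.

    Assumption: `row` is a plain dict; `name` is a hypothetical,
    already-flattened string rather than the raw names column.
    """
    tags = {}

    height = row.get("height")
    if height is not None:
        # OSM stores height in meters as a plain number string
        tags["height"] = f"{float(height):g}"

    floors = row.get("numfloors")
    if floors is not None:
        tags["building:levels"] = str(int(floors))

    name = row.get("name")
    if name:
        tags["name"] = name

    return tags


if __name__ == "__main__":
    print(building_to_osm_tags({"height": 12.5, "numfloors": 3}))
```

Columns that are missing or null are simply skipped, which matches the note that not every dataset sets them.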
78+
79+
### Places
80+
81+
The *places* data are points of interest (POIs). This appears to be for
82+
amenities, and contains tags related to that OSM category. This
83+
dataset is from Meta, and the data appears derived from Facebook.
84+
85+
The columns that are of interest to OSM are:
86+
87+
* freeform - The address of the amenity, although the format is not
88+
consistent
89+
* socials - An array of social media links for this amenity.
90+
* phone - The phone number if it has one
91+
* websites - The website URL if it has one
92+
* value - The name of the amenity if known
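
The column list above could be turned into OSM-style tags along these lines. This is only a sketch: it assumes each place has been flattened into a dict keyed by the column names listed above, and the choice of target tags (`addr:full` for the free-form address, `website` for the first URL) is an assumption, not the importer's actual mapping. The `socials` array is left out because no obvious single OSM tag fits it.

```python
def place_to_osm_tags(row: dict) -> dict:
    """Map the place columns of interest to OSM-style tags.

    Assumption: `row` is a flat dict keyed by the column names above.
    """
    tags = {}

    if row.get("value"):
        tags["name"] = row["value"]

    if row.get("phone"):
        tags["phone"] = row["phone"]

    websites = row.get("websites")
    if websites:
        # Accept either a single URL string or a list of URLs
        tags["website"] = websites if isinstance(websites, str) else websites[0]

    if row.get("freeform"):
        # The address format is inconsistent, so keep it unparsed
        tags["addr:full"] = row["freeform"]

    return tags
```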
93+
94+
### Highways
95+
96+
In the current highway *segment* data files, the only source is
97+
OSM. In that case it's better to use up-to-date OSM data. It'll be
98+
interesting to see if Overture imports the publicly available
99+
highway datasets from the USGS, or some state governments. That would
100+
be very useful.
101+
102+
The Overture *segments* data files are equivalent to an OSM way, with
103+
tags specific to that highway linestring. There are separate data
104+
files for *connections*, which are equivalent to an OSM relation.
105+
106+
### Admin Boundaries
107+
108+
The administrative boundaries data is only OSM data, so there is no
109+
reason to care about these files.
110+
72111
# Special Columns
73112

74113
## names
@@ -81,6 +120,9 @@ a language value as well.
81120
* alternate
82121
* short
83122

123+
Each of these can have multiple values, each of which consists of a
124+
value and the language.
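
To make the structure described above concrete, here is a small sketch of picking a single name out of the column. It assumes the column has been decoded into a dict of lists, where each list entry is a dict with `value` and `language` keys. The function `pick_name` and the default lookup order (including the `common` and `official` keys) are assumptions for illustration only.

```python
def pick_name(names: dict, language: str = "en",
              kinds=("common", "official", "alternate", "short")):
    """Return the first name value matching `language`, else any value.

    Assumption: `names` maps a kind (e.g. "alternate") to a list of
    {"value": ..., "language": ...} dicts.
    """
    # First pass: prefer an entry in the requested language
    for kind in kinds:
        for entry in names.get(kind) or []:
            if entry.get("language") == language:
                return entry.get("value")

    # Second pass: fall back to the first value in any language
    for kind in kinds:
        for entry in names.get(kind) or []:
            if entry.get("value"):
                return entry["value"]

    return None
```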
125+
84126
## sources
85127

86128
The sources column is an array with two entries. The first entry is
