Internal model GTFS'
The internal model used, GTFS', is close to GTFS but simplified / normalized / expanded for ease of use.
The Export plug-in re-exports the data according to the GTFS' model, for example:
$ gtfsrun tatrobus.sqlite GtfsExport --bundle=tatrobus.zip
A calendar is a simple list of calendar dates. Date ranges, days of the week, and positive/negative exceptions no longer exist.
We also force a calendar to exist if it is defined in calendars OR calendar_dates (in the GTFS model the calendars table is optional; a service can be defined in calendar_dates only).
This greatly simplifies the following queries:
- List of calendars active on a given day or a set of days (SELECT ... WHERE calendar_dates.date = ?)
- List of calendars active before/after a given date or within an interval (SELECT ... WHERE calendar_dates.date <= ?)
- Computing the number of days a calendar is active (SELECT calendar.id, COUNT(calendar_dates.*) ...)
- Computing the union of days a set of calendars is active (SELECT DISTINCT calendar_dates.date ...)
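As an illustration of the queries above, here is a minimal sketch using Python's sqlite3 module on an in-memory database. The two-table schema and the column names (service_id, date) are assumptions made for the example and are not necessarily the exact schema produced by the library.

```python
import sqlite3

# Hypothetical, minimal schema: one row per (calendar, active date).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (service_id TEXT PRIMARY KEY);
    CREATE TABLE calendar_dates (service_id TEXT, date TEXT);
    INSERT INTO calendar VALUES ('WEEK'), ('SAT');
    INSERT INTO calendar_dates VALUES
        ('WEEK', '2024-01-04'), ('WEEK', '2024-01-05'),
        ('SAT', '2024-01-06');
""")

# Calendars active on a given day.
active_on = [row[0] for row in conn.execute(
    "SELECT service_id FROM calendar_dates WHERE date = ? ORDER BY service_id",
    ("2024-01-05",))]

# Number of active days per calendar.
day_counts = dict(conn.execute(
    "SELECT service_id, COUNT(*) FROM calendar_dates GROUP BY service_id"))

# Union of active days for a set of calendars.
union_days = [row[0] for row in conn.execute(
    "SELECT DISTINCT date FROM calendar_dates "
    "WHERE service_id IN (?, ?) ORDER BY date", ("WEEK", "SAT"))]
```

Because every active day is an explicit row, none of these queries needs to evaluate date ranges, weekday flags, or exceptions.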
A few fields have been renamed:
- stop.parent_station has been renamed to stop.parent_station_id, for consistency with other objects, and because it conflicts with the parent_station field that now refers to the linked parent station object.
All other fields and class/table names are identical to GTFS.
A new Zone class is introduced for normalizing the stop-fare rule relationship. This class does not contain any fields except the feed and zone ID. The associated table is called zones.
All optional fields with a default value are initialized to that default if not defined. The list is below:
- stop.location_type
- stop.wheelchair_boarding
- trip.wheelchair_accessible
- trip.bikes_allowed
- agency.agency_id (in case a single agency exists)
- route.agency (in case a single agency exists)
This allows simpler queries, since the caller does not have to check for missing values.
All missing stop times are interpolated based on the distance between stops. Interpolated stop times have the field interpolated set to True.
This allows simpler processing of trip times, since a stop time always has its times set (except the first arrival and the last departure). For example, to query for all departures in a given hour range: ... WHERE stop_times.departure_time >= ? AND stop_times.departure_time <= ?
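The distance-based interpolation can be sketched as follows. This is a simplified illustration and not the library's actual code: it assumes a single time per stop (in seconds since midnight) instead of separate arrival and departure times, a hypothetical dist key holding the traveled distance in meters, and that the first and last stop times of the trip are set.

```python
def interpolate_times(stop_times):
    """Fill missing times by linear interpolation over traveled distance.

    stop_times: list of dicts with 'dist' (meters) and 'time'
    (seconds, or None when missing). Returns new dicts carrying an
    'interpolated' flag; the input is left untouched.
    """
    result = [dict(st, interpolated=False) for st in stop_times]
    # Indexes of the stop times that already have a time.
    known = [i for i, st in enumerate(result) if st["time"] is not None]
    for a, b in zip(known, known[1:]):
        d0, d1 = result[a]["dist"], result[b]["dist"]
        t0, t1 = result[a]["time"], result[b]["time"]
        for i in range(a + 1, b):
            # Fraction of the distance covered between the two timed stops.
            ratio = (result[i]["dist"] - d0) / (d1 - d0) if d1 > d0 else 0.0
            result[i]["time"] = round(t0 + ratio * (t1 - t0))
            result[i]["interpolated"] = True
    return result


trip = [
    {"dist": 0.0, "time": 8 * 3600},           # 08:00:00
    {"dist": 500.0, "time": None},             # missing
    {"dist": 2000.0, "time": 8 * 3600 + 600},  # 08:10:00
]
filled = interpolate_times(trip)
```

Here the middle stop lies a quarter of the way along the 2000 m segment, so it receives a quarter of the 600 s travel time.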
The first arrival time and last departure time of each trip are set to NULL (None).
This allows simpler queries. For example, listing all departures from a stop only requires selecting non-null departure times (... WHERE stop_times.departure_time IS NOT NULL), which ensures that the last stop time of each trip is not included in the result. The same holds for all arrivals at a stop (the first stop time of each trip should not be included).
A new Shape class is introduced for normalizing the trip->shape relationship. Shape points use the ShapePoint class. The shapes table stores normalized Shape entities, and a new shape_pts table stores ShapePoint entities. ShapePoint::shape_pt_sequence values are re-numbered using a consecutive index starting from 0 (same concept as StopTime::stop_sequence). Shape distances are converted to meters and computed if missing (see below).
All missing traveled distances (stop_times.shape_dist_traveled, shape.shape_dist_traveled) are computed, and all distances (including existing ones) are converted to meters. If no shapes are available, the distance is simply the straight-line distance between stops.
This allows simpler queries based on distance (... WHERE stb.shape_dist_traveled - sta.shape_dist_traveled > ?) or speed (... WHERE (stb.shape_dist_traveled - sta.shape_dist_traveled) / (stb.departure_time - sta.departure_time) > ?).
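For the case where no shape is available, the cumulative straight-line distance can be sketched as below. The haversine formula and the mean Earth radius are assumptions chosen for the example; the text does not specify which great-circle approximation the library uses.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cumulative_distances(stops):
    """Traveled distance at each stop, starting at 0.0.

    stops: list of (lat, lon) tuples in trip order.
    """
    dists = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(stops, stops[1:]):
        dists.append(dists[-1] + haversine_m(lat1, lon1, lat2, lon2))
    return dists


# Three stops along the equator, one degree of longitude apart.
dists = cumulative_distances([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)])
```

One degree of longitude on the equator is roughly 111.2 km with this radius, so the three values are about 0, 111195 and 222390 meters.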
Please note that while shape.shape_dist_traveled always starts at 0.0, there is no guarantee that stop_times.shape_dist_traveled will start at 0.0 for any given trip (a trip can start at any point along a shape). If shapes are missing for a trip, stop_times.shape_dist_traveled will start at 0.0, but it is safer never to make that assumption. If you want to compute the distance traveled since the start of the trip, subtract the offset of the first stop time:
distance = stop_time.shape_dist_traveled - stop_time.trip.stop_times[0].shape_dist_traveled
All stop_times.stop_sequence values are re-numbered from 0 using a consecutive index (0, 1, 2, 3...). The number of stop times for a trip is always equal to the last stop sequence + 1.
This allows simpler queries for hops. For example:
- all hops between two stops (... WHERE stb.stop_sequence = sta.stop_sequence + 1)
- the number of stops between two stop_times (... stb.stop_sequence - sta.stop_sequence)
- trips passing by stop A then stop B (... stb.stop_sequence > sta.stop_sequence); although this would also work with a non-consecutive numbering.
- selecting trip hop count (SELECT trip.trip_id, MAX(stop_times.stop_sequence)+1 ...); although it can also be done using a simple SQL COUNT().
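The re-numbering and the hop query can be sketched together as follows, with sta standing for the earlier stop time and stb for the next one. The renumber helper and the three-column stop_times table are hypothetical and only serve the example.

```python
import sqlite3


def renumber(stop_times):
    """Return (trip_id, new_seq, stop_id) rows with 0-based,
    consecutive sequences per trip, keeping the original order."""
    by_trip = {}
    for trip_id, seq, stop_id in stop_times:
        by_trip.setdefault(trip_id, []).append((seq, stop_id))
    rows = []
    for trip_id, items in by_trip.items():
        for new_seq, (_, stop_id) in enumerate(sorted(items)):
            rows.append((trip_id, new_seq, stop_id))
    return rows


# Original GTFS sequences may have gaps (10, 20, 35).
raw = [("T1", 10, "A"), ("T1", 20, "B"), ("T1", 35, "C")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER, stop_id TEXT)")
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?)", renumber(raw))

# With consecutive numbering, a hop is a simple self-join on sequence + 1.
hops = conn.execute("""
    SELECT sta.stop_id, stb.stop_id
    FROM stop_times sta
    JOIN stop_times stb
      ON stb.trip_id = sta.trip_id
     AND stb.stop_sequence = sta.stop_sequence + 1
    ORDER BY sta.trip_id, sta.stop_sequence
""").fetchall()
```

With the original sequences (10, 20, 35) the same join would return nothing, which is why the consecutive index matters for this kind of query.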
All frequencies are expanded to normal trips, and flagged as such with the Boolean frequency_generated. The exact_times flag is back-ported to the trip ("standard" trips have exact_times=1, that is, they are exactly scheduled). Both the initial frequencies and trips are deleted.
The ID for frequency-expanded trips is constructed by appending the trip departure time to the original trip ID, such as trip42@8:30:00, trip42@8:40:00, etc. This assumes frequency-expanded trips do not overlap for the same original trip. (Note: this should be true according to the GTFS specification, but may be wrong if two frequency rows associated with the same trip overlap.)
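The ID construction can be sketched as below for a single frequency row. The function names are hypothetical, times are in seconds since midnight, and the sketch assumes that the end time is exclusive, which may differ from what the library actually does.

```python
def format_time(seconds):
    """Format seconds since midnight as H:MM:SS (hour not zero-padded)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def expand_frequency(trip_id, start, end, headway):
    """Return (new_trip_id, departure) pairs for one frequency row.

    Assumption: a trip departs every `headway` seconds from `start`,
    up to but not including `end`.
    """
    return [
        (f"{trip_id}@{format_time(departure)}", departure)
        for departure in range(start, end, headway)
    ]


# 08:30 to 09:00, one trip every 10 minutes.
expanded = expand_frequency("trip42", 8 * 3600 + 1800, 9 * 3600, 600)
expanded_ids = [new_id for new_id, _ in expanded]
```

Two overlapping frequency rows for the same trip could produce the same departure time, and therefore the same ID, which is the collision the note above warns about.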
In GTFS, a transfer that is defined for a station applies, unless redefined, to all the stops of that station. A proposal for GTFS' is to expand any station transfer to its sub-stops, if a transfer is not already redefined for those stops.
Goal: to give the API user easy access to transfers between stops without having to check for transfers between stations. The check/load sequence can be rather complex (stop to stop, stop to station, station to stop, station to station...).
For example let's assume we have station A with stops A1 and A2, and station B with stops B1 and B2, and the following transfers:
from  to  type
A     B   0
A1    B1  3
B     A   1
B1    A   3
The transfer expansion process would expand to the following:
from  to  type  source
A     B   0     Original
A1    B1  3     Original
A1    B2  0     Expanded
A2    B1  0     Expanded
A2    B2  0     Expanded
B     A   1     Original
B1    A   3     Original
B1    A1  3     Expanded
B1    A2  3     Expanded
B2    A1  1     Expanded
B2    A2  1     Expanded
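One possible way to implement this expansion is sketched below; it reproduces the table above but is not necessarily how the proposal would be implemented. The function name and data layout are hypothetical. The sketch assumes that the most specific transfer wins (stop to stop, then transfers with one station endpoint, then station to station), and that ties between equally specific transfers are resolved by input order.

```python
from itertools import product


def expand_transfers(transfers, stations):
    """Expand station transfers to their sub-stops.

    transfers: list of (from_id, to_id, transfer_type).
    stations: dict mapping a station ID to the list of its stop IDs.
    Returns {(from_id, to_id): (transfer_type, source)}.
    """
    def leaves(stop_id):
        # A station expands to its stops; a plain stop stays as it is.
        return stations.get(stop_id, [stop_id])

    def station_endpoints(transfer):
        # 0 = stop to stop, 1 = one station endpoint, 2 = station to station.
        return (transfer[0] in stations) + (transfer[1] in stations)

    result = {(f, t): (ttype, "Original") for f, t, ttype in transfers}
    # Handle the most specific transfers first, so that they claim their
    # stop pairs before broader station-level transfers are expanded.
    for f, t, ttype in sorted(transfers, key=station_endpoints):
        for pair in product(leaves(f), leaves(t)):
            result.setdefault(pair, (ttype, "Expanded"))
    return result


stations = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
transfers = [("A", "B", 0), ("A1", "B1", 3), ("B", "A", 1), ("B1", "A", 3)]
expanded = expand_transfers(transfers, stations)
```

Since setdefault never overwrites an existing entry, an original or more specific transfer is never replaced by one expanded from a broader definition.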