Skip to content
This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Blocking query with postgresql #67

Open
mhugo opened this issue May 19, 2017 · 3 comments
Open

Blocking query with postgresql #67

mhugo opened this issue May 19, 2017 · 3 comments
Labels

Comments

@mhugo
Copy link

mhugo commented May 19, 2017

Trying to import twice the same gtfs feed with a different feed_id results in a blocking query the second time.

Steps to reproduce:

  1. create an empty postgresql database
  2. gtfdbloader postgresql://db --load=gtfs.zip --id=T1 => ok
  3. gtfdbloader postgresql://db --load=gtfs.zip --id=T2 => blocks on "Normalizing shapes and trips..."

The issue seems to come from https://github.com/afimb/gtfslib-python/blob/master/gtfslib/converter.py#L596 but I am not sure exactly what happens.

For information, if I change prefetch_stop_times=True, prefetch_stops=True to False, there is no problem anymore (but the import is slower). Not sure to understand ...

@laurentg any clue ?

@laurentg
Copy link
Contributor

I just tested with a sample GTFS with shapes to normalize and it works fine loading twice on a postgresql database. Can you share the GTFS you are experiencing problem with?

@laurentg laurentg added the bug label May 21, 2017
@mhugo
Copy link
Author

mhugo commented May 22, 2017

You can test with this GTFS : https://ressources.data.sncf.com/explore/dataset/sncf-ter-gtfs/

@laurentg
Copy link
Contributor

laurentg commented Jun 3, 2017

I do not really understand. Apparently the bug is caused by PostgreSQL choking on a simple query (100% CPU usage and never returning). The query, as generated by SqlAlchemy, is:

SELECT stop_times.feed_id AS stop_times_feed_id, stop_times.trip_id AS stop_times_trip_id, stop_times.stop_sequence AS stop_times_stop_sequence, stop_times.stop_id AS stop_times_stop_id, stop_times.arrival_time AS stop_times_arrival_time, stop_times.departure_time AS stop_times_departure_time, stop_times.interpolated AS stop_times_interpolated, stop_times.shape_dist_traveled AS stop_times_shape_dist_traveled, stop_times.timepoint AS stop_times_timepoint, stop_times.pickup_type AS stop_times_pickup_type, stop_times.drop_off_type AS stop_times_drop_off_type, stop_times.stop_headsign AS stop_times_stop_headsign, anon_1.trips_feed_id AS anon_1_trips_feed_id, anon_1.trips_trip_id AS anon_1_trips_trip_id FROM (SELECT trips.feed_id AS trips_feed_id, trips.trip_id AS trips_trip_id FROM trips WHERE trips.feed_id = 'TER1' AND trips.trip_id IN ('OCEDB018800F030015447', 'OCEDB018801F010015448', 'OCEDB018802F030015449')) AS anon_1 JOIN stop_times ON stop_times.trip_id = anon_1.trips_trip_id AND stop_times.feed_id = anon_1.trips_feed_id;

The funny thing is that:

  1. reducing batch size to low values (3) has no effect (still blocking)
  2. executing the exact same query manually works fine (using psql)
  3. changing a (to my taste unrelated) debug system parameter make PostgreSQL works:
    ALTER SYSTEM SET track_activity_query_size = 65536;

Leaving this as open for now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants