(work in progress)
It uses xml.etree.ElementTree.iterparse, which I found helped with parsing large files without using too much memory.
It has no dependencies, noGDAL.
It doesn't implement the whole of the TransXChange standard, but attempts to handle all of the data available in Great Britain. On bustimes.org, I use it with data from:
- Transport for London (
https://tfl.gov.uk/tfl/syndication/feeds/journey-planner-timetables.zip
) - the Traveline National Dataset
- the Bus Open Data Service
- Stagecoach
- Passenger
It does some non-standard things in order to cope with the realities of the data that's out there:
WaitTime – where a vehicle waits for a period of time at a stop. The PTI profile says this must be included in both the To
and From
elements for a stop:
<JourneyPatternTimingLink id="jptl_33">
<From SequenceNumber="9">
<StopPointRef>249000000327</StopPointRef>
<TimingStatus>otherPoint</TimingStatus>
</From>
<To SequenceNumber="10">
<!-- here -->
<WaitTime>PT2M</WaitTime>
<StopPointRef>249000000328</StopPointRef>
<TimingStatus>principalTimingPoint</TimingStatus>
</To>
<RouteLinkRef>rl_0002_9</RouteLinkRef>
<RunTime>PT1M</RunTime>
</JourneyPatternTimingLink>
<JourneyPatternTimingLink id="jptl_34">
<From SequenceNumber="10">
<!-- and here -->
<WaitTime>PT2M</WaitTime>
<StopPointRef>249000000328</StopPointRef>
<TimingStatus>principalTimingPoint</TimingStatus>
</From>
<To SequenceNumber="11">
<StopPointRef>249000000301</StopPointRef>
<TimingStatus>otherPoint</TimingStatus>
</To>
<RouteLinkRef>rl_0002_10</RouteLinkRef>
<RunTime>PT1M</RunTime>
</JourneyPatternTimingLink>
As this is a bit controversial, and contradicts some other documentation that says the two wait times should be added (i.e. giving a total wait time of 4 minutes),
the parser will log a warning like this for each combination of WaitTime
, From/StopPointRef
and To/StopPointRef
:
correctly ignored second journey pattern wait time 00:02:00 at 249000000328
Due to a misunderstanding, some publishers do this instead – implying a wait at both consecutive stops:
<JourneyPatternTimingLink id="jptl_33">
<From SequenceNumber="9">
<StopPointRef>249000000327</StopPointRef>
<TimingStatus>otherPoint</TimingStatus>
</From>
<To SequenceNumber="10">
<StopPointRef>249000000328</StopPointRef>
<TimingStatus>principalTimingPoint</TimingStatus>
</To>
<RouteLinkRef>rl_0002_9</RouteLinkRef>
<RunTime>PT1M</RunTime>
</JourneyPatternTimingLink>
<JourneyPatternTimingLink id="jptl_34">
<From SequenceNumber="10">
<!-- here -->
<WaitTime>PT2M</WaitTime>
<StopPointRef>249000000328</StopPointRef>
<TimingStatus>principalTimingPoint</TimingStatus>
</From>
<To SequenceNumber="11">
<!-- and here -->
<WaitTime>PT2M</WaitTime>
<StopPointRef>249000000301</StopPointRef>
<TimingStatus>otherPoint</TimingStatus>
</To>
<RouteLinkRef>rl_0002_10</RouteLinkRef>
<RunTime>PT1M</RunTime>
</JourneyPatternTimingLink>
This parser will still interpret that "correctly", as a 2 minute wait at stop 249000000328
only, outputting a warning like this:
dodgily ignored second wait time 0:02:00 from 249000000328 to 249000000301
Think carefully whether you need to parse TransXChange data at all. GTFS (General Transit Feed Specification) is simpler, and it's used all around the world. The Bus Open Data services offers timetable data in GTFS format, converted from TransXChange, for the whole of Great Britain or just the regions you're interested in:
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/all/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/england/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/east_anglia/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/east_midlands/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/east_anglia/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/london/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/north_east/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/north_west/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/south_east/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/south_west/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/scotland/
https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/wales/
If you've thought about it and still want to use TransXChange, there may be better parsers available. For example, pytxc.