Track long outages #61

Draft: wants to merge 8 commits into base: main


hamima-halim
Contributor

ISSUE

part 1 of a 2-parter to address stop_id/direction outages in our vehicle pings (#51)

  • in part 1, build out a cache/data structure that keeps track of degenerate GPS ping sequences that lack stop_id or direction info. it should probably be purged daily.
  • in part 2, we'll single out ping sequences that have been dark for long enough (~1 minute) and start trying to infer their route progress with a shape interpolation algorithm using shapes.txt.
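
for part 2, one possible shape of the interpolation step is a nearest-point projection of a ping onto the polyline from shapes.txt. this is only a sketch and not necessarily the algorithm part 2 will use: `progress_along_shape` is a hypothetical helper, the shape is assumed to already be a list of (shape_pt_lat, shape_pt_lon) pairs sorted by shape_pt_sequence, and distances use a flat-earth approximation that is only reasonable at city scale.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0


def _to_xy(lat: float, lon: float, lat0: float, lon0: float) -> Tuple[float, float]:
    """Equirectangular projection to local metres around (lat0, lon0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def progress_along_shape(
    shape: List[Tuple[float, float]], ping: Tuple[float, float]
) -> float:
    """Fraction (0.0 to 1.0) of the shape's length at the point nearest the ping."""
    if len(shape) < 2:
        raise ValueError("a shape needs at least two points")
    lat0, lon0 = shape[0]
    pts = [_to_xy(lat, lon, lat0, lon0) for lat, lon in shape]
    px, py = _to_xy(ping[0], ping[1], lat0, lon0)

    best_d2 = math.inf   # squared distance from ping to the best match so far
    best_along = 0.0     # metres along the shape at that best match
    travelled = 0.0      # metres covered by the segments already visited
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        seg_len = math.sqrt(seg_len2)
        if seg_len2 == 0.0:
            t = 0.0  # duplicate shape point, nothing to project onto
        else:
            # clamp the projection so it stays on the segment
            t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        cx, cy = ax + t * dx, ay + t * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 < best_d2:
            best_d2 = d2
            best_along = travelled + t * seg_len
        travelled += seg_len
    return best_along / travelled if travelled > 0 else 0.0
```

one known weakness: on shapes that loop or double back, the globally nearest segment can be the wrong one, so a real version would probably restrict the search to segments at or after the last known progress.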

BACKGROUND

as far as i can tell, there are 3 kinds of outages (copy-pasted from the original issue):

  • short stretch outages, which occur for less than a minute and tend to happen at the beginning/end of the stop. these are short enough that we could probably ignore them and still get reasonable calculations, even if they happen in the middle of a trip.
  • medium stretch outages, which occur for maybe 2-10 minutes at a time. we see these a lot on the 39 (potentially caused by a glitchy AVL) and they can cause us to lose information for a couple of stops.
  • long stretch outages, which might be because the AVL for a vehicle wasn't turned on but GPS was still reporting info.
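
the three buckets above could be expressed as a small classifier on outage duration. this is a sketch with assumed boundaries: the list gives "less than a minute" for short and "maybe 2-10 minutes" for medium, so the 1-2 minute gap is folded into medium here and anything past 10 minutes is called long. `classify_outage` and both constants are hypothetical names.

```python
from datetime import timedelta

# Assumed boundaries, read off the rough ranges in the list above.
SHORT_MAX = timedelta(minutes=1)    # "less than a minute"
MEDIUM_MAX = timedelta(minutes=10)  # upper end of "maybe 2-10 minutes"


def classify_outage(duration: timedelta) -> str:
    """Bucket an outage by how long the pings lacked stop_id/direction."""
    if duration < SHORT_MAX:
        return "short"
    if duration <= MEDIUM_MAX:
        return "medium"
    return "long"
```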

THIS PR

i've set the cutoff for suspected extended outages to 1 minute, after which we start trying to do shape interpolation things. there's some logic here to figure out whether a bad ping continues a sequence we're already tracking or starts a new outage.
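
roughly, the continuation-vs-new-outage decision could look like the sketch below. it is an illustration, not the code in this PR: `OutageSequence`, `record_bad_ping`, `record_good_ping`, and the 30 second `MAX_GAP` between consecutive bad pings are all assumptions. only the 1 minute cutoff comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict

OUTAGE_CUTOFF = timedelta(minutes=1)  # from the PR description
MAX_GAP = timedelta(seconds=30)       # assumption: a longer silence starts a new outage


@dataclass
class OutageSequence:
    vehicle_id: str
    first_seen: datetime
    last_seen: datetime
    ping_count: int = 1

    def is_extended(self) -> bool:
        """True once the sequence has been dark for at least the cutoff."""
        return self.last_seen - self.first_seen >= OUTAGE_CUTOFF


def record_bad_ping(
    tracked: Dict[str, OutageSequence], vehicle_id: str, ts: datetime
) -> OutageSequence:
    """Extend the vehicle's current sequence, or start a new one."""
    seq = tracked.get(vehicle_id)
    if seq is not None and timedelta(0) <= ts - seq.last_seen <= MAX_GAP:
        # close enough to the previous bad ping: same outage
        seq.last_seen = ts
        seq.ping_count += 1
    else:
        # nothing tracked, or the gap is too large: new outage
        seq = OutageSequence(vehicle_id, first_seen=ts, last_seen=ts)
        tracked[vehicle_id] = seq
    return seq


def record_good_ping(tracked: Dict[str, OutageSequence], vehicle_id: str) -> None:
    """A ping with stop_id/direction ends whatever outage was in progress."""
    tracked.pop(vehicle_id, None)
```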

i really do not like this dict-as-a-cache setup here at the moment. yall got cleaner suggestions?
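
one option, sketched here as a starting point rather than a recommendation: keep the dict but hide it behind a tiny class that owns the daily purge, so callers never touch the raw dict. `DailyCache` is a hypothetical name, and it assumes "daily" means the calendar date of the timestamp passed in. a transit service day usually runs past midnight, so the rollover rule would likely need adjusting.

```python
from datetime import date, datetime
from typing import Dict, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class DailyCache(Generic[K, V]):
    """Dict wrapper that empties itself when the date rolls over."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._items: Dict[K, V] = {}

    def _roll(self, now: datetime) -> None:
        # Purge everything the first time we see a timestamp from a new day.
        if self._day != now.date():
            self._items.clear()
            self._day = now.date()

    def get(self, key: K, now: datetime) -> Optional[V]:
        self._roll(now)
        return self._items.get(key)

    def put(self, key: K, value: V, now: datetime) -> None:
        self._roll(now)
        self._items[key] = value

    def pop(self, key: K, now: datetime) -> Optional[V]:
        self._roll(now)
        return self._items.pop(key, None)

    def __len__(self) -> int:
        return len(self._items)
```

passing `now` in explicitly (instead of calling `datetime.now()` inside) keeps the purge testable and lets it follow the ping timestamps rather than the wall clock.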

src/util.py Outdated
@@ -21,7 +21,7 @@ def output_dir_path(route_id: str, direction_id: str, stop_id: str, ts: datetime
     delimiter = "_"
     mode = "cr"
     # rapid transit may rarely have dashes AND SPACES in stop id/route id!
-    # ex, Green_D_1-Union Square-02
+    # ex, Green-D_1-Union Square-02
Contributor Author

fake news-ass comment

Contributor

rudiejd commented Feb 2, 2024

if you're looking for a built-in solution for caching on the filesystem to avoid ballooning memory usage, I think shelve could work for your needs here: https://docs.python.org/3/library/shelve.html
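
A minimal sketch of what that might look like, assuming the cache is keyed by a vehicle id string and stores a list of ping timestamps (both assumptions, as are the helper names `append_bad_ping` and `load_bad_pings`). shelve keys have to be strings and values have to be picklable.

```python
import shelve
from datetime import datetime
from typing import List


def append_bad_ping(db_path: str, vehicle_id: str, ts: datetime) -> int:
    """Append a timestamp to the vehicle's on-disk list; return the new length."""
    with shelve.open(db_path) as db:
        pings = db.get(vehicle_id, [])
        pings.append(ts)
        # Reassign the key: without writeback=True, mutating the list
        # returned by the shelf does not get written back to disk.
        db[vehicle_id] = pings
        return len(pings)


def load_bad_pings(db_path: str, vehicle_id: str) -> List[datetime]:
    """Read the stored timestamps for a vehicle (empty list if none)."""
    with shelve.open(db_path) as db:
        return db.get(vehicle_id, [])
```

The daily purge could then be as simple as deleting the shelf file(s) or opening with `flag="n"`, which always creates a new, empty database.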
