Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Covid Recovery Dash generation code to ingestor #112

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

idreyn
Copy link
Contributor

@idreyn idreyn commented Sep 2, 2024

This PR finally ports the Covid Recovery Dash generation code to the data ingestor so we can run it daily and feed the resulting JSON into a CRD-like page on the Data Dashboard. The new code is very similar to what the CRD has, with a few major changes:

  • We grab data from the Ridership and ScheduledServiceDaily dynamo tables instead of reading from raw CSV files 🎉 Luckily most of that logic has long since been ported from CRD -> ingestor.
  • We provide weekly medians in dict[str, float] format instead of daily values in list[float] form — this tends to be easier to work with on the Data Dashboard side.

Going forward, the formats will continue to diverge, to remove covid-specific benchmarks and add by-mode totals, etc. But I've plugged this data into my local Covid Recovery Dash and confirmed that it's basically working.

The resulting JSON files are stored in the new tm-service-ridership-dash bucket (though surely a better name exists), indexed by date. It also always writes to latest.json for easy retrieval.

@idreyn idreyn linked an issue Sep 2, 2024 that may be closed by this pull request
@idreyn idreyn changed the title [WIP] Port Covid Recovery Dash generation code to ingestor Port Covid Recovery Dash generation code to ingestor Sep 2, 2024
@idreyn idreyn marked this pull request as ready for review September 2, 2024 15:38
@@ -99,6 +109,7 @@ def ingest_feed_to_dynamo(
"lineId": total.line_id,
"count": total.count,
"serviceMinutes": total.service_minutes,
"hasServiceExceptions": total.has_service_exceptions,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "recovery dash" code needs to know, in general, whether the service levels on a given day include any service exceptions (usually additions/removals for holidays). It was easiest just to add this as a column to the ScheduledServiceDaily table — the migration has already run.

Copy link
Member

@devinmatte devinmatte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just a few questions

@@ -112,6 +123,7 @@ def ingest_feeds(
force_rebuild_feeds: bool = False,
):
for feed in feeds:
feed.use_compact_only()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this okay for all the s3 uploads? This is what mbta-performance will use every half hour

@@ -148,3 +149,9 @@ def store_landing_data(event):
ridership_data = landing.get_ridership_data()
landing.upload_to_s3(json.dumps(trip_metrics_data), json.dumps(ridership_data))
landing.clear_cache()


# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have bene ingested)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have bene ingested)
# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have been ingested)

res = {}
if isinstance(key_getter, str):
key_getter_as_str = key_getter
key_getter = lambda dict: dict[key_getter_as_str] # noqa: E731
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of disabling each line, we can disable this rule repo wise if it's not helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a new daily job to build Covid Recovery Dash data.json
2 participants