Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix errors on GTFS RT validation #3513

Merged
merged 7 commits into from
Nov 18, 2024
Merged

Fix errors on GTFS RT validation #3513

merged 7 commits into from
Nov 18, 2024

Conversation

erikamov
Copy link
Contributor

@erikamov erikamov commented Oct 25, 2024

Description

This PR fixes some errors on GTFS RT validation that were found investigating the issue #2780:

  • It resolves a race condition where all parse_and_validate calls shared the same temporary directory
  • That contention meant that processes would overwrite the existing GTFS schedule with the same name
  • This also resulted in an elevated number of skipped protobuf validations
  • The gtfs-realtime-validator skips protobufs with the same MD5
  • The race condition caused elevated MD5 collisions for protobufs

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Tested locally using poetry run pytest which replicates running the validator tool on the command line, e.g.:

GTFS_RT_VALIDATOR_JAR="gtfs-realtime-validator-lib-1.0.0-20220223.003525-2.jar" CALITP_BUCKET__GTFS_SCHEDULE_RAW="gs://test-calitp-gtfs-schedule-raw-v2" CALITP_BUCKET__GTFS_RT_RAW="gs://test-calitp-gtfs-rt-raw-v2" CALITP_BUCKET__GTFS_RT_PARSED="gs://test-calitp-gtfs-rt-parsed" CALITP_BUCKET__GTFS_RT_VALIDATION="gs://test-calitp-gtfs-rt-validation" GTFS_RT_VALIDATOR_VERSION="v1.0.0" poetry run python3 gtfs_rt_parser.py validate vehicle_positions 2024-10-17T00:00:00  --verbose

Post-merge follow-ups

  • No action required
  • Actions required (specified below)

Monitoring next DAG runs and query cal-itp-data-infra.staging.int_gtfs_quality__rt_validation_outcomes to see less errors happening.

@vevetron
Copy link
Contributor

Should the test be failing for this?

@erikamov erikamov force-pushed the mov/2780-gtfs-rt-validation branch 2 times, most recently from 0d2a924 to 6edb9a0 Compare November 15, 2024 20:36
ohrite and others added 7 commits November 18, 2024 11:10
* This simplifies the flow of control so that every command runs the same code

Signed-off-by: Doc Ritezel <[email protected]>
* This commit resolves a race condition where all parse_and_validate calls shared the same temporary directory
* That contention meant that processes would overwrite the existing GTFS schedule with the same name
* This also resulted in an elevated number of skipped protobuf validations
* The gtfs-realtime-validator skips protobufs with the same MD5
* The race condition caused elevated MD5 collisions for protobufs

Signed-off-by: Doc Ritezel <[email protected]>
@erikamov
Copy link
Contributor Author

Should the test be failing for this?

We added a tag to skip the tests until we can use GCS on Github Actions.

@erikamov erikamov requested a review from ohrite November 18, 2024 19:19
Copy link
Contributor

@ohrite ohrite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍐 Pairing w/ Erika

@erikamov erikamov merged commit d205f58 into main Nov 18, 2024
4 checks passed
@erikamov erikamov deleted the mov/2780-gtfs-rt-validation branch November 18, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants