Incorrect lap times in 2023, Round 9, Austrian GP #13

Open
theOehrly opened this issue Mar 10, 2024 · 7 comments
Labels
bug Something isn't working data Related to the data returned

Comments

@theOehrly
Collaborator

Some lap times in the 2023 Austrian GP are incorrect. Specifically, the following combinations of driver/lap number have incorrect lap times:

alonso: 48
bottas: 46
de_vries: 46
gasly: 57
hamilton: 49
kevin_magnussen: 45
norris: 50
ocon: 46
piastri: 49
russell: 48
sainz: 54
stroll: 48
tsunoda: 37
zhou: 50

Notably, all these lap times are the personal best lap times of some other driver, and they were set on that exact lap.

Example

Alonso is listed with a lap time of "1:08.739" on lap 48. The correct time would be "1:09.634". But "1:08.739" was Norris' fastest lap, and Norris set that lap time on lap 48. Norris' lap time on lap 48 is correct, so the lap time is duplicated, not swapped.
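A duplication like this can be searched for mechanically. The following is a minimal sketch, assuming the lap data is available as `(driver_id, lap_number, lap_time)` tuples; that layout and the name `find_shared_lap_times` are hypothetical and not part of the project. Two drivers can legitimately set identical times on the same lap, so a hit is only a hint that needs checking against the source documents.

```python
from collections import defaultdict


def find_shared_lap_times(laps):
    """Return {(lap_number, lap_time): [drivers]} for times listed for 2+ drivers.

    `laps` is an iterable of (driver_id, lap_number, lap_time) tuples; this
    layout is an assumption for the sketch, not the project's schema.
    """
    drivers_by_lap_and_time = defaultdict(list)
    for driver_id, lap_number, lap_time in laps:
        drivers_by_lap_and_time[(lap_number, lap_time)].append(driver_id)
    # Keep only the (lap, time) pairs that appear for more than one driver.
    return {
        key: drivers
        for key, drivers in drivers_by_lap_and_time.items()
        if len(drivers) > 1
    }


# Lap 48 as currently listed (values quoted in this issue).
as_listed = [("norris", 48, "1:08.739"), ("alonso", 48, "1:08.739")]
# Lap 48 with Alonso's time replaced by the correct value given above.
corrected = [("norris", 48, "1:08.739"), ("alonso", 48, "1:09.634")]

print(find_shared_lap_times(as_listed))
print(find_shared_lap_times(corrected))
```

With the listed values, the first call reports the shared time on lap 48; with the corrected value, nothing is flagged.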

Guess

The fastest lap times are specially highlighted in the source PDF. I would assume that this might have been an error when parsing the PDF for old Ergast. The data likely was imported incorrectly from there in an old database dump.
[Image: screenshot of the source PDF]
What speaks against this theory is that this is correct in current production Ergast but it was seemingly never reported as an error there.

@theOehrly theOehrly added the bug Something isn't working label Mar 10, 2024
harningle added a commit to harningle/fia-doc that referenced this issue Mar 13, 2024
@harningle
Collaborator

I haven't looked at the Ergast code very carefully. There seem to be two sources for lap times: the "Race History Chart" and the "Race Lap Analysis", which is the one in your screenshot.

My current parsing uses the Race History Chart, and the results are the same as in the Ergast CSV database:

You can check my code at https://github.com/harningle/fia-doc/blob/main/parse_race_history_chart.py and https://github.com/harningle/fia-doc/blob/main/notebook/cross_validate.ipynb

@harningle
Collaborator

As a side note, we can do a lot of automated sanity checks. E.g., pit stops appear as "P" in the "Race Lap Analysis" and also in the "Pit Stop Summary", so the parsing results from both PDFs should be the same. I feel this could be a data quality test once we put the code in production.
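Such a check could look roughly like the sketch below. It assumes both parsers already produce pit stops as `(driver_id, lap_number)` pairs; that shape, the function name `compare_pit_stops`, and the sample values are made up for illustration and do not come from fia-doc. It also assumes both documents attribute a stop to the same lap number, which would need to be verified against the PDFs.

```python
def compare_pit_stops(lap_analysis_stops, pit_summary_stops):
    """Compare (driver_id, lap_number) pit-stop pairs parsed from two PDFs.

    Returns the pairs that appear in only one of the two sources.
    """
    from_lap_analysis = set(lap_analysis_stops)
    from_pit_summary = set(pit_summary_stops)
    return {
        "only_in_lap_analysis": sorted(from_lap_analysis - from_pit_summary),
        "only_in_pit_summary": sorted(from_pit_summary - from_lap_analysis),
    }


# Made-up placeholder values, not real race data.
lap_analysis = [("driver_a", 15), ("driver_b", 16), ("driver_a", 40)]
pit_summary = [("driver_a", 15), ("driver_b", 17), ("driver_a", 40)]

diff = compare_pit_stops(lap_analysis, pit_summary)
print(diff)
```

An empty list on both sides means the two documents agree; anything else points at a parsing problem in one of them or at a discrepancy in the source.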

@theOehrly
Collaborator Author

@harningle the data on current Ergast is correct. My speculation is that this was imported in an old database dump where it was incorrect. And given how PDFs are structured internally (or not really structured at all) and how the old Ergast PDF parser works, I think it may be possible that this was originally parsed incorrectly and then manually corrected at some point.

The double/sanity checks would certainly be great to have.

@jolpica jolpica added the data Related to the data returned label Mar 14, 2024
@jolpica
Owner

jolpica commented Mar 17, 2024

I've found that this is caused by incorrect data in the Ergast results table.
In the new database schema, we no longer duplicate lap times and fastest lap times, so we use the fastest lap time instead of the time from the laps table when it's available.

This query to the results endpoint for Fernando Alonso
https://ergast.com/api/f1/2023/9/drivers/alonso/results.json
Returns:

...
"FastestLap": {
  "rank": "4",
  "lap": "48",
  "Time": {
    "time": "1:08.739"
  },
...

Which is what is listed as Alonso's lap 48.
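To make the mechanism concrete, here is a minimal sketch of the kind of precedence rule described above. The function `select_lap_time` is hypothetical and is not the actual import code; it only illustrates how preferring the results table's fastest lap over the laps table carries the bad value through.

```python
def select_lap_time(laps_table_time, fastest_lap_time=None):
    """Prefer the fastest-lap time from the results table when one exists.

    Hypothetical illustration of the precedence rule, not the real importer.
    """
    if fastest_lap_time is not None:
        return fastest_lap_time
    return laps_table_time


# Alonso, lap 48: the laps table has the correct time, while the results
# table lists "1:08.739" as his fastest lap on that same lap.
stored = select_lap_time("1:09.634", fastest_lap_time="1:08.739")
print(stored)

# A lap without a fastest-lap entry keeps the laps table time.
print(select_lap_time("1:09.634"))
```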

I'm not sure there's much that can be done about this until we are live and can correct the Ergast data ourselves.
Could you see if you can confirm these findings?

@theOehrly
Collaborator Author

@jolpica I agree, that seems to be the problem

And to correct my previous statement, this is NOT fixed in Ergast currently. The lap times are correct on the /laps endpoint. But the fastest lap returned by the /results endpoint is incorrect.
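The discrepancy between the two endpoints can be expressed as a small consistency check. This is a sketch under stated assumptions: the `FastestLap` dictionary follows the shape of the response excerpt quoted earlier in this thread, while the `lap_times_by_number` mapping and both function names are hypothetical. Lap times are assumed to be in `M:SS.mmm` format.

```python
def lap_time_to_ms(lap_time):
    """Convert an 'M:SS.mmm' lap time string to milliseconds."""
    minutes, seconds = lap_time.split(":")
    whole_seconds, millis = seconds.split(".")
    return (int(minutes) * 60 + int(whole_seconds)) * 1000 + int(millis)


def fastest_lap_mismatch_ms(fastest_lap, lap_times_by_number):
    """Return results-time minus laps-time in ms for the fastest lap's number.

    `fastest_lap` mirrors the "FastestLap" object from the results endpoint;
    `lap_times_by_number` maps lap number -> time string for the same driver
    (a hypothetical, simplified stand-in for the laps data).
    """
    lap_number = int(fastest_lap["lap"])
    results_ms = lap_time_to_ms(fastest_lap["Time"]["time"])
    laps_ms = lap_time_to_ms(lap_times_by_number[lap_number])
    return results_ms - laps_ms


# Alonso's values as quoted in this issue.
alonso_fastest_lap = {"rank": "4", "lap": "48", "Time": {"time": "1:08.739"}}
alonso_laps = {48: "1:09.634"}

mismatch = fastest_lap_mismatch_ms(alonso_fastest_lap, alonso_laps)
print(mismatch)
```

A result of zero means the two sources agree for that lap; any other value marks a driver whose fastest lap entry should be reviewed.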

I also agree that it is probably best to hold off on fixing this until we are independent of Ergast and import our own data.

@jolpica
Owner

jolpica commented Mar 12, 2025

This issue is blocked because the import currently errors on this round: https://github.com/jolpica/jolpica-f1-pdf/tree/main-data/json/2023/09-Austrian%20Grand%20Prix/race

@jolpica jolpica moved this from Todo to In Review in jolpica-f1 Mar 12, 2025
@harningle
Collaborator

> This issue is blocked by getting the import erroring on this round https://github.com/jolpica/jolpica-f1-pdf/tree/main-data/json/2023/09-Austrian%20Grand%20Prix/race

Yes, it's due to harningle/fia-doc#26, which is not yet fixed.
