bug: pr/issue data by year #74

Open
lwasser opened this issue Jan 22, 2025 · 0 comments

Comments

@lwasser
Member

lwasser commented Jan 22, 2025

In the data directory we collect issues and pull requests. Initially I pulled all data from 2019-2023 and added it to a single CSV. The idea was to keep one file per year so the files stay small and we avoid re-pulling data from previous years that we already have.

Right now, we have

2019-2023
2024_
2025

However, when I look at the files, I see issues and PRs from previous years in the current files. We should document what these files contain and then check the scripts to ensure that we are collecting data properly.
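
One way to check a file for rows from the wrong year is sketched below. It assumes each row has a `date_opened` column holding an ISO 8601 timestamp; that column name, the helper names, and the sample rows are illustrative and not taken from the current scripts.

```python
from datetime import datetime


def year_opened(row: dict) -> int:
    """Return the calendar year from the row's ISO 8601 `date_opened` value."""
    # Timestamps may end in "Z"; swap it for an explicit offset so that
    # fromisoformat also accepts the value on Python versions before 3.11.
    return datetime.fromisoformat(row["date_opened"].replace("Z", "+00:00")).year


def rows_outside_range(rows: list[dict], first_year: int, last_year: int) -> list[dict]:
    """Return rows whose opened year falls outside [first_year, last_year]."""
    return [r for r in rows if not first_year <= year_opened(r) <= last_year]


# Hypothetical rows standing in for the contents of the 2025 file.
rows_2025 = [
    {"number": "10", "date_opened": "2025-01-05T12:00:00Z"},
    {"number": "7", "date_opened": "2024-11-30T08:30:00Z"},
]

strays = rows_outside_range(rows_2025, 2025, 2025)
print([r["number"] for r in strays])  # issue 7 was opened in 2024
```

Running a check like this over each file would give us a concrete list of misplaced rows to start from before we touch the collection scripts.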

We should also have a date_opened and a date_closed field on each row.
The data should be organized so that the 2024 file contains all issues and PRs OPENED in that year, even if they are closed in 2025. The challenge here will be CI. There will be issues opened in 2019 that were closed in 2022 because we got funding and more work got done, or issues that were opened in late 2024 and resolved in 2025.
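
A minimal sketch of that rule, grouping rows by the year they were opened and ignoring when they were closed. The dict-per-row shape and the sample values are assumptions for illustration; only the `date_opened` and `date_closed` names come from the proposal above.

```python
from collections import defaultdict


def split_by_year_opened(rows):
    """Group rows by the year in `date_opened`, regardless of when they closed."""
    by_year = defaultdict(list)
    for row in rows:
        # ISO 8601 dates start with the four-digit year.
        by_year[int(row["date_opened"][:4])].append(row)
    return dict(by_year)


# Hypothetical rows; an empty date_closed means the item is still open.
rows = [
    {"number": "1", "date_opened": "2019-03-02", "date_closed": "2022-06-15"},
    {"number": "2", "date_opened": "2024-12-20", "date_closed": "2025-01-08"},
    {"number": "3", "date_opened": "2025-01-10", "date_closed": ""},
]

files = split_by_year_opened(rows)
for year, year_rows in sorted(files.items()):
    print(year, [r["number"] for r in year_rows])
```

With this rule, item 2 lands in the 2024 group even though it closed in 2025, which is the behavior described above.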

So we might want a cron job that goes back and adds date_closed to issues and PRs opened in a previous year. Maybe it runs monthly and parses all data, in contrast to the bi-weekly updates.
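
The backfill step such a job would run could look roughly like this. It is a sketch under assumptions: `closed_lookup` is a hypothetical mapping from issue or PR number to its close date, which the job would build from a fresh pull of the data, and the function name is made up for this example.

```python
def backfill_date_closed(rows, closed_lookup):
    """Fill empty `date_closed` cells from a {number: closed date} lookup.

    Returns the number of rows updated. Rows that already have a close date,
    and rows missing from the lookup (still open), are left untouched.
    """
    updated = 0
    for row in rows:
        closed = closed_lookup.get(row["number"])
        if not row.get("date_closed") and closed:
            row["date_closed"] = closed
            updated += 1
    return updated


# Hypothetical contents of the 2024 file before the monthly run.
rows_2024 = [
    {"number": "2", "date_opened": "2024-12-20", "date_closed": ""},
    {"number": "5", "date_opened": "2024-06-01", "date_closed": "2024-06-03"},
    {"number": "9", "date_opened": "2024-10-11", "date_closed": ""},
]

# Hypothetical lookup: only item 2 has been closed since the last run.
closed_lookup = {"2": "2025-01-08"}

print(backfill_date_closed(rows_2024, closed_lookup))
```

Because the function only writes into empty cells, running it every month over all yearly files would not overwrite close dates that are already recorded.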
