Load GitHub data into Postgres #2665
Comments
This would most likely use AWS Step Functions to run the CLI command for loading the data. @coilysiren can support on how to build/run the step function.
Beep boop: Automatically setting the point and sprint values for this issue in project HHS/13 because they were unset when the issue was closed.
PR #2759
Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.
@widal001 can you tell us why this keeps getting moved to done?
## Summary
Fixes #2665

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Added `gh-transform-and-load` command to existing `make gh-data-export` command. I'm not sure if this is sufficient or correct, but I'm taking a guess based on what I see in #2546 and #2506.

## Context for reviewers
> Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers. Explain how the changes were verified.

In the analytics work stream, we have a new CLI command `make gh-transform-and-load` for transforming and loading (some) GitHub data. Per issue #2665, that command should be run daily, after the existing `gh-data-export` command, which exports data from GitHub.

I see that `scheduled_jobs.tf` seems to be the mechanism by which `make gh-data-export` runs daily. In this PR I'm taking an educated guess and attempting to add `gh-transform-and-load` to the existing job, and requesting feedback from @coilysiren as to whether this is the correct approach.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the changes working as expected.

Co-authored-by: kai [they] <[email protected]>
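The ordering this PR is after (export first, then transform and load, with the second step skipped if the first fails) can be sketched roughly as follows. This is only an illustration and not the contents of `scheduled_jobs.tf`: the `run_step` and `run_in_order` helpers and the injectable `runner` argument are hypothetical names; only the two `make` targets come from this thread.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The two make targets named in this thread, in the order the issue asks for.
DAILY_STEPS = (
    ("make", "gh-data-export"),
    ("make", "gh-transform-and-load"),
)


def run_step(command: Sequence[str]) -> int:
    """Run one command and return its exit code (hypothetical helper)."""
    return subprocess.run(list(command), check=False).returncode


def run_in_order(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_step,
) -> List[Tuple[str, int]]:
    """Run steps sequentially and stop at the first non-zero exit code."""
    results = []
    for step in steps:
        code = runner(step)
        results.append((" ".join(step), code))
        if code != 0:
            # A failed export means there is nothing fresh to load.
            break
    return results
```

Passing a fake `runner` makes the ordering testable without invoking `make`; in a real scheduled job the same effect is usually achieved by chaining the commands in the job definition.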
PR #2778
FYI - This task has many PRs, and every time I merge one of them, a bot closes the ticket! That's why I keep reopening the ticket. I will close it manually when the task is actually done.
Related PR: #2779
## Summary
Fixes #2665

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Added scheduled job to run `make init-db`

## Context for reviewers
> Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers. Explain how the changes were verified.

The GitHub data export, transform, and load job (see #2759) depends on a certain schema existing in Postgres. This PR creates a job to ensure the schema exists.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the changes working as expected.
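For a job like this to be safe to run on a schedule, the schema step needs to be idempotent: running it again must neither fail nor touch existing rows. Below is a minimal sketch of that idea. It uses the standard-library `sqlite3` module as a stand-in for Postgres so it runs anywhere, and the `gh_issue` table, its columns, and the `init_db` function are hypothetical; the real schema behind `make init-db` is not shown in this thread.

```python
import sqlite3

# Hypothetical schema for illustration only; not the analytics schema.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS gh_issue (
    issue_url TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    status    TEXT
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the schema if it is missing; safe to call on every run."""
    conn.executescript(SCHEMA_SQL)
    conn.commit()


def table_names(conn: sqlite3.Connection) -> list:
    """List user tables so a caller can confirm the schema exists."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

`CREATE TABLE IF NOT EXISTS` is also valid in Postgres, so the same pattern carries over; schema changes to existing tables would still need a migration step.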
## Summary
Relates to #2665

### Time to review: __1 mins__

## Changes proposed
Adds `ENVIRONMENT` env var to the analytics container

## Context for reviewers
https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1731362475436509
Related: #2803
Bumped points to 5 due to amount of work put into this
PR #2816
PR #2826
## Summary
Partially Fixes #2665

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Adds db name to Postgres connection url; removes logging

## Context for reviewers
> Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers. Explain how the changes were verified.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the changes working as expected.
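To illustrate why the database name matters in the connection URL: without a path component, clients such as libpq typically fall back to a database named after the user, which may not be the one the loader expects. The sketch below is an assumption about the general shape of such a URL, not the code changed in this PR; `build_postgres_url` and its parameters are hypothetical names.

```python
from urllib.parse import quote


def build_postgres_url(
    user: str, password: str, host: str, port: int, db_name: str
) -> str:
    """Assemble a Postgres connection URL that includes the database name.

    User, password, and database name are percent-encoded so characters
    such as '@' or '/' cannot break the URL structure.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db_name, safe='')}"
    )
```

The scheme prefix may differ depending on the driver in use (for example, SQLAlchemy URLs can name a driver after a `+`).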
PR #2828
Summary
Load the GitHub issue data into Postgres so that we can start analyzing this data in Metabase.
Note: Ideally we'd load this using the new data schema that @DavidDudas-Intuitial has been working on, but as a fallback we can use the `GitHubIssue.to_sql()` method to dump the flattened data into Postgres. We should consider switching to that fallback strategy if we can't get the new data schema working by 11/5.

TODO:
- `make gh-transform-and-load`
- `make init-db`
- `/analytics`
- db client

Acceptance criteria