Load GitHub data into Postgres #2665

Open
3 of 6 tasks
widal001 opened this issue Oct 30, 2024 · 26 comments · Fixed by #2744, #2759, #2778, #2786 or #2796

@widal001
Collaborator

widal001 commented Oct 30, 2024

Summary

Load the GitHub issue data into Postgres so that we can start analyzing this data in Metabase.

Note: Ideally we'd load this data using the new data schema that @DavidDudas-Intuitial has been working on, but as a fallback we can use the GitHubIssue.to_sql() method to dump the flattened data into Postgres. We should consider switching to that fallback strategy if we can't get the new data schema working by 11/5.
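As a rough illustration of the fallback (flatten each issue into one row, then dump the rows into a table), the sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres. It is not the project's `GitHubIssue.to_sql()` method; the sample records, field names, and the table name `github_issues_flat` are hypothetical.

```python
import sqlite3
from typing import Dict, List

# Hypothetical nested records; the real GitHub export fields may differ.
SAMPLE_ISSUES = [
    {"number": 101, "title": "Example issue A", "state": "open",
     "labels": [{"name": "analytics"}, {"name": "data"}],
     "milestone": {"title": "Sprint A"}},
    {"number": 102, "title": "Example issue B", "state": "closed",
     "labels": [], "milestone": None},
]


def flatten_issue(issue: Dict) -> Dict:
    """Collapse nested fields into a single flat row of scalar columns."""
    milestone = issue.get("milestone") or {}
    return {
        "issue_number": issue["number"],
        "issue_title": issue["title"],
        "issue_state": issue["state"],
        # Label objects become one comma-separated string column.
        "issue_labels": ",".join(l["name"] for l in issue.get("labels", [])),
        "milestone_title": milestone.get("title"),
    }


def dump_flat(conn: sqlite3.Connection, issues: List[Dict]) -> int:
    """Drop and recreate the flat table, then insert one row per issue."""
    rows = [flatten_issue(issue) for issue in issues]
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS github_issues_flat")
        conn.execute(
            "CREATE TABLE github_issues_flat ("
            "issue_number INTEGER PRIMARY KEY, issue_title TEXT, "
            "issue_state TEXT, issue_labels TEXT, milestone_title TEXT)"
        )
        conn.executemany(
            "INSERT INTO github_issues_flat VALUES (:issue_number, "
            ":issue_title, :issue_state, :issue_labels, :milestone_title)",
            rows,
        )
    return len(rows)
```

Dropping and recreating the table on each run keeps a flat dump simple, at the cost of keeping no history between runs.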

TODO:

  • Run daily step function to execute make gh-transform-and-load
  • Add new step function to execute make init-db
  • Add IAM support to /analytics db client
  • Troubleshoot failed db connection attempts from ECS container

Acceptance criteria

  • On a daily basis, we load data exported from GitHub into the Postgres DB
  • Users with Metabase access can query this data in Metabase
@widal001
Collaborator Author

This would most likely use AWS step functions to run the CLI command for loading the data. @coilysiren can help with building and running the step function.

@widal001
Collaborator Author

widal001 commented Nov 5, 2024

Beep boop: Automatically setting the point and sprint values for this issue in project HHS/13 because they were unset when the issue was closed.

@DavidDudas-Intuitial
Collaborator

DavidDudas-Intuitial commented Nov 6, 2024

PR #2759

@widal001
Copy link
Collaborator Author

widal001 commented Nov 7, 2024

Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.

@widal001 widal001 closed this as completed Nov 7, 2024
@sarahknoppA6
Collaborator

@widal001 can you tell us why this keeps getting moved to done?

DavidDudas-Intuitial added a commit that referenced this issue Nov 7, 2024
## Summary
Fixes #2665 

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Added `gh-transform-and-load` command to existing `make gh-data-export`
command. I'm not sure if this is sufficient or correct, but I'm taking a
guess based on what I see in
#2546 and
#2506.

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

In the analytics work stream, we have a new CLI command,
`make gh-transform-and-load`, for transforming and loading (some) GitHub
data. Per issue #2665, that command should be run daily, after the
existing `gh-data-export` command, which exports data from GitHub.

I see that `scheduled_jobs.tf` seems to be the mechanism by which
`make gh-data-export` runs daily. In this PR I'm taking an educated guess
and attempting to add `gh-transform-and-load` to the existing job, and
requesting feedback from @coilysiren as to whether this is the correct
approach.
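The ordering described here (export first, then transform and load, and skip the load if the export fails) can be sketched in Python as below. This only illustrates the sequencing: the actual change edits the scheduled task in `scheduled_jobs.tf`, and the `run_daily_job` wrapper and its `runner` parameter are hypothetical. In a shell, `make gh-data-export && make gh-transform-and-load` gives the same fail-fast behavior.

```python
import subprocess
from typing import Callable, List, Sequence

# Target names come from this thread; chaining them in one wrapper is an assumption.
DAILY_TARGETS = ("gh-data-export", "gh-transform-and-load")


def run_make(target: str) -> int:
    """Run a single make target and return its exit code."""
    return subprocess.run(["make", target], check=False).returncode


def run_daily_job(
    targets: Sequence[str] = DAILY_TARGETS,
    runner: Callable[[str], int] = run_make,
) -> List[str]:
    """Run targets in order and stop at the first non-zero exit code.

    Stopping early means a failed export never feeds stale or partial
    data into the transform-and-load step.
    """
    completed: List[str] = []
    for target in targets:
        exit_code = runner(target)
        if exit_code != 0:
            raise RuntimeError(f"make {target} failed with exit code {exit_code}")
        completed.append(target)
    return completed
```

Injecting `runner` keeps the ordering logic testable without invoking `make`.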

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.

Co-authored-by: kai [they] <[email protected]>
@DavidDudas-Intuitial
Collaborator

PR #2778

@DavidDudas-Intuitial
Collaborator

FYI - This task has many PRs, and every time I merge one of them, a bot closes the ticket! That's why I keep reopening the ticket. I will close it manually when the task is actually done.

@DavidDudas-Intuitial
Collaborator

Related PR: #2779

babebe pushed a commit that referenced this issue Nov 7, 2024
babebe pushed a commit that referenced this issue Nov 7, 2024
## Summary
Fixes #2665 

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Added a scheduled job to run `make init-db`.

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

The GitHub data export, transform, and load job (see
#2759) depends on a
certain schema existing in Postgres. This PR creates a job to ensure the
schema exists.
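One way to make a schema-setup job safe to run on every schedule is `CREATE TABLE IF NOT EXISTS`, which both Postgres and SQLite support. The sketch below uses `sqlite3` as a stand-in; the table names and columns are hypothetical and are not the schema that `make init-db` actually creates.

```python
import sqlite3

# Hypothetical DDL; the real analytics schema will differ.
SCHEMA_STATEMENTS = (
    "CREATE TABLE IF NOT EXISTS gh_issue ("
    " id INTEGER PRIMARY KEY, title TEXT NOT NULL, state TEXT)",
    "CREATE TABLE IF NOT EXISTS gh_sprint ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
)


def init_db(conn: sqlite3.Connection) -> None:
    """Create any missing tables; existing tables and their rows are left alone."""
    with conn:  # commit on success, roll back on error
        for statement in SCHEMA_STATEMENTS:
            conn.execute(statement)
```

Because each statement is a no-op when the table already exists, the init job can run before every load without special-casing the first run.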

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
@widal001
Collaborator Author

Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.

@widal001
Collaborator Author

Beep boop: Automatically setting the point and sprint values for this issue in project HHS/13 because they were unset when the issue was closed.

coilysiren added a commit that referenced this issue Nov 12, 2024
## Summary

Relates to #2665

### Time to review: __1 min__

## Changes proposed

Adds `ENVIRONMENT` env var to the analytics container
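A container-level `ENVIRONMENT` variable lets the same image choose environment-specific behavior at runtime. A minimal sketch of reading it defensively follows; the `get_environment` helper, its default, and the set of allowed names are assumptions, not values taken from this repository.

```python
import os

# Assumed environment names, for illustration only.
KNOWN_ENVIRONMENTS = {"local", "dev", "staging", "prod"}


def get_environment(default: str = "local") -> str:
    """Read ENVIRONMENT (case-insensitive), fall back to a default, reject typos."""
    value = os.environ.get("ENVIRONMENT", default).strip().lower()
    if value not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"Unknown ENVIRONMENT value: {value!r}")
    return value
```

Failing fast on an unknown value surfaces a misconfigured task definition at startup rather than partway through a job.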

## Context for reviewers

https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1731362475436509
@DavidDudas-Intuitial
Collaborator

Related: #2803

@DavidDudas-Intuitial
Collaborator

Bumped points to 5 due to the amount of work put into this.

@DavidDudas-Intuitial
Collaborator

PR #2816

@DavidDudas-Intuitial
Collaborator

PR #2826

DavidDudas-Intuitial added a commit that referenced this issue Nov 13, 2024
## Summary
Partially Fixes #2665 

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Adds the database name to the Postgres connection URL; removes logging.
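In a Postgres connection URL the database name is the path component after the host and port. If it is omitted, libpq-based clients fall back to a default database (by default, one named after the user), which may not be where the analytics tables live. The `postgres_url` helper below is a hypothetical sketch, not the project's db client; it also percent-encodes credentials, which matters when a password contains characters such as `@`, `/`, or `&`.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a postgresql:// URL that names the target database explicitly."""
    # safe='' makes quote() encode '/' too, so no credential character
    # can be mistaken for URL structure.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )
```

For example, `postgres_url("app", "p@ss/word", "db.example.internal", 5432, "analytics")` (all hypothetical values) yields a URL ending in `:5432/analytics`.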

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
@widal001
Collaborator Author

Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.

@DavidDudas-Intuitial
Collaborator

PR #2828
