Data Observability for the AWS Glue ETL Pipeline #200

Open
arjun2189 opened this issue Apr 19, 2022 · 0 comments

Problem

To begin with, not every company has a full-grown data warehouse; some use the data lake itself as the single place to start. Our use case is similar: S3 is our data lake, Glue jobs are our transform step, and Athena is our query engine. AWS Glue provides its own monitoring dashboard, but it only works at the job level: how many jobs ran, how many succeeded, and how many failed.

Solution

It would be great to have not only job-level metrics but also data-level metrics, such as the counts for each table corresponding to a particular Glue job (if a table is exposed). All of these can be easily pulled from the Glue Catalog metadata. It would also help to see whether there was any anomaly in the counts for regularly run jobs. Some of these metrics could be exposed from your current solution, for example when the job was last run or updated and when the table counts were last updated.
A Glue job also carries metadata of its own, such as the number of workers used for the job, the timeout associated with it, and the Python version. Any way to observe that would also be a great addition.
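
To make the request more concrete, here is a rough sketch of how these values might be collected with boto3. The helper names (`summarize_job`, `table_record_count`, `is_count_anomalous`, `fetch_snapshot`) are hypothetical and not part of this project. It assumes the catalog table carries a `recordCount` parameter, which is typically written by a Glue crawler and may be missing otherwise, and it uses a simple standard-deviation check as a stand-in for whatever anomaly detection the project would actually apply.

```python
from statistics import mean, pstdev


def summarize_job(job: dict) -> dict:
    """Pick configuration fields out of the "Job" dict of a Glue get_job response."""
    return {
        "name": job.get("Name"),
        "worker_type": job.get("WorkerType"),
        "number_of_workers": job.get("NumberOfWorkers"),
        "timeout_minutes": job.get("Timeout"),
        "python_version": job.get("Command", {}).get("PythonVersion"),
    }


def table_record_count(table: dict):
    """Return the table's recordCount parameter as an int, or None if absent.

    Assumption: the parameter was populated by a crawler or a similar process.
    """
    raw = table.get("Parameters", {}).get("recordCount")
    return int(raw) if raw is not None else None


def is_count_anomalous(history, latest, threshold=3.0):
    """Flag `latest` when it is more than `threshold` standard deviations
    away from the mean of the previously observed counts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def fetch_snapshot(job_name: str, database: str, table_name: str) -> dict:
    """Query Glue for one job and one table (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above work without it

    glue = boto3.client("glue")
    job = glue.get_job(JobName=job_name)["Job"]
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    return {
        "job": summarize_job(job),
        "record_count": table_record_count(table),
        "table_updated_at": table.get("UpdateTime"),
    }
```

For example, `fetch_snapshot("my-job", "my_db", "my_table")` (placeholder names) would return the job configuration alongside the table's last known count and update time, and feeding the stored counts from earlier runs into `is_count_anomalous` gives a first-pass signal for unusual loads.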

Requirements

Any requirements that will be necessary for the feature to work.

Additional Context

Add any other context, screenshots, or related issues about the feature request here.

Questions, or need help getting started?

Feel free to ask below, or ping us on Slack
