Suggestion: use AWS Data Wrangler instead of pyathena #37

Open
ndrluis opened this issue Dec 3, 2021 · 3 comments
Comments

@ndrluis

ndrluis commented Dec 3, 2021

Hello everyone, I'm starting to use this target and I'm missing some features. I'm already working on contributing them here, but I think we can make this codebase simpler by using AWS Data Wrangler instead of pyathena.

I don't know if anyone here has worked with this library before, but AWS Data Wrangler abstracts the AWS calls, the catalog/database manipulation, and the data upload to S3, which would make it easier to implement the parquet writer (#9), for example.
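To make the idea concrete, here is a minimal sketch of what a parquet write through AWS Data Wrangler could look like. The helper names `build_parquet_write_kwargs` and `write_records`, the S3 path layout, and the bucket/database/table values are assumptions for illustration, not part of this target. Only `wr.s3.to_parquet` is the library's own call, and it expects the Glue database to exist already.

```python
from typing import Any, Dict, List, Optional


def build_parquet_write_kwargs(
    bucket: str,
    database: str,
    table: str,
    partition_cols: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for wr.s3.to_parquet (hypothetical helper)."""
    return {
        # Assumed path layout: one S3 prefix per database/table.
        "path": f"s3://{bucket}/{database}/{table}/",
        # Dataset mode is what enables partitioning and Glue catalog registration.
        "dataset": True,
        "database": database,
        "table": table,
        "mode": "append",
        "partition_cols": partition_cols or None,
    }


def write_records(records: List[dict], **kwargs: Any) -> Any:
    """Write a batch of records as parquet and register the table in Glue."""
    # Imported lazily so the helper above works without AWS dependencies installed.
    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame(records)
    return wr.s3.to_parquet(df=df, **kwargs)
```

With valid AWS credentials, something like `write_records(rows, **build_parquet_write_kwargs("my-bucket", "analytics", "users"))` would upload the files and update the catalog in one call, which is the part currently handled by hand with pyathena.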

Can we discuss this?

References:
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/006%20-%20Amazon%20Athena.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/005%20-%20Glue%20Catalog.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/003%20-%20Amazon%20S3.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/012%20-%20CSV%20Crawler.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/017%20-%20Partition%20Projection.html

@ndrluis ndrluis changed the title Sugestion: use AWS Data Wrangler instead of pyathena Suggestion: use AWS Data Wrangler instead of pyathena Dec 3, 2021
@yummydum

yummydum commented Dec 18, 2021

As another user of wrangler, I strongly agree. Much of the functionality is already implemented in wrangler. I think this codebase could be a thin wrapper around wrangler that makes it compliant with the Singer protocol.
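A rough sketch of what such a thin wrapper could look like on the Singer side: read newline-delimited JSON messages, buffer `RECORD` messages per stream, and hand each batch to a writer callback. The function name `consume_singer_messages`, the `write_batch` callback, and the batching policy are assumptions for illustration; `SCHEMA` handling is left out entirely.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def consume_singer_messages(
    lines: Iterable[str],
    write_batch: Callable[[str, List[dict]], None],
    batch_size: int = 1000,
) -> dict:
    """Buffer Singer RECORD messages per stream and flush them in batches.

    Returns the value of the last STATE message seen, or {} if there was none.
    """
    buffers: Dict[str, List[dict]] = defaultdict(list)
    state: dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message.get("type")
        if kind == "RECORD":
            stream = message["stream"]
            buffers[stream].append(message["record"])
            if len(buffers[stream]) >= batch_size:
                write_batch(stream, buffers[stream])
                buffers[stream] = []  # start a fresh list; the flushed one is untouched
        elif kind == "STATE":
            state = message["value"]
        # SCHEMA messages are ignored in this sketch.
    # Flush whatever is left once the input is exhausted.
    for stream, records in buffers.items():
        if records:
            write_batch(stream, records)
    return state
```

In a real target, `write_batch` would be the place to build a DataFrame and call wrangler, and `lines` would be `sys.stdin`. The returned state should only be emitted once the corresponding records have actually been written.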

@andrewcstewart
Collaborator

Definitely worth consideration, especially as there is some discussion of rewriting the entire target at some point.

I've also come across https://github.com/akamai/pallas if anyone is familiar and can compare/contrast.

@ndrluis
Author

ndrluis commented Jan 18, 2022

I created target-s3-parquet, using AWS Data Wrangler, to solve our problems with target-athena: https://github.com/gupy-io/target-s3-parquet

The codebase has some hardcoded configuration, but we intend to evolve it.
