Suggestion: use AWS Data Wrangler instead of pyathena #37

ndrluis · 2021-12-03T11:45:28Z

Hello everyone, I'm starting to use this target and I'm missing some features, which I'm already working on contributing here. I think we can make this codebase simpler by using AWS Data Wrangler instead of pyathena.

I don't know if anyone here has worked with this library before, but AWS Data Wrangler abstracts all the AWS calls, the catalog/database manipulation, and the data upload to S3, which would make it easier to implement the Parquet writer (#9), for example.
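To make that concrete, here is a minimal sketch of what a Parquet write with catalog registration could look like through `awswrangler.s3.to_parquet`. The helper names (`parquet_write_kwargs`, `write_records`), the S3 path layout, and the choice of `mode="append"` are my assumptions for illustration, not anything this target does today, and the Glue database is assumed to exist already.

```python
from typing import Any, Dict, List, Optional


def parquet_write_kwargs(
    bucket: str,
    database: str,
    table: str,
    partition_cols: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for awswrangler.s3.to_parquet.

    The path layout below is a hypothetical convention, not one used by this repo.
    """
    return {
        "path": f"s3://{bucket}/{database}/{table}/",
        "dataset": True,       # dataset mode is what enables catalog and partition handling
        "database": database,  # Glue database, assumed to exist already
        "table": table,        # Glue table to register the data under
        "mode": "append",      # assumption: add new files, keep existing ones
        "partition_cols": partition_cols or None,
    }


def write_records(records: List[Dict[str, Any]], **kwargs: Any) -> None:
    """Write one batch of records to S3 as Parquet via AWS Data Wrangler."""
    # Imported lazily so the helper above can be used without AWS dependencies.
    import awswrangler as wr
    import pandas as pd

    wr.s3.to_parquet(df=pd.DataFrame(records), **kwargs)
```

Usage would be something like `write_records(rows, **parquet_write_kwargs("my-bucket", "analytics", "orders", ["dt"]))`, which needs valid AWS credentials and the `awswrangler` and `pandas` packages installed.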

Can we discuss this?

References:
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/006%20-%20Amazon%20Athena.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/005%20-%20Glue%20Catalog.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/003%20-%20Amazon%20S3.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/012%20-%20CSV%20Crawler.html
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/017%20-%20Partition%20Projection.html

yummydum · 2021-12-18T08:39:53Z

As another user of wrangler, I strongly agree. Much of the functionality needed here is already implemented in wrangler. I think this codebase could be a thin wrapper around wrangler that makes it compliant with the Singer protocol.
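A rough sketch of what such a thin wrapper might look like, assuming the standard Singer message types (`SCHEMA`, `RECORD`, `STATE`). The function name `run_target` and the `write_batch` callback are hypothetical; the callback stands in for whatever wrangler call does the actual write, and this version buffers everything in memory rather than flushing in batches.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical writer signature: (stream name, records) -> None
Writer = Callable[[str, List[Dict[str, Any]]], None]


def run_target(lines: Iterable[str], write_batch: Writer) -> Optional[Dict[str, Any]]:
    """Consume Singer messages, pass each stream's records to write_batch,
    and return the last STATE value seen (None if there was none)."""
    schemas: Dict[str, Any] = {}
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    state: Optional[Dict[str, Any]] = None

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for stream {stream!r} arrived before its SCHEMA")
            batches[stream].append(message["record"])
        elif kind == "STATE":
            state = message["value"]

    # Delegate the actual storage work to the injected writer.
    for stream, records in batches.items():
        write_batch(stream, records)
    return state
```

In a real target, `lines` would be `sys.stdin`, `write_batch` would wrap the wrangler write, and the returned state would be printed to stdout as JSON once the write succeeds.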

andrewcstewart · 2022-01-16T00:15:49Z

Definitely worth consideration, especially as there is some discussion of rewriting the entire target at some point.

I've also come across https://github.com/akamai/pallas if anyone is familiar and can compare/contrast.

ndrluis · 2022-01-18T12:27:45Z

I created target-s3-parquet using AWS Data Wrangler to solve our problems with target-athena: https://github.com/gupy-io/target-s3-parquet

The codebase has some hardcoded configuration, but we intend to evolve it.

ndrluis changed the title ~~Sugestion: use AWS Data Wrangler instead of pyathena~~ Suggestion: use AWS Data Wrangler instead of pyathena Dec 3, 2021
