Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native S3 transfer driver #1010

Merged
merged 67 commits into from
Apr 5, 2024
Merged

Native S3 transfer driver #1010

merged 67 commits into from
Apr 5, 2024

Conversation

gcglinton
Copy link
Contributor

Per many discussions, voila, a native S3-compatible transfer driver.

Built on the backs of the good work done by @reidsunderland, and @tysonkaufmann, and a collaborative programming session with @petersilva, @reidsunderland, and @andreleblanc11.

It does all the things it's supposed to do, at least as far as I can tell.

I ran a very simple set of configs, and there was no errors:

  • A poll to find files in an S3 bucket
  • A subscribe to download and save them, and then post a new message for..
  • A sender to pick up those local files and re-upload them to a different bucket

It's only been tried against AWS S3 proper, so it may or may not work with MinIO, or others, but theoretically, everything is in place to do that.

More testing will likely be required to ensure it's working exactly as intended, but this might be a good start.

There's also some unit tests, because they're tasty, and fun, and I like writing them (what does that say about me?). It also meant figuring how to mock boto3 in order to keep from having to spin up a dependent service, which turned out to be super easy using moto.

The only issue left is that these new tests fail on Ubuntu 20.04, but they work fine on 22.04. It seems like the Ubuntu provided python3-openssl is broken, or incompatible with boto3. Following a combination of answers from this SO question led me to fixing it by forcing pip to upgrade the pyOpenSSL package: sudo pip3 install pyopenssl --upgrade

Fixing the packages in the runner before doing the unit tests is the best way to go about it, but that would also mean documenting that fix somewhere so users can actually make it work on Ubuntu 20.04, and potentially other distributions/versions as well.

It might be also be possible to conditionally skip the s3 tests, based on Python version, or possibly even OS, but from what I've been able to find, that's not very reliable, and very much doesn't solve the issue.

gcglinton and others added 30 commits March 26, 2024 15:59
- Add all the methods I think it needs
- Re-order them alphabetically
- Work on how to configure it
Worked with Reid and Peter to get the S3 transfer class connecting to S3, and recusively LS-ing objects in a bucket.
It also inherently gets `__credentials`, and `cd` working too.
It does send, the paths are weird, and I don't know what they're supposed to do.

I suspect there's some changes to make.
Also cleaned up some test code, and did some re-formatting.
Also fix statically assigned bucket and key names in object size fetch on puts
Just the basics for now, but more to come.
They're not 100% complete, but overall coverage of the class is at 90%, which seems like it's good enough.
petersilva and others added 16 commits April 4, 2024 16:14
- Add all the methods I think it needs
- Re-order them alphabetically
- Work on how to configure it
Worked with Reid and Peter to get the S3 transfer class connecting to S3, and recusively LS-ing objects in a bucket.
It also inherently gets `__credentials`, and `cd` working too.
It does send, the paths are weird, and I don't know what they're supposed to do.

I suspect there's some changes to make.
Also cleaned up some test code, and did some re-formatting.
Also fix statically assigned bucket and key names in object size fetch on puts
Just the basics for now, but more to come.
They're not 100% complete, but overall coverage of the class is at 90%, which seems like it's good enough.
@petersilva
Copy link
Contributor

The PR looks really good. I read over the commits, and they seem fine. I'm a bit puzzled about what to do about the unit tests.
I'm looking at unit tests in other contexts, and don't understand how to fix these other things either... some thought needed, but I don't know if the unit test issues show slow/hamper this PR. more thoughts welcome.

Copy link
Contributor

@petersilva petersilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I'm not sure what to do about the unit tests... they have other issues also.
perhaps merge first and fix after?

@petersilva
Copy link
Contributor

OK... well I tried to get the unit tests to pass. I'm pretty sure they will AFTER this gets merged... but I can't figure out why it isn't passing already.

@petersilva petersilva marked this pull request as ready for review April 5, 2024 06:33
@petersilva petersilva merged commit 071ca49 into development Apr 5, 2024
2 of 4 checks passed
@petersilva
Copy link
Contributor

sigh... I merged it, but the unit tests still have a failure on ubuntu 20.04 as @gcglinton had mentioned.
have to look at that some more, but deal with it post merge.

@gcglinton gcglinton deleted the Issue939_S3Transfer branch April 5, 2024 13:35
@gcglinton gcglinton mentioned this pull request Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants