[DPE-4487] Add Integration Tests for Azure Storage #89
The code does what it should, but I have a suggestion that could make the tests more maintainable and reusable, and also more consistent.
Right now we handle the S3 setup and the Azure setup a bit differently: for instance, the S3 configuration is fed into the spark-submit command, while the Azure configuration is fed in using kubectl commands. Also, a good deal of the business logic in the Iceberg test is just copied and pasted. I'm wondering whether we could rewrite the tests so that they read:
(setup_user_context && setup_object_storage_s3 && test_iceberg_example_in_pod && cleanup_user_success) || cleanup_user_failure_in_pod
(setup_user_context && setup_object_storage_azure && test_iceberg_example_in_pod && cleanup_user_success) || cleanup_user_failure_in_pod
The only part that differs between the two is the setup_object_storage_* step, where we both set up the storage using the CLI and inject the right configuration using the spark-client.service-account-registry that is embedded in the OCI image.
Then the Iceberg test should just use the configuration already injected, and add only the Iceberg configuration to the spark-submit command.
It is not too critical, but I would honestly spend some time on this right now, so that adding other backends (blob storage, abfs, etc.) becomes easier and more straightforward. It should also be easy to translate this into more structured tests, where the Azure or S3 configuration is set up by the integration-hub.
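As a rough illustration of the chained structure suggested in this comment, a minimal sketch could look like the following. The function names are the ones used in the comment, but every body here is a placeholder stub that only records its own name, and run_iceberg_test, record and STEPS are hypothetical helpers introduced for the example. Having the failure cleanup return non-zero is also an assumption, made so that a failed test is still reported as failed.

```shell
#!/bin/sh
# Hypothetical sketch: each function is a stub that only records its name in
# STEPS. The real steps would run kubectl, the spark-client CLI, spark-submit.

STEPS=""
record() { STEPS="${STEPS:+$STEPS }$1"; }

setup_user_context()          { record setup_user_context; }
setup_object_storage_s3()     { record setup_object_storage_s3; }
setup_object_storage_azure()  { record setup_object_storage_azure; }
test_iceberg_example_in_pod() { record test_iceberg_example_in_pod; }
cleanup_user_success()        { record cleanup_user_success; }
# Assumption: the failure path cleans up and still reports a failure.
cleanup_user_failure_in_pod() { record cleanup_user_failure_in_pod; return 1; }

# $1 is the backend suffix ("s3" or "azure"); only the storage step differs.
run_iceberg_test() {
  STEPS=""
  { setup_user_context \
      && "setup_object_storage_$1" \
      && test_iceberg_example_in_pod \
      && cleanup_user_success; } \
    || cleanup_user_failure_in_pod
}

run_iceberg_test s3 && echo "s3: $STEPS"
run_iceberg_test azure && echo "azure: $STEPS"
```

With this shape, supporting another backend would only mean adding one more setup_object_storage_* function, since the Iceberg test itself is shared.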
LGTM!
My only comment is about the heavy use of bash. As future work, I strongly believe we should start using the spark-test library and move all our tests to pytest.
Great! The tests look really great, thanks!
Edit: I noticed that the SQL tests run with S3 only. Would it be possible to run them for both S3 and Azure?
Thanks @deusebio. I've just updated the PR to make the SQL tests run for both S3 and Azure storage.
Great, thanks! Feel free to merge! I'm very happy with this PR; I believe it also gives the functionality a better structure.