-
Notifications
You must be signed in to change notification settings - Fork 329
Support wasb://
and wasbs://
#1663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There is also an open issue on the Regarding fsspec/adlfs#493, is the protocol identical? |
I am not sure but have been testing it with Azurite localy and it works as expected. I am going to try use it on the cloud. |
@christophediprima Thanks for testing that, appreciate it. We also test against |
We have been testing it on Azure Blob Storage with my team and we had no issues. What kind of tests can you think about? |
Looks like we have a few adls integration tests against the azurite docker iceberg-python/tests/io/test_fsspec.py Line 298 in b86d7d5
perhaps we can extend these to include wasb and wasbs |
<!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change Starting from version 20, PyArrow supports ADLS filesystem. This PR adds Pyarrow Azure support to Pyiceberg. PyArrow is the [default IO](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/__init__.py#L366-L369) for Pyiceberg catalogs. In Azure environment it handles wider spectrum of auth strategies then Fsspec, including, for instance, [Managed Identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview). Also, prior to this PR #1663 (that is not merged yet) there was no support for wasb(s) with Fsspec. See the corresponding issue for more details: #2112 # Are these changes tested? Tests are added under tests/io/test_pyarrow.py. # Are there any user-facing changes? There are no API breaking changes. Direct impact of the PR: Pyarrow FileIO in Pyiceberg supports Azure cloud environment. Examples of impact for final users: - Pyiceberg is usable in services with Managed Identities auth strategy. - Pyiceberg is usable with wasb(s) schemes in Azure. <!-- In the case of user-facing changes, please add the changelog label. --> --------- Co-authored-by: Kevin Liu <[email protected]> Co-authored-by: Kevin Liu <[email protected]>
<!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change Starting from version 20, PyArrow supports ADLS filesystem. This PR adds Pyarrow Azure support to Pyiceberg. PyArrow is the [default IO](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/__init__.py#L366-L369) for Pyiceberg catalogs. In Azure environment it handles wider spectrum of auth strategies then Fsspec, including, for instance, [Managed Identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview). Also, prior to this PR apache#1663 (that is not merged yet) there was no support for wasb(s) with Fsspec. See the corresponding issue for more details: apache#2112 # Are these changes tested? Tests are added under tests/io/test_pyarrow.py. # Are there any user-facing changes? There are no API breaking changes. Direct impact of the PR: Pyarrow FileIO in Pyiceberg supports Azure cloud environment. Examples of impact for final users: - Pyiceberg is usable in services with Managed Identities auth strategy. - Pyiceberg is usable with wasb(s) schemes in Azure. <!-- In the case of user-facing changes, please add the changelog label. --> --------- Co-authored-by: Kevin Liu <[email protected]> Co-authored-by: Kevin Liu <[email protected]>
This will work as soon as this is merged: fsspec/adlfs#493