
Support wasb:// and wasbs:// #1663


Open: wants to merge 1 commit into base: main


christophediprima

This will work once fsspec/adlfs#493 is merged.

Fokko changed the title from "support wasb and wasbs" to "Support wasb:// and wasbs://" on Feb 14, 2025
Fokko (Contributor) commented Feb 14, 2025

There is also an open issue on the adlfs side: fsspec/adlfs#403

Regarding fsspec/adlfs#493, is the protocol identical?

christophediprima (Author)

I am not sure, but I have been testing it locally with Azurite and it works as expected.
adlfs already uses the correct endpoint for reaching Azure Blob Storage (https://github.com/fsspec/adlfs/blob/main/adlfs/spec.py#L488), and the Java implementation already supports it.

I am going to try using it in the cloud.
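For context on whether the protocol is identical: under the Hadoop Azure URI conventions, wasb(s) and abfs(s) locations share the same `<scheme>://<container>@<account>.<endpoint>/<path>` shape and differ mainly in the endpoint (`blob.core.windows.net` for wasb(s), `dfs.core.windows.net` for abfs(s)). The sketch below is a hypothetical helper, `parse_azure_uri`, which is not part of adlfs or PyIceberg; it splits a fully qualified location into those parts. It assumes the account-qualified form, so short forms such as the Azurite test paths used in this repository are deliberately rejected.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Schemes discussed in this thread: abfs(s) ("dfs" endpoint) and wasb(s) ("blob" endpoint).
AZURE_SCHEMES = {"abfs", "abfss", "wasb", "wasbs"}


class AzureLocation(NamedTuple):
    scheme: str
    container: str
    account: str
    endpoint: str  # e.g. "blob.core.windows.net" or "dfs.core.windows.net"
    path: str


def parse_azure_uri(uri: str) -> AzureLocation:
    """Hypothetical helper: split <scheme>://<container>@<account>.<endpoint>/<path>."""
    parts = urlsplit(uri)
    if parts.scheme not in AZURE_SCHEMES:
        raise ValueError(f"Not an Azure storage URI: {uri!r}")
    # netloc keeps the "container@host" userinfo form intact.
    container, sep, host = parts.netloc.partition("@")
    if not sep:
        raise ValueError(f"Expected <container>@<account>.<endpoint> in {uri!r}")
    # The account name is the first DNS label; the rest is the service endpoint.
    account, _, endpoint = host.partition(".")
    return AzureLocation(parts.scheme, container, account, endpoint, parts.path.lstrip("/"))
```

For example, `parse_azure_uri("wasbs://warehouse@myaccount.blob.core.windows.net/db/tbl/metadata.json")` yields container `warehouse`, account `myaccount`, and endpoint `blob.core.windows.net`; the abfss:// equivalent differs only in the scheme and the `dfs` endpoint.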

Fokko (Contributor) commented Feb 18, 2025

@christophediprima Thanks for testing that, appreciate it.

We also test against Azurite. Maybe we can add some simple tests as well, to check the connection and make sure we don't break it in the future.

christophediprima (Author) commented Mar 10, 2025

My team and I have been testing it on Azure Blob Storage and we had no issues. What kind of tests did you have in mind?

kevinjqliu (Contributor)

Looks like we have a few ADLS integration tests that run against the Azurite Docker container, for example:

```python
input_file = adls_fsspec_fileio.new_input(f"abfss://tests/{filename}")
```

Perhaps we can extend these to cover wasb:// and wasbs:// as well.
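One way to do that, sketched under assumptions: derive one URI per scheme from the existing `abfss://tests/{filename}` location so each Azurite round-trip can be repeated for wasb:// and wasbs://. `with_scheme`, `adls_test_uris`, and `ADLS_TEST_SCHEMES` are hypothetical helpers, not existing PyIceberg test utilities.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: schemes the Azurite-backed tests could cover (wasb/wasbs are the additions).
ADLS_TEST_SCHEMES = ("abfs", "abfss", "wasb", "wasbs")


def with_scheme(uri: str, scheme: str) -> str:
    """Return `uri` with only its scheme swapped; netloc and path are kept as-is."""
    return urlunsplit(urlsplit(uri)._replace(scheme=scheme))


def adls_test_uris(filename: str) -> List[str]:
    """Expand the existing abfss:// test location into one URI per scheme."""
    base = f"abfss://tests/{filename}"
    return [with_scheme(base, scheme) for scheme in ADLS_TEST_SCHEMES]
```

In the test module, the returned list could be fed to `pytest.mark.parametrize`, with each URI passed to `adls_fsspec_fileio.new_input(...)` in place of the hard-coded abfss:// path.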

kevinjqliu added a commit that referenced this pull request Jun 20, 2025

# Rationale for this change

Starting with version 20, PyArrow supports the ADLS filesystem. This PR adds
PyArrow Azure support to PyIceberg.

PyArrow is the [default
IO](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/__init__.py#L366-L369)
for PyIceberg catalogs. In Azure environments it supports a wider range
of authentication strategies than fsspec, including, for instance, [Managed
Identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview).
Also, prior to this PR there was no wasb(s) support, since the fsspec
implementation in #1663 has not been merged yet.
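A simplified, hypothetical sketch of scheme-based FileIO selection in which PyArrow is preferred and fsspec is the fallback, roughly the arrangement this rationale argues for. Only the two class paths come from PyIceberg; `SCHEME_TO_FILE_IO`, `pick_file_io`, and the `available` set (a stand-in for "importable in this environment") are illustrative assumptions, not PyIceberg's actual resolution code.

```python
from typing import Dict, List, Set
from urllib.parse import urlsplit

# Class paths of PyIceberg's two FileIO implementations.
ARROW_FILE_IO = "pyiceberg.io.pyarrow.PyArrowFileIO"
FSSPEC_FILE_IO = "pyiceberg.io.fsspec.FsspecFileIO"

# Hypothetical table: candidate FileIOs per scheme, in order of preference.
SCHEME_TO_FILE_IO: Dict[str, List[str]] = {
    "abfs": [ARROW_FILE_IO, FSSPEC_FILE_IO],
    "abfss": [ARROW_FILE_IO, FSSPEC_FILE_IO],
    "wasb": [ARROW_FILE_IO, FSSPEC_FILE_IO],
    "wasbs": [ARROW_FILE_IO, FSSPEC_FILE_IO],
}


def pick_file_io(location: str, available: Set[str]) -> str:
    """Return the first preferred FileIO for the location's scheme that is available."""
    scheme = urlsplit(location).scheme
    for candidate in SCHEME_TO_FILE_IO.get(scheme, []):
        if candidate in available:
            return candidate
    raise ValueError(f"No available FileIO supports scheme {scheme!r}")
```

Keeping fsspec in each list means a wasb(s) location would still resolve when only the fsspec-based FileIO is available.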

See the corresponding issue for more details:
#2112

# Are these changes tested?

Tests are added under `tests/io/test_pyarrow.py`.

# Are there any user-facing changes?

There are no breaking API changes. The direct impact of this PR is that the PyArrow
FileIO in PyIceberg supports the Azure cloud environment. Examples of the impact
for end users:
- PyIceberg is usable in services that authenticate with Managed Identities.
- PyIceberg is usable with wasb(s) schemes in Azure.


---------

Co-authored-by: Kevin Liu <[email protected]>
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
3 participants