Feature request: Union dtype support #10827

Open
mkarbo opened this issue Aug 31, 2023 · 4 comments
Labels
A-dtype (Area: data types in general), enhancement (New feature or an improvement of an existing feature)

Comments

mkarbo commented Aug 31, 2023

Problem description

Hi, I looked through the docs and code following an error we had trying to serialize/deserialize union-typed data with polars.

From the data types documentation, it seems that the Arrow union type is not implemented or supported (https://arrow.apache.org/docs/format/Columnar.html#union-layout, https://arrow.apache.org/docs/python/generated/pyarrow.UnionType.html).

This is a core data type, so I guess it makes sense to request it / support it. I didn't find any other issues mentioning this.
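
For readers unfamiliar with the layout linked above, here is a minimal pure-Python sketch of a dense union as the Arrow columnar format describes it: a type-id buffer, an offsets buffer, and one child array per variant. The names `DenseUnion` and `to_pylist` are purely illustrative and are not pyarrow or Polars API, and the sketch assumes type ids equal child indices.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DenseUnion:
    """Illustrative stand-in for an Arrow dense union column (not a real API)."""

    type_ids: list[int]        # one entry per row: which child holds the value
    offsets: list[int]         # one entry per row: index into that child
    children: list[list[Any]]  # one array per variant

    def to_pylist(self) -> list[Any]:
        # Resolve each row by following its type id, then its offset.
        return [self.children[t][o] for t, o in zip(self.type_ids, self.offsets)]


# A column whose rows are either an int (child 0) or a str (child 1).
col = DenseUnion(
    type_ids=[0, 1, 0, 1],
    offsets=[0, 0, 1, 1],
    children=[[1, 2], ["a", "b"]],
)
print(col.to_pylist())  # [1, 'a', 2, 'b']
```

Each row can hold a different type, which is why a single-dtype column cannot represent this data without some conversion.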

mkarbo added the enhancement label Aug 31, 2023
Collaborator

orlp commented Aug 31, 2023

For a data analysis library, I think having native union types would result in a lot of very annoying corner cases for... pretty much any feature ever. Almost every part of the code base would have to accommodate them in some fashion, and I personally am not convinced that this is a good idea. Could you give an example of how you would want to use them?

Author

mkarbo commented Aug 31, 2023

Classic examples would be analysis or ETL.

You don't always control the upstream producers of data.

For instance, someone might be lucky enough to interact with an API that implements union types, or need to consume a file created by another system, e.g., duckdb (https://duckdb.org/docs/sql/data_types/union.html).

Parquet also has an open RFC for supporting this (https://issues.apache.org/jira/browse/PARQUET-756, apache/parquet-format#44) though it seems somewhat stale.

It might not be feasible in Polars; I don't know the Rust implementation well enough to have a take on this.

Collaborator

orlp commented Aug 31, 2023

I would not be opposed to adding some method of ingesting data that has union types, allowing some choice to be made regarding how to convert it to a Polars type. I'm mainly hesitant in having union types inside Polars itself.
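
To make the "choice" above concrete, here is a rough pure-Python sketch of what such a conversion could look like. Everything in it is hypothetical: `ingest_union`, its `how` parameter, and the two strategy names are invented for illustration and do not exist in Polars.

```python
from typing import Any


def ingest_union(values: list[Any], variants: dict[str, type], how: str = "struct") -> list[Any]:
    """Hypothetical helper: map union-typed values onto a non-union shape.

    how="struct": one nullable field per variant, at most one set per row.
    how="string": cast every non-null value to its string form.
    """
    if how == "string":
        return [None if v is None else str(v) for v in values]
    if how == "struct":
        rows = []
        for v in values:
            row = {name: None for name in variants}
            for name, typ in variants.items():
                if type(v) is typ:
                    row[name] = v
                    break
            else:
                # No variant matched; only a null is acceptable here.
                if v is not None:
                    raise TypeError(f"value {v!r} matches no variant")
            rows.append(row)
        return rows
    raise ValueError(f"unknown strategy: {how!r}")


values = [1, "a", None, 2]
variants = {"num": int, "text": str}
as_struct = ingest_union(values, variants, how="struct")
as_string = ingest_union(values, variants, how="string")
```

The struct form keeps the original types at the cost of a wider column, while the string form is lossy but trivially compatible with any downstream operation.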

Author

mkarbo commented Aug 31, 2023

I don't have strong opinions on the level of support. For context, our main issue was not with analysis or modification of data but with IO, where we had issues due to the lack of support for union types (reading and writing parquet and other formats).

3 participants