[BUG-REPORT] Expression.astype("uint") works for numpy but not arrow #2191

Open
NickCrews opened this issue Aug 30, 2022 · 3 comments

Comments

@NickCrews
Contributor

See added xfailing test: 538a5a6

@JovanVeljanoski
Member

Actually, I do not think this is a bug.

Looking at the Arrow documentation, there is no such thing as uint.

So in your test, if you use .astype("uint64") for instance, things will work.

I guess we could make uint an alias for uint64 to account for this.
What do you think, @maartenbreddels @NickCrews?
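A rough sketch of what such an alias could look like, in plain Python. The table and the function name resolve_dtype_alias are hypothetical and not part of vaex; the only mapping taken from this thread is uint to uint64.

```python
# Hypothetical alias table (an assumption for illustration, not vaex code).
# Only the uint -> uint64 mapping comes from the discussion; more entries
# could be added in the same way.
_DTYPE_ALIASES = {"uint": "uint64"}


def resolve_dtype_alias(name: str) -> str:
    """Return an explicit type name, expanding known width-less aliases."""
    key = name.strip().lower()
    # Names that are not aliases pass through unchanged (after normalizing).
    return _DTYPE_ALIASES.get(key, key)
```

Applied before the backend-specific cast, a step like this would let the same string reach NumPy and Arrow as uint64 in both cases.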

@NickCrews
Contributor Author

Hmm, that explains why it doesn't work.

If we were starting from scratch, I might actually lean the opposite way: make uint fail for BOTH numpy and arrow, and force users to be explicit and ask for uint64. But that would break existing users' code, so we probably can't change to that behavior now.
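To make the strict alternative concrete, here is a minimal sketch in plain Python. The name require_explicit_width and the set of rejected names are hypothetical and not part of vaex.

```python
# Hypothetical set of width-less names to reject (an assumption for
# illustration; only "uint" is discussed in this thread).
_AMBIGUOUS_NAMES = {"uint"}


def require_explicit_width(name: str) -> str:
    """Return the name unchanged, or raise if it does not state a bit width."""
    if name.strip().lower() in _AMBIGUOUS_NAMES:
        raise ValueError(
            f"ambiguous dtype {name!r}: spell out the width, e.g. 'uint64'"
        )
    return name
```

With a check like this in front of the cast, "uint" would fail the same way regardless of backend, while "uint64" would pass through untouched.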

If vaex is trying to be a higher-level abstraction that hides the differences between numpy and arrow (I think this would be a great goal, but IDK how attainable it actually is), then I would like the alias proposal. However, if there are other cases where I DO need to know which backend holds my data (e.g. #2192), then I would prefer that vaex explicitly left things as they are and didn't try to do something clever. So IDK, I think it depends on the larger goals.

I'm fine closing this as "not a bug" and just being more explicit in the docstring for astype().

@JovanVeljanoski
Member

I think we generally agree.

I think the main idea (as much as we can make it) is that an average user should not care or even know whether the data lives in arrow or numpy underneath it all, as long as it is handled via vaex. When you get it out of vaex (with .values or .to_numpy(), for example), that's a different story.

And we do want the most obvious things to work out of the box with safe, general assumptions. I still think that many users are not so knowledgeable about (py)arrow yet, so it is nice to have some higher abstraction.

I am curious to hear @maartenbreddels' opinion on this, so let's keep this open for now, and thanks for reporting!
