
Categorical DataType #1440


What are you trying to achieve by turning a string column into a category? This is supported in vaex, but it may mean something different than it does in pandas.

Since vaex works fully out of core, if your data is on disk (HDF5, Arrow) you don't have to worry about memory issues, which are largely the reason for using categories in pandas.

In vaex, turning strings into categories can speed up certain operations (such as groupby and binby). Either way, it is done like this:

```python
df = df.ordinal_encode('my_column')
```

This encodes that column as integers, so operations on it are faster. You can see the mapping dictionary in df._categories.
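To show what ordinal encoding does conceptually, here is a minimal pure-Python sketch. It is not vaex's implementation: the helpers `ordinal_encode` and `count_by_code` are hypothetical, and assigning codes in sorted label order is an assumption made for the illustration. Each distinct string gets an integer code, and a groupby-style count can then work on small integers instead of comparing strings.

```python
from collections import Counter


def ordinal_encode(values):
    """Map each distinct string to an integer code (hypothetical helper).

    Returns the list of codes and a code -> label mapping, similar in
    spirit to the mapping dictionary vaex keeps in df._categories.
    """
    labels = sorted(set(values))  # assumption: codes follow sorted label order
    to_code = {label: code for code, label in enumerate(labels)}
    codes = [to_code[v] for v in values]
    mapping = dict(enumerate(labels))
    return codes, mapping


def count_by_code(codes, mapping):
    """Count rows per category by grouping on integer codes, then decode."""
    counts = Counter(codes)  # grouping key is an int, not a string
    return {mapping[code]: n for code, n in sorted(counts.items())}


if __name__ == "__main__":
    column = ["red", "blue", "red", "green", "blue", "red"]
    codes, mapping = ordinal_encode(column)
    print(codes)    # [2, 0, 2, 1, 0, 2]
    print(mapping)  # {0: 'blue', 1: 'green', 2: 'red'}
    print(count_by_code(codes, mapping))  # {'blue': 2, 'green': 1, 'red': 3}
```

The decoding step at the end is why a mapping has to be kept around: the encoded column on its own only holds integers.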

Also, for now, printing the df will show you the encoded…

Replies: 2 comments, 5 replies

The first comment has 0 replies. The second has 5 replies, from @dharik-arsath, @JovanVeljanoski, @dharik-arsath, @JovanVeljanoski and @dharik-arsath.
Answer selected by dharik-arsath
Category: Q&A
Labels: None yet
2 participants