Categoricals: correctly handle missing values, speed up simple string comparisons #46
This PR adds two new features:

- `XXX.isna()` now works correctly when `XXX` is a categorical variable. Previously, the missing-value code of `-1` was mapped to the last categorical value instead of being reported as missing. The `isna` code has been patched to give the correct result in these cases.
- Simple string comparisons against categorical variables are faster. If `XXX` is a categorical variable with categories `['Bus', 'Car', 'Walk']`, then compiling `XXX == 'Car'` is converted, in effect, to `XXX.cat.codes == 1`, because `'Car'` is category index 1. Compiling `XXX == 'Bike'`, on the other hand, is left unchanged and now emits a warning, because this expression can never be true.
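For context, the `isna` failure mode can be reproduced with plain pandas and NumPy. This is only an illustration, not the PR's patched code. pandas stores missing categorical entries as code `-1`, and `-1` is a valid NumPy index, so decoding codes by indexing into the categories silently returns the last category. Checking the code directly gives the correct missing-value mask:

```python
import numpy as np
import pandas as pd

# A categorical Series with one missing entry; its categories are ['Bus', 'Walk'].
s = pd.Series(["Bus", None, "Walk"], dtype="category")

codes = s.cat.codes.to_numpy()              # missing entries are stored as code -1
categories = np.asarray(s.cat.categories)   # the category labels as an array

# Naive decoding: -1 is a valid NumPy index (it means "last element"),
# so the missing entry silently decodes to 'Walk' instead of being flagged.
decoded = categories[codes]

# Correct test: an entry is missing exactly when its code is -1.
is_missing = codes == -1
```

Here `decoded[1]` is `'Walk'`, while `is_missing` agrees with `s.isna()`.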
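The string-comparison rewrite can be sketched at runtime on a pandas Series. The PR does this while compiling the expression. The helper name `compare_categorical_eq` and the warning text below are hypothetical. When the string is a known category, the helper compares integer codes. Otherwise it keeps the plain comparison and warns, because the result is always false:

```python
import warnings

import pandas as pd


def compare_categorical_eq(series: pd.Series, value: str) -> pd.Series:
    """Hypothetical helper: evaluate `series == value` for a categorical Series,
    comparing integer codes instead of strings when `value` is a category."""
    categories = series.cat.categories
    if value in categories:
        # e.g. 'Car' in ['Bus', 'Car', 'Walk'] has code 1, so compare integers.
        code = categories.get_loc(value)
        # Missing entries carry code -1, which never equals a valid code,
        # so they compare False, as with pandas' own `==`.
        return series.cat.codes == code
    # Not a category: the comparison can never be true, so warn and
    # fall back to the unchanged string comparison (all False).
    warnings.warn(f"{value!r} is not a category; comparison is always False")
    return series == value


mode = pd.Series(
    pd.Categorical(["Car", "Walk", None, "Car"], categories=["Bus", "Car", "Walk"])
)
car_mask = compare_categorical_eq(mode, "Car")
```

`car_mask` has the same values as `mode == "Car"`. Calling `compare_categorical_eq(mode, "Bike")` returns an all-False mask and emits one warning.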