Categoricals: correctly handle missing values, speed up simple string comparisons #46

Merged 5 commits on Mar 21, 2024


jpn--
Member

@jpn-- jpn-- commented Mar 20, 2024

This PR makes two changes, one bug fix and one performance improvement:

  • Fixes a bug in compiling XXX.isna() when XXX is a categorical variable. Previously, the missing-value code of -1 was mapped to the last categorical value instead of being treated as a missing value. The isna code has been patched to return the correct result in these cases.
  • Simple string comparisons against a categorical variable are now converted into integer comparisons. For example, if XXX is a categorical variable with categories ['Bus', 'Car', 'Walk'], then compiling XXX == 'Car' is converted to essentially XXX.cat.codes == 1, because 'Car' is category index 1. Compiling XXX == 'Bike', on the other hand, remains unchanged and now emits a warning, because this expression can never be true.
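
Both behaviours can be illustrated with plain pandas. This is only a rough sketch of the underlying idea, not the compiled code this PR changes; the series `mode` and the helper `category_code` are hypothetical names introduced for the example.

```python
import pandas as pd

# Hypothetical categorical variable standing in for XXX, with one missing value.
mode = pd.Series(
    pd.Categorical(["Car", None, "Walk", "Bus"], categories=["Bus", "Car", "Walk"])
)

codes = mode.cat.codes.to_numpy()            # missing values are encoded as -1
categories = mode.cat.categories.to_numpy()  # array of 'Bus', 'Car', 'Walk'

# Pitfall: using the codes as positional indexes lets -1 wrap around
# to the last category ('Walk') instead of signalling a missing value.
wrapped = categories[codes]

# Correct missing-value test on the integer codes.
is_missing = codes == -1


def category_code(label):
    """Return the integer code for `label`, or None if it is not a category."""
    index = mode.cat.categories
    return int(index.get_loc(label)) if label in index else None


# 'Car' is a known category, so the string comparison can be done on integers.
car_code = category_code("Car")
is_car = codes == car_code

# 'Bike' is not a category, so no integer code exists and the
# comparison XXX == 'Bike' can never be true.
bike_code = category_code("Bike")
```

Comparing small integer codes avoids repeated string comparisons, which is presumably where the speed-up mentioned in the PR title comes from.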
