Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve input data types #149

Closed
wants to merge 4 commits into from

Conversation

hacktuarial
Copy link
Contributor

As discussed in #138, when using DataFrameMapper with default=None, the current behavior is to create a np.array with the unselected columns. This has the undesired side effect of casting them to a common data type. This PR preserves the data types of unselected columns when default=None, input_df=True, output_df=True.

@hacktuarial
Copy link
Contributor Author

hacktuarial commented Apr 16, 2018

Tests failed on README.rst, line 187 for 2 reasons

  1. Columns are out of order. I will fix this
  2. Data types have changed; when df_out=True, it seems unnecessary to promote int values to float since a pd.DataFrame can accommodate heterogeneous types.

@hacktuarial
Copy link
Contributor Author

The only failing check now is number 2. With permission, I would like to edit the test so that the output columns of LabelBinarizer() are of type int, not float.

Copy link
Collaborator

@dukebody dukebody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I was going to comment requesting changes in your PR, but I did them myself in https://github.com/scikit-learn-contrib/sklearn-pandas/pull/153/files. If everything looks good I'm gonna merge that one.

@@ -504,3 +509,4 @@ Other contributors:
* Ritesh Agrawal (@ragrawal)
* Vitaley Zaretskey (@vzaretsk)
* Zac Stewart (@zacstewart)
* Timothy Sweetser (@hacktuarial)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please preserve the alphabetic order. :)

@hacktuarial
Copy link
Contributor Author

Looks great! Thanks for reviewing.

@hacktuarial hacktuarial closed this May 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants