Expose parameters from transformers as parameters of the mapper #159
Comments
@gwerbin That would be helpful for making the mapper's transformers "grid-searchable". We just need to be sure that these "deep" parameters are appropriately assigned to the nested classes with the
@devforfu One thing I just thought of is how to handle mappers like this:

    pipeline = Pipeline([
        ('vectorizer',
         DataFrameMapper([
             ('document_contents', [TextCleaner(), CountVectorizer()])
         ], df_out=False)),
        ('classifier', MultinomialNB())
    ])

(I made up the [...]) What would the step names be in this case? Maybe something like
@gwerbin I guess it could be the name of the class as well. As far as I can recall,
@gwerbin Thanks for your contribution! It would certainly be a very interesting feature, since it is currently impossible to tune the internal parameters of the DataFrameMapper when it sits inside a pipeline. Would you be willing to implement such a feature? Ideally it should be as similar in interface to sklearn as possible, to be compatible with sklearn's grid or randomized searches.
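To make the "similar in interface to sklearn" point concrete: sklearn addresses nested parameters with a `<component>__<parameter>` key, and the search classes rely on `set_params` accepting such keys. Below is a minimal, dependency-free sketch of that routing. `ToyVectorizer` and `set_deep_params` are hypothetical names invented for illustration; this is not the sklearn-pandas implementation, only one way the routing could look.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a parametric transformer (not a real sklearn class)."""

    def __init__(self, analyzer="word"):
        self.analyzer = analyzer


def set_deep_params(named_transformers, **params):
    """Route 'name__param' keys to the transformer registered under 'name'.

    Assumption: transformers are stored in a dict keyed by a step name,
    here the column name. How steps should be named is still open above.
    """
    for key, value in params.items():
        name, sep, attr = key.partition("__")
        if not sep or name not in named_transformers:
            raise ValueError(f"Invalid parameter {key!r}")
        target = named_transformers[name]
        if not hasattr(target, attr):
            raise ValueError(
                f"{type(target).__name__} has no parameter {attr!r}"
            )
        setattr(target, attr, value)


transformers = {"document_contents": ToyVectorizer()}
set_deep_params(transformers, document_contents__analyzer="char")
```

Rejecting unknown keys with `ValueError` mirrors how sklearn estimators treat invalid parameters, so a typo in a parameter grid fails loudly instead of being silently ignored.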
@dukebody I'm willing, but can't make any guarantees on a timeline. I've been pretty busy lately and don't want to commit to anything I can't deliver. I would also need time to familiarize myself with how parameters are passed in the current code. If anyone else wants to pick this up, I won't be offended.
@gwerbin If nobody else has started working on this PR, I could try to come up with a basic solution. Of course, we can join forces as soon as you become more available.
OK, I've started work on the proposed feature in my fork. There are a couple of new tests as well. Probably some code required to implement [...] Testing getters/setters now; next I'm going to check whether the methods are compliant with
Marking as "good first issue" to review the PR you created @devforfu
Currently, it can be hard to use a "parametric" transformer in a DataFrameMapper because the parameters of the underlying transformers aren't exposed. This means you can't adjust the parameters of one of those transformers using GridSearchCV or RandomizedSearchCV.
Example:
These are the params I get:
Naively, I would expect something like this
which would be very handy for, say, using GridSearchCV to compare word and character analyzers.
This seems like it shouldn't be too hard to implement. If there's interest I can start digging around the codebase to try to spend some time on it.
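As a rough illustration of the kind of output I have in mind, here is a small sketch that flattens the parameters of nested transformers into `<column>__<parameter>` keys, the way sklearn's `get_params(deep=True)` does for pipeline steps. `ToyVectorizer`, `shallow_params`, and `deep_params` are hypothetical names, and using the column name as the key prefix is an assumption, not something the mapper does today.

```python
import inspect


class ToyVectorizer:
    """Hypothetical stand-in for a parametric transformer."""

    def __init__(self, analyzer="word", lowercase=True):
        self.analyzer = analyzer
        self.lowercase = lowercase


def shallow_params(obj):
    """Read the constructor argument names and look up their current values."""
    names = inspect.signature(type(obj).__init__).parameters
    return {n: getattr(obj, n) for n in names if n != "self"}


def deep_params(features):
    """Flatten (column, transformer) pairs into 'column__param' keys."""
    out = {}
    for column, transformer in features:
        out[column] = transformer  # the nested object itself, as sklearn does
        for name, value in shallow_params(transformer).items():
            out[f"{column}__{name}"] = value
    return out


features = [("document_contents", ToyVectorizer(analyzer="char"))]
params = deep_params(features)
```

With keys like `document_contents__analyzer` available, a parameter grid could list `"word"` and `"char"` under that key, which is the comparison described above.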