Three changes (one in _glm.py and two in _glm_cv.py) to enable more detailed analysis of CV performance #935
At the moment:
The deviances in deviance_path_ are too low. This is because w_test (the test weights) is not rescaled to sum to 1.
The predict method of GeneralizedLinearRegressorCV does not work. This seems to be because it inherits from the GLM class, whose linear_predictor uses X @ self.coef_path_[alpha_index]. That indexing is incorrect when coef_path_ comes from CV, since the first dimension is then the number of folds.
The CV method provides deviance_path_ for (average) validation performance but not for train performance. Knowing how train and validation performance compare as penalization is reduced is useful in practice.
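The effect of unscaled test weights can be illustrated outside glum. The following is a minimal NumPy sketch, not glum's internal code: it assumes the fold deviance is accumulated as a weighted sum of Poisson unit deviances, and that "rescaled" means normalising the weights to sum to 1. The helper names poisson_unit_deviance and weighted_sum_deviance are hypothetical.

```python
import numpy as np


def poisson_unit_deviance(y, mu):
    """Poisson unit deviance 2 * (y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)  # avoids log(0); the term is zeroed below anyway
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * (log_term - (y - mu))


def weighted_sum_deviance(y, mu, w):
    """Weighted sum of unit deviances (assumed form; no division by sum(w))."""
    return float(np.sum(w * poisson_unit_deviance(y, mu)))


rng = np.random.default_rng(0)
n = 1000
y = rng.poisson(2.0, size=n).astype(float)
mu = np.full(n, 2.0)
w = np.full(n, 1.0 / n)  # weights normalised over the full data set

test_mask = np.arange(n) < 200  # a 20% held-out fold
w_test = w[test_mask]

# Without rescaling, the fold's weights sum to 0.2, so the deviance is shrunk by that factor.
dev_raw = weighted_sum_deviance(y[test_mask], mu[test_mask], w_test)
dev_rescaled = weighted_sum_deviance(
    y[test_mask], mu[test_mask], w_test / w_test.sum()
)
```

Under these assumptions dev_raw equals dev_rescaled multiplied by w_test.sum(), so the unscaled figure understates the per-unit-weight deviance by the fold's share of the total weight.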
To address some of the above, I have made, compiled and tested three changes (one in _glm.py and two in _glm_cv.py).
These three changes allow the user to create any predictions needed and to build both train and validation curves on the data used for CV. I have a notebook which does this using a version of glum that I built myself.
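As a rough illustration of what train and validation curves over a penalty path look like, here is a self-contained NumPy sketch. It does not use glum or the changes in this PR: it substitutes a closed-form ridge regression for the GLM and mean squared error for the deviance, and all names (ridge_fit, train_path, valid_path) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_folds = 200, 5, 4
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

alphas = np.array([10.0, 1.0, 0.1, 0.01])  # penalty decreasing along the path
folds = np.arange(n) % n_folds             # simple deterministic fold assignment


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution of (X'X + alpha * n * I) coef = X'y."""
    gram = X.T @ X + alpha * len(y) * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


train_path = np.empty((n_folds, len(alphas)))
valid_path = np.empty((n_folds, len(alphas)))
for k in range(n_folds):
    train, valid = folds != k, folds == k
    for j, alpha in enumerate(alphas):
        coef = ridge_fit(X[train], y[train], alpha)
        train_path[k, j] = np.mean((y[train] - X[train] @ coef) ** 2)
        valid_path[k, j] = np.mean((y[valid] - X[valid] @ coef) ** 2)

# Average over folds to get one curve per data split.
train_curve = train_path.mean(axis=0)
valid_curve = valid_path.mean(axis=0)
```

Plotting train_curve and valid_curve against alphas shows where reducing the penalty stops improving validation performance while train performance keeps improving.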
I have not tried to fix the predict method for CV.
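The indexing problem behind the failing predict method can be reproduced with plain arrays. This sketch assumes a coefficient path of shape (n_folds, n_alphas, n_features); the real layout in glum may carry further axes, so treat the shapes and the variable names here as illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_alphas, n_features = 5, 8, 3

# Hypothetical CV coefficient path: the first axis indexes folds, not alphas.
coef_path_cv = rng.normal(size=(n_folds, n_alphas, n_features))
X = rng.normal(size=(10, n_features))
alpha_index = 2

# Indexing the first axis selects fold 2 (all alphas), not alpha 2.
wrong = coef_path_cv[alpha_index]
try:
    X @ wrong  # (10, 3) @ (8, 3) has incompatible shapes
    matmul_failed = False
except ValueError:
    matmul_failed = True

# Selecting the alpha on the second axis gives one coefficient vector per fold.
per_fold_coef = coef_path_cv[:, alpha_index, :]  # shape (n_folds, n_features)
per_fold_pred = X @ per_fold_coef.T              # shape (n_rows, n_folds)
```

With per-fold coefficients available, the user can decide how to combine them, for example by predicting each row with the fold in which it was held out.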
A gist that runs on the new build and demonstrates the changes is at:
https://gist.github.com/alanchalk/cbb68ff9741ec89504d6f21b4b1ff344
(The gist mentions that if you are on a Mac you can pip install the revised build from Test PyPI. If you need Windows or Linux builds, I can try to add them.)