Scorers might need to know about training and testing data #3
Thanks for the note!
LOO is really a bad cross-validation strategy [*]. I wonder whether we should shape our design around making it work, or just push even harder for people not to use it.
[*] I had an insight yesterday into a simple reason why: the precision of the score measured on the test set grows as sqrt(n_test) (equivalently, its standard error shrinks as 1/sqrt(n_test)), as for any unbiased statistic. sqrt climbs very fast at the beginning, so in that part of the regime you are better off depleting the train set to benefit from the steep rise.
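To spell the footnote's scaling out, here is a sketch; sigma denotes the per-sample standard deviation of the loss, an assumption of this note rather than something stated above:

```latex
% Standard error of the score measured on a held-out test set of size n_test,
% assuming i.i.d. per-sample losses with standard deviation \sigma:
\[
  \operatorname{SE}\bigl(\widehat{\mathrm{score}}\bigr)
    \;\approx\; \frac{\sigma}{\sqrt{n_{\mathrm{test}}}}
  \qquad\Longrightarrow\qquad
  \text{precision} \;\propto\; \sqrt{n_{\mathrm{test}}}.
\]
```

Since sqrt(n) rises steepest at small n, the first few samples moved from train to test buy the most precision, which is the regime the footnote is pointing at.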
Is LOO more acceptable when used like some_score(cross_val_predict(X, y, cv=LOO()), y)?
No. I believe that's actually wrong: you are no longer computing the expectation of the error of the predictive model.
One way of convincing you that you are not computing the same thing is to
think of the correlation score: it's quite clear that it can be very
different between the two approaches.
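A minimal sketch of that divergence (not from the thread; the dataset, model, and ROC AUC as the correlation-style score are arbitrary illustrative choices):

```python
# Contrast the two quantities for a non-sample-wise score:
#  (1) the mean of per-fold ROC AUC scores,
#  (2) the ROC AUC of the pooled out-of-fold predictions from cross_val_predict.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Quantity (1): expectation over folds of the per-fold score.
per_fold = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("mean of per-fold AUC:", per_fold.mean())

# Quantity (2): one score over the pooled out-of-fold predictions.
pooled = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
print("AUC of pooled predictions:", roc_auc_score(y, pooled))
```

The two numbers generally differ, because the pooled version compares predictions produced by five different fitted models against each other.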
To convince you that it's the "wrong" thing, I think the right thought to have in mind is that the cross-val score is the expectation, over the test data, of the prediction error of the model (formula 1 in http://arxiv.org/pdf/1606.05201.pdf). It's actually a double expectation: if l_M is the expected error of the model, the score is E[l_M], where the outer expectation is taken over the data used to train the model.
http://projecteuclid.org/download/pdfview_1/euclid.ssu/1268143839 has a good analysis of this, including the classic split of l_M into approximation error and estimation error.
Using score(cross_val_predict) is not computing that. It's computing the expectation of l_M jointly over the train and test data. Given that the two are not independent, it is not the same thing as the successive expectation.
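A sketch of that contrast in symbols, reusing the thread's l_M and assuming a loss ℓ, a scoring functional S, and out-of-fold predictions ŷ_i (these extra symbols are this note's additions, not the papers' exact notation):

```latex
\begin{align*}
  l_M &= \mathbb{E}_{(x,y)}\bigl[\ell\bigl(M_{D_{\mathrm{train}}}(x),\, y\bigr)\bigr]
      && \text{error of the model fit on } D_{\mathrm{train}} \\
  \text{CV score} &\approx \mathbb{E}_{D_{\mathrm{train}}}\bigl[\, l_M \,\bigr]
      && \text{successive (double) expectation} \\
  \mathrm{score}\bigl(\texttt{cross\_val\_predict}\bigr)
      &= S\bigl(\{(\hat{y}_i,\, y_i)\}_{i=1}^{n}\bigr)
      && \text{one functional of the pooled pairs}
\end{align*}
```

When S does not decompose into per-sample terms (correlation, ROC AUC, R^2 through the test-set mean), and because the pooled pairs mix predictions from models fit on different, overlapping training sets, the last line is not the double expectation above.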
Actually, now that I realize it, "cross_val_predict" is probably used
massively to compute things that shouldn't be computed.
Thanks for the response. Yes, the case of correlation (or ROC, or anything where output over samples is compared) is convincing, but it is not immediately obvious that this issue extends to sample-wise measures. I'm a bit weak on this theory, but I think I get the picture. I hope I find time to read Arlot and Celisse to solidify it. And while the proposed intention of …
So the thing is that R^2, our default regression metric, is not a sample-wise measure.
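For reference, the usual held-out-set definition (a sketch of the standard formula, not a quote from the thread):

```latex
\[
  R^2 \;=\; 1 \;-\;
  \frac{\sum_{i \in \mathrm{test}} \bigl(y_i - \hat{y}_i\bigr)^2}
       {\sum_{i \in \mathrm{test}} \bigl(y_i - \bar{y}_{\mathrm{test}}\bigr)^2},
  \qquad
  \bar{y}_{\mathrm{test}} \;=\; \frac{1}{n_{\mathrm{test}}} \sum_{i \in \mathrm{test}} y_i .
\]
```

Every test sample enters the denominator through \bar{y}_{\mathrm{test}}, so the score does not decompose sample-wise, and with a single-sample test set (as in LOO) the denominator is zero.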
On Aug 23, 2016 02:55, Joel Nothman wrote:
This is not a PR because I haven't written it yet. It's more a very loose RFC.
I think scorers might need to be able to distinguish between training and test data.
I think there were more cases, but here are two obvious ones (a sketch illustrating both follows at the end of this comment):
R^2 is currently computed using the test-set mean. That seems really odd, and breaks for LOO.
When doing cross-validation, the set of classes present in each fold can change, which can affect things like macro-F1 in weird ways, and can also lead to errors with LOO (scikit-learn/scikit-learn#4546).
I'm not sure if this is a good enough case yet, but I wanted somewhere to take a note ;)
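A minimal sketch of both points (hypothetical examples, not from the issue; the data, model, and label values are arbitrary illustrations):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Point 1: with LOO every test fold holds a single sample, so the "test-set
# mean" used by R^2 is that sample itself and the total sum of squares in the
# denominator is zero; each per-fold score is degenerate.
X, y = make_regression(n_samples=20, n_features=3, noise=1.0, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(), scoring="r2")
print(scores)  # degenerate values (NaN or 0, depending on the scikit-learn version)

# Point 2: macro-F1 averages per-class F1 over the classes it sees, so a fold
# that is missing a class is scored over a different label set unless the full
# set of labels is passed explicitly.
y_true = np.array([0, 0, 1, 1])  # a fold in which class 2 never appears
y_pred = np.array([0, 1, 1, 1])
print(f1_score(y_true, y_pred, average="macro"))                    # averaged over {0, 1}
print(f1_score(y_true, y_pred, average="macro", labels=[0, 1, 2]))  # averaged over {0, 1, 2}
```

Both cases are ones where the scorer would behave differently if it knew about the full training data: the training-set mean for R^2, the full label set for macro-F1.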