Scorers might need to know about training and testing data #3

Open
amueller opened this issue Aug 22, 2016 · 5 comments

Comments

@amueller
Member

This is not a PR because I haven't written anything yet; it's more of a very loose RFC.

I think scorers might need to be able to distinguish between training and test data. I think there were more cases, but two are obvious (see the sketches below):

- R^2 is currently computed using the test-set mean. That seems really odd, and it breaks for LOO.
- When doing cross-validation, the classes present in a given fold can change, which can skew things like macro-F1 in weird ways and can also lead to errors with LOO (scikit-learn/scikit-learn#4546).
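Both failure modes are easy to demonstrate. A minimal sketch (the arrays here are made up for illustration; `f1_score` and its `labels` parameter are existing scikit-learn API):

```python
import numpy as np
from sklearn.metrics import f1_score

# 1) R^2 with the test-set mean: a LOO test fold holds a single sample,
#    so the denominator sum((y - y.mean())**2) is exactly zero and
#    R^2 = 1 - ss_res / ss_tot is undefined on every split.
y_true = np.array([3.0])   # a one-sample LOO test fold
y_pred = np.array([2.8])
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(ss_tot)  # 0.0 -> R^2 would divide by zero

# 2) Macro-F1 when a class is absent from the test fold: averaging over
#    only the classes present differs from averaging over all classes.
y_test = np.array([0, 0, 1])   # class 2 happens not to appear here
y_hat = np.array([0, 1, 1])
print(f1_score(y_test, y_hat, average="macro"))                    # ~0.667, over {0, 1}
print(f1_score(y_test, y_hat, average="macro", labels=[0, 1, 2]))  # ~0.444, over {0, 1, 2}
```

A scorer that knew which classes were seen at fit time could pass them as `labels` itself; today that information never reaches it.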

I'm not sure if this is a good enough case yet, but I wanted somewhere to take a note ;)

@GaelVaroquaux
Member

GaelVaroquaux commented Aug 23, 2016 via email

Thanks for the note!

LOO is really a bad cross-validation strategy [*]. I wonder if we should base our design on making it work, or just push even harder for people not to use it.

[*] I had an insight yesterday on a simple reason why: the measurement error of the score on the test set shrinks as 1/sqrt(n_test), as for any unbiased statistic; equivalently, the precision grows as sqrt(n_test), which climbs very fast in the beginning. In that part of the regime you are better off moving samples from the train set into the test set to benefit from the steep rise.
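For intuition, a quick simulation of that scaling (nothing scikit-learn-specific; the per-sample scores are drawn i.i.d. purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # spread of the per-sample scores, made up for the demo

for n_test in (1, 4, 16, 64, 256):
    # Standard deviation of the mean test score across many simulated
    # test sets: it tracks sigma / sqrt(n_test) closely.
    means = rng.normal(0.0, sigma, size=(10_000, n_test)).mean(axis=1)
    print(n_test, round(means.std(), 3), round(sigma / np.sqrt(n_test), 3))
```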

@jnothman
Member

Is LOO more acceptable when used like `some_score(cross_val_predict(X, y, cv=LOO()), y)`?
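Spelled out with a stand-in estimator and dataset (`cross_val_predict`, `LeaveOneOut`, and `r2_score` are existing scikit-learn APIs; the rest is illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_regression(n_samples=50, n_features=5, noise=1.0, random_state=0)
est = Ridge()

# One held-out prediction per sample, then a single R^2 over the pool:
# no per-fold test-set mean, hence no zero-variance denominator.
y_oof = cross_val_predict(est, X, y, cv=LeaveOneOut())
print(r2_score(y, y_oof))
```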


@GaelVaroquaux
Member

GaelVaroquaux commented Aug 23, 2016 via email

@jnothman
Member

Thanks for the response.

Yes, the case of correlation (or ROC, or anything where outputs across samples are compared) is convincing, but it's not immediately clear that the issue extends to sample-wise measures.

I'm a bit weak on this theory, but I think I get the picture. I hope I find time to read Arlot and Celisse to solidify it.

And while the proposed intention of cross_val_predict was visualisation, you're probably right that it licenses some invalid conclusions. :/

@amueller
Member Author

So the thing is that R^2, our default regression metric, is not a sample-wise measurement.
Also, for ROC curves (and AUC and average precision) there is an issue with interpolation, which should be done using the training set or a validation set.
Actually, I'm currently not sure what the right way to compute AUC is.
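To make the ambiguity concrete, a hedged sketch contrasting two ways of computing a cross-validated AUC (the dataset and estimator are stand-ins; both code paths are standard scikit-learn):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# (a) Mean of per-fold AUCs: what a scorer computes today, each fold
#     ranking only its own test samples.
per_fold = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")

# (b) One AUC over pooled out-of-fold scores: ranks scores produced by
#     five different fitted models against each other, which assumes
#     the scores are comparable across folds.
scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
pooled = roc_auc_score(y, scores)

print(np.mean(per_fold), pooled)  # close here, but not the same quantity
```

Neither is obviously "the" AUC, which is the point above.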



jnothman pushed a commit to jnothman/enhancement_proposals that referenced this issue Aug 17, 2020