feat: local c2st metric #1109

Merged
merged 32 commits into from
May 17, 2024

Conversation

Contributor

@JuliaLinhart JuliaLinhart commented Mar 22, 2024

What does this implement/fix? Explain your changes

L-C2ST(-NF) diagnostic: class, tutorial and tests.

Does this close any currently open issues?

Fixes #1005

Any relevant code examples, logs, error output, etc?

...

Any other comments?

...

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • [x] I have read and understood the contribution
    guidelines
  • [x] I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • [x] New and existing unit tests pass locally with my changes
  • [x] I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)

sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/simulators/gaussian_mixture.py (resolved)
tests/lc2st_test.py (outdated, resolved)
tests/lc2st_test.py (outdated, resolved)
tests/lc2st_test.py (outdated, resolved)
tests/lc2st_test.py (outdated, resolved)
@JuliaLinhart
Contributor Author

Thanks for this review! I'll do the changes, review the doc and fix typing issues.

@JuliaLinhart
Contributor Author

JuliaLinhart commented Mar 23, 2024

Almost all the suggestions by @agramfort have been addressed, except:

  • pandas is still used for the .groupby() method in the marginal_plot_with_proba_intensity function in sbi/analysis/plot.py.
  • the added simulator has no corresponding pytest script.

Future additional features could include:

  • a function that combines the marginal_plot_with_proba_intensity plots into a single pairplot.
  • a generic HypothesisTest class to centralize sbc, lc2st and other diagnostics relying on hypothesis testing.
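
For reference, a minimal sketch of how a per-bin average of predicted probabilities could be computed without the pandas .groupby() mentioned above. This is not the code in sbi/analysis/plot.py: the helper name mean_proba_per_bin, its arguments and the toy data are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict


def mean_proba_per_bin(samples, probas, bin_edges):
    """Average the probabilities of the samples that fall into each bin.

    Hypothetical helper that mimics a groupby-then-mean without pandas.
    `bin_edges` must be sorted; bin i covers [bin_edges[i], bin_edges[i + 1]).
    """
    grouped = defaultdict(list)
    n_bins = len(bin_edges) - 1
    for x, p in zip(samples, probas):
        idx = bisect_right(bin_edges, x) - 1
        if 0 <= idx < n_bins:  # samples outside the bin range are dropped
            grouped[idx].append(p)
    return {idx: sum(ps) / len(ps) for idx, ps in sorted(grouped.items())}


# Toy data (assumption): five 1D samples with one predicted probability each.
samples = [0.1, 0.2, 0.6, 0.9, 1.5]
probas = [0.4, 0.6, 0.5, 0.7, 0.9]
bin_means = mean_proba_per_bin(samples, probas, bin_edges=[0.0, 0.5, 1.0])
print(bin_means)
```

The dictionary maps each non-empty bin index to the mean probability of its samples; empty bins are simply absent, so a plotting function would have to decide how to color them.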

@janfb janfb mentioned this pull request Mar 25, 2024
Contributor

@janfb janfb left a comment

Awesome, thanks a lot for adding this!
Looks great already. I added a couple of comments and questions. I think this needs some renaming here and there to match our variable naming conventions and PEP 8.

sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/plot.py (outdated, resolved)
sbi/analysis/test_utils.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/simulators/gaussian_mixture.py (resolved)
tests/lc2st_test.py (outdated, resolved)

codecov bot commented Apr 8, 2024

Codecov Report

Attention: Patch coverage is 65.87537%, with 115 lines in your changes missing coverage. Please review.

Project coverage is 83.04%. Comparing base (9a8c7c0) to head (7e0bc11).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
- Coverage   83.93%   83.04%   -0.90%     
==========================================
  Files          90       92       +2     
  Lines        6930     7272     +342     
==========================================
+ Hits         5817     6039     +222     
- Misses       1113     1233     +120     
Flag        Coverage Δ
unittests   83.04% <65.87%> (-0.90%) ⬇️

Flags with carried forward coverage won't be shown.

Files                                Coverage Δ
sbi/analysis/__init__.py             100.00% <ø> (ø)
sbi/diagnostics/lc2st.py             96.03% <96.03%> (ø)
sbi/utils/analysis_utils.py          52.94% <20.00%> (-47.06%) ⬇️
sbi/simulators/gaussian_mixture.py   44.44% <44.44%> (ø)
sbi/analysis/plot.py                 61.32% <10.12%> (-5.92%) ⬇️

... and 1 file with indirect coverage changes
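
The percentages in this report follow from the hit and line counts in the diff table. The short check below recomputes them as a sketch; the patch size of 337 lines is inferred from the reported 115 missing lines and 65.87537% figure and is not stated in the report itself.

```python
# Counts copied from the coverage diff table above.
base_hits, base_lines = 5817, 6930
head_hits, head_lines = 6039, 7272

base_coverage = 100 * base_hits / base_lines  # about 83.94 (reported as 83.93)
head_coverage = 100 * head_hits / head_lines  # about 83.04

# Assumption (inferred, not stated in the report): the patch has 337 coverable
# lines, of which 115 are missed and 222 are hit.
patch_lines, patch_missed = 337, 115
patch_coverage = 100 * (patch_lines - patch_missed) / patch_lines

print(f"base: {base_coverage:.2f}%, head: {head_coverage:.2f}%")
print(f"patch: {patch_coverage:.5f}%")
```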

@JuliaLinhart
Contributor Author

I think I have addressed all your comments and requests @janfb, except the one where I should get rid of the groupby method from pandas. I will try to fix this as soon as I can.

@JuliaLinhart
Contributor Author

All done @janfb

Contributor

@janfb janfb left a comment

Thanks for the updates!

I added a couple of additional comments, but overall I think we are close to convergence 🙂

I had a look at the tutorial as well, and it reads very well! I think it's great to have a comprehensive introduction to the method so that users know how to use it and how to interpret the results. Just one comment: at the moment you are generating the different plots, but you are not explaining them. I think it would be essential to add an explanation and interpretation to each diagnostic plot.

Thanks for the effort!

sbi/analysis/test_utils.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/simulators/gaussian_mixture.py (resolved)
tests/lc2st_test.py (outdated, resolved)
tutorials/17_diagnostics_lc2st.ipynb (outdated, resolved)
@psteinb
Contributor

psteinb commented Apr 15, 2024

I am happy to help review this PR. But given the activity that is already visible, I'd push this effort to a later stage. Feel free to ping me if my help is needed.

@JuliaLinhart
Contributor Author

JuliaLinhart commented Apr 22, 2024

Response to review from @janfb: the above commit addresses the following requests:

  • rename the tutorial to 18_... and add descriptions of the plots and results
  • move the content of analysis/test_utils.py to sbi/utils/analysis_utils.py
  • describe the p-value computation in lc2st.py
  • give the tests in lc2st_test.py explicit names and descriptions, and adapt the code to test the true positive and true negative rates of the hypothesis test. (Runtime is longer for 100 test runs, about 7 min, but otherwise the "rate" is not a trustworthy empirical result in my opinion.)

Only remaining question: do you want to make the theta_o generation in the LC2ST_NF test the user's responsibility?
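
To make the point about the number of test runs concrete, here is a toy sketch of an empirical p-value and of the true positive and true negative rates estimated over repeated runs. It is not the code in lc2st.py or in the tests: the function names, the Gaussian toy statistics and the shift of 4 are assumptions chosen for illustration. With 100 runs and a true rate near 0.95, the standard error of the estimated rate is roughly sqrt(0.95 * 0.05 / 100) ≈ 0.02, which is why far fewer runs give an unreliable estimate.

```python
import random


def empirical_p_value(t_obs, t_null):
    """Share of null statistics that are at least as large as the observed one."""
    return sum(t >= t_obs for t in t_null) / len(t_null)


def rejection_rate(p_values, alpha=0.05):
    """Fraction of test runs in which the null hypothesis is rejected."""
    return sum(p < alpha for p in p_values) / len(p_values)


rng = random.Random(0)
n_runs, n_null = 100, 200

p_values_h0, p_values_h1 = [], []
for _ in range(n_runs):
    # Toy statistics (assumption): standard normal under the null hypothesis,
    # shifted by 4 when the null hypothesis is false.
    t_null = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    p_values_h0.append(empirical_p_value(rng.gauss(0.0, 1.0), t_null))
    p_values_h1.append(empirical_p_value(rng.gauss(4.0, 1.0), t_null))

true_negative_rate = 1.0 - rejection_rate(p_values_h0)  # H0 true, not rejected
true_positive_rate = rejection_rate(p_values_h1)  # H0 false, rejected
print(f"TNR: {true_negative_rate:.2f}, TPR: {true_positive_rate:.2f}")
```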

Contributor

@janfb janfb left a comment

Thanks for the updates - looks good!

One more thing, the theta_o generation in the LC2ST_NF needs clarification.

Also, 8 tests are failing at the moment.

sbi/diagnostics/lc2st.py (outdated, resolved)
tests/lc2st_test.py (outdated, resolved)
@janfb
Contributor

janfb commented Apr 22, 2024

> I am happy to help review this PR. But given the activity that is already visible, I'd push this effort to a later stage. Feel free to ping me if my help is needed.

That's great @psteinb ! Do you have capacity to review the tutorial? That'd be great 🙏

@JuliaLinhart
Contributor Author

JuliaLinhart commented Apr 22, 2024

Oh, I don't know why the tests fail... they pass when I run them locally!
Any idea why that would be the case?

@JuliaLinhart
Contributor Author

JuliaLinhart commented Apr 23, 2024

I think the npe.sample method (where npe is a DensityEstimator object) behaves differently in the CI tests than on the branch I am working on. It seems to be related to the handling of the context variable, but I am not sure because I can't verify anything locally.

@michaeldeistler
Contributor

michaeldeistler commented Apr 23, 2024

Just rebase on main:

git checkout main
git pull
git checkout 1005-implement-l-c2st-metric
git rebase main
git push -f

@JuliaLinhart JuliaLinhart force-pushed the 1005-implement-l-c2st-metric branch from 6e7f3f7 to 32b375a on April 23, 2024 09:52
@JuliaLinhart
Contributor Author

Oh right... Sorry! So I fixed it, but had to reshape a lot...

@JuliaLinhart
Contributor Author

JuliaLinhart commented Apr 29, 2024

I did some experiments and I was wondering how you chose the parameters for the MLPClassifier from sklearn corresponding to the default classifier=mlp in c2st and the LC2ST class. In particular, early_stopping seems to be a limitation in some application examples (and, to a lesser extent, so does the regularization parameter alpha).

@psteinb you worked on that, right?

For me, the MLPClassifier from sklearn with default parameters, except for alpha=0 and max_iter=25000, yields pretty stable results, but is prone to overfitting. I therefore suggest that, if we stick with your mlp, we ensemble over different seeds (5 models by default). This yields more stable results with smaller confidence regions, but is slower (and the small confidence regions can lead to high rejection rates). This is my last commit, and I added tests.

Let me know what you think :)
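
As a rough illustration of ensembling over seeds, the sketch below averages the predicted probabilities of several models that differ only in their seed. It does not use sklearn and is not the implementation in this PR: NoisyThresholdClassifier is a hypothetical toy stand-in for an MLP, and SeedEnsemble is an assumed name for the wrapper.

```python
import math
import random


class NoisyThresholdClassifier:
    """Toy stand-in for an MLP (hypothetical): its fit depends on its seed."""

    def __init__(self, random_state):
        self.random_state = random_state
        self.threshold_ = None

    def fit(self, X, y):
        rng = random.Random(self.random_state)
        mean0 = sum(x for x, label in zip(X, y) if label == 0) / y.count(0)
        mean1 = sum(x for x, label in zip(X, y) if label == 1) / y.count(1)
        # Midpoint between the class means plus seed-dependent noise, to mimic
        # the run-to-run variability of a stochastic training procedure.
        self.threshold_ = (mean0 + mean1) / 2 + rng.gauss(0.0, 0.3)
        return self

    def predict_proba(self, X):
        # Probability of class 1 for each one-dimensional input.
        return [1.0 / (1.0 + math.exp(-(x - self.threshold_))) for x in X]


class SeedEnsemble:
    """Average the predictions of members that differ only in their seed."""

    def __init__(self, make_classifier, n_members=5):
        self.members = [make_classifier(seed) for seed in range(n_members)]

    def fit(self, X, y):
        for member in self.members:
            member.fit(X, y)
        return self

    def predict_proba(self, X):
        per_member = [member.predict_proba(X) for member in self.members]
        return [sum(column) / len(column) for column in zip(*per_member)]


X = [-1.2, -0.8, -0.3, 0.4, 0.9, 1.5]
y = [0, 0, 0, 1, 1, 1]
ensemble = SeedEnsemble(NoisyThresholdClassifier, n_members=5).fit(X, y)
ensemble_probas = ensemble.predict_proba(X)
print(ensemble_probas)
```

Averaging damps the seed-dependent noise of the individual members, which matches the observation that ensembling gives more stable results at the cost of training several models.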

Contributor

@psteinb psteinb left a comment

This is an awesome PR. Thanks @JuliaLinhart and all reviewers so far. I felt very humble reviewing it as a lot of dedication, discipline and rigor went into it.

I focused on reviewing the tutorial. Note that I couldn't directly suggest edits in the last quarter of the notebook; for some reason, the GitHub web page always wanted to remove images whenever I made a code suggestion.

sbi/analysis/plot.py (resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
tutorials/18_diagnostics_lc2st.ipynb (outdated, resolved)
@JuliaLinhart
Contributor Author

There you go @psteinb :) Thanks a lot for your review!

Contributor

@janfb janfb left a comment

Thanks for looking into the classifier performance and variance!

There is one open question from my previous review and one question regarding the ensemble training.

Thanks! 🙏

sbi/diagnostics/lc2st.py (outdated, resolved)
sbi/diagnostics/lc2st.py (outdated, resolved)
@psteinb psteinb self-requested a review May 13, 2024 11:11
Contributor

@psteinb psteinb left a comment

The tutorial looks good to me!

@JuliaLinhart
Contributor Author

Hello everyone. I propose in this last commit a solution to the ensembling / cross-val issue stated above.

  • I chose to create an EnsembleClassifier class whose predicted probabilities are the average prediction over all classifiers (which differ only in their random_state). Cross-val is just a training strategy that can also be performed on an ensemble classifier.

Here's something to think about: The cross-val scores are the test statistics obtained for each fold. If someone wishes to perform a test, i.e. compute p-values, with the cross-val strategy, the test statistic becomes the average statistic over all folds. This is different from ensembling, where the test statistic is computed on the average prediction.

  • I also added a section "Classifier choice and calibration data size: how to ensure meaningful test results" to the tutorial, if you want to check it out @psteinb.

Finally, I added a small description for the LC2ST_NF to be more explicit on what the theta_o are. @janfb let me know if that answers your questions.
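
The difference between the two aggregation orders described above can be seen on a small numerical example. This is only a sketch: it assumes the test statistic is the mean squared deviation of the predicted probabilities from 0.5, uses made-up predictions from three hypothetical classifiers, and applies both orders to the same evaluation points, whereas real cross-validation folds are evaluated on different points.

```python
def mse_from_chance(probas):
    """Mean squared deviation of predicted probabilities from chance level 0.5."""
    return sum((p - 0.5) ** 2 for p in probas) / len(probas)


# Made-up predictions (assumption): three classifiers, same four evaluation points.
predictions = [
    [0.55, 0.40, 0.62, 0.48],
    [0.45, 0.52, 0.58, 0.50],
    [0.60, 0.47, 0.51, 0.43],
]

# Ensembling: average the predictions first, then compute one statistic.
avg_prediction = [sum(column) / len(column) for column in zip(*predictions)]
stat_of_average = mse_from_chance(avg_prediction)

# Cross-val style: compute one statistic per model, then average the statistics.
average_of_stats = sum(mse_from_chance(p) for p in predictions) / len(predictions)

print(f"statistic of the average prediction: {stat_of_average:.5f}")
print(f"average of the per-model statistics: {average_of_stats:.5f}")
```

Because the squared deviation is convex, the statistic of the averaged prediction can never exceed the average of the individual statistics, so the two strategies generally lead to different test statistics and p-values.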

@JuliaLinhart JuliaLinhart requested a review from janfb May 16, 2024 13:23
Contributor

@janfb janfb left a comment

Thanks @JuliaLinhart for these additional changes and for the commitment during this long PR! 👏
It all looks good to me now!

Thanks also @psteinb for your review on this.

Will be merged soon 🎉

@janfb janfb self-assigned this May 16, 2024
@janfb janfb added the enhancement New feature or request label May 16, 2024
@janfb janfb changed the title from "1005 implement l c2st metric" to "feat #1005: local c2st metric" May 16, 2024
@janfb janfb changed the title from "feat #1005: local c2st metric" to "feat: local c2st metric" May 16, 2024
@JuliaLinhart
Contributor Author

Thank you for your valuable comments and reviews!!

@janfb janfb merged commit 3c1e725 into main May 17, 2024
7 checks passed
@janfb janfb deleted the 1005-implement-l-c2st-metric branch May 17, 2024 10:35
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement l-c2st (local validation without reference samples)
5 participants