Mini sbibm #1335
Conversation
Alright, on the current examples the output looks like this: runtime increases linearly with the number of training simulations (currently 2k takes ~10 min on my laptop; with 1k it was about 5 min). It might also be nice to print runtimes on the right. Overall runtime, of course, also depends on how many different methods are included. I think some limited control over what is run would be nice, e.g.:
pytest --bm         # all base inference classes on defaults (similar to current behavior)
pytest --bm=NPE     # NPE with e.g. different density estimators
pytest --bm=SNPE    # SNPE_ABC 2-round test
...
Either way, there needs to be a limit on what is run, and every configuration should finish in a reasonable amount of time.
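For illustration, here is a minimal sketch of how such a --bm selector could be wired up in a conftest.py using standard pytest hooks. The option semantics and the "benchmark" marker name are assumptions for this sketch, not necessarily what the PR implements:

# conftest.py (sketch): gate benchmark tests behind a --bm option.
import pytest

def pytest_addoption(parser):
    # "pytest --bm" runs all benchmarks, "pytest --bm=NPE" restricts by name.
    parser.addoption(
        "--bm", action="store", nargs="?", const="all", default=None,
        help="Run benchmark-marked tests, optionally restricted to one method.",
    )

def pytest_collection_modifyitems(config, items):
    bm = config.getoption("--bm")
    for item in items:
        is_benchmark = "benchmark" in item.keywords
        if bm is None and is_benchmark:
            item.add_marker(pytest.mark.skip(reason="benchmarks only run with --bm"))
        elif bm is not None and not is_benchmark:
            item.add_marker(pytest.mark.skip(reason="regular tests skipped in benchmark mode"))
        elif bm not in (None, "all") and is_benchmark and bm.lower() not in item.name.lower():
            item.add_marker(pytest.mark.skip(reason=f"not selected by --bm={bm}"))

The name-based filter is crude (NPE would also match SNPE tests), but something along these lines keeps every configuration small enough to finish in a reasonable time.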
Overall, this is really great to have - thanks a lot for pushing this!
Love the relative coloring of the results 🎉
Added a couple of comments and questions for clarification.
Codecov Report: All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main    #1335       +/-  ##
===========================================
- Coverage   89.38%   78.28%   -11.11%
===========================================
  Files         119      119
  Lines        8905     8905
===========================================
- Hits         7960     6971      -989
- Misses        945     1934      +989
Thanks for the edits!
All looks very good. I just have a couple of suggestions for renaming and removing comments.
The sample files are small, so no need to save them via git-lfs?
Thanks for the review. In total, the sample files are 2.78 MB, so small but not super small. We could reduce this by saving only 1000 of the 10_000 posterior samples (we only use 1k for evaluation at the moment anyway).
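As a rough illustration of that size reduction (the file name here is made up), the stored reference samples could simply be thinned before committing them:

# Sketch: keep only 1_000 of the 10_000 stored reference posterior samples.
import torch

samples = torch.load("gaussian_linear_reference_posterior.pt")  # shape (10_000, dim)
torch.save(samples[:1_000], "gaussian_linear_reference_posterior.pt")

That would bring the total from roughly 2.78 MB down to around 0.3 MB.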
Yes, this will live in
This looks good now, thanks a ton! 🙏
What we would need is one place with a bit of documentation on how to use this. It would be for developers, so no need to make a tutorial. But maybe add a paragraph about this to contribute.md, e.g., where we write about the tests?
Yeah, contribute.md is a good place to do this. And yes, it should not interfere with packaging; it just adds a bit to the git repo.
This should now be ready to be merged. The last version with xdist support unfortunately did not print the results to the console and instead saved them in pickle files. The new version improves the xdist support:
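For context, a sketch of one way to make results survive pytest-xdist and still print to the console at the end; the file layout and helper name are assumptions, not necessarily what this PR does:

# conftest.py (sketch): collect benchmark results across xdist workers.
import json
import os
from pathlib import Path

RESULTS_DIR = Path(".bm_results")

def record_result(method: str, task: str, metric: float) -> None:
    # Called from inside a benchmark test. Each xdist worker writes to its own
    # file (PYTEST_XDIST_WORKER is set by pytest-xdist), so there are no write conflicts.
    RESULTS_DIR.mkdir(exist_ok=True)
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    with open(RESULTS_DIR / f"{worker}.jsonl", "a") as f:
        f.write(json.dumps({"method": method, "task": task, "metric": metric}) + "\n")

def pytest_terminal_summary(terminalreporter, exitstatus, config):
    # Runs on the controller after all workers have finished; prints the aggregated table.
    if not RESULTS_DIR.exists():
        return
    rows = []
    for path in RESULTS_DIR.glob("*.jsonl"):
        rows += [json.loads(line) for line in path.read_text().splitlines()]
    terminalreporter.write_line("")
    terminalreporter.write_line("Benchmark results:")
    for r in sorted(rows, key=lambda r: (r["task"], r["metric"])):
        terminalreporter.write_line(f"  {r['task']:<20} {r['method']:<10} {r['metric']:.3f}")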
Thanks again for pushing this. This is great!
What does this implement/fix? Explain your changes
This is a draft for some "benchmarking" capabilities integrated into sbi.
With pytest, we can roughly check that everything works by passing all tests. Some tests ensure that the overall methodology works "sufficiently" well on simplified Gaussian analytic examples. Certain changes might still pass all tests but nonetheless degrade performance/accuracy. Specifically, when implementing new methods or, e.g., changing default parameters, it is important to check that the implementation not only passes the tests but also works sufficiently well.
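As a rough sketch of the kind of check meant here (assuming the public sbi API with NPE and the c2st metric utility; sizes and the observation are made up for illustration), one could compare NPE posterior samples on a conjugate Gaussian task against its analytic posterior:

import torch
from torch.distributions import MultivariateNormal
from sbi.inference import NPE
from sbi.utils.metrics import c2st

sigma, dim, num_sims = 0.5, 2, 2_000
prior = MultivariateNormal(torch.zeros(dim), torch.eye(dim))

# Gaussian linear simulator: x | theta ~ N(theta, sigma^2 I).
theta = prior.sample((num_sims,))
x = theta + sigma * torch.randn_like(theta)

inference = NPE(prior=prior)
inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()

x_o = torch.zeros(dim)
approx_samples = posterior.sample((1_000,), x=x_o)

# Conjugate analytic posterior: N(x_o / (1 + sigma^2), sigma^2 / (1 + sigma^2) * I).
post_var = sigma**2 / (1 + sigma**2)
true_posterior = MultivariateNormal(x_o / (1 + sigma**2), post_var * torch.eye(dim))
true_samples = true_posterior.sample((1_000,))

# C2ST close to 0.5 means the approximate and true posteriors are indistinguishable.
score = c2st(approx_samples, true_samples)
print(f"NPE on Gaussian linear: C2ST = {score.item():.3f}")

A benchmark run would report this score in the final table rather than asserting on a hard threshold.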
Does this close any currently open issues?
Prototype for #1325
Any relevant code examples, logs, error output, etc?
It should now work such that one simply has to use the custom --bm flag, e.g. pytest --bm. This flag disables regular testing and instead switches to a "benchmark" mode, which only runs tests that are marked as such and always passes them. Instead of asserting, these tests cache a metric for how well an implemented method solves a specific task (currently some examples in "bm_test.py").
Once it finishes, instead of passed/failed, it returns a table with the metric (we can still color methods that perform worse than expected).
Any other comments?