
[DRAFT] Importance sampling #47

Merged
yallup merged 12 commits into htjb:master from importance on Nov 24, 2023

Conversation

@yallup (Collaborator) commented Oct 26, 2023

This PR implements importance sampling from a margarine flow. This is something like neural importance sampling, only with nested sampling doing the information acquisition; neural importance nested sampling, I guess?

Workflow as follows:

  1. run_pypolychord.py - A copy of the standard pypolychord script: a 4D Gaussian, restricted to a hypercube prior for now for simplicity.
  2. train_maf.py - Trains a margarine MAF on the polychord run.
  3. importance.py - Uses the trained MAF to importance sample the original likelihood again. I've added an integrate function to the margarine.marginal_stats.calculate class to do this (see the sketch below the results):
IS integral: 0.064 +/- 0.000
IS efficiency: 0.901
NS integral: 0.063 +/- 0.011
NS efficiency: 0.004

This may have broader use (provided I've done this right) as an afterburner to improve the nested sampling error estimate for moderate-dimension problems. To be discussed, and tested to see whether this actually works as well as I claim!
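For reference, a minimal numpy sketch of the estimator the integrate function is meant to implement; flow_sample, flow_log_prob and log_likelihood are placeholder callables standing in for the trained MAF and the likelihood, not margarine's actual API:

```python
import numpy as np

def importance_integral(flow_sample, flow_log_prob, log_likelihood, n=10_000):
    """Monte Carlo estimate of Z = E_q[L(x)/q(x)], drawing x from the flow q."""
    x = flow_sample(n)                                # x ~ q, the trained MAF
    log_w = log_likelihood(x) - flow_log_prob(x)      # log importance weights
    w = np.exp(log_w - log_w.max())                   # rescale for stability
    z = w.mean() * np.exp(log_w.max())                # IS integral estimate
    z_err = w.std(ddof=1) / np.sqrt(n) * np.exp(log_w.max())  # Monte Carlo error
    return z, z_err
```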

@codecov-commenter commented Oct 26, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison: base (5fcb811) 81.07% vs. head (60fe6b0) 82.35%.


Additional details and impacted files
@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
+ Coverage   81.07%   82.35%   +1.27%     
==========================================
  Files           5        5              
  Lines         539      578      +39     
==========================================
+ Hits          437      476      +39     
  Misses        102      102              
Files                          Coverage Δ
margarine/marginal_stats.py    86.60% <96.15%> (+7.15%) ⬆️


@htjb (Owner) commented Oct 26, 2023

Hi @yallup, this looks very cool! We discussed a bit offline, but to record some of my thoughts on this: my understanding is that you want to perform the integration using importance sampling between the trained flow $\tilde{L}$ and the actual likelihood $L$, which (according to the code) amounts to

$Z = \int \frac{L(x)}{\tilde{L}(x)} \tilde{L}(x)\, dx = \mathbb{E}\left[\frac{L(x)}{\tilde{L}(x)}\right] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{L(x_i)}{\tilde{L}(x_i)}$

where $x \sim \tilde{L}(x)$. To me this looks a bit strange because there is no prior term, and we discussed offline that we might need to account for the prior. I think the prior would come in here:

$Z = \int \frac{L(x)}{\tilde{L}(x)} \tilde{L}(x) \pi(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{L(x_i)}{\tilde{L}(x_i)} \pi(x_i)$

but I'm not 100% sure.
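If the prior does belong in the weights, it would be a one-line change to the sketch further up the thread; again the callables here (including log_prior) are placeholders, not margarine's API:

```python
import numpy as np

def importance_integral_with_prior(flow_sample, flow_log_prob,
                                   log_likelihood, log_prior, n=10_000):
    """As before, but estimating Z = E_q[L(x) pi(x) / q(x)] with x ~ q."""
    x = flow_sample(n)
    log_w = log_likelihood(x) + log_prior(x) - flow_log_prob(x)  # prior term added
    w = np.exp(log_w - log_w.max())
    return w.mean() * np.exp(log_w.max())
```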

Am I correct in interpreting the efficiency as a measure of how accurate the flow is? We define the weights as $w = \frac{L(x)}{\tilde{L}(x)}$ and the efficiency as the ratio of $n_{eff}$ to the total number of samples, so if the flow were perfect then $n_{eff} = N$ and $\mathrm{eff} = N/N = 100\%$.

It's not in any way a measure of how efficient the nested sampling run is? That is what we get when we calculate $n_{eff}$ with the nested sampling weights and divide by the number of samples drawn. This last point is relevant to some other discussions we have been having offline, where the sampling efficiency of a nested sampling run measures how quickly it reaches the posterior bulk, which is useful for working out whether you have made a good prior choice.
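For concreteness, one common definition of $n_{eff}$ for weighted samples is the Kish effective sample size; I'm assuming that's what is meant here. The same formula applies to both sets of weights, just with different interpretations:

```python
import numpy as np

def kish_efficiency(weights):
    """n_eff / N for unnormalised weights: (sum w)^2 / (N * sum w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w.size * (w ** 2).sum())

# With IS weights w_i = L(x_i)/q(x_i) this measures how accurate the flow is;
# with nested sampling posterior weights it instead measures how quickly the
# run reached the posterior bulk.
```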

@htjb (Owner) commented Oct 30, 2023

Could we move run_polychord.py, train.py and importance.py into the tutorial notebook in the notebook/ folder please?

Ohh and write a test to check that the evidence recovered from importance.py is equivalent to the value from run_polychord.py? Thanks!

@yallup (Collaborator, Author) commented Nov 1, 2023

> Could we move run_polychord.py, train.py and importance.py into the tutorial notebook in the notebook/ folder please?
>
> Ohh and write a test to check that the evidence recovered from importance.py is equivalent to the value from run_polychord.py? Thanks!

Test added and files removed. I wasn't sure how you wanted to approach documenting this in the tutorial, so I've left that for now (probably to be added after we decide whether this works).
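For anyone following along, a self-contained sketch of the kind of consistency test requested above, using a toy Gaussian with known evidence in place of the actual polychord run and trained flow (the real test in the PR may look different):

```python
import numpy as np
from scipy import stats

def test_importance_integral_recovers_known_evidence():
    """Toy version of the requested check: a likelihood with known integral
    z_true and a deliberately mismatched proposal standing in for the flow."""
    rng = np.random.default_rng(0)
    d, n, z_true = 4, 100_000, 0.064
    like = stats.multivariate_normal(np.full(d, 0.5), 0.01 * np.eye(d))
    flow = stats.multivariate_normal(np.full(d, 0.5), 0.02 * np.eye(d))
    x = flow.rvs(n, random_state=rng)
    # log importance weights: log L(x) - log q(x), with L integrating to z_true
    log_w = like.logpdf(x) + np.log(z_true) - flow.logpdf(x)
    z_hat = np.exp(log_w).mean()
    assert abs(z_hat - z_true) / z_true < 0.05
```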

@htjb (Owner) commented Nov 8, 2023

@yallup sure, no worries, we can add a tutorial later down the line. Can we bump the version number to 1.2.0 here? It needs doing in the README and setup.py.

I just changed the KL divergence function so that it returns a dictionary rather than a pandas table. It's a bit nicer to use and more consistent with the new importance sampling function you have added here.

@htjb (Owner) commented Nov 8, 2023

Ohh, I think the master branch might need merging in here too...

@yallup yallup removed the request for review from williamjameshandley November 10, 2023 11:26
@htjb (Owner) left a review comment

Looks good to me. Go ahead and squash and merge when you get a chance! Thanks @yallup! 🚀

@yallup yallup merged commit dff0971 into htjb:master Nov 24, 2023
4 checks passed
@yallup yallup deleted the importance branch November 24, 2023 09:33
@htjb htjb mentioned this pull request Nov 29, 2023