
Importance sampling unexpected behaviour [and plot gallery broken] #132

Open
Stefan-Heimersheim opened this issue Jan 11, 2021 · 22 comments
Labels: bug (Something isn't working)

@Stefan-Heimersheim
Collaborator

Stefan-Heimersheim commented Jan 11, 2021

Describe the bug
NestedSamples.importance_sample changes the weights in an unexpected way. E.g. importance sampling with a constant offset of logL_new=0.1, a = chain.importance_sample(0.1, action="add"), changes the weights, which (I think) it should not. This change of weights influences posteriors, means, standard deviations etc. I think the easiest check is to look at the weights directly.

Resampling with a negative offset also appears to shorten the chain; I would not expect that either.

I might just be misunderstanding the importance sampling procedure in nested sampling, but this is not at all what I expected from e.g. MCMC samples. Edit: My original goal was to use importance sampling to understand which of my likelihoods produces which constraints on my parameters; this simple "add 0.1" example was just the minimal case behaving unexpectedly.

To Reproduce

import matplotlib.pyplot as plt

# chain is a NestedSamples object loaded from a PolyChord run
a = chain.importance_sample(0.1, action="add")
b = chain.importance_sample(-0.1, action="add")
plt.plot(chain.weights, label='original weights')
plt.plot(a.weights, label='+0.1 weights')
plt.plot(b.weights, label='-0.1 weights')
plt.legend()
plt.show()

Expected behavior
I would expect the weights to stay the same, except for re-normalization. Instead they change.

Screenshots
Here I simply plot the weights.
[Figure_1: the three weight arrays plotted against sample index]

Additional context
My samples have been generated with cobaya running pypolychord, then loading the "_polychord_raw" directory with anesthetic. I tried both the current git master branch and version python-anesthetic-git r319.2351380-2 from the AUR.

I tried to reproduce the issue with the sample file from the plot gallery, but the code there no longer works:

import requests
import tarfile

for filename in ["plikHM_TTTEEE_lowl_lowE_lensing.tar.gz","plikHM_TTTEEE_lowl_lowE_lensing_NS.tar.gz"]:
    github_url = "https://github.com/williamjameshandley/cosmo_example/raw/master/"
    url = github_url + filename
    open(filename, 'wb').write(requests.get(url).content)
    tarfile.open(filename).extractall()

from anesthetic import NestedSamples
nested_root = 'plikHM_TTTEEE_lowl_lowE_lensing_NS/NS_plikHM_TTTEEE_lowl_lowE_lensing'
nested = NestedSamples(root=nested_root)

returns RuntimeError: Not a valid nested sampling run. Require logL > logL_birth.

@lukashergt
Collaborator

Hi Stefan,

The thing with nested sampling is that you need to watch out that logL stays greater than logL_birth. This corresponds to each nested sampling "step" needing to increase in likelihood (otherwise the step would be discarded). Thus, when you subtract from the logL values, quite a few might end up less than their corresponding logL_birth values and consequently need to be discarded. This effectively decreases the number of live points and thus the accuracy of your nested sampling run (reflected, for example, in larger errors on the evidence).
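
As a quick check of how severe this is, one can count how many points a given shift would invalidate (a sketch; chain is the NestedSamples object from the original post, with its logL and logL_birth columns):

import numpy as np

offset = -0.1
# Points whose shifted logL no longer exceeds their birth contour would
# have to be discarded from the nested sampling run:
n_bad = np.sum(chain.logL + offset <= chain.logL_birth)
print(f"{n_bad} of {len(chain)} points would be discarded")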

In any case, changing logL will affect the weights, which are computed from logL and logL_birth, and which in the current version I believe are normalised such that the maximum weight is 1. Is that correct, @williamjameshandley? So I am not surprised that the weights change. Keep in mind that the volume dlogX of the likelihood shell factors into the weights as well (which is quite different from MCMC runs).
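
For intuition, here is a rough sketch of how posterior weights arise in nested sampling, using the textbook single-run approximation X_i ~ exp(-i/nlive) (anesthetic actually derives the volumes from logL_birth, so this is illustrative only):

import numpy as np

def rough_posterior_weights(logL, nlive):
    # logL: dead points sorted by increasing likelihood
    i = np.arange(1, len(logL) + 1)
    X = np.exp(-i / nlive)                               # shrinking prior volume
    dX = np.concatenate(([1.0 - X[0]], X[:-1] - X[1:]))  # likelihood-shell volumes
    logw = logL + np.log(dX)                             # w_i ~ L_i * dX_i
    return np.exp(logw - logw.max())                     # max weight normalised to 1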

@williamjameshandley
Collaborator

Hi Stefan,

You should be able to get the behavior you're looking for by loading the samples as (or converting them into) MCMCSamples and then reweighting. The critical difference is that at the moment you are reweighting NestedSamples into another valid nested sampling run (with all the advantages, e.g. being able to compute evidences and KL divergences). Suffice it to say that this process is non-trivial; Lukas and I plan to write it up in the near future so that the theory behind it is made much clearer (rather than being hidden across several pull requests).
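
A minimal sketch of that route (chain as in the original post; constructing MCMCSamples directly from the NestedSamples object, as done later in this thread):

from anesthetic import MCMCSamples

mcmc = MCMCSamples(chain)  # re-interpret as plain weighted samples
a = mcmc.importance_sample(0.1, action='add')
# a constant offset should now leave the weights unchanged (up to normalisation)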

Best,
Will

@andrewfowlie
Collaborator

Just to be sure we're on the same page: in general, we expect this function to alter the weights. But in the specific case of adding a constant to the log-likelihood, we expect no change (up to normalisation). Yet your plot shows something else going on.

Indeed it might be the logL_birth subtlety @lukashergt mentioned. I don't remember why logL_birth isn't updated simultaneously with logL. Was that a choice? Or is it a bug? Why shouldn't Stefan's code do logL += 0.1 and logL_birth += 0.1, leaving the weights unchanged (up to normalisation)?
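
In code, that suggestion amounts to something like this (a hypothetical sketch of the proposed behaviour, not anesthetic's current code; chain as in the original post):

shifted = chain.copy()
shifted['logL'] += 0.1
shifted['logL_birth'] += 0.1
# Both contours move together, so no point falls below its birth contour,
# and weights recomputed from (logL, logL_birth) change only by normalisation.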

@Stefan-Heimersheim
Collaborator Author

@lukashergt Thanks, that explains the cutoff. I'd still expect the weights to stay approximately the same though, at least for the +0.1 case. (But I might well be wrong; I realise I don't know all the nested sampling details.)

@williamjameshandley Thanks! I don't need the evidences for this part of my analysis, so converting to MCMCSamples works, and importance sampling then gives the expected results.

@Stefan-Heimersheim
Collaborator Author

Stefan-Heimersheim commented Jan 11, 2021

Actually, importance sampling the MCMC-converted chains correctly changes the logL but does not affect the weights at all, even when not reweighting by a trivial constant offset:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
from anesthetic import MCMCSamples

chain = MCMCSamples(orig_chain)  # orig_chain is the NestedSamples object from above
chain.tex = orig_chain.tex

fig, axes = chain.plot_1d("Omega_m", label='Original', color='blue')

logL_new = scipy.stats.norm.logpdf(chain.Omega_m, loc=0.1, scale=0.01)
reweighted = chain.importance_sample(logL_new, action='add')

reweighted.plot_1d(axes, label='Reweighted', color='red', linestyle='dashed')

if np.all(reweighted.weights == chain.weights):
    print("Weights equal")

plt.legend()
plt.show()

The weights are equal and the posteriors identical, even though logL_new is not constant.
[Figure_1: 1D posterior of Omega_m; original (blue) and reweighted (red, dashed) curves coincide]

@lukashergt
Collaborator

@Stefan-Heimersheim What exactly is the first plot showing? How did you generate it? Normally the samples should be ordered by weights if I am not mistaken...

@Stefan-Heimersheim
Collaborator Author

@Stefan-Heimersheim What exactly is the first plot showing? How did you generate it? Normally the samples should be ordered by weights if I am not mistaken...

I just took them in the order chain.weights returned them; they seem to go up and then down...

@lukashergt
Collaborator

@Stefan-Heimersheim Importance sampling only works in the bulk posterior region of parameter space (that goes for both nested sampling and MCMC). With a Gaussian logL_new whose narrow peak is outside of the sampled region, I am not surprised that it doesn't pick up on the new logL...

As an example, I'd try a mean of Omega_m=0.305 to see if the posterior gets shifted as expected.
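
One generic way to quantify whether the new likelihood overlaps the sampled bulk is the Kish effective sample size of the reweighted chain (a diagnostic sketch, not an anesthetic function; chain and logL_new as in the comments above):

import numpy as np

logw = logL_new - logL_new.max()  # relative importance log-weights
w = chain.weights * np.exp(logw)  # combined posterior weights
ess = w.sum()**2 / (w**2).sum()   # Kish effective sample size
print(f"effective samples: {ess:.1f} of {len(w)}")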

@Stefan-Heimersheim
Collaborator Author

Stefan-Heimersheim commented Jan 11, 2021

@Stefan-Heimersheim What exactly is the first plot showing? How did you generate it? Normally the samples should be ordered by weights if I am not mistaken...

I just took them in the order chain.weights returned them; they seem to go up and then down...

Can you post the line that generates the plot? Still not sure what's on the x- and what on the y-axis....

The x-axis is the index of the weight list, the y-axis is the weight value. Sorry, I lazily omitted the x-axis argument in the original post; here is the equivalent line:

plt.plot(range(len(a.weights)), a.weights, label='+0.1 weights')

Edit: Original code

a=chain.importance_sample(0.1, action="add")
b=chain.importance_sample(-0.1, action="add")
plt.plot(chain.weights, label='original weights')
plt.plot(a.weights, label='+0.1 weights')
plt.plot(b.weights, label='-0.1 weights')
plt.legend()
plt.show()

@Stefan-Heimersheim
Collaborator Author

@Stefan-Heimersheim Importance sampling only works in the bulk posterior region of parameter space (that goes for both nested sampling and MCMC). With a Gaussian logL_new whose narrow peak is outside of the sampled region, I am not surprised that it doesn't pick up on the new logL...

As an example, I'd try a mean of Omega_m=0.305 to see if the posterior gets shifted as expected.

Same result. (I would still expect a shift towards the left in the former case.)

import numpy as np
import scipy.stats as sst
import matplotlib.pyplot as plt
from anesthetic import MCMCSamples

chain = MCMCSamples(orig_chain)
chain.tex = orig_chain.tex

fig, axes = chain.plot_1d("Omega_m", label='Original', color='blue')  # hist, kde, scatter

logL_new = sst.norm.logpdf(chain.Omega_m, loc=0.305, scale=0.1)
reweighted = chain.importance_sample(logL_new, action='replace')

if np.all(reweighted.weights == chain.weights):
    print("Weights equal")

reweighted.plot_1d(axes, label='Reweighted', color='red', linestyle='dashed')  # hist, kde, scatter
plt.legend()
plt.show()

[Figure_1: 1D posterior of Omega_m; original and reweighted curves again coincide]

@lukashergt
Collaborator

In the last example the scale is huge, which isn't ideal either.
But since you used 'replace' that shouldn't be the issue, and indeed there seems to be nothing happening here.
What do the posteriors look like with NestedSamples instead of MCMCSamples?

@Stefan-Heimersheim
Collaborator Author

Without converting to MCMC samples, it does indeed work:

[Figure_1: 1D posterior of Omega_m; the reweighted NestedSamples posterior is shifted as expected]

@Stefan-Heimersheim
Collaborator Author

Just talked to @lukashergt; it seems my chains might be a bit weird due to cobaya's treatment of non-uniform priors (see this pull request), and that might be causing some of the issues with NestedSamples.

The importance sampling function of MCMCSamples seems to be wrong: it is missing the part where the weights are changed. I'll make a separate bug report for that.
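
For reference, the expected effect of action='add' on plain MCMC samples is a simple multiplicative weight update (a sketch of the expected behaviour, not the library code; chain and logL_new as above):

import numpy as np

# Multiply the existing weights by the new likelihood (shifted for stability):
new_weights = chain.weights * np.exp(logL_new - np.max(logL_new))
new_weights /= new_weights.max()  # normalisation convention is arbitrary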

@Stefan-Heimersheim
Collaborator Author

Okay, so now I am at the point where I need evidences for my importance-sampled runs -- do I understand correctly that importance_sample for NestedSamples is not (correctly) implemented yet?

Can I just compute the evidence of the importance-sampled run as Z_new = Z_old * np.average(additional_likelihood, weights=samples.weights)? Here additional_likelihood is the value of the new likelihood (not log) that is to be added. This is derived from:

Z_old = integral P(D|theta) P(theta) dtheta
Z_new = integral P_additional(D|theta) P(D|theta) P(theta) dtheta
      = Z_old * integral P_additional(D|theta) [P(D|theta) P(theta) / Z_old] dtheta
      ≈ Z_old * np.average(additional_likelihood, weights=samples.weights)

The term in square brackets is the normalized posterior, so we can interpret the integral as the posterior expectation value of the added likelihood P_additional.

I tried this out with the run_pypolychord example code: the normal run finds a log-evidence of around -3 +/- 0.2, and together with the additional likelihood the combined run gives a log-evidence of around 0.1 +/- 0.2.
Using the expectation-value formula gives roughly the right result of ~0, although it varies between runs.
importance_sample seems to give values around +2.8, which looks closer to the difference between the log-evidences, but as said before I don't think it is correctly implemented yet. I also attach the produced triangle plot (grey: old likelihood, blue: new likelihood, red: importance sample), which does not look correct.
[Figure anesthetic_test: triangle plot; grey: old likelihood, blue: new likelihood, red: importance sample]

Full code to reproduce:

from numpy import pi, log, sqrt
import pypolychord
from pypolychord.settings import PolyChordSettings
from pypolychord.priors import UniformPrior
try:
    from mpi4py import MPI
except ImportError:
    pass

from copy import deepcopy
import scipy.stats as sst
import matplotlib.pyplot as plt
import numpy as np
#| Define a four-dimensional spherical Gaussian likelihood of
#| width sigma=0.1, centered on 0, with one derived parameter.
#| The derived parameter is the squared radius.

nDims = 4
nDerived = 1
sigma = 0.1

def likelihood(theta):
    """ Simple Gaussian Likelihood"""

    nDims = len(theta)
    r2 = sum(theta**2)
    logL = -log(2*pi*sigma*sigma)*nDims/2.0
    logL += -r2/2/sigma/sigma

    return logL, [r2]

def importance_likelihood(theta):
    return (sst.norm.logpdf(theta[0], scale=0.1, loc=0)
            + sst.norm.logpdf(theta[1], scale=0.1, loc=0)
            + sst.norm.logpdf(theta[2], scale=0.1, loc=0))

def likelihood2(theta):
    """ Simple Gaussian Likelihood"""

    nDims = len(theta)
    r2 = sum(theta**2)
    logL = -log(2*pi*sigma*sigma)*nDims/2.0
    logL += -r2/2/sigma/sigma

    logL += importance_likelihood(theta)

    return logL, [r2]



#| Define a box uniform prior from -1 to 1

def prior(hypercube):
    """ Uniform prior from [-1,1]^D. """
    return UniformPrior(-1, 1)(hypercube)

#| Optional dumper function giving run-time read access to
#| the live points, dead points, weights and evidences

def dumper(live, dead, logweights, logZ, logZerr):
    print("Last dead point:", dead[-1])

#| Initialise the settings

settings = PolyChordSettings(nDims, nDerived)
settings.file_root = 'gaussian'
settings.nlive = 200
settings.do_clustering = True
settings.read_resume = False

settings2 = PolyChordSettings(nDims, nDerived)
settings2.file_root = 'gaussian2'
settings2.nlive = 200
settings2.do_clustering = True
settings2.read_resume = False

#| Run PolyChord

output = pypolychord.run_polychord(likelihood, nDims, nDerived, settings, prior, dumper)
# | log(Z) =           -3.05018 +/-            0.18038 |

output2 = pypolychord.run_polychord(likelihood2, nDims, nDerived, settings2, prior, dumper)
# | log(Z) =            0.05955 +/-            0.19555 |


#| Create a paramnames files

paramnames = [('p%i' % i, r'\theta_%i' % i) for i in range(nDims)]
paramnames += [('r*', 'r')]
output.make_paramnames_files(paramnames)
output2.make_paramnames_files(paramnames)

#| Make anesthetic plots

from anesthetic import NestedSamples
samples = NestedSamples(root=settings.base_dir + '/' + settings.file_root)
samples2 = NestedSamples(root=settings2.base_dir + '/' + settings2.file_root)
fig, axes = samples.plot_2d(['p0','p1','p2','p3','r'], alpha=1, color='grey')
samples2.plot_2d(axes, alpha=0.5, color='blue')

importance_loglikes = importance_likelihood([samples.p0, samples.p1, samples.p2])

# a) Resample using anesthetic
resamples = deepcopy(samples).importance_sample(importance_loglikes)
print("New evicende (a)", resamples.logZ()) #2.76

resamples.plot_2d(axes, alpha=0.3, color='red')
plt.show()

# b) Manually estimate new evidence
importance_likes = np.exp(importance_loglikes)
factor = np.average(importance_likes, weights=samples.weights)
print("Evidence changes by factor of", factor)
print("New evidence (b)", samples.logZ()+np.log(factor)) #-0.01

@williamjameshandley
Collaborator

Hi @Stefan-Heimersheim. Many thanks for this extremely useful working example, which demonstrates the issue unambiguously. It clearly shows that there is a problem with the importance sampling method for nested samples. In theory the evidence should work, but the fact that the grey and red contours disagree means that there is a deeper problem. This is something @lukashergt and I have discussed in passing (e.g. it's fairly clear that the current approach would fail if one added a constant log-likelihood). I will look into this in the next two weeks, as I have a good idea of how to fix it, in analogy to how nested sampling can compute evidences and posteriors at different thermodynamic temperatures; working this into an importance sampling framework is a small project which I'll have time for in April.

In the meantime, your approach using the posterior average of the importance weights is both mathematically and functionally correct, and is in the spirit of nested sampling: the base run gives you the overall normalisation, and the importance-sampled run then gives you the adjusted Bayes factor for the new likelihood. If you want errors on this, they will be dominated by the evidence error, which can be computed as usual by sampling, e.g. samples.logZ(1000).std().
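
A minimal regression check for the constant-offset case mentioned above (a sketch of the expected behaviour once this is fixed; samples is any valid NestedSamples run):

shifted = samples.importance_sample(1.0, action='add')
# Adding a constant c to logL should shift logZ by exactly c and leave the
# posterior weights unchanged up to normalisation:
print(shifted.logZ() - samples.logZ())  # should be ~1.0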

@lukashergt
Collaborator

lukashergt commented Mar 19, 2021

Very strange; I tested something very similar in the past, getting this plot:

[Figure: posteriors for L1 (blue), L2 (orange), and the importance-sampled L1 + N(2,1) (green)]

where the N(mu,sigma) refer to normal distributions. The green importance sample of L1 + N(2,1) nicely recovers the orange L2...

Unfortunately the mock data for the corresponding notebook is still on a ship on the Atlantic... so I can't rerun this yet. But this could indicate that it worked as expected in the past and was broken subsequently...?

Could the MCMCSamples.importance_sample from #139 have messed this up?

@williamjameshandley
Collaborator

@lukashergt, this came up in conversation with @htjb this afternoon -- did you get a chance to investigate the discrepancy?

@Stefan-Heimersheim
Collaborator Author

Could the MCMCSamples.importance_sample from #139 have messed this up?

This sounds plausible. NestedSamples.importance_sample calls MCMCSamples.importance_sample in this line, I believe:

class NestedSamples(MCMCSamples):
    # ...
    def importance_sample(self, logL_new, action='add', inplace=False):
        # ...
        samples = super().importance_sample(logL_new, action=action)

Previously MCMCSamples.importance_sample was broken and did "nothing"; now that it "does something", it could break NestedSamples.importance_sample, which might have relied on the old bug in MCMCSamples.importance_sample?
Unfortunately I am not familiar enough with nested sampling at the moment to correctly implement NestedSamples.importance_sample myself or to confidently review that code.

@lukashergt
Collaborator

lukashergt commented Jul 9, 2022

Sorry I didn't get around to this sooner. I just did a quick mock test (anesthetic version 2.0.0-beta.12), and my impression is that things still work.

I've used the following likelihoods, once only L1 (blue):

likelihood:
  L1: 'lambda x, y: stats.multivariate_normal.logpdf([x, y], mean=[  0, 0], cov=1)'

the other time both L1 and L2 (orange):

likelihood:
  L1: 'lambda x, y: stats.multivariate_normal.logpdf([x, y], mean=[  0, 0], cov=1)'
  L2: 'lambda x, y: stats.multivariate_normal.logpdf([x, y], mean=[0.5, 0], cov=0.25)'

I then importance sample the run that only used L1 with L2 (green):

ns3 = ns1.importance_sample(logL_new=stats.multivariate_normal.logpdf(ns1[['x', 'y']].values, mean=[0.5, 0], cov=0.25), 
                            action='add', 
                            inplace=False)

Orange and green should and do agree:

[Figures: posterior plots and summary statistics with orange and green in agreement]

I haven't taken a closer look at @Stefan-Heimersheim's example yet. I might find some time later today.

@lukashergt
Collaborator

lukashergt commented Jul 13, 2022

Huh, that's curious. It turns out that these test cases depend a lot on the choice of scale/std... I've reduced the example to two parameters.

Case 1: this works fine

2D-Gaussian with sigma=0.5 updated with 1D-Gaussian with sigma=0.5

For this first case anesthetic works fine. Green matches orange, where orange again shows the true target distribution (i.e. both likelihoods were sampled) and green shows the importance-sampled update of blue.

Cobaya yaml file for orange (same for blue, just without mvg2):

likelihood:
  mvg1: 'lambda a, b: stats.multivariate_normal.logpdf([a, b], mean=[0, 0], cov=0.5**2)'
  mvg2: 'lambda a:    stats.multivariate_normal.logpdf([a],    mean=[0],    cov=0.5**2)'
params:
  a:
    prior:
      min: -10
      max: 10
  b:
    prior:
      min: -10
      max: 10
sampler:
  polychord:
    nlive: 1000
    num_repeats: 5d
    do_clustering: false

Importance sampling with anesthetic:

green = blue.importance_sample(
    logL_new=stats.multivariate_normal.logpdf(blue[['a']].values, mean=[0], cov=0.5**2), 
    action='add', 
    inplace=False
)

Posterior distributions, evidence, and KL-divergence all match between orange and green:

[Figures: posteriors, evidence, and KL-divergence all matching between orange and green]

Case 2: failing

2D-Gaussian with sigma=0.1 updated with 1D-Gaussian with sigma=0.1

In this second case, the only change compared to the first case is the width of the Gaussian likelihoods: sigma=0.5 becomes sigma=0.1. Now the importance sampling fails on all levels.

Cobaya yaml file for orange (same for blue, just without mvg2):

likelihood:
  mvg1: 'lambda a, b: stats.multivariate_normal.logpdf([a, b], mean=[0, 0], cov=0.1**2)'
  mvg2: 'lambda a:    stats.multivariate_normal.logpdf([a],    mean=[0],    cov=0.1**2)'
params:
  a:
    prior:
      min: -10
      max: 10
  b:
    prior:
      min: -10
      max: 10
sampler:
  polychord:
    nlive: 1000
    num_repeats: 5d
    do_clustering: false

Importance sampling with anesthetic:

green = blue.importance_sample(
    logL_new=stats.multivariate_normal.logpdf(blue[['a']].values, mean=[0], cov=0.1**2), 
    action='add', 
    inplace=False
)

The green posteriors are very non-Gaussian (high kurtosis) and most definitely don't agree with orange; the evidence and KL estimates disagree as well:

[Figures: green posteriors strongly non-Gaussian, disagreeing with orange; evidence and KL estimates also disagree]

Not sure what to make of this... Ideas?

  • Is this hinting at numerical issues?
  • Is this hinting at a limit in the scope of importance sampling with nested samples? If so, it is not the known issue that there have to be sufficient samples in the importance sampled region. We know about that one, and it doesn't apply to this example...

@Stefan-Heimersheim
Collaborator Author

Not sure what to make of this... Ideas?

Thanks for those experiments, @lukashergt! Can you check whether Case 2 behaves similarly if you convert the NestedSamples to MCMCSamples before doing the importance sampling? That would show whether it is something to do with sampling density / numerical issues, or with the particular importance sampling implementation for NestedSamples.

@lukashergt
Collaborator

Nice test! And indeed, converting the nested samples from Case 2 to MCMC samples and then doing the importance sampling gets you to the correct posterior:

mc7 = MCMCSamples(data=ns7[['a', 'b']], logL=ns7.logL)
mc8 = MCMCSamples(data=ns8[['a', 'b']], logL=ns8.logL)
mc9 = mc7.importance_sample(logL_new=stats.multivariate_normal.logpdf(ns7[['a']].values, mean=[0], cov=0.1**2), 
                            action='add', 
                            inplace=False)

[Figure: posterior from MCMC-converted importance sampling matching the target distribution]

So it looks like it isn't a numerical issue...

But then why does sigma=0.5 look fine, while sigma=0.1 fails utterly?

@lukashergt lukashergt added the bug Something isn't working label Aug 8, 2022
@lukashergt lukashergt added this to the 2.0.1 milestone Aug 8, 2022