
[ENH] Add the Option to Choose Number of Draws in PPD #871

Open · wants to merge 5 commits into base: main
Conversation

jgyasu

@jgyasu jgyasu commented Jan 20, 2025

Closes #739

This PR modifies the predict method in the Model class so that it accepts an optional argument num_draws and then randomly selects that many draws from the posterior predictive distribution, i.e., num_draws out of the total draws, without replacement. If num_draws exceeds the total number of draws, it emits a warning and falls back to using all draws.

  • Passes pylint
  • Passes black

I would appreciate help with testing the method with this new parameter.
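A minimal sketch of the selection logic described above (the helper name and the use of the stdlib random module are illustrative only; the actual PR subsets the InferenceData object directly):

```python
import random
import warnings


def select_draw_indices(total_draws, num_draws, random_seed=None):
    """Pick `num_draws` draw indices without replacement (illustrative helper)."""
    if num_draws > total_draws:
        # Mirror the PR's fallback: warn and use every available draw.
        warnings.warn(
            f"num_draws ({num_draws}) exceeds the available draws "
            f"({total_draws}); falling back to all draws."
        )
        return list(range(total_draws))
    rng = random.Random(random_seed)
    return sorted(rng.sample(range(total_draws), num_draws))
```

The chosen indices would then be applied to the draw dimension of the posterior (e.g. via something like `idata.isel(draw=indices)` in ArviZ), so every later computation runs only on that subset.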

@tomicapretto
Collaborator

@jgyasu thanks for the contribution! I want to check that I understand the logic. You're sub-setting the InferenceData object before doing any computation. Thus, everything that happens later, like computation of likelihood parameters and generation of predictive draws, is based on that number of draws. Is that correct?

@tomicapretto
Copy link
Collaborator

Two suggestions:

  • Add the option to pass a random_seed for reproducibility
  • Raise an error if num_draws is not None and inplace is True. If users do that, they will be dropping random draws that they won't be able to recover.
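The second suggestion could look something like the guard below at the top of predict (a sketch only; the function name is hypothetical and not Bambi's actual code):

```python
def check_num_draws_args(num_draws, inplace):
    # Dropping random draws from the InferenceData in place would discard
    # samples the user cannot recover, so forbid that combination outright.
    if num_draws is not None and inplace:
        raise ValueError(
            "`num_draws` cannot be used together with `inplace=True`; "
            "it would irrecoverably drop draws from the InferenceData."
        )
```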

@jgyasu
Author

jgyasu commented Jan 20, 2025

> @jgyasu thanks for the contribution! I want to check that I understand the logic. You're sub-setting the InferenceData object before doing any computation. Thus, everything that happens later, like computation of likelihood parameters and generation of predictive draws, is based on that number of draws. Is that correct?

Yes, that's right. All the computations remain the same; they just run on the subset of the total draws.

@jgyasu
Author

jgyasu commented Jan 20, 2025

> Two suggestions
>
>   • Add the option to pass a random_seed for reproducibility
>   • Raise an error if num_draws is not None and inplace is True. If users do that, they will be dropping random draws that they won't be able to recover.

Sure, thanks! I will apply these changes and make a new commit.

bambi/models.py — review thread (outdated, resolved)
@tomicapretto
Collaborator

@jgyasu thanks for the additional changes. I asked for another change, and I also realized we should add a test for this case. Just let me know if you need any guidance. What we want is a test for this num_draws argument, to make sure it works as expected (using fewer draws when it's smaller than the number of samples, raising an error when num_draws is not None and inplace is True, etc.).

@jgyasu
Author

jgyasu commented Jan 21, 2025

> @jgyasu thanks for the additional changes. I asked for another change, and I also realized we should add a test for this case. Just let me know if you need any guidance. What we want is a test for this num_draws argument, to make sure it works as expected (using fewer draws when it's smaller than the number of samples, raising an error when num_draws is not None and inplace is True, etc.).

Thanks, I will make the requested changes. And yes, this is what I was thinking! I'll try to write the tests, and if I face any problems, I'll let you know!

@jgyasu
Author

jgyasu commented Jan 22, 2025

> @jgyasu thanks for the additional changes. I asked for another change, and I also realized we should add a test for this case. Just let me know if you need any guidance. What we want is a test for this num_draws argument, to make sure it works as expected (using fewer draws when it's smaller than the number of samples, raising an error when num_draws is not None and inplace is True, etc.).

Hi @tomicapretto, I made the requested change. Going through the tests directory, my understanding is that the test should go in test_models.py — please correct me if I am wrong. There, I would add tests similar to:

```python
class FitPredictParent:
    def fit(self, model, **kwargs):
        return model.fit(tune=TUNE, draws=DRAWS, **kwargs)

    def predict_oos(self, model, idata, data=None):
        # Reuse the original data
        if data is None:
            data = model.data.head()
        return model.predict(idata, kind="response", data=data, inplace=False)
```

Should I create an additional class, or add an additional method that changes the arguments passed in predict_oos?

@tomicapretto
Collaborator

@jgyasu thanks!

You don't need to create a new class; the one you shared is a parent class used by the classes for specific model families. You can create one or two functions like this one:

https://github.com/bambinos/bambi/blob/6b66691ed08d88d98acbb5a229f5aeb258e7ab44/tests/test_models.py#L1319C1-L1340C42

In the .predict() call you pass the different options for the parameters you implemented and check that the behavior is as intended. For example, you need to catch an error when num_draws is not None and inplace is True. For that you can use a pattern like:

```python
with pytest.raises(ValueError, match="Model is not built yet"):
```

For num_draws, you can check that the number of draws in the returned idata matches the one you passed, etc.
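Put together, tests along these lines could work (a sketch only: predict_stub is a hypothetical stand-in for a fitted Bambi model, since the real tests would call model.predict on a fitted model):

```python
import pytest


def predict_stub(num_draws=None, inplace=False, total_draws=1000):
    # Stand-in for Model.predict with the guard discussed above.
    if num_draws is not None and inplace:
        raise ValueError("num_draws cannot be used with inplace=True")
    # Return how many draws the prediction would be based on.
    return min(num_draws, total_draws) if num_draws is not None else total_draws


def test_num_draws_with_inplace_raises():
    with pytest.raises(ValueError, match="inplace"):
        predict_stub(num_draws=10, inplace=True)


def test_num_draws_subsets_posterior():
    assert predict_stub(num_draws=100) == 100
```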

@jgyasu
Author

jgyasu commented Jan 26, 2025

> @jgyasu thanks!
>
> You don't need to create a new class; the one you shared is a parent class used by the classes for specific model families. You can create one or two functions like this one:
>
> https://github.com/bambinos/bambi/blob/6b66691ed08d88d98acbb5a229f5aeb258e7ab44/tests/test_models.py#L1319C1-L1340C42
>
> In the .predict() call you pass the different options for the parameters you implemented and check that the behavior is as intended. For example, you need to catch an error when num_draws is not None and inplace is True. For that you can use a pattern like with pytest.raises(ValueError, match="Model is not built yet"):. For num_draws, you can check that the number of draws in the returned idata matches the one you passed, etc.

Hi, I have written the tests, but when I tried to run them on my system, I got errors while building pytensor after installing all the dependencies.

I am on Linux, in an isolated conda environment; the error is here: https://gist.github.com/jgyasu/0c767099cfa201169a703b7b886d76d9

I wonder if I am missing some installation steps. I would appreciate any help. Thanks!

@GStechschulte
Collaborator

Hey @jgyasu thanks for the error output. Sorry for the naive recommendation, but have you tried creating a new environment and installing everything from scratch?

@tomicapretto
Collaborator

@jgyasu it's usually painful to work with PyMC and PyTensor, especially when installing new things.

What environment manager are you using? If you're using conda, and assuming you are at the root of the bambi repo, it should be as easy as:

```shell
conda create --name bambi-dev python=3.11
conda activate bambi-dev
conda install -c conda-forge pymc
pip install -e .        # install bambi in editable mode
pip install -e .[dev]   # dev dependencies
pip install -e .[jax]   # jax dependencies
```

Unfortunately it's not possible to do something like pip install -e .[jax, dev] (as far as I know).

Successfully merging this pull request may close these issues:

  • Possibility to pick number of samples for calls to posterior predictive?