Sampling UX #11

Open
rlouf opened this issue Mar 26, 2020 · 2 comments

rlouf commented Mar 26, 2020

Opening this to have a discussion with myself about sampling UX.

rlouf commented Mar 27, 2020

Let the user know what is happening

Every action the user performs should produce at least some feedback and, at best, useful information.

  • Progress bars are a must (at least when doing exploratory analysis) when sampling; they make the sampling times more bearable.
  • How do we manage progress bars for the warmup which is currently managed program-side? I would like to avoid adding progress bar logic there.
  • JIT-compilation of the logpdfs and the kernel can take some time; tell the user what is being compiled and how long it took.
  • It is important to warn users about the number of divergences as the chains are sampling: many users won't want to keep sampling when there are too many divergences early on. Caveat: how do we do so when we have a large number of chains? -> set_postfix in tqdm
  • Interactive reporting of ESS -> set_postfix in tqdm
  • Give some important statistics on the sampled variables at the end of sampling: median value, variance across chains, Rhat, ESS / chain, average acceptance rate.
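The two set_postfix ideas above could look roughly like this. It is a minimal sketch, not mcx code: `sample_with_feedback`, `step` and `toy_step` are hypothetical names, and the bar is assumed to be any object exposing tqdm's `update` and `set_postfix` methods, so a `tqdm(total=n)` instance can be passed in directly.

```python
import random

_rng = random.Random(0)  # fixed seed so the toy example is reproducible


def toy_step():
    """Stand-in for one kernel step: about 5% of steps are flagged as divergent."""
    return _rng.gauss(0.0, 1.0), _rng.random() < 0.05


def sample_with_feedback(step, n_samples, bar):
    """Draw `n_samples` samples and report divergences on the progress bar.

    `step` is a hypothetical callable returning (sample, diverged).
    `bar` is any object exposing tqdm's `update` and `set_postfix`.
    """
    samples, n_divergences = [], 0
    for i in range(1, n_samples + 1):
        sample, diverged = step()
        samples.append(sample)
        n_divergences += int(diverged)
        bar.update(1)
        # One aggregate count stays readable however many chains are running.
        bar.set_postfix(
            divergences=n_divergences,
            div_rate=f"{n_divergences / i:.1%}",
        )
    return samples, n_divergences


# With tqdm installed, this would be used as:
#     from tqdm import tqdm
#     with tqdm(total=1000) as bar:
#         samples, n_div = sample_with_feedback(toy_step, 1000, bar)
```

Reporting a single aggregated count in the postfix is one possible answer to the many-chains caveat; a running ESS estimate could be added as another keyword in the same call.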

Inference data

There are several things we need to consider when thinking about how to represent inference data in mcx:

  • Full interoperability with ArviZ. Many libraries seem to provide a way to convert their internal format to ArviZ's. If we go the object route, mcx can handle this with a to_arviz() method;
  • Since sequential sampling is central in the library, we need to be able to add samples as we go. Dictionaries would make this cumbersome.
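As an illustration of the "object way", here is a minimal sketch of a trace that accepts new samples as sampling proceeds. The `Trace` class and its methods are hypothetical and not mcx's API; the `to_dict` method only marks the spot where a to_arviz() conversion could hook in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trace:
    """Hypothetical append-only store: one list of draws per variable."""

    samples: Dict[str, List[float]] = field(default_factory=dict)

    def append(self, new_samples: Dict[str, List[float]]) -> None:
        """Add the draws from one more sampling run to the store."""
        if self.samples and set(new_samples) != set(self.samples):
            raise ValueError("new samples must cover the same variables")
        for name, draws in new_samples.items():
            self.samples.setdefault(name, []).extend(draws)

    def __len__(self) -> int:
        """Number of draws stored per variable."""
        return len(next(iter(self.samples.values()), []))

    def to_dict(self) -> Dict[str, List[float]]:
        """Plain-dict copy; a to_arviz() method could build on this."""
        return {name: list(draws) for name, draws in self.samples.items()}


trace = Trace()
trace.append({"mu": [0.1, 0.2], "sigma": [1.0, 1.1]})
trace.append({"mu": [0.3], "sigma": [0.9]})  # a later sampling run
```

With a bare dictionary, the caller would have to repeat the key check and the per-variable concatenation after every run; wrapping them in one method is what makes sequential sampling less cumbersome.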

This raises the question of what happens to the diagnostics when we append new samples to the trace. If we decide to keep track of diagnostics other than divergences in the trace, we have to think about how to handle them: do we manage their values at the execution level, or in the trace?

I lean towards the execution level: why should the trace manage calculations on top of being a data store? The generate executor can easily keep track of the state of the algorithms used to compute diagnostics; for sample we would have to add these states to the class’ state.
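A minimal sketch of that split, assuming the diagnostics can be computed online: the executor owns the running state and the trace stays a plain list of draws. `Executor` and `RunningMoments` are hypothetical names, and Welford's algorithm for the running mean and variance stands in for whichever diagnostics end up being tracked.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """State of Welford's online mean/variance algorithm."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; NaN until two draws have been seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


class Executor:
    """Hypothetical executor: owns the diagnostic state, the trace only stores draws."""

    def __init__(self):
        self.trace = []  # plain data store, no calculations
        self.moments = RunningMoments()

    def run(self, draws):
        for x in draws:
            self.trace.append(x)
            self.moments.update(x)


executor = Executor()
executor.run([1.0, 2.0, 3.0])
executor.run([4.0, 5.0])  # the diagnostic state carries over between calls
```

Because the state lives on the executor, appending more samples later updates the diagnostics incrementally instead of recomputing them from the whole trace.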

rlouf commented Sep 22, 2020

A thought for people using mcx in a production environment: the sampler should provide clear and actionable logs.
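To make "clear and actionable" concrete, here is a minimal sketch built on Python's standard logging module. The logger name, the helper names, the 1% threshold and the message wording are all assumptions for illustration, not an existing mcx interface.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("sampler")  # hypothetical logger name


@contextmanager
def log_duration(step_name):
    """Log when a step starts and how long it took."""
    logger.info("%s: started", step_name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s: finished in %.2fs", step_name, elapsed)


def report_divergences(n_divergences, n_samples, threshold=0.01):
    """Warn, with a suggested action, when too many transitions diverged."""
    rate = n_divergences / n_samples
    if rate > threshold:
        logger.warning(
            "%d of %d transitions diverged (%.1f%%); consider a smaller "
            "step size or reparametrizing the model",
            n_divergences,
            n_samples,
            100 * rate,
        )
    return rate
```

Wrapping the compilation in `with log_duration("compilation"):` would also cover the compilation-time feedback discussed earlier in this thread, and the logs can be routed to whatever handler the production environment already uses.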

rlouf added this to the 0.1 milestone Sep 30, 2020