
0035 pre model gqs #54 (Open)

bob-carpenter wants to merge 7 commits into master
Conversation

bob-carpenter (Collaborator):

This is ready to review.

I tried to flesh out how we'd actually code this and document it in the Reference Manual.

@WardBrian (Member):

Comment on lines +290 to +292
> The result will be a multiple imputation, not a coherent joint
> Bayesian model. This will work even if the missing data is discrete
> or the conditional distributions are not coherent.

The Plummer (2015) paper says that the naive cut algorithm does not converge to the multiple imputation distribution, and I don't immediately see how a latent data block could implement the suggested tempered cut algorithm.

In my opinion, Plummer's discussion is pretty pessimistic about the utility of a latent data block:

> Exact MCMC sampling from a cut model seems to require evaluating the marginal likelihood p(Y | ϕ), which is computationally expensive (Gelman and Meng 1998). It might be possible, as suggested by a reviewer, that auxiliary variable methods such as the one proposed by Møller et al. (2006) may allow this evaluation to be skipped. Otherwise, it appears unlikely that any MCMC method can sample exactly from the target density p∗(θ, ϕ).

bob-carpenter (Collaborator, Author):

Thanks. This I have to think about more. I'll have to go back and read Plummer's paper again!

@bob-carpenter (Collaborator, Author), Dec 19, 2024:

I'm not convinced by Plummer's proof (p. 40). He assumes you have to satisfy a factored form of detailed balance that would be sufficient, but isn't necessary. Here's the full Bayes graphical model example from the paper, with parameters $\phi, \theta$ and data $y, z$,

$z \sim p(\cdot \mid \phi)$ and $y \sim p(\cdot \mid \phi, \theta)$.

Then BUGS lets you put a "cut" between $z, \phi$ and $y, \theta$ (at least that's how it's drawn---I like the graphic of a dashed line as "cut here").

Plummer says (section 2, right after equation (4)) that we can sample from the cut distribution using multiple imputation as

$\phi^{(n)} \sim p(\phi \mid z)$ and $\theta^{(n)} \sim p(\theta \mid \phi^{(n)}, y, z)$.

I'm pretty sure this is exactly what our latent data or cut parameters block will do. Each iteration $n$, it will sample $\phi^{(n)}$ in the cut parameters block (that gives one of the multiple imputations) and then sample $\theta^{(n)}$ as an ordinary parameter with target density $p(\theta \mid \phi^{(n)}, y, z)$ for each $\phi^{(n)}$, as given above. Clearly if we sampled $\phi^{(n)}$ once and then $\theta^{(n)}$ multiple times, we'd get exactly the multiple imputation result someone might compute. So I don't see why it won't work if we only take one draw of $\theta^{(n)}$ for each $\phi^{(n)}$.
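
Spelling the two targets out (standard identities, nothing specific to the design): the cut distribution is just the product of the two conditionals above, while the full posterior carries an extra marginal-likelihood factor in $\phi$,

$$
p_{\text{cut}}(\phi, \theta) = p(\phi \mid z) \, p(\theta \mid \phi, y, z),
\qquad
p(\phi, \theta \mid y, z) \propto p(\phi \mid z) \, p(\theta \mid \phi, y, z) \, p(y \mid \phi, z).
$$

That feedback factor $p(y \mid \phi, z)$ is exactly the marginal likelihood the quoted passage calls expensive, so the question is whether drawing $\phi^{(n)}$ and $\theta^{(n)}$ fresh each iteration targets the first density rather than the second.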

In Figure 3, Plummer shows that the "adaptive Metropolis 1D" sampler gets roughly the same answer as multiple imputation. We should also get roughly the same answer as multiple imputation if HMC is able to sample from $p(\theta \mid \phi^{(n)}, y, z)$. The main worry we've had in the past about this is that it's a moving target for adaptation. But then it's not necessarily more so than other complex posteriors.

I added a section at the end on functional testing, with a note saying that we'll test by making sure we can recreate the multiple imputation results in an example like Plummer's.

I may be missing something here, so I'm going to leave this open pending further comments.

Comment on lines +374 to +377
> #### Iteration number
>
> The iteration number can be optionally set in the latent data block
> from the outside. There will be an `iteration_number__` parameter

Reviewer:

What is this needed for?

bob-carpenter (Collaborator, Author):

@andrewgelman's been bugging us to do this for ages so that

a. he can write something like `if (mod(iteration_number, 100)) print(...)`, which I think is probably not going to do what he wants, because it'll print every leapfrog iteration, and

b. so we can implement something like tempering by using the iteration number to modify the density. Again, there are all sorts of issues with these algorithms.
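
To make (a) concrete, here's a minimal sketch of how a driver might thread the counter through; every name in it (`num_iterations`, `latent_data__`, `model`, `sampler`) is hypothetical, not part of the proposed API:

```cpp
// Hypothetical service-level driver loop (names assumed, not proposed API).
// The counter advances once per transition, but the log density (and any
// print() inside it) is still evaluated at every leapfrog step within it.
for (int n = 0; n < num_iterations; ++n) {
  latent_data__.iteration_number__ = n;                       // set from outside
  model.set_latent_data(params_r, latent_data__, base_rng);   // regenerate latent data
  sample = sampler.transition(sample, logger);                // one HMC iteration
}
```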

Collaborator:

This feels adjacent to the main point of the proposal. As long as we don't introduce state to the model class, I'm fine with this.

bob-carpenter (Collaborator, Author):

I should also add a big note that this proposal does not include any mutable state in the model.

Yes, iteration number is adjacent and we could put it in a separate proposal, but it's going to be really easy to do this way, I think. I can take it out if people think it's over-reaching.

Comment on lines +533 to +534
> There is no alternative for the Gibbs-based discrete parameter
> inference.

Reviewer:

The first alternative is marginalization.
The second alternative is to implement the HMC+Gibbs algorithm in the Stan algorithms library and just allow models to declare discrete parameters.
Also, we should consider special syntax for discrete parameters that would allow either marginalization or Gibbs to be chosen at runtime.

bob-carpenter (Collaborator, Author):

I'll clarify that I meant there isn't an existing alternative for Gibbs within Stan.

You can only marginalize when it's tractable, which rules out high-dimensional categorical and unbounded count data, and also things like variable selection (not that those work well with Gibbs).
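
Concretely (a textbook identity, not from the design doc): marginalizing a discrete parameter $z$ with support $\{1, \ldots, K\}$ replaces it with a sum in the likelihood,

$$
p(y \mid \theta) = \sum_{z=1}^{K} p(z \mid \theta) \, p(y \mid z, \theta),
$$

which costs $O(K)$ per density evaluation. That's infeasible when $K$ is huge or unbounded, or when $z$ is a vector of $D$ indicators so the sum runs over $2^D$ configurations, as in variable selection.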

The original design of the model class included discrete parameters because I was thinking we could just use the entire density to update with Metropolis or with Metropolis-within-Gibbs style updates.

bob-carpenter (Collaborator, Author):

Thanks---added both clarifications as suggested. I'll leave this open for comment now rather than closing.

I think a proposal to add automatic Gibbs sampling to Stan should be another design document. We could do generic Metropolis-within-Gibbs, but I asked the PyMC devs, and they say they can compute the Markov blanket to do this more efficiently.
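
For reference, the Markov-blanket point (standard for directed graphical models, not specific to PyMC) is that the full conditional of a discrete node $z_i$ involves only the factors that mention it,

$$
p(z_i \mid \text{rest}) \propto p(z_i \mid \mathrm{pa}(z_i)) \prod_{v \,:\, z_i \in \mathrm{pa}(v)} p(v \mid \mathrm{pa}(v)),
$$

so a Gibbs update can skip re-evaluating the rest of the joint density, whereas a generic Metropolis-within-Gibbs step on the whole target cannot.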

Comment on lines +469 to +527
```cpp
struct latent_data : public latent_data_base_crtp<latent_data> {
  int iteration_number__;  // included by default in all latent_data
  double sens;
  double spec;
};
```

Here, the `latent_data_base_crtp` will extend `latent_data_base`,
following the pattern used for the model class.

### Constructing the latent data generically

Just like the model class, this will require a generic factory method,
though in this case, no arguments are required as it will be used like
a simple `struct`.

```cpp
latent_data_base& new_latent_data();
```

This must then get passed where necessary and cast into the relevant
type in the class definition, as shown in the next section.


### Generating the latent data

An additional method in the model class will be required to populate
the latent data.

```cpp
template <typename RNG>
inline void set_latent_data(const VectorXd& params_r, latent_data_base& ldb,
                            RNG& base_rng,
                            std::ostream* pstream = nullptr) const {
  latent_data& latent_data__ = static_cast<latent_data&>(ldb);
  // ... code generated to populate latent_data__ ...
}
```

Because it does not use anything specific to the concrete model class,
this can be implemented in the `latent_data_base_crtp` class using the
CRTP.

```cpp
template <class Derived>
class latent_data_base_crtp : public latent_data_base {
 public:
  template <typename RNG>
  inline void set_latent_data(const VectorXd& params_r, latent_data_base& ldb,
                              RNG& base_rng,
                              std::ostream* pstream = nullptr) const {
    static_cast<Derived&>(ldb).set_latent_data(params_r, ldb, base_rng,
                                               pstream);
  }
};
```

@SteveBronder (Collaborator), Dec 10, 2024:

Since it's just holding data that is made once, I think we can use an allocator scheme instead. We define a latent data struct inside the model class that uses the allocator for acquiring memory.

```cpp
struct latent_data {
  int iteration_number__;  // included by default in all latent_data
  // Will hold memory owned by the allocator
  Eigen::Map<Eigen::Matrix<double, -1, 1>> sens;
  std::vector<double, stan::math::local_allocator> spec;
};
```

For the allocator, we will then also make a member function in the model class that tells us the size of the latent data struct.

```cpp
std::size_t get_latent_size_bytes() {
  // Assume `sens` has size 5 and `spec` has size 6
  return sizeof(latent_data) + 5 * sizeof(double) + 6 * sizeof(double);
}
```

The allocator will look very similar to Stan's arena allocator, but instead of storing a vector of `unsigned char` pointers it will store a single void pointer. Using just one void pointer makes it easy to cast the memory inside the allocator to the `latent_data` class when we need it.

```cpp
struct local_allocator {
  void* mem_;
  // Must be fixed size
  explicit local_allocator(std::size_t init_bytes) : mem_(malloc(init_bytes)) {}

  template <typename T>
  void* allocate(std::size_t amount);

  template <typename T>
  T* cast_this() {
    return reinterpret_cast<T*>(mem_);
  }
};
```

Since the sizes of variables have to come from data and transformed data, we know at initialization how much memory we need.

```cpp
stan::math::local_allocator latent_alloc(model->get_latent_size_bytes());
```

Then the model class will have a `set_latent_data` member function that uses the allocator as an in/out parameter:

```cpp
inline void set_latent_data(stan::math::local_allocator& alloc) {
  latent_data* data = new (alloc.allocate<latent_data>(sizeof(latent_data)))
      latent_data{
          0,
          Eigen::Map<Eigen::Matrix<double, -1, 1>>(
              reinterpret_cast<double*>(alloc.allocate<double>(5 * sizeof(double))),
              5, 1),
          std::vector<double, stan::math::local_allocator>(5, alloc)};
  // Fill in latent_data...
}
```

So now we can have a new `log_prob` that takes in the allocator. `log_prob` will call the allocator's `cast_this` function to get back the pointer to the `latent_data`. We can either use the `latent_data` pointer for the rest of the code or pull out references to the members of `latent_data`.

```cpp
auto log_prob(stan::math::local_allocator& alloc, ...) {
  latent_data* data = alloc.cast_this<latent_data>();
  // Just makes a reference, very low cost
  auto&& spec = data->spec;
  // Rest of the program written as normal
  auto X = Y * spec;
}
```

And with this scheme, the model class would also have new member functions for printing latent data names, latent data sizes, etc.
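
Pieced together, the intended lifecycle might look like the sketch below; it just assembles the pieces above, and the driver loop and `params_r` are assumptions, not part of the proposal:

```cpp
// Hypothetical end-to-end use of the allocator scheme sketched above.
// Sizes are fixed once data / transformed data are known, so we allocate once.
stan::math::local_allocator latent_alloc(model->get_latent_size_bytes());

for (int n = 0; n < num_iterations; ++n) {
  // Placement-new the latent_data into the arena for this iteration.
  model->set_latent_data(latent_alloc);
  // log_prob recovers the latent_data via cast_this<latent_data>().
  double lp = model->log_prob(latent_alloc, params_r);
  // ... sampler transition using lp ...
}
```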

bob-carpenter (Collaborator, Author):

> Since it's just holding data that is made once I think we can use an allocator scheme instead.

What's the advantage to doing it the way you propose? It looks very similar to me, so I'm sure I'm missing something important.

@SteveBronder (Collaborator):

It removes inheritance, does not need a generic factory method, and goes further into how to handle dynamically allocated objects. I don't think we need a base class that we downcast to the actual class. Since this is just used in the model class, the rest of the Stan code only needs to worry about passing around an allocator that holds the memory for the latent data struct.
