0035 pre model gqs #54
base: master
Conversation
> The result will be a multiple imputation, not a coherent joint
> Bayesian model. This will work even if the missing data is discrete
> or the conditional distributions are not coherent.
The Plummer (2015) paper says that the naive cut algorithm does not converge to the multiple imputation distribution, and I don't immediately see how a `latent data` block could implement the suggested tempered cut algorithm.

In my opinion, Plummer's discussion is pretty pessimistic about the utility of a `latent data` block:
> Exact MCMC sampling from a cut model seems to require evaluating the
> marginal likelihood p(Y | ϕ) which is computationally expensive
> (Gelman and Meng 1998). It might be possible, as suggested
> by a reviewer, that auxiliary variable methods such as the one
> proposed by Møller et al. (2006) may allow this evaluation to
> be skipped. Otherwise, it appears unlikely that any MCMC
> method can sample exactly from the target density p∗(θ, ϕ).
Thanks. This I have to think about more. I'll have to go back and read Plummer's paper again!
I'm not convinced by Plummer's proof (p. 40). He assumes you have to satisfy a factored form of detailed balance that would be sufficient, but isn't necessary. Here's the full Bayes graphical model example from the paper, with parameters $\theta$ and $\phi$ and data $Y$ and $Z$. Then BUGS lets you put a "cut" between $\phi$ and $Y$.

Plummer says (section 2, right after equation (4)) that we can sample from the cut distribution using multiple imputation. I'm pretty sure this is exactly what our `latent data` (or `cut parameters`) block will do. Each iteration, we sample the `cut parameters` block (that gives one of the multiple imputations) and then sample $\theta^{(n)}$ as an ordinary parameter with the target density sketched below.
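A minimal sketch of those densities, assuming Plummer's two-module setup in which $Z$ informs $\phi$ only while $Y$ informs both $\theta$ and $\phi$ (the notation is mine, not quoted from the paper):

```latex
% cut distribution: feedback from Y to phi is severed
p_{\mathrm{cut}}(\theta, \phi) = p(\phi \mid Z) \, p(\theta \mid Y, \phi)

% multiple-imputation sampler, for n = 1, \dots, N:
\phi^{(n)} \sim p(\phi \mid Z), \qquad
\theta^{(n)} \sim p(\theta \mid Y, \phi^{(n)})
```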
In Figure 3, Plummer shows that the "adaptive Metropolis 1D" sampler gets roughly the same answer as multiple imputation. We should also get roughly the same answer as multiple imputation if HMC is able to sample from the conditional density $p(\theta \mid Y, \phi^{(n)})$ at each iteration.
I added a section at the end on functional testing and added a note saying that we'll test by making sure we can recreate the multiple imputation results in an example like Plummer's.
I may be missing something here, so I'm going to leave this open pending further comments.
> #### Iteration number
>
> The iteration number can be optionally set in the latent data block
> from the outside. There will be an `iteration_number__` parameter
What is this needed for?
@andrewgelman's been bugging us to do this for ages so that

a. he can write something like `if (mod(iteration_number, 100)) print(...)`, which I think is probably not going to do what he wants because it'll print every leapfrog iteration, and

b. we can implement something like tempering by using the iteration number to modify the density. Again, there are all sorts of issues with these algorithms.
This feels adjacent to the main point of the proposal. As long as we don't introduce state to the model class, I'm fine with this.
I should also add a big note that this proposal does not include any mutable state in the model.
Yes, iteration number is adjacent and we could put it in a separate proposal, but it's going to be really easy to do this way, I think. I can take it out if people think it's over-reaching.
> There is no alternative for the Gibbs-based discrete parameter
> inference.
The first alternative is marginalization.
The second alternative is to implement the HMC+Gibbs algorithm in the Stan algorithms library and just allow models to declare discrete parameters.
We should also consider special syntax for discrete parameters that would allow either marginalization or Gibbs to be chosen at runtime.
I'll clarify that I meant there isn't an existing alternative for Gibbs within Stan.
You can marginalize only when it's tractable, so that rules out high-dimensional categorical parameters, unbounded count data, and things like variable selection (not that those work well with Gibbs).
The original design of the model class included discrete parameters because I was thinking we could just use the entire density to update with Metropolis or with Metropolis-within-Gibbs style updates.
Thanks---added both clarifications as suggested. I'll leave this open for comment now rather than closing.
I think a proposal to add automatic Gibbs sampling to Stan should be another design document. We could do generic Metropolis-within-Gibbs, but I asked the PyMC devs, and they say they can compute the Markov blanket to do this more efficiently.
```cpp
struct latent_data : public latent_data_base_crtp<latent_data> {
  int iteration_number__;  // included by default in all latent_data
  double sens;
  double spec;
};
```
Here, the `latent_data_base_crtp` will extend `latent_data_base`,
following the pattern used for the model class.
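A minimal sketch of what that hierarchy might look like; the virtual interface shown here is illustrative, and only the names `latent_data_base`, `latent_data_base_crtp`, and the generated struct come from this proposal:

```cpp
// Abstract base so service code can hold latent data without knowing
// the model-specific type.
class latent_data_base {
 public:
  virtual ~latent_data_base() {}
};

// CRTP layer between the base and the generated struct, mirroring
// stan::model::model_base_crtp; model-independent plumbing can live here.
template <class Derived>
class latent_data_base_crtp : public latent_data_base {};

// Generated, model-specific struct, as in the example above.
struct latent_data : public latent_data_base_crtp<latent_data> {
  int iteration_number__;
  double sens;
  double spec;
};
```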
### Constructing the latent data generically

Just like the model class, this will require a generic factory method,
though in this case, no arguments are required as it will be used like
a simple `struct`.
```cpp
latent_data_base& new_latent_data();
```
This must then get passed where necessary and cast into the relevant
type in the class definition, as shown in the next section.
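As a rough illustration of how this might be wired together from the service side, assuming the generated model exposes the `set_latent_data` method described in the next section (the helper function, variable names, and RNG choice here are illustrative, not part of the proposal):

```cpp
#include <boost/random/additive_combine.hpp>  // boost::ecuyer1988
#include <Eigen/Dense>

// Hypothetical service-side helper: construct the latent data via the
// generic factory and populate it from the current draw.
template <class Model>
latent_data_base& make_and_set_latent_data(const Model& model,
                                           const Eigen::VectorXd& params_r,
                                           unsigned int seed) {
  boost::ecuyer1988 base_rng(seed);
  latent_data_base& ldb = new_latent_data();       // generic factory
  model.set_latent_data(params_r, ldb, base_rng);  // downcast happens inside
  return ldb;
}
```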
### Generating the latent data

An additional method in the model class will be required to populate
the latent data.
```cpp
template <typename RNG>
inline void set_latent_data(const VectorXd& params_r,
                            latent_data_base& ldb,
                            RNG& base_rng,
                            std::ostream* pstream = nullptr) const {
  latent_data& latent_data__ = static_cast<latent_data&>(ldb);
  ... code generated to populate latent_data__ ...
}
```
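As a purely hypothetical illustration of the elided body, here is what the generated code might look like for the `sens`/`spec` struct above; the distributions and constants are invented for this sketch, and the real body would be produced by the Stan compiler from the latent data block:

```cpp
template <typename RNG>
inline void set_latent_data(const VectorXd& params_r,
                            latent_data_base& ldb,
                            RNG& base_rng,
                            std::ostream* pstream = nullptr) const {
  latent_data& latent_data__ = static_cast<latent_data&>(ldb);
  // e.g., draw the cut parameters from beta distributions; in a real
  // model the shapes would come from data or from params_r
  latent_data__.sens = stan::math::beta_rng(9.0, 1.0, base_rng);
  latent_data__.spec = stan::math::beta_rng(9.0, 1.0, base_rng);
}
```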
Because the dispatch itself does not use anything specific to the
particular model class, it can be implemented in the
`latent_data_base_crtp` class using the CRTP.
```cpp
template <class Derived>
class latent_data_base_crtp : public latent_data_base {
 public:
  template <typename RNG>
  inline void set_latent_data(const VectorXd& params_r,
                              latent_data_base& ldb,
                              RNG& base_rng,
                              std::ostream* pstream = nullptr) const {
    static_cast<Derived&>(ldb).set_latent_data(params_r, ldb,
                                               base_rng, pstream);
  }
};
```
Since it's just holding data that is made once, I think we can use an allocator scheme instead. So we define a `latent_data` struct inside of the model class, which will also use the allocator for acquiring memory.
```cpp
struct latent_data {
  int iteration_number__;  // included by default in all latent_data
  // Will hold memory owned by allocator
  Eigen::Map<Eigen::Matrix<double, -1, 1>> sens;
  std::vector<double, stan::math::local_allocator> spec;
};
```
For the allocator, we will then also make a member function in the model class that will tell us the size of the latent data struct.
```cpp
std::size_t get_latent_size_bytes() {
  // Assume `sens` is size 5 and `spec` is size 6
  return sizeof(latent_data) + 5 * sizeof(double) + 6 * sizeof(double);
}
```
The allocator will look very similar to Stan's arena allocator, but instead of storing a vector of unsigned char pointers this will only store one void pointer. We will use just one void pointer so that it's easy to cast the memory inside of the allocator to the `latent_data` class when we need it.
```cpp
struct local_allocator {
  void* mem_;
  // Must be fixed size
  explicit local_allocator(std::size_t init_bytes) : mem_(malloc(init_bytes)) {}
  template <typename T>
  T* allocate(std::size_t amount);
  template <typename T>
  T* cast_this() {
    return reinterpret_cast<T*>(mem_);
  }
};
```
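The `allocate` member above is only declared; here is a minimal sketch of how a bump-style implementation could work, assuming the allocator also tracks a byte offset into its single buffer (the `offset_` member and the destructor are my additions, not part of the proposal above):

```cpp
#include <cstddef>
#include <cstdlib>

struct local_allocator_sketch {
  void* mem_;
  std::size_t offset_{0};  // bytes handed out so far

  explicit local_allocator_sketch(std::size_t init_bytes)
      : mem_(std::malloc(init_bytes)) {}
  ~local_allocator_sketch() { std::free(mem_); }

  // Hand out the next amount * sizeof(T) bytes of the fixed buffer;
  // no alignment or overflow checking in this sketch.
  template <typename T>
  T* allocate(std::size_t amount) {
    T* out = reinterpret_cast<T*>(static_cast<unsigned char*>(mem_) + offset_);
    offset_ += amount * sizeof(T);
    return out;
  }

  // Reinterpret the start of the buffer as the latent data type.
  template <typename T>
  T* cast_this() {
    return reinterpret_cast<T*>(mem_);
  }
};
```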
Since the sizes of variables have to come from data and transformed data, we know at initialization how much memory we need.
```cpp
stan::math::local_allocator latent_alloc(model->get_latent_size_bytes());
```
Then the model class will have a `set_latent_data` member function that uses the allocator as an in/out parameter:
```cpp
inline void set_latent_data(stan::math::local_allocator& alloc) {
  latent_data* data__ = new (alloc.allocate<latent_data>(1)) latent_data{
      0,
      Eigen::Map<Eigen::Matrix<double, -1, 1>>(alloc.allocate<double>(5), 5, 1),
      std::vector<double, stan::math::local_allocator>(6, alloc)};
  // Fill in latent_data...
  return;
}
```
So now we can have a new `log_prob` that can take in the allocator. The `log_prob` will call the allocator's `cast_this` function to get back the pointer to the `latent_data`. We can either use the `latent_data` pointer for the rest of the code or pull out references to the members of `latent_data`.
```cpp
auto log_prob(stan::math::local_allocator& alloc, ...) {
  latent_data* data__ = alloc.cast_this<latent_data>();
  // Just makes a reference, very low cost
  auto&& spec = data__->spec;
  // Rest of the program written as normal
  auto X = Y * spec;
}
```
And with this scheme the model class would also have new member functions for printing latent data names and latent data sizes etc.
> Since it's just holding data that is made once I think we can use an allocator scheme instead.
What's the advantage to doing it the way you propose? It looks very similar to me, so I'm sure I'm missing something important.
It removes inheritance, does not need a generic factory method, and goes more into how to handle dynamically allocated objects. I don't think we need to have a base class that we downcast to the actual class. Since this is just used in the model class, the rest of the Stan code can just worry about passing around an allocator that holds the memory for the latent data struct.
This is ready to review.
I tried to flesh out how we'd actually code this and document it in the Reference Manual.