-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add implementation details docs page, note on propto=True for log_density #192
Conversation
Here is a demo of what I'm describing, using this model import bridgestan
import timeit
model = bridgestan.StanModel('sir.stan', 'sir.data.json')
with open('sir.init.json') as f:
params = model.param_unconstrain_json(f.read())
def one():
model.log_density(params, propto=True)
def two():
model.log_density(params, propto=False)
timeit.timeit(one, number=1000) # warms-up caches etc
time_one = timeit.timeit(one, number=1000)
time_two = timeit.timeit(two, number=1000)
print(f"propto=T: {time_one*1000:.1f}ms")
print(f"propto=F: {time_two*1000:.1f}ms") This prints
So, over 4 times slower for this model! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK as is, but I added some suggestions and am happy to re-review.
``log_density`` with ``propto=true`` | ||
------------------------------------ | ||
|
||
The log density function provided by a Stan model has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would phrase this as having the ability to drop constants. Then I'd give simple recommendations:
- If you're running MCMC that needs gradients and only density up to proportion, then use
propto = true
. Settingpropto=true
will be at least as fast. - To evaluate the log density on
double
values to match, usepropto=true
. Settingpropto=true
may be slower or faster, depending on the cost of calculating normalizing constants (propto=false
) and the cost of autodiff (required to get the right answer ifpropto=true
).
I don't think we need to say much more than that.
Why the double back ticks?
I couldn't understand what lines 34/35 were doing.
I'd just give simple recommendations:
- doing gradient-based calculations with autodiff: use propto because it's faster
- doing log density evals without autodiff: depending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double back ticks are how ReStructuredText (that sphinx uses) wants code formatted. They're equivalent to single backticks in Markdown. Lines 34/35 are also a RST detail to get a link that is also code formatted
. The result is that "|reduce_sum|" above gets rendered as reduce_sum
I agree phrasing it in terms of a suggestion for each case is clearer. I left the explanation in, but under a sub-heading for the curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having things like this in BridgeStan cause it is overall details about Stan that affect our users and I definitely want to keep it here. I just would rather offer the advice appropriate for our users and then point to Stan documentation. But in the end, I don't know where this exists in Stan documentation.
``propto=True`` will be at least as fast as setting ``propto=False`` | ||
and is generally recommended (and the default value). | ||
|
||
However, in the case of the ``log_density`` function (which does not calculate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to restate the "only needs the log density up to a proportion" bit again in this paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? We're recommending setting it to False
in this paragraph, which is safe for all usages
I think this is essentially absent from the Stan documentation, since calculating gradients is essentially taken for granted everywhere in Stan |
It's discussed in the efficiency section of the User's Guide and at length in the Reference Manual, which covers which things get autodiffed and which ones are just |
Is there any place that would obviously lead the reader to the conclusion this discusses? e.g., that for the |
Good point---we talk about everything you would need to draw this conclusion yourself, but I don't think we ever connect the dots. We probably should. I added an issue for the User's Guide efficiency chapter: |
This is a bit of a weird thing I've been thinking about recently that it is good to alert users for. It's related to #165 and #180.
Basically
propto=True
requires we passvars
to Stan. Before #165, we were even callinggrad
, wasting a lot of computation. Since #165, we no longer call grad, but it still may lead to more work than you'd expect, since the Stan math library assumes (justifiably) that if you're calling a function withvar
s you will want gradients, and so it can do some pre-computation for you. Because we never call/usegrad()
inlog_density
, this is wasted effort.The big offenders are the higher order functions like
reduce_sum
, which basically calculate their entire gradients in the "forward pass".I wrote up a docs page on implementation details and added a section about this. I'm not sure if it should be linked other places or not.