Add implementation details docs page, note on propto=True for log_density #192

WardBrian · 2023-12-05T21:55:29Z

This is a slightly odd issue I've been thinking about recently, and one that's worth alerting users to. It's related to #165 and #180.

Basically propto=True requires we pass vars to Stan. Before #165, we were even calling grad, wasting a lot of computation. Since #165, we no longer call grad, but it still may lead to more work than you'd expect, since the Stan math library assumes (justifiably) that if you're calling a function with vars you will want gradients, and so it can do some pre-computation for you. Because we never call/use grad() in log_density, this is wasted effort.

The big offenders are the higher order functions like reduce_sum, which basically calculate their entire gradients in the "forward pass".

I wrote up a docs page on implementation details and added a section about this. I'm not sure if it should be linked other places or not.

WardBrian · 2023-12-06T15:43:03Z

Here is a demo of what I'm describing, using this model (sir.stan):

import bridgestan
import timeit

model = bridgestan.StanModel('sir.stan', 'sir.data.json')

with open('sir.init.json') as f:
    params = model.param_unconstrain_json(f.read())


def one():
    model.log_density(params, propto=True)

def two():
    model.log_density(params, propto=False)

timeit.timeit(one, number=1000) # warms-up caches etc

time_one = timeit.timeit(one, number=1000)
time_two = timeit.timeit(two, number=1000)

print(f"propto=T: {time_one*1000:.1f}ms")
print(f"propto=F: {time_two*1000:.1f}ms")

This prints

propto=T: 329.7ms
propto=F: 81.4ms

So, over 4 times slower for this model!
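The same measurement pattern can be wrapped in a small reusable helper. The sketch below is not part of BridgeStan: `compare_timings` and its parameters are hypothetical names, and it only assumes two zero-argument callables such as `one` and `two` above. It also recomputes the ratio implied by the reported numbers.

```python
import timeit


def compare_timings(first, second, number=1000, warmup=100):
    """Time two zero-argument callables and return (t_first, t_second, ratio).

    Hypothetical helper, not a BridgeStan function.
    """
    # Warm-up runs whose timings are discarded, so one-off costs
    # do not land in either measurement.
    timeit.timeit(first, number=warmup)
    timeit.timeit(second, number=warmup)
    t_first = timeit.timeit(first, number=number)
    t_second = timeit.timeit(second, number=number)
    return t_first, t_second, t_first / t_second


# Ratio implied by the timings reported above (329.7 ms vs 81.4 ms).
reported_ratio = 329.7 / 81.4
print(f"reported slowdown: {reported_ratio:.2f}x")
```

With the demo's functions this would be called as `compare_timings(one, two)`.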

bob-carpenter

I think it's OK as is, but I added some suggestions and am happy to re-review.

bob-carpenter · 2023-12-06T19:37:41Z

docs/internals/details.rst

+``log_density`` with ``propto=true``
+------------------------------------
+
+The log density function provided by a Stan model has


I would phrase this as having the ability to drop constants. Then I'd give simple recommendations:

If you're running MCMC that needs gradients and only density up to proportion, then use propto = true. Setting propto=true will be at least as fast.

To evaluate the log density on double values to match, use propto=true. Setting propto=true may be slower or faster, depending on the cost of calculating normalizing constants (propto=false) and the cost of autodiff (required to get the right answer if propto=true).

I don't think we need to say much more than that.
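To make "dropping constants" concrete, here is a hand-written sketch for a single normal log density. It assumes `y` and `sigma` are data and `mu` is the only parameter, so every term that does not involve `mu` is a constant. `normal_lpdf` here is an illustrative function, not Stan's or BridgeStan's implementation, and which terms Stan actually drops depends on which arguments are parameters.

```python
import math


def normal_lpdf(y, mu, sigma, propto=False):
    """Log density of Normal(y | mu, sigma), with mu as the only parameter.

    Illustrative only; y and sigma are assumed to be fixed data.
    """
    # The only term that depends on mu.
    kernel = -0.5 * ((y - mu) / sigma) ** 2
    if propto:
        return kernel
    # Normalizing terms: constant with respect to mu.
    return kernel - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


y, sigma = 1.3, 2.0
offsets = [
    normal_lpdf(y, mu, sigma, propto=False) - normal_lpdf(y, mu, sigma, propto=True)
    for mu in (-1.0, 0.0, 2.5)
]
# The two versions differ by the same amount at every mu.
print(offsets)
```

Because the offset does not depend on `mu`, it cancels in any difference of log densities between two parameter values, which is why a density known only up to a constant is enough for acceptance ratios in MCMC.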

Why the double backticks?

I couldn't understand what lines 34/35 were doing.

I'd just give simple recommendations:

doing gradient-based calculations with autodiff: use propto because it's faster

doing log density evals without autodiff: it depends

The double backticks are how reStructuredText (which Sphinx uses) wants code formatted. They're equivalent to single backticks in Markdown. Lines 34/35 are also an RST detail, used to get a link that is also code formatted. The result is that "|reduce_sum|" above gets rendered as a code-formatted reduce_sum link.

I agree phrasing it in terms of a suggestion for each case is clearer. I left the explanation in, but under a sub-heading for the curious.
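Read as code, the per-case suggestion might look like the sketch below. The two wrapper names are hypothetical; `log_density` and `log_density_gradient` with their `propto` keyword are the BridgeStan Python methods under discussion, and `model` is assumed to be a `bridgestan.StanModel` like the one in the demo above.

```python
def density_only(model, theta_unc):
    """Log density without a gradient (hypothetical wrapper).

    Per the discussion above, propto=False avoids passing autodiff
    variables to Stan, so no gradient-related work is done and discarded.
    """
    return model.log_density(theta_unc, propto=False)


def density_and_gradient(model, theta_unc):
    """Log density and gradient (hypothetical wrapper).

    Autodiff is needed for the gradient anyway, so dropping constants
    with propto=True should be at least as fast.
    """
    return model.log_density_gradient(theta_unc, propto=True)
```

The value returned by `density_only` includes the normalizing constants, so it can differ from the `propto=True` value by an offset that does not depend on the parameters.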

roualdes

I like having things like this in BridgeStan because these are details about Stan that affect our users, and I definitely want to keep it here. I would just rather offer the advice appropriate for our users and then point to the Stan documentation. In the end, though, I don't know where this exists in the Stan documentation.

roualdes · 2023-12-07T17:46:01Z

docs/internals/details.rst

+``propto=True`` will be at least as fast as setting ``propto=False``
+and is generally recommended (and the default value).
+
+However, in the case of the ``log_density`` function (which does not calculate


I think it would be good to restate the "only needs the log density up to a proportion" bit in this paragraph.

Why? We're recommending setting it to False in this paragraph, which is safe for all usages.

WardBrian · 2023-12-07T19:04:12Z

I think this is essentially absent from the Stan documentation, since calculating gradients is essentially taken for granted everywhere in Stan

bob-carpenter · 2023-12-07T19:16:22Z

> I think this is essentially absent from the Stan documentation, since calculating gradients is essentially taken for granted everywhere in Stan

It's discussed in the efficiency section of the User's Guide and at length in the Reference Manual, which covers which things get autodiffed and which ones are just double-based.

WardBrian · 2023-12-07T19:21:36Z

Is there any place that would obviously lead the reader to the conclusion this discusses? e.g., that for the log_density function, propto can have dramatically different performance implications than it does for log_density_gradient?

bob-carpenter · 2023-12-07T19:28:26Z

Good point. We talk about everything you would need to draw this conclusion yourself, but I don't think we ever connect the dots. We probably should. I added an issue for the User's Guide efficiency chapter:

stan-dev/docs#692

Add implementation details docs page, note on propto=True for log_den…

509608a

…sity

WardBrian added the documentation label (Improvements or additions to documentation) Dec 5, 2023

WardBrian requested review from roualdes and bob-carpenter December 6, 2023 15:48

bob-carpenter approved these changes Dec 6, 2023

Rephrase in terms of putting suggestions first

1c85b24

roualdes approved these changes Dec 7, 2023

Wording tweaks

2e9550b

WardBrian merged commit 97bcea3 into main Dec 7, 2023
19 checks passed

WardBrian deleted the docs/propto-true-notice branch December 7, 2023 21:30
