Implement TruncatedDistribution #121
Conversation
Will the …
I just added a minimal working example (+tests) to get TruncatedNormal working. With CDFs for all current distributions I think this PR would be too large. What do you think?
I thought the same. Maybe after this is merged, I can start working on the PR that populates the rest. Hope that is fine.
You can cherry-pick the first commit to your branch and start working in parallel if you want?
I was actually just thinking about this today and was exploring how disastrous it would be to try to implement a "generic" TruncatedDistribution, where we use inverse transform sampling to generate from it. Some plots in this gist: https://gist.github.com/tbrx/18e7579d9b7ff7c2a84c17c300555fc1 Basically, it's pretty bad numerically once you are more than four standard deviations away from the mean on a Gaussian, and it falls apart entirely a little past five. This doesn't give me high hopes for e.g. Gamma… I looked at the SciPy code this morning, and it actually appears to use inverse transform sampling for truncated normals. Higher-precision floating point, though, means that they can get quite far from the mean before this is an issue.
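For reference, here is a minimal sketch of such a generic inverse-transform sampler (illustrative only, not the PR's code; it assumes the .cdf() / .icdf() methods this PR adds for Normal). The tail problem is visible directly: both CDF values get rounded toward 1 in the tail, so the uniform draw has almost no resolution left there.

import torch
from torch.distributions import Normal

def truncated_sample(base_dist, low, high, sample_shape=(1,)):
    # Generic inverse-transform sampling for a truncated distribution:
    # draw u uniformly on [F(low), F(high)], then map back through F^{-1}.
    cdf_low = base_dist.cdf(torch.tensor(low))
    cdf_high = base_dist.cdf(torch.tensor(high))
    u = torch.rand(sample_shape) * (cdf_high - cdf_low) + cdf_low
    return base_dist.icdf(u)

# Fine near the bulk of the mass, but increasingly coarse and biased past ~4-5 sigma,
# because cdf_low and cdf_high are both rounded toward 1 in single precision:
samples = truncated_sample(Normal(0.0, 1.0), 4.5, float('inf'), (1000,))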
Looks clean, but we need a safer way to do .new().
test/test_distributions.py (outdated)
set_rng_seed(0)  # see Note [Randomized statistical tests]
for pytorch_dist, scipy_dist in self.distribution_pairs:
    samples = pytorch_dist.sample((5,))
    try:
It's safest to enclose as little as possible in a try-except:
try:
    pytorch_cdf = pytorch_dist.cdf(samples)
except NotImplementedError:
    continue
self.assertEqual(pytorch_cdf, scipy_dist.cdf(samples), message=pytorch_dist)
test/test_distributions.py (outdated)
set_rng_seed(0)  # see Note [Randomized statistical tests]
for pytorch_dist, scipy_dist in self.distribution_pairs:
    samples = Variable(torch.rand((5,) + pytorch_dist.batch_shape))
    try:
ditto, enclose as little as possible
super(TruncatedDistribution, self).__init__(*args, **kwargs)
self.base_dist = base_distribution
self.lower_bound, self.upper_bound, _ = broadcast_all(lower_bound, upper_bound,
                                                       getattr(self.base_dist,
This looks really dangerous. Why do we need to broadcast? Can we simply set
self.lower_bound = lower_bound
self.upper_bound = upper_bound
I was thinking about supporting batched bounds while writing that part, but I gave up that idea later & forgot to change it.
is a generic sampler which is not the most efficient or accurate around tails of base distribution.
"""
shape = self._extended_shape(sample_shape)
u = getattr(self.base_dist, list(self.base_dist.params.keys())[0]).new(shape).uniform_()
This looks dangerous. I wish we had a .new() method to create a correctly-placed tensor from a given distribution. @apaszke Is there an established pattern to do this? Can we define a .new_tensor() method or something? This has been coming up often. Some of our distributions define a private ._new(), but we haven't exposed this as a general interface.
It seems simple and safe to define a method-as-property, like:

class Distribution(object):
    @property
    def new_tensor(self):
        raise NotImplementedError

class Normal(Distribution):
    @property
    def new_tensor(self):
        return self.loc.new
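With something along those lines, the sampler above could presumably be written as u = self.base_dist.new_tensor(shape).uniform_() (new_tensor being the hypothetical property suggested here) instead of the getattr lookup.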
We don't have a common pattern except for new on tensors, but we never needed anything else.
Re: numerical stability, one option would be to use rejection sampling to draw samples and merely use the cdf derivative to compute reparameterized gradients:

def sample(self):
    ...use rejection sampling...

def rsample(self):
    x = self.sample()  # detached
    cdf = self.cdf(x)
    pdf = self.log_prob(x).exp()
    return x + (cdf.detach() - cdf) / pdf.detach()  # or something like this...
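As a side note on why that surrogate gives the right gradient (my reading, not part of the comment above): in the forward pass cdf.detach() - cdf is zero, so the expression evaluates to x. In the backward pass, treating the sample as x = F_theta^{-1}(u) for a fixed uniform u and implicitly differentiating F_theta(x) = u gives dx/dtheta = -(dF_theta(x)/dtheta) / f_theta(x), which is exactly the gradient of the surrogate, -grad(cdf) / pdf.detach(). So a rejection-sampled x ends up with the same pathwise gradient an inverse-CDF sample would have.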
Do you think rejection sampling would be fast enough? Should we write low-level functions for it?
It would be cheap if we rejection sampled when …
@fritzo thanks, I'll take a look and try to come up with something.
I think …
Oh, I completely agree that adding … For the truncated normal, for example, it seems like there are only 58 distinct floating point values between 4.5 and infinity. If your bounds are within ±4 standard deviations, though, this would work pretty much fine! Maybe that is the more common case than sampling or evaluating tail probabilities anyway.
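Reading that claim as counting single-precision values of the normal CDF between Phi(4.5) and 1 (my interpretation; the comment above is truncated), it is easy to check numerically:

import numpy as np
from scipy.stats import norm

lo = np.float32(norm.cdf(4.5))   # CDF at the lower truncation point, in single precision
hi = np.float32(1.0)             # CDF at +infinity

count = 0
x = lo
while x < hi:
    x = np.nextafter(x, hi)      # step to the next representable float32 value
    count += 1
print(count)  # roughly 58 representable values, matching the figure quoted above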
Here are 2 gists for …
Do I understand correctly that the difficult case is when you're truncating e.g. a …
Here is an updated gist. Sampling from 4.5 sigma looks problematic, but sampling from 4 sigma looks OK-ish.
Those plots look good! But I agree, I don't think inverse CDF sampling will work very well for a Normal(0,1) outside of the region [-4, 4], or maybe [-4.5, 4.5] in a pinch… There are algorithms for sampling from the tail of a Gaussian (e.g. on [4, \infty)) in chapter 9 of http://www.nrbook.com/devroye/. This doesn't help, though, with computing the …
I implemented the sampling-from-the-tail algorithm; it's fast and looks good! Here's a gist. Precision problem with …
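For readers following along, here is a minimal sketch of the kind of tail sampler being discussed (the exponential-proposal rejection scheme from Robert, 1995, also covered in the Devroye chapter linked above); it is illustrative, not the PR's actual implementation:

import math
import random

def sample_normal_tail(a):
    # Draw Z ~ N(0, 1) conditioned on Z >= a, for a > 0 (Robert, 1995).
    # Proposal: a shifted exponential on [a, inf) with rate alpha; this choice of
    # alpha keeps the acceptance probability high even for large a.
    alpha = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        x = a - math.log(1.0 - random.random()) / alpha            # exponential proposal
        if random.random() <= math.exp(-(x - alpha) ** 2 / 2.0):   # acceptance test
            return x

samples = [sample_normal_tail(4.5) for _ in range(5)]  # every draw lies in [4.5, inf)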
This is cool!! That would work really well for … I was wondering if there was a way of maybe directly approximating … EDIT: looking at these numeric approximations, in particular the fourth, maybe it's possible to get approximations for …
I'm very interested in this; I'm working on a similar thing. I'm calling it ConditionalExcessDistribution, but I'm really only looking at right-censored (truncated) things at the moment.
New gist time 😄 This time I also have sampling times: https://gist.github.com/alicanb/c9e6567b7c512140ed43916b4dd30106. At this point I'm inclined towards having …
That sounds reasonable, implementing one new generic distribution and one specific special-case distribution. It even makes sense to send them in the same PR.
Sorry I let this drop; it would be nice to get it in before the 0.4 release.
self.base_dist = base_distribution
cdf_low, cdf_high = self.base_dist.cdf(self.lower_bound), self.base_dist.cdf(self.upper_bound)
if sample_method in ['rejection', 'inversion']:
    self.sample = {'rejection': self._rejection_sample, 'inversion': self._inversion_sample}[sample_method]
This creates a circular reference and leaks memory.
def event_shape(self):
    return self.base_dist.event_shape

def _inversion_sample(self, sample_shape=torch.Size()):
I'm inclined to implement Inversion and Rejection as different classes because the interfaces differ: Inversion allows reparametrization and hence allows an .rsample(), whereas Rejection is not reparametrizable and hence only implements .sample() (it can be partially reparametrized via RSVI, but that requires yet another interface). Also, Pyro defines a separate Rejector class to do rejection sampling given a more general rejection criterion.
I'm inclined to omit rejection sampling from pytorch, actually. It's hard to make it work efficiently out of the box for a range of distributions. What do you think?
I agree, let's omit rejection sampling.
def __init__(self, loc, scale, lower_bound=-float('inf'), upper_bound=float('inf'), sample_method='robert', *args, **kwargs):
    super(TruncatedNormal, self).__init__(Normal(loc, scale), lower_bound, upper_bound, *args, **kwargs)
    if sample_method in {'exp', 'robert'}:
        self.sample = {'exp': self._exp_proposal, 'robert': self._robert_sample}[sample_method]
This creates a circular reference. It's better to simply define an if statement in an .rsample() method.
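A minimal sketch of that suggestion, reusing the _exp_proposal / _robert_sample helper names from the diff above (those helpers and the TruncatedDistribution base class are assumed to exist as in this PR):

import torch
from torch.distributions import Normal

class TruncatedNormal(TruncatedDistribution):  # TruncatedDistribution as defined in this PR
    def __init__(self, loc, scale, lower_bound=-float('inf'), upper_bound=float('inf'),
                 sample_method='robert', *args, **kwargs):
        super(TruncatedNormal, self).__init__(Normal(loc, scale), lower_bound, upper_bound,
                                              *args, **kwargs)
        self.sample_method = sample_method  # store the name, not a bound method

    def rsample(self, sample_shape=torch.Size()):
        # Plain if-dispatch: no bound method stored on self, so no reference cycle.
        if self.sample_method == 'exp':
            return self._exp_proposal(sample_shape)
        return self._robert_sample(sample_shape)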
.jenkins/pytorch/test.sh (outdated)
@@ -23,6 +23,7 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
    export ASAN_OPTIONS=detect_leaks=0:symbolize=1
    export PYTORCH_TEST_WITH_ASAN=1
    # TODO: Figure out how to avoid hard-coding these paths
    export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-5.0/bin/llvm-symbolizer
Looks like the diff was tainted.
This PR adds:
- .cdf() and .icdf() methods for Distribution, and tests (populated only for Normal for now)
- a TruncatedDistribution class
- a TruncatedNormal class

Closes #78, touches #120.
@tbrx I forgot you volunteered for this; want to work together? This is a very rough sketch at the moment. @fritzo it's not at the stage where I'd request a review, but comments are welcome as always.