WIP implementation of multivariate normal distribution #52
…ed covariance matrices at all.
fritzo
left a comment
Nice! Could you also add a section to docs/source/distributions.rst, and maybe run cd docs; make html to ensure the docs still build (I've caught my own typos this way).
You can also take a look at the recent OneHotCategorical tests, since that is also a "multivariate" distribution with a nontrivial event_shape.
test/test_distributions.py
Could you also add some example parameters in EXAMPLES below, ideally one that specifies cov and another that specifies scale_tril?
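For instance, entries along these lines might work (a rough sketch; the exact Example structure and current parameter names may differ):

```python
import torch
from torch.autograd import Variable

# hypothetical entries for the EXAMPLES list in test/test_distributions.py
Example(MultivariateNormal, [
    {'mean': Variable(torch.randn(3), requires_grad=True),
     'cov': Variable(torch.eye(3), requires_grad=True)},
    {'mean': Variable(torch.randn(3), requires_grad=True),
     'scale_tril': Variable(torch.eye(3) * 0.5, requires_grad=True)},
])
```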
Is it possible to support batched .rsample() by using torch.bmm() here instead of torch.matmul()? I'm not sure what's blocking batched covariance.
That might work for distributions specified via scale_tril as opposed to covariance_matrix. The primary blocker is a batched torch.potrf. We also need a batched solver (batched torch.gesv or otherwise) to compute the log probability.
I think we can maybe do this by using torch.btrifact and torch.btrisolve instead of potrf and gesv. Haven't looked into it yet.
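In the meantime, a loop-based fallback (the "python iteration under the hood" idea that comes up later in this thread) might look like the sketch below; the helper name _batch_potrf_lower is hypothetical:

```python
import torch

def _batch_potrf_lower(bmat):
    """Apply torch.potrf(..., upper=False) over leading batch dims by
    flattening them and looping in Python (a stopgap, not fast)."""
    n = bmat.size(-1)
    flat = bmat.contiguous().view(-1, n, n)
    chols = torch.stack([torch.potrf(m, upper=False) for m in flat])
    return chols.view(bmat.size())
```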
Unfortunately, I didn't realize that torch.btrifact doesn't actually support .backward() calls.
nit: It would be nice to stay maximally compatible with tensorflow.distributions and name this covariance_matrix.
Yeah, I just left this matching the scipy mean and cov because they were so much shorter. Happy to change to loc and covariance_matrix if that is what we've settled on.
Yeah I'd like to keep the interfaces similar if possible, but I defer to your judgement here.
@tbrx How's this going? I might have time this weekend to try to add some bits of our Pyro implementation into this branch. I think it's fine to provide a batching-complete interface even if we might need to do some python iteration under the hood for now, until pytorch#4612 merges.
I haven't touched it since the last push; I was waiting to see whether a batched torch.potrf would land. At the moment this actually works fine, with the caveat that a batch shape on the covariance matrix isn't supported. Adding a version of this which handles batches by using python loops or list comprehensions shouldn't be too difficult…
Oh great, if it already works could we merge it now, then add full batched support in a follow-up PR? It would be nice to help motivate batched linear algebra work in PyTorch by claiming that "if xxx operation were batched then torch.distributions.MultivariateNormal would get batched covariance support for free".
Okay, I think for that all we need to do is update / expand the tests. If there are any other updates that happened to the Pyro version it would be nice to merge them in too. Maybe it is worth implementing a slow version with batched covariance matrices first just to get the API correct, though. If the batch size is reasonably small it shouldn't be too slow.
@tbrx It would make @neerajprad's and my job easier if you could merge this PR soon, simply adding tests and pushing further enhancements to follow-up PRs. The Pyro team has already migrated to PyTorch distributions, and we're working around the lack of a multivariate normal.
That makes sense. The lack of a batch dimension on the covariance_matrix doesn't cause issues for you in Pyro, if I understand correctly? I can update this PR and add the remaining tests Monday morning my time.
What sort of constraint should we use here for the covariance_matrix and scale_tril parameters?
Yeah, I've been thinking about that. I think we should introduce new constraints, e.g. constraints.cholesky_lower for scale_tril and an analogous one for covariance_matrix.

Does that seem reasonable? They're simply symbolic placeholders, but we'll use them to register transforms.
BTW I've added an issue #99 for implementing a BivariateNormal.
```python
self.assertEqual(MultivariateNormal(mean_multi_batch, cov).sample((2, 7)).size(), (2, 7, 6, 5, 3))
self.assertEqual(MultivariateNormal(mean, scale_tril=scale_tril).sample((2, 7)).size(), (2, 7, 5, 3))

# check gradients
```
@jwvdm noted that we could generically retrieve params for a distribution if we specified a canonical set of parameters. I've been trying to do this by putting only a single canonical parameterization in the .params dict (e.g. either loc, covariance_matrix or loc, scale_tril, but not all three).

But I like what you've done here by adding them all. Maybe we should do that for all distributions and specify canonical_params or something in another field, or just let higher-level libraries like Pyro or ProbTorch do that. WDYT?
nit: You could simplify via

```python
if (covariance_matrix is None) == (scale_tril is None):
    raise ValueError(...)
```

in place of the current checks:

```python
raise ValueError("Either covariance matrix or scale_tril may be specified, not both.")
if covariance_matrix is None and scale_tril is None:
    raise ValueError("One of either covariance matrix or scale_tril must be specified")
if scale_tril is None:
```
Neeraj made this cool decorator called @lazy_property that could allow you to create scale_tril only if it does not exist, which would avoid unnecessary work in some cases. You could use it as follows:

```python
class MultivariateNormal(Distribution):
    def __init__(self, loc, covariance_matrix=None, scale_tril=None):
        ...
        if scale_tril is not None:
            self.scale_tril = scale_tril
            # leave .covariance_matrix unset
        else:
            self.covariance_matrix = covariance_matrix
            # leave .scale_tril unset
        ...

    @lazy_property
    def scale_tril(self):
        return torch.potrf(self.covariance_matrix, upper=False)

    @lazy_property
    def covariance_matrix(self):
        return torch.mm(self.scale_tril, self.scale_tril.t())
```
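For reference, a minimal sketch of how such a lazy_property decorator could work (the actual decorator may differ): a non-data descriptor that computes the value on first access and caches it on the instance, so the direct assignments in __init__ simply shadow it.

```python
class lazy_property(object):
    """Evaluate the wrapped method once, on first access, and cache the
    result as an ordinary instance attribute (shadowing this descriptor)."""

    def __init__(self, fget):
        self.fget = fget
        self.__name__ = fget.__name__
        self.__doc__ = fget.__doc__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self.fget(instance)
        setattr(instance, self.__name__, value)
        return value
```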
@tbrx In pytorch#4771 I've replaced …
@fritzo Actually… I like leaving it just as plain lower-triangular, with no sign constraint on the diagonal.

The main (potential) problem would be in computing the determinant of the covariance matrix. But that's actually fine: take a lower-triangular matrix with a negative entry on the diagonal; we can use it to get a covariance matrix, whose Cholesky decomposition is of course different. Say the determinant of that covariance matrix is 0.25. We can get this from our original factor as the squared product of its diagonal entries, but this is the same as the squared product of their absolute values (a numeric version is in the snippet below). That said, I believe the current MVN code actually handles this correctly by taking abs() of the diagonal.

It seems to me one nice use case of an unconstrained scale_tril is optimizing it directly as a free parameter.
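Concretely, a small numeric version of the argument above (the particular matrix is illustrative, chosen so the covariance determinant comes out to 0.25):

```python
import math
import torch

# a lower-triangular factor with a negative entry on the diagonal
L = torch.Tensor([[0.5, 0.0],
                  [0.3, -1.0]])
cov = L.mm(L.t())  # a perfectly valid covariance matrix

# det(cov) = det(L)^2 = (0.5 * -1.0)^2 = 0.25; taking abs() of the
# diagonal first gives the same answer:
log_det = L.diag().abs().log().sum()  # log |det(L)|
print(math.exp(2 * log_det))          # 0.25 == det(cov)
```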
In my very limited experience, it is important to ensure positive definiteness rather than merely semidefiniteness (sorry if I've messed this up):

```python
u = Variable(torch.Tensor(4, 4).normal_(), requires_grad=True)  # optimize this
scale_tril = u.tril(-1) + u.diag().exp().diag()
```

If you instead merely define

```python
scale_tril = u.tril()
```

then optimization will often pass through a hyperplane of singular matrices, i.e. where one of the diagonal entries crosses zero. I'm happy to add …
I'm not sure we should actually change it back! Just wanted to discuss. I actually agree with you that the PSD-vs-PD bit is probably more crucial. In that case I want to confirm we enforce it. I guess that I agree it is nice (generally) to have the diagonal constrained to be positive.

And actually, your code snippet may have convinced me that this isn't a problem: the exp() guarantees a strictly positive diagonal. My one remaining concern though is what happens when we (eventually) update the MVN to support batching for scale_tril.
There is no batch support yet, but a batched version of the transform could look something like this:

```python
def cholesky_lower_transform(x):
    if x.dim() == 2:
        return x.tril(-1) + x.diag().exp().diag()
    else:
        n = x.size(-1)
        diag = torch.eye(n, out=x.new(n, n))
        arange = torch.arange(n, out=x.new(n))
        tril = (arange.unsqueeze(-1) > arange.unsqueeze(0)).float()
        return x * tril + x.exp() * diag
```

This probably suffers from NaN issues, but the general idea should work.
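Hypothetical usage on a batch of unconstrained matrices (shapes illustrative):

```python
import torch

x = torch.Tensor(10, 4, 4).normal_()  # a batch of unconstrained matrices
L = cholesky_lower_transform(x)
# each (4, 4) slice of L is lower-triangular with a strictly positive
# (exp-transformed) diagonal
```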
BTW the to_constrained() factory should make this sort of thing easy to write:

```python
u = Variable(torch.Tensor(100, 100).normal_(), requires_grad=True)
scale_tril = to_constrained(constraints.cholesky_lower)(u)
```

or even

```python
scale_tril = to_constrained(dist.params['scale_tril'])(u)
```

I'm really looking forward to using this in Pyro 😄
…delay computation until after init
I believe the primary blocker to moving upstream at this point is the constraints, and a decision on the scale_tril parameterization. Alternatively, we could wait for pytorch#4771 and then include both this and the constraints together.
I'd recommend adding an implementation of entropy().
Is there anything I'm missing here (particularly in terms of test coverage…)? Otherwise, I'd be up for sending this PR upstream.
fritzo
left a comment
Looks ready to send upstream after one minor doc fix.
Re: testing, I think the strongest tests will be provided once we have a "by hand" bivariate normal distribution. We'll also be using this in Pyro right away; this should give us a little time to look for bugs and weird behavior before the PyTorch release.
```rst
:members:

:hidden:`MultivariateNormal`
~~~~~~~~~~~~~~~~~~~~~~~
```
nit: underline is too short and will break docs. You can run make -C docs html and open docs/build/html/index.html to check docs.
After merging in the latest master, I'm seeing a failure in one of the Monte Carlo entropy tests. Visually the results look "okay" for most entries; the max error reported is 0.349, which is on a value of 58.xxx.
@tbrx That failure is not expected. Can you make sure you've rebuilt with …?
So, it seems that the Monte Carlo test for Pareto entropy is just very sensitive… If I change the ordering of …
…helpers for multivariate normal.
In working on the BivariateNormal #99 I started writing helpers for working with torch linear algebra functions, and realized that it would actually be hardly more work to port and implement these here. So, I updated this to support actual batching on covariance matrices and scale_tril.

I would appreciate feedback, particularly on whether I handled the "batch-friendly" matrix constraints correctly, and whether I am missing anything with the linear algebra helpers I added. Obviously the current implementation is not ideal, speed-wise…
fritzo
left a comment
The helpers look reasonable, and I like that they abstract out the mess and make MultivariateNormal methods more readable.
I'd love to have this in master soon so we can "kick its tires" and get any fixes into the PyTorch 0.4 release. E.g. it would help to have other multivariate distributions for testing batch shapes of Transforms.
nit: Use `r"""` rather than `"""` to open docstrings that contain backslashes.
```python
dims = torch.arange(n, out=bmat.new(n)).long()
if isinstance(dims, Variable):
    dims = dims.data  # TODO: why can't I index with a Variable?
return bmat[..., dims, dims]
```
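Presumably this is the body of the _batch_diag helper used in entropy() below. Hypothetical usage:

```python
import torch

bmat = torch.Tensor(2, 3, 4, 4).normal_()
diags = _batch_diag(bmat)
print(diags.size())  # (2, 3, 4): one diagonal per batch element
```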
```python
return -0.5 * (M + self.loc.size(-1) * math.log(2 * math.pi)) - log_det
```

```python
def entropy(self):
    log_det = _batch_diag(self.scale_tril).abs().log().sum(-1)
```
Hmm, shouldn't this already have the correct shape? Why do you need the H.expand(self._batch_shape) below?
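As an aside, the closed form being computed here can be sanity-checked against scipy (which the tests already compare against). An illustrative check, not part of the PR:

```python
import numpy as np
from scipy.stats import multivariate_normal

n = 3
L = np.tril(np.random.randn(n, n))
L[np.diag_indices(n)] = np.abs(L[np.diag_indices(n)]) + 0.5  # keep well-conditioned
cov = L.dot(L.T)

# H = n/2 * (1 + log(2*pi)) + sum(log|diag(L)|)
H = 0.5 * n * (1 + np.log(2 * np.pi)) + np.log(np.abs(np.diag(L))).sum()
assert np.isclose(H, multivariate_normal(cov=cov).entropy())
```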
Before you send upstream, consider replacing the TODO comment above with something more diplomatic 😉
conform to torch.bmm which requires .dim() == 3
Great, thanks @fritzo! I'll (finally!) make a new pull request upstream.
For issue #1. @fritzo

- A mean argument, plus (either) cov or scale_tril.
- torch.gesv for computing log_prob; if requires_grad=False then we could do a (cheaper) torch.potrs… probably worth using a solver-helper here like @dwd31415 has in the Pyro PR (a sketch follows below).

Argument naming convention at the moment is: mean and cov to match scipy.stats.multivariate_normal, and scale_tril to match the Pyro PR.

Test coverage is spotty at the moment (in particular I had some issue with the _gradcheck_log_prob helper), but shapes seem okay and log_prob values match scipy.

One question is when we should compute the Cholesky decomposition if passed a cov argument instead of scale_tril. I opted to call it up front in the constructor -- we're ultimately going to need it no matter what, either for sampling, or for computing the log determinant in log_prob or entropy.
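A rough sketch of the solver-helper idea above (names and structure hypothetical; it assumes Variable inputs, and uses torch.potrs only when no gradient is required, as suggested above):

```python
import torch

def _solve_covariance(scale_tril, delta):
    """Solve (L L^T) x = delta for x, picking a solver based on
    whether gradients will be needed."""
    if scale_tril.requires_grad or delta.requires_grad:
        # differentiable path: general solve against the full covariance
        cov = torch.mm(scale_tril, scale_tril.t())
        x, _ = torch.gesv(delta, cov)
    else:
        # cheaper path: reuse the Cholesky factor directly
        x = torch.potrs(delta, scale_tril, upper=False)
    return x
```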