Implement LKJ distribution #1753

Closed · wants to merge 6 commits

Conversation

fehiepsi
Member

This PR implements the LKJ distribution, inspired by the discussions at #1692 and the great initial work by @elbamos. Although we already have #1746, it is good to have two versions to compare and to avoid unnecessary discussions.

Reference

Lewandowski, Kurowicka, and Joe (2009), "Generating random correlation matrices based on vines and extended onion method", Journal of Multivariate Analysis.

What is implemented

  • C-vine method and onion method. Although Stan uses the C-vine method, I set "onion" as the default because it seems faster (also verified in Lewandowski et al.'s paper). Both methods give the same marginal distribution (verified in the tests).
  • Every transform is vectorized, which is the main difference from Distribution for Cholesky Factor of LKJ Prior #1746. The trade-off is that some operators (e.g. cumsum, cumprod) are applied unnecessarily to the upper triangular part of a matrix, so it is worth waiting for Distribution for Cholesky Factor of LKJ Prior #1746 to be completed in order to make a comparison.
  • A signed version of StickBreakingTransform. Because each row of a correlation Cholesky factor has unit Euclidean length, we need a stick-breaking-style transform for it; a rough sketch of the idea is shown after this list. Details on how it is implemented are available in the docs.
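
Below is a minimal, illustrative sketch of the signed stick-breaking idea for a single row of the Cholesky factor; the helper name and the tanh parameterization are my own choices here, not the PR's actual transform code.

```python
import torch

def signed_stick_breaking_row(y):
    # Map an unconstrained vector y (length n-1) to a signed unit-norm vector
    # of length n: each entry consumes part of the remaining squared "stick"
    # length, and the last entry takes whatever is left over.
    z = torch.tanh(y)                                      # values in (-1, 1)
    stick = torch.cat([torch.ones(1), torch.cumprod(1 - z ** 2, dim=0)])
    x_head = z * stick[:-1].sqrt()                         # signed entries
    x_tail = stick[-1:].sqrt()                             # leftover length
    return torch.cat([x_head, x_tail])

row = signed_stick_breaking_row(torch.randn(4))
print(row.pow(2).sum())  # ~1.0: the row has unit Euclidean length
```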

Performance

I tested performance by generating 5000 correlation matrices of rank 80 (the setting reported in Lewandowski et al.'s paper).

C code compiled with full optimization (as reported in Lewandowski et al.'s paper):

  • C-vine: 13.4s
  • Onion: 4.33s

MATLAB 2007b (as reported in Lewandowski et al.'s paper):

  • C-vine: 146.4s
  • Onion: 82.4s

This PR:

  • C-vine: 9.66s
  • Onion: 10s

JIT of this PR: not available (see below).

As we can see, the C-vine method in this PR is faster than the one reported in Lewandowski et al.'s paper. However, the onion method is slower than in that paper, so there is some room for improvement.

Some operators of the "onion" method are not optimized (e.g. I have to clone tensors here and there just to get samples from uniform distributions over hyperspheres of different dimensions) because I have to support backward for them. If we don't need backward, then the onion time is 8.22s, which is faster than C-vine. In any case, for generating small-dimension (e.g. 5 x 5) matrices, onion is faster than C-vine, so I set "onion" as the default method.
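
As an aside, the standard trick for drawing uniform samples from a unit hypersphere is to normalize standard normal vectors; the sketch below only illustrates that trick and does not reproduce the cloning workaround described above.

```python
import torch

def sample_uniform_hypersphere(num_samples, dim):
    # Normalized i.i.d. standard normal vectors are uniformly distributed
    # on the unit hypersphere S^{dim-1}.
    x = torch.randn(num_samples, dim)
    return x / x.norm(dim=-1, keepdim=True)

points = sample_uniform_hypersphere(1000, 5)
print(points.norm(dim=-1)[:3])  # all ~1.0
```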

It would also be worth seeing whether PyTorch's JIT provides any advantage. Note that this is not a fair comparison because my code runs on a modern system while Lewandowski et al.'s paper is from 2009.
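
For reference, a rough sketch of how such a timing run might look, using torch.distributions.LKJCholesky (available in recent PyTorch) as a stand-in for this PR's sampler (which additionally exposes the C-vine/onion method switch):

```python
import time
import torch

# Benchmark sketch in the spirit of the numbers above.
dim, num_samples = 80, 5000
dist = torch.distributions.LKJCholesky(dim, concentration=1.0)
start = time.perf_counter()
L = dist.sample((num_samples,))  # shape: (num_samples, dim, dim)
print(f"{time.perf_counter() - start:.2f}s for {num_samples} {dim}x{dim} Cholesky factors")
```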

Tests

  • The transform is bijective; log_abs_det_jacobian is tested against the value computed with autograd.
  • The rsample method is tested based on the fact that the marginal distribution of each off-diagonal element of sampled correlation matrices is a Beta distribution (modulo the linear transform x -> 2x - 1); see the sketch after this list.
  • The log_prob method is tested using various facts from the Stan reference manual and the Lewandowski et al. paper:
    • When the concentration is 1, the distribution of generated correlation matrices is uniform.
    • LKJ for the Cholesky factor is a transformed distribution whose base_distribution is a Beta distribution and whose transform is a "signed" version of StickBreakingTransform. The transform is bijective, so we can test against the log_prob of that transformed distribution.
  • This gist shows that MCMC works without problems on the example in Distribution for Cholesky Factor of LKJ Prior #1746. Thanks @elbamos for providing it.
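
A rough sketch of the marginal check mentioned above: an off-diagonal entry, mapped to [0, 1], is compared against the Beta marginal from the LKJ paper. Here torch.distributions.LKJCholesky (recent PyTorch) stands in for the class added in this PR, and the Beta shape alpha = eta - 1 + d/2 is my reading of that result.

```python
import torch
from scipy.stats import beta, kstest

dim, eta, num_samples = 5, 2.0, 20000
L = torch.distributions.LKJCholesky(dim, concentration=eta).sample((num_samples,))
corr = L @ L.transpose(-1, -2)        # correlation matrices from Cholesky factors
r = corr[:, 1, 0]                     # one off-diagonal element
alpha = eta - 1 + dim / 2             # marginal Beta shape (LKJ paper)
stat, p_value = kstest((r.numpy() + 1) / 2, beta(alpha, alpha).cdf)
print(p_value)                        # should not be vanishingly small
```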

What is not available in this PR

  • JIT support: it seems that the JIT does not support some operators used in this PR, but this can be addressed in a later PR.

Some discussions

The C-vine/onion methods require D*(D-1)/2 parameters, which grows quadratically in D. In problems where D is large, I would recommend that users use the LowRankMultivariateNormal distribution (whose covariance has the form W @ W.T + D, with D diagonal). This distribution only requires D * (rank + 1) parameters. In addition, inference using the low-rank MVN distribution has time complexity O(D * rank^2) (thanks to the Woodbury formula).
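
A minimal usage sketch of torch.distributions.LowRankMultivariateNormal to illustrate the parameter count; the variable names are mine.

```python
import torch
from torch.distributions import LowRankMultivariateNormal

D, rank = 1000, 5
W = torch.randn(D, rank)           # cov_factor: D * rank parameters
d = torch.rand(D) + 0.5            # cov_diag (positive): D parameters
mvn = LowRankMultivariateNormal(torch.zeros(D), cov_factor=W, cov_diag=d)
# Covariance is W @ W.T + diag(d): D * (rank + 1) parameters in total,
# and log_prob costs O(D * rank^2) via the Woodbury identity.
x = mvn.rsample()
print(mvn.log_prob(x))
```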

I am happy to let #1746 be finished first, then gradually merge the advantageous parts (if any) of this PR later if necessary. On the other hand, I am open to any suggestions to improve this PR. This has been a great learning journey indeed.

neerajprad and others added 6 commits February 9, 2019 19:58
* Fix unit tests failing on CUDA

* add xfail marker for baseball cuda example
* Use pytorch 1.0.1 in travis CI

* add typing as dependency

* revert typing dep

* use updated wheels

* install typing dep in .travis

* fix wheel names; revert to earlier smaller sized wheels

* update torch whl