Implement LKJ distribution #1753

Closed · wants to merge 6 commits

Conversation

fehiepsi
Member

This PR implements the LKJ distribution, inspired by the discussions at #1692 and the great initial work by @elbamos. Although we already have #1746, it is good to have two versions to compare and to avoid unnecessary discussions.

Reference

Lewandowski, Kurowicka, and Joe (2009), "Generating random correlation matrices based on vines and extended onion method", Journal of Multivariate Analysis.

What is implemented

  • C-vine method and onion method. Although Stan uses the C-vine method, I set "onion" as the default because it seems faster (also verified in Lewandowski et al.'s paper). Both methods give the same marginal distribution (verified in the tests).
  • Every transform is vectorized, which is the main difference from Distribution for Cholesky Factor of LKJ Prior #1746. The trade-off is that some operators (e.g. cumsum, cumprod) are applied unnecessarily to the upper triangular part of a matrix, so it is worth waiting for Distribution for Cholesky Factor of LKJ Prior #1746 to be completed in order to make a comparison.
  • A signed version of StickBreakingTransform. Because each row of a correlation Cholesky factor has unit Euclidean length, we need a stick-breaking-style transform for it; a rough sketch of the idea is shown after this list. Details on how it is implemented are available in the docs.
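
Below is a minimal, illustrative sketch of the signed stick-breaking idea for a single row of the Cholesky factor; the helper name and the tanh parameterization are my own choices here, not the PR's actual transform code.

```python
import torch

def signed_stick_breaking_row(y):
    # Map an unconstrained vector y (length n-1) to a signed unit-norm vector
    # of length n: each entry consumes part of the remaining squared "stick"
    # length, and the last entry takes whatever is left over.
    z = torch.tanh(y)                                      # values in (-1, 1)
    stick = torch.cat([torch.ones(1), torch.cumprod(1 - z ** 2, dim=0)])
    x_head = z * stick[:-1].sqrt()                         # signed entries
    x_tail = stick[-1:].sqrt()                             # leftover length
    return torch.cat([x_head, x_tail])

row = signed_stick_breaking_row(torch.randn(4))
print(row.pow(2).sum())  # ~1.0: the row has unit Euclidean length
```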

Performance

I tested performance by generating 5000 correlation matrices of rank 80 (the setting reported in Lewandowski et al.'s paper).

C code compiled with full optimization (as reported in Lewandowski et al.'s paper):

  • C-vine: 13.4s
  • Onion: 4.33s

MATLAB 2007b (as reported in Lewandowski et al.'s paper):

  • C-vine: 146.4s
  • Onion: 82.4s

This PR:

  • C-vine: 9.66s
  • Onion: 10s

JIT of this PR: not available (see below).

As we can see, the C-vine method in this PR is faster than the one reported in Lewandowski et al.'s paper. However, the onion method is slower than in that paper, so there is some room for improvement.

Some operators of the "onion" method are not optimized (e.g. I have to clone tensors here and there just to get samples from uniform distributions over hyperspheres of different dimensions) because I have to support backward for them. If we don't need backward, then the onion time is 8.22s, which is faster than C-vine. In any case, for generating small-dimension (e.g. 5 x 5) matrices, onion is faster than C-vine, so I set "onion" as the default method.
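
As an aside, the standard trick for drawing uniform samples from a unit hypersphere is to normalize standard normal vectors; the sketch below only illustrates that trick and does not reproduce the cloning workaround described above.

```python
import torch

def sample_uniform_hypersphere(num_samples, dim):
    # Normalized i.i.d. standard normal vectors are uniformly distributed
    # on the unit hypersphere S^{dim-1}.
    x = torch.randn(num_samples, dim)
    return x / x.norm(dim=-1, keepdim=True)

points = sample_uniform_hypersphere(1000, 5)
print(points.norm(dim=-1)[:3])  # all ~1.0
```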

It would also be worth seeing whether PyTorch's JIT provides any advantage. Note that this is not a fair comparison because my code runs on a modern system while Lewandowski et al.'s paper is from 2009.
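
For reference, a rough sketch of how such a timing run might look, using torch.distributions.LKJCholesky (available in recent PyTorch) as a stand-in for this PR's sampler (which additionally exposes the C-vine/onion method switch):

```python
import time
import torch

# Benchmark sketch in the spirit of the numbers above.
dim, num_samples = 80, 5000
dist = torch.distributions.LKJCholesky(dim, concentration=1.0)
start = time.perf_counter()
L = dist.sample((num_samples,))  # shape: (num_samples, dim, dim)
print(f"{time.perf_counter() - start:.2f}s for {num_samples} {dim}x{dim} Cholesky factors")
```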

Tests

  • The transform is bijective; log_abs_det_jacobian is tested against the value computed with autograd.
  • The rsample method is tested based on the fact that the marginal distribution of each off-diagonal element of sampled correlation matrices is a Beta distribution (modulo the linear transform x -> 2x - 1); see the sketch after this list.
  • The log_prob method is tested using various facts from the Stan reference manual and the Lewandowski et al. paper:
    • When the concentration is 1, the distribution of generated correlation matrices is uniform.
    • LKJ for the Cholesky factor is a transformed distribution whose base_distribution is a Beta distribution and whose transform is a "signed" version of StickBreakingTransform. The transform is bijective, so we can test against the log_prob of that transformed distribution.
  • This gist shows that MCMC works without problems on the example in Distribution for Cholesky Factor of LKJ Prior #1746. Thanks @elbamos for providing it.
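
A rough sketch of the marginal check mentioned above: an off-diagonal entry, mapped to [0, 1], is compared against the Beta marginal from the LKJ paper. Here torch.distributions.LKJCholesky (recent PyTorch) stands in for the class added in this PR, and the Beta shape alpha = eta - 1 + d/2 is my reading of that result.

```python
import torch
from scipy.stats import beta, kstest

dim, eta, num_samples = 5, 2.0, 20000
L = torch.distributions.LKJCholesky(dim, concentration=eta).sample((num_samples,))
corr = L @ L.transpose(-1, -2)        # correlation matrices from Cholesky factors
r = corr[:, 1, 0]                     # one off-diagonal element
alpha = eta - 1 + dim / 2             # marginal Beta shape (LKJ paper)
stat, p_value = kstest((r.numpy() + 1) / 2, beta(alpha, alpha).cdf)
print(p_value)                        # should not be vanishingly small
```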

What is not available in this PR

  • JIT support: it seems that the JIT does not support some operators used in this PR, but this can be addressed in a later PR.

Some discussions

The C-vine/onion methods require D*(D-1)/2 parameters, which grows quadratically in D. In problems where D is large, I would recommend that users use the LowRankMultivariateNormal distribution (whose covariance has the form W @ W.T + D, with D diagonal). This distribution only requires D * (rank + 1) parameters. In addition, inference using the low-rank MVN distribution has time complexity O(D * rank^2) (thanks to the Woodbury formula).
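
A minimal usage sketch of torch.distributions.LowRankMultivariateNormal to illustrate the parameter count; the variable names are mine.

```python
import torch
from torch.distributions import LowRankMultivariateNormal

D, rank = 1000, 5
W = torch.randn(D, rank)           # cov_factor: D * rank parameters
d = torch.rand(D) + 0.5            # cov_diag (positive): D parameters
mvn = LowRankMultivariateNormal(torch.zeros(D), cov_factor=W, cov_diag=d)
# Covariance is W @ W.T + diag(d): D * (rank + 1) parameters in total,
# and log_prob costs O(D * rank^2) via the Woodbury identity.
x = mvn.rsample()
print(mvn.log_prob(x))
```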

I am happy to let #1746 be finished first, then gradually merge the advantageous parts (if any) of this PR later if necessary. On the other hand, I am open to any suggestions to improve this PR. This has been a great learning journey indeed.

neerajprad and others added 6 commits February 9, 2019 19:58
* Fix unit tests failing on CUDA

* add xfail marker for baseball cuda example
* Use pytorch 1.0.1 in travis CI

* add typing as dependency

* revert typing dep

* use updated wheels

* install typing dep in .travis

* fix wheel names; revert to earlier smaller sized wheels

* update torch whl