This PR implements the LKJ distribution, inspired by the discussions at #1692 and the great initial work by @elbamos. Although we already have #1746, it is good to have two versions to compare, and to avoid unnecessary discussions.
Reference
Lewandowski, Kurowicka, and Joe, "Generating random correlation matrices based on vines and extended onion method", Journal of Multivariate Analysis, 2009.
What is implemented
Performance
I tested performance by generating 5000 correlation matrices with rank 80 (the benchmark reported in Lewandowski et al.'s paper).
C-compiled with full optimization (reported in Lewandowski et al.'s paper):
Matlab 2007b (reported in Lewandowski et al.'s paper):
This PR:
JIT of this PR (not available, see below)
As we can see, the C-vine method in this PR is faster than the one reported in Lewandowski et al.'s paper. However, the onion method is slower than reported there, so it has some room for improvement.
Some operators of the "onion" method are not optimized (e.g. I have to clone tensors here and there just to get samples from uniform distributions over hyperspheres of different dimensions) because I have to support `backward` for it. If we don't need `backward`, then the onion time is 8.22s, which is faster than C-vine. Anyway, to generate small-dimension (e.g. 5 x 5) matrices, onion is faster than C-vine, so I set "onion" as the default method.

It is also worth seeing whether PyTorch's JIT provides any advantage. Note that this is not a fair comparison anyway, because my code runs on a modern system while Lewandowski et al.'s numbers are from 2009.
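For reference, the C-vine construction from the paper can be sketched in plain NumPy (no `backward` support; the function name and argument layout here are my own, not this PR's API):

```python
import numpy as np

def lkj_cvine(d, eta, rng):
    """Sample one d x d correlation matrix from LKJ(eta) via the C-vine method.

    Partial correlations on vine tree k are drawn from a rescaled
    Beta(b, b) with b = eta + (d - 2 - k) / 2, then converted to
    ordinary correlations by the recursive vine formula.
    """
    P = np.zeros((d, d))  # partial correlations
    R = np.eye(d)
    for k in range(d - 1):
        b = eta + (d - 2 - k) / 2
        for i in range(k + 1, d):
            P[k, i] = 2 * rng.beta(b, b) - 1  # partial corr given vars 0..k-1
            r = P[k, i]
            for l in range(k - 1, -1, -1):  # peel off the conditioning set
                r = r * np.sqrt((1 - P[l, i] ** 2) * (1 - P[l, k] ** 2)) \
                    + P[l, i] * P[l, k]
            R[k, i] = R[i, k] = r
    return R
```

As a sanity check, this construction reproduces the marginal-Beta fact used in the tests: for d = 3 and eta = 1, each off-diagonal entry should have mean 0 and variance 0.25.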
Tests
- autograd
- The `rsample` method is tested based on the fact that the marginal distribution of each off-diagonal element of sampled correlation matrices is a Beta distribution (modulo the linear transform x -> 2x - 1).
- The `log_prob` method is tested using various facts from the Stan reference manual / Lewandowski paper.

What is not available in this PR
Some discussions
The C-vine/onion methods require D*(D-1)/2 parameters, which grows quadratically in D. In problems where D is large, I would recommend users use the LowRankMultivariateNormal distribution (whose covariance has the form W @ W.T + D). That distribution requires only D * (rank + 1) parameters. In addition, inference using the low-rank mvn distribution has time complexity O(D * rank^2) (thanks to the Woodbury formula).
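The O(D * rank^2) complexity comes from the Woodbury identity, which reduces solves against a low-rank-plus-diagonal covariance to dense algebra on an r x r matrix. A minimal NumPy sketch (the function name is mine, not PyTorch's API):

```python
import numpy as np

def woodbury_solve(W, diag, b):
    """Solve (W @ W.T + np.diag(diag)) x = b using only r x r dense algebra.

    W: (D, r) low-rank factor; diag: (D,) positive diagonal; b: (D,) rhs.
    Cost is O(D * r**2) instead of O(D**3) for a direct solve.
    """
    Dinv_b = b / diag
    Dinv_W = W / diag[:, None]
    # capacitance matrix C = I_r + W.T @ D^{-1} @ W is only r x r
    C = np.eye(W.shape[1]) + W.T @ Dinv_W
    return Dinv_b - Dinv_W @ np.linalg.solve(C, W.T @ Dinv_b)
```

This is essentially what makes LowRankMultivariateNormal cheap at inference time compared to a dense D x D covariance.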
I am happy to let #1746 be finished first, then gradually merge the advantageous parts (if any) from this PR later if necessary. On the other hand, I am open to any suggestions to improve this PR. This has been a great learning journey indeed.