Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add multiplier keyword to unit vector #3105

Open
spinkney opened this issue Sep 20, 2024 · 3 comments
Open

add multiplier keyword to unit vector #3105

spinkney opened this issue Sep 20, 2024 · 3 comments
Assignees

Comments

@spinkney
Copy link
Collaborator

spinkney commented Sep 20, 2024

In https://discourse.mc-stan.org/t/a-better-unit-vector/26989/30 Seth Axen lays out the reasoning for adding an additional parameter which repels values away from 0.

The current implementation is equivalent to

data {
  int<lower=0> N;
}
parameters {
  vector[N] u_raw;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[N] u = u_raw / sqrt(r);
}
model {
  target += -0.5 * r;
}

The proposal is to parameterize the unit vector as

data {
  int<lower=0> N;
}
parameters {
  vector[N] u_raw;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[N] u = u_raw / sqrt(r);
}
model {
 target += 0.5 * (a * log(r) - r);
}

where a >= 0 and is given as a multiplier keyword. As Seth notes in the post, it corresponds to the chi distribution with a + 1 degrees of freedom. The user would see:

unit_vector<multiplier=a>[N] u;

Setting a = 0 is equivalent to what is currently in Stan and will still be the default when the multiplier keyword is not used.

How to select a?

From the same thread the issue manifests with smaller dimension sizes of the unit vector. Heuristically, I'm finding that setting a = N^(6/N) where N is the dimension of the unit-vector works well.

         N a
 [1,]    1 1.000000
 [2,]    2 8.000000
 [3,]    3 9.000000
 [4,]    4 8.000000
 [5,]    5 6.898648
 [6,]    6 6.000000
 [7,]    7 5.301146
 [8,]    8 4.756828
 [9,]    9 4.326749
[10,]   10 3.981072

Here's a test model for 2d unit vectors where divergences occur in the current parameterization. When the data size, M, is small there are fewer divergences then when it's larger (M > 50).

data {
  int<lower=0> M;
  vector<lower=-pi(), upper=pi()>[M] y;
  real a;
}
parameters {
  vector[2] u_raw;
  real<lower=0> kappa;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[2] u = u_raw / sqrt(r);
  real mu = atan2(u[2], u[1]);
}
model {
  target += 0.5 * (a * log(r) - r);
  kappa ~ exponential(1);
  y ~ von_mises(mu, kappa);
}

The data for this can be created as

M <- 100
y <- runif(M, min = -pi/4, max = pi/4)
true_mean <- 0

mod_unit <- cmdstan_model("unit.stan")

mod_unit_out <- mod_unit$sample(
  data = list(M = M,
              y = y,
              a = 2^(6/2),
  seed = 2309423,
  parallel_chains = 4
)
@WardBrian
Copy link
Member

If I'm reading this correctly, I think using the keyword multiplier for this would be a mistake. I think that would be interpreted by most users as some sort of scaling on vector itself, such that e.g unit_vector<multiplier=4> would yield a vector that has norm 4

@SteveBronder
Copy link
Collaborator

SteveBronder commented Sep 20, 2024

From the comment

This repels y from y = 0, and the degree of repulsion can be tuned by the user by increasing a.

unit_vector<repulsion=...>?

From the code, a is added to the jacobian.
target += -r^2/2 + a * log(r);

Since it's not applied to the parameters idt, bias, scale, or multiplier fit right with the rest of the constrain keywords.

@spinkney
Copy link
Collaborator Author

I'm ok with it just being something simple like $k$ as in unit_vector<k=...> we note that this is related to the chisquare distribution degrees of freedom. The log-pdf of the chisquare distribution is

$$ \left(\frac{k}{2} - 1 \right) \log(x) - \frac{x}{2} + C $$

What I have above is 0.5 * (a * log(r) - r). Clearly, $r == x$ thus

$$ \begin{aligned} \frac{a}{2} &= \frac{k}{2} - 1 \\ k &= a + 2. \end{aligned} $$

When a = 0 this corresponds to 2 degrees of freedom, ie the sum of the square of two standard normal variates. When k is allowed to be non-integer values then this is a gamma RV with

$$ f(x) = \Gamma \left( \frac{k}{2}, \frac{1}{2} \right) $$

The mean of this is $\mu = k / 4$ and the variance is $\sigma^2 = k / 8$. As $k$ increases both the mean and the variance increase. But this doesn't effect the estimate of the unit_vector (unless sampling just fails because we're at the singularity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants