diff --git a/misc/nips.md b/misc/nips.md deleted file mode 100644 index 630ea5f0..00000000 --- a/misc/nips.md +++ /dev/null @@ -1,139 +0,0 @@ ---- -layout: page -title: Stan@NIPS2015 -excerpt: "" -modified: -image: - feature: feature/wide_ensemble.png - credit: - creditlink: ---- - -Stan will be well-represented at NIPS2015. Come see a talk -or let us know about any cool projects involving Stan and you -might even score a free sticker! - -Stan 2.9.0 is out! [(Release Notes)](https://github.com/stan-dev/stan/releases/tag/v2.9.0) -See the installation [instructions below](#installation). - -Conference -====== - -Mon Dec 7th 07:00 - 11:59 PM: -**Dustin Tran** will present a poster (#37) on new -variational inference methods in **Room 210C**. - -Tue Dec 8th 03:30 - 04:00 PM: -**Alp Kucukelbir** will speak about _Automatic Variational -Inference in Stan_ in **Room 210A**. - -Tue Dec 8th 07:00 - 11:59 PM: -**Alp** will also present a corresponding poster (#34) in -**Room 210C**. - - -Workshops -====== - -[Workshop on Adaptive Data Analysis](http://wadapt.org) | -Friday 11 December - -04:30 - 5:00 PM: -**Andrew Gelman** will speak about reproducibility. - -[Advances in Approximate Bayesian Inference](http://approximateinference.org) | -Friday 11 December - -09:15 - 10:00 AM: -**Alp** and **Michael Betancourt** will be on the _Tricks of the Trade_ panel. - -04:55 - 06:00 PM: -**Andrew** will be on the _Foundations and Future of Approximate Inference_ panel. - -[Scalable Monte Carlo](http://babaks.github.io/ScalableMonteCarlo/) | -Saturday 12 December - -09:00 - 09:40 AM: -**Andrew** will speak about his _Adventures on the Efficient Frontier_. - -2:40 - 3:30 PM: -**Michael** will be on the panel. - -[Black Box Learning and Inference](http://www.blackboxworkshop.org) | -Saturday 12 December - -10:50 - 11:35 AM: -**Michael** will speak about Stan during the language spotlight session. - -11:35 AM - 12:20 PM: -**Michael** will be on the languages panel. 
- -2:52 - 3:15 PM: -**Alp** will speak about automatic differentiation variational inference. - -3:15 - 3:37 PM: -**Dustin** will speak about recent advances in variational inference. - - -Installation -============ - -#### CmdStan - -CmdStan 2.9.0 is released and ready to -[download](http://mc-stan.org/interfaces/cmdstan.html). - -ADVI data subsampling lives in an unsupported/untested branch. -See instructions [here](https://github.com/stan-dev/stan/blob/adsvi/how_to_ADSVI.md). - -#### RStan - -The `develop` branch of RStan is up to date with Stan 2.9.0. - -To install RStan 2.8.2, first follow the instructions -[here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). - -Then use the following commands to start using the `develop` branch: - - if(!require(devtools)) install.packages("devtools") - if(!require(git2r)) install.packages("git2r") - - path_rstan <- tempfile(pattern = "git2r-") - dir.create(path_rstan) # requires recent version of R; may work without this line - git2r::clone("http://github.com/stan-dev/rstan", path_rstan, branch = "develop") - git2r::clone("http://github.com/stan-dev/stan", - file.path(path_rstan, "StanHeaders", "inst", "include", "upstream"), - branch = "master") - git2r::clone("http://github.com/stan-dev/math", - file.path(path_rstan, "StanHeaders", "inst", "include", "mathlib"), - branch = "master") - devtools::install(file.path(path_rstan, "StanHeaders")) - devtools::install(file.path(path_rstan, "rstan", "rstan")) - - library(rstan) - args(vb) - - devtools::install_github("stan-dev/rstanarm", local = FALSE) - library(rstanarm) - args(stan_glmer) - - -#### PyStan - -_Stay tuned!_ - - -#### Stan.jl, MatlabStan, and StataStan - -See the [interfaces](interfaces) page for more details! - - - - - - - - - - - diff --git a/misc/warnings.html b/misc/warnings.html deleted file mode 100644 index d6f16a15..00000000 --- a/misc/warnings.html +++ /dev/null @@ -1,575 +0,0 @@ - - - - -
- - - - - - - - - - -Stan emits a lot of messages, some of which are warnings. In this document we go over why this happens, explain some of the most common warnings, and provide some tips about how to proceed depending on the particular warnings you’re seeing.
-While reading this document it is important to keep in mind that there is no statistical algorithm that is guaranteed to get the right results on all models. Markov chain Monte Carlo is no exception. Although it is often advertised as being “unbiased”, this is not a guarantee unless the Markov chains are infinitely long. Whether or not MCMC yields good values in finite time is a much more challenging problem.
-Preruntime warnings occur before any estimation is performed, although if you call stan()
these preruntime warnings may only appear slightly before the estimation.
Parser warnings occur when you call stanc()
directly, or indirectly by calling stan()
with either its file
or model_code
argument specified. Parser warnings take two forms:
Deprecation warnings may look something like
---DIAGNOSTIC(S) FROM PARSER: Warning (non-fatal): increment_log_prob(…); is deprecated and will be removed in the future. Use target += …; instead. Warning: Deprecated function ‘normal_log’; please replace suffix ’_log’ with ’_lpdf’ for density functions or ’_lpmf’ for mass functions
-
These warnings indicate that, although your Stan syntax is valid, some construction(s) you are using will be removed in a future version of Stan. Thus, you should update your Stan program to the suggested syntax in order to better ensure that your Stan program will continue to run in the future.
-Jacobian warnings may look something like
---DIAGNOSTIC(S) FROM PARSER: Warning (non-fatal): Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable. If so, you need to call target += with the log absolute determinant of the Jacobian of the transform. Left-hand-side of sampling statement: sqrt(y) ~ gamma(…)
-
If you fail to heed this warning, the posterior distribution Stan will sample from is not necessarily the posterior distribution that you have in mind. The only situation in which you can ignore this warning is when you are sure that the determinant of the Jacobian matrix of the transformation depends only on constants. For example, target += exponential_lpdf(y - 1 | 1);
does not require a Jacobian adjustment because if \(z = y - 1\), then \(y = z + 1\) and \(\frac{\partial y}{\partial z} = 1\) is a constant.
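To see why the adjustment matters for a non-linear transform, here is a small numerical sketch in base R (it is not Stan code). It reuses the `sqrt(y) ~ gamma(…)` statement from the warning above. The shape and rate values are assumptions chosen only for illustration, since the warning message elides the real arguments. The sketch checks whether the implied density of `y` integrates to one with and without the Jacobian term.

```r
# Illustrative values only; the warning message elides the real arguments.
shape <- 2
rate <- 1

# Density of y implied by `sqrt(y) ~ gamma(shape, rate)` with no Jacobian term.
unadjusted <- function(y) dgamma(sqrt(y), shape, rate)

# Same expression multiplied by |d sqrt(y) / dy| = 1 / (2 * sqrt(y)).
adjusted <- function(y) dgamma(sqrt(y), shape, rate) / (2 * sqrt(y))

area_unadjusted <- integrate(unadjusted, lower = 0, upper = Inf)$value
area_adjusted <- integrate(adjusted, lower = 0, upper = Inf)$value

print(c(unadjusted = area_unadjusted, adjusted = area_adjusted))
```

Only the adjusted version is a proper density for `y`. In a Stan program the corresponding term would be added on the log scale, i.e. `-log(2) - 0.5 * log(y)`, where the constant `-log(2)` can be dropped.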
One category of warnings is essentially useless: compiler warnings. Compiler warnings occur when you call stan_model()
directly or indirectly by calling stan()
with its file
or model_code
argument specified. They may look something like
--In file included from /usr/local/lib/R/site-library/BH/include/boost/multi_array/base.hpp:28:0, from /usr/local/lib/R/site-library/BH/include/boost/multi_array.hpp:21, from /usr/local/lib/R/site-library/BH/include/boost/numeric/odeint/util/multi_array_adaption.hpp:29, from /usr/local/lib/R/site-library/BH/include/boost/numeric/odeint.hpp:61, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/prim/arr/functor/integrate_ode_rk45.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/prim/arr.hpp:36, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/prim/mat.hpp:235, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:9, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file3d155eade6dd.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp: In static member function ‘static void boost::multi_array_concepts::detail::idgen_helper
-::call(Array&, const IdxGen&, Call_Type)’: /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp:42:43: warning: typedef ‘index_range’ locally defined but not used [-Wunused-local-typedefs] typedef typename Array::index_range index_range; ^ /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp:43:37: warning: typedef ‘index’ locally defined but not used [-Wunused-local-typedefs] typedef typename Array::index index; ^ /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp: In static member function ‘static void boost::multi_array_concepts::detail::idgen_helper<0ul>::call(Array&, const IdxGen&, Call_Type)’: /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp:53:43: warning: typedef ‘index_range’ locally defined but not used [-Wunused-local-typedefs] typedef typename Array::index_range index_range; ^ /usr/local/lib/R/site-library/BH/include/boost/multi_array/concept_checks.hpp:54:37: warning: typedef ‘index’ locally defined but not used [-Wunused-local-typedefs] typedef typename Array::index index; ^ In file included from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/operator_unary_plus.hpp:7:0, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:34, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file3d155eade6dd.cpp:8: /usr/local/lib/R/site-library/StanHeaders/include/stan/math/prim/scal/fun/constants.hpp: At global scope: /usr/local/lib/R/site-library/StanHeaders/include/stan/math/prim/scal/fun/constants.hpp:65:18: warning: ‘stan::math::NEGATIVE_EPSILON’ defined but not used [-Wunused-variable] const double NEGATIVE_EPSILON ^ In file included from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:42:0, 
from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file3d155eade6dd.cpp:8: /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/set_zero_all_adjoints.hpp:14:17: warning: ‘void stan::math::set_zero_all_adjoints()’ defined but not used [-Wunused-function] static void set_zero_all_adjoints() {
These simply say there is some part of the Stan library that is being compiled but not used; it has nothing to do with your model specifically. You can safely ignore these compiler warnings but they can also be suppressed by editing your Makevars file.
-Open the file whose path is the output of
-normalizePath("~/.R/Makevars")
-and add (or add on to) a line that starts with CXXFLAGS =
. Copy the part(s) inside the square brackets from the compiler warning message and paste them onto the line that starts with CXXFLAGS =
but add the string no-
after the starting string -W
. For example, in the case of the compiler warnings in the messages above, the line that starts with CXXFLAGS =
would become
CXXFLAGS = ... -Wno-unused-variable -Wno-unused-function -Wno-unused-local-typedefs
-where ...
stands for what was already there, such as -O3
.
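The flag rewriting described above can also be done programmatically. The following is a minimal sketch in base R. The helper name `suppression_flags` and the shortened example messages are hypothetical; only the three flag names come from the compiler output shown earlier.

```r
# Hypothetical helper: collect the bracketed -W flags from compiler warning
# text and turn each one into the matching -Wno-... suppression flag.
suppression_flags <- function(warning_lines) {
  hits <- regmatches(warning_lines, gregexpr("\\[-W[^]]+\\]", warning_lines))
  flags <- unique(unlist(hits))
  flags <- gsub("^\\[|\\]$", "", flags)  # drop the square brackets
  sub("^-W", "-Wno-", flags)             # insert "no-" right after "-W"
}

# Shortened stand-ins for the messages shown above.
example_warnings <- c(
  "warning: typedef 'index_range' locally defined but not used [-Wunused-local-typedefs]",
  "warning: typedef 'index' locally defined but not used [-Wunused-local-typedefs]",
  "warning: 'stan::math::NEGATIVE_EPSILON' defined but not used [-Wunused-variable]",
  "warning: 'void stan::math::set_zero_all_adjoints()' defined but not used [-Wunused-function]"
)

flags <- suppression_flags(example_warnings)
# "-O3" stands for whatever was already on the CXXFLAGS line.
makevars_line <- paste("CXXFLAGS =", "-O3", paste(flags, collapse = " "))
print(makevars_line)
```

The helper only builds the line. Writing it into the Makevars file is left as a manual step so that existing settings are not overwritten by accident.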
Runtime warnings occur during the estimation. Stan throws a lot of runtime warnings because they may be important. This does not mean that every time you see a warning you can’t trust your estimates, but when you see warnings it does mean that you shouldn’t blindly trust your estimates without first understanding what the warnings mean and which ones require action on your part.
-You might not be used to seeing so many warnings from other software you use, but that does not mean that Stan has more problems than that other software. The Stan Development Team places a high priority on notifying users about any issue that could potentially be important and Stan will always tell you about problems it encounters instead of hiding them from you. Warning messages do not indicate that something is wrong with Stan but rather that Stan is doing its job and warning you when it finds problems (or possible problems) with what it was told to do. Furthermore, a huge advantage of the algorithms used by Stan is that they permit certain unique diagnostics (e.g. the divergences discussed below) that are unavailable when using other algorithms. This leads to more warnings from Stan, but we cannot emphasize enough that this is a feature rather than a drawback.
-Example:
-1: There were 15 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
-2: Examine the pairs() plot to diagnose sampling problems
-Stan uses Hamiltonian Monte Carlo (HMC) to explore the target distribution — the posterior defined by a Stan program + data — by simulating the evolution of a Hamiltonian system. In order to approximate the exact solution of the Hamiltonian dynamics we need to choose a step size governing how far we move each time we evolve the system forward. That is, the step size controls the resolution of the sampler.
-Unfortunately, for particularly hard problems there are features of the target distribution that are too small for this resolution. Consequently the sampler misses those features and returns biased estimates. Fortunately, this mismatch of scales manifests as divergences which provide a practical diagnostic.
-For some intuition, imagine walking down a steep mountain. If you take too big of a step you will fall, but if you can take very tiny steps you might be able to make your way down the mountain, albeit very slowly. Similarly, we can tell Stan to take smaller steps around the posterior distribution, which (in some but not all cases) can help avoid these divergences.
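The effect of the step size can be illustrated outside of Stan. The sketch below is a toy leapfrog integrator in base R for a one-dimensional standard normal target. It is not Stan's implementation, and the function name, starting point, and step sizes are assumptions for illustration. It reports how far the simulated energy drifts from its initial value; a very large drift is the kind of numerical failure that Stan flags as a divergence.

```r
# Toy leapfrog integrator for the potential U(q) = q^2 / 2 (standard normal).
# Returns the absolute change in total energy after n_steps steps.
leapfrog_energy_error <- function(step_size, n_steps = 50, q = 1, p = 0) {
  energy_start <- 0.5 * q^2 + 0.5 * p^2
  for (i in seq_len(n_steps)) {
    p <- p - 0.5 * step_size * q  # half step for momentum (gradient of U is q)
    q <- q + step_size * p        # full step for position
    p <- p - 0.5 * step_size * q  # second half step for momentum
  }
  abs(0.5 * q^2 + 0.5 * p^2 - energy_start)
}

small_step_error <- leapfrog_energy_error(0.1)
large_step_error <- leapfrog_energy_error(2.1)
print(c(small = small_step_error, large = large_step_error))
```

With the small step the energy error stays tiny, while the large step is past the stability limit of this toy problem and the error explodes. Real posteriors have regions of varying curvature, so a step size that is fine in most places can still be too large in a narrow region.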
-As the warning message says, you should call pairs()
on the resulting object. Red points indicate divergent transitions. In our experience, divergent transitions that occur above the diagonal of the pairs()
plot — meaning that the amount of numerical error was above the median over the iterations — can often be eliminated simply by increasing the value of the adapt_delta
parameter (see below for example code). This is the target average proposal acceptance probability during Stan’s adaptation period, and increasing it will force Stan to take smaller steps. The downside is that sampling will tend to be slower because a smaller step size means that more steps are required. Since the validity of the estimates is not guaranteed if there are post-warmup divergences, the slower sampling is a minor cost.
If you have many unknowns in your Stan program, then the pairs()
plot may be illegible. There is a pars
argument to the pairs()
function that allows you to specify a subset of parameters. Also, there is an include
argument that can be set to FALSE
that will result in the complement of the pars
argument being included in the pairs()
plot. You should definitely include variance (or standard deviation, etc.) parameters, since they illustrate the funnel phenomenon most often. In addition, you should include at least some lower-level parameters in hierarchical models.
Conversely, divergent transitions that occur below the diagonal of the pairs()
plot — meaning that the amount of numerical error was below the median over the iterations — often cannot be eliminated simply by increasing the value of the adapt_delta
parameter, although it does not hurt to try and it might actually work if the number of divergent transitions is small. If the divergent transitions cannot be eliminated by increasing the adapt_delta
parameter, we have to find a different way to write the model that is logically equivalent but simplifies the geometry of the posterior distribution. This problem occurs frequently with hierarchical models and one of the simplest examples is Neal’s Funnel, which is discussed in the Optimizing Stan Code chapter of the Stan manual and can be reproduced by calling
library(rstan)
-funnel <- stan_demo("funnel", seed = 12345) # has 5 divergent transitions
-pairs(funnel, pars = c("y", "x[1]", "lp__"), las = 1) # below the diagonal
-funnel_reparam <- stan_demo("funnel_reparam") # has no divergent transitions
-Recommendations:
-adapt_delta
. In RStan, adapt_delta
is one of the parameters that you can include in the optional control
list passed to the stan
or sampling
functions. For example, to set adapt_delta
to 0.99 (the default is 0.8) you would do this: stan(..., control = list(adapt_delta = 0.99))
-Example:
-Exception thrown at line 24: normal_log: Scale parameter is 0, but must be positive!
-This warning indicates that the standard deviation (scale parameter) of the normal distribution (at line 24 in the Stan program) is 0, but it must be positive for Stan to compute the value of the density function. It is not uncommon to get this type of warning if your model has bounded parameters. A message like this does not necessarily indicate that something is wrong with your model, as it’s possible that even in a correctly specified and well-behaved model the value of the scale parameter can be 0 (or numerically indistinguishable from 0) occasionally during sampling.
-However, this warning should not be ignored if it occurs many times. Informally, if the number of times this happens is large then something in the model or data (or the combination of the model and data) is consistently forcing the constrained parameter to its boundary and this should be investigated.
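A quantity can become "numerically indistinguishable from 0" purely through floating-point underflow. The base R sketch below shows this, along with the kind of rewrite that avoids it. The function `log_sum_exp_r` is a hypothetical R re-implementation written for this example, not Stan's own `log_sum_exp`, and the input values are made up.

```r
# A positive quantity computed as exp(something very negative) underflows to 0.
underflowed <- exp(-800)

# Naive log-sum-exp: both exp() calls underflow, so the result is -Inf.
x <- c(-1000, -1001)
naive <- log(sum(exp(x)))

# Stable version: subtract the maximum before exponentiating.
log_sum_exp_r <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
stable <- log_sum_exp_r(x)

print(c(underflowed = underflowed, naive = naive, stable = stable))
```

The stable version returns a finite value close to -999.69, whereas the naive one loses all information.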
-Recommendations:
-sigma
is declared with a <lower=0>
constraint. log_sum_exp(x)
is more robust numerically than log(sum(exp(x)))
. Other examples like this include log1m(x)
instead of log(1-x)
, log1p_exp(x)
instead of log(1 + exp(x))
, and many others. See the Composed Functions subsection of the Real-Valued Basic Functions chapter of the Stan manual for a complete listing.
Warnings about hitting the maximum treedepth are not as serious as warnings about divergent transitions. While divergent transitions are a validity concern, hitting the maximum treedepth is an efficiency concern. Configuring the No-U-Turn Sampler (NUTS, the variant of HMC used by Stan) involves putting a cap on the depth of the trees that it evaluates during each iteration (for details on this see the Hamiltonian Monte Carlo Sampling chapter in the Stan manual). This is controlled through a maximum depth parameter max_treedepth
. When the maximum allowed tree depth is reached it indicates that NUTS is terminating prematurely to avoid excessively long execution time.
Transitions that hit the maximum treedepth are plotted in yellow in the pairs()
plot.
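To get a feel for the cost of raising the cap, the arithmetic can be sketched in base R. The assumption here is that a trajectory built as a full binary tree of depth d needs at most 2^d - 1 leapfrog steps. The helper name is hypothetical.

```r
# Upper bound on leapfrog steps per iteration, assuming a full binary tree.
max_leapfrog_steps <- function(max_treedepth) 2^max_treedepth - 1

steps_default <- max_leapfrog_steps(10)  # cap at the default max_treedepth
steps_raised <- max_leapfrog_steps(15)   # cap after raising max_treedepth to 15
print(c(default = steps_default, raised = steps_raised))
```

Each extra level roughly doubles the worst-case work per iteration, which is why hitting the cap is treated as an efficiency concern.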
Recommendations:
-max_treedepth
is one of the parameters that you can include in the optional control
list passed to the stan
or sampling
functions. For example, to set max_treedepth
to 15 (the default is 10) you would do this: stan(..., control = list(max_treedepth = 15))
-You may see a warning that says some number of chains had an estimated Bayesian Fraction of Missing Information (BFMI) that was too low. This implies that the adaptation phase of the Markov chains did not go well and those chains likely did not explore the posterior distribution efficiently. For more details on this diagnostic, see https://arxiv.org/abs/1604.00695.
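As a rough sketch, the energy-based estimate from the linked paper can be computed for one chain in base R as the sum of squared changes in energy between successive iterations divided by the sum of squared deviations of the energy from its mean. The function name `bfmi` and the simulated energy series are assumptions for illustration. With RStan, the per-iteration energies would come from the `energy__` column of the sampler parameters.

```r
# Energy-based BFMI estimate for a single chain's post-warmup energies.
bfmi <- function(energy) {
  sum(diff(energy)^2) / sum((energy - mean(energy))^2)
}

set.seed(1)
# Simulated stand-ins for energy__ values; these are not real Stan output.
energy_well_mixed <- rnorm(1000)         # energy changes freely between draws
energy_sticky <- cumsum(rnorm(1000))     # energy drifts slowly, like a random walk

bfmi_well_mixed <- bfmi(energy_well_mixed)
bfmi_sticky <- bfmi(energy_sticky)
print(c(well_mixed = bfmi_well_mixed, sticky = bfmi_sticky))
```

A small value means that the energy changes little from one iteration to the next compared with its overall spread. A cutoff of about 0.2 is a common rule of thumb, though this text does not state the threshold Stan uses.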
-Recommendations:
-pairs
plot to see which primitive parameters are correlated with the energy__
margin. There should be a negative relationship between lp__
and energy__
in the pairs
plot, which is not a concern because lp__
is the logarithm of the posterior kernel rather than a primitive parameter.
energy__
margin in the pairs
plot are a good place to start thinking about reparameterizations.
iter
and / or warmup
arguments. By default warmup
is half of iter
and iter
is \(2000\).
The R-hat convergence diagnostic compares the between- and within-chain estimates for model parameters and other univariate quantities of interest. If chains have not mixed well (i.e., the between- and within-chain estimates don’t agree), R-hat is larger than 1. We recommend running at least four chains by default and only using the sample if R-hat is less than 1.01. The R-hat that Stan reports is the maximum of rank-normalized split-R-hat and rank-normalized folded-split-R-hat, which works for thick-tailed distributions and is also sensitive to differences in scale. For more details on this diagnostic, see https://arxiv.org/abs/1903.08008.
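The between- versus within-chain comparison can be sketched in base R. The function below computes a plain split-R-hat only. It omits the rank normalization and folding that Stan applies, so it is a simplified illustration rather than the diagnostic Stan reports. The function name and the simulated draws are assumptions.

```r
# Simplified split-R-hat for a matrix of draws (rows: iterations, cols: chains).
split_rhat <- function(draws) {
  n_iter <- nrow(draws)
  half <- n_iter %/% 2
  first <- draws[seq_len(half), , drop = FALSE]
  second <- draws[seq(n_iter - half + 1, n_iter), , drop = FALSE]
  halves <- cbind(first, second)                 # each half is its own chain
  within <- mean(apply(halves, 2, var))          # average within-chain variance
  between <- half * var(colMeans(halves))        # between-chain variance
  var_plus <- (half - 1) / half * within + between / half
  sqrt(var_plus / within)
}

set.seed(1)
# Simulated draws, not Stan output: four chains that agree with each other.
mixed <- matrix(rnorm(4000), ncol = 4)
# Same draws, but chains 3 and 4 are shifted to a different location.
stuck <- mixed + rep(c(0, 0, 3, 3), each = 1000)

rhat_mixed <- split_rhat(mixed)
rhat_stuck <- split_rhat(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

When the chains agree the value is very close to 1. When two chains sit in a different place the between-chain variance inflates the value well above the 1.01 threshold.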
-Recommendations:
* Look at Bulk- and Tail-ESS for further information.
* Look at the rank plot to see how the chains differ from each other.
* Look at the local and quantile efficiency plots.
* You might try setting a higher value for the iter argument. By default iter is \(2000\).
Roughly speaking, the effective sample size (ESS) of a quantity of interest captures how many independent draws contain the same amount of information as the dependent sample obtained by the MCMC algorithm. The higher the ESS, the better. Stan applies an R-hat adjustment so that between-chain information is used when computing the ESS. For example, for multimodal distributions with well-separated modes, this leads to an ESS estimate that is close to the number of distinct modes that are found.
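The idea that dependent draws carry less information can be sketched in base R with a basic autocorrelation-based estimate for a single chain. This is a simplification: it truncates the autocorrelation sum at the first negative value and ignores the between-chain adjustment and rank normalization that Stan uses. The function name and the simulated series are assumptions.

```r
# Basic single-chain ESS: N / (1 + 2 * sum of leading positive autocorrelations).
ess_basic <- function(x) {
  rho <- acf(x, lag.max = length(x) - 1, plot = FALSE)$acf[-1]  # drop lag 0
  cutoff <- which(rho < 0)[1]
  if (!is.na(cutoff)) rho <- rho[seq_len(cutoff - 1)]
  length(x) / (1 + 2 * sum(rho))
}

set.seed(1)
# Simulated series, not Stan output.
independent <- rnorm(2000)
autocorrelated <- as.numeric(arima.sim(list(ar = 0.9), n = 2000))

ess_independent <- ess_basic(independent)
ess_autocorrelated <- ess_basic(autocorrelated)
print(c(independent = ess_independent, autocorrelated = ess_autocorrelated))
```

Both series contain 2000 draws, but the strongly autocorrelated one is worth only a small fraction of that many independent draws.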
-Bulk-ESS refers to the effective sample size based on the rank normalized draws. This does not directly compute the ESS relevant for computing the mean of the parameter, but instead computes a quantity that is well defined even if the chains do not have finite mean or variance. Overall, bulk-ESS estimates the sampling efficiency for the location of the distribution (e.g. mean and median).
-Often a much smaller ESS would be sufficient for the desired estimation accuracy, but the estimation of ESS and convergence diagnostics themselves require higher ESS. We recommend requiring that the bulk-ESS is greater than 100 times the number of chains. For example, when running four chains, this corresponds to having a rank-normalized effective sample size of at least 400.
-Recommendations:
* You might try setting a higher value for the iter argument. By default iter is \(2000\).
* Look at the rank plot to see how the chains differ from each other.
* Look at the local and quantile efficiency plots.
* Look at the change in bulk-ESS when the number of iterations increases. If R-hat is less than 1.01 and bulk-ESS grows linearly with the number of iterations and eventually exceeds the recommended limit, the mixing is sufficient but MCMC has high autocorrelation, requiring a large number of iterations.
Tail-ESS computes the minimum of the effective sample sizes (ESS) of the 5% and 95% quantiles. Tail-ESS can help diagnose problems due to different scales of the chains and slow mixing in the tails. See also the general information about ESS in the description of bulk-ESS above.
-Recommendations:
* You might try setting a higher value for the iter argument. By default iter is \(2000\).
* Look at the rank plot to see how the chains differ from each other.
* Look at the local and quantile efficiency plots.
* Look at the change in tail-ESS when the number of iterations increases. If R-hat is less than 1.01 and tail-ESS grows linearly with the number of iterations and eventually exceeds the recommended limit, the mixing is sufficient but MCMC has high autocorrelation, requiring a large number of iterations.
The best place to get help from Stan developers and users if you have difficulties fitting a model is the Stan Forums.
-In order to both reduce the amount of help you need and allow us to give the best help when you do need it, it is essential that you follow these recommendations:
-Always put Stan programs in a stand-alone file with a .stan
extension. Even though some Stan interfaces allow specifying the model as a string, the line numbers in warning and error messages are only meaningful if you use a separate file.
Maintain reproducibility by saving the model and initial values in files and the RStan (or other Stan interface) commands in scripts.
Use version control (e.g. git) on your files and scripts so that you have a history of the changes you’ve made.
Start simple! Build your model in stages, and check for good fits at each stage, only adding complexity if there are no red flags. If you start by writing a very complicated model it will be much more difficult to figure out where things are going wrong.
Keep an eye on the diagnostics, in particular divergences and Rhat values. For RStan users, you will be warned about divergences and you can view Rhats using the print
or summary
methods for stanfit objects. All important diagnostics can also be found in our ShinyStan GUI.