Introduction to Stan for New Developers
Welcome to Stan! We're excited that you're interested in contributing to the project. Before you're able to contribute, there are some processes and other information that are good to know.
The Stan project is hosted on GitHub, so you will have to create a GitHub account if you do not already have one. Developer discussions are hosted on Discourse, so you will also need an account there in order to ask questions or participate in discussions.
Most of the following discussion is aimed at people who want to contribute C++ code to Stan. But there are many other ways to contribute that don't involve C++!
We're committed to having a permissive open-source license. The stan-dev/math, stan-dev/stan, and stan-dev/cmdstan libraries are licensed under the BSD 3-Clause License, and we only accept changes to the code base that are compatible with this license.
When you write code, you own the copyright to your code unless you've assigned it to another entity, such as your employer. Contributions to Stan leave copyright ownership with the contributor or their assignee. Per the GitHub terms of service (see the section User-Generated Content), any code contributed to Stan through GitHub is released under the same license as the repository to which it is contributed.
We reserve issues for bugs and feature requests that are defined well enough for a developer to tackle. If you have general questions about the Math library, please see the Discussion section.
Ideally, bug reports will include these pieces of information:
- a description of the problem
- a reproducible example
- the expected outcome if the bug were fixed.
If there's an error and you can produce any of these pieces, it's much appreciated!
We track the development of new features using the same issue tracker. Ideally, feature requests will have:
- a description of the feature
- an example
- the expected outcome if the feature existed.
Open feature requests should be the ones we want to implement in the Math library. We'll try to close vague feature requests that don't have enough information and move the discussion to the forums.
All changes to the Math library are handled through pull requests. Each pull request should correspond to an issue. We follow a modified GitFlow branching model for development.
When a contributor creates a pull request for inclusion to the Math library, here are some of the things we expect:
- the contribution maintains the Math library's open-source license: 3-clause BSD
- the code base remains stable after merging the pull request; we expect the develop branch to always be in a good state
- the changes are maintainable. In code review, we look at the design of the proposed code. We also expect documentation.
- the changes are tested. For bugs, we expect at least one test that fails before the patch and is fixed after the patch. For new features, we expect at least one test that shows expected behavior and one test that shows the behavior when there's an error.
- the changes adhere to the Math library's C++ standards. Consistency really helps.
Pull requests are code reviewed after they pass our continuous integration tests. We expect all the above before a pull request is merged. We are an open-source project and once code makes it into the repository, it's on the community to maintain.
See the Code Review Guidelines on the Math wiki.
For general questions, please ask on the forums with the "Developers" tag.
Development of the math library, the language and algorithms, and the interfaces is arranged into the following repositories, with arrows indicating submodule inclusions.
math <- stan <- pystan
             <- rstan <- rstanarm
             <- cmdstan <- statastan
                        <- matlabstan
                        <- stan.jl
                        <- MathematicaStan
<- stanc3
Currently, the stan repo includes the algorithms and the service API for the interfaces. There are additional repos for tools such as the emacs mode, R plotting, R Shiny interface, web pages, etc. The stanc3 repo hosts the language compiler. Some of the modules include others as submodules if there's a code dependency. Each of the repos also has its own wiki! Don't forget to check that wiki homepage and search it for information that might be related to that subproject.
People use Stan in many contexts. When we're deciding how to add, remove, or modify Stan code, we need to understand what our goals are. This typically involves three steps: some discussion where we try to elicit concrete use cases for the feature; a GitHub issue in the appropriate repo with something resembling a spec that the reviewer can use to evaluate an associated pull request; and then that pull request. These three artifacts exist in different locations, so at the top of each one there should be a link to the others and an attempt to summarize the results of previous steps in the workflow. To summarize:
- Bring up your proposed feature for discussion on our forums. If you're trying to find a place to help out, you can skip this and find an existing issue on the appropriate GitHub repo.
- Summarize the discussion and write something approaching a high-level spec in a GitHub issue.
- Create a pull request with an attempt to address a GitHub issue.
You can read more about the developer process here.
We have adopted the GitFlow process for incorporating new contributions into Stan. If you are not yet familiar with Git, we recommend that you check out some of the many great Git tutorials freely available online. Once you are comfortable with Git itself, you can read about our particular implementation of GitFlow here and here.
All new contributions are also tested with our continuous integration framework.
Every developer has their own local development setup, but we have compiled various helpful tricks that you might find useful.
In order to ensure that we can quickly read and understand contributions, consistent style is incredibly important. We have adopted conventions for code quality and code style to which all contributions must conform. You can read more on these links, but we use an automated formatter for many of our conventions.
There is a list of supported compilers and language features here.
The robustness of Stan is only as good as our test coverage, and we require that all new contributions are adequately tested. We use the GoogleTest framework for writing tests and GnuMake and Python for running those tests.
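As a rough illustration of what "adequately tested" can look like for a new function, the sketch below pairs one check of expected behavior with one check of error behavior. It uses only the standard library so that it stands alone; `checked_log` and the two check functions are hypothetical names invented for this example and are not part of Stan.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

namespace example {

// Hypothetical function under test; it is NOT part of Stan.
// Natural log that throws on negative input instead of returning NaN.
inline double checked_log(double x) {
  if (x < 0.0) {
    throw std::domain_error("checked_log: x must be non-negative");
  }
  return std::log(x);
}

// Check 1: expected behavior on valid input.
inline bool returns_expected_value() {
  return checked_log(1.0) == 0.0
         && std::fabs(checked_log(std::exp(2.0)) - 2.0) < 1e-12;
}

// Check 2: behavior when there is an error (invalid input must throw).
inline bool throws_on_bad_input() {
  try {
    checked_log(-1.0);
  } catch (const std::domain_error&) {
    return true;
  }
  return false;
}

// Runs both checks; aborts via assert if either fails
// (assert is compiled out when NDEBUG is defined).
inline void run_checks() {
  assert(returns_expected_value());
  assert(throws_on_bad_input());
}

}  // namespace example
```

In the actual test tree, checks like these would be written as GoogleTest cases (for instance with `EXPECT_FLOAT_EQ` and `EXPECT_THROW`) rather than with plain `assert`.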
We have two main sources of documentation - Doxygen doc comments and the Stan manual. You can read more about contributing to the former here. The latter typically has a github issue for each Stan release associated with it on the Stan repo, but we also take pull requests to the .tex files.
There are other forms of documentation listed on the website here.
Much of what you might consider to be the "core" of Stan actually exists in the Math repo. This document applies to that repo, but you can read more about how that repo is organized and any differences here.
The core code in Stan is written in heavily templated C++ to ensure high performance. There are many great C++ tutorials available online, for example cplusplus.org, and once you are familiar with the basics of the language you can tackle the subtleties of templates. We highly recommend Vandevoorde and Josuttis and Alexandrescu.
There are many additional resources available for learning how to optimize C++ code, including Agner Fog's manuscript and the many books of, amongst others, Scott Meyers and Herb Sutter.
Having a comprehensive set of useful densities coded in the Stan math library is a benefit to users. Densities are also a maintenance burden both for testing and for understanding the code base. As a result we are somewhat cautious about including new densities. Guidelines for including densities:
- The pdf, cdf, and rng should be available so users of the Stan language don't need to check the manual.
- There should be a computational benefit to coding the density in C++. Some densities can easily and efficiently be specified in the Stan language and the benefits of coding them in C++ are limited. It helps to provide some evidence of the computational benefits.
- The density should be applicable to a range of problems.
- If the density's C++ code re-implements or improves on functions already present in the math library, the necessary improvements should be coded separately in the math library.
- Ongoing interest from the code author in maintaining the code.
The Stan interfaces wrap the core C++ code and expose its functionality to other languages, such as R and Python. Consequently, contributions to the interfaces may require knowledge of how to couple these languages together, for example with Rcpp and Cython, or may be built entirely in the interface language. For details on a specific interface, please consult the corresponding GitHub repository.
Once you have familiarized yourself with our process take a look at the GitHub issue trackers for the many tasks that need to be tackled! We look forward to hearing from you on Discourse and seeing your pull requests!
This section contains some tips for using developer-oriented tools and setting up a computing environment for development.
Many files covered by our .gitignore file pile up and don't get cleaned. To remove every untracked file, including hidden ones, run:
git clean -d -x -f
Warning: this will delete everything that's not currently being tracked. You probably want to run git status first.
https://github.com/git/git/tree/master/contrib/completion
- git-prompt.sh. This changes the prompt on the command line to show the current branch. Install by copying it to ~/ and following the install instructions. For a cleaner prompt, replace the suggested PS1 with:
PS1='\w$(__git_ps1 " (%s)")> '
The prompt will look like:
~/stan (master)>
where "~/stan" is the current directory and "(master)" indicates that the current branch is the master branch.
- git-completion.*sh. Install this for auto-completion from the command line. It auto-completes git commands and git branches. For example, type git checkout, then hit tab twice. It should show the available branches.
By default, Aquamacs has multiple kill buffers. This means that there is one copy-and-paste buffer for command-c/x/v and a separate copy-and-paste buffer for ctrl-w, alt-w, ctrl-y. This gets really confusing. Here's how to have a single kill buffer so copying from any Mac program will paste into emacs using ctrl-y or command-v.
- Open ~/.emacs (or wherever else you're storing your preferences)
- Add:
(setq x-select-enable-clipboard t)
Ant includes a handy task called FixCRLF that "Adjusts a text file to local conventions." So you can set it to replace tabs with spaces and Windows line ends with unix ones.
To make sure you never have tabs or line-ending spaces in your code files, you can use this in your .emacs file:
;; Replace every tab in the buffer with spaces before the file is written.
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)
(add-hook 'java-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))
(add-hook 'html-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))
;; Emacs's C++ mode runs c++-mode-hook; there is no stock cpp-mode-hook.
(add-hook 'c++-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))
(add-hook 'stan-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))
You can rename the function; the "Java" in its name is a holdover from where I first got the macros.
You can also automatically remove line-final whitespace (this is just for C++, but it could be hooked elsewhere):
(add-hook 'c++-mode-hook
          (lambda () (add-to-list 'write-file-functions 'delete-trailing-whitespace)))
Not just for Windows:
http://nadeausoftware.com/articles/2012/01/c_c_tip_how_use_compiler_predefined_macros_detect_operating_system
Some GCC specifics:
http://stackoverflow.com/questions/259248/how-do-i-test-the-current-version-of-gcc
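The two links above cover compiler-predefined macros. A minimal sketch of how they are typically used is shown here; the function names (`platform_name`, `gcc_version`, `at_least_gcc`) and the namespace are made up for illustration, while the macros themselves (`_WIN32`, `__APPLE__`, `__linux__`, `__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__`) are the standard predefined ones.

```cpp
#include <cassert>

namespace sketch {

// Platform name chosen at compile time from predefined macros.
inline const char* platform_name() {
#if defined(_WIN32)
  return "windows";  // defined for both 32-bit and 64-bit Windows targets
#elif defined(__APPLE__)
  return "apple";    // macOS and other Apple platforms
#elif defined(__linux__)
  return "linux";
#else
  return "unknown";
#endif
}

// GCC-style version as one integer, e.g. 9.2.0 becomes 90200.
// Returns 0 when __GNUC__ is not defined.
inline int gcc_version() {
#if defined(__GNUC__)
  return __GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__;
#else
  return 0;
#endif
}

// True if the reported version is at least major.minor.
inline bool at_least_gcc(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 100);
  return gcc_version() >= major * 10000 + minor * 100;
}

}  // namespace sketch
```

Note that Clang also defines `__GNUC__` for compatibility, so a nonzero result from `gcc_version()` does not by itself prove that the compiler is GCC.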
- Stan project structure
- Stan Users Group
- Discourse
- Developer process
- Contributing a new Stan function
- Stan C++ style guide (includes some developer environment setup)
- Autodiff paper: details the implementation and the math library generally
- Some Bayesian Modeling Techniques in Stan
- Vandevoorde's C++ Templates (http://www.josuttis.com/tmplbook/) and the free PDF here: Vandevoorde's Templates PDF
- Alexandrescu's Modern C++ Design (http://erdani.com/index.php/books/modern-c-design/) and the free PDF here: Modern C++ PDF
- Agner Fog's C++ optimization tips
- On template metaprograms
The Stan Math library, referred to as Math or the Math library, is a C++ library for automatic differentiation. It's designed to be usable, extensive and extensible, efficient, scalable, stable, portable, and redistributable in order to facilitate the construction and use of algorithms that rely on derivatives.
The Math library implements:
- reverse mode automatic differentiation for computing gradients. This is fully tested and is utilized by Stan.
- forward mode automatic differentiation for computing directional derivatives. This does not work for higher-order functions, but is otherwise completely tested.
- mixed mode automatic differentiation for computing higher order derivatives. Once forward mode is fully tested, this should work.
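To make "forward mode ... for computing directional derivatives" concrete, here is a toy sketch based on dual numbers. It is not the Math library's implementation; the `toy_fwd` namespace, the `dual` type, and the example function `f` are all assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cmath>

namespace toy_fwd {

// A value together with its derivative along one chosen direction.
struct dual {
  double val;      // function value
  double tangent;  // directional derivative carried alongside the value
};

// Sum rule.
inline dual operator+(dual a, dual b) {
  return {a.val + b.val, a.tangent + b.tangent};
}

// Product rule.
inline dual operator*(dual a, dual b) {
  return {a.val * b.val, a.tangent * b.val + a.val * b.tangent};
}

// Chain rule for exp.
inline dual exp(dual a) {
  const double e = std::exp(a.val);
  return {e, e * a.tangent};
}

// Example function: f(x, y) = x * y + exp(x).
inline dual f(dual x, dual y) { return x * y + exp(x); }

// Derivative of f at (x, y) along the direction (dx, dy), obtained in a
// single forward pass by seeding the tangents with the direction.
inline double directional_derivative(double x, double y,
                                     double dx, double dy) {
  assert(std::isfinite(x) && std::isfinite(y));
  return f(dual{x, dx}, dual{y, dy}).tangent;
}

}  // namespace toy_fwd
```

Seeding the direction (1, 0) yields the partial derivative with respect to x, and (0, 1) the partial with respect to y, so a full gradient of an N-input function needs N forward passes. That cost is one reason reverse mode is usually preferred for gradients.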
Some key features of the Math library's reverse mode automatic differentiation:
- object oriented design with overloading of operators
- arena-based memory management
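The two features listed above can be sketched in a few dozen lines. The following toy is only an illustration under simplifying assumptions, not the Math library's actual classes: every name in the `toy` namespace is hypothetical, and a `std::vector` stands in for the arena. Overloaded operators record each operation on a tape, and one reverse sweep over the tape yields the gradient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace toy {

// One recorded operation: its parents on the tape and the local partial
// derivative of this node with respect to each parent.
struct node {
  std::size_t parent[2];
  double partial[2];
  int arity;  // 0 for inputs, 1 for unary ops, 2 for binary ops
};

// Stand-in for an arena: nodes are only ever appended to one contiguous
// buffer and are all released together by clear().
class tape {
 public:
  std::size_t push_leaf() { return push(node{{0, 0}, {0.0, 0.0}, 0}); }
  std::size_t push_unary(std::size_t p, double d) {
    return push(node{{p, p}, {d, 0.0}, 1});
  }
  std::size_t push_binary(std::size_t p0, double d0,
                          std::size_t p1, double d1) {
    return push(node{{p0, p1}, {d0, d1}, 2});
  }

  // Reverse sweep: walk from the output back to the inputs, accumulating
  // adjoints. Parents always have smaller indices than their children.
  std::vector<double> gradient(std::size_t output) const {
    assert(output < nodes_.size());
    std::vector<double> adj(nodes_.size(), 0.0);
    adj[output] = 1.0;
    for (std::size_t i = output + 1; i-- > 0;) {
      const node& n = nodes_[i];
      for (int k = 0; k < n.arity; ++k) {
        adj[n.parent[k]] += n.partial[k] * adj[i];
      }
    }
    return adj;
  }

  void clear() { nodes_.clear(); }

 private:
  std::size_t push(const node& n) {
    nodes_.push_back(n);
    return nodes_.size() - 1;
  }
  std::vector<node> nodes_;
};

// A value plus the index of its node on the tape.
struct var {
  tape* t;
  std::size_t id;
  double val;
};

inline var make_var(tape& t, double x) { return var{&t, t.push_leaf(), x}; }

inline var operator+(const var& a, const var& b) {
  assert(a.t == b.t);
  return var{a.t, a.t->push_binary(a.id, 1.0, b.id, 1.0), a.val + b.val};
}

inline var operator*(const var& a, const var& b) {
  assert(a.t == b.t);
  return var{a.t, a.t->push_binary(a.id, b.val, b.id, a.val), a.val * b.val};
}

inline var sin(const var& a) {
  return var{a.t, a.t->push_unary(a.id, std::cos(a.val)), std::sin(a.val)};
}

}  // namespace toy
```

For f(x, y) = x * y + sin(x), building `f` from two `make_var` inputs and calling `t.gradient(f.id)` gives both partial derivatives from a single sweep: the entries at `x.id` and `y.id` hold y + cos(x) and x. Calling `clear()` afterwards frees every node at once, which is the main appeal of arena-style allocation here.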
For implementation details of the Math library's automatic differentiation, please read the arXiv paper "The Stan Math Library: Reverse-Mode Automatic Differentiation in C++."
Without including this, Boost will assert if certain inputs do not meet the preconditions of the function. Assertions are difficult to trap and recover from and we want to continue to have control over this behavior.
See Discourse: Boost defines for more details.
It seems to be challenging these days to get a working Unix compilation toolchain on Windows.
The compilation shell environment is MSYS2 MinGW 64-bit. When launching, make sure to pick the 64-bit shortcut. The shell command prompt itself should say (in purple) MINGW64, not MSYS. All commands herein have been run from the top-most math folder of this repo.
Reference this guide on installing g++, but with a few modifications to make it simpler. We won't be installing in non-default directories or doing two installations.
$ pacman -Syuu
Then, follow the on-screen instructions, which will eventually require closing MSYS without using exit. Then run the exact same command again to finish the process. Note, you'll need both toolchains mentioned below. The second, i686, toolchain is for tbb, as is the make package, which contains mingw32-make.
$ pacman -S --needed base-devel mingw-w64-x86_64-toolchain mingw-64 mingw-w64-i686-toolchain git make
$ which g++
/mingw64/bin/g++
$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
... rest of listing omitted ...
The make and python based build structure referenced in the stan developer overview is replicated in the stan-math repo, so one can use the instructions there to get an idea of what commands the build system supports, like make test-headers or runTests.py. Be aware that these commands can take a long time, and that stan-math's continuous integration automated build system, Jenkins, is running older compilers like GCC 4.9. Therefore, new Windows-related bugs may be encountered deep in the middle of the tests.
$ which python2
/usr/bin/python2
$ pacman -Ss python2 | grep installed
msys/python2 2.7.17-1 [installed]
The build system supports making and running individual tests. You may have to build Google Test first on your own if the makefile doesn't handle it automatically. Directions for this are not given here.
$ PYTHON2=python2 ./runTests.py test/unit/math/rev/core/var_test.exe
If that command doesn't work, try swapping the test script with make:
$ PYTHON2=python2 make test/unit/math/rev/core/var_test.exe
make: 'test/unit/math/rev/core/var_test.exe' is up to date.
If the test suite isn't built first, client code using numerical integration routines such as integrate_ode_bdf may fail to link because the static libraries haven't been built yet. These files and directories would also need to be added to the downstream project makefile's LDLIBS variable, or equivalent.
$ PYTHON2=python2 make lib/sundials_5.1.0/lib/libsundials_cvodes.a lib/sundials_5.1.0/lib/libsundials_idas.a lib/sundials_5.1.0/lib/libsundials_kinsol.a lib/sundials_5.1.0/lib/libsundials_nvecserial.a
make: 'lib/sundials_5.1.0/lib/libsundials_cvodes.a' is up to date.
make: 'lib/sundials_5.1.0/lib/libsundials_idas.a' is up to date.
make: 'lib/sundials_5.1.0/lib/libsundials_kinsol.a' is up to date.
make: 'lib/sundials_5.1.0/lib/libsundials_nvecserial.a' is up to date.
The Intel Threading Building Blocks dynamic library is also required. Note that the tbb folder will be created during this process, and that the command uses the alternate make executable.
$ PYTHON2=python2 mingw32-make lib/tbb/tbb.dll
Rather than using rpath for the final executable as would be done on Linux, the author recommends just copying the tbb.dll to the output folder of your program.