Introduction to Stan for New Developers
Welcome to Stan! We're excited that you're interested in contributing to the project. Before you contribute, there are some processes and other information that are good to know.
The Stan project is hosted on GitHub, so you will have to create a GitHub account if you do not already have one. Developer discussions are hosted on Discourse, so you will also need an account there to ask questions or participate in discussions.
Most of the following discussion is aimed at people who want to contribute C++ code to Stan. But there are many other ways to contribute that don't involve C++!
We're committed to having a permissive open-source license. The stan-dev/math, stan-dev/stan, and stan-dev/cmdstan libraries are licensed under the BSD 3-Clause License, and we only accept changes to the code base that are compatible with this license.
When you write code, you own the copyright to your code unless you've assigned it to another entity, such as your employer. Contributions to Stan leave copyright ownership with the contributor or their assignee. Per the GitHub terms of service (see the section "User-Generated Content"), any code contributed to Stan through GitHub is released under the same license as the repository to which it is contributed.
Development of the math library, the language and algorithms, and the interfaces is organized into the following repositories, with arrows indicating submodule inclusions:
math <- stan <- pystan
<- rstan <- rstanarm
<- cmdstan <- statastan
<- matlabstan
<- stan.jl
<- MathematicaStan
<- stanc3
Currently, the stan repo includes the algorithms and the service API for the interfaces, and the stanc3 repo hosts the language compiler. There are additional repos for tools such as the Emacs mode, R plotting, the R Shiny interface, web pages, etc. Some repos include others as submodules if there's a code dependency. Each repo also has its own wiki! Don't forget to check that wiki's homepage and search it for information related to that subproject.
People use Stan in many contexts. When we're deciding how to add, remove, or modify Stan code, we need to understand what our goals are. This typically involves a discussion in which we try to elicit concrete use cases for the feature, followed by a GitHub issue in the appropriate repo with something resembling a spec that the reviewer can use to evaluate the associated pull request, followed by that pull request. These three artifacts live in different places, so the top of each one should link to the others and summarize the results of the previous steps in the workflow. To summarize:
- Bring up your proposed feature for discussion on our forums. If you're looking for a place to help out, you can skip this step and pick an existing issue in the appropriate GitHub repo.
- Summarize the discussion and write something approaching a high-level spec in a GitHub issue.
- Create a pull request that addresses the GitHub issue.
You can read more about the developer process here.
We have adopted the GitFlow process for incorporating new contributions into Stan. If you are not yet familiar with Git, we recommend checking out some of the many great Git tutorials freely available online. Once you are comfortable with Git itself, you can read about our particular implementation of GitFlow here and here.
All new contributions are also tested with our continuous integration framework.
Every developer has their own local development setup, but we have compiled various helpful tricks that you might find useful.
Consistent style is essential for quickly reading and understanding contributions. We have adopted conventions for code quality and code style to which all contributions must conform. You can read more at these links; many of these conventions are enforced with an automated formatter.
There is a list of supported compilers and language features here.
The robustness of Stan is only as good as our test coverage, and we require that all new contributions be adequately tested. We use the GoogleTest framework for writing tests and GNU Make and Python for running them.
We have two main sources of documentation: Doxygen doc comments and the Stan manual. You can read more about contributing to the former here. The latter typically has a GitHub issue on the Stan repo for each Stan release, but we also take pull requests to the .tex files.
There are other forms of documentation listed on the website here.
Much of what you might consider to be the "core" of Stan actually exists in the Math repo. This document applies to that repo, but you can read more about how that repo is organized and any differences here.
The core code in Stan is written in heavily templated C++ to ensure high performance. There are many great C++ tutorials available online, for example cplusplus.com, and once you are familiar with the basics of the language you can tackle the subtleties of templates. We highly recommend the books by Vandevoorde and Josuttis (C++ Templates) and by Alexandrescu (Modern C++ Design).
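As a small illustration of why templates matter here, the sketch below writes one generic function that works for any scalar type supporting the operations it uses. The name softplus_sketch is hypothetical and not part of the Stan Math API; the general idea of templating on the scalar type is what lets a single definition serve plain doubles as well as autodiff types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical example: log(1 + exp(x)) written once for any scalar type T.
// The using-declarations make unqualified calls pick std:: overloads for
// built-in types, while argument-dependent lookup can still find overloads
// declared alongside a user-defined (for example, autodiff) type.
template <typename T>
T softplus_sketch(const T& x) {
  using std::exp;
  using std::log1p;
  // Branch on the sign so exp() only ever sees a non-positive argument,
  // which avoids overflow for large positive x.
  return x > 0 ? x + log1p(exp(-x)) : log1p(exp(x));
}
```

For example, softplus_sketch(0.0) returns log(2) as a double, and softplus_sketch(1.0f) instantiates a float version; no separate overloads are written by hand.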
There are many additional resources available for learning how to optimize C++ code, including Agner Fog's manuscript and the many books of, amongst others, Scott Meyers and Herb Sutter.
Having a comprehensive set of useful densities coded in the Stan math library is a benefit to users. Densities are also a maintenance burden both for testing and for understanding the code base. As a result we are somewhat cautious about including new densities. Guidelines for including densities:
- The pdf, cdf, and rng should be available so users of the Stan language don't need to check the manual.
- There should be a computational benefit to coding the density in C++. Some densities can easily and efficiently be specified in the Stan language and the benefits of coding them in C++ are limited. It helps to provide some evidence of the computational benefits.
- The density should be applicable to a range of problems.
- If the density's C++ code re-implements or improves on functions already present in the math library, the necessary improvements should be coded separately in the math library.
- The code author should have an ongoing interest in maintaining the code.
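To make the first guideline concrete, here is a minimal sketch of the three functions for the exponential distribution. Stan already provides this distribution; it is used here only because its formulas are short. The names exp_dist_lpdf, exp_dist_cdf, exp_dist_rng, and check_rate are hypothetical, the sketch handles double only (no autodiff, no vectorization), and std::mt19937 from the standard library is an assumption, not the generator Stan uses.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <random>
#include <stdexcept>

// Hypothetical names for illustration; not the Stan Math API.
// Invalid parameters throw std::domain_error.
inline void check_rate(double lambda) {
  if (!(lambda > 0) || std::isinf(lambda))  // !(lambda > 0) also rejects NaN
    throw std::domain_error("rate parameter must be positive and finite");
}

// Log density of Exponential(lambda): log(lambda) - lambda * y for y >= 0.
inline double exp_dist_lpdf(double y, double lambda) {
  check_rate(lambda);
  if (y < 0) return -std::numeric_limits<double>::infinity();  // outside support
  return std::log(lambda) - lambda * y;
}

// CDF: 1 - exp(-lambda * y); expm1 keeps precision when lambda * y is small.
inline double exp_dist_cdf(double y, double lambda) {
  check_rate(lambda);
  if (y <= 0) return 0.0;
  return -std::expm1(-lambda * y);
}

// Random draw, delegating to the standard library distribution.
template <typename RNG>
double exp_dist_rng(double lambda, RNG& rng) {
  check_rate(lambda);
  std::exponential_distribution<double> dist(lambda);
  return dist(rng);
}
```

Shipping all three together means a user who finds exp_dist_lpdf can rely on the matching CDF and RNG existing as well.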
The Stan interfaces wrap the core C++ code and expose its functionality to other languages, such as R and Python. Consequently, contributions to the interfaces may require knowledge of how to couple these languages together, for example with Rcpp and Cython, or may be built entirely in the interface language. For details on a specific interface, please consult the corresponding GitHub repository.
Once you have familiarized yourself with our process take a look at the GitHub issue trackers for the many tasks that need to be tackled! We look forward to hearing from you on Discourse and seeing your pull requests!
This section contains some tips for using developer-oriented tools and setting up a computing environment for development.
Many files matched by our .gitignore file accumulate and never get cleaned up. To remove every untracked file, including ignored ones, run:
git clean -d -x -f
Warning: this deletes everything that is not currently tracked by Git. You probably want to run git status first, or do a dry run with git clean -n -d -x to see what would be removed.
Useful shell scripts from the Git repository:
https://github.com/git/git/tree/master/contrib/completion
- git-prompt.sh. This changes the command-line prompt to show the current branch. Install it by copying it to ~/ and following the install instructions. For a cleaner prompt, replace the suggested PS1 with:
  PS1='\w$(__git_ps1 " (%s)")> '
  The prompt will look like:
  ~/stan (master)>
  where "~/stan" is the current directory and "(master)" indicates that the current branch is master.
- git-completion.*sh. Install this for auto-completion from the command line. It auto-completes git commands and git branches. For example, type
  git checkout
  then hit tab twice. It should show the available branches.
By default, Aquamacs has multiple kill buffers: one copy-and-paste buffer is used by command-c/x/v, and a separate one is used by ctrl-w, alt-w, and ctrl-y. This gets really confusing. Here's how to use a single kill buffer, so that text copied from any Mac program can be pasted into Emacs with ctrl-y or command-v.
- Open ~/.emacs (or wherever else you're storing your preferences)
- Add:
(setq x-select-enable-clipboard t)
Ant includes a handy task called FixCRLF that "Adjusts a text file to local conventions." You can configure it to replace tabs with spaces and Windows line endings with Unix ones.
To make sure your code files never contain tabs, you can add this to your .emacs file:
;; Untabify the whole buffer before saving. Returning nil tells Emacs the
;; buffer has not been written yet, so the normal save proceeds.
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)

;; Run the untabify step on save in each of these modes.
(add-hook 'java-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-functions)
             (add-hook 'write-contents-functions 'java-mode-untabify)))
(add-hook 'html-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-functions)
             (add-hook 'write-contents-functions 'java-mode-untabify)))
(add-hook 'c++-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-functions)
             (add-hook 'write-contents-functions 'java-mode-untabify)))
(add-hook 'stan-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-functions)
             (add-hook 'write-contents-functions 'java-mode-untabify)))
You can rename the function as you like; the "Java" in its name is a holdover from where I first got the macros.
You can also automatically remove line-final whitespace (this is just for C++, but it could be hooked elsewhere):
(add-hook 'c++-mode-hook
(lambda () (add-to-list 'write-file-functions 'delete-trailing-whitespace)))
Using compiler-predefined macros to detect the operating system (not just for Windows):
http://nadeausoftware.com/articles/2012/01/c_c_tip_how_use_compiler_predefined_macros_detect_operating_system
Some GCC specifics:
http://stackoverflow.com/questions/259248/how-do-i-test-the-current-version-of-gcc
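A minimal sketch of both ideas from those links: choosing the target OS and reading the GCC version at compile time. The function names target_os and gcc_version are hypothetical; the macros themselves (_WIN32, __APPLE__, __MACH__, __linux__, __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__, __clang__) are standard predefined macros of the respective compilers. Clang also defines __GNUC__ for compatibility, so the sketch checks __clang__ to tell the two apart.

```cpp
#include <cassert>
#include <string>

// Short name for the target OS, chosen at compile time.
std::string target_os() {
#if defined(_WIN32)
  return "windows";  // defined for both 32- and 64-bit Windows targets
#elif defined(__APPLE__) && defined(__MACH__)
  return "apple";    // macOS (and other Apple platforms)
#elif defined(__linux__)
  return "linux";
#else
  return "other";
#endif
}

// GCC version encoded as one integer, e.g. 9.3.0 -> 90300; 0 if not GCC.
long gcc_version() {
#if defined(__GNUC__) && !defined(__clang__)
  return __GNUC__ * 10000L + __GNUC_MINOR__ * 100L + __GNUC_PATCHLEVEL__;
#else
  return 0;
#endif
}
```

The single-integer encoding makes version checks a plain comparison, for example gcc_version() >= 90300 for "GCC 9.3 or newer".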
- Stan project structure
- Stan Users Group
- Discourse
- Developer process
- Contributing a new Stan function
- Stan C++ style guide (includes some developer environment setup)
- Autodiff paper: details the implementation of automatic differentiation and the math library generally
- Some Bayesian Modeling Techniques in Stan
- [Vandevoorde's C++ Templates](http://www.josuttis.com/tmplbook/) and the free PDF here: Vandevoorde's Templates PDF
- [Alexandrescu's Modern C++ Design](http://erdani.com/index.php/books/modern-c-design/) and the free PDF here: Modern C++ PDF
- Agner Fog's C++ optimization tips
- On template metaprograms
The Stan Math library, referred to as Math or the Math library, is a C++ library for automatic differentiation. It's designed to be usable, extensive and extensible, efficient, scalable, stable, portable, and redistributable in order to facilitate the construction and use of algorithms that rely on derivatives.
The Math library implements:
- reverse mode automatic differentiation for computing gradients. This is fully tested and is used by Stan.
- forward mode automatic differentiation for computing directional derivatives. This does not work for higher-order functions, but is otherwise completely tested.
- mixed mode automatic differentiation for computing higher order derivatives. Once forward mode is fully tested, this should work.
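Forward mode carries a value and a tangent together through each operation, so one pass yields the derivative of the output along a chosen input direction. A minimal dual-number sketch, assuming a hypothetical dual type that supports only +, *, and sin (Stan's own forward-mode type, fvar, is templated and far more complete):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical dual number: a value paired with its tangent
// (directional derivative). Not Stan's fvar type.
struct dual {
  double val;
  double tan;
};

// Each operation applies the chain rule to the tangent alongside the value.
inline dual operator+(dual a, dual b) { return {a.val + b.val, a.tan + b.tan}; }
inline dual operator*(dual a, dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};  // product rule
}
inline dual sin(dual a) { return {std::sin(a.val), std::cos(a.val) * a.tan}; }

// Example function f(x, y) = x * y + sin(x).
inline dual f(dual x, dual y) { return x * y + sin(x); }
```

Seeding the tangents with a direction (v_x, v_y) gives the directional derivative in one pass. For instance, f({2.0, 1.0}, {3.0, 0.0}).tan is df/dx = y + cos(x) = 3 + cos(2) at (2, 3). A full gradient in n inputs needs n such passes, which is why reverse mode is used for gradients.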
Some key features of the Math library's reverse mode automatic differentiation:
- object oriented design with overloading of operators
- arena-based memory management
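The two features above can be illustrated together with a minimal reverse-mode sketch: overloaded operators record each operation as a node on a tape, a reverse sweep accumulates adjoints, and all nodes live in one contiguous container that is released in a single step. The names var, node, make_node, grad, and clear_tape are hypothetical, and a std::vector stands in for a real arena allocator; Stan's actual classes and memory arena are considerably more sophisticated.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One recorded operation: value, adjoint, up to two parents, and the
// local partial derivatives with respect to those parents.
struct node {
  double val;
  double adj;            // d(output) / d(this node), filled in by grad()
  std::size_t lhs, rhs;  // parent indices, or npos if absent
  double dlhs, drhs;     // local partials w.r.t. lhs and rhs
};

const std::size_t npos = static_cast<std::size_t>(-1);

// Stand-in for an arena: every node is appended to one contiguous buffer,
// and the whole expression graph is released at once by clear_tape().
std::vector<node> tape;

// Hypothetical handle to a node on the tape (not Stan's var).
struct var {
  std::size_t idx;
  explicit var(double v) : idx(tape.size()) {
    tape.push_back({v, 0.0, npos, npos, 0.0, 0.0});
  }
  double val() const { return tape[idx].val; }
  double adj() const { return tape[idx].adj; }
};

// Records a new node with the given parents and local partials.
inline var make_node(double v, std::size_t l, double dl, std::size_t r, double dr) {
  var out(v);
  node& n = tape[out.idx];
  n.lhs = l;
  n.dlhs = dl;
  n.rhs = r;
  n.drhs = dr;
  return out;
}

// Overloaded operators: compute the value now, remember how to differentiate.
inline var operator+(const var& a, const var& b) {
  return make_node(a.val() + b.val(), a.idx, 1.0, b.idx, 1.0);
}
inline var operator*(const var& a, const var& b) {
  return make_node(a.val() * b.val(), a.idx, b.val(), b.idx, a.val());
}
inline var log(const var& a) {
  return make_node(std::log(a.val()), a.idx, 1.0 / a.val(), npos, 0.0);
}

// Reverse sweep: seed the output adjoint with 1 and walk the tape backwards.
// Parents always have smaller indices than children, so each node's adjoint
// is complete before it is propagated.
inline void grad(const var& out) {
  for (node& n : tape) n.adj = 0.0;
  tape[out.idx].adj = 1.0;
  for (std::size_t i = out.idx + 1; i-- > 0;) {
    const node& n = tape[i];
    if (n.lhs != npos) tape[n.lhs].adj += n.dlhs * n.adj;
    if (n.rhs != npos) tape[n.rhs].adj += n.drhs * n.adj;
  }
}

// Frees every node in one step, as an arena would.
inline void clear_tape() { tape.clear(); }
```

For f(x, y) = x * y + log(x) at x = 2, y = 3, a single reverse sweep gives df/dx = y + 1/x = 3.5 and df/dy = x = 2, regardless of how many inputs there are.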
For implementation details of the Math library's automatic differentiation, please read the arXiv paper "The Stan Math Library: Reverse-Mode Automatic Differentiation in C++."