Merge pull request #171 from dessn/docs
Update documentation and README
OmegaLambda1998 authored Jul 31, 2024
2 parents 4fd0994 + 1b4389a commit 0800bfa
Showing 24 changed files with 1,263 additions and 2,612 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/black-formatter.yml
@@ -16,4 +16,4 @@ jobs:
- uses: psf/black@stable
  with:
    options: "--check --verbose --diff"
    version: "~= 22.0"
1,001 changes: 7 additions & 994 deletions README.md

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions docs/README.md
@@ -0,0 +1,31 @@
[![Documentation](https://readthedocs.org/projects/pippin/badge/?version=latest)](https://pippin.readthedocs.io/en/latest/?badge=latest)
[![JOSS](https://joss.theoj.org/papers/10.21105/joss.02122/status.svg)](https://doi.org/10.21105/joss.02122)
[![Zenodo](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.366608-blue)](https://zenodo.org/badge/latestdoi/162215291)
[![GitHub license](https://img.shields.io/badge/License-MIT-green)](https://github.com/dessn/Pippin/blob/master/LICENSE)
[![Github Issues](https://img.shields.io/github/issues/dessn/Pippin)](https://github.com/dessn/Pippin/issues)
![Python Version](https://img.shields.io/badge/Python-3.7%2B-red)
![Pippin Test](https://github.com/dessn/Pippin/actions/workflows/test-pippin.yml/badge.svg)

# Pippin

Pippin is a pipeline designed to streamline end-to-end supernova cosmology analyses and remove as much of the hassle as we can.

![A Really Funny Meme](_static/images/meme.jpg)

## Table of Contents

:::{toctree}
:maxdepth: 2
:hidden:

self
:::

:::{toctree}
:maxdepth: 2

src/install.md
src/usage.md
src/tasks.md
src/dev.md
:::
13 changes: 12 additions & 1 deletion docs/conf.py
@@ -31,9 +31,20 @@
# ones.
extensions = [
    'sphinx_rtd_theme',
    'sphinx_rtd_dark_mode',
    'myst_parser',
    'sphinxcontrib.youtube',
]

myst_enable_extensions = [
    "substitution",
    "colon_fence",
]

myst_substitutions = {
    "patrick": "[Patrick Armstrong](https://github.com/OmegaLambda1998)"
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

1,007 changes: 0 additions & 1,007 deletions docs/index.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/index.rst
@@ -0,0 +1,2 @@
.. include:: README.md
:parser: myst_parser.sphinx_
23 changes: 0 additions & 23 deletions docs/install.rst

This file was deleted.

2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -1,3 +1,5 @@
sphinx<8
sphinx_rtd_theme
sphinx-rtd-dark-mode
myst-parser
sphinxcontrib-youtube
89 changes: 89 additions & 0 deletions docs/src/dev.md
@@ -0,0 +1,89 @@
# Pippin Development

## Issues and Contributing to Pippin

Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:

1. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then submit a pull request to fix that issue.
2. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then wait until I have time to look at it. Hopefully that's quick, but no guarantees.
3. Email me with a feature request.

If you do want to contribute code, fantastic. [Please note that all code in Pippin is subject to the Black formatter](https://black.readthedocs.io/en/stable/). I would recommend installing this yourself because it's a great tool.

![Developer Documentation Below](../_static/images/developer.jpg)

## Coding style

Please, for the love of god, don't code this up in vim/emacs on a terminal connection[^1]. Use a proper IDE (I recommend PyCharm or VSCode), and **install the Black extension**! I have Black set up in PyCharm as a file watcher, so all python files are automatically formatted on save. Use a line width of 160 characters. Here is the Black file watcher config:

![Black config](../_static/images/black.jpg)

If everyone does this, then all files should remain consistent across different users.

[^1]: {{patrick}}: Since taking over as primary developer, I have done nothing but code this up in vim on a terminal connection. It's not the worst thing you could possibly do. There's a [Black Linter](https://github.com/dessn/Pippin/actions/workflows/black-formatter.yml) github action which will trigger on pull requests to main, allowing you to format your contributions before merging.

## Testing valid config in Pippin

To ensure we don't break things when pushing out new code, the tests directory contains a set of tests of progressively increasing pipeline complexity, designed to check that existing config files behave consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility and should be discussed before being incorporated into a release.

To run the tests, in the top level directory, simply run:

`pytest -v .`

## Adding a new task

Alright there, you want to add a new task to Pippin? Great. Here's what you've got to do:

1. Create an implementation of the `Task` class; you can keep it empty for now.
2. Figure out where it goes: at the top of `manager.py` you can see the current stages in Pippin. Once you've decided where your task belongs, import it and slot it in.
3. Back in your new class that extends Task, you'll notice you have a few methods to implement:
1. `_run()`: Kick the task off, and return True or False to indicate whether it kicked off successfully. To help with determining the hash and whether the task should run, there are a few handy functions: `_check_regenerate`, `get_hash_from_string`, `save_hash`, `get_hash_from_files`, `get_old_hash`. See, for example, the <project:./tasks/analyse.md> task for how I use these. (A minimal sketch of a complete task is given at the end of this list.)
2. `_check_completion(squeue)`: Check to see if the task (whether it's being rerun or not) is done. Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:
```sh
if [ $? -eq 0 ]; then
    echo SUCCESS > {done_file}
else
    echo FAILURE > {done_file}
fi
```
This allows me to easily see if a job failed or passed. On failure, I then generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message to your user.
To then show that error, or **ANY MESSAGE TO THE USER**, use the provided logger:
`self.logger.error("The task failed because of this reason")`.

This method should return `Task.FINISHED_FAILURE`, `Task.FINISHED_SUCCESS`, or the number of jobs still in the queue. You can figure that last one out from `squeue`, which I pass in containing all the jobs the user has active (and which can sometimes be None).
3. `get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config)`: From the given inputs, determine what tasks should be created, create them, and return them in a list. For context, here is the code I use to determine which simulation tasks to create:
```python
@staticmethod
def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
    tasks = []
    for sim_name in config.get("SIM", []):
        sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
        s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
        Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
        tasks.append(s)
    return tasks
```
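
Putting the pieces together, here is a minimal sketch of what a complete task might look like. This is illustrative only: the import path, the done file handling, and the attributes `self.config` and `self.output_dir` are assumptions, so check `task.py` and existing tasks for the real interfaces.

```python
# Hypothetical minimal task; the import path and attribute names are assumptions.
import os

from pippin.task import Task  # assumed import path


class MyNewTask(Task):
    def _run(self):
        # Hash the configuration so an unchanged task isn't rerun
        new_hash = self.get_hash_from_string(str(self.config))
        if self._check_regenerate(new_hash):
            self.save_hash(new_hash)
            # ... write and submit your job script here ...
        return True  # True means the task kicked off successfully

    def _check_completion(self, squeue):
        # Look for the done file your job script writes on exit
        done_file = os.path.join(self.output_dir, "done.txt")
        if os.path.exists(done_file):
            with open(done_file) as f:
                if "SUCCESS" in f.read():
                    return Task.FINISHED_SUCCESS
            self.logger.error("MyNewTask failed, check the logs in the output directory")
            return Task.FINISHED_FAILURE
        return self.num_jobs  # jobs believed to still be queued or running
```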
## Adding a new classifier
Alright, so what if we're not after a brand new task, but just adding another classifier? Well, it's easier to do, and I recommend looking at `nearest_neighbor_python.py` for something to copy from. You'll see we have the parent Classifier class, I write out the slurm script that would be used, and then I define the `train` and `predict` methods (which both invoke a general `classify` function in different ways; you can do this however you want). You'll also notice a very simple `_check_completion` method and a `get_requirements` method. The latter returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based only on SALT2 features, so I return `(False, True)`.
You can also define a `get_optional_requirements` method which, like `get_requirements`, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results *for this particular run*. By default, this method returns:
- `True, True` if `OPTIONAL_MASK` is set in `OPTS`
- `True, False` if `OPTIONAL_MASK_SIM` is set in `OPTS`
- `False, True` if `OPTIONAL_MASK_FIT` is set in `OPTS`
- `False, False` otherwise.

If you define your own method based on classifier-specific requirements, then these `OPTIONAL_MASK*` keys can still be set to choose which tasks are optionally included. If these are not set, then the normal `MASK`, `MASK_SIM`, and `MASK_FIT` are used instead. Note that if *no* masks are set then *every* sim or lcfit task will be included.
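
If you do write your own version, here is a hedged sketch of an override that reproduces the default behaviour described above. The exact signature is an assumption, so copy it from the real Classifier class:

```python
# Hypothetical override mirroring the documented defaults; this lives inside
# your Classifier subclass, and the signature is an assumption for illustration.
@staticmethod
def get_optional_requirements(config):
    opts = config.get("OPTS", {})
    if "OPTIONAL_MASK" in opts:
        return True, True   # needs photometry and light curve fits
    if "OPTIONAL_MASK_SIM" in opts:
        return True, False  # needs photometry only
    if "OPTIONAL_MASK_FIT" in opts:
        return False, True  # needs light curve fits only
    return False, False
```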
Finally, you'll need to add your classifier into the `ClassifierFactory` in `classifiers/factory.py`, so that I can link a class name in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I've had issues getting the behaviour consistent across systems and conda environments, so we're doing it the hard way.
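
To make the overall shape concrete, here is a hedged skeleton of a new classifier. The import path, method signatures, and `classify` internals are assumptions for illustration; mirror `nearest_neighbor_python.py` for the real structure.

```python
# Hypothetical skeleton only; copy the real constructor arguments and
# slurm script handling from nearest_neighbor_python.py.
from pippin.classifiers.classifier import Classifier  # assumed import path


class MyClassifier(Classifier):
    def classify(self, training):
        # ... write out the slurm script and submit it ...
        return True

    def train(self):
        return self.classify(True)

    def predict(self):
        return self.classify(False)

    def _check_completion(self, squeue):
        # ... check a done file, exactly as described for tasks ...
        return self.FINISHED_SUCCESS  # constant assumed to be inherited from Task

    @staticmethod
    def get_requirements(options):
        # (needs photometry, needs light curve fitting results)
        return False, True
```

Once the class exists, register it in the `ClassifierFactory` so the class name in your YAML configuration resolves to `MyClassifier`.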
29 changes: 29 additions & 0 deletions docs/src/install.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Installation

If you're using a pre-installed version of Pippin (like the one on Midway), ignore this.

If you're not, installing Pippin is simple.

1. Check out Pippin
2. Ensure you have the dependencies installed (`pip install -r requirements.txt`) and that your python version is 3.7+.
3. Celebrate

There is no need to attempt to install Pippin like a package (no `python setup.py install`), just run from the clone.

Now, Pippin also interfaces with other software, including:
- [SNANA](https://github.com/RickKessler/SNANA)
- [SuperNNova](https://github.com/supernnova/SuperNNova)
- [SNIRF](https://github.com/evevkovacs/ML-SN-Classifier)
- [DataSkimmer](https://github.com/supernnova/DES_SNN)
- [SCONE](https://github.com/helenqu/scone)

When it comes to installing SNANA, the best method is to already have it installed on a high performance server you have access to[^1]. However, installing the other software used by Pippin should be far simpler. Taking [SuperNNova](https://github.com/supernnova/SuperNNova) as an example:

1. In an appropriate directory `git clone https://github.com/SuperNNova/SuperNNova`
2. Create a GPU conda env for it: `conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt`
3. Activate environment and install natsort: `conda activate snn_gpu` and `conda install --yes natsort`

Then, in the Pippin global configuration file, [cfg.yml](https://github.com/dessn/Pippin/blob/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/cfg.yml) in the top level directory, ensure that the `SuperNNova: location` path points to where you just cloned SNN. You will need to install the other external software packages if you want to use them, but you do not need to install any package you do not explicitly request in a config file[^2].

[^1]: {{patrick}}: I am ***eventually*** going to attempt to create an SNANA docker image, but that's likely far down the line.
[^2]: {{patrick}}: If Pippin is complaining about a missing software package which you aren't using, please file an issue.
20 changes: 14 additions & 6 deletions docs/tasks.rst → docs/src/tasks.md
@@ -1,12 +1,20 @@
# Tasks

Pippin is essentially a wrapper around many different tasks. In this section, I'll try and explain how tasks are related to each other, and what each task is.

As a general note, most tasks have an ``OPTS`` section where most details go. This is partially historical, but essentially properties that Pippin uses to determine how to construct tasks (like ``MASK``, classification mode, etc) are top level, and the Task itself gets passed everything inside OPTS to use however it wants.

:::{toctree}
:maxdepth: 1

tasks/dataprep.md
tasks/sim.md
tasks/lcfit.md
tasks/classify.md
tasks/agg.md
tasks/merge.md
tasks/biascor.md
tasks/createcov.md
tasks/cosmofit.md
tasks/analyse.md
:::
20 changes: 20 additions & 0 deletions docs/src/tasks/agg.md
@@ -0,0 +1,20 @@
# 4. AGGREGATION

The aggregation task takes results from one or more classification tasks (that have been run in predict mode on the same dataset) and generates comparisons between the classifiers (their correlations, PR curves, ROC curves and their calibration plots). Additionally, it merges the results of the classifiers into a single csv file, mapping SNID to one column per classifier.

```yaml
AGGREGATION:
  SOMELABEL:
    MASK: mask  # Match sim AND classifier
    MASK_SIM: mask  # Match only sim
    MASK_CLAS: mask  # Match only classifier
    RECALIBRATION: SIMNAME  # Optional, use this simulation to recalibrate probabilities. Default no recalibration.
    # Optional: changes the probability column name of each classification task listed into the given probability column name.
    # Note that this will crash if the same classification task is given multiple probability column names.
    # Mostly used when you have multiple photometrically classified samples.
    MERGE_CLASSIFIERS:
      PROB_COLUMN_NAME: [CLASS_TASK_1, CLASS_TASK_2, ...]
    OPTS:
      PLOT: True  # Default True, make plots
      PLOT_ALL: False  # Default False. I.e. if RANSEED_CHANGE gives you 100 sims, make 100 sets of plots.
```
20 changes: 20 additions & 0 deletions docs/src/tasks/analyse.md
@@ -0,0 +1,20 @@
# 9. ANALYSE

The final step in the Pippin pipeline is the Analyse task. It creates a final output directory, moves relevant files into it, and generates extra plots. It will save out compressed CosmoMC chains and the plotting scripts (so you can download the entire directory and customise it without worrying about pointing to external files), it will copy in Hubble diagrams, and, if you've told it to, it will make histogram comparison plots between data and sim. Oh, and also redshift evolution plots. The scripts which copy/compress/rename external files into the analyse directory are generally named `parse_*.py`. So `parse_cosmomc.py` is the script which finds, reads and compresses the MCMC chains from CosmoMC into the output directory. Then `plot_cosmomc.py` reads those compressed files to make the plots.

Whether cosmology contours are blinded is determined by the BLIND flag set on the data. For data, this defaults to True.

Note that all the plotting scripts work the same way: `Analyse` generates a small yaml file called `input.yml` pointing to all the resources, and each script uses that same file to make different plots. It is thus super easy to add your own plotting scripts, and you can specify arbitrary code to execute using the `ADDITIONAL_SCRIPTS` keyword in opts. Just make sure your code takes `input.yml` as an argument. As an example, to rerun the CosmoMC plots, you'd simply have to run `python plot_cosmomc.py input.yml`.

```yaml
ANALYSE:
  SOMELABEL:
    MASK_COSMOFIT: mask  # partial match
    MASK_BIASCOR: mask  # partial match
    MASK_LCFIT: [D_DESSIM, D_DATADES]  # Optional. Creates histograms and efficiency plots based off the input LCFIT_SIMNAME matches.
    OPTS:
      COVOPTS: [ALL, NOSYS]  # Optional. Covopts to match when making contours. Single or list. Exact match.
      SHIFT: False  # Default False. Shift all the contours on top of each other.
      PRIOR: 0.01  # Default None. Optional normal prior around Om=0.3 to apply for sims if wanted.
      ADDITIONAL_SCRIPTS: /somepath/to/your/script.py  # Should take the input.yml as an argument
```
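
For reference, here is a hedged sketch of what a custom `ADDITIONAL_SCRIPTS` plotting script might look like. The structure of `input.yml` is an assumption here, so print it out to see what `Analyse` actually provides:

```python
# Hypothetical ADDITIONAL_SCRIPTS example. The keys inside input.yml are an
# assumption; inspect a real input.yml for the actual structure.
import sys

import yaml


def main(input_file):
    with open(input_file) as f:
        config = yaml.safe_load(f)
    # See what resources Analyse has handed us
    print(list(config.keys()))
    # ... load chains / Hubble diagrams from the listed paths and plot ...


if __name__ == "__main__":
    main(sys.argv[1])  # Pippin invokes this as: python your_script.py input.yml
```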