Merge pull request #171 from dessn/docs
Update documentation and README
OmegaLambda1998 authored Jul 31, 2024
2 parents 4fd0994 + 1b4389a commit 0800bfa
Showing 24 changed files with 1,263 additions and 2,612 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/black-formatter.yml
@@ -16,4 +16,4 @@ jobs:
- uses: psf/black@stable
  with:
    options: "--check --verbose --diff"
    version: "~= 22.0"
1,001 changes: 7 additions & 994 deletions README.md

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions docs/README.md
@@ -0,0 +1,31 @@
[![Documentation](https://readthedocs.org/projects/pippin/badge/?version=latest)](https://pippin.readthedocs.io/en/latest/?badge=latest)
[![JOSS](https://joss.theoj.org/papers/10.21105/joss.02122/status.svg)](https://doi.org/10.21105/joss.02122)
[![Zenodo](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.366608-blue)](https://zenodo.org/badge/latestdoi/162215291)
[![GitHub license](https://img.shields.io/badge/License-MIT-green)](https://github.com/dessn/Pippin/blob/master/LICENSE)
[![Github Issues](https://img.shields.io/github/issues/dessn/Pippin)](https://github.com/dessn/Pippin/issues)
![Python Version](https://img.shields.io/badge/Python-3.7%2B-red)
![Pippin Test](https://github.com/dessn/Pippin/actions/workflows/test-pippin.yml/badge.svg)

# Pippin

Pippin is a pipeline designed to streamline end-to-end supernova cosmology analyses and remove as much of the hassle as we can.

![A Really Funny Meme](_static/images/meme.jpg)

## Table of Contents

:::{toctree}
:maxdepth: 2
:hidden:

self
:::

:::{toctree}
:maxdepth: 2

src/install.md
src/usage.md
src/tasks.md
src/dev.md
:::
13 changes: 12 additions & 1 deletion docs/conf.py
@@ -31,9 +31,20 @@
# ones.
extensions = [
    'sphinx_rtd_theme',
    'sphinx_rtd_dark_mode',
    'myst_parser',
    'sphinxcontrib.youtube',
]

myst_enable_extensions = [
    "substitution",
    "colon_fence",
]

myst_substitutions = {
    "patrick": "[Patrick Armstrong](https://github.com/OmegaLambda1998)"
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

1,007 changes: 0 additions & 1,007 deletions docs/index.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/index.rst
@@ -0,0 +1,2 @@
.. include:: README.md
:parser: myst_parser.sphinx_
23 changes: 0 additions & 23 deletions docs/install.rst

This file was deleted.

2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -1,3 +1,5 @@
sphinx<8
sphinx_rtd_theme
sphinx-rtd-dark-mode
myst-parser
sphinxcontrib-youtube
89 changes: 89 additions & 0 deletions docs/src/dev.md
@@ -0,0 +1,89 @@
# Pippin Development

## Issues and Contributing to Pippin

Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:

1. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then submit a pull request to fix that issue.
2. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then wait until I have time to look at it. Hopefully that's quick, but no guarantees.
3. Email me with a feature request.

If you do want to contribute code, fantastic. [Please note that all code in Pippin is subject to the Black formatter](https://black.readthedocs.io/en/stable/). I would recommend installing this yourself because it's a great tool.

![Developer Documentation Below](../_static/images/developer.jpg)

## Coding style

Please, for the love of god, don't code this up in vim/emacs on a terminal connection[^1]. Use a proper IDE (I recommend PyCharm or VSCode), and **install the Black extension**! I have Black set up in PyCharm as a file watcher, so all python files are automatically formatted on save. Use a line width of 160 characters. Here is the Black file watcher config:

![Black config](../_static/images/black.jpg)

If everyone does this, then all files should remain consistent across different users.

[^1]: {{patrick}}: Since taking over as primary developer, I have done nothing but code this up in vim on a terminal connection. It's not the worst thing you could possibly do. There's a [Black Linter](https://github.com/dessn/Pippin/actions/workflows/black-formatter.yml) github action which will trigger on pull requests to main, allowing you to format your contributions before merging.

## Testing valid config in Pippin

To ensure we don't break things when pushing out new code, the tests directory contains a set of tests of progressively increasing pipeline complexity, designed to check that existing config files behave consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility and should be discussed before being incorporated into a release.

To run the tests, in the top level directory, simply run:

`pytest -v .`

## Adding a new task

Alright there, you want to add a new task to Pippin? Great. Here's what you've got to do:

1. Create an implementation of the `Task` class; you can keep it empty for now.
2. Figure out where it goes: at the top of `manager.py` you can see the current stages in Pippin. Once you've decided where your task belongs, import it and slot it in.
3. Back in your new class that extends Task, you'll notice you have a few methods to implement:
1. `_run()`: Kick the task off, and return True or False to indicate whether it kicked off successfully. To help with determining the hash and whether the task should run, there are a few handy functions: `_check_regenerate`, `get_hash_from_string`, `save_hash`, `get_hash_from_files`, `get_old_hash`. See, for example, the <project:./tasks/analyse.md> task for how I use these. (A minimal sketch of a complete task is given at the end of this list.)
2. `_check_completion(squeue)`: Check to see if the task (whether it's being rerun or not) is done. Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:
```sh
if [ $? -eq 0 ]; then
    echo SUCCESS > {done_file}
else
    echo FAILURE > {done_file}
fi
```
This allows me to easily see if a job failed or passed. On failure, I then generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message to your user.
To then show that error, or **ANY MESSAGE TO THE USER**, use the provided logger:
`self.logger.error("The task failed because of this reason")`.

This method should return `Task.FINISHED_FAILURE`, `Task.FINISHED_SUCCESS`, or the number of jobs still in the queue. You can figure that last one out from `squeue`, which I pass in containing all the jobs the user has active (and which can sometimes be None).
3. `get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config)`: From the given inputs, determine what tasks should be created, create them, and return them in a list. For context, here is the code I use to determine which simulation tasks to create:
```python
@staticmethod
def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
    tasks = []
    for sim_name in config.get("SIM", []):
        sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
        s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
        Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
        tasks.append(s)
    return tasks
```
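
Putting the pieces together, here is a minimal sketch of what a complete task might look like. This is illustrative only: the import path, the done file handling, and the attributes `self.config` and `self.output_dir` are assumptions, so check `task.py` and existing tasks for the real interfaces.

```python
# Hypothetical minimal task; the import path and attribute names are assumptions.
import os

from pippin.task import Task  # assumed import path


class MyNewTask(Task):
    def _run(self):
        # Hash the configuration so an unchanged task isn't rerun
        new_hash = self.get_hash_from_string(str(self.config))
        if self._check_regenerate(new_hash):
            self.save_hash(new_hash)
            # ... write and submit your job script here ...
        return True  # True means the task kicked off successfully

    def _check_completion(self, squeue):
        # Look for the done file your job script writes on exit
        done_file = os.path.join(self.output_dir, "done.txt")
        if os.path.exists(done_file):
            with open(done_file) as f:
                if "SUCCESS" in f.read():
                    return Task.FINISHED_SUCCESS
            self.logger.error("MyNewTask failed, check the logs in the output directory")
            return Task.FINISHED_FAILURE
        return self.num_jobs  # jobs believed to still be queued or running
```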
## Adding a new classifier
Alright, so what if we're not after a brand new task, but just adding another classifier? Well, it's easier to do, and I recommend looking at `nearest_neighbor_python.py` for something to copy from. You'll see we have the parent Classifier class, I write out the slurm script that would be used, and then I define the `train` and `predict` methods (which both invoke a general `classify` function in different ways; you can do this however you want). You'll also notice a very simple `_check_completion` method and a `get_requirements` method. The latter returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based only on SALT2 features, so I return `(False, True)`.
You can also define a `get_optional_requirements` method which, like `get_requirements`, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results *for this particular run*. By default, this method returns:
- `True, True` if `OPTIONAL_MASK` is set in `OPTS`
- `True, False` if `OPTIONAL_MASK_SIM` is set in `OPTS`
- `False, True` if `OPTIONAL_MASK_FIT` is set in `OPTS`
- `False, False` otherwise.

If you define your own method based on classifier-specific requirements, then these `OPTIONAL_MASK*` keys can still be set to choose which tasks are optionally included. If these are not set, then the normal `MASK`, `MASK_SIM`, and `MASK_FIT` are used instead. Note that if *no* masks are set then *every* sim or lcfit task will be included.
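
If you do write your own version, here is a hedged sketch of an override that reproduces the default behaviour described above. The exact signature is an assumption, so copy it from the real Classifier class:

```python
# Hypothetical override mirroring the documented defaults; this lives inside
# your Classifier subclass, and the signature is an assumption for illustration.
@staticmethod
def get_optional_requirements(config):
    opts = config.get("OPTS", {})
    if "OPTIONAL_MASK" in opts:
        return True, True   # needs photometry and light curve fits
    if "OPTIONAL_MASK_SIM" in opts:
        return True, False  # needs photometry only
    if "OPTIONAL_MASK_FIT" in opts:
        return False, True  # needs light curve fits only
    return False, False
```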
Finally, you'll need to add your classifier into the `ClassifierFactory` in `classifiers/factory.py`, so that I can link a class name in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I've had issues getting the behaviour consistent across systems and conda environments, so we're doing it the hard way.
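
To make the overall shape concrete, here is a hedged skeleton of a new classifier. The import path, method signatures, and `classify` internals are assumptions for illustration; mirror `nearest_neighbor_python.py` for the real structure.

```python
# Hypothetical skeleton only; copy the real constructor arguments and
# slurm script handling from nearest_neighbor_python.py.
from pippin.classifiers.classifier import Classifier  # assumed import path


class MyClassifier(Classifier):
    def classify(self, training):
        # ... write out the slurm script and submit it ...
        return True

    def train(self):
        return self.classify(True)

    def predict(self):
        return self.classify(False)

    def _check_completion(self, squeue):
        # ... check a done file, exactly as described for tasks ...
        return self.FINISHED_SUCCESS  # constant assumed to be inherited from Task

    @staticmethod
    def get_requirements(options):
        # (needs photometry, needs light curve fitting results)
        return False, True
```

Once the class exists, register it in the `ClassifierFactory` so the class name in your YAML configuration resolves to `MyClassifier`.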
29 changes: 29 additions & 0 deletions docs/src/install.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Installation

If you're using a pre-installed version of Pippin (like the one on Midway), ignore this.

If you're not, installing Pippin is simple.

1. Check out Pippin
2. Ensure you have the dependencies installed (`pip install -r requirements.txt`) and that your python version is 3.7+.
3. Celebrate

There is no need to attempt to install Pippin like a package (no `python setup.py install`), just run from the clone.

Now, Pippin also interfaces with other software, including:
- [SNANA](https://github.com/RickKessler/SNANA)
- [SuperNNova](https://github.com/supernnova/SuperNNova)
- [SNIRF](https://github.com/evevkovacs/ML-SN-Classifier)
- [DataSkimmer](https://github.com/supernnova/DES_SNN)
- [SCONE](https://github.com/helenqu/scone)

When it comes to installing SNANA, the best method is to already have it installed on a high performance server you have access to[^1]. However, installing the other software used by Pippin should be far simpler. Taking [SuperNNova](https://github.com/supernnova/SuperNNova) as an example:

1. In an appropriate directory `git clone https://github.com/SuperNNova/SuperNNova`
2. Create a GPU conda env for it: `conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt`
3. Activate environment and install natsort: `conda activate snn_gpu` and `conda install --yes natsort`

Then, in the Pippin global configuration file, [cfg.yml](https://github.com/dessn/Pippin/blob/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/cfg.yml) in the top level directory, ensure that the `SuperNNova: location` path points to where you just cloned SNN. You will need to install the other external software packages if you want to use them, but you do not need to install any package you do not explicitly request in a config file[^2].

[^1]: {{patrick}}: I am ***eventually*** going to attempt to create an SNANA docker image, but that's likely far down the line.
[^2]: {{patrick}}: If Pippin is complaining about a missing software package which you aren't using, please file an issue.
20 changes: 14 additions & 6 deletions docs/tasks.rst → docs/src/tasks.md
@@ -1,12 +1,20 @@
# Tasks

Pippin is essentially a wrapper around many different tasks. In this section, I'll try and explain how tasks are related to each other, and what each task is.

As a general note, most tasks have an ``OPTS`` section where most details go. This is partially historical, but essentially properties that Pippin uses to determine how to construct tasks (like ``MASK``, classification mode, etc) are top level, and the Task itself gets passed everything inside OPTS to use however it wants.

:::{toctree}
:maxdepth: 1

tasks/dataprep.md
tasks/sim.md
tasks/lcfit.md
tasks/classify.md
tasks/agg.md
tasks/merge.md
tasks/biascor.md
tasks/createcov.md
tasks/cosmofit.md
tasks/analyse.md
:::
20 changes: 20 additions & 0 deletions docs/src/tasks/agg.md
@@ -0,0 +1,20 @@
# 4. AGGREGATION

The aggregation task takes results from one or more classification tasks (that have been run in predict mode on the same dataset) and generates comparisons between the classifiers (their correlations, PR curves, ROC curves and their calibration plots). Additionally, it merges the results of the classifiers into a single csv file, mapping SNID to one column per classifier.

```yaml
AGGREGATION:
  SOMELABEL:
    MASK: mask  # Match sim AND classifier
    MASK_SIM: mask  # Match only sim
    MASK_CLAS: mask  # Match only classifier
    RECALIBRATION: SIMNAME  # Optional, use this simulation to recalibrate probabilities. Default no recalibration.
    # Optional: changes the probability column name of each classification task listed into the given probability column name.
    # Note that this will crash if the same classification task is given multiple probability column names.
    # Mostly used when you have multiple photometrically classified samples.
    MERGE_CLASSIFIERS:
      PROB_COLUMN_NAME: [CLASS_TASK_1, CLASS_TASK_2, ...]
    OPTS:
      PLOT: True  # Default True, make plots
      PLOT_ALL: False  # Default False. I.e. if RANSEED_CHANGE gives you 100 sims, make 100 sets of plots.
```
20 changes: 20 additions & 0 deletions docs/src/tasks/analyse.md
@@ -0,0 +1,20 @@
# 9. ANALYSE

The final step in the Pippin pipeline is the Analyse task. It creates a final output directory, moves relevant files into it, and generates extra plots. It will save out compressed CosmoMC chains and the plotting scripts (so you can download the entire directory and customise it without worrying about pointing to external files), it will copy in Hubble diagrams, and, if you've told it to, it will make histogram comparison plots between data and sim. Oh, and also redshift evolution plots. The scripts which copy/compress/rename external files into the analyse directory are generally named `parse_*.py`. So `parse_cosmomc.py` is the script which finds, reads and compresses the MCMC chains from CosmoMC into the output directory. Then `plot_cosmomc.py` reads those compressed files to make the plots.

Whether cosmology contours are blinded is determined by the BLIND flag set on the data. For data, this defaults to True.

Note that all the plotting scripts work the same way: `Analyse` generates a small yaml file called `input.yml` pointing to all the resources, and each script uses that same file to make different plots. It is thus super easy to add your own plotting scripts, and you can specify arbitrary code to execute using the `ADDITIONAL_SCRIPTS` keyword in opts. Just make sure your code takes `input.yml` as an argument. As an example, to rerun the CosmoMC plots, you'd simply have to run `python plot_cosmomc.py input.yml`.

```yaml
ANALYSE:
  SOMELABEL:
    MASK_COSMOFIT: mask  # partial match
    MASK_BIASCOR: mask  # partial match
    MASK_LCFIT: [D_DESSIM, D_DATADES]  # Optional. Creates histograms and efficiency plots based off the input LCFIT_SIMNAME matches.
    OPTS:
      COVOPTS: [ALL, NOSYS]  # Optional. Covopts to match when making contours. Single or list. Exact match.
      SHIFT: False  # Default False. Shift all the contours on top of each other.
      PRIOR: 0.01  # Default None. Optional normal prior around Om=0.3 to apply for sims if wanted.
      ADDITIONAL_SCRIPTS: /somepath/to/your/script.py  # Should take the input.yml as an argument
```
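
For reference, here is a hedged sketch of what a custom `ADDITIONAL_SCRIPTS` plotting script might look like. The structure of `input.yml` is an assumption here, so print it out to see what `Analyse` actually provides:

```python
# Hypothetical ADDITIONAL_SCRIPTS example. The keys inside input.yml are an
# assumption; inspect a real input.yml for the actual structure.
import sys

import yaml


def main(input_file):
    with open(input_file) as f:
        config = yaml.safe_load(f)
    # See what resources Analyse has handed us
    print(list(config.keys()))
    # ... load chains / Hubble diagrams from the listed paths and plot ...


if __name__ == "__main__":
    main(sys.argv[1])  # Pippin invokes this as: python your_script.py input.yml
```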