This code trains neural networks to learn likelihood functions (LFs) in the Standard Model effective field theory (SMEFT) framework, calculated from simulated yields parameterized by Wilson coefficients (WCs) and from observed yields in the CMS detector.
The workflow consists of three steps: training, validation, and analysis. Training and validation are completed (although still being improved), and analysis is still in its early stages.
Training: During this step, a neural network (NN) is trained on a sampling of the LF from the LHC CMS experiment to approximate the LF with sufficient accuracy.
- Requirements
  - A `combine` sampling of the LF
- Products
  - A trained NN as a `.pt` file
Validation: During this step, the trained NN is tested against reference data (`combine` scans) to analyze its accuracy.
- Requirements
  - Low-dimensional `combine` scans of the LF
  - A trained NN
- Products
  - Comparison graphs between `combine` scans and NN scans
Analysis: During this step, we take the trained NN as the LF and explore the 16D parameter space, taking advantage of its speedup over `combine` samplings.
- Requirements
  - A trained NN
- Products
  - Scans over linear combinations of WCs
  - TBD
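The training step above amounts to regression on samples of the LF. Here is a minimal, self-contained sketch of that idea, with synthetic data standing in for the real `combine` sampling and an illustrative architecture (the repository's actual training code lives in `./archive/v1/training/likelihood.py`):

```python
# Sketch only: fit a small MLP to samples of a likelihood function over
# Wilson coefficients. The synthetic target below stands in for the real
# combine sampling; the architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_wc = 16                               # dimensionality of the WC space
X = torch.rand(4096, n_wc) * 4 - 2      # synthetic WC points in [-2, 2]^16
y = (X ** 2).sum(dim=1, keepdim=True)   # stand-in for -2*log(likelihood)

model = nn.Sequential(
    nn.Linear(n_wc, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):                # full-batch training, for brevity
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

torch.save(model.state_dict(), "model.pt")  # the trained NN as a .pt file
```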
This section will show you how to train and validate a NN in our framework. Warning: For non-NDCMS members, data files need to be obtained via alternative means.
For NDCMS members, this Google Doc is a good reference for setting up CRC and the CAML GPU cluster. Key steps:
- Get a CRC account
- Find your personal CRC directory, including your `scratch365` space.
- Be able to log into CAML's JupyterHub at https://camlnd.crc.nd.edu:9800/hub/home.
- Gain access to the data files on CurateND. The dataset is called "DNNLikelihood Data", and the link is https://curate.nd.edu/show/5m60qr49v54.
For general use:
- Make sure to run everything on CUDA.
- Have PyTorch installed, version >= 1.9.
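A quick way to check both requirements before launching a run (a small sketch; it only inspects the environment):

```python
# Environment check: confirm the PyTorch version and whether CUDA is
# available, and pick the device accordingly.
import torch

print(torch.__version__)
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(f"Running on {device}")

major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 9), "PyTorch >= 1.9 is required"
```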
Via batch system:
- Copy the following into your working directory:
  - `./archive/v1/training/likelihood.py`
  - `./archive/v1/training/likelihood.sh`
  - `./archive/v1/training/likelihood.submit`
  - `./archive/v1/modules/nn_module_v1.py`
  - `likelihood_data_processed.npz` from CurateND
- Check that the `import` statement has the right `nn_module` name.
- Run `condor_submit likelihood.submit`.
- After it finishes, the graphs and the trained model will be in their respective folders.
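For reference, a GPU batch job's HTCondor submit file generally follows this pattern. This is a generic sketch, not the actual contents of `likelihood.submit`; the field values are illustrative:

```
universe     = vanilla
executable   = likelihood.sh
output       = likelihood.out
error        = likelihood.err
log          = likelihood.log
request_gpus = 1
queue
```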
Via Jupyter Notebook:
- Move the contents of `./archive/v1/training/likelihood.py` into a notebook.
- Near the bottom, change how the graphs and model are saved. For example, `f'./graphs/{args.out_file}.pdf'` becomes `f'{args.out_file}.pdf'`.
- Copy the following into your working directory:
  - The Jupyter notebook
  - `./archive/v1/modules/nn_module_v1.py`
  - `likelihood_data_processed.npz` from CurateND
- Check that the `import` statement has the right `nn_module` name.
- Run the notebook.
- After it finishes, the graphs and the trained model will be in your working directory.
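To use the trained `.pt` file afterwards, it can be loaded back into PyTorch. This is a sketch: whether the file holds a full pickled model or only a `state_dict` depends on how it was saved, and the architecture below is illustrative, not the repository's actual one:

```python
# Sketch: load a trained model saved as a .pt file and evaluate it.
import torch
import torch.nn as nn

# Case 1: the whole model object was saved with torch.save(model, path):
# model = torch.load("xxxx_model+.pt", map_location="cpu")

# Case 2: only the weights were saved with torch.save(model.state_dict(), path);
# rebuild the same architecture first, then load the weights:
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
# model.load_state_dict(torch.load("xxxx_model+.pt", map_location="cpu"))

model.eval()                      # inference mode (no dropout/BN updates)
with torch.no_grad():
    nll = model(torch.zeros(1, 16))   # evaluate the LF at the SM point
```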
- Copy the following into the same directory as the `xxxx_model+.pt` file:
  - `./archive/v1/validation/Validation.ipynb`
  - All the `likelihood_xxx.npz` files
  - `nn_module_v1.py`
- Run `Validation.ipynb`.
- Graphs should be saved to the same directory.
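The comparison graphs overlay the NN's scan against the `combine` scan along one coordinate. A minimal sketch of such a plot, with synthetic placeholder scans (the real ones come from the NN and the `likelihood_xxx.npz` files):

```python
# Sketch of a 1D comparison plot between a reference scan and an NN scan.
# Both curves below are synthetic placeholders.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for batch jobs
import matplotlib.pyplot as plt

wc = np.linspace(-2, 2, 101)
combine_scan = wc ** 2                      # placeholder reference scan
nn_scan = wc ** 2 + 0.02 * np.sin(5 * wc)   # placeholder NN prediction

plt.plot(wc, combine_scan, label="combine")
plt.plot(wc, nn_scan, "--", label="NN")
plt.xlabel("WC value")
plt.ylabel(r"$-2\,\Delta\log L$")
plt.legend()
plt.savefig("comparison.pdf")               # saved to the same directory
```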
Above was the state of the project at the end of summer 2021. Since then, a lot has happened, but the basic structure remains the same. Go into the `training` and `validation` folders to learn more about how to execute newer code. `archive` is a self-contained folder with everything needed to reproduce the project as it stood at the end of summer 2021. Outside of `archive`, everything is in active development.
Here is a brief overview of each folder:
- `demos`: Minimal runnable code that captures essential ideas
- `models`: Files that contain trained NNs, possibly along with validation graphs
- `modules`: Python modules that need to be imported for every piece of code run in this repository
- `tools`: Handy scripts for a variety of tasks of tangential importance to the project
- `training`: Code for training NNs
- `validation`: Code for validating NNs
Additional Notes:
- Folders named `nb_code` contain the raw code of the Jupyter notebooks in the same directory. This is to keep track of meaningful changes in the notebooks, so please update the raw code every time a notebook is modified by saving the notebook as a `.py` file.
- Data for the validation and analysis graphs, including the 1D and 2D `combine` scans, are on CurateND.
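The `.npz` files can be inspected with NumPy. A self-contained sketch (the array names below are illustrative; check the actual keys of the CurateND files with `.files`):

```python
# Sketch: inspect the arrays inside an .npz data file. A small stand-in
# file is created first so the example runs on its own; the real file is
# likelihood_data_processed.npz from CurateND.
import numpy as np

np.savez("likelihood_data_processed.npz",
         wc_points=np.random.rand(10, 16),   # hypothetical array names
         delta_nll=np.random.rand(10))

data = np.load("likelihood_data_processed.npz")
print(data.files)               # list the stored array names
points = data["wc_points"]      # access one array by name
```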
- Make sure all validation code is compatible with the changes associated with `compare_plots`.
- Training
  - Early stopping
    - Try using loss instead of accuracy to select models. Not sure if this will be good, but it seems like the loss keeps decreasing while the accuracy bottoms out.
  - Sample the LF automatically, i.e. automatically oversample regions with high LF.
  - Possibly outdated: try using `np.triu_indices` to compute the quadratic WC terms.
  - Possibly not there yet: try DNN pruning.
- Validation
  - Use random minibatches for profiling. See the TODO in the `profile` function in `nn_module`.
- Analysis
  - Fit to a hyperellipse, i.e. find the flattest direction, then find the flattest direction in the space orthogonal to the first, and so on.
  - Talk to theorists to find more applications of linear-combination WC scans.
  - Explore the topology in higher-dimensional spaces. Start with Parker's notebook `/analysis/3d_mapper.ipynb`.
- General
  - Just for organization, save the raw data for graphs using multi-index dataframes. Currently everything is saved in dictionaries of dictionaries.
- Papers for onboarding
  - TOP-19-001: https://arxiv.org/abs/2012.04120
  - DNNLikelihood: https://arxiv.org/abs/1911.03305
  - dim6top: https://arxiv.org/abs/1802.07237
  - TOP-22-006: https://cds.cern.ch/record/2851651
- Papers to study for future directions
  - Global SMEFT fit: https://arxiv.org/abs/2105.00006
- See #TODOs scattered around the repository.
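The `np.triu_indices` idea from the training TODOs can be sketched as follows. Since the yields are quadratic in the WCs, the feature set includes every product c_i * c_j with i <= j; `np.triu_indices` builds all of those pairs in one vectorized step (a sketch; the shapes and feature ordering in the actual code may differ):

```python
# Sketch: vectorized computation of the quadratic WC terms with
# np.triu_indices. For 16 WCs there are 16*17/2 = 136 such terms.
import numpy as np

rng = np.random.default_rng(0)
wc = rng.random((100, 16))      # 100 sample points in the 16D WC space
i, j = np.triu_indices(16)      # all index pairs (i, j) with i <= j
quad = wc[:, i] * wc[:, j]      # every c_i * c_j term, shape (100, 136)
```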