Analysis based on columnflow, law and order.
A couple of test tasks are listed below; they might require a valid VOMS proxy for accessing input data.
```shell
# clone the project
git clone --recursive [email protected]:uhh-cms/hh2bbww.git
cd hh2bbww

# source the setup and store decisions in .setups/dev.sh (arbitrary name)
source setup.sh dev

# index existing tasks once to enable auto-completion for "law run"
law index --verbose

# run your first task
# (they are all shipped with columnflow and thus have the "cf." prefix)
law run cf.ReduceEvents \
    --version v1 \
    --dataset st_tchannel_t_powheg \
    --branch 0

# create some plots
law run cf.PlotVariables1D \
    --version v1 \
    --datasets st_tchannel_t_powheg \
    --producers features \
    --variables jet1_pt,jet2_pt \
    --categories 1e

# create a (test) datacard (CMS-style)
law run cf.CreateDatacards \
    --version v1 \
    --inference-model default \
    --workers 3
```
(Note: please tell me if a link or task does not work; I did not test all of them.)
Files relevant for the configuration of the analysis are mainly found in this folder. The main analysis object is defined here. The analysis object contains multiple configs (at least one config per campaign). Most of the configuration takes place here and defines metadata such as the datasets and processes to use, shifts, categories, variables, and much more.
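To give a feel for how such metadata is registered on a config object, here is a minimal sketch using the order package; the variable name, expression, binning and titles are placeholders for illustration and not taken from the actual config:

```python
import order as od


def add_example_variable(config: od.Config) -> None:
    # register a variable that plotting tasks such as cf.PlotVariables1D can pick up;
    # all names and numbers below are illustrative placeholders
    config.add_variable(
        name="jet1_pt",
        expression="Jet.pt[:,0]",
        binning=(40, 0.0, 400.0),
        unit="GeV",
        x_title=r"Leading jet $p_{T}$",
    )
```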
At the moment, there are only two configs: the default config `config_2017` and a config with reduced event statistics (for test purposes) named `config_2017_limited`. Most tasks can take a `--config` parameter as input, e.g.

```shell
law run cf.SelectEvents --version v1 --config config_2017
```
Modules that are used to define event selections are usually named selectors and can be found in this folder. The main selector is called `default` and can be found here. You can call the `SelectEvents` task using this selector, e.g. via

```shell
law run cf.SelectEvents --version v1 --selector default
```
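To give a rough idea of what such a module looks like, here is a minimal, hypothetical selector sketch based on columnflow's `selector` decorator; the used columns and the jet cut are assumptions for illustration and not the actual `default` selector:

```python
import awkward as ak

from columnflow.selection import Selector, SelectionResult, selector


@selector(
    uses={"Jet.pt", "Jet.eta"},
    exposed=True,  # allows calling it directly via --selector on the command line
)
def example_jet_selection(self: Selector, events: ak.Array, **kwargs) -> tuple[ak.Array, SelectionResult]:
    # hypothetical cut: keep jets with pt > 25 GeV and require at least two of them per event
    jet_mask = events.Jet.pt > 25.0
    event_mask = ak.sum(jet_mask, axis=1) >= 2

    return events, SelectionResult(
        event=event_mask,
        objects={"Jet": {"Jet": jet_mask}},
    )
```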
Modules that are used to produce new columns are usually named producers and can be found in this folder. A producer can be used as part of another calibrator/selector/producer. You can also call a producer on its own as part of the `ProduceColumns` task, e.g. via

```shell
law run cf.ProduceColumns --version v1 --producer example
```
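A minimal, hypothetical producer sketch; the produced column `ht` and the used fields are assumptions for illustration, not one of the producers in this repository:

```python
import awkward as ak

from columnflow.production import Producer, producer
from columnflow.columnar_util import set_ak_column


@producer(
    uses={"Jet.pt"},
    produces={"ht"},
)
def ht(self: Producer, events: ak.Array, **kwargs) -> ak.Array:
    # hypothetical new column: scalar sum of jet transverse momenta per event
    events = set_ak_column(events, "ht", ak.sum(events.Jet.pt, axis=1))
    return events
```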
Modules for machine learning are located in this folder. Our base ML Model is defined here and its parameters are defined as class parameters. From the base class, new ML Models can be built via class inheritance. Our default ML Model is the DenseClassifier, which uses the base model and some mixins for additional functionality and overwrites their class parameters. From each ML Model, new models can be derived with different parameters using the `derive` method:

```python
DenseClassifier.derive("dense_default", cls_dict={"folds": 5})
```

All ML Models can be used as part of law tasks, e.g. with

```shell
law run cf.MLTraining --version v1 --ml-model dense_default --branches 0
```
NOTE: the `DenseClassifier` already defines which config, selector and producers are required, so you don't need to add them on the command line.
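As a sketch of the inheritance pattern mentioned above (the import path and the `epochs` parameter are assumptions for illustration; `folds` mirrors the `derive` example):

```python
from hbw.ml.dense_classifier import DenseClassifier  # assumed import path


class MyDenseModel(DenseClassifier):
    # overwrite class parameters of the parent model;
    # "folds" mirrors the derive() example above, "epochs" is an assumed parameter name
    folds = 5
    epochs = 200
```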
Modules to prepare datacards are located in this folder. At the moment, there is only the 'default' inference model (here). Datacards are produced by calling

```shell
law run cf.CreateDatacards --version v1 --inference-model default
```

Similar to the ML Models, we can also derive additional inference models using the `derive` method.

NOTE: our inference model already defines which ml_model is required, and based on that, producer requirements are resolved automatically by default. It is therefore better not to add them to the task parameters, since the dependencies are already a bit complicated.
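For orientation, a minimal, hypothetical inference model sketch based on columnflow's `inference_model` decorator; the category, process and parameter names are placeholders and not the content of the actual `default` model:

```python
from columnflow.inference import inference_model, ParameterType


@inference_model
def example_model(self):
    # one datacard category, bound to a config category and variable (placeholder names)
    self.add_category(
        "cat_1e",
        config_category="1e",
        config_variable="jet1_pt",
        mc_stats=True,
    )

    # one process entering the datacard (placeholder names)
    self.add_process(
        "ST",
        config_process="st_tchannel_t",
    )

    # a simple log-normal rate uncertainty (placeholder value)
    self.add_parameter(
        "lumi",
        type=ParameterType.rate_gauss,
        effect=1.02,
    )
```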
Analysis-specific tasks are defined in this folder.
The analysis uses many functionalities of columnflow. We rely on:

- tasks defined in columnflow
- columnflow modules for calibration, selection and production
- convenience functions defined in columnflow (util, columnar_util, config_util)
The law config is located here and takes care of the information available to law when calling tasks. In here, we can for example:

- set some defaults for parameters (e.g. when not setting the `--dataset` parameter in a task that needs this parameter, we use the `default_dataset` parameter instead)
- define in which file system to store outputs for each task
- define which tasks should be loaded for the analysis
- Source hosted at GitHub
- Report issues, questions, feature requests on GitHub Issues