This code is used for the study of a Higgs boson decaying to a pair of tau leptons. The repository includes an analyzer for each final state being studied: electron-tau (et) and muon-tau (mt).
## Table of Contents

- Organization
- File Locations
- Processed Files
- Quick Start
- Objects
- Helpers
- Plugins
- Compiling Plugins
- Scripts
## Organization

- Directories
  - `bin`: contains binaries primarily used for initial processing.
  - `include`: where the majority of the library is contained. Header files for each object are stored here, along with headers for corrections and other necessary components.
  - `data`: all input ROOT files for corrections are contained here, including a file of legacy scale factors for each year, pileup corrections, and Higgs pT reweighting files for MadGraph samples.
  - `plugins`: contains all C++ plugins using the library. This is currently limited to analyzers, but other plugins may be added.
  - `scripts`: scripts used to process the outputs from the analyzers, including plotters, datacard builders, and fake weighters.
  - `Output`: all outputs are stored here, including plots, fake fractions, TTrees, and datacards for Higgs Combine. These are all stored in subdirectories within the `Output` directory.
- Other Files
  - `build`: a simple bash script for compiling a plugin with the correct (ROOT) libraries. The script takes two ordered arguments: the plugin to compile and the name of the output binary. The binary should be copied to your `$HOME/bin` directory.
  - `automate_analysis.py`: used for analyzing an entire directory; explained in more detail later.
## File Locations

Here are the locations of all currently used files on the Wisconsin cluster. Directory names should be self-explanatory.

- `/hdfs/store/user/tmitchel/legacy-v5`

## Processed Files

Location of all files already processed by one of the analyzers. They are ready to be fed to the NN or used directly for making datacards/plots.

- `/hdfs/store/user/tmitchel/legacy-v5/analyzed`
## Quick Start

This section is intended to help someone quickly get started producing plots and datacards. Most details about how to use the repository are omitted, so read the rest of the instructions for more detailed information.
- Set up a new CMSSW release (must be 10_4_X or greater for the python scripts)

  ```
  cmsrel CMSSW_10_4_0 && cd CMSSW_10_4_0/src && cmsenv
  ```
- Clone and build all necessary repositories, including this one.
  - Clone this repo

    ```
    git clone -b stabalize-workflow git@github.com:tmitchel/LTau_Analyzers.git
    ```

  - Compile all repos (none are currently needed)

    ```
    cd ${CMSSW_BASE}/src
    scram b -j 8
    ```

  - Set up a python virtual environment

    ```
    cd ${CMSSW_BASE}/src/LTau_Analyzers
    virtualenv .pyenv
    source .pyenv/bin/activate  # this must be done every time you log in
    source setup/setup-python.sh
    ```
- Compile the appropriate plugin. For example, to compile the 2018 analyzer for the muon+tau channel, use the following command

  ```
  ./build plugins/mt_analyzer2018.cc bin/analyze2018_mt
  ```

  This will produce a binary named `analyze2018_mt` in the `bin` directory that can be used to analyze 2018 mutau events.
- Use the python automation script to process all files in a directory and produce output trees with selection branches and weight branches

  ```
  python automate_analysis.py -e bin/analyze2018_mt -p /hdfs/store/user/tmitchel/legacy-v5/mela/mt2018_v1p3 --output-dir mt2018_v5p3 --parallel --syst
  ```

  This command requires some explanation because it has many options. The purpose of this script is to run a provided binary on all `*.root` files in a given directory. The binary is supplied with the `-e` option and the input directory with the `-p` option. Because the analyzers use the name of the input ROOT file to look up the correct cross-section, any prefixes added to the filenames (looking at you, SVFit and MELA) need to be stripped before the sample name is provided to the binary. All processed files are stored in `Output/trees/`; the `--output-dir` option can be used to create a new directory in `Output/trees/` and store the processed files there. The `--parallel` option enables multiprocessing: the maximum number of simultaneous processes is min(10, ncores / 2), and each input file receives its own process. The final option, `--syst`, is provided when you want to produce additional output files containing systematic shifts. Within the output directory, nominal files are stored in the `NOMINAL` subdirectory, a subdirectory prefixed with `SYST_` is created for each systematic, and logs are saved in the `logs` subdirectory.
- `hadd` the appropriate files together

  ```
  python scripts/hadder.py -p Output/trees/mt2018_v5p3
  ```

  This will hadd all files in the directory `Output/trees/mt2018_v5p3` into the appropriate files, descending into each subdirectory (excluding `logs`) to hadd files for NOMINAL and all systematics. The hadded files are stored in a new `merged` subdirectory within each subdirectory.
- Fill the fake fractions to be used in plotting and datacard making and store them in the `fake_factors` directory.

  ```
  python scripts/fill_fake_fractions.py -i Output/trees/mt2018_v5p3/NOMINAL/merged -y 2018 -t mt_tree -s v5p3 -c
  ```

  The `-s` option is used to add a descriptor to the output file containing the fake fractions. When the `-c` flag is provided, the newly calculated fractions are used to create a new ROOT file named `jetFakes.root` within the input directory. This file contains all necessary data and MC events used for calculating the jetFakes background, with a new branch containing the fake weight used to appropriately scale events.
- (Optional) Create histograms for plotting.
  - Plots and binning are defined in the JSON file `scripts/plotting.json`. Multiple plotting scenarios can be defined and then chosen at runtime via a command-line option. The `variables` key defines the variable name along with the binning; binning is assumed to be [nbins, lowest, highest]. The `zvar` key is used to specify a variable and binning for the VBF sub-categorization; the binning for `zvar` is assumed to be [lowest, edge1, edge2, ...].
  - Produce the plots

    ```
    python scripts/produce_histograms.py -e -y 2018 -t mt_tree -i Output/trees/mt2018_v5p3/NOMINAL/merged -d v5p3 -c baseline
    ```

    This will produce an output ROOT file in the `Output/histograms` directory. The file contains a TDirectory for each category; inside each category is a TDirectory for each variable, containing histograms for each sample. `-d` is used to add a descriptive string to the name of the output file, `-c` specifies the plotting scenario to be read from `plotting.json`, and the `-e` flag tells the script to use the embedded background instead of ZTT MC.
- (Optional) Create stack plots from the previous histograms.

  ```
  python scripts/autoplot.py --input Output/histograms/htt_mt_emb_noSys_fa3_2018_v5p3.root -c mt -p v5p3 -y 2018
  ```

  This will create stacked histograms for each variable listed at the beginning of the script.
- Create 2D datacards for Combine

  ```
  python scripts/produce_datacards.py -e -y 2018 -t mt_tree -i Output/trees/mt2018_v5p3 --suffix v5p3 -c baseline
  ```

  This script will produce all the 2D datacards needed for Higgs Combine. VBF category variables and binning are configured in the `scripts/binning.json` file, where multiple scenarios may be defined; the previous command uses the `baseline` scenario. Use `python scripts/produce_datacards.py -h` to see other options, including running with systematics.
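The automation step above runs a provided binary over every `*.root` file in a directory with a capped process pool. A minimal Python sketch of that pattern is shown below; the prefix spellings and the flags passed to the binary are illustrative assumptions, not the real interface of `automate_analysis.py`.

```python
import glob
import multiprocessing
import os
import subprocess


def strip_prefix(filename, prefixes=("SVFit_", "MELA_")):
    """Strip assumed filename prefixes (spellings are hypothetical) so the
    sample name matches a cross-section lookup key."""
    name = os.path.basename(filename)
    for prefix in prefixes:
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name


def n_workers():
    """Cap the pool at min(10, ncores / 2), as described above."""
    return max(1, min(10, multiprocessing.cpu_count() // 2))


def run_one(binary, path, output_dir):
    """Run the analyzer binary on one file (flag names are hypothetical)."""
    sample = strip_prefix(path)
    return subprocess.call([binary, "-n", sample, "-p", path, "-d", output_dir])


def analyze_directory(binary, input_dir, output_dir):
    """Process every *.root file in input_dir, one process per file."""
    files = glob.glob(os.path.join(input_dir, "*.root"))
    pool = multiprocessing.Pool(processes=n_workers())
    results = [pool.apply_async(run_one, (binary, f, output_dir)) for f in files]
    pool.close()
    pool.join()
    return [r.get() for r in results]
```

The real script also sorts outputs into `NOMINAL`, `SYST_*`, and `logs` subdirectories; that bookkeeping is omitted here for brevity.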
## Objects

All input variables are accessed through factories, which group information into physics-motivated classes. Some factories provide access to variables directly (e.g. `event_info` provides direct access to the event's run number), while others provide access to friend classes that add further physics context to the variables (e.g. `electron_factory` provides access to electron objects containing the information about a single electron). This reduces the mental burden of remembering the meaning of each variable name in your input files. The factories are all contained within the `include` directory.
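The factory idea can be illustrated with a short sketch. The real factories are C++ headers in `include/`; the class, method, and branch names below are simplified stand-ins, not the library's actual API.

```python
class Electron(object):
    """Friend class: bundles the variables describing a single electron."""

    def __init__(self, pt, eta, phi):
        self.pt, self.eta, self.phi = pt, eta, phi


class ElectronFactory(object):
    """Builds electron objects from flat branches, so analysis code never
    needs to remember raw branch names (branch names here are made up)."""

    def __init__(self, entry):
        self._entry = entry

    def electron(self):
        e = self._entry
        return Electron(e["ePt"], e["eEta"], e["ePhi"])


class EventInfoFactory(object):
    """Exposes event-level variables directly."""

    def __init__(self, entry):
        self._entry = entry

    def run(self):
        return self._entry["run"]


# Usage with a toy entry (a dict standing in for one TTree entry).
entry = {"ePt": 42.0, "eEta": 1.1, "ePhi": 0.3, "run": 316000}
electron = ElectronFactory(entry).electron()
run_number = EventInfoFactory(entry).run()
```

The payoff is that downstream code asks for `electron.pt` instead of a bare branch name, keeping the physics meaning attached to the value.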
## Helpers

The `include` directory also contains headers providing many useful functions:

- `ACWeighter.h`: provides methods for accessing AC reweighting coefficients for JHU samples. These can then be stored in output TTrees.
- `CLParser.h`: provides the basic command-line parsing capabilities used by the plugins.
- `LumiReweightingStandAlone.h`: provides helper functions for reading pileup corrections.
- `slim_tree.h`: contains the output TTree and defines how it is filled.
- `swiss_army_class.h`: contains useful information with no other home, including luminosities, cross-sections, embedded tracking scale factors, and more.
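As one example, the kind of bookkeeping `swiss_army_class.h` centralizes can be pictured with the sketch below. The numbers are approximate Run 2 values used only for illustration; the real constants, and their C++ interface, live in the header.

```python
# Approximate values for illustration; the real constants (plus more, such
# as embedded tracking scale factors) live in swiss_army_class.h.
luminosity = {"2016": 35900.0, "2017": 41500.0, "2018": 59700.0}  # 1/pb
cross_sections = {"DYJets": 6225.42, "TTTo2L2Nu": 88.29}  # pb


def event_weight(sample, year, n_generated):
    """Scale an MC sample to data: xs * lumi / N_generated."""
    return cross_sections[sample] * luminosity[year] / n_generated
```

Keeping these constants in one place means an analyzer only needs the sample name and year to normalize its events.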
## Plugins

The `plugins` directory contains the code for analyzing input skims. These plugins are all compiled using the `build` script discussed in the "Compiling Plugins" section. The currently maintained plugins are listed below:

- `et_analyzer2016.cc`: used to analyze the 2016 etau channel and produce slimmed trees.
- `et_analyzer2017.cc`: used to analyze the 2017 etau channel and produce slimmed trees.
- `et_analyzer2018.cc`: used to analyze the 2018 etau channel and produce slimmed trees.
- `mt_analyzer2016.cc`: used to analyze the 2016 mutau channel and produce slimmed trees.
- `mt_analyzer2017.cc`: used to analyze the 2017 mutau channel and produce slimmed trees.
- `mt_analyzer2018.cc`: used to analyze the 2018 mutau channel and produce slimmed trees.
## Compiling Plugins

The `build` script is provided to make compilation easier and less verbose. The script takes two ordered parameters and outputs a compiled binary: the first parameter is the name of the analyzer to compile; the second is the desired name of the output binary. An example is shown below:

```
./build plugins/et_analyzer.cc bin/Analyze_et
```

This example compiles the electron-tau channel analyzer into an executable named `Analyze_et`. All analyzers are compiled with `-O3` optimization and linked against ROOT and RooFit.
## Scripts

Most scripts cannot be run with the tools provided by the basic CMSSW environment, so it is recommended to create a virtual environment in which to install the necessary packages:

```
virtualenv .pyenv
```

This creates a python2 virtual environment in the directory `.pyenv`. To activate the environment, use

```
source .pyenv/bin/activate  # deactivate to return to the normal environment
```

With the environment activated, install all necessary packages (`/cvmfs` does weird things, so it's hard to provide a `requirements.txt`):

```
source setup/setup-python.sh
```