We want to check how sensitive our conclusions are to both the type of distribution of fitness effects (DFE) estimation that provides the response variables (the proportions of strongly versus mildly deleterious mutations) and, possibly, the assumptions underlying the regression. In particular, we want to rerun the same analysis with DFEs estimated under different parametrizations, to check the robustness of our conclusions on how DFE summaries covary with Ne and generation time:
- The so-called $\Gamma$ or $\Gamma$ + exponential parametrizations.
- Parametrizations that simply assign a probability to a mutation falling in each of several ranges (bins) of $N_e s$ values (e.g. $[-\infty, -10]$, $[-1, 0]$, etc.).
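
To see how the two families of parametrization relate, the sketch below discretizes a gamma DFE into bins of $N_e s$. It is only an illustration in base R, not the fastDFE procedure: the shape and mean values and the bin edges are assumptions chosen for the example.

```r
# Hypothetical gamma DFE for deleterious mutations, defined on |S| = -N_e * s.
shape  <- 0.3    # assumed shape parameter (illustrative value)
mean_S <- 1000   # assumed mean of |S| (illustrative value)
scale  <- mean_S / shape

# Assumed bin edges on |S|; the bin (0, 1] corresponds to N_e s in [-1, 0], etc.
breaks <- c(0, 1, 10, 100, Inf)

# Probability mass of each bin = difference of the gamma CDF at its edges.
bin_probs <- diff(pgamma(breaks, shape = shape, scale = scale))
names(bin_probs) <- c("[-1,0]", "[-10,-1]", "[-100,-10]", "[-Inf,-100]")

print(round(bin_probs, 3))
```

The bin probabilities sum to one, so they can be read directly as proportions of mutations in each selection class.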
We use one template R Markdown analysis file, `Report.Rmd`, that generates the reports, and a rendering R script that controls the parameters, so that as few decisions as possible are hard-coded without overcomplicating the code:
`Report.Rmd` contains the template for the stats and visuals. It expects a tree file for the underlying phylogeny and a so-called regression (csv) file that stores the DFE parameters along with Ne and generation time estimates for each species.
Note that the lists of species names in the tree file and the DFE files need not match perfectly (the lists are intersected). The same goes for generation time estimates (these might not always be available, and species lacking one are dropped from the analysis).
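
The matching step can be sketched in base R as follows. This is a minimal illustration, not the code in `Report.Rmd`: the column names `species` and `gen_time` and all values are made-up toy assumptions.

```r
# Hypothetical tip labels from the tree file.
tree_tips <- c("Homo_sapiens", "Mus_musculus", "Bos_taurus", "Canis_lupus")

# Hypothetical regression table; gen_time values are toy numbers.
reg <- data.frame(
  species  = c("Homo_sapiens", "Mus_musculus", "Canis_lupus", "Sus_scrofa"),
  gen_time = c(29, 0.5, NA, 2),
  stringsAsFactors = FALSE
)

# Keep only species present in both the tree and the table.
shared   <- intersect(tree_tips, reg$species)
reg_kept <- reg[reg$species %in% shared, ]

# Drop species without a generation time estimate.
reg_kept <- reg_kept[!is.na(reg_kept$gen_time), ]

# Tips to prune so that tree and table describe the same species.
tips_to_drop <- setdiff(tree_tips, reg_kept$species)
```

If the tree is loaded with the ape package, `ape::drop.tip(tree, tips_to_drop)` is one way to prune it accordingly.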
The default paths to both input files are stored as parameters in the YAML header of `Report.Rmd`:

    ---
    params:
      csv_file: "scratch/reg_vars.csv" # default, can be changed at render time
      mytree_file: "scratch/science.abn7829_data_s4.nex.tree" # default
`render_reports_regression_models.R` contains the list of csv files (one file per fastDFE inference) and generates standardized reports, one for each DFE summary input.
We assume:
- Input files (`csv_file` and `mytree_file`) are in `/scratch`.
- All reports will be placed in the `/reports_renderized` subfolder.
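
A rendering loop along these lines could look like the sketch below. It is an assumption about how such a script might be written, not the actual contents of `render_reports_regression_models.R`: the output naming scheme, the HTML extension, and the relative folder paths are hypothetical. The loop skips rendering when the template or the input table is missing.

```r
# Hypothetical list of regression tables, one per fastDFE inference.
csv_files <- c("scratch/reg_vars.csv")
tree_file <- "scratch/science.abn7829_data_s4.nex.tree"
out_dir   <- "reports_renderized"  # assumed relative path to the reports folder

for (csv in csv_files) {
  # Name each report after its input table (hypothetical naming scheme).
  out_name <- paste0("Report_", tools::file_path_sans_ext(basename(csv)), ".html")

  if (file.exists("Report.Rmd") && file.exists(csv)) {
    rmarkdown::render(
      input       = "Report.Rmd",
      output_file = out_name,
      output_dir  = out_dir,
      # Override the defaults declared under `params:` in the YAML header.
      params      = list(csv_file = csv, mytree_file = tree_file),
      envir       = new.env()  # fresh environment so reports do not share state
    )
  }
}
```

Adding a new DFE inference then only requires appending its csv path to `csv_files`.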
In the R terminal, type:

    source("render_reports_regression_models.R")
Started on Feb 14, 2026 by TB