Set up for Regression model

Goals & Design

We want to check how sensitive our conclusions are to two choices: the way the distribution of fitness effects (DFE) is estimated, which determines the response variables (the proportions of strongly versus mildly deleterious mutations), and possibly the assumptions underlying the regression. In particular, we want to rerun the same analysis with DFEs estimated under different parametrizations, to check the robustness of our conclusions on the covariation of DFE summaries with Ne and generation time:

  • So-called $\Gamma$ or $\Gamma$ + exponential parametrizations.

  • Parametrizations that simply assign a probability to a mutation falling within each of several ranges (bins) of $N_e s$ values (e.g. $[-\infty, -10]$, $[-1, 0]$, etc.).
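To make the link between the two kinds of parametrization concrete, here is a minimal sketch that converts a $\Gamma$ DFE into probability masses per bin of $N_e s$. It assumes purely deleterious mutations with $-N_e s$ following a Gamma distribution; the function name `dfe_bin_masses`, the parameter values, and the bin boundaries are illustrative choices, not the ones used in the reports.

```r
# Sketch: probability mass of a Gamma DFE in bins of S = N_e * s.
# Assumption: deleterious effects only, with -S ~ Gamma(shape, mean = mean_abs_S).
dfe_bin_masses <- function(shape, mean_abs_S,
                           breaks = c(0, 1, 10, Inf),
                           labels = c("S in (-1, 0]", "S in (-10, -1]", "S < -10")) {
  stopifnot(shape > 0, mean_abs_S > 0, !is.unsorted(breaks),
            length(labels) == length(breaks) - 1)
  scale <- mean_abs_S / shape          # Gamma mean = shape * scale
  cdf <- pgamma(breaks, shape = shape, scale = scale)
  masses <- diff(cdf)                  # mass between consecutive breaks on |S|
  names(masses) <- labels
  masses
}

# Illustrative parameter values only.
masses <- dfe_bin_masses(shape = 0.3, mean_abs_S = 500)
print(round(masses, 3))
```

The last bin here plays the role of the "strongly deleterious" proportion, and the first bins that of the "mildly deleterious" one; where exactly to draw those boundaries is a modelling decision.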

We use a single template R Markdown file, Report.Rmd, that generates the reports, and a rendering R script that controls the parameters, so that as few decisions as possible are hard-coded without overly complicating the code:

  • Report.Rmd contains the template for the stats and visuals. It expects a tree file for the underlying phylogeny and a so-called regression (csv) file that stores the DFE parameters along with the Ne and generation time estimates for each species.

Note that the lists of species names in the tree file and the DFE files need not match perfectly (the lists are intersected). The same goes for generation time estimates (these might not always be available, in which case the species are dropped from the analysis).
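The matching logic described above can be sketched in base R as follows. The species names, the numbers, and the column names (`species`, `Ne`, `generation_time`) are made up for illustration and may differ from those in the actual regression file.

```r
# Toy inputs standing in for the tree tip labels and the regression csv
# (hypothetical values and column names).
tree_tips <- c("Homo_sapiens", "Mus_musculus", "Bos_taurus", "Canis_lupus")
reg <- data.frame(
  species = c("Homo_sapiens", "Mus_musculus", "Canis_lupus", "Felis_catus"),
  Ne = c(1e4, 5e5, 5e4, 8e4),
  generation_time = c(25, 0.5, NA, 4),
  stringsAsFactors = FALSE
)

# Drop species without a generation time estimate, then keep shared species only.
reg_complete <- reg[!is.na(reg$generation_time), ]
shared <- intersect(tree_tips, reg_complete$species)
reg_used <- reg_complete[match(shared, reg_complete$species), ]

# Tips absent from the filtered table; these could be pruned from the tree,
# for example with ape::drop.tip().
dropped_tips <- setdiff(tree_tips, shared)
```

Keeping `reg_used` in the same order as `shared` makes it easier to line the data up with the tree tips later on.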

The paths to the two input files are stored as parameters in the yaml header of Report.Rmd:

---
params:
  csv_file: "scratch/reg_vars.csv"  # default can be changed with render; 
  mytree_file: "scratch/science.abn7829_data_s4.nex.tree" # default;
  • render_reports_regression_models.R contains the list of csv files (one file per fastDFE inference) and will generate standardized reports, one for each DFE summary input.
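A rendering loop of this kind could look roughly like the sketch below. It is not the content of render_reports_regression_models.R: the helper `build_render_jobs`, the two csv file names, the report naming scheme, and the HTML output are assumptions made for illustration. Only `rmarkdown::render()` and its `params` argument are taken as given.

```r
# Sketch: one render job per regression csv (csv names are hypothetical).
build_render_jobs <- function(csv_files, tree_file,
                              out_dir = "reports_renderized") {
  lapply(csv_files, function(csv) {
    stem <- tools::file_path_sans_ext(basename(csv))
    list(
      input = "Report.Rmd",
      output_file = paste0("Report_", stem, ".html"),  # assumed naming scheme
      output_dir = out_dir,
      params = list(csv_file = csv, mytree_file = tree_file)
    )
  })
}

jobs <- build_render_jobs(
  csv_files = c("scratch/reg_vars_gamma.csv", "scratch/reg_vars_bins.csv"),
  tree_file = "scratch/science.abn7829_data_s4.nex.tree"
)

# Render only when the template, the package, and the csv input are available.
if (file.exists("Report.Rmd") && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (job in jobs) {
    if (file.exists(job$params$csv_file)) do.call(rmarkdown::render, job)
  }
}
```

Values passed through `params` override the defaults declared in the yaml header, which is what allows a single template to serve every DFE summary input.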

How to run the analysis.

We assume:

  • Input files (csv_file and mytree_file) are in the scratch/ subfolder.

  • All reports will be placed in the reports_renderized/ subfolder.

In the R terminal type:
source("render_reports_regression_models.R")
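Before sourcing the script, it can help to confirm that the assumptions above hold. The following is an optional pre-flight sketch. The helper `check_inputs` is hypothetical, the paths mirror the defaults in the yaml header, and the rendering script may already handle some of this itself.

```r
# Report which expected inputs are missing (paths mirror the yaml defaults).
check_inputs <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    message("Missing input file(s): ", paste(missing, collapse = ", "))
  }
  invisible(missing)
}

missing_inputs <- check_inputs(c(
  "scratch/reg_vars.csv",
  "scratch/science.abn7829_data_s4.nex.tree"
))

# Create the output folder if needed (no-op when it already exists).
dir.create("reports_renderized", showWarnings = FALSE)
```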

History

Started on Feb 14, 2026 by TB