The ratio between nonsynonymous and synonymous substitution rates,
- ... under positive (or diversifying) selection (or adaptive evolution), if $\omega>1$;
- ... under neutral evolution, if $\omega=1$;
- ... under negative (or purifying) selection, if $\omega<1$.
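
The three cases above can be sketched as a small Python helper. This is only an illustration, not part of the protocol: the function name `classify_selection` and the tolerance `tol` used to decide when $\omega$ counts as equal to 1 are assumptions made for this example. A point estimate of $\omega$ is not a statistical test by itself, so the helper only restates the definitions.

```python
def classify_selection(omega: float, tol: float = 1e-9) -> str:
    """Return the selection regime implied by an omega (dN/dS) value.

    `tol` is a hypothetical tolerance for treating omega as equal to 1;
    it is an assumption of this sketch, not a CODEML setting.
    """
    if omega < 0:
        raise ValueError("omega is a ratio of rates and cannot be negative")
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    if omega > 1.0:
        return "positive (diversifying) selection"
    return "negative (purifying) selection"


if __name__ == "__main__":
    # Print the regime for three illustrative omega values.
    for value in (0.2, 1.0, 2.5):
        print(f"omega = {value}: {classify_selection(value)}")
```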
As every functional protein has some structural constraints, the
In this repository, you will find the data, the code, and step-by-step guidelines for reproducing the results in the CODEML protocol (Álvarez-Carretero et al. 2023). We performed all positive selection analyses with CODEML, which is part of the PAML v4.10.6 package (Yang 2007).
We use the alignment and tree files for the myxovirus gene sequences from ten mammal species and two birds (outgroup) analysed by Huo et al. (2007).
In directory 00_data, we explain how we downloaded and parsed these sequences before generating the alignment and the gene tree. We then carried out tests for positive selection under the following models:
- Homogeneous model: all alignment sites and taxa have evolved under the same evolutionary pressure. This model, also known as the M0 model, assumes that $\omega$ is constant across all sites and lineages.
- Site models assume that different (amino acid or codon) sites are under different selective pressures and have different $\omega$ values. Positive selection is detected when a subset of sites in the protein-coding gene have $\omega>1$.
- Branch models assume that $\omega$ varies among branches of the phylogeny, and positive selection is detected along specific lineages if $\omega$ for those branches is $>1$.
- Branch-site models assume that $\omega$ varies among branches of the phylogeny and across sites of the gene, and positive selection is detected if a subset of sites for specific branches of the phylogeny have $\omega>1$.
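
As a rough orientation, the four model classes above correspond to different values of the `model` and `NSsites` options in the CODEML control file. The Python sketch below prints the relevant lines for each class. The dictionary `MODEL_SETTINGS` and the function `control_file_lines` are hypothetical names introduced for this example, and the option values are the commonly documented ones (for instance, `NSsites = 0 1 2 7 8` is assumed here as a typical set of site models). The README.md file of each analysis remains the reference for the exact settings used in the protocol.

```python
# Hypothetical mapping from model class to CODEML control-file options.
# Values follow common PAML usage and are an assumption of this sketch;
# check the README.md of each analysis for the settings actually used.
MODEL_SETTINGS = {
    "homogeneous": {"model": "0", "NSsites": "0"},      # M0: one omega overall
    "site":        {"model": "0", "NSsites": "0 1 2 7 8"},  # omega varies among sites
    "branch":      {"model": "2", "NSsites": "0"},      # omega varies among branches
    "branch-site": {"model": "2", "NSsites": "2"},      # omega varies among both
}


def control_file_lines(model_class: str) -> list[str]:
    """Return the `option = value` lines for the given model class."""
    settings = MODEL_SETTINGS[model_class]
    return [f"{option} = {value}" for option, value in settings.items()]


if __name__ == "__main__":
    # Print the control-file fragment for every model class.
    for name in MODEL_SETTINGS:
        print(f"# {name} model")
        print("\n".join(control_file_lines(name)))
```

Branch and branch-site analyses additionally need the foreground branches to be labelled in the tree file, which this fragment does not cover.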
In directory 01_protocol_analyses, you can find one directory for each of the analyses mentioned above, with the corresponding README.md file. All code snippets and explanations needed to run CODEML under each scenario according to our protocol (Álvarez-Carretero et al. 2023) are provided. In directory 02_extra_analyses, you can find one directory for each of the new analyses that can be carried out with CODEML when (i) there are several genes that need to be analysed at once and (ii) some of these genes may have missing taxa.
We hope that the protocol will be useful for illustrating the control-file settings and interpretations of the program outputs, enabling you to apply similar analyses to your own data.
If you use any of the code we provide in this GitHub repository or consult the protocol for your own analyses, please cite: