
Phenotype Simulator

brakitsch edited this page Dec 5, 2014 · 11 revisions


Our software package also includes a command-line simulator that generates phenotypes with a wide range of genetic architectures. In brief, the simulator assumes a linear-additive model with four components: the contribution of a randomly selected (causal) genetic region for the set component, polygenic background effects from all remaining genome-wide variants, a contribution from unmeasured factors, and iid observation noise. For a detailed description of the simulation procedure, we refer to the Supplementary Methods.
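
Schematically, the additive model described above can be written as a sum of four terms. The symbols are chosen here for illustration only and are not taken from the Supplementary Methods, which remain the authoritative description; Y is assumed to be the samples-by-traits phenotype matrix.

```latex
\mathbf{Y} \;=\;
\underbrace{\mathbf{Y}_{\mathrm{region}}}_{\text{causal region}}
\;+\;
\underbrace{\mathbf{Y}_{\mathrm{bg}}}_{\text{polygenic background}}
\;+\;
\underbrace{\mathbf{Y}_{\mathrm{hidden}}}_{\text{unmeasured factors}}
\;+\;
\underbrace{\mathbf{Y}_{\mathrm{noise}}}_{\text{iid noise}}
```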

The simulator requires as input the genotypes and the relatedness component:

./mtSet_simPheno --bfile bfile --cfile cfile --pfile pfile

where

  • bfile is the name of the binary bed file (bfile.bed, bfile.bim and bfile.fam are required).
  • cfile is the name of the covariance matrix file (cfile.cov and cfile.cov.id are required). If no cfile is specified, the covariance matrix is expected in the current folder under the same filename as the bed file.
  • pfile is the name of the output file (pfile.phe, pfile.region). The file pfile.phe contains the phenotypic values (one sample per row, one trait per column). The file pfile.region contains the randomly selected causal region (chromosome, start position, end position). If pfile is not specified, the files are saved in the current folder under an automatically generated filename that contains the bed filename and the values of all simulation parameters.
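
As a minimal sketch of the invocation above, the following shell snippet checks that the required input files are present before calling the simulator. The prefixes `data/geno` and `out/sim` are hypothetical placeholders, and the helper `check_inputs` is written here for illustration; it is not part of the package.

```shell
#!/bin/sh
# Hypothetical file prefixes; replace them with your own.
BFILE="data/geno"
CFILE="data/geno"
PFILE="out/sim"

# check_inputs BFILE CFILE
# Succeeds only if all input files listed above exist.
# Illustrative helper, not part of the package.
check_inputs() {
    for f in "$1.bed" "$1.bim" "$1.fam" "$2.cov" "$2.cov.id"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    return 0
}

# Run the simulator only when every input file was found.
if check_inputs "$BFILE" "$CFILE"; then
    ./mtSet_simPheno --bfile "$BFILE" --cfile "$CFILE" --pfile "$PFILE"
fi
```

If the call succeeds, the results would be expected in `out/sim.phe` and `out/sim.region`, following the naming described above.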

By changing the following parameters, different genetic architectures can be simulated and, in particular, the simulation experiments of our paper can be reproduced.

Option        Default  Datatype  Explanation
--seed        0        int       seed for random number generator
--nTraits     4        int       number of simulated phenotypes
--windowSize  1.5e4    int       size of causal region
--vTotR       0.05     float     variance explained by the causal region
--nCausalR    10       int       number of causal variants in the region
--pCommonR    0.8      float     fraction of shared causal variants
--vTotBg      0.4      float     variance explained by the polygenic background effects
--pHidden     0.6      float     fraction of residual variance explained by hidden confounders
--pCommon     0.8      float     fraction of background and residual signal that is shared across traits
--chrom       None     int       chromosome of the causal region
--minPos      None     int       min. chromosomal position of the causal region (in basepairs)
--maxPos      None     int       max. chromosomal position of the causal region (in basepairs)
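
To illustrate how the options combine, the sketch below prepares three replicates that differ only in the random seed and override a few defaults. The file prefixes and the chosen parameter values are hypothetical, and `sim_cmd` is an illustrative helper rather than part of the package. The commands are only printed; remove the `echo` inside `sim_cmd` to actually run them.

```shell
#!/bin/sh
# Hypothetical input prefixes; replace them with your own.
BFILE="data/geno"
CFILE="data/geno"

# sim_cmd SEED
# Prints the simulator call for one replicate (dry run).
# Illustrative helper, not part of the package.
sim_cmd() {
    echo ./mtSet_simPheno \
        --bfile "$BFILE" --cfile "$CFILE" \
        --pfile "out/sim_seed$1" \
        --seed "$1" --nTraits 2 --vTotR 0.1 --nCausalR 5
}

# One replicate per seed; all other options keep their defaults.
for seed in 0 1 2; do
    sim_cmd "$seed"
done
```

Writing each replicate to its own pfile prefix keeps the .phe and .region outputs of different seeds from overwriting each other.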
