Statistical Genetics

Advances in genotyping give statistical genetics both the opportunity and the challenge of dissecting phenotypes. Through well-planned experimental designs and cutting-edge statistical methods, we are narrowing down target regions to better understand biological diversity at the gene level. Two examples of these techniques are the nested association mapping (NAM) population in maize and association analysis with a mixed model that simultaneously corrects for population structure and unequal relatedness among sampled individuals.

The SAS code for joint linkage and association analysis on NAM uses PROC GLMSELECT to run a stepwise regression that builds a QTL model, and then scans the genome to calculate a LOD score for each marker.
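
The genome scan step can be illustrated with a minimal Python sketch. It is not the SAS code referred to above: it assumes a single-marker regression with no cofactors from the stepwise model, numerically coded genotypes, and the common regression form LOD = (n/2) * log10(RSS_null / RSS_marker). The function names `lod_score` and `scan_markers` are hypothetical.

```python
import math


def lod_score(phenotype, genotype):
    """LOD score for one marker from a simple linear regression.

    Assumption: LOD = (n / 2) * log10(RSS_null / RSS_marker), where the
    null model contains only an intercept.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    mean_x = sum(genotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, phenotype))
    rss_null = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or rss_null == 0:
        return 0.0  # monomorphic marker or constant phenotype: no evidence
    rss_marker = rss_null - sxy ** 2 / sxx
    if rss_marker <= 0:
        return math.inf  # marker explains the phenotype perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan_markers(phenotype, markers):
    """Compute a LOD score for every marker in a {name: genotypes} mapping."""
    return {name: lod_score(phenotype, geno) for name, geno in markers.items()}


# Toy data (made up for illustration): four individuals, three markers.
lods = scan_markers(
    [1.0, 2.0, 3.0, 4.0],
    {"m1": [0, 0, 1, 1], "m2": [0, 1, 0, 1], "m3": [1, 1, 1, 1]},
)
```

In a joint analysis, the phenotype passed to the scan would typically be adjusted for the QTL already selected by the stepwise model; that adjustment is left out here.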

The SAS code for the mixed linear model for association analysis can be found here.

P3D and Compression
Two methods, Population Parameters Previously Determined (P3D) and compression, reduce the computing time of a mixed linear model (MLM) while maintaining or even increasing statistical power for genome-wide association studies (GWAS). P3D eliminates the need to re-estimate population parameters (such as variance components) when testing each genetic marker, without compromising statistical power. The population parameters are estimated once in a reduced mixed linear model (without markers) and are then held fixed while the markers are screened in the second step. Even for GWAS with millions of markers, the population parameters need to be estimated only once.
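
The second step of P3D can be sketched in Python as follows. This is an illustration under stated assumptions, not the TASSEL, SAS or GAPIT implementation: the variance components `sigma_g2` and `sigma_e2` are taken as already estimated from the reduced model (that estimation is not shown), the covariance is assumed to be V = sigma_g2 * K + sigma_e2 * I, and each marker is tested with a generalized least squares Wald statistic. All function and parameter names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def p3d_scan(phenotype, markers, kinship, sigma_g2, sigma_e2):
    """Test every marker with the variance components held fixed.

    Returns {marker name: (effect estimate, Wald statistic)}, or None for a
    marker that is collinear with the intercept.
    """
    n = len(phenotype)
    # Built and inverted once, then reused for all markers (the P3D idea).
    v = [[sigma_g2 * kinship[i][j] + (sigma_e2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    v_inv = invert(v)
    ones = [1.0] * n
    vi_y = matvec(v_inv, phenotype)
    a11 = dot(ones, matvec(v_inv, ones))
    b1 = dot(ones, vi_y)
    results = {}
    for name, x in markers.items():
        vi_x = matvec(v_inv, x)
        a12, a22, b2 = dot(ones, vi_x), dot(x, vi_x), dot(x, vi_y)
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            results[name] = None
            continue
        beta = (a11 * b2 - a12 * b1) / det   # GLS estimate of the marker effect
        var_beta = a11 / det                 # its variance under the fixed V
        results[name] = (beta, beta * beta / var_beta)
    return results


# Toy data (made up): unrelated individuals, so the kinship matrix is the identity.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
res = p3d_scan([1.0, 2.0, 3.0, 4.0], {"m1": [0, 0, 1, 1]}, identity, 1.0, 1.0)
```

The per-marker loop contains only matrix-vector products; the expensive inversion of V sits outside it, which is where the saving comes from in this sketch.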

Compression reduces computing time. The time needed to solve an MLM is proportional to the cube of the number of individuals. The compression method clusters the individuals into a smaller number of groups based on the kinship among them, and replacing individuals with their groups dramatically reduces the computing time. With a suitable number of groups and a suitable grouping method, the compressed MLM can also increase the statistical power for GWAS.
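
A minimal Python sketch of the grouping idea is given below. The clustering rule (average-linkage agglomerative clustering on kinship) and the group kinship (the mean kinship between members, with self-kinship included in within-group means) are assumptions chosen for illustration; the text above does not fix either choice, and the function name `compress` is hypothetical.

```python
def compress(kinship, n_groups):
    """Cluster individuals into n_groups and build a group-level kinship matrix."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[i] for i in range(len(kinship))]

    def mean_kinship(a, b):
        # Average kinship over all pairs drawn from groups a and b.
        return sum(kinship[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(groups) > n_groups:
        # Find the two groups whose members are most closely related on average.
        best = None
        for p in range(len(groups)):
            for q in range(p + 1, len(groups)):
                s = mean_kinship(groups[p], groups[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        _, p, q = best
        groups[p] = groups[p] + groups[q]
        del groups[q]

    group_kinship = [[mean_kinship(a, b) for b in groups] for a in groups]
    return groups, group_kinship


# Toy kinship matrix (made up): individuals 0-1 and 2-3 form two related pairs.
K = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
groups, group_kinship = compress(K, 2)
```

Because the cost grows with the cube of the matrix dimension, replacing 4 individuals with 2 groups in this toy case cuts the cubic term by a factor of about (4/2)^3 = 8; the same ratio applies at realistic sample sizes.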

These two methods were implemented in TASSEL, SAS and R. The SAS code and demonstration data are described in the supplementary documentation of the Nature Genetics paper. The R package, named GAPIT (Genomic Association and Prediction Integrated Tool), also implements genomic prediction through the compressed MLM.

 