Unified Mixed Model

Unified Mixed-Model Method for Association Mapping

Association mapping with complex pedigrees, families, founding effects and population structure

 

For association mapping, a given sample may contain population structure (associated with local adaptation or diversifying selection), familial relatedness (from recent coancestry), or both. Because population structure and familial relatedness can produce spurious associations, they have constrained the use of association studies in human and plant genetics. We have recently developed a unified mixed-model method that simultaneously accounts for multiple levels of both gross-level population structure (Q) and finer-scale relative kinship (K). Because this method crosses the boundary between family-based and mixed association samples, it provides a powerful complement to current methods for association mapping. Its superiority over other methods in controlling both Type I and Type II error rates has been demonstrated in 1) human quantitative gene expression dissection and 2) quantitative trait dissection in 277 diverse inbred maize lines with complex familial relationships and population structure.

Q matrix

Q is an n × p population structure incidence matrix, where n is the number of individuals assayed and p is the number of populations defined. Q is inferred from the estimates produced by Pritchard’s STRUCTURE (Pritchard et al., 2000) with p populations (p corresponds to the parameter K in STRUCTURE, which is not the kinship matrix K described below).
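As a small illustration of the shape of Q, the sketch below checks that a candidate matrix has n rows and p columns and that each row is a valid set of membership proportions. It assumes that each row of Q holds proportions that sum to one; the function name `validate_q` and the numbers are hypothetical and are not STRUCTURE output.

```python
def validate_q(q, tol=1e-6):
    """Return (n, p) for a well-formed membership matrix, else raise ValueError."""
    if not q or not q[0]:
        raise ValueError("Q must have at least one row and one column")
    n, p = len(q), len(q[0])
    for row in q:
        if len(row) != p:
            raise ValueError("every individual needs exactly p proportions")
        if any(x < 0.0 or x > 1.0 for x in row):
            raise ValueError("proportions must lie between 0 and 1")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError("proportions for an individual must sum to 1")
    return n, p


# Hypothetical example: three individuals, two populations.
q_matrix = [
    [0.90, 0.10],
    [0.25, 0.75],
    [0.50, 0.50],
]
print(validate_q(q_matrix))
```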

K matrix

The coancestry (or kinship) coefficient is the probability that two homologous genes are identical by descent. Coancestry coefficients can be estimated at the population level (Θ or Fst) or between two individuals (Θij, for genes randomly sampled, one from individual i and the other from individual j). Marker-based relative kinship estimates have been developed (Loiselle et al., 1995; Lynch and Ritland, 1999; Ritland, 1996; Rousset, 2002) and can be defined as

Fij = (Qij − Qm) / (1 − Qm) ≈ Θij,

where Qij is the probability of identity by state for random genes from i and j, and Qm is the average probability of identity by state for genes coming from random individuals in the population from which i and j were drawn.
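The definition can be illustrated with a toy calculation. The sketch below is a simplification under stated assumptions: it treats each individual as a fully inbred line with one allele per locus, takes Qij as the fraction of loci at which two lines match, and estimates Qm as the mean of Qij over all sampled pairs. It is not the Loiselle estimator as implemented in SPAGeDi, and the function names and genotypes are hypothetical.

```python
from itertools import combinations


def ibs_probability(line_i, line_j):
    """Fraction of loci at which two inbred lines carry the same allele (Qij)."""
    if len(line_i) != len(line_j) or not line_i:
        raise ValueError("genotypes must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(line_i, line_j))
    return matches / len(line_i)


def relative_kinship(genotypes):
    """Return {(i, j): Fij} with Fij = (Qij - Qm) / (1 - Qm)."""
    q = {
        (i, j): ibs_probability(genotypes[i], genotypes[j])
        for i, j in combinations(sorted(genotypes), 2)
    }
    # Assumption: Qm is estimated as the mean Qij over all sampled pairs.
    q_m = sum(q.values()) / len(q)
    if q_m >= 1.0:
        raise ValueError("all pairs are identical; Fij is undefined")
    return {pair: (q_ij - q_m) / (1.0 - q_m) for pair, q_ij in q.items()}


# Hypothetical SNP alleles at four loci for three inbred lines.
lines = {"A": "AAGT", "B": "AAGC", "C": "TTCC"}
f = relative_kinship(lines)
for pair, value in sorted(f.items()):
    print(pair, round(value, 3))
```

In this toy data set some pairs receive negative values, because they share fewer alleles than the average pair.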

SPAGeDi software (Hardy and Vekemans, 2002) was used to estimate the Loiselle (Loiselle et al., 1995) kinship coefficient using our SNP data set.

Negative Fij values are common. Negative values between individuals are set to zero, because a negative value indicates that the two individuals are less related than random individuals. Not setting these values to zero would increase the Type I and Type II error rates in the association test.

The steps involved in carrying out this approach are as follows:

Step 1. Create a Q matrix

Obtain population structure matrix by running STRUCTURE.  Format the output from STRUCTURE to a text file readable by TASSEL.  

Step 2. Create a K matrix

Obtain relative kinship matrix by running SPAGeDi.  Set negative values to zero.  Format the output from SPAGeDi to a text file readable by TASSEL.  
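The truncation in this step can be sketched in a few lines. The helper name `truncate_negative_kinship` and the matrix values, including the diagonal, are hypothetical; reading SPAGeDi output and writing a TASSEL-readable file are left out because their exact layouts should be taken from the documentation of those programs.

```python
def truncate_negative_kinship(k):
    """Return a copy of the kinship matrix with negative entries set to zero."""
    return [[max(0.0, value) for value in row] for row in k]


# Hypothetical 3 x 3 relative kinship matrix with two negative estimates.
kinship = [
    [1.0, 0.625, -0.5],
    [0.625, 1.0, -0.125],
    [-0.5, -0.125, 1.0],
]
for row in truncate_negative_kinship(kinship):
    print(row)
```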

Step 3. Generate Candidate SNPs

Import the candidate gene sequence data to TASSEL directly, or format your candidate marker data to a text file readable by TASSEL.

Step 4. Trait

Format trait data to a text file readable by TASSEL.  

Step 5. Mixed model

Run the mixed model in TASSEL.  Details about how to use TASSEL in general and for the mixed model can be found in the TASSEL Documentation.
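For orientation, the model fitted in this step has the general form below. This is a summary sketch written in the notation we understand Yu et al. (2006) to use; the paper remains the authoritative statement of the model and its variance assumptions.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
             + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad
  \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
  \qquad
  \operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
\]
% y: phenotypic observations
% beta: fixed effects other than the SNP and population structure
% alpha: fixed SNP effects
% v: fixed population effects (Q relates y to v)
% u: random polygenic background effects, with covariance given by K
% e: residual effects
% X, S, Z: incidence matrices relating y to beta, alpha and u
```

Here Q and K are the population structure and relative kinship matrices prepared in Steps 1 and 2.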

We have also implemented this approach in SAS.  Here is the SAS code.

Key references

Yu, J.*, G. Pressoir*, W.H. Briggs, I. Vroh Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics.

* Jianming Yu and Gael Pressoir contributed equally to this work.

TASSEL software

Mixed model

Henderson, C.R. 1984. Application of linear models in animal breeding. Univ. of Guelph, Ontario.

Kennedy, B.W., M. Quinton, and J.A. van Arendonk. 1992. Estimation of effects of single genes on quantitative traits. J Anim Sci 70:2000-12.

Q (Population structure)

Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-59.

STRUCTURE software, http://pritch.bsd.uchicago.edu/structure.html

K (relative kinship)

Loiselle, B.A., V.L. Sork, J. Nason, and C. Graham. 1995. Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am. J. Bot. 82:1420-1425.

Ritland, K. 1996. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet. Res. 67:175-186.

SPAGeDi software, http://www.ulb.ac.be/sciences/ecoevol/spagedi.html

Human gene expression

Morley, M., C.M. Molony, T.M. Weber, J.L. Devlin, K.G. Ewens, R.S. Spielman, and V.G. Cheung. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430:743-7.

SNP Consortium Linkage Map Project database

Maize association mapping

Flint-Garcia, S.A., A. Thuillet, J. Yu, G. Pressoir, S.M. Romero, S.E. Mitchell, J.F. Doebley, S. Kresovich, M.M. Goodman, and E.S. Buckler. 2005. Maize association population: A high resolution platform for QTL dissection. Plant J. 44:1054-1064.

Thornsberry, J.M., M.M. Goodman, J. Doebley, S. Kresovich, D. Nielsen, and E.S. Buckler. 2001. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet 28:286-9.

Maize Molecular and Functional Diversity Project, http://www.panzea.org/

 