Genomic selection Aaron Lorenz Department of Agronomy and Horticulture
Role of markers in crop improvement Bernardo, 2008
Genomic selection: DNA marker data and phenotypic data from the training population (calibration set) feed model training; the trained model is then used to predict and select among the selection candidates. No QTL mapping. No testing for significant markers.
A genome-wide approach typically provides better predictions. Lorenzana and Bernardo (2009); Lorenz (2013). [Figures: accuracy of genomic selection (GS) compared with MAS]
Whittaker et al. (2000): When doing MAS, one cannot include all the markers, so a subset of markers must be selected to fit. No entirely satisfactory way of doing this exists. Objective is to evaluate ridge regression. –Superior to subset selection when the objective is to make predictions.
Whittaker et al. (2000) Find subset of markers Q. Interested in Cannot include all markers in Q –Increases variance of β –If number of markers really large, not enough d.f.
Whittaker et al. (2000): Ridge regression – include all variables, but replace the usual least-squares estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ with $\hat{\beta} = (X^T X + \lambda I)^{-1} X^T y$. Usual estimates are shrunk toward 0. –Degree of shrinkage determined by $\lambda$. Choose $\lambda$ to minimize model error. Addition of the $\lambda I$ term reduces collinearity and keeps the matrix being inverted, $X^T X + \lambda I$, from being singular even when $X^T X$ is.
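A minimal sketch of the ridge estimator described on this slide, written with NumPy. The marker matrix, phenotypes and the two values of lambda are simulated and purely hypothetical; the function name `ridge_estimates` is an illustration, not part of any package mentioned in this presentation. The sketch omits an intercept for brevity.

```python
import numpy as np

def ridge_estimates(X, y, lam):
    """Ridge estimator: solve (X'X + lam * I) beta = X'y for beta."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p = 20, 50                                   # more markers than individuals
X = rng.choice([-1.0, 0.0, 1.0], size=(n, p))   # hypothetical marker codes
y = rng.normal(size=n)                          # hypothetical phenotypes

# X'X is p x p but its rank cannot exceed n < p, so it is singular
# and ordinary least squares has no unique solution.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I makes the system solvable; a larger lam shrinks the
# marker effects more strongly toward 0.
beta_small_lam = ridge_estimates(X, y, lam=1.0)
beta_large_lam = ridge_estimates(X, y, lam=100.0)
```

In practice lambda would be chosen from the data (for example by cross-validation or from variance component estimates) rather than fixed as it is here.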
Objective: “Compare statistical methods for their accuracy in predicting total breeding value of individuals in a situation where a limited number of recorded individuals are genotyped for many markers.” - Computer simulation individuals - Need to estimate 50,000 haplotype effects. Meuwissen, Hayes and Goddard (2001)
r(GEBV:True BV) Genomic selection models
1. Shrinkage models: RR-BLUP, G-BLUP
2. Dimension reduction methods: partial least squares, principal component regression
3. Variable selection models: BayesB, BayesCπ, BayesDπ
4. Kernel and machine learning methods: support vector machine regression
[Table: training population with one row per line and columns Line, Yield, Mrk 1, Mrk 2, …, Mrk p] LARGE p !! smaller n !!
Baseline model --More predictors than observations (p > n). --Solution: fit predictors as random effects. --Constrain possible effects. --What distribution is β being sampled from?
Priors and penalizations (examples)
Double exponential distribution Normal distribution Represent two different assumptions about the underlying distribution of QTL effects
G-BLUP: Similar to traditional BLUP with pedigrees. Calculate the genomic relationship matrix. Use genomic relationships in a mixed linear model to predict breeding values of relatives.
Relationships between the training population (TP) and selection candidates are leveraged for prediction. [Figure labels: Training Pop., Selection candidates]
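The two G-BLUP steps listed here can be sketched as follows. This is an illustration under stated assumptions, not the method used in the presentation: the slide does not say how the genomic relationship matrix is computed, so the sketch assumes one widely used formulation (often called VanRaden's first method) with markers coded 0/1/2, and it treats the variance ratio `lam` as known. All data are simulated, and the function names are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(M):
    """M: (individuals x markers) allele counts coded 0/1/2.

    Assumed formulation: G = ZZ' / (2 * sum(p_j * (1 - p_j))),
    where Z is M with each marker column centered on 2 * p_j.
    """
    freq = M.mean(axis=0) / 2.0        # allele frequency per marker
    Z = M - 2.0 * freq                 # center each marker column
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def gblup_predict(G, y_train, train_idx, cand_idx, lam):
    """Predict breeding values of candidates from training phenotypes.

    lam = sigma2_e / sigma2_g is assumed known here; in practice it
    is estimated, e.g. by REML in a mixed-model package.
    """
    G_tt = G[np.ix_(train_idx, train_idx)]   # relationships within the TP
    G_ct = G[np.ix_(cand_idx, train_idx)]    # candidates versus the TP
    resid = y_train - y_train.mean()         # the mean is the only fixed effect
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), resid)
    return G_ct @ alpha

rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(30, 200)).astype(float)   # simulated genotypes
G = genomic_relationship_matrix(M)
train_idx = np.arange(20)                # first 20 lines are phenotyped
cand_idx = np.arange(20, 30)             # last 10 lines are candidates
y_train = rng.normal(size=20)            # simulated phenotypes
gebv = gblup_predict(G, y_train, train_idx, cand_idx, lam=1.0)
```

Only the block of G linking candidates to the training population enters the prediction, which is why the relatedness between the two sets matters so much for accuracy.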
Equivalency between RR-BLUP and G-BLUP Only valid with the normal prior! From MVN distribution properties:
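The equivalence can be checked numerically. The sketch below rests on the matrix identity X_c (X'X + λI)^{-1} X'y = X_c X' (XX' + λI)^{-1} y, which holds for any λ > 0. It assumes the simplest setting: no fixed effects, the same λ in both models, and an unscaled kernel XX' (any scaling of the relationship matrix is absorbed into λ). The data are simulated and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_cand, p = 15, 5, 40
X_t = rng.choice([-1.0, 0.0, 1.0], size=(n_train, p))  # training markers
X_c = rng.choice([-1.0, 0.0, 1.0], size=(n_cand, p))   # candidate markers
y = rng.normal(size=n_train)                           # training phenotypes
lam = 2.0                                              # assumed variance ratio

# RR-BLUP route: estimate p marker effects, then sum them per candidate.
beta_hat = np.linalg.solve(X_t.T @ X_t + lam * np.eye(p), X_t.T @ y)
gebv_rrblup = X_c @ beta_hat

# G-BLUP route: work with n x n marker-based relationships instead.
K_tt = X_t @ X_t.T          # training-by-training kernel
K_ct = X_c @ X_t.T          # candidate-by-training kernel
gebv_gblup = K_ct @ np.linalg.solve(K_tt + lam * np.eye(n_train), y)

max_abs_diff = np.max(np.abs(gebv_rrblup - gebv_gblup))
```

The two routes agree to numerical precision, but the first solves a p x p system and the second an n x n system, so the G-BLUP form is cheaper when p is much larger than n.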
Predicting prediction accuracy. Prediction accuracy: Daetwyler et al. (2008); Lian et al. (2014). $N$ = training population size; $h^2$ = trait heritability; $M_e$ = effective number of loci; $r^2$ = LD between marker and QTL (see Lian ref)
Factors affecting prediction accuracy
Training population size
Trait heritability
–Influence of G x E, precision of measurements
Marker density
Effective population size of breeding population
–i.e., genetic diversity of breeding population
Genetic relationship between training population and selection candidates
Statistical model
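The formulas on this slide did not survive in the transcript. As a hedged sketch, the function below uses the commonly cited Daetwyler-style expectation, accuracy = sqrt(N h² / (N h² + M_e)). How the LD term r² enters is an assumption here: it is taken to scale N h², which should be checked against Lian et al. (2014). The function name and the example values are hypothetical.

```python
import math

def expected_accuracy(n_train, h2, m_e, r2=1.0):
    """Expected accuracy of genomic predictions.

    n_train: training population size (N)
    h2:      trait heritability
    m_e:     effective number of loci
    r2:      marker-QTL LD; ASSUMED to scale N * h2 (r2 = 1 gives the
             Daetwyler-style form without an LD term)
    """
    signal = r2 * n_train * h2
    return math.sqrt(signal / (signal + m_e))

# Hypothetical scenarios: a larger training population or a higher
# heritability raises the expected accuracy; lower LD reduces it.
acc_small_tp = expected_accuracy(n_train=100, h2=0.5, m_e=50)
acc_large_tp = expected_accuracy(n_train=300, h2=0.5, m_e=50)
acc_low_ld = expected_accuracy(n_train=100, h2=0.5, m_e=50, r2=0.5)
```

Under this form, accuracy depends on N and h² only through their product, so halving heritability costs about as much as halving the training population.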
Effect of relationships: Predicting across populations. [Figure: principal component plot of polymorphic markers; labels: BuschAg, University of MN, NDSU 6-row; Subpop 1, Subpop 2; training sets, validation sets]
Effect of relationships: Presence of relatives in TP. Clark et al. (2012). [Figure: prediction accuracy versus mean relationship of top ten relatives]
Models typically similar in accuracy. [Figure: accuracy of RR-BLUP, BayesCπ and Bayesian LASSO] Models also equivalent in: Bernardo and Yu (2007) [Maize]; Lorenzana and Bernardo (2009) [Several plant species]; VanRaden et al. (2009) [Holstein]; Hayes (2009) [Holstein]
Why? Extensive LD in plant and animal breeding programs –Perfect situation for G-BLUP –Long stretches of genome that are identical by descent mean that relationships calculated with markers are good indicators of relationships at causal polymorphisms. –Extensive LD also means it’s hard for variable selection models to zero in on markers in tight LD with causal polymorphisms. Expect variable selection models to be superior when –Individuals are unrelated –Very large TP (millions?) –Very high marker density, so that markers are in LD with causal polymorphisms
Resources and packages
rrBLUP package
–cran.r-project.org/web/packages/rrBLUP/rrBLUP.pdf
–Endelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4.
–Endelman, J.B., and J-L. Jannink. Shrinkage estimation of the realized relationship matrix. G3 2:1045.
BLR (Bayesian Linear Regression) package
–http://bglr.r-forge.r-project.org/
–Perez et al. Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome 3.
References
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48.
Clark, S.A., J.M. Hickey, H.D. Daetwyler and J.H.J. van der Werf. 2012. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet. Sel. Evol. 44.
Daetwyler, H.D., B. Villanueva and J.A. Woolliams. 2008. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One 3.
de los Campos, G., J.M. Hickey, R. Pong-Wong, H.D. Daetwyler and M.P.L. Calus. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193.
Lian, L., A. Jacobson, S. Zhong and R. Bernardo. 2014. Genomewide prediction accuracy within 969 maize biparental populations. Crop Sci.
Lorenz, A.J. 2013. Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: A simulation experiment. G3-Genes Genomes Genetics 3.
Lorenzana, R.E. and R. Bernardo. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120.
Meuwissen, T.H., B.J. Hayes and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157.
Whittaker, J.C., R. Thompson and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. 75.