Aaron Lorenz Department of Agronomy and Horticulture

1 Genomic selection
Aaron Lorenz, Department of Agronomy and Horticulture

2 Role of markers in crop improvement
Varies by objective, germplasm, trait genetic architecture. Bernardo, 2008

3 Genomic selection
Training population (calibration set): DNA marker data and phenotypic data, combined in model training
Selection candidates: predict and select
No QTL mapping
No testing for significant markers
I’m not going to give much background on genomic prediction because I think it has become well enough known as a method. I’ll only introduce some terminology. The basic idea is that you genotype a large population of individuals, phenotype them well (this is called a training population or sometimes a calibration set), combine all marker and phenotypic data into a single statistical model (termed model training), and use the model to predict the genetic value of individuals that have been genotyped but not phenotyped. You get your predictions and you select on them just as you would on phenotypes. Note that there is no QTL mapping and no declaration of significant markers. We’re using all markers…

4 A genome-wide approach typically provides better predictions
[Figures: prediction accuracy (rA) of MAS vs. GS; Lorenzana and Bernardo (2009) and Lorenz (2013)]
One very nice thing about taking a genome-wide approach is that it typically works better than a QTL mapping/MAS approach. This study from Rex Bernardo’s lab looked at 36 population-trait combinations, and in nearly every case GS does substantially better than MAS. The figure on the right is from a simulation study of my own showing the advantage in prediction accuracy of GS compared to MAS.

5 Whittaker et al. (2000)
When doing MAS, one cannot include all the markers, so a subset of markers must be selected to fit. No entirely satisfactory way of doing this exists. The objective is to evaluate ridge regression, which is superior to subset selection when the objective is to make predictions.

6 Whittaker et al. (2000)
Find subset of markers Q. Interested in
Cannot include all markers in Q:
- Increases variance of β
- If number of markers really large, not enough d.f.

7 Whittaker et al. (2000)
Ridge regression: include all variables, but replace the normal least-squares estimators with the ridge estimators β̂ = (XᵀX + λI)⁻¹Xᵀy.
- Normal estimates are shrunk toward 0
- Degree of shrinkage is determined by lambda (λ)
- Choose λ to minimize model error
- Addition of the λI term reduces collinearity and prevents the matrix XᵀX from becoming singular.
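The effect of the λI term can be illustrated with a small simulation. This is a minimal sketch, assuming NumPy, simulated -1/1 marker scores, no intercept, and hypothetical names (`ridge_estimates`, `n_lines`, `n_markers`) and λ values that are not from the slides.

```python
import numpy as np

def ridge_estimates(X, y, lam):
    """Ridge estimates (X'X + lam*I)^-1 X'y for a marker matrix X (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n_lines, n_markers = 20, 100   # hypothetical sizes: more markers than lines
X = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))   # simulated marker scores
true_beta = rng.normal(0.0, 0.3, size=n_markers)         # simulated marker effects
y = X @ true_beta + rng.normal(0.0, 1.0, size=n_lines)   # simulated phenotypes

# With more markers than lines, X'X has rank at most n_lines, so it is
# singular and ordinary least squares has no unique solution.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# Adding lam*I makes the system solvable; a larger lam shrinks harder.
beta_small = ridge_estimates(X, y, lam=1.0)
beta_large = ridge_estimates(X, y, lam=100.0)

print("rank of X'X:", rank_xtx, "out of", n_markers)
print("norm of estimates, lam=1:  ", np.linalg.norm(beta_small))
print("norm of estimates, lam=100:", np.linalg.norm(beta_large))
```

The estimates with the larger λ have the smaller norm, which is the shrinkage toward 0 described on the slide.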

8 Whittaker et al. (2000)

9 MHG 2001
Objective: “Compare statistical methods for their accuracy in predicting total breeding value of individuals in a situation where a limited number of recorded individuals are genotyped for many markers.”
- Computer simulation individuals
- Need to estimate 50,000 haplotype effects
The whole story starts with a simulation study in 2001. The authors set out to see how accurately the BVs could be predicted assuming very dense marker data were available, much denser than was available at the time, but they were looking forward. The authors noted the arbitrariness of setting a marker effect to its full value or to zero simply because it surpassed some predetermined, and arbitrary, threshold.

10 MHG 2001
r(GEBV:True BV)

11 Genomic selection models
Shrinkage models: RR-BLUP, G-BLUP
Dimension reduction methods: partial least squares, principal component regression
Variable selection models: BayesB, BayesCπ, BayesDπ
Kernel and machine learning methods: support vector machine regression
Training population table: columns Line, Yield, Mrk 1, Mrk 2, …, Mrk p (LARGE p !!); rows Line 1 (76), Line 2 (56), Line 3 (45), Line 4 (67), …, Line n (22) (smaller n !!)
…and in the day of high-density markers, this means we probably have many more markers than observations, resulting in the well-known large p, small n problem. This means ordinary least squares cannot be used for estimation, but a variety of other more sophisticated models can be used. The most popular is RR-BLUP, where marker effects are treated as random effects sampled from a common distribution. That’s all I’ll say about that.

12 Baseline model
-- More predictors than observations.
-- Solution: fit predictors as random effects.
-- Constrain possible effects.
-- What distribution is β being sampled from?

13 Priors and penalizations (examples)

14 Double exponential distribution, Normal distribution
Represent two different assumptions about the underlying distribution of QTL effects
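The contrast between the two assumptions can be made concrete by comparing the densities at a few effect sizes. This is a minimal sketch, assuming both distributions are centered at zero and scaled to the same variance; the function names and the effect sizes evaluated are illustrative choices, not taken from the slides.

```python
import math

def normal_pdf(x, sd=1.0):
    """Density of a normal distribution with mean 0 and standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def double_exponential_pdf(x, sd=1.0):
    """Density of a double exponential (Laplace) distribution with mean 0.

    The scale b is chosen so that the variance (2 * b**2) equals sd**2.
    """
    b = sd / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)

# Compare the two densities at a zero, a moderate, and a large effect.
for effect in (0.0, 1.0, 4.0):
    print(effect, normal_pdf(effect), double_exponential_pdf(effect))
```

With equal variances, the double exponential puts more density both at zero and far out in the tails, while the normal puts more density on moderate effects. In other words, it corresponds to assuming many near-zero effects plus a few large ones.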

15 de los Campos et al. (2013) Priors

16 Marker effect estimates
[Figure: marker effect estimates from RR-BLUP and BayesCπ; panels: large-effect QTL simulated, many small-effect QTL simulated]
I didn’t think that example was illustrative enough, so I simulated some data. Here, we have a large-effect QTL present. You can see RR-BLUP shrinks this thing way down, whereas BayesCπ, the variable selection method, allows it to have an effect probably closer to reality.

17 Comparing marker effects between models

18 G-BLUP
Similar to traditional BLUP with pedigrees
Calculate genomic relationship matrix
Use genomic relationships in a linear mixed model to predict breeding value of relatives
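The slide does not say how the genomic relationship matrix is calculated. One widely used choice is sketched below, assuming markers coded 0/1/2, allele frequencies estimated from the sample, and the scaling often attributed to VanRaden (2008); the function name and simulated data are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """Genomic relationship matrix from an n x p marker matrix coded 0/1/2.

    Assumed formula (not given on the slide): G = ZZ' / (2 * sum(p_j * (1 - p_j))),
    where Z is M with each column centered by twice its allele frequency p_j.
    """
    freq = M.mean(axis=0) / 2.0      # allele frequency at each marker
    Z = M - 2.0 * freq               # center each marker column
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

rng = np.random.default_rng(7)
M = rng.binomial(2, 0.4, size=(6, 500)).astype(float)   # 6 simulated individuals
G = genomic_relationship(M)
print(np.round(G, 2))
```

Because the frequencies are estimated from the same individuals, each row of G sums to zero, so relationships are expressed relative to the sample average.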

19 Relationships between TP and selection candidates leveraged for prediction
[Figure labels: Training Pop., Selection candidates]

20 Equivalency between RR-BLUP and G-BLUP
From MVN distribution properties: Only valid with the normal prior!
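The equivalence can be checked numerically. This is a minimal sketch, assuming NumPy, simulated -1/1 marker scores, centered phenotypes with no intercept, a known variance ratio `lam`, and a relationship matrix scaled by the number of markers; all names and values are hypothetical, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_cand, n_markers = 30, 10, 200   # hypothetical sizes
Z = rng.choice([-1.0, 1.0], size=(n_train + n_cand, n_markers))
Z_train, Z_cand = Z[:n_train], Z[n_train:]

# Simulated phenotypes for the training population only.
y = Z_train @ rng.normal(0.0, 0.2, n_markers) + rng.normal(0.0, 1.0, n_train)
y = y - y.mean()   # center phenotypes so no intercept is needed
lam = 50.0         # assumed ratio of residual variance to marker-effect variance

# RR-BLUP: estimate every marker effect, then sum the effects per candidate.
u_hat = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(n_markers),
                        Z_train.T @ y)
gebv_rrblup = Z_cand @ u_hat

# G-BLUP: predict candidates directly from their relationships to the TP.
c = float(n_markers)    # scaling of G; the genetic variance becomes c * marker variance
G = Z @ Z.T / c
G_tt = G[:n_train, :n_train]    # training x training block
G_ct = G[n_train:, :n_train]    # candidate x training block
gebv_gblup = G_ct @ np.linalg.solve(G_tt + (lam / c) * np.eye(n_train), y)

max_diff = np.max(np.abs(gebv_rrblup - gebv_gblup))
print("largest difference between predictions:", max_diff)
```

The two sets of predictions agree to numerical precision. G-BLUP solves a system the size of the training population rather than the number of markers, which is one practical reason to prefer it when p is much larger than n.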

21 Predicting prediction accuracy
Daetwyler et al. (2008); Lian et al. (2014)
N = training pop size
h² = trait heritability
Me = effective number of loci
r² = LD between marker and QTL (see Lian ref)
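A small calculator shows how these quantities trade off. This is a minimal sketch, assuming the commonly cited form of the Daetwyler et al. (2008) expectation, r = sqrt(N h² / (N h² + Me)). The r² adjustment from Lian et al. (2014) is not included, and the function name and input values are hypothetical.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Expected prediction accuracy, assuming r = sqrt(N*h2 / (N*h2 + Me))."""
    return math.sqrt(n_train * h2 / (n_train * h2 + m_e))

# Hypothetical scenario: heritability 0.5 and 250 effective loci.
for n_train in (100, 500, 2000):
    print(n_train, round(expected_accuracy(n_train, h2=0.5, m_e=250), 3))
```

Under this form, accuracy rises with training population size and heritability and falls as the effective number of loci grows, with diminishing returns once N h² is large relative to Me.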

22 Factors affecting prediction accuracy
- Training population size
- Trait heritability: influence of G x E, precision of measurements
- Marker density
- Effective population size of breeding population, i.e., genetic diversity of breeding population
- Genetic relationship between training population and selection candidates
- Statistical model

23 Effect of relationships: Predicting across populations
[Figure: PCA plot (PC 1 vs. PC 2) from 1180 polymorphic markers; breeding programs: BuschAg, University of MN, NDSU 6-row; labels: Subpop 1, Subpop 2, Training sets, Validation sets]
Here is a typical example. Here we have a PCA plot from marker data of barley lines from three different breeding programs.

24 Effect of relationships: Presence of relatives in TP
[Figure: prediction accuracy vs. mean relationship of top ten relatives; Clark et al. (2012)]

25 Models typically similar in accuracy
[Figure: accuracy of RR-BLUP, BayesCπ, and Bayesian LASSO]
Models also equivalent in:
- Bernardo and Yu (2007) [Maize]
- Lorenzana and Bernardo (2009) [Several plant species]
- VanRaden et al. (2009) [Holstein]
- Hayes (2009) [Holstein]
Despite the different assumptions about genetic architecture made by the different models, and the fact that QTL effects are not of equal size and traits do have different genetic architectures, including epistasis, the simplest model, RR-BLUP, which assumes all QTL effects have the same variance, often does just as well as the more “realistic” models, especially in empirical studies. The reason is probably that LD within domesticated species is extensive, so several markers absorb the effect of a large-effect QTL, making it seem that many markers control a trait, as RR-BLUP assumes.

26 Why? Extensive LD in plant and animal breeding programs
Perfect situation for G-BLUP
- Long stretches of genome that are identical by descent mean relationships calculated with markers are good indicators of relationships at causal polymorphisms.
- Extensive LD also means it’s hard for variable selection models to zero in on markers in tight LD with causal polymorphisms.
Expect variable selection models will be superior when:
- Individuals are unrelated
- Very large TP (millions?)
- Very high marker density, so that markers are in LD with causal polymorphisms

27 Resources and packages
rrBLUP package
- Endelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:
- Endelman, J.B., and J-L. Jannink. Shrinkage estimation of the realized relationship matrix. G3 2:1045
BLR (Bayesian Linear Regression) package
- Perez et al. Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome 3:

28 References
Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48:
Clark, S.A., J.M. Hickey, H.D. Daetwyler and J.H.J. van der Werf. 2012. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet. Sel. Evol. 44:
Daetwyler, H.D., B. Villanueva and J.A. Woolliams. 2008. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One 3:
de los Campos, G., J.M. Hickey, R. Pong-Wong, H.D. Daetwyler and M.P.L. Calus. 2013. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193:327
Lian, L., A. Jacobson, S. Zhong and R. Bernardo. 2014. Genomewide prediction accuracy within 969 maize biparental populations. Crop Sci.
Lorenz, A.J. 2013. Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: A simulation experiment. G3-Genes Genomes Genetics 3:
Lorenzana, R.E. and R. Bernardo. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120:
Meuwissen, T.H., B.J. Hayes and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:
Whittaker, J.C., R. Thompson and M.C. Denham. 2000. Marker-assisted selection using ridge regression. Genet. Res. 75:
