255
**Association Mapping versus Genomic Selection**

Association mapping: to discover genes and genetic variants that control a trait. The knowledge can be applied to understand mechanism and genetic architecture, to design pathways with diversity, and to generate ideas for transgenic improvement.

Genomic selection: to identify germplasm with the best breeding values and performance. It can identify complementary varieties that should be crossed for future improvement.

256
**Association-based selection methods: Genomic selection**

We have MAS, why do we need something different? Historical introduction to genomic selection; the basic idea; methods; theory; selected simulation results; empirical results; long-term genomic selection; introgressing diversity using GS.

257
**MAS problems**

Relevant germplasm; bias of estimated effects; effects too small for detection.

258
**Association mapping identifies QTL rapidly while scanning relevant germplasm**

[Figure: mapping approaches (F2 / BC, recombinant inbred lines, intermated recombinant inbreds, near-isogenic lines, positional cloning, pedigree, association mapping) placed by resolution (bp; axis from 1 through 1 × 10^4 to 1 × 10^7) and research time (years; 1 to 5), and rated for relevance to breeding germplasm (low, high, or depends).]

259
**Bias in Effect Estimation**

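[Figure: distribution of effect estimates (true + error) with a significance threshold; the average “detected” effect lies above the true effect, and the difference is the estimated bias.] Keep all loci in the model => no threshold => estimated effects are unbiased.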
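The thresholding bias can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the true effect, standard error, threshold, and number of loci are all hypothetical values chosen for illustration, and every locus is assumed to have the same true effect.

```python
import random
import statistics

def simulate_detected_effects(true_effect=0.2, se=0.2, z_threshold=2.0,
                              n_loci=20000, seed=1):
    """Simulate effect estimates (true + error) and apply a significance threshold.

    All parameter values are hypothetical and serve only as an illustration.
    """
    rng = random.Random(seed)
    # Each estimate is the true effect plus normally distributed estimation error.
    estimates = [true_effect + rng.gauss(0.0, se) for _ in range(n_loci)]
    # A locus counts as "detected" when |estimate| / se exceeds the threshold.
    detected = [b for b in estimates if abs(b) / se > z_threshold]
    return statistics.mean(estimates), statistics.mean(detected)

mean_all, mean_detected = simulate_detected_effects()
print(f"true effect:                 0.200")
print(f"mean of all estimates:       {mean_all:.3f}")
print(f"mean of 'detected' estimates: {mean_detected:.3f}")
```

Averaged over all loci, the estimates sit close to the true effect; averaged over only the loci that pass the threshold, they are inflated, which is the bias the slide describes.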

260
**In polygenic traits, much is hidden**

X^2_gamma is a function of the significance threshold at which you want to detect the factors. E.g., h2 = 0.8, α = 0.01: 1200 (Lande & Thompson 1990).

261
**Genomic selection principles**

Meuwissen et al Genetics 157: No distinction between “significant” and “non-significant”; no arbitrary inclusion / exclusion: all markers contribute to prediction. More effects must be estimated than there are phenotypic observations. Estimated effects are unbiased. Capture small effects.

Speaker notes: Mention bias again; if time, mention verbally the bit about separate identification and estimation data sets. By “capture” I don’t mean to say that the regression coefficient associated with a marker is equal to the effect of an underlying QTL, but again, it IS unbiased.

262
**Genomic selection: Prediction using many markers**

Training population: genotyping & phenotyping => train GS model. Breeding material: genotyping => calculate GEBV => make selections. (Meuwissen et al Genetics 157:)
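The train / predict / select workflow can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: ridge regression stands in for "the GS model", the data are simulated, and all names (`train_ridge`, `calculate_gebv`), sizes, and the penalty `lam` are assumptions made for the example.

```python
import numpy as np

def train_ridge(X, y, lam):
    """Estimate all marker effects jointly by ridge regression.

    Solves (Xc'Xc + lam*I) beta = Xc'(y - mean(y)), where Xc is X with its
    column means removed. lam > 0 keeps the system solvable even when there
    are more markers than phenotypic observations.
    """
    col_means = X.mean(axis=0)
    Xc = X - col_means
    mu = y.mean()
    n_markers = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers), Xc.T @ (y - mu))
    return mu, col_means, beta

def calculate_gebv(X_new, mu, col_means, beta):
    """GEBV of genotyped candidates: intercept plus summed marker effects."""
    return mu + (X_new - col_means) @ beta

# Hypothetical simulated data: 200 training lines, 50 candidates, 500 markers.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 3, size=(200, 500)).astype(float)  # allele counts 0/1/2
X_cand = rng.integers(0, 3, size=(50, 500)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=500)
y_train = X_train @ true_effects + rng.normal(0.0, 1.0, size=200)

# Training population: genotypes + phenotypes -> model (lam is arbitrary here).
mu, col_means, beta_hat = train_ridge(X_train, y_train, lam=100.0)
# Breeding material: genotypes only -> GEBV -> selections.
gebv = calculate_gebv(X_cand, mu, col_means, beta_hat)
selected = np.argsort(gebv)[::-1][:5]  # indices of the five highest-GEBV candidates
print(selected)
```

Note that the candidates are never phenotyped: selection uses only their genotypes and the effects estimated in the training population.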

263
**Statistical modeling: The two cultures**

[Diagram: nature maps observed inputs X to observed responses Y.] Can we understand Y? Regression: identify causal inputs. Can we predict Y? Regression, decision trees, whatever works. (Breiman 2001 Stat. Sci. 16:)

264
**Need to shorten breeding cycle**

i cumulates over breeding cycles

265
**Phenotypic Selection**

[Diagram: the phenotypic selection cycle (select, cross, inbreed, phenotype, release) takes 3 years. Labels: F1 × inducer, 1 season; self DH0, 2 seasons; phenotyping, 2 years, with 1 rep (N=2270, S=100) followed by 5 reps (N=100, S=10).]

266
**Genomic Selection**

[Diagram: the cycle (select, cross, inbreed, phenotype, release) takes 1 year!]

267
**FastGS**

[Diagram: the cycle (select, cross, inbreed, phenotype, release) takes 1 season = ⅓ year!!]

268
**Selection Intensities**

Phenotypic: N = 2270, S = 10: i = 2.4. FastGS: N = 370, S = 43: i = 1.7; 9 × i ≅ 15 (!!!). Inbreeding:
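The FastGS figure can be checked with the standard truncation-selection formula. This is a sketch under stated assumptions: a normally distributed selection criterion and the large-population approximation i = φ(z) / p, with p = S / N. The phenotypic value i = 2.4 is taken from the slide as given and is not recomputed here. The count of 9 cycles assumes 3 seasons per year over the 3-year phenotypic cycle, as the earlier slides suggest.

```python
from statistics import NormalDist

def selection_intensity(n_candidates, n_selected):
    """Standardized selection differential i for truncation selection.

    Assumes a normally distributed criterion and a large population:
    i = pdf(z) / p, where p is the selected fraction and z its truncation point.
    """
    p = n_selected / n_candidates
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(z) / p

i_fastgs = selection_intensity(370, 43)
cycles = 9  # assumed: one FastGS cycle per season, 3 seasons per year, 3 years
print(f"FastGS i per cycle:       {i_fastgs:.2f}")
print(f"FastGS i over {cycles} cycles:   {cycles * i_fastgs:.1f}")
```

Under these assumptions the per-cycle intensity rounds to the slide's 1.7, and nine cycles accumulate to roughly 15.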

269
Rates of gain per year

270
**Impacts**

Schaeffer, L.R. 2006. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123:

271
**Cost per genetic standard deviation**

Schaeffer 2006. Phenotypic: $116 M. Genomic: $4.2 M.

272
**Potential Impact**

Heffner, E.L. et al. Genomic Selection for Crop Improvement. Crop Science 49:1-12.

[Diagram: two linked cycles. Line development cycle: new germplasm; make crosses and advance generations; genotype; genomic selection; advance lines with highest GEBV; test varieties and release. Model training cycle: advance lines informative for model improvement; phenotype (lines have already been genotyped); train prediction model; updated model.]

273
**What (I think) is revolutionary**

[Diagram: the line development cycle and model training cycle of genomic selection, as on the previous slide, set against phenotypic selection.]

For a century, breeding has focused on better ways to evaluate lines. Henceforth it will focus on how to improve a model.

274
**A Focus for Information**

[Diagram: current pheno-geno data, historical pheno-geno data, linkage and association mapping, and biological knowledge all feed genomic prediction model development, which serves the select and cross steps of population improvement and cultivar release.]

275
**The Alleletarian Revolution**

The breeding line as the focus of evaluation has been dethroned in favor of the allele. A line is useful to us only with respect to the alleles it carries. Time-honored practice: replicate (progeny test) lines. But alleles are replicated regardless of what line carries them.

276
**Methods**

Linear models: effects are random; methods differ in marker effect priors. Machine learning methods: regression trees.

277
**Linear models: Priors on coefficients**

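Model: $\mathbf{y} = \mu + \sum_k x_k \beta_k + \mathbf{e}$, with $\beta_k \sim$ ?

Ridge regression: $\beta_k \sim N(0, \sigma^2_\beta)$.

BayesB (SSVS): $P(\beta_k = 0) = \pi$; else $\beta_k \sim N(0, \sigma^2_k)$ with $\sigma^2_k \sim \chi^{-2}(\nu, S)$.

BayesCπ: $P(\beta_k = 0) = \pi$ with $\pi \sim U(0, 1)$; else $\beta_k \sim N(0, \sigma^2_\beta)$ with $\sigma^2_\beta \sim \chi^{-2}(\nu_\beta, S_\beta)$.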
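To make the three priors concrete, the sketch below draws marker effects from each of them. It is an illustration only, not a sampler from any cited paper: the function names and hyperparameter values are hypothetical, and the scaled inverse chi-square is assumed to be parameterized so that a draw equals ν·S / χ²(ν).

```python
import random

def scaled_inv_chi2(rng, nu, s2):
    """Draw from a scaled inverse chi-square, assumed here as nu * s2 / chi2(nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return nu * s2 / chi2

def draw_ridge(rng, n, var_beta):
    """Ridge regression: every effect is normal with one common, fixed variance."""
    sd = var_beta ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]

def draw_bayesb(rng, n, pi, nu, s2):
    """BayesB: zero with fixed probability pi, else normal with a locus-specific variance."""
    effects = []
    for _ in range(n):
        if rng.random() < pi:
            effects.append(0.0)
        else:
            var_k = scaled_inv_chi2(rng, nu, s2)
            effects.append(rng.gauss(0.0, var_k ** 0.5))
    return effects

def draw_bayescpi(rng, n, nu, s2):
    """BayesCpi: pi ~ U(0, 1); non-zero effects share one variance with its own prior."""
    pi = rng.random()
    sd = scaled_inv_chi2(rng, nu, s2) ** 0.5
    return pi, [0.0 if rng.random() < pi else rng.gauss(0.0, sd) for _ in range(n)]

rng = random.Random(0)
ridge = draw_ridge(rng, 10000, var_beta=0.01)
bayesb = draw_bayesb(rng, 10000, pi=0.9, nu=4.0, s2=0.01)
pi_c, bayesc = draw_bayescpi(rng, 10000, nu=4.0, s2=0.01)
print("fraction of zero effects, ridge:  ", sum(b == 0.0 for b in ridge) / len(ridge))
print("fraction of zero effects, BayesB: ", sum(b == 0.0 for b in bayesb) / len(bayesb))
print("fraction of zero effects, BayesCpi:", sum(b == 0.0 for b in bayesc) / len(bayesc))
```

The draws show the qualitative difference between the methods: ridge regression gives every marker a small non-zero effect, while the mixture priors set most effects to exactly zero and leave a few to be large.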

278
[Figure: density against Var(β), with panels for ridge regression, BayesB, and BayesCπ.]

Fig. Priors. Graphical representation of the priors for the variance of marker effects, β, for ridge regression, BayesB, and BayesCπ. Circles represent point mass probabilities; their height on the y-axis has no meaning but is for clarity. Ridge regression uses a single non-zero point mass for the variance of β (circle with black border). BayesB uses a mixture of a point mass at zero of a fixed size and a continuous scaled inverse χ² distribution. BayesCπ uses a mixture of a point mass at zero and a non-zero point mass. The non-zero value is estimated (horizontal arrows), as is the mixture probability (allowing the size of the zero and non-zero point mass circles to fluctuate inversely to each other).

279
**Machine learning methods**

Random Forests: a forest of regression trees. Each tree is grown on a bootstrapped sample. Nodes split on randomly sampled features. Prediction is the forest mean. Can capture interactions. [Diagram: a regression tree with nodes splitting on markers M1 and M2.]

280
**Additive models and breeding value**

Breeding value = mean phenotype of progeny. Most important parent selection criterion. Recombination: parents do not always pass combinations of genes to their progeny => sum of individual locus effects. Linear models capture this; machine learning methods may not.

281
**Theory**

How accurate will GS be? Impact of GS on inbreeding / loss of diversity. Genomic selection captures pedigree relatedness among candidates.

282
**Prediction accuracy = Correlation(predicted, true)**

$R = i r_A \sigma_A$, where $r_A$ = corr(selection criterion, breeding value). On simulated data, corr(Â, A) is easy. On real data: (click for h^2!)
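The response equation can be turned into a rate of gain per year by dividing by cycle length. The sketch below does that with the selection intensities and cycle lengths quoted on the earlier slides; the two accuracy values are hypothetical placeholders, σ_A is set to 1 so that gain is expressed in genetic standard deviations, and σ_A is assumed constant across cycles.

```python
def response_per_cycle(i, r_a, sigma_a):
    """Response to selection for one cycle: R = i * r_A * sigma_A."""
    return i * r_a * sigma_a

def response_per_year(i, r_a, sigma_a, years_per_cycle):
    """Rate of gain per year, assuming sigma_A stays constant across cycles."""
    return response_per_cycle(i, r_a, sigma_a) / years_per_cycle

# i and cycle lengths are from the slides; both accuracies are hypothetical.
phenotypic = response_per_year(i=2.4, r_a=0.8, sigma_a=1.0, years_per_cycle=3.0)
fast_gs = response_per_year(i=1.7, r_a=0.4, sigma_a=1.0, years_per_cycle=1.0 / 3.0)
print(f"phenotypic selection: {phenotypic:.2f} genetic SD per year")
print(f"FastGS:               {fast_gs:.2f} genetic SD per year")
```

Even with half the accuracy assumed for the genomic scheme, the shorter cycle dominates the comparison, which is the argument for shortening the breeding cycle.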

283
**Predict prediction accuracy**

Daetwyler, H.D. et al. 2008. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PLoS ONE 3:e3395. Assume all loci affecting the trait are known and are independent. Assume marker effects are fixed.

284
**Replicating hurts: 2000 with 1 plot is better than 1000 with 2 plots**

[Figure: curves plotted against λ, where $\lambda = \frac{n_P}{n_G}$; tick values 0.02, 0.1, 0.5, 1, 2, 5, 10, 20.]
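The replication claim can be worked through numerically. The sketch assumes the Daetwyler et al. (2008) approximation takes the form r = sqrt(λh² / (λh² + 1)) with λ = n_P / n_G, and that the heritability of a line mean over k plots is k·h² / (1 + (k − 1)·h²). The plot-level heritability and n_G below are hypothetical.

```python
from math import sqrt

def expected_accuracy(n_p, h2, n_g):
    """Assumed form of the approximation: r = sqrt(lam*h2 / (lam*h2 + 1)), lam = n_P / n_G."""
    lam = n_p / n_g
    return sqrt(lam * h2 / (lam * h2 + 1.0))

def line_mean_h2(h2_plot, n_reps):
    """Heritability of a line mean over n_reps plots, given single-plot heritability."""
    return n_reps * h2_plot / (1.0 + (n_reps - 1) * h2_plot)

h2_plot, n_g = 0.3, 1000  # hypothetical values
unreplicated = expected_accuracy(2000, line_mean_h2(h2_plot, 1), n_g)  # 2000 lines, 1 plot
replicated = expected_accuracy(1000, line_mean_h2(h2_plot, 2), n_g)    # 1000 lines, 2 plots
low_h2 = expected_accuracy(100000, 0.01, n_g)  # very low h2, very large n_P

print(f"2000 lines x 1 plot:  r = {unreplicated:.3f}")
print(f"1000 lines x 2 plots: r = {replicated:.3f}")
print(f"h2 = 0.01, n_P = 100000: r = {low_h2:.3f}")
```

With the same number of plots, doubling the number of lines doubles λ, whereas a second replicate raises the line-mean heritability by less than a factor of two, so the unreplicated design comes out ahead under these assumptions.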

285
**Predict prediction accuracy**

Hayes, B.J. et al. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genetics Research 91:47-60. Detail on the population genetics that drive nG. Assume marker effects are random. Still assume all markers independent and estimated separately.

286
**Analytical approximations**

[Figures: two panels plotted against NP / NG, one for Daetwyler et al., 2008 and one for Hayes et al., 2009.]

287
**Take Homes**

Even with traits of very low heritability (h2 = 0.01), sufficient nP gives accuracy. Replication may not be good. The number of loci estimated (nG) is a critical parameter. If you don’t know where the QTL are, higher marker coverage requires higher nG. N.B. All conclusions assuming only 100% LD!

288
**Genetic diversity loss / inbreeding**

Daetwyler, H.D. et al. 2007. Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124: Avoid selecting close relatives together. What is the correlation in the estimated breeding value between full sibs? Sib co-selection increases the rate of inbreeding / loss of diversity. The extent of co-selection depends on the ratio of sigma_B captured relative to sigma_W captured. The Bulmer effect reduces sigma_B and therefore reduces co-selection. [Figure label: correlation of sibling estimates.]

289
**Genetic diversity loss / inbreeding**

$A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j$, where $a_j$ is the Mendelian sampling term. $\sigma^2_W = var(\hat{a}_j)$ and $\sigma^2_B = var(\frac{1}{2}\hat{A}_S + \frac{1}{2}\hat{A}_D)$. BLUP: $\sigma^2_W = 0$. GS: $\sigma^2_W > 0$.

Sib co-selection increases the rate of inbreeding / loss of diversity. The extent of co-selection depends on the ratio of sigma_B captured relative to sigma_W captured. The Bulmer effect reduces sigma_B and therefore reduces co-selection. [Figure label: correlation of sibling estimates.]
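The link between these two variances and the correlation of sibling estimates can be written out. This is a sketch under two assumptions that the slide does not state: full sibs j and j' share the same estimated parent average, and the estimated Mendelian sampling terms are uncorrelated between sibs and with the parent average.

```latex
\begin{aligned}
\hat{A}_j &= \tfrac{1}{2}\hat{A}_S + \tfrac{1}{2}\hat{A}_D + \hat{a}_j \\
\mathrm{cov}(\hat{A}_j, \hat{A}_{j'}) &= \mathrm{var}\left(\tfrac{1}{2}\hat{A}_S + \tfrac{1}{2}\hat{A}_D\right) = \sigma^2_B \\
\mathrm{var}(\hat{A}_j) &= \sigma^2_B + \sigma^2_W \\
\mathrm{corr}(\hat{A}_j, \hat{A}_{j'}) &= \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}
\end{aligned}
```

With $\sigma^2_W = 0$ the correlation is 1, so full sibs receive identical estimates and are selected together; any $\sigma^2_W > 0$ lowers the correlation and with it the tendency to co-select sibs.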

290
**Daetwyler et al. 2007 Take Homes**

Genomic selection captures the Mendelian sampling term. Correlation between the estimates of sibling performance is reduced. Co-selection of sibs is reduced. Rate of inbreeding / loss of diversity is reduced. Sib co-selection increases the rate of inbreeding / loss of diversity. The extent of co-selection depends on the ratio of sigma_B captured relative to sigma_W captured. The Bulmer effect reduces sigma_B and therefore reduces co-selection.

291
**A word on pedigree relatedness**

Five individuals: a, b, c, d, and e. a, b, and c are unrelated; d is the offspring of a and b; e is the offspring of a and c. [Matrix: the relationship matrix A, with rows and columns ordered a, b, c, d, e.]
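The relationship matrix for this pedigree can be computed with the standard tabular method. The sketch below is an illustration, not the slide's own figure; it assumes the founders a, b, and c are non-inbred as well as unrelated, and the assignment of each parent to the sire or dam slot is arbitrary.

```python
def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix by the tabular method.

    pedigree: list of (id, sire, dam) with parents listed before offspring;
    None marks an unknown parent (a founder).
    """
    ids = [ind for ind, _, _ in pedigree]
    index = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for j, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)
        d = index.get(dam)
        # Diagonal: 1 plus half the relationship between the parents.
        A[j][j] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
        # Off-diagonal: average of the relationships with the two parents.
        for k in range(j):
            rel = 0.0
            if s is not None:
                rel += 0.5 * A[k][s]
            if d is not None:
                rel += 0.5 * A[k][d]
            A[j][k] = A[k][j] = rel
    return ids, A

pedigree = [
    ("a", None, None),
    ("b", None, None),
    ("c", None, None),
    ("d", "a", "b"),  # parent order is arbitrary
    ("e", "a", "c"),
]
ids, A = relationship_matrix(pedigree)
print("   " + "     ".join(ids))
for name, row in zip(ids, A):
    print(name, "  ".join(f"{value:.2f}" for value in row))
```

The half sibs d and e share only parent a, so their relationship is one quarter, while each offspring has a relationship of one half with each of its own parents.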

292
**Ridge Regression**

Habier, D. et al. 2007. Genetics 177:2389-2397.

$\hat{a}_i = \sum_k x_k\hat{\beta}_k$; in matrix form, $\mathbf{\hat{a}} = \mathbf{X}\mathbf{\hat{\beta}}$.

$var(\mathbf{\hat{a}}) = \mathbf{\hat{A}}\hat{\sigma}^2_a = \mathbf{X}\mathbf{X}^T\hat{\sigma}^2_\beta$

$\mathbf{\hat{A}} = \mathbf{X}\mathbf{X}^T\frac{\hat{\sigma}^2_\beta}{\hat{\sigma}^2_a}$

Hayes, B.J. et al. Genetics Research 91:47-60.

293
**Habier et al. simulation set up**

294
**Genetic relationship decays fast**

Prediction from pedigree relationship loses accuracy very quickly. Decay rate is initially more rapid, then stabilizes after about 5 generations. Rapid initial decay reflects that the closest marker may not be in highest LD with the QTL. RR-BLUP accuracy decays more rapidly than Bayes-B because more markers absorb the effect of a QTL. [Figure label: training population here.]

295
**Habier et al. Take Homes**

The ability of genomic selection to capture information on genetic relatedness is valuable. That information decays rapidly. The amount of that information relates to the number of markers fitted by a model: ridge regression > BayesB. BayesB captured more LD information: long-term accuracy: BayesB > ridge regression.

296
**Accuracy due to relationships vs. LD**

Round and square symbols, ridge regression and Bayes-B. Symbols with gray (inside or around) and without, 40 QTL and 200 QTL. Black and non-black symbols, 4000 and 400 markers. Small and large symbols, training population size of 400 and 2000.

297
**Stochastic vs deterministic prediction**

[Figure: results from Habier et al. and Zhong et al. plotted against NP / NG.] Even though the condition of complete LD between markers is not met, stochastic prediction is coming out better than deterministic. The logical hypothesis is that the additional accuracy derives from the markers' ability to account for relatedness.

298
**To replicate or not to replicate**

[Figure: 504 lines replicated once compared with 168 lines replicated three times, for ridge regression and BayesB.]

299
**Genetic diversity loss / inbreeding**

$A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j$, where $a_j$ is the Mendelian sampling term. BLUP: $\sigma^2_W = 0$. GS: $\sigma^2_W > 0$. [Figure label: correlation of sibling estimates.]

Sib co-selection increases the rate of inbreeding / loss of diversity. The extent of co-selection depends on the ratio of sigma_B captured relative to sigma_W captured. The Bulmer effect reduces sigma_B and therefore reduces co-selection. Capturing relationship information increases $\sigma^2_B$, NOT $\sigma^2_W$.

300
**Simulation setting: Meuwissen; Habier; Solberg**

Ne = 100; 1000 generations. Mutation / drift / recombination equilibrium. High marker mutation rate (2.5 × 10^-3 / locus / generation); higher “haplotype mutation rate”. Mutation effect distribution Gamma (1.66, 0.4): “effective QTL number” is only about 6 (!) => Watch out how you simulate!

301
**Results**

Prediction accuracy estimated by simulation. [Table: accuracies of RR-BLUP and BayesB reported by MHG and HFD.] These accuracies are ASTOUNDING. If h2 = 1, r = 0.71.

302
**Noteworthy discussion**

Markers flanking QTL not always in model. QTL effects captured by multiple markers. No need to “detect” QTL. Recombination causes accuracy to decay, faster than if QTL captured by flanking markers. Markers far from QTL contribute to capture its effect. Ne / 2 markers per Morgan achieves close to maximum accuracy; dependent on high marker mutation rates (?).

303
**Solberg et al. 2008**

Density: number of markers per Morgan. SSR: ¼ Ne. SNP: 4 Ne, 8 Ne.

304
**Zhong et al. 2009**

Zhong, S. et al. 2009. Genetics 182:355-364.

42 diverse 2-row barley. 1040 markers, ~ evenly spaced. Mating designs to generate 500 high and low LD training dataset. 20 or 80 QTL; h2 = 0.4.

305
**Ridge regression Vs. BayesB**

[Figure: ridge regression and BayesB compared with QTL observed or unobserved, in four scenarios: 20 QTL, HiLD; 20 QTL, LoLD; 80 QTL, HiLD; 80 QTL, LoLD. Zhong et al. 2009.]

306
**Take-home messages**

Ridge regression is not affected by the number of QTL / the QTL effect size. BayesB performs better with large marker-associated effects. Co-linearity is more detrimental to BayesB. High marker density and training pop. size? Yes: BayesB. No: RR-BLUP.

307
**VanRaden et al. 2009**

VanRaden, P.M. et al. 2009. Invited Review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16-24.

308
**VanRaden et al. 2009**

Some traits have major genes, others do not.

309
**VanRaden et al. 2009**

The larger the training population, the better. Where diminishing returns will begin is not in sight. [Figure label: predictor.]

310
**Take Homes**

Training population requirements are very large. BayesB did not help == no large marker-associated effects == like the “case of the missing heritability” in human GWAS studies. Are many quantitative traits driven by very low frequency variants? RR would capture this case better than BayesB.

311
**Empirical data on crops: TP size**

312
**Empirical data on crops: Marker No.**

313
**Empirical data on Humans: Marker No.**

Out of 295K SNP Yang et al Nat. Genet /ng.608

314
**Long-term genomic selection**

Marker data from an elite six-row barley program. 880 markers, 100 hidden as additive-effect QTL. Evaluate 200 progeny, select 20. Phenotypic compared to genomic selection.

315
**Breeding / model update cycles**

[Diagram: timeline over Seasons 1 to 6. Phenotypic selection: cross & inbreed, evaluate & select, cross & inbreed, evaluate & select, cross & inbreed. Genomic selection shown on the same timeline.]

Evaluation is possible every other season. Candidates from every other cycle can be evaluated. There is still a lag: parents of C2 are selected based on evaluation of C0.

316
**Response in genotypic value**

[Figure: mean genotypic value by phenotypic breeding cycle, for genomic selection with a small training population, genomic selection with a large training population, and phenotypic selection.]

317
**Accuracy**

[Figure: mean realized accuracy by phenotypic breeding cycle, for genomic selection with a small training population, genomic selection with a large training population, and phenotypic selection.]

318
**Genetic variance**

[Figure: mean genotypic standard deviation by phenotypic breeding cycle, for genomic selection with a small training population, genomic selection with a large training population, and phenotypic selection.]

319
**Lost favorable alleles**

[Figure: mean number of lost favorable alleles by phenotypic breeding cycle, for genomic selection with a small training population, genomic selection with a large training population, and phenotypic selection.]

320
**Goddard 2008; Hayes et al. 2009**

$$\begin{array}{ll} \text{Marker Effect Weight} &= f\left[ P(\text{favorable allele}) \right] \\ &= f(p_f) \\ &= \frac{1}{\sqrt{p_f}} \end{array}$$
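The weight function is simple to compute. The sketch below shows the weight itself and one hypothetical way of applying it, by multiplying each estimated marker effect by its weight before summing; how the cited papers combine the weights with the prediction model is not given on the slide, so `weighted_gebv` is an assumption made for illustration.

```python
from math import sqrt

def marker_weight(p_favorable):
    """Weight from the slide: 1 / sqrt(frequency of the favorable allele)."""
    if not 0.0 < p_favorable <= 1.0:
        raise ValueError("favorable allele frequency must be in (0, 1]")
    return 1.0 / sqrt(p_favorable)

def weighted_gebv(genotypes, effects, p_favorable):
    """Hypothetical weighted criterion: sum of genotype * effect * weight."""
    return sum(x * b * marker_weight(p)
               for x, b, p in zip(genotypes, effects, p_favorable))

for p in (1.0, 0.5, 0.25, 0.05):
    print(f"p_f = {p:<4}  weight = {marker_weight(p):.2f}")

# Hypothetical candidate: allele counts, estimated effects, favorable-allele frequencies.
score = weighted_gebv([2, 0, 1], [0.5, 0.3, -0.2], [0.25, 0.5, 1.0])
print(f"weighted score = {score:.2f}")
```

Because the weight grows as the favorable allele becomes rarer, candidates carrying rare favorable alleles are ranked higher than they would be on unweighted GEBV, which works against losing those alleles.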

321
**Response in genotypic value**

[Figure: mean genotypic value by phenotypic breeding cycle, for unweighted and weighted genomic selection with small and large training populations, and for phenotypic selection.]

322
**Genetic variance**

[Figure: mean genotypic standard deviation by phenotypic breeding cycle, for unweighted and weighted genomic selection with small and large training populations, and for phenotypic selection.]

323
**Lost favorable alleles**

[Figure: mean number of lost favorable alleles by phenotypic breeding cycle, for unweighted and weighted genomic selection with small and large training populations, and for phenotypic selection.]

324
**Long term genomic selection**

The acceleration of the breeding cycle is key. Some favorable alleles will be lost, likely those not in LD with any marker. Managing diversity / favorable alleles appears a good idea. This can be done using the same data as used for genomic prediction.

325
**Introgressing diversity**

GS relies on marker-QTL allele association. An “exotic” line comes from a sub-population divergent from the breeding population. After sub-populations separate, drift moves allele frequencies independently, and drift & recombination shift associations independently. Will the GS prediction model identify valuable segments from the exotic?

326
**Three approaches**

1. Create a bi-parental family with the exotic (Bernardo 2009): develop a mini-training population for that family, improve the family, bring it into the main breeding population. 2. Develop a separate training population for the exotic sub-population (Ødegård et al. 2009). 3. Develop a single multi-subpopulation (species-wide?) training population (Goddard 2006).

327
**Need higher marker density**

Tightly linked: ancestral LD. Loosely linked: sub-population specific LD.

328
**Consistency of association across barley subpopulations**

[Figure: correlation of r (0.0 to 1.0) against genetic distance (0.0 to 0.5), at 0 cM and 5 cM recombination distance.]

329
**Example: Dairy cattle breeds**

330
**Oat sub-populations (UOPN)**

G1: N=136. G2: N=149. G3: N=161.

331
**Combined sub-population TP (β-Glucan)**

G2 and G3 0.11 TP VP G3 G1 and G2 0.50 G1 G3 G2 0.39

332
**Introgressing diversity using GS**

Need higher marker density. Analysis of consistency of r may indicate whether current density is sufficient. Not sure we have it for barley. If you have the density, a multi-subpopulation training population seems like a good idea: it focuses the model on tighter ancestral LD rather than looser sub-population specific LD.
