
Association Mapping versus Genomic Selection


1 Association Mapping versus Genomic Selection
Association mapping
– Goal: discover the genes and genetic variants that control a trait
– The knowledge can be applied to understand mechanism and genetic architecture, to design pathways with diversity, and to generate ideas for transgenic improvement
Genomic selection
– Goal: identify the germplasm with the best breeding values and performance
– Can identify complementary varieties that should be crossed for future improvement

2 Association-based selection methods: genomic selection
We have MAS (marker-assisted selection); why do we need something different?
– Historical introduction to genomic selection
– The basic idea
– Methods
– Theory
– Selected simulation results
– Empirical results
– Long-term genomic selection
– Introgressing diversity using GS

3 MAS problems
– Relevant germplasm
– Bias of estimated effects
– Effects too small for detection

4 [Figure: mapping approaches (association mapping, positional cloning, recombinant inbred lines, pedigree, intermated recombinant inbreds, F2 / BC, near-isogenic lines) compared by resolution (bp), research time (years), and relevance to breeding germplasm (low, high, or depends)]
Association mapping identifies QTL rapidly while scanning relevant germplasm

5 Bias in Effect Estimation
[Figure: distribution of effect estimates (true effect + error) with a significance threshold; the average "detected" effect lies above the true effect, and the difference is the bias]
Keep all loci in the model => no threshold => estimated effects are unbiased
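A small simulation reproduces the bias shown on this slide. This is a minimal sketch under hypothetical assumptions: one locus with a true effect of 0.2, estimates equal to the true effect plus normal error (standard error 0.15), and a one-sided threshold at 1.96 standard errors. None of these numbers come from the presentation.

```python
import random
import statistics

# Illustrative sketch: all parameter values are hypothetical assumptions.
TRUE_EFFECT = 0.2      # true allele substitution effect at one locus
SE = 0.15              # standard error of each estimate
THRESHOLD = 1.96 * SE  # one-sided "significance threshold" on the estimate
N_STUDIES = 100_000    # number of independent estimation experiments

rng = random.Random(1)
# Each estimate is the true effect plus estimation error
estimates = [TRUE_EFFECT + rng.gauss(0.0, SE) for _ in range(N_STUDIES)]

# Keeping all loci (no threshold): the average estimate is unbiased
mean_all = statistics.fmean(estimates)

# Keeping only "detected" estimates (above the threshold): the average is inflated
detected = [b for b in estimates if b > THRESHOLD]
mean_detected = statistics.fmean(detected)

print(f"mean of all estimates:      {mean_all:.3f}")
print(f"mean of detected estimates: {mean_detected:.3f}")
print(f"bias among detected:        {mean_detected - TRUE_EFFECT:.3f}")
print(f"detection rate (power):     {len(detected) / N_STUDIES:.2f}")
```

Averaging over all estimates recovers the true effect; averaging only over estimates that cross the threshold overstates it, and the lower the detection power, the larger that bias.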

6 In polygenic traits, much is hidden (Lande & Thompson 1990)
E.g., $h^2 = 0.8$, $\alpha =$

7 Genomic selection principles (Meuwissen et al., Genetics 157)
– No distinction between "significant" and "non-significant" markers; no arbitrary inclusion / exclusion: all markers contribute to prediction
– More effects must be estimated than there are phenotypic observations
– Estimated effects are unbiased
– Small effects are captured

8 Genomic selection: prediction using many markers (Meuwissen et al., Genetics 157)
[Figure: workflow. Training population: genotyping & phenotyping → train GS model. Breeding material: genotyping → calculate GEBV (genomic estimated breeding values) → make selections]
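The workflow in this figure can be sketched end to end. This sketch assumes simulated 0/1 genotypes for inbred lines, ridge regression as the GS model, and hypothetical values for population sizes, heritability, and the ridge penalty `LAMBDA`. Note that the model estimates more marker effects (500) than there are phenotypic observations (300).

```python
import numpy as np

# Hypothetical sizes and values, chosen only for illustration
rng = np.random.default_rng(42)
N_TRAIN, N_CAND, N_MARKERS, N_QTL, N_SELECT = 300, 1000, 500, 50, 20
H2 = 0.5        # assumed heritability of training phenotypes
LAMBDA = 100.0  # assumed ridge penalty (shrinkage of marker effects)

def genotypes(n):
    # Inbred lines coded 0/1 at each marker (assumption: independent markers)
    return rng.integers(0, 2, size=(n, N_MARKERS)).astype(float)

# True genetic architecture: a random subset of markers act as QTL
true_effects = np.zeros(N_MARKERS)
qtl = rng.choice(N_MARKERS, size=N_QTL, replace=False)
true_effects[qtl] = rng.normal(0, 1, N_QTL)

# 1. Genotype & phenotype the training population
X_train = genotypes(N_TRAIN)
g_train = X_train @ true_effects
noise_sd = np.sqrt(g_train.var() * (1 - H2) / H2)
y_train = g_train + rng.normal(0, noise_sd, N_TRAIN)

# 2. Train the GS model: ridge regression on ALL markers (no significance test)
x_mean = X_train.mean(axis=0)
Xc = X_train - x_mean
y_mean = y_train.mean()
beta_hat = np.linalg.solve(Xc.T @ Xc + LAMBDA * np.eye(N_MARKERS),
                           Xc.T @ (y_train - y_mean))

# 3. Genotype the breeding material (no phenotypes needed)
X_cand = genotypes(N_CAND)

# 4. Calculate GEBV = sum of the estimated marker effects each line carries
gebv = (X_cand - x_mean) @ beta_hat

# 5. Make selections: keep the lines with the highest GEBV
selected = np.argsort(gebv)[::-1][:N_SELECT]

accuracy = np.corrcoef(gebv, X_cand @ true_effects)[0, 1]
print(f"accuracy (corr of GEBV with true genetic value): {accuracy:.2f}")
```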

9 Statistical modeling: the two cultures (Breiman 2001, Stat. Sci. 16)
[Figure: observed inputs X → nature → observed responses Y]
– Can we understand Y? Regression on X to identify the causal inputs
– Can we predict Y? Regression, decision trees, whatever works

10 Need to shorten the breeding cycle
– Selection intensity $i$ cumulates over breeding cycles, so more cycles per year means more gain per year
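Dividing the response per cycle, $R = i\, r_A\, \sigma_A$ (slide 28), by the cycle length in years gives the gain per year, which is why shortening the cycle matters. The helper below is a sketch; the intensities are taken from slide 14, while the accuracies and the phenotypic cycle length are hypothetical placeholders.

```python
def gain_per_year(i, r, sigma_a, years_per_cycle):
    """Response per cycle (i * r * sigma_A) divided by cycle length in years."""
    return i * r * sigma_a / years_per_cycle

# Intensities from slide 14; accuracies (r) and the phenotypic cycle length
# are hypothetical placeholders, not values from the presentation.
schemes = {
    "phenotypic": dict(i=2.4, r=0.7, years_per_cycle=3.0),
    "FastGS": dict(i=1.7, r=0.5, years_per_cycle=1 / 3),
}
for name, s in schemes.items():
    print(f"{name}: {gain_per_year(sigma_a=1.0, **s):.2f} sigma_A per year")
```

Even with a lower intensity and a lower assumed accuracy per cycle, the scheme with three cycles per year comes out ahead under these placeholder inputs.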

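11 Phenotypic Selection
[Figure: breeding cycle of cross, inbreed, phenotype, select, and release. Inbreeding: F1 × inducer, then self to DH0 (2 seasons). Phenotyping: 1 rep with N = 2270 and S = 100, then 5 reps with N = 100 and S = 10 (2 years). Other stages are labeled 1 season and 3 years.]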

12 Genomic Selection
[Figure: the same cycle of cross, inbreed, phenotype, select, and release, completed in 1 year!]

13 FastGS
[Figure: the same cycle completed in 1 season = ⅓ year!!]

14 Selection Intensities
– Phenotypic: N = 2270, S = 10: $i = 2.4$
– FastGS: N = 370, S = 43: $i = 1.7$; 9 × $i$ ≅ 15
– Inbreeding: (!!!)
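Under truncation selection on a normally distributed criterion, the intensity is $i = \varphi(x)/p$, where $p = S/N$ is the proportion selected and $x$ the truncation point. The sketch below uses Python's `statistics.NormalDist` and the large-population approximation. For FastGS it gives $i \approx 1.7$ and $9i \approx 15$, matching the slide. Treating the phenotypic scheme as a single 2270 → 10 cut gives about 2.9, higher than the slide's 2.4; the slide presumably accounts for the two-stage scheme of slide 11, which this single-stage formula does not model.

```python
from statistics import NormalDist

def selection_intensity(n_candidates, n_selected):
    """Truncation-selection intensity i = phi(x) / p for a normal criterion.

    p is the proportion selected and x the truncation point with P(Z > x) = p.
    Large-population approximation; ignores multi-stage selection and
    finite-sample corrections.
    """
    p = n_selected / n_candidates
    z = NormalDist()          # standard normal
    x = z.inv_cdf(1 - p)      # truncation point
    return z.pdf(x) / p

i_fast = selection_intensity(370, 43)
i_pheno = selection_intensity(2270, 10)
print(f"FastGS: i = {i_fast:.2f}; over nine cycles: {9 * i_fast:.1f}")
print(f"Phenotypic, treated as one stage: i = {i_pheno:.2f}")
```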

15 Rates of gain per year

16 Impacts
Schaeffer, L.R. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123

17 Schaeffer: cost per genetic standard deviation
– Phenotypic: $116 M
– Genomic: $4.2 M

18 Potential Impact
Heffner, E.L. et al. Genomic Selection for Crop Improvement. Crop Science 49
[Figure: two linked cycles. Line development cycle: make crosses and advance generations → genotype new germplasm → genomic selection: advance lines with highest GEBV → test varieties and release. Model training cycle: advance lines informative for model improvement → phenotype (lines have already been genotyped) → train prediction model → updated model used for genomic selection.]

19 What (I think) is revolutionary
[Figure: the line development and model training cycles of slide 18, contrasted with phenotypic selection]
For a century, breeding has focused on better ways to evaluate lines. Henceforth it will focus on how to improve a model.

20 A Focus for Information
[Figure: population improvement (select, cross, cultivar release) driven by genomic prediction model development, which draws on current pheno–geno data, historical pheno–geno data, linkage and association mapping, and biological knowledge]

21 The Alleletarian Revolution
– The breeding line as the focus of evaluation has been dethroned in favor of the allele
– A line is useful to us only with respect to the alleles it carries
– Time-honored practice: replicate lines (progeny test)
– But alleles are replicated regardless of which line carries them

22 Methods
Linear models:
– Effects are random
– Methods differ in marker-effect priors
Machine learning methods:
– Regression trees

23 Linear models: priors on coefficients
– Ridge regression
– BayesB (SSVS)
– BayesCπ
[Figure: prior on the marker coefficients for each method]
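The prior equations on this slide did not survive the transcript. For reference, the textbook forms of these priors are sketched below; they are assumed, not copied from the slide. The slide labels its BayesB variant SSVS (stochastic search variable selection), which typically replaces the exact zero by a narrow normal "spike"; its exact form is not recoverable here.

```latex
\begin{aligned}
\text{Ridge regression:}\quad
  & \beta_j \sim N(0,\ \sigma^2_\beta) \quad \text{for every marker } j \\[4pt]
\text{BayesB:}\quad
  & \beta_j \sim
    \begin{cases}
      0 & \text{with probability } \pi \\
      N(0,\ \sigma^2_{\beta_j}) & \text{else}
    \end{cases}
    \qquad \sigma^2_{\beta_j} \sim \nu S^2 \chi^{-2}_{\nu} \\[4pt]
\text{BayesC}\pi:\quad
  & \beta_j \sim
    \begin{cases}
      0 & \text{with probability } \pi \\
      N(0,\ \sigma^2_\beta) & \text{else}
    \end{cases}
    \qquad \pi \text{ estimated from the data}
\end{aligned}
```

In BayesB $\pi$ is fixed in advance and each included marker has its own variance; in BayesCπ all included markers share one variance and $\pi$ itself is estimated.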

24 [Figure: prior densities of marker effects and Var(β) for ridge regression, BayesB, and BayesCπ]
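The shapes in this figure can be explored by sampling from each prior. This sketch assumes hypothetical hyperparameters (`SIGMA2`, `PI`, `NU`, `S2`). It reports the share of effects that are exactly zero and the excess kurtosis of the nonzero ones: ridge regression gives no zeros and normal tails, BayesCπ adds a spike at zero, and BayesB's marker-specific variances give the nonzero effects heavy tails.

```python
import numpy as np

# Draw marker effects from each prior; hyperparameters are hypothetical.
rng = np.random.default_rng(0)
N = 200_000          # number of simulated marker effects
SIGMA2 = 0.01        # common effect variance (ridge regression, BayesCpi)
PI = 0.95            # prior probability that a marker has zero effect
NU, S2 = 4.0, 0.01   # scaled inverse chi-square hyperparameters (BayesB)

# Ridge regression: every marker gets a small normal effect
ridge = rng.normal(0.0, np.sqrt(SIGMA2), N)

# BayesCpi: point mass at zero with probability PI, else common-variance normal
in_model = rng.random(N) > PI
bayes_c = np.where(in_model, rng.normal(0.0, np.sqrt(SIGMA2), N), 0.0)

# BayesB: point mass at zero, else normal with a marker-specific variance
# drawn as sigma2_j = NU * S2 / chi2(NU)  (scaled inverse chi-square)
sigma2_j = NU * S2 / rng.chisquare(NU, N)
bayes_b = np.where(in_model, rng.normal(0.0, np.sqrt(sigma2_j)), 0.0)

def excess_kurtosis(x):
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean() ** 2 - 3.0

for name, b in [("ridge", ridge), ("BayesCpi", bayes_c), ("BayesB", bayes_b)]:
    nonzero = b[b != 0]
    print(f"{name:9s} share exactly zero: {np.mean(b == 0):.3f}  "
          f"excess kurtosis of nonzero effects: {excess_kurtosis(nonzero):.1f}")
```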

25 Machine learning methods
Random Forests
– Forest of regression trees
– Each tree is grown on a bootstrapped sample
– Nodes split on randomly sampled features
– Prediction is the forest mean
Can capture interactions
[Figure: example regression tree splitting on marker genotypes (0/1) at M1 and M2]
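A from-scratch sketch of the Random Forest recipe on this slide (bootstrapped samples, randomly sampled markers at each node, forest mean as prediction), written for 0/1 marker genotypes. Tree depth, leaf size, the number of markers tried per node (`n_try`), and the toy data with an M1 × M2 interaction are all assumptions for illustration; a real analysis would use an established library.

```python
import numpy as np

def grow_tree(X, y, depth, rng, max_depth=4, n_try=3, min_leaf=5):
    """Grow a regression tree on 0/1 marker genotypes, stored as nested tuples."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return float(y.mean())                       # leaf: mean phenotype
    best = None
    # Random Forest: each node considers only a random subset of markers
    for j in rng.choice(X.shape[1], size=n_try, replace=False):
        left = X[:, j] == 0
        n_left = int(left.sum())
        if n_left < min_leaf or len(y) - n_left < min_leaf:
            continue
        sse = (((y[left] - y[left].mean()) ** 2).sum()
               + ((y[~left] - y[~left].mean()) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, int(j), left)
    if best is None:
        return float(y.mean())
    _, j, left = best
    return (j,
            grow_tree(X[left], y[left], depth + 1, rng, max_depth, n_try, min_leaf),
            grow_tree(X[~left], y[~left], depth + 1, rng, max_depth, n_try, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):                    # internal node: (marker, left, right)
        j, left, right = node
        node = left if x[j] == 0 else right
    return node

def random_forest(X, y, n_trees=100, seed=0, **tree_args):
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)              # bootstrap sample of lines
        trees.append(grow_tree(X[idx], y[idx], 0, rng, **tree_args))
    return trees

def predict_forest(trees, X):
    # Forest prediction = mean over trees
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X])

# Toy data (hypothetical): 6 markers; value only when M1 and M2 both carry allele 1
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(400, 6))
g = 2.0 * X[:, 0] * X[:, 1]                           # true genetic value
y = g + rng.normal(0, 0.5, len(g))

trees = random_forest(X, y)
forest_mse = np.mean((predict_forest(trees, X) - g) ** 2)

Z = np.column_stack([np.ones(len(y)), X])             # additive linear model
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
linear_mse = np.mean((Z @ coef - g) ** 2)
print(f"MSE vs. true genetic value: forest {forest_mse:.3f}, linear {linear_mse:.3f}")
```

Because the genetic value here exists only when both M1 and M2 carry allele 1, the additive linear model leaves the interaction in its residual, while the trees can split on M1 and then on M2.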

26 Additive models and breeding value
– Breeding value = mean phenotype of progeny
– The most important criterion for selecting parents
– Recombination: parents do not always pass their combinations of genes to their progeny
– => Breeding value is the sum of individual locus effects
Linear models capture this; machine learning methods may not

27 Theory
– How accurate will GS be?
– Impact of GS on inbreeding / loss of diversity
– Genomic selection captures pedigree relatedness among candidates

28 Prediction accuracy = correlation(predicted, true)
– $R = i\, r_A\, \sigma_A$
– $r_A = \mathrm{corr}(\text{selection criterion}, \text{breeding value})$
– On simulated data, $\mathrm{corr}(\hat{A}, A)$ is easy
– On real data:
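On simulated data the true breeding values are known, so both $r_A = \mathrm{corr}(\hat{A}, A)$ and the breeder's equation can be checked directly. This sketch assumes a normally distributed selection criterion with a chosen accuracy (`R_A`), a hypothetical $\sigma_A$, and truncation selection of the top 10%; the realized response should match $i\, r_A\, \sigma_A$ up to sampling error.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical values for illustration
rng = np.random.default_rng(7)
N, PROP_SELECTED = 200_000, 0.10
SIGMA_A, R_A = 2.0, 0.6          # additive SD and accuracy r_A

# True breeding values A and a unit-variance criterion with corr(criterion, A) = R_A
A = rng.normal(0.0, SIGMA_A, N)
criterion = R_A * A / SIGMA_A + np.sqrt(1 - R_A**2) * rng.normal(0.0, 1.0, N)

# Truncation selection on the criterion
n_sel = int(PROP_SELECTED * N)
selected = np.argsort(criterion)[-n_sel:]
realized_R = A[selected].mean() - A.mean()

# Prediction from R = i * r_A * sigma_A
z = NormalDist()
i = z.pdf(z.inv_cdf(1 - PROP_SELECTED)) / PROP_SELECTED
predicted_R = i * R_A * SIGMA_A

print(f"r_A on simulated data: {np.corrcoef(criterion, A)[0, 1]:.3f}")
print(f"realized R = {realized_R:.3f}, predicted i*r_A*sigma_A = {predicted_R:.3f}")
```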

29 Predicting prediction accuracy
Daetwyler, H.D. et al. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PLoS ONE 3:e3395
– Assume all loci affecting the trait are known and are independent
– Assume marker effects are fixed

30 [Figure: accuracy as a function of λ]
Replicating hurts: 2000 lines with 1 plot each is better than 1000 lines with 2 plots each
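Using the approximation of Daetwyler et al. (2008), $r = \sqrt{N_P h^2 / (N_P h^2 + N_G)}$, together with the heritability of a line mean over $n$ plots, the comparison on this slide can be reproduced. The plot-level heritability (0.3) and $N_G$ (1000) below are hypothetical.

```python
from math import sqrt

def daetwyler_accuracy(n_p, h2, n_g):
    """Approximate GS accuracy r = sqrt(N_P h^2 / (N_P h^2 + N_G))."""
    return sqrt(n_p * h2 / (n_p * h2 + n_g))

def entry_mean_h2(h2_plot, n_reps):
    """Heritability of a line mean over n_reps plots: V_G / (V_G + V_E / n_reps),
    with phenotypic variance scaled to 1 so V_G = h2_plot and V_E = 1 - h2_plot."""
    return h2_plot / (h2_plot + (1 - h2_plot) / n_reps)

# Hypothetical inputs: plot-level h2 = 0.3, N_G = 1000, 2000 plots in total
H2_PLOT, N_G = 0.3, 1000
r_unreplicated = daetwyler_accuracy(2000, entry_mean_h2(H2_PLOT, 1), N_G)
r_replicated = daetwyler_accuracy(1000, entry_mean_h2(H2_PLOT, 2), N_G)
print(f"2000 lines x 1 plot:  r = {r_unreplicated:.3f}")
print(f"1000 lines x 2 plots: r = {r_replicated:.3f}")
```

Under these assumptions the conclusion does not depend on the chosen values: with the same number of plots, two replicates per line turn $N_P h^2$ into $N_P h^2 / (1 + h^2)$, which is always smaller.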

31 Predicting prediction accuracy
Hayes, B.J. et al. Increased accuracy of artificial selection by using the realized relationship matrix. Genetics Research 91
– Detail on the population genetics that drive $n_G$
– Assume marker effects are random
– Still assume all markers are independent and estimated separately

32 Analytical approximations
[Figure: predicted accuracy against $N_P / N_G$ under Daetwyler et al., 2008 and Hayes et al., 2009]

33 Take Homes
– Even for traits of very low heritability ($h^2 = 0.01$), a sufficiently large $n_P$ gives accuracy
– Replication may not be good
– The number of loci estimated ($n_G$) is a critical parameter
– If you don't know where the QTL are, higher marker coverage requires a higher $n_G$
N.B. All conclusions assume 100% LD between markers and QTL!
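Solving the Daetwyler et al. (2008) approximation $r^2 = N_P h^2 / (N_P h^2 + N_G)$ for the training population size gives $N_P = r^2 N_G / (h^2 (1 - r^2))$. The sketch below computes the $N_P$ needed for a target accuracy of 0.5 with a hypothetical $N_G = 1000$. It illustrates two take homes at once: very low heritability can be offset by a large $N_P$, and the required $N_P$ grows linearly with $N_G$.

```python
import math

def training_size_needed(target_r, h2, n_g):
    """Smallest N_P with r >= target_r under r^2 = N_P h^2 / (N_P h^2 + N_G),
    i.e. N_P = r^2 N_G / (h^2 (1 - r^2)), rounded up."""
    return math.ceil(target_r**2 * n_g / (h2 * (1 - target_r**2)))

# Hypothetical target accuracy 0.5 and N_G = 1000
for h2 in (0.5, 0.1, 0.01):
    print(f"h2 = {h2}: N_P = {training_size_needed(0.5, h2, 1000)}")
```

For $h^2 = 0.01$ this comes to about 33,000 lines, and doubling $N_G$ doubles every requirement.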

34 Genetic diversity loss / inbreeding
Daetwyler, H.D. et al. Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124
– Avoid selecting close relatives together
– What is the correlation between the estimated breeding values of full sibs?
[Figure: correlation of sibling estimates]

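35 Genetic diversity loss / inbreeding
$A_j = \tfrac{1}{2}A_S + \tfrac{1}{2}A_D + a_j$, where $a_j$ is the Mendelian sampling term ($S$ = sire, $D$ = dam)
[Figure: correlation of sibling estimates, with the variance of the estimates split into between-family ($\sigma^2_B$) and within-family ($\sigma^2_W$) parts. BLUP: $\sigma^2_W = 0$; GS: $\sigma^2_W > 0$]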
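A toy simulation with two full sibs per family shows why capturing the Mendelian sampling term lowers the correlation between sib estimates. It assumes unrelated, non-inbred parents with known breeding values, a pedigree estimate equal to the parent average, and a hypothetical genomic estimate that adds a fraction `K` of each sib's own $a_j$. It is an illustration of the variance partition, not a model of BLUP or of any particular GS method.

```python
import numpy as np

rng = np.random.default_rng(3)
N_FAMILIES = 20_000
SIGMA2_A = 1.0
K = 0.7   # hypothetical fraction of each sib's Mendelian sampling term captured

# Parents' breeding values (sire S, dam D): non-inbred and unrelated
A_S = rng.normal(0, np.sqrt(SIGMA2_A), N_FAMILIES)
A_D = rng.normal(0, np.sqrt(SIGMA2_A), N_FAMILIES)
parent_avg = 0.5 * A_S + 0.5 * A_D

# Two full sibs per family: A_j = 1/2 A_S + 1/2 A_D + a_j, Var(a_j) = 1/2 sigma2_A
a1 = rng.normal(0, np.sqrt(0.5 * SIGMA2_A), N_FAMILIES)
a2 = rng.normal(0, np.sqrt(0.5 * SIGMA2_A), N_FAMILIES)

# Pedigree-based estimate (no own records): both sibs get the parent average
blup1, blup2 = parent_avg, parent_avg.copy()
# Hypothetical genomic estimate: parent average plus part of each sib's own a_j
gs1, gs2 = parent_avg + K * a1, parent_avg + K * a2

corr_blup = np.corrcoef(blup1, blup2)[0, 1]   # sigma2_W = 0
corr_gs = np.corrcoef(gs1, gs2)[0, 1]         # sigma2_W > 0
print(f"sib correlation, pedigree estimate: {corr_blup:.3f}")
print(f"sib correlation, genomic estimate:  {corr_gs:.3f}")
```

With `K = 0.7` the expected genomic sib correlation is $0.5 / (0.5 + 0.7^2 \times 0.5) \approx 0.67$, versus 1 for the parent average.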

36 Daetwyler et al. Take Homes
Genomic selection captures the Mendelian sampling term:
– The correlation between estimates of sibling performance is reduced
– Co-selection of sibs is reduced
– The rate of inbreeding / loss of diversity is reduced

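37 A word on pedigree relatedness
Five individuals: a, b, c, d, and e
– a, b, and c are unrelated
– d is the offspring of a and b
– e is the offspring of a and c
$$A = \begin{pmatrix} 1 & 0 & 0 & \tfrac12 & \tfrac12 \\ 0 & 1 & 0 & \tfrac12 & 0 \\ 0 & 0 & 1 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 & 1 & \tfrac14 \\ \tfrac12 & 0 & \tfrac12 & \tfrac14 & 1 \end{pmatrix}$$
(rows and columns in the order a, b, c, d, e)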
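The relationship matrix on this slide follows from the tabular method: $A_{ii} = 1 + \tfrac12 A_{sd}$ and $A_{ij} = \tfrac12 (A_{js} + A_{jd})$, where $s$ and $d$ are the parents of $i$. A minimal sketch using exact fractions, assuming the pedigree is listed so that parents come before their offspring:

```python
from fractions import Fraction

def numerator_relationship(pedigree):
    """Tabular method for the numerator relationship matrix A.

    pedigree: list of (name, sire, dam), ordered so parents precede offspring;
    unknown parents are None.
    """
    names = [name for name, _, _ in pedigree]
    idx = {name: k for k, name in enumerate(names)}
    n = len(names)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = idx.get(sire), idx.get(dam)
        # Diagonal: 1 + inbreeding coefficient, where F_i = A[s][d] / 2
        A[i][i] = Fraction(1) + (A[s][d] / 2 if s is not None and d is not None else 0)
        for j in range(i):
            # Off-diagonal: half the sum of j's relationships with i's parents
            a_js = A[j][s] if s is not None else 0
            a_jd = A[j][d] if d is not None else 0
            A[i][j] = A[j][i] = Fraction(a_js + a_jd) / 2
    return names, A

pedigree = [("a", None, None), ("b", None, None), ("c", None, None),
            ("d", "a", "b"), ("e", "a", "c")]
names, A = numerator_relationship(pedigree)
for name, row in zip(names, A):
    print(name, [str(x) for x in row])
```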

38 Ridge Regression
Habier, D. et al. Genetics 177
Hayes, B.J. et al. Genetics Research 91:47-60
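Ridge regression on markers and BLUP with a realized (genomic) relationship matrix give identical predictions, which is one way to see how ridge regression captures relatedness among lines. This sketch checks the identity $X(X^\top X + \lambda I)^{-1}X^\top y = XX^\top(XX^\top + \lambda I)^{-1} y$ numerically on hypothetical data; `G` is left unscaled here, since scaling it by a constant only rescales $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(11)
n, p, lam = 50, 400, 25.0     # hypothetical: more markers than lines; ridge penalty
X = rng.integers(0, 2, size=(n, p)).astype(float)
X -= X.mean(axis=0)           # center the marker codes
y = rng.normal(size=n)
y -= y.mean()

# Ridge regression on marker effects (p x p system)
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
gebv_rr = X @ beta

# Same predictions from a line-by-line relationship matrix (n x n system)
G = X @ X.T                   # unscaled realized (genomic) relationship matrix
gebv_g = G @ np.linalg.solve(G + lam * np.eye(n), y)

print(np.allclose(gebv_rr, gebv_g))  # True
```

The second form solves an $n \times n$ system instead of a $p \times p$ one, which is cheaper when markers far outnumber lines.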

39 Habier et al. simulation setup

40 Genetic relationship decays fast
[Figure: accuracy in the generations after the training population]
– Prediction from pedigree relationship loses accuracy very quickly
– The decay rate is initially higher, then stabilizes after about 5 generations
– The rapid initial decay reflects that the closest marker may not be in highest LD with the QTL
– RR-BLUP accuracy decays more rapidly than BayesB because more markers absorb the effect of a QTL

41 Habier et al. Take homes
– The ability of genomic selection to capture information on genetic relatedness is valuable
– That information decays rapidly
– The amount of that information relates to the number of markers fitted by a model: ridge regression > BayesB
– BayesB captured more LD information: long-term accuracy BayesB > ridge regression

42 Accuracy due to relationships vs. LD

43 Stochastic vs. deterministic prediction
[Figure: accuracy against $N_P / N_G$; simulation results of Zhong et al. and Habier et al. compared with deterministic predictions]

44 To replicate or not to replicate
[Figure: ridge regression and BayesB accuracy with 504 lines replicated once vs. 168 lines replicated three times]

45 Genetic diversity loss / inbreeding
$A_j = \tfrac{1}{2}A_S + \tfrac{1}{2}A_D + a_j$, where $a_j$ is the Mendelian sampling term
[Figure: correlation of sibling estimates. BLUP: $\sigma^2_W = 0$; GS: $\sigma^2_W > 0$]
Capturing relationship information increases $\sigma^2_B$, NOT $\sigma^2_W$

46 Simulation setting: Meuwissen; Habier; Solberg
– $N_e = 100$; 1000 generations
– Mutation / drift / recombination equilibrium
– High marker mutation rate (2.5 × … per locus per generation); higher "haplotype mutation rate"
– Mutation effect distribution Gamma(1.66, 0.4): the "effective QTL number" is only about 6 (!)
=> Watch out how you simulate!

47 Results
Prediction accuracy estimated by simulation
[Table: RR-BLUP and BayesB accuracies across simulation studies]
These accuracies are ASTOUNDING
If $h^2 = 1$, $r =$

48 Noteworthy discussion
– Markers flanking a QTL are not always in the model: QTL effects are captured by multiple markers, and there is no need to "detect" QTL
– Recombination causes accuracy to decay, faster than if the QTL were captured by flanking markers: markers far from the QTL contribute to capturing its effect
– $N_e / 2$ markers per Morgan achieves close to maximum accuracy: dependent on high marker mutation rates (?)
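Why recombination erodes the contribution of distant markers: under random mating, ignoring drift and mutation, the disequilibrium between two loci decays as $D_t = D_0 (1 - c)^t$. The sketch converts map distance to a recombination fraction with the Haldane map function, which is an assumption (other map functions exist).

```python
import math

def haldane_c(distance_cm):
    """Haldane map function: recombination fraction from map distance in cM."""
    return 0.5 * (1 - math.exp(-2 * distance_cm / 100))

def ld_after(d0, c, generations):
    """Expected D_t = D_0 (1 - c)^t under random mating, no drift or mutation."""
    return d0 * (1 - c) ** generations

# Fraction of the initial marker-QTL disequilibrium left after 10 generations
for distance in (1, 10, 50):
    remaining = ld_after(1.0, haldane_c(distance), 10)
    print(f"{distance:>2} cM: {remaining:.3f} of D_0 remains")
```

After 10 generations a marker 1 cM from a QTL keeps about 90% of its initial $D$, while one 50 cM away keeps about 2%.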

49 Solberg et al.
Density: number of markers per Morgan
– SSR: ¼ $N_e$, ½ $N_e$, 1 $N_e$, 2 $N_e$
– SNP: 1 $N_e$, 2 $N_e$, 4 $N_e$, 8 $N_e$

50 Zhong et al.
Zhong, S. et al. Genetics 182
– Diverse 2-row barley
– 1040 markers, approximately evenly spaced
– Mating designs to generate training datasets of 500 with high and low LD
– 20 or 80 QTL; $h^2 =$

51 Ridge regression vs. BayesB (Zhong et al.)
[Figure: accuracy of ridge regression and BayesB for 20 QTL/high LD, 20 QTL/low LD, 80 QTL/high LD, and 80 QTL/low LD, with QTL observed or unobserved]

52 Take-home messages
– Ridge regression is not affected by the number of QTL / the QTL effect size
– BayesB performs better with large marker-associated effects
– Co-linearity is more detrimental to BayesB
– High marker density and training population size? Yes: BayesB; No: RR-BLUP

53 VanRaden et al.
VanRaden, P.M. et al. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92

54 VanRaden et al.
Some traits have major genes, others do not

55 VanRaden et al.
The larger the training population, the better. Where diminishing returns will begin is not in sight.

56 Take Homes
– Training population requirements are very large
– BayesB did not help: no large marker-associated effects
– Like the "case of the missing heritability" in human GWAS
– Are many quantitative traits driven by very low-frequency variants?
– RR would capture this case better than BayesB

57 Empirical data on crops: training population (TP) size

58 Empirical data on crops: marker number

59 Empirical data on humans: marker number
Yang et al. Nat. Genet. /ng.608
Out of 295K SNPs

60 Long-term genomic selection
– Marker data from an elite six-row barley program
– 880 markers; 100 hidden as additive-effect QTL
– Evaluate 200 progeny, select 20
– Phenotypic compared to genomic selection
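The setup on this slide (hide 100 of 880 markers as QTL, evaluate 200 progeny, select 20) can be sketched as below. The real study used marker data from an elite six-row barley program; here random independent 0/1 genotypes, normal QTL effects, and $h^2 = 0.5$ are hypothetical stand-ins. Because random genotypes carry no LD, the hidden QTL would not be tagged by the remaining markers the way they are in real data.

```python
import numpy as np

rng = np.random.default_rng(5)
N_MARKERS, N_QTL, N_PROGENY, N_SELECT = 880, 100, 200, 20
H2 = 0.5                                          # hypothetical heritability

# Stand-in for the real marker data: random 0/1 genotypes (no LD!)
genotypes = rng.integers(0, 2, size=(N_PROGENY, N_MARKERS))

# Hide 100 markers as additive-effect QTL; the other 780 remain visible
qtl = rng.choice(N_MARKERS, size=N_QTL, replace=False)
visible = np.setdiff1d(np.arange(N_MARKERS), qtl)
qtl_effects = rng.normal(0.0, 1.0, N_QTL)          # assumed effect distribution

genotypic_value = genotypes[:, qtl] @ qtl_effects  # known only to the simulation
marker_data = genotypes[:, visible]                # what a GS model would see

# Phenotypes for phenotypic selection: genotypic value + environmental noise
noise_sd = np.sqrt(genotypic_value.var() * (1 - H2) / H2)
phenotype = genotypic_value + rng.normal(0.0, noise_sd, N_PROGENY)

# Evaluate 200 progeny, select 20 (here on phenotype)
selected = np.argsort(phenotype)[-N_SELECT:]
gain = genotypic_value[selected].mean() - genotypic_value.mean()
print(f"{marker_data.shape[1]} visible markers; genotypic gain of the 20 selected: {gain:.2f}")
```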

61 Breeding / model update cycles
– Evaluation is possible every other season
– Candidates from every other cycle can be evaluated
– There is still a lag: parents of C2 are selected based on the evaluation of C0
[Figure: seasons 1 to 6. Phenotypic selection alternates "cross & inbreed" with "evaluate & select"; genomic selection performs "cross, inbreed & select" every season, with evaluation running alongside.]

62 Response in genotypic value
[Figure: mean genotypic value vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population]

63 Accuracy
[Figure: mean realized accuracy vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population]

64 Genetic variance
[Figure: mean genotypic standard deviation vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population]

65 Lost favorable alleles
[Figure: mean number of lost favorable alleles vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population]

66 Goddard 2008; Hayes et al.

67 Response in genotypic value
[Figure: mean genotypic value vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population, shown for unweighted and weighted genomic selection]

68 Genetic variance
[Figure: mean genotypic standard deviation vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population, shown for unweighted and weighted genomic selection]

69 Lost favorable alleles
[Figure: mean number of lost favorable alleles vs. breeding cycle for phenotypic selection and for genomic selection with a small or a large training population, shown for unweighted and weighted genomic selection]

70 Long-term genomic selection
– The acceleration of the breeding cycle is key
– Some favorable alleles will be lost, likely those not in LD with any marker
– Managing diversity / favorable alleles appears to be a good idea
– This can be done using the same data as used for genomic prediction
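One way to manage favorable alleles with the same data is to upweight markers whose favorable allele is rare, for example by $1/\sqrt{p_{\text{fav}}}$. This is an assumption here: the transcript does not state the weighting behind the "weighted" curves of slides 67 to 69. A minimal sketch with hypothetical inputs:

```python
import numpy as np

def favorable_allele_weights(X, beta, min_freq=0.01):
    """Hypothetical weights 1 / sqrt(p_fav), one per marker.

    X: lines x markers, coded 0/1, where beta[j] is the effect of allele 1.
    The favorable allele is allele 1 if beta[j] > 0, else allele 0.
    Frequencies are clipped at min_freq so lost alleles do not divide by zero.
    """
    p1 = X.mean(axis=0)
    p_fav = np.where(beta > 0, p1, 1.0 - p1)
    return 1.0 / np.sqrt(np.clip(p_fav, min_freq, 1.0))

def weighted_gebv(X, beta):
    """GEBV with each marker effect multiplied by its weight."""
    return X @ (beta * favorable_allele_weights(X, beta))

# Toy example: equal effects; the favorable allele is rare at marker 1 (freq 0.1)
# and common at marker 2 (freq 0.9)
X = np.array([[1, 0]] + [[0, 1]] * 9, dtype=float)
beta = np.array([1.0, 1.0])
print("unweighted:", X @ beta)
print("weighted:  ", weighted_gebv(X, beta).round(2))
```

Unweighted, all ten lines tie; weighted, the line carrying the rare favorable allele ranks first.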

71 Introgressing diversity
– GS relies on marker–QTL allele association
– An "exotic" line comes from a sub-population divergent from the breeding population
– After sub-populations separate, drift moves allele frequencies independently, and drift & recombination shift associations independently
Will the GS prediction model identify valuable segments from the exotic?

72 Three approaches
– Create a bi-parental family with the exotic (Bernardo 2009): develop a mini-training population for that family, improve the family, then bring it into the main breeding population
– Develop a separate training population for the exotic sub-population (Ødegård et al. 2009)
– Develop a single multi-subpopulation (species-wide?) training population (Goddard 2006)

73 Need higher marker density
– Tightly linked markers: ancestral LD
– Loosely linked markers: sub-population-specific LD

74 Consistency of association across barley subpopulations
[Figure: correlation of r between subpopulations against genetic distance, from 0 cM to 5 cM recombination distance]
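The statistic in this figure can be computed as follows: estimate the LD correlation $r$ for every marker pair in each subpopulation, keep the pairs within a distance bin, and correlate the two sets of $r$ values. The simulated lines are hypothetical: both subpopulations share an ancestral phase along a 300 cM linkage group, so tightly linked pairs should show consistent $r$ and distant pairs should not.

```python
import numpy as np

def simulate_lines(n_lines, pos_cm, phase, rng):
    """Hypothetical 0/1 inbred lines with LD that decays with map distance.

    With probability 1 - 2c (c = Haldane recombination fraction to the previous
    marker) a line inherits the previous allele, flipped if phase[k] == 1;
    otherwise it draws a fresh random allele.
    """
    X = np.empty((n_lines, len(pos_cm)), dtype=int)
    X[:, 0] = rng.integers(0, 2, n_lines)
    for k in range(1, len(pos_cm)):
        c = 0.5 * (1 - np.exp(-2 * (pos_cm[k] - pos_cm[k - 1]) / 100))
        linked = rng.random(n_lines) < 1 - 2 * c
        X[:, k] = np.where(linked, X[:, k - 1] ^ phase[k], rng.integers(0, 2, n_lines))
    return X

def consistency_of_r(X1, X2, pos_cm, min_cm, max_cm):
    """Correlation between subpopulations of the LD r for marker pairs
    separated by more than min_cm and at most max_cm."""
    r1 = np.corrcoef(X1, rowvar=False)
    r2 = np.corrcoef(X2, rowvar=False)
    i, j = np.triu_indices(len(pos_cm), k=1)
    dist = np.abs(pos_cm[i] - pos_cm[j])
    keep = (dist > min_cm) & (dist <= max_cm)
    return np.corrcoef(r1[i[keep], j[keep]], r2[i[keep], j[keep]])[0, 1]

rng = np.random.default_rng(2)
pos = np.sort(rng.uniform(0, 300, 150))   # hypothetical 300 cM linkage group
phase = rng.integers(0, 2, len(pos))       # ancestral phase shared by both groups
sub1 = simulate_lines(150, pos, phase, rng)
sub2 = simulate_lines(150, pos, phase, rng)

tight = consistency_of_r(sub1, sub2, pos, 0, 5)
distant = consistency_of_r(sub1, sub2, pos, 150, 300)
print(f"pairs within 5 cM: {tight:.2f}; pairs 150-300 cM apart: {distant:.2f}")
```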

75 Example: Dairy cattle breeds

76 Oat sub-populations (UOPN)
G1: N = 136; G2: N = 149; G3: N = 161

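77 Combined sub-population TP (β-glucan)
[Figure: prediction of one sub-population (VP) from training populations (TP) built from single or combined sub-populations G1, G2, and G3]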

78 Introgressing diversity using GS
– Need higher marker density
– Analysis of the consistency of r may indicate whether the current density is sufficient; not sure we have it for barley
– If you have the density, a multi-subpopulation training population seems like a good idea: it focuses the model on tighter ancestral LD rather than looser sub-population-specific LD

