
255 Association Mapping versus Genomic Selection
Association Mapping: To discover genes and genetic variants that control a trait. Knowledge can be applied to understand mechanism and genetic architecture, design pathways with diversity, and generate ideas for transgenic improvement.
Genomic Selection: To identify germplasm with the best breeding values and performance. Can identify complementary varieties that should be crossed for future improvement.

256 Association-based selection methods: Genomic selection
We have MAS, why do we need something different?
Historical introduction to genomic selection
The basic idea
Methods
Theory
Selected simulation results
Empirical results
Long-term genomic selection
Introgressing diversity using GS

257 MAS problems
Relevant germplasm
Bias of estimated effects
Effects too small for detection

258 Association mapping identifies QTL rapidly while scanning relevant germplasm
[Figure: Research time (year; 1 to 5) versus Resolution (bp; 1, 1 x 10^4, 1 x 10^7) for F2 / BC, Recombinant inbred lines, Intermated recombinant inbreds, Near-isogenic lines, Positional cloning, Pedigree, and Association mapping; approaches are classed by Relevance to breeding germplasm: Low, High, Depends]

259 Bias in Effect Estimation
[Figure labels: Significance Threshold; Effect Estimate (True + Error); Average “Detected” Effect; Estimated Bias]
Keep in all loci => No threshold => Estimated effects are unbiased
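
The bias described on this slide can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the true effect, standard error, threshold, and the function name `detected_effect_bias` are all hypothetical choices. Every locus has the same true effect; averaging only the estimates that pass a threshold overstates it, while averaging all estimates does not.

```python
import random
import statistics

def detected_effect_bias(true_effect=0.2, std_error=0.2, z_threshold=2.58,
                         n_loci=20000, seed=1):
    """Simulate estimate = true + error; return (mean of all, mean of 'detected')."""
    rng = random.Random(seed)
    estimates = [true_effect + rng.gauss(0.0, std_error) for _ in range(n_loci)]
    # A locus counts as "detected" when its estimate exceeds the threshold
    # (one-sided, expressed in units of the standard error; an assumption).
    cutoff = z_threshold * std_error
    detected = [e for e in estimates if e > cutoff]
    return statistics.mean(estimates), statistics.mean(detected)

mean_all, mean_detected = detected_effect_bias()
print(f"true effect            : {0.2:.3f}")
print(f"mean of all estimates  : {mean_all:.3f}")
print(f"mean of detected only  : {mean_detected:.3f}")
```

Keeping all loci corresponds to the first average; applying the threshold corresponds to the second.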

260 In polygenic traits, much is hidden
X^2_gamma is a function of the significance threshold at which you want to detect the factors.
E.g., h2 = 0.8, α = 0.01, 1200 (Lande & Thompson 1990)

261 Genomic selection principles
Meuwissen et al Genetics 157:
No distinction between “significant” and “non-significant”; no arbitrary inclusion / exclusion: all markers contribute to prediction
More effects must be estimated than there are phenotypic observations
Estimated effects are unbiased
Capture small effects
Notes: Mention bias again. If time, mention verbally the bit about separate identification and estimation data sets. By “capture” I don’t mean to say that the regression coefficient associated with a marker is equal to the effect of an underlying QTL, but again, it IS unbiased.

262 Genomic selection: Prediction using many markers
Training Population: Genotyping & Phenotyping => Train GS Model
Breeding Material: Genotyping => Calculate GEBV => Make Selections
Meuwissen et al Genetics 157:
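
The train / predict workflow on this slide can be sketched with ridge regression on simulated data. Everything here is an assumption made for illustration: the population sizes, the -1 / 0 / 1 marker coding, the simulated effects, the shrinkage parameter `lam`, and the helper name `ridge_marker_effects`. Real analyses would estimate the shrinkage from variance components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_cand, n_markers = 500, 200, 1000   # more markers than phenotypes

# Hypothetical marker scores coded -1 / 0 / 1 and simulated additive effects.
X = rng.integers(0, 3, size=(n_train + n_cand, n_markers)).astype(float) - 1.0
true_beta = rng.normal(0.0, 0.1, size=n_markers)
breeding_value = X @ true_beta
# Residual SD equal to the genetic SD gives a heritability of about 0.5.
phenotype = breeding_value + rng.normal(0.0, breeding_value.std(),
                                        size=breeding_value.size)

X_train, y_train = X[:n_train], phenotype[:n_train]   # genotyped and phenotyped
X_cand = X[n_train:]                                   # genotyped only

def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) beta = X'(y - mean(y)) for all markers at once."""
    mu = y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - mu))
    return mu, beta

mu_hat, beta_hat = ridge_marker_effects(X_train, y_train, lam=1000.0)
gebv = mu_hat + X_cand @ beta_hat                      # calculate GEBV
accuracy = np.corrcoef(gebv, breeding_value[n_train:])[0, 1]
print(f"accuracy on unphenotyped candidates: {accuracy:.2f}")

# Make selections: indices of the 20 candidates with the highest GEBV.
selected = np.argsort(gebv)[::-1][:20]
```

The accuracy can be computed here only because the simulation knows the true breeding values.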

263 Statistical modeling: The two cultures
[Diagram: Nature maps observed inputs X to observed responses Y]
Can we understand Y? Regression: identify causal inputs
Can we predict Y? Regression, decision trees, whatever works
Breiman 2001 Stat. Sci. 16:

264 Need to shorten breeding cycle
i cumulates over breeding cycles

265 Phenotypic Selection: 3 Years
Cycle: Select, Cross, Inbreed, Phenotype, Release
Inbreed: F1 × Inducer (1 Season), Self DH0 (2 Seasons)
Phenotype (2 Years): 1 Rep, N=2270, S=100; then 5 Reps, N=100, S=10

266 Genomic Selection: 1 Year!
Cycle: Select, Cross, Inbreed, Phenotype, Release

267 FastGS: 1 Season = ⅓ Year!!
Cycle: Select, Cross, Inbreed, Phenotype, Release

268 Selection Intensities
Phenotypic: N = 2270, S = 10: i = 2.4
FastGS: N = 370, S = 43: i = 1.7; nine FastGS cycles fit in one three-year phenotypic cycle, so 9 × i ≅ 15 (!!!)
Inbreeding:
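
The FastGS intensity can be checked with the standard truncation-selection formula i = φ(z) / p, where p = S / N and z is the truncation point of a standard normal. This sketch assumes a single selection stage and a large population. It is applied only to the FastGS numbers, since the phenotypic scheme in these slides selects in two stages (N=2270, S=100, then N=100, S=10); the function name is hypothetical.

```python
from statistics import NormalDist

def selection_intensity(n_candidates, n_selected):
    """Standardized selection differential under single-stage truncation selection."""
    p = n_selected / n_candidates            # selected proportion
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)          # truncation point
    return std_normal.pdf(z) / p             # i = phi(z) / p

i_fastgs = selection_intensity(370, 43)
print(f"FastGS i per cycle        : {i_fastgs:.2f}")
print(f"summed over nine cycles   : {9 * i_fastgs:.1f}")
```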

269 Rates of gain per year

270 Impacts
Schaeffer, L.R. 2006. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123:

271 Cost per genetic standard deviation
Schaeffer 2006
Phenotypic: $116 M
Genomic: $4.2 M

272 Potential Impact
Heffner, E.L. et al Genomic Selection for Crop Improvement. Crop Science 49:1-12
Line Development Cycle: New Germplasm; Make crosses and advance generations; Genotype; Genomic Selection; Advance lines with highest GEBV; Test varieties and release
Model Training Cycle: Advance lines informative for model improvement; Phenotype (lines have already been genotyped); Train prediction model; Updated Model

273 What (I think) is revolutionary
[Diagram as on slide 272: Line Development Cycle (Genomic Selection, Phenotypic Selection) and Model Training Cycle]
For a century, breeding has focused on better ways to evaluate lines. Henceforth it will focus on how to improve a model.

274 A Focus for Information
[Diagram: Current pheno–geno data, Historical pheno–geno data, Linkage and association mapping, and Biological knowledge feed Genomic Prediction Model Development, which informs Select, Cross, Population Improvement, and Cultivar Release]

275 The Alleletarian Revolution
The breeding line as the focus of evaluation has been dethroned in favor of the allele
A line is useful to us only with respect to the alleles it carries
Time-honored practice: replicate (progeny test) lines
But alleles are replicated regardless of what line carries them

276 Methods
Linear models: Effects are random; Methods differ in marker effect priors
Machine learning methods: Regression trees

277 Linear models: Priors on coefficients
\mathbf{y} = \mu + \sum_k x_k \beta_k + \mathbf{e}, \quad \beta_k \sim \_?\_
Ridge regression: \beta_k \sim N(0, \sigma^2_\beta)
BayesB (SSVS): P(\beta_k = 0) = \pi, else \beta_k \sim N(0, \sigma^2_k), with \sigma^2_k \sim \chi^{-2}(\nu, S)
BayesCπ: P(\beta_k = 0) = \pi, else \beta_k \sim N(0, \sigma^2_\beta), with \sigma^2_\beta \sim \chi^{-2}(\nu_\beta, S_\beta) and \pi \sim U(0, 1)

278 [Figure: Density versus Var(β) for Ridge regression, BayesB, and BayesCπ]
Fig. Priors. Graphical representation of the priors for the variance of marker effects, β, for ridge regression, BayesB, and BayesCπ. Circles represent point mass probabilities; their height on the y-axis has no meaning but is for clarity. Ridge regression uses a single non-zero point mass for the variance of β (circle with black border). BayesB uses a mixture of a point mass at zero of a fixed size and a continuous scaled inverse χ2 distribution. BayesCπ uses a mixture of a point mass at zero and a non-zero point mass. The non-zero value is estimated (horizontal arrows), as is the mixture probability (allowing the size of the zero and non-zero point mass circles to fluctuate inversely to each other).

279 Machine learning methods
Random Forests
Forest of regression trees
Each tree on a bootstrapped sample
Nodes split on randomly sampled features
Prediction is forest mean
Can capture interactions
[Figure: example tree with nodes labeled M, M1, M2]

280 Additive models and breeding value
Breeding value = Mean phenotype of progeny
Most important parent selection criterion
Recombination: parents do not always pass combinations of genes to their progeny => Sum of individual locus effects
Linear models capture this; Machine learning methods may not

281 Theory
How accurate will GS be?
Impact of GS on inbreeding / loss of diversity
Genomic selection captures pedigree relatedness among candidates

282 Prediction accuracy = Correlation(predicted, true)
R = i r_A σ_A
r_A = corr(selection criterion, breeding value)
On simulated data corr(Â, A) is easy
On real data: Click for h^2 !
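
A small worked sketch of the two quantities on this slide. The response formula is the one given above. The real-data step is an assumption: dividing the correlation between predictions and phenotypes by the square root of h2 is a common approximation, and the slide's "Click for h^2" cue hints at it without spelling it out. Function names and example values are hypothetical.

```python
import math

def response_per_cycle(intensity, accuracy, sigma_a):
    """Expected response per cycle, R = i * r_A * sigma_A."""
    return intensity * accuracy * sigma_a

def accuracy_from_predictive_ability(corr_pred_pheno, h2):
    """Approximate corr(A_hat, A) as corr(A_hat, y) / sqrt(h2).

    Assumed approximation for real data, where true breeding values are unknown.
    """
    return corr_pred_pheno / math.sqrt(h2)

# Hypothetical inputs: i = 1.7, accuracy 0.5, genetic SD of 1 trait unit.
gain = response_per_cycle(1.7, 0.5, 1.0)
# Hypothetical inputs: predictions correlate 0.3 with phenotypes, h2 = 0.36.
r_a = accuracy_from_predictive_ability(0.3, 0.36)
print(gain, r_a)
```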

283 Predict prediction accuracy
Daetwyler, H.D. et al. 2008. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PLoS ONE 3:e3395
Assume all loci affecting the trait are known and are independent
Assume marker effects are fixed

284 Replicating hurts: 2000 with 1 plot is better than 1000 with 2 plots
[Figure: curves labeled 0.02, 0.5, 0.1, 1, 2, 5, 10, 20]
\lambda = \frac{n_P}{n_G}
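
The replication claim can be checked numerically. This sketch uses the approximation usually attributed to Daetwyler et al. for a quantitative trait, r = sqrt(λ h2 / (λ h2 + 1)) with λ = n_P / n_G. The single-plot heritability of 0.3, n_G = 1000, and the line-mean heritability formula are assumptions added here for illustration.

```python
import math

def expected_accuracy(n_p, n_g, h2):
    """r = sqrt(lambda * h2 / (lambda * h2 + 1)), with lambda = n_P / n_G."""
    lam = n_p / n_g
    return math.sqrt(lam * h2 / (lam * h2 + 1.0))

def line_mean_h2(h2_plot, n_reps):
    """Heritability of a line mean over n_reps plots, given single-plot h2."""
    return h2_plot / (h2_plot + (1.0 - h2_plot) / n_reps)

n_g, h2_plot = 1000, 0.3   # hypothetical values

# Same total of 2000 plots, allocated two ways.
unreplicated = expected_accuracy(2000, n_g, line_mean_h2(h2_plot, 1))
replicated = expected_accuracy(1000, n_g, line_mean_h2(h2_plot, 2))
print(f"2000 lines x 1 plot : {unreplicated:.3f}")
print(f"1000 lines x 2 plots: {replicated:.3f}")
```

Under these assumptions, doubling the number of lines doubles λ, whereas a second plot per line raises the line-mean heritability by less than a factor of two.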

285 Predict prediction accuracy
Hayes, B.J. et al. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genetics Research 91:47-60.
Detail on the population genetics that drive nG
Assume marker effects are random
Still assume all markers independent and estimated separately

286 Analytical approximations
[Figure: two panels, Daetwyler et al., 2008 and Hayes et al., 2009; x-axis: NP / NG]

287 Take Homes
Even with traits of very low heritability (h2 = 0.01), sufficient nP gives accuracy
Replication may not be good
The number of loci estimated (nG) is a critical parameter
If you don’t know where the QTL are, higher marker coverage requires higher nG
N.B. All conclusions assuming only 100% LD!

288 Genetic diversity loss / inbreeding
Daetwyler, H.D. et al. 2007. Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124:
Avoid selecting close relatives together
What is the correlation in the estimated breeding value between full sibs?
Sib co-selection increases the rate of inbreeding / loss of diversity
The extent of co-selection depends on the ratio of sigma_B captured relative to sigma_W captured
The Bulmer effect reduces sigma_B and therefore reduces co-selection
[Figure: Correlation of sibling estimates]

289 Genetic diversity loss / inbreeding
BLUP: \sigma^2_W = 0
GS: \sigma^2_W > 0
A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j, where a_j is the Mendelian sampling term
\sigma^2_W = var(\hat{a}_j)
\sigma^2_B = var(\frac{1}{2}\hat{A}_S + \frac{1}{2}\hat{A}_D)
[Figure: Correlation of sibling estimates]
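
One way to read the dependence on sigma_B and sigma_W, as a sketch: a candidate's estimate is the parent-average estimate plus the Mendelian sampling estimate. If those two parts are assumed uncorrelated, and the Mendelian sampling estimates of two sibs are assumed independent (assumptions made here, not stated on the slide), two full sibs share only the parent-average part, so their estimates correlate as σ²_B / (σ²_B + σ²_W). The function name is hypothetical.

```python
def full_sib_ebv_correlation(var_between, var_within):
    """Correlation between full-sib estimates: sigma2_B / (sigma2_B + sigma2_W).

    var_between: variance of the estimated parent average (sigma2_B)
    var_within : variance of the estimated Mendelian sampling term (sigma2_W)
    """
    return var_between / (var_between + var_within)

# No Mendelian sampling information captured: sibs get identical estimates.
print(full_sib_ebv_correlation(1.0, 0.0))
# Equal between- and within-family variance captured: sibs are separated.
print(full_sib_ebv_correlation(1.0, 1.0))
```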

290 Daetwyler et al. 2007 Take Homes
Genomic selection captures the Mendelian sampling term
Correlation between the estimates of sibling performance is reduced
Co-selection of sibs is reduced
Rate of inbreeding / loss of diversity is reduced

291 A word on pedigree relatedness
Five individuals: a, b, c, d, and e
a, b, and c are unrelated
d is the offspring of a and b
e is the offspring of a and c
A = [5 × 5 matrix with rows and columns a, b, c, d, e]
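
The relationship matrix A for this five-individual pedigree can be computed with the tabular method. This is a minimal sketch; the function name and the `(id, parent1, parent2)` input layout are choices made here for illustration, and parents must be listed before their offspring.

```python
def numerator_relationship(pedigree):
    """Tabular method for the additive (numerator) relationship matrix.

    pedigree: list of (id, parent1, parent2); use None for an unknown parent.
    Parents must appear earlier in the list than their offspring.
    """
    ids = [ind for ind, _, _ in pedigree]
    index = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, parent1, parent2) in enumerate(pedigree):
        s = index.get(parent1)   # None when the parent is unknown
        d = index.get(parent2)
        # Off-diagonal: average of j's relationships with i's two parents.
        for j in range(i):
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return ids, A

pedigree = [
    ("a", None, None),
    ("b", None, None),
    ("c", None, None),
    ("d", "a", "b"),
    ("e", "a", "c"),
]
ids, A = numerator_relationship(pedigree)
for name, row in zip(ids, A):
    print(name, row)
```

In this pedigree d and e are half sibs through a, so their relationship is 0.25, and each has a relationship of 0.5 with its own parents.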

292 Ridge Regression Habier, D. et al. 2007. Genetics 177:2389-2397
\hat{a}_i = \sum_k x_{ik}\hat{\beta}_k, \quad \mathbf{\hat{a}} = \mathbf{X} \mathbf{\hat{\beta}}
var(\mathbf{\hat{a}}) = \mathbf{\hat{A}}\hat{\sigma}^2_a = \mathbf{X}\mathbf{X}^T\hat{\sigma}^2_\beta
\mathbf{\hat{A}} = \mathbf{X}\mathbf{X}^T\frac{\hat{\sigma}^2_\beta}{\hat{\sigma}^2_a}
Hayes, B.J. et al. 2009. Genetics Research 91:47-60.
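
The link between marker effects and a realized relationship matrix can be checked numerically. In this sketch, on simulated data with hypothetical sizes and a hypothetical shrinkage value `lam`, ridge predictions computed from marker effects are compared with predictions computed only from the line-by-line matrix XX^T. The two agree because X(X^T X + λI)^{-1} X^T = XX^T (XX^T + λI)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_markers, lam = 30, 200, 50.0      # hypothetical sizes and shrinkage

X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
y = rng.normal(size=n_lines)
y = y - y.mean()                             # centered phenotypes

# Marker-effect form: estimate every beta_k, then sum x_ik * beta_k per line.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)
a_hat_markers = X @ beta_hat

# Relationship form: only the n_lines x n_lines matrix XX^T is needed.
K = X @ X.T
a_hat_kernel = K @ np.linalg.solve(K + lam * np.eye(n_lines), y)

same = bool(np.allclose(a_hat_markers, a_hat_kernel))
print("predictions agree:", same)
```

The second form solves a system whose size is the number of lines rather than the number of markers.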

293 Habier et al. simulation set up

294 Genetic relationship decays fast
Prediction from pedigree relationship loses accuracy very quickly
Decay rate is initially more rapid, then stabilizes after about 5 generations
Rapid initial decay reflects that the closest marker may not be in highest LD with the QTL
RR-BLUP accuracy decays more rapidly than Bayes-B because more markers absorb the effect of a QTL
[Figure annotation: Training population here]

295 Habier et al Take homes
The ability of genomic selection to capture information on genetic relatedness is valuable
That information decays rapidly
The amount of that information relates to the number of markers fitted by a model: Ridge regression > BayesB
Bayes-B captured more LD information: Long-term accuracy: BayesB > Ridge regression

296 Accuracy due to relationships vs. LD
Round and square symbols, ridge regression and Bayes-B. Symbols with gray (inside or around) and without, 40 QTL and 200 QTL. Black and non-black symbols, 4000 and 400 markers. Small and large symbols, training population size of 400 and 2000.

297 Stochastic vs deterministic prediction
[Figure: panels Habier et al. and Zhong et al.; x-axis: NP / NG]
Even though the condition of complete LD between markers is not met, stochastic is coming out better than deterministic. The logical hypothesis is that additional accuracy derives from marker ability to account for relatedness.

298 To replicate or not to replicate
[Figure: 504 Lines replicated once versus 168 Lines replicated three times; Ridge Regression and BayesB]

299 Genetic diversity loss / inbreeding
BLUP: \sigma^2_W = 0
GS: \sigma^2_W > 0
A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j, where a_j is the Mendelian sampling term
[Figure: Correlation of sibling estimates]
Capturing relationship information increases \sigma^2_B, NOT \sigma^2_W

300 Simulation setting: Meuwissen; Habier; Solberg
Ne = 100; 1000 generations
Mutation / Drift / Recombination equilibrium
High marker mutation rate (2.5 x 10^-3 / locus / generation); higher “haplotype mutation rate”
Mutation effect distribution Gamma (1.66, 0.4): “effective QTL number” is only about 6 (!)
=> Watch out how you simulate!

301 Results: Prediction accuracy estimated by simulation
[Table: RR-BLUP and BayesB accuracies in MHG and HFD]
These accuracies are ASTOUNDING
If h2 = 1, r = 0.71

302 Noteworthy discussion
Markers flanking QTL not always in model
QTL effects captured by multiple markers
No need to “detect” QTL
Recombination causes accuracy to decay
Faster than if QTL captured by flanking markers
Markers far from QTL contribute to capture its effect
Ne / 2 markers per Morgan achieves close to maximum accuracy
Dependent on high marker mutation rates (?)

303 Solberg et al. 2008
Density: Number of markers per Morgan
SSR: ¼ Ne
SNP: 4 Ne, 8 Ne

304 Zhong et al. 2009
Zhong, S. et al. 2009. Genetics 182:355-364.
42 diverse 2-row barley
1040 markers ~ evenly spaced
Mating designs to generate 500 high and low LD training dataset
20 or 80 QTL; h2 = 0.4

305 Ridge regression Vs. BayesB
[Figure: Ridge Regression and BayesB for 20QTL – HiLD, 20QTL – LoLD, 80QTL – HiLD, 80QTL – LoLD; QTL: Observed, Unobserved]
Zhong et al. 2009

306 Take-home messages
Ridge regression is not affected by the number of QTL / the QTL effect size
BayesB performs better with large marker-associated effects
Co-linearity is more detrimental to BayesB
High marker density and training pop. size? Yes: BayesB; No: RR-BLUP

307 VanRaden et al. 2009 VanRaden, P.M. et al Invited Review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16-24.

308 VanRaden et al. 2009 Some traits have major genes, others do not

309 VanRaden et al. 2009
The larger the training population, the better. Where diminishing returns will begin is not in sight.
[Figure label: Predictor]

310 Take Homes
Training population requirements very large
BayesB did not help => no large marker-associated effects => Like the “Case of the missing heritability” in human GWAS studies
Are many quantitative traits driven by very low frequency variants?
RR would capture this case better than BayesB

311 Empirical data on crops: TP size

312 Empirical data on crops: Marker No.

313 Empirical data on Humans: Marker No.
Out of 295K SNP Yang et al Nat. Genet /ng.608

314 Long-term genomic selection
Marker data from elite six-row barley program
880 Markers
100 hidden as additive-effect QTL
Evaluate 200 progeny, select 20
Phenotypic compared to genomic selection

315 Breeding / model update cycles
[Timeline: Season 1 to Season 6]
Phenotypic Selection: Cross & Inbreed; Evaluate & Select; Cross & Inbreed; Evaluate & Select; Cross, Inb.
Genomic Selection
Evaluation is possible every other season. Candidates from every other cycle can be evaluated. There is still a lag: Parents of C2 are selected based on evaluation of C0.

316 Response in genotypic value
[Figure: Mean Genotypic Value by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop), Genomic (Large Training Pop), Phenotypic Selection]

317 Accuracy
[Figure: Mean Realized Accuracy by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop), Genomic (Large Training Pop), Phenotypic Selection]

318 Genetic variance
[Figure: Mean Genotypic Standard Deviation by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop), Genomic (Large Training Pop), Phenotypic Selection]

319 Lost favorable alleles
[Figure: Mean Number Lost Favorable Alleles by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop), Genomic (Large Training Pop), Phenotypic Selection]

320 Goddard 2008; Hayes et al. 2009
\text{Marker Effect Weight} = f\left[ P(\text{favorable allele}) \right] = f(p_f) = \frac{1}{\sqrt{p_f}}
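
A minimal sketch of how the weight could enter a selection criterion. The genotype coding (count of the favorable allele), the effect values, and the function name `weighted_gebv` are assumptions made for illustration; only the weight 1 / sqrt(p_f) comes from the slide. Markers whose favorable allele is already lost (p_f = 0) are skipped here to avoid division by zero, which is also a choice made in this sketch.

```python
import math

def weighted_gebv(genotype, effects, favorable_freq):
    """Sum of marker effects, each weighted by 1 / sqrt(p_f).

    genotype       : favorable-allele counts per marker (hypothetical coding)
    effects        : estimated effect of the favorable allele per marker
    favorable_freq : current frequency p_f of the favorable allele per marker
    """
    total = 0.0
    for x, beta, p_f in zip(genotype, effects, favorable_freq):
        if p_f <= 0.0:
            continue                      # allele already lost; nothing to weight
        total += x * beta / math.sqrt(p_f)
    return total

# Two markers with equal effects; the first favorable allele is rare (p_f = 0.25),
# so it counts twice as much as the second, which is fixed (p_f = 1.0).
score = weighted_gebv([1, 1], [1.0, 1.0], [0.25, 1.0])
print(score)
```

Up-weighting rare favorable alleles makes candidates that carry them more likely to be selected.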

321 Response in genotypic value
[Figure: Mean Genotypic Value by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop) and Genomic (Large Training Pop), each Unweighted and Weighted, and Phenotypic Selection]

322 Genetic variance
[Figure: Mean Genotypic Standard Deviation by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop) and Genomic (Large Training Pop), each Unweighted and Weighted, and Phenotypic Selection]

323 Lost favorable alleles
[Figure: Mean Number Lost Favorable Alleles by Phenotypic Breeding Cycle; series: Genomic (Small Training Pop) and Genomic (Large Training Pop), each Unweighted and Weighted, and Phenotypic Selection]

324 Long term genomic selection
The acceleration of the breeding cycle is key
Some favorable alleles will be lost
Likely those not in LD with any marker
Managing diversity / favorable alleles appears a good idea
This can be done using the same data as used for genomic prediction

325 Introgressing diversity
GS relies on marker–QTL allele association
An “exotic” line comes from a sub-population divergent from the breeding population
After sub-populations separate:
Drift moves allele frequencies independently
Drift & recombination shift associations independently
Will the GS prediction model identify valuable segments from the exotic?

326 Three approaches
Create a bi-parental family with the exotic (Bernardo 2009): develop a mini-training population for that family; improve the family; bring it into the main breeding population
Develop a separate training population for the exotic sub-population (Ødegård et al. 2009)
Develop a single multi-subpopulation (species-wide?) training population (Goddard 2006)

327 Need higher marker density
[Figure labels: Ancestral LD; sub-population specific LD]
Tightly–linked: ancestral LD
Loosely–linked: sub-population specific LD

328 Consistency of association across barley subpopulations
[Figure: Correlation of r (0.0 to 1.0) versus Genetic Distance (0.0 to 0.5); curves for 0 cM and 5 cM recombination distance]

329 Example: Dairy cattle breeds

330 Oat sub-populations (UOPN)
G1: N=136
G2: N=149
G3: N=161

331 Combined sub-population TP (β-Glucan)
G2 and G3 0.11 TP VP G3 G1 and G2 0.50 G1 G3 G2 0.39

332 Introgressing diversity using GS
Need higher marker density
Analysis of consistency of r may indicate whether current density is sufficient
Not sure we have it for barley
If you have the density, a multi-subpopulation training population seems like a good idea
Focuses the model on tighter ancestral LD rather than looser sub-population specific LD

