Presentation transcript:

Association Mapping versus Genomic Selection. Association mapping: To discover genes and genetic variants that control a trait. Knowledge can be applied to understand mechanism and genetic architecture, design pathways with diversity, and generate ideas for transgenic improvement. Genomic selection: To identify germplasm with the best breeding values and performance. Can identify complementary varieties that should be crossed for future improvement.

Association-based selection methods: Genomic selection. Outline: We have MAS, why do we need something different? Historical introduction to genomic selection. The basic idea. Methods. Theory. Selected simulation results. Empirical results. Long-term genomic selection. Introgressing diversity using GS.

MAS problems: relevant germplasm; bias of estimated effects; effects too small for detection.

Association mapping identifies QTL rapidly while scanning relevant germplasm. [Figure: mapping approaches (F2 / BC, recombinant inbred lines, intermated recombinant inbreds, near-isogenic lines, positional cloning, pedigree, association mapping) compared by resolution (bp; axis marked 1 x 10^4 and 1 x 10^7), research time (years; axis marked 1 and 5), and relevance to breeding germplasm (low, high, depends).]

Bias in Effect Estimation. [Figure labels: significance threshold; effect estimate (true + error); average "detected" effect; estimated bias.] Keep in all loci => No threshold => Estimated effects are unbiased.

In polygenic traits, much is hidden. X^2_\gamma is a function of the significance threshold at which you want to detect the factors. E.g., h^2 = 0.8, \alpha = 0.01, 1200. Lande & Thompson 1990.

Genomic selection principles (Meuwissen et al. 2001 Genetics 157:1819-1829). No distinction between "significant" and "non-significant"; no arbitrary inclusion / exclusion: all markers contribute to prediction. More effects must be estimated than there are phenotypic observations. Estimated effects are unbiased. Capture small effects. Speaker notes: Mention bias again; if time, mention verbally the bit about separate identification and estimation data sets. By "capture" I don't mean to say that the regression coefficient associated with a marker is equal to the effect of an underlying QTL, but again, it IS unbiased.

Genomic selection: Prediction using many markers. Training population: genotyping & phenotyping -> train GS model. Breeding material: genotyping -> calculate GEBV -> make selections. Meuwissen et al. 2001 Genetics 157:1819-1829
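The train / predict / select loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of Meuwissen et al.: it uses ridge regression with a hand-picked penalty, assumes phenotypes are already centered (no intercept), and works on made-up marker codes. The names `train_ridge`, `gebv`, `select_top` and the line names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def train_ridge(X, y, penalty):
    """Marker effects from (X'X + penalty * I) beta = X'y.

    A positive penalty keeps the system solvable even when there are
    more markers than phenotyped lines.
    """
    m = len(X[0])
    xtx = [[sum(row[a] * row[c] for row in X) + (penalty if a == c else 0.0)
            for c in range(m)] for a in range(m)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(m)]
    return solve_linear_system(xtx, xty)


def gebv(genotype, effects):
    """Genomic estimated breeding value: sum of marker code times marker effect."""
    return sum(x * b for x, b in zip(genotype, effects))


def select_top(candidates, effects, n_selected):
    """Names of the candidates with the highest GEBV."""
    ranked = sorted(candidates, key=lambda name: gebv(candidates[name], effects),
                    reverse=True)
    return ranked[:n_selected]


# Training population: 2 phenotyped lines, 3 markers (hypothetical data).
training_genotypes = [[1, 0, 0], [0, 1, 0]]
training_phenotypes = [2.0, 4.0]
marker_effects = train_ridge(training_genotypes, training_phenotypes, penalty=1.0)

# Breeding material: genotyped only.
candidates = {"line_A": [1, 0, 1], "line_B": [0, 1, 0], "line_C": [1, 1, 0]}
selected = select_top(candidates, marker_effects, n_selected=1)
print(marker_effects, selected)
```

Note that the candidates are never phenotyped: selection rests entirely on marker effects learned in the training population.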

Statistical modeling: The two cultures (Breiman 2001 Stat. Sci. 16:199-231). Nature turns observed inputs X into observed responses Y. Can we understand Y? Regression: identify causal inputs. Can we predict Y? Regression, decision trees, whatever works.

Need to shorten breeding cycle: selection intensity i accumulates over breeding cycles.

Phenotypic Selection: 3-year cycle (Select, Cross, Inbreed, Phenotype), then Release. Cross: 1 season. Inbreed: F1 × inducer, self DH0; 2 seasons. Phenotype: 2 years; 1 rep, N=2270, S=100, then 5 reps, N=100, S=10.

Genomic Selection Select Cross Inbreed Phenotype 1 Year! Release

FastGS Select Cross Inbreed Phenotype 1 Season = ⅓ Year!! Release

Selection Intensities. Phenotypic: N = 2270, S = 10: i = 2.4. FastGS: N = 370, S = 43: i = 1.7; 9 × i ≅ 15 (!!!). Inbreeding:
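The FastGS figures on this slide can be checked with the textbook truncation-selection formula, i = pdf(z) / p, where p is the selected fraction and z the truncation point on a standard normal. The sketch below assumes a normally distributed selection criterion, a single selection stage and a large population; `selection_intensity` is a hypothetical helper name. The phenotypic scheme selects in two stages, so this single-stage formula is not expected to reproduce its i = 2.4.

```python
from statistics import NormalDist


def selection_intensity(n_candidates: int, n_selected: int) -> float:
    """Standardized selection differential for single-stage truncation selection."""
    p = n_selected / n_candidates          # selected fraction
    std_normal = NormalDist()              # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - p)        # truncation point
    return std_normal.pdf(z) / p


i_fastgs = selection_intensity(370, 43)

# One FastGS cycle takes 1 season = 1/3 year, so 9 cycles fit into the
# 3 years needed for one phenotypic cycle.
cycles_per_phenotypic_cycle = 9
cumulative_intensity = cycles_per_phenotypic_cycle * i_fastgs
print(round(i_fastgs, 2), round(cumulative_intensity, 1))
```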

Rates of gain per year

Impacts Schaeffer, L.R. 2006. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123:218-223.

Cost per genetic standard deviation (Schaeffer 2006). Phenotypic: $116 M. Genomic: $4.2 M.

Potential Impact. Heffner, E.L. et al. 2009. Genomic Selection for Crop Improvement. Crop Science 49:1-12. Line development cycle: new germplasm -> make crosses and advance generations -> genotype -> genomic selection -> advance lines with highest GEBV -> test varieties and release. Model training cycle: advance lines informative for model improvement -> phenotype (lines have already been genotyped) -> train prediction model -> updated model.

What (I think) is revolutionary. [Diagram: line development cycle and model training cycle of genomic selection, contrasted with phenotypic selection.] For a century, breeding has focused on better ways to evaluate lines. Henceforth it will focus on how to improve a model.

A Focus for Information. Inputs: current pheno–geno data; historical pheno–geno data; linkage and association mapping; biological knowledge. These feed genomic prediction model development. Outputs: Select and Cross (population improvement); cultivar release.

The Alleletarian Revolution. The breeding line as the focus of evaluation has been dethroned in favor of the allele. A line is useful to us only with respect to the alleles it carries. Time-honored practice: replicate (progeny test) lines. But alleles are replicated regardless of what line carries them.

Methods. Linear models: effects are random; methods differ in marker effect priors. Machine learning methods: regression trees.

Linear models: Priors on coefficients
\mathbf{y} = \mu + \sum_k \mathbf{x}_k \beta_k + \mathbf{e}, \quad \beta_k \sim \_?\_
Ridge regression: \beta_k \sim N(0, \sigma^2_\beta)
BayesB (SSVS): P(\beta_k = 0) = \pi; else \beta_k \sim N(0, \sigma^2_k), \sigma^2_k \sim \chi^{-2}(\nu, S)
BayesCπ: P(\beta_k = 0) = \pi, \pi \sim U(0, 1); else \beta_k \sim N(0, \sigma^2_\beta), \sigma^2_\beta \sim \chi^{-2}(\nu_\beta, S_\beta)

Fig. Priors. Graphical representation of the priors for the variance of marker effects, β, for ridge regression, BayesB, and BayesCπ (x-axis: Var(β); y-axis: density). Circles represent point mass probabilities; their height on the y-axis has no meaning but is for clarity. Ridge regression uses a single non-zero point mass for the variance of β (circle with black border). BayesB uses a mixture of a point mass at zero, of fixed size, and a continuous scaled inverse χ^2 distribution. BayesCπ uses a mixture of a point mass at zero and a non-zero point mass. The non-zero value is estimated (horizontal arrows), as is the mixture probability (allowing the sizes of the zero and non-zero point mass circles to fluctuate inversely to each other).

Machine learning methods: Random Forests. Forest of regression trees. Each tree is grown on a bootstrapped sample. Nodes split on randomly sampled features. Prediction is the forest mean. Can capture interactions. [Figure: example tree splitting on markers M1 and M2, each with genotypes 0 and 1.]

Additive models and breeding value. Breeding value = mean phenotype of progeny. Most important parent selection criterion. Recombination: parents do not always pass combinations of genes to their progeny => breeding value is the sum of individual locus effects. Linear models capture this; machine learning methods may not.

Theory. How accurate will GS be? Impact of GS on inbreeding / loss of diversity. Genomic selection captures pedigree relatedness among candidates.

Prediction accuracy = Correlation(predicted, true). R = i r_A \sigma_A, where r_A = corr(selection criterion, breeding value). On simulated data corr(\hat{A}, A) is easy. On real data: Click for h^2 !
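To see how R = i r_A σ_A interacts with cycle length, the sketch below divides the response per cycle by the cycle duration. The intensities and cycle lengths are the ones quoted in this deck (phenotypic: i = 2.4, 3 years; FastGS: i = 1.7, 1/3 year); the two accuracies (0.8 and 0.5) and σ_A = 1 are purely illustrative assumptions, and both function names are hypothetical.

```python
def response_per_cycle(intensity: float, accuracy: float, sigma_a: float) -> float:
    """Breeder's equation R = i * r_A * sigma_A for one selection cycle."""
    return intensity * accuracy * sigma_a


def response_per_year(intensity: float, accuracy: float, sigma_a: float,
                      cycle_years: float) -> float:
    """Response per cycle divided by the length of the cycle in years."""
    return response_per_cycle(intensity, accuracy, sigma_a) / cycle_years


sigma_a = 1.0  # work in units of additive genetic standard deviations

# Accuracies below are assumed for illustration only.
gain_phenotypic = response_per_year(2.4, 0.8, sigma_a, cycle_years=3.0)
gain_fastgs = response_per_year(1.7, 0.5, sigma_a, cycle_years=1.0 / 3.0)
print(gain_phenotypic, gain_fastgs)
```

Under these assumed numbers the shorter cycle more than compensates for the lower intensity and accuracy; with different accuracies the ranking could of course change.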

Predict prediction accuracy. Daetwyler, H.D. et al. 2008. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PLoS ONE 3:e3395. Assume all loci affecting the trait are known and are independent. Assume marker effects are fixed.

Replicating hurts: 2000 with 1 plot is better than 1000 with 2 plots. [Figure: x-axis \lambda = \frac{n_P}{n_G}.]
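The replication claim can be illustrated numerically. The sketch assumes the accuracy formula usually attributed to Daetwyler et al. (2008) for a continuous trait, r = sqrt(λh² / (λh² + 1)) with λ = n_P / n_G, together with the standard heritability of a mean of several plots. The slide does not print either formula, so both are assumptions here, as are the plot heritability (0.2) and n_G (1000).

```python
from math import sqrt


def expected_accuracy(n_p: int, n_g: int, h2: float) -> float:
    """Assumed Daetwyler-type accuracy: sqrt(lambda*h2 / (lambda*h2 + 1))."""
    lam = n_p / n_g
    return sqrt(lam * h2 / (lam * h2 + 1.0))


def heritability_of_mean(h2_plot: float, n_plots: int) -> float:
    """Heritability of a line mean over n_plots plots, given single-plot h2."""
    return n_plots * h2_plot / (1.0 + (n_plots - 1) * h2_plot)


h2_plot = 0.2   # hypothetical single-plot heritability
n_g = 1000      # hypothetical number of independent loci to estimate

# Same total of 2000 plots, allocated two ways.
acc_unreplicated = expected_accuracy(2000, n_g, heritability_of_mean(h2_plot, 1))
acc_replicated = expected_accuracy(1000, n_g, heritability_of_mean(h2_plot, 2))
print(round(acc_unreplicated, 3), round(acc_replicated, 3))
```

Under this formula, doubling the number of lines doubles λh², whereas a second plot per line raises h² by less than a factor of two, which is why the unreplicated design comes out ahead.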

Predict prediction accuracy. Hayes, B.J. et al. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genetics Research 91:47-60. Detail on the population genetics that drive n_G. Assume marker effects are random. Still assume all markers independent and estimated separately.

Analytical approximations. [Figure panels: Daetwyler et al., 2008; Hayes et al., 2009; x-axis N_P / N_G in both.]

Take Homes. Even with traits of very low heritability (h2 = 0.01), sufficient nP gives accuracy. Replication may not be good. The number of loci estimated (nG) is a critical parameter. If you don't know where the QTL are, higher marker coverage requires higher nG. N.B. All conclusions assume 100% LD!

Genetic diversity loss / inbreeding. Daetwyler, H.D. et al. 2007. Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124:369-376. Avoid selecting close relatives together. What is the correlation in the estimated breeding value between full sibs? Sib co-selection increases the rate of inbreeding / loss of diversity. The extent of co-selection depends on the ratio of \sigma_B captured relative to \sigma_W captured. The Bulmer effect reduces \sigma_B and therefore reduces co-selection. [Figure: correlation of sibling estimates.]

Genetic diversity loss / inbreeding. A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j, where a_j is the Mendelian sampling term. \sigma^2_W = var(\hat{a}_j); \sigma^2_B = var(\frac{1}{2}\hat{A}_S + \frac{1}{2}\hat{A}_D). BLUP: \sigma^2_W = 0. GS: \sigma^2_W > 0. [Figure: correlation of sibling estimates.]
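One way to make the link between the two variances and sib co-selection concrete is the correlation between the estimated breeding values of two full sibs. The sketch assumes that the estimated parent average and the estimated Mendelian sampling terms are uncorrelated and that the Mendelian estimates of different sibs are independent; under those assumptions the correlation is σ²_B / (σ²_B + σ²_W). The function name and the example variances are hypothetical.

```python
def sib_estimate_correlation(var_between: float, var_within: float) -> float:
    """Correlation of full-sib estimates: shared parent-average variance
    divided by the total variance of an individual estimate."""
    return var_between / (var_between + var_within)


# No within-family information captured (sigma^2_W = 0): sibs are indistinguishable.
corr_no_within = sib_estimate_correlation(1.0, 0.0)

# Within-family (Mendelian sampling) variation captured as well.
corr_with_within = sib_estimate_correlation(1.0, 1.0)
print(corr_no_within, corr_with_within)
```

A correlation of 1 means full sibs are always selected or rejected together; any captured within-family variance lowers it, which is the mechanism behind reduced co-selection.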

Daetwyler et al. 2007 Take Homes. Genomic selection captures the Mendelian sampling term. Correlations between the estimates of sibling performance are reduced. Co-selection of sibs is reduced. Rate of inbreeding / loss of diversity is reduced.

A word on pedigree relatedness. Five individuals: a, b, c, d, and e. a, b, and c are unrelated; d is the offspring of a and b; e is the offspring of a and c.
A =
     a    b    c    d    e
a    1    0    0    ½    ½
b    0    1    0    ½    0
c    0    0    1    0    ½
d    ½    ½    0    1    ¼
e    ½    0    ½    ¼    1
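The relationship coefficients for this five-individual pedigree can be computed with the tabular method for the numerator relationship matrix. The sketch assumes parents are listed before their offspring and that unknown parents are given as `None`; `relationship_matrix` is a hypothetical helper name.

```python
def relationship_matrix(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (individual, sire, dam), parents listed before offspring,
    with None for an unknown parent.
    """
    index = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for j, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)   # None when the parent is unknown
        d = index.get(dam)
        # Relationship with every earlier individual: average over the two parents.
        for k in range(j):
            a_ks = A[k][s] if s is not None else 0.0
            a_kd = A[k][d] if d is not None else 0.0
            A[j][k] = A[k][j] = 0.5 * (a_ks + a_kd)
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[j][j] = 1.0 + inbreeding
    return A


pedigree = [
    ("a", None, None),
    ("b", None, None),
    ("c", None, None),
    ("d", "a", "b"),
    ("e", "a", "c"),
]
A = relationship_matrix(pedigree)
for (name, _, _), row in zip(pedigree, A):
    print(name, row)
```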

Ridge Regression. Habier, D. et al. 2007. Genetics 177:2389-2397. Hayes, B.J. et al. 2009. Genetics Research 91:47-60.
\hat{a}_i = \sum_k x_{ik}\hat{\beta}_k, \quad \mathbf{\hat{a}} = \mathbf{X}\mathbf{\hat{\beta}}
var(\mathbf{\hat{a}}) = \mathbf{\hat{A}}\hat{\sigma}^2_a = \mathbf{X}\mathbf{X}^T\hat{\sigma}^2_\beta
\mathbf{\hat{A}} = \mathbf{X}\mathbf{X}^T\frac{\hat{\sigma}^2_\beta}{\hat{\sigma}^2_a}
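The last identity says that ridge regression implicitly uses a marker-based relationship matrix, X Xᵀ scaled by the ratio of the marker-effect variance to the additive variance. A small sketch with made-up marker codes and an arbitrary variance ratio follows; `marker_relationship`, the codes in `X` and the ratio 0.5 are all assumptions for illustration.

```python
def marker_relationship(X, variance_ratio):
    """A_hat = X X^T * (sigma^2_beta / sigma^2_a) for a lines-by-markers matrix X."""
    n = len(X)
    return [[variance_ratio * sum(xi * xj for xi, xj in zip(X[i], X[j]))
             for j in range(n)] for i in range(n)]


# Three inbred lines, three markers, coded -1 / 0 / 1 (hypothetical data).
X = [
    [1, -1, 0],
    [1, -1, 0],   # identical to the first line at every marker
    [-1, 1, 0],   # opposite allele at every segregating marker
]
A_hat = marker_relationship(X, variance_ratio=0.5)
for row in A_hat:
    print(row)
```

Lines with identical marker profiles get the same relationship with each other as with themselves, and lines with opposite profiles get a negative value: the matrix measures realized rather than expected sharing.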

Habier et al. simulation set up

Genetic relationship decays fast. Prediction from pedigree relationship loses accuracy very quickly. Decay rate is initially more rapid, then stabilizes after about 5 generations. Rapid initial decay reflects that the closest marker may not be in highest LD with the QTL. RR-BLUP accuracy decays more rapidly than Bayes-B because more markers absorb the effect of a QTL. [Figure label: training population here.]

Habier et al. 2007 Take homes. The ability of genomic selection to capture information on genetic relatedness is valuable. That information decays rapidly. The amount of that information relates to the number of markers fitted by a model: Ridge regression > BayesB. Bayes-B captured more LD information. Long-term accuracy: BayesB > Ridge regression.

Accuracy due to relationships vs. LD. [Figure legend:] Round and square symbols: ridge regression and Bayes-B. Symbols with gray (inside or around) and without: 40 QTL and 200 QTL. Black and non-black symbols: 4000 and 400 markers. Small and large symbols: training population size of 400 and 2000.

Stochastic vs deterministic prediction. [Figure: results of Habier et al. and Zhong et al. plotted against N_P / N_G.] Even though the condition of complete LD between markers is not met, stochastic is coming out better than deterministic. The logical hypothesis is that additional accuracy derives from marker ability to account for relatedness.

To replicate or not to replicate: 504 lines replicated once vs. 168 lines replicated three times. [Figure panels: Ridge Regression; BayesB.]

Genetic diversity loss / inbreeding. A_j = \frac{1}{2}A_S + \frac{1}{2}A_D + a_j, where a_j is the Mendelian sampling term. BLUP: \sigma^2_W = 0. GS: \sigma^2_W > 0. Capturing relationship information increases \sigma^2_B, NOT \sigma^2_W. [Figure: correlation of sibling estimates.]

Simulation setting: Meuwissen; Habier; Solberg. Ne = 100; 1000 generations. Mutation / drift / recombination equilibrium. High marker mutation rate (2.5 x 10^-3 / locus / generation); higher "haplotype mutation rate". Mutation effect distribution Gamma(1.66, 0.4): "effective QTL number" is only about 6 (!) => Watch out how you simulate!

Results: Prediction accuracy estimated by simulation
           MHG    HFD
RR-BLUP    0.73   0.64
BayesB     0.85   0.69
These accuracies are ASTOUNDING. If h2 = 1, r = 0.71

Noteworthy discussion. Markers flanking QTL not always in model. QTL effects captured by multiple markers. No need to "detect" QTL. Recombination causes accuracy to decay faster than if QTL were captured by flanking markers. Markers far from a QTL contribute to capturing its effect. Ne / 2 markers per Morgan achieves close to maximum accuracy. Dependent on high marker mutation rates (?)

Solberg et al. 2008 Density: Number of markers per Morgan SSR: ¼ Ne SNP: 4 Ne 8 Ne

Zhong, S. et al. 2009. Genetics 182:355-364. 42 diverse 2-row barley lines; 1040 markers, ~ evenly spaced. Mating designs to generate 500 high and low LD training datasets. 20 or 80 QTL; h2 = 0.4.

Ridge regression vs. BayesB (Zhong et al. 2009). [Figure panels: 20 QTL, HiLD; 20 QTL, LoLD; 80 QTL, HiLD; 80 QTL, LoLD. Series: ridge regression and BayesB, with QTL observed or unobserved.]

Take-home messages. Ridge regression is not affected by the number of QTL / the QTL effect size. BayesB performs better with large marker-associated effects. Co-linearity is more detrimental to BayesB. High marker density and training pop. size? Yes: BayesB. No: RR-BLUP.

VanRaden, P.M. et al. 2009. Invited Review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16-24.

VanRaden et al. 2009 Some traits have major genes, others do not

VanRaden et al. 2009. The larger the training population, the better. Where diminishing returns will begin is not in sight. [Figure label: predictor.]

Take Homes. Training population requirements are very large. BayesB did not help == no large marker-associated effects == like the "case of the missing heritability" in human GWAS. Are many quantitative traits driven by very low frequency variants? RR would capture this case better than BayesB.

Empirical data on crops: TP size

Empirical data on crops: Marker No.

Empirical data on Humans: Marker No. Out of 295K SNP Yang et al. 2010. Nat. Genet. 10.1038/ng.608

Long-term genomic selection. Marker data from elite six-row barley program. 880 markers, 100 hidden as additive-effect QTL. Evaluate 200 progeny, select 20. Phenotypic compared to genomic selection.

Breeding / model update cycles (Seasons 1 to 6). Phenotypic selection: Cross & Inbreed; Evaluate & Select; Cross & Inbreed; Evaluate & Select; Cross, Inb. Genomic selection: Evaluation is possible every other season. Candidates from every other cycle can be evaluated. There is still a lag: parents of C2 are selected based on evaluation of C0.

Response in genotypic value. [Figure: mean genotypic value by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection.]

Accuracy. [Figure: mean realized accuracy by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection.]

Genetic variance. [Figure: mean genotypic standard deviation by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection.]

Lost favorable alleles. [Figure: mean number of lost favorable alleles by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection.]

Goddard 2008; Hayes et al. 2009. \text{Marker effect weight} = f\left[ P(\text{favorable allele}) \right] = f(p_f) = \frac{1}{\sqrt{p_f}}
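The weight 1/sqrt(p_f) gives more emphasis to markers whose favorable allele is still rare. How the weight enters the selection criterion is not spelled out on the slide; the sketch below assumes it simply multiplies each marker's contribution to the GEBV. The function names and all numbers are hypothetical.

```python
from math import sqrt


def favorable_allele_weight(p_favorable: float) -> float:
    """Weight 1 / sqrt(p_f) for a marker whose favorable allele has frequency p_f."""
    return 1.0 / sqrt(p_favorable)


def weighted_gebv(genotype, effects, p_favorable):
    """Selection criterion with each marker contribution scaled by its weight
    (assumed form: sum of x_k * beta_k / sqrt(p_f,k))."""
    return sum(x * b * favorable_allele_weight(p)
               for x, b, p in zip(genotype, effects, p_favorable))


effects = [0.5, 0.5]            # equal estimated marker effects
p_favorable = [0.25, 1.0]       # rare vs. fixed favorable allele
carrier_of_rare = weighted_gebv([1, 0], effects, p_favorable)
carrier_of_common = weighted_gebv([0, 1], effects, p_favorable)
print(carrier_of_rare, carrier_of_common)
```

With equal estimated effects, the carrier of the rare favorable allele is ranked higher, which is what helps keep that allele from being lost.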

Response in genotypic value, unweighted vs. weighted. [Figure: mean genotypic value by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection; unweighted and weighted.]

Genetic variance, unweighted vs. weighted. [Figure: mean genotypic standard deviation by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection; unweighted and weighted.]

Lost favorable alleles, unweighted vs. weighted. [Figure: mean number of lost favorable alleles by phenotypic breeding cycle; series: genomic (small training pop), genomic (large training pop), phenotypic selection; unweighted and weighted.]

Long term genomic selection. The acceleration of the breeding cycle is key. Some favorable alleles will be lost, likely those not in LD with any marker. Managing diversity / favorable alleles appears a good idea. This can be done using the same data as used for genomic prediction.

Introgressing diversity. GS relies on marker–QTL allele association. An "exotic" line comes from a sub-population divergent from the breeding population. After sub-populations separate: drift moves allele frequencies independently; drift & recombination shift associations independently. Will the GS prediction model identify valuable segments from the exotic?

Three approaches. (1) Create a bi-parental family with the exotic (Bernardo 2009): develop a mini-training population for that family, improve the family, bring it into the main breeding population. (2) Develop a separate training population for the exotic sub-population (Ødegård et al. 2009). (3) Develop a single multi-subpopulation (species-wide?) training population (Goddard 2006).

Need higher marker density. Tightly linked: ancestral LD. Loosely linked: sub-population specific LD.

Consistency of association across barley subpopulations. [Figure: correlation of r (0.0 to 1.0) against genetic distance (0.0 to 0.5), for marker pairs at 0 cM and 5 cM recombination distance.]
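A "correlation of r" statistic of this kind can be sketched as follows: compute the signed LD correlation r for each marker pair within each subpopulation, then correlate those r values across the two subpopulations. This is an assumed reading of the figure, not a description of the analysis actually run; the function names and the tiny genotype sets (inbred lines coded 0/1) are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def ld_r(genotypes, k, l):
    """Signed LD correlation between marker columns k and l in one subpopulation."""
    return pearson([g[k] for g in genotypes], [g[l] for g in genotypes])


def consistency_of_r(subpop_1, subpop_2, marker_pairs):
    """Correlation across marker pairs of r in subpop_1 with r in subpop_2."""
    r1 = [ld_r(subpop_1, k, l) for k, l in marker_pairs]
    r2 = [ld_r(subpop_2, k, l) for k, l in marker_pairs]
    return pearson(r1, r2)


marker_pairs = [(0, 1), (0, 2), (1, 2)]
# Markers 0 and 1 are in coupling phase here ...
subpop_same = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
# ... and in repulsion phase here (association reversed).
subpop_reversed = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]]

consistent = consistency_of_r(subpop_same, subpop_same, marker_pairs)
reversed_phase = consistency_of_r(subpop_same, subpop_reversed, marker_pairs)
print(consistent, reversed_phase)
```

A value near 1 means marker associations carry the same sign and size in both subpopulations, so marker effects estimated in one should transfer to the other; values near 0 or below suggest the density is too low for ancestral LD to dominate.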

Example: Dairy cattle breeds

Oat sub-populations (UOPN). G1: N=136. G2: N=149. G3: N=161.

Combined sub-population TP (β-Glucan) G2 and G3 0.11 TP VP G3 G1 and G2 0.50 G1 G3 G2 0.39

Introgressing diversity using GS. Need higher marker density. Analysis of consistency of r may indicate whether current density is sufficient. Not sure we have it for barley. If you have the density, a multi-subpopulation training population seems like a good idea: it focuses the model on tighter ancestral LD rather than looser sub-population specific LD.