Association Breeding Strategies for Crop Improvement
Mark E. Sorrells & Elliot Heffner, Department of Plant Breeding & Genetics

1 Association Breeding Strategies for Crop Improvement. Mark E. Sorrells & Elliot Heffner, Department of Plant Breeding & Genetics.

2 Presentation Overview: molecular plant breeding strategies (marker-assisted selection, association breeding, genomic (genome-wide) selection); methods, examples, and applications.

3 Historical Improvement of Breeding Methods: mass selection, family selection methods, progeny testing, marker-assisted selection, genomic selection.

4 Molecular Breeding Goals: allele discovery; allele characterization and validation; parental and progeny selection for superior alleles and transgressive segregation.

5 Strategies for Molecular Breeding. Genomic Selection (Meuwissen, Hayes & Goddard 2001): requires genome-wide markers, which are used to develop a prediction model that estimates a breeding value for each individual; marker/QTL effects estimated in a phenotyped training population are used to predict breeding values of individuals in a breeding population without phenotyping them. Marker-Assisted Selection: only significant markers are used for selection, usually for qualitative traits. Association Breeding (Breseghello & Sorrells 2006): uses conventional hybridization, MAS, and testing for significant markers; allows breeding values to be updated for new and existing alleles; phenotyping and association analysis are used as often as necessary for allele discovery and validation.

6 Marker-Assisted Selection. Successes: significant impact in backcrossing; simple, monogenic trait improvement, i.e., backcrossing (BC) major genes into elite varieties. Limitations: best suited for major genes; BC is the most conservative breeding method; pyramiding is limited to a few target genes; and the genes with small effects that underlie most important traits are what determine the success of new varieties.

7 Association Mapping versus Bi-Parental QTL Mapping. Association mapping can be conducted on relevant, adapted groups of accessions: direct inference to a breeding population is possible; relevant genetic-background effects are sampled; phenotypic variation is observed for most traits of interest; marker polymorphism is higher than in biparental populations; routine variety-trial evaluations provide high-quality phenotypic data; the structure of genetic variation in relevant populations can be characterized; and novel alleles can be identified, and their relative value assessed, as often as necessary.

8 Association Mapping versus QTL Mapping. Type I error (false positives) can be higher because of: low heritability and small-effect QTL (heterogeneity of genetic background); population structure, for which estimates of population structure or kinship are used in a linear mixed-effects model to reduce the frequency of false-positive associations; and the high sampling variance of rare alleles, which are therefore usually excluded from the analysis.

9 Association Analysis: An Example (Breseghello & Sorrells 2006, Genetics; Field Crops Research 2007). Association panel of elite soft winter wheat varieties: 149 adapted soft wheat varieties; traits: milling quality and seed size. Markers: a preliminary screen with 18 unlinked SSRs; 93 markers saturating two QTL regions. Population structure: TASSEL software (www.maizegenetics.net); Structure without admixture; kinship from SPAGeDi (Hardy & Vekemans). Association analysis: linear mixed-effects model, with markers from the selected QTL regions as fixed effects and subpopulations or kinship as random effects.

10 Linkage Disequilibrium: Germplasm Selection (Breseghello & Sorrells 2006, Genetics). Elite soft winter wheat varieties (milling quality, seed size): 149 lines were genotyped with 18 unlinked SSR markers, and 95 were selected by excluding the most similar lines. [Figure: r² and its significance (p < 0.0001, p < 0.001, p < 0.01) for pairs of unlinked SSR markers, 149 lines vs. 95 lines.] "Normalizing" the sample reduced population substructure, the frequency of rare alleles, and long-range LD.
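The slide says only that the most similar lines were excluded to go from 149 to 95 lines; it does not give the similarity measure or the procedure. A minimal sketch, assuming (hypothetically) that similarity is the proportion of SSR loci with identical allele calls and that lines are removed greedily from the most similar remaining pair:

```python
from itertools import combinations


def shared_allele_similarity(a, b):
    """Proportion of loci with identical allele calls; loci missing (None) in either line are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def prune_similar_lines(genotypes, max_similarity):
    """Greedily remove lines until no remaining pair is more similar than max_similarity.

    genotypes: dict mapping line name -> list of SSR allele calls (hypothetical format).
    At each step the most similar pair is found and its second member (alphabetical
    order, an arbitrary tie-break) is dropped. Returns the retained names, sorted.
    """
    kept = set(genotypes)
    while True:
        worst = None
        for a, b in combinations(sorted(kept), 2):
            s = shared_allele_similarity(genotypes[a], genotypes[b])
            if s > max_similarity and (worst is None or s > worst[0]):
                worst = (s, a, b)
        if worst is None:
            return sorted(kept)
        kept.discard(worst[2])


# Hypothetical allele calls (fragment sizes) at four SSR loci.
lines = {
    "A": [120, 200, 310, 415],
    "B": [120, 200, 310, 415],  # identical to A
    "C": [120, 200, 310, 421],
    "D": [126, 204, 318, 421],
}
print(prune_similar_lines(lines, 0.8))  # ['A', 'C', 'D']
print(prune_similar_lines(lines, 0.5))  # ['A', 'D']
```

Lowering the threshold removes more near-duplicates, which is the direction of the "normalizing" effect described on the slide; the threshold that would reproduce 95 of 149 lines is not given.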

11 Previous QTL Information: Kernel Size and Shape. Recombinant inbred wheat population Synthetic W7984 x Opata (ITMI population): QTL for kernel size on 5A. Doubled-haploid wheat population AC Reed x Grandin: QTL for kernel size (width) near Xwmc18-2D. [Figure labels: size, 5A; width, 2D.] (Breseghello & Sorrells 2006)

12 Chromosome 2D: Associations and LD Estimate. Significant LD extended less than 1 cM. Association analysis confirmed the kernel width QTL and identified other QTL.

13 Chromosome 5A: Associations and LD Estimate. Significant LD extended 3-5 cM. Association analysis confirmed the kernel weight QTL.

14 Estimated Allele Effects: Kernel Weight. Number of cultivars per allele: 41, 45, 43, 49. Allele effects are best linear unbiased estimates (REML), each compared to the mean of the null alleles (missing and rare alleles).
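The idea of expressing each allele's effect relative to the pooled null-allele class can be illustrated with raw means. This is a deliberately simplified sketch: the slides' estimates come from a REML mixed model with population structure or kinship as random effects, which this code does not fit, and all phenotype values below are hypothetical.

```python
from statistics import mean


def allele_effects(records, null_alleles):
    """Unadjusted allele effects relative to the null-allele class.

    records: list of (allele, phenotype) pairs, one per cultivar.
    null_alleles: allele codes pooled as "null" (missing calls and rare alleles).
    Returns {allele: mean phenotype of carriers - mean phenotype of the null class}.
    No correction for population structure or kinship is made here.
    """
    baseline = mean(y for allele, y in records if allele in null_alleles)
    groups = {}
    for allele, y in records:
        if allele not in null_alleles:
            groups.setdefault(allele, []).append(y)
    return {allele: mean(ys) - baseline for allele, ys in groups.items()}


# Hypothetical kernel weights; None = missing call, "a9" = a rare allele.
records = [("a1", 42.0), ("a1", 44.0), ("a2", 39.0), ("a2", 41.0), (None, 40.0), ("a9", 38.0)]
effects = allele_effects(records, null_alleles={None, "a9"})
print(effects)  # {'a1': 4.0, 'a2': 1.0}
```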

15 Estimated Allele Effects: Kernel Width. Number of cultivars per allele: 41, 14, 8, 15, 18, 24, 5, 10, 19. Allele effects are best linear unbiased estimates (REML), each compared to the mean of the null alleles (missing and rare alleles).

16 Application of Association Analysis in a Breeding Program. [Diagram: a breeding cycle linking germplasm, parental selection, hybridization, new populations, marker-assisted selection, evaluation of elite synthetics, lines, and varieties, evaluation trials (genotypic and phenotypic data), and association mapping to characterize QTL/marker allele associations.] Elite germplasm feeds back into the hybridization nursery. MAS identifies desired segregants up front, so phenotypic selection intensity can be increased for other traits. Association mapping facilitates allele discovery and validation.

17 Association Analysis as a Breeding Strategy. Issues: breeding programs are dynamic, complex genetic entities that require frequent evaluation of marker/QTL relationships; accurate detection and estimation of QTL effects is required; in new germplasm, pre-existing marker alleles may be linked to undesirable QTL alleles instead of the target allele; and population structure can cause a high frequency of false-positive associations between markers and QTL.

18 Genomic Selection Methodology (Meuwissen et al. 2001, Genetics 157:1819-1829; Goddard & Hayes 2007). A training population is genotyped with a large number of markers and phenotyped for important traits. Genome-wide markers are used to estimate all genetic effects simultaneously; one or more markers are assumed to be in LD with each QTL affecting the trait, and the prediction model attempts to capture the total additive genetic variance. In a breeding population, individuals are genotyped but not phenotyped. A genomic estimated breeding value (GEBV) is obtained for each individual by summing the marker effects for its genotype. The prediction model is used to impose multiple generations of selection.
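The GEBV step above is just a weighted sum. A minimal sketch, assuming homozygous lines with alleles coded -1/+1 and marker effects already estimated in a training population (the effect values and line names are hypothetical):

```python
def gebv(genotype, marker_effects):
    """GEBV = sum over markers of (allele code x estimated marker effect)."""
    if len(genotype) != len(marker_effects):
        raise ValueError("genotype and marker_effects must have the same length")
    return sum(g * b for g, b in zip(genotype, marker_effects))


def rank_by_gebv(breeding_population, marker_effects):
    """Return (line, GEBV) pairs for genotyped-only lines, highest GEBV first."""
    scores = {line: gebv(geno, marker_effects) for line, geno in breeding_population.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Marker effects as if estimated in a training population (hypothetical values).
effects = [0.5, -0.2, 0.1]
# Breeding population: genotyped only, alleles coded -1/+1 for homozygous lines.
population = {"L1": [1, 1, 1], "L2": [1, -1, 1], "L3": [-1, -1, -1]}
ranked = rank_by_gebv(population, effects)
for line, value in ranked:
    print(line, round(value, 2))
```

Selection then keeps the top of this ranking without any phenotyping of the breeding population.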

19 GS in a Plant Breeding Program (Heffner, Sorrells & Jannink, Crop Science 49:1-12). [Diagram: two linked cycles. Line development cycle (breeding population): make crosses and advance generations; genotype; advance lines with the highest GEBV; test varieties and release. Model training cycle (training population): phenotype (lines have already been genotyped); train the prediction model; advance lines informative for model improvement. New germplasm also enters the program.] Genomic selection reduces cycle time and cost by reducing the frequency of phenotyping.

20 Choosing a Statistical Model for GS. Model performance is measured by the correlation between GEBV and true breeding value (TBV). Many QTL effects must be estimated from a limited number of phenotypes. Least-squares regression with an arbitrary significance threshold overestimates the significant effects and loses the small ones. Variable selection or shrinkage estimation can be used to deal with oversaturated regression models (more markers than phenotypes). Many QTL effects can be estimated simultaneously in linear mixed models for the prediction of random effects.
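As an illustration of estimating many effects at once, here is a minimal plain-Python sketch of the simplest such mixed model, ridge-regression BLUP: all marker effects are random with one common variance, so the equations reduce to (Z'Z + λI)b = Z'(y - ȳ), with Z the column-centred marker matrix. Assumptions: λ (residual variance over marker-effect variance) is supplied by hand rather than estimated by REML, and the data are hypothetical.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup(X, y, lam):
    """Ridge-regression marker effects: solve (Z'Z + lam*I) b = Z'(y - mean(y)).

    X: list of rows (lines) of marker codes; Z is X with each column centred.
    Returns (marker effects, mean of y, column means of X).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    lhs = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(Z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(lhs, rhs), y_mean, col_means


# Hypothetical data: 8 inbred lines, 3 markers coded -1/+1, y = 10 + 2*m1 - 1*m2.
X = [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
y = [10 + 2 * row[0] - row[1] for row in X]
print(rr_blup(X, y, 0.0)[0])  # no shrinkage: approximately [2, -1, 0]
print(rr_blup(X, y, 8.0)[0])  # all effects shrunk toward zero: approximately [1, -0.5, 0]
```

Because every effect shares one variance, a larger λ shrinks all estimates by the same factor here, which is the "shrinks large QTL effects towards zero" behaviour listed for RR-BLUP on the next slide.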

21 Choosing a Statistical Model for GS. Shrinkage analysis: ridge regression BLUP estimates all effects simultaneously, assumes equal variance for all QTL effects, and shrinks large QTL effects toward zero. Bayesian shrinkage regression, a.k.a. BayesA and BayesB (Meuwissen et al.): a scaled inverse chi-square prior is used and a variance is estimated for each marker; BayesB assigns a value of zero to a portion of the markers. Bayesian variable selection: stochastic search variable selection, with a variance estimated for each marker. Both shrinkage and variable selection: the least absolute shrinkage and selection operator (LASSO; Tibshirani; Xu) minimizes the residual sum of squares subject to a constraint on the sum of the absolute values of the regression coefficients. Model-free methods: kernel regression and reproducing kernel Hilbert spaces (RKHS) regression (Gianola et al.).
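LASSO's combination of shrinkage and variable selection is easiest to see in code. A minimal sketch using cyclic coordinate descent, one common way to fit the penalized (Lagrangian) form of the LASSO constraint; it is not necessarily the algorithm used by Tibshirani or Xu, and the penalty values and data are hypothetical.

```python
def soft_threshold(z, t):
    """Move z toward zero by t; return exactly 0.0 when |z| <= t (this is what zeroes markers)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """LASSO marker effects by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - mean(y) - Z b||^2 + lam * sum(|b_j|),
    where Z is X with each column centred.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals while all b_j = 0
    b = [0.0] * p
    scale = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # a monomorphic marker carries no information
            # Correlation of marker j with the partial residual (its own contribution added back).
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / scale[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
            b[j] = new_bj
    return b


# Hypothetical data: 8 inbred lines, 3 markers coded -1/+1, y = 10 + 2*m1 - 1*m2.
X = [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
y = [10 + 2 * row[0] - row[1] for row in X]
print(lasso_cd(X, y, 0.5))  # [1.5, -0.5, 0.0]
print(lasso_cd(X, y, 1.5))  # [0.5, 0.0, 0.0]
```

Unlike ridge regression, a large enough penalty sets small effects to exactly zero, so the fitted model selects markers as well as shrinking them.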

22 Factors Affecting the Accuracy of GEBVs. Level and distribution of LD between markers and QTL: r² > 0.2 is desirable; haplotypes may increase LD but reduce power. Size of the training population: larger is better, but models may need to be retrained over time. Heritability of the trait: more records are required for low-heritability traits. Distribution of QTL effects: many small-effect QTL or low LD favor BLUP, which can capture small-effect QTL that may not be in LD with a marker.
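The r² criterion above is the squared correlation between alleles at two loci, r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B. A minimal sketch, assuming homozygous inbred lines so that each line contributes one known haplotype (heterozygous material would need phase estimation, which is not handled); the allele data are hypothetical.

```python
def ld_r2(locus_a, locus_b):
    """r^2 between two biallelic loci from haplotypes (one per homozygous inbred line).

    locus_a, locus_b: sequences of 0/1 allele codes, aligned by line.
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical alleles at a marker and a QTL in eight inbred lines.
marker = [1, 1, 1, 1, 0, 0, 0, 0]
qtl = [1, 1, 1, 0, 0, 0, 0, 1]
r2 = ld_r2(marker, qtl)
print(r2, r2 > 0.2)  # 0.25 True
```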

23 Genomic Selection in Dairy Cattle (Hayes et al. 2009). Comparisons of GEBV reliabilities, where reliability is the square of the correlation between GEBV and TBV; all included a polygenic effect (parental-average BV) in calculating GEBV. Australia, 798 Holstein-Friesian bulls (Australian Selection Index): sire 38% < BLUP 44% < BayesA 48%. New Zealand, 4,500 bulls: GEBV reliabilities were 50-67% for milk vs. 34% for parental average. United States, 3,576 Holstein bulls: GEBV reliability was 50% for the selection index vs. 27% for parental average. The Netherlands, 1,583 Holstein bulls: GEBV reliabilities were 9 to 33% higher than parental average.

24 Adapting Genomic Selection to Plant Breeding. For most crop species, large populations can be generated, whereas in animals many daughters are tested for each bull. Plant breeders use more diverse mating schemes; in animals, parental values are mainly based on half-sib families. Inbred lines, testcross hybrids, and clonally propagated crops can be replicated in time and space, whereas each animal is a unique, heterozygous genotype. GxE is a major issue in plant breeding. LD in self-pollinated crops tends to be extensive: 5-20 cM for an r² of 0.1 to 0.2.

25 Genomic Selection and Marker-Assisted Recurrent Selection Schemes for Maize Inbred Development: An Example (Bernardo & Yu 2008). A computer simulation was used to compare genomic selection with marker-assisted recurrent selection (MARS). Genomic selection: a large number of markers are used to estimate breeding value, which is the sum of an individual's marker effects across all markers. MARS: only markers significant for the target traits are used for selection. Simulations: 20, 40, and 100 QTL; heritabilities of 0.2, 0.5, and 0.8.

26 [Diagram of the simulated maize scheme (Bernardo & Yu 2008): off-season nurseries; a training population used to develop prediction equations.] Computer simulations: 20, 40, and 100 QTL; h² = 0.2, 0.5, 0.8. Genomic selection: DH testcrosses are the training population, phenotyped and genotyped to train the model. MARS uses only significant markers. Two cycles of selection.

27 Genomic Selection and Marker-Assisted Recurrent Selection Schemes for Maize Inbred Development (Bernardo & Yu 2008). Results of simulations: response to genomic selection was 18-43% higher than response to MARS across different population sizes, numbers of QTL, and heritabilities. The advantage of GS over MARS was greatest for low h² and many QTL.

Response to GS as a percentage of response to MARS:

No. of QTL   h² = 0.2   h² = 0.5   h² = 0.8
20           130        121        118
40           136        132        135
100          143        128        130
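A quick check of the arithmetic on this slide: the table entries are responses to GS relative to MARS (MARS = 100), so subtracting 100 gives the percentage advantage. A small sketch with the values copied from the slide and the heritability columns indexed 0 to 2 from lowest to highest:

```python
# Response to GS as % of response to MARS, keyed by number of QTL;
# list positions 0, 1, 2 are the heritability columns from lowest to highest.
relative_response = {20: [130, 121, 118], 40: [136, 132, 135], 100: [143, 128, 130]}

advantages = {(n_qtl, col): value - 100
              for n_qtl, row in relative_response.items()
              for col, value in enumerate(row)}
low, high = min(advantages.values()), max(advantages.values())
best = max(advantages, key=advantages.get)
print(f"GS advantage over MARS: {low}-{high}%")  # GS advantage over MARS: 18-43%
print("largest advantage at (QTL, heritability column):", best)  # (100, 0)
```

The maximum sits at 100 QTL and the lowest heritability column, matching the slide's statement that GS gains most for low h² and many QTL.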

28 Self-Pollinated Crop: Genomic Selection vs. Phenotypic/MAS Selection Timeline.
Phenotypic + MAS selection (parents chosen on pedigree + phenotype): parent recombination; F1 greenhouse advance; F2 GH advance by SSD, genotyping, MAS 1; F3 GH advance by SSD, genotyping, MAS 2; F4 and F5 GH advance by SSD; F5DL field single-row seed increase (PS); F5DL field single-plot yield trial (PS); F5DL field yield trials at 3 locations (PS); advanced regional testing. 7 years to parent selection and advanced testing.
Genomic selection (parents chosen on GEBV + phenotype): F2, F3, and F4 GH advance by SSD; F5 GH advance by SSD, genotyping, and GS; F5DL field single-row seed increase (GS + PS); F5DL field trials at 3 locations (GS + PS). 3 years to parent selection; 5 years to advanced testing.
(GH = greenhouse; SSD = single-seed descent; PS = phenotypic selection; F5DL = F5-derived lines.)

29 Genomic Selection Experiments in the Cornell Wheat Breeding Program. Within families: Cayuga x Caledonia DH population; pre-harvest sprouting (heritability = 0.44); 209 lines across 16 environments (6 years); 15 QTL explaining < 40% of the phenotypic variance. Across families: Master Nursery of 400 advanced breeding lines (F7+); augmented field design; three locations over 3 years; DArT markers, ~1,500 polymorphisms.

30 Cornell Winter Wheat Breeding Program. Current scheme: 2- or 3-way cross of parent material; F1; F2 (MAS for 1-5 loci); F3-F4 early-generation bulk (mass selection for height and seed size); F5 space plants, selecting individual plants; F6 head rows (1 m) and single-row selection; F7 screening nursery (3 m plots), preliminary line selection; F8-F10 Master Nursery (400 lines; 4 m plots), advanced line selection; seed increase and final screening (3 m plots); regional trials (1-4 years) and variety release. With genomic selection: F3-F4 early-generation genomic selection, followed by F5 space plants with selection of individual plants.

31 Preliminary Evaluation of GS for Preharvest Sprouting (PHS) in Cayuga x Caledonia (collaboration with Hiroyoshi Iwata). Population size = 209; 16 environments; heritability = 0.44. Leave-one-out (LOO) cross-validation: leave a line out of the analysis, then predict it using marker data; repeat for all lines. This provides an estimate of the accuracy of selection based only on markers. Models: RR-BLUP, BayesA and BayesB; GEBVs with and without the phenotype of the predicted line in the model training step. Average correlation between phenotype in the training population and true breeding value = 0.67.
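The LOO procedure described above can be sketched as follows. Assumptions: only ridge regression is used (not BayesA/B), the penalty λ is fixed by hand, and the tiny noise-free data set is hypothetical, so the resulting correlation is far higher than anything expected from real PHS data.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge marker effects on column-centred X; returns (effects, mean of y, column means)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    lhs = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(Z[i][j] * (y[i] - y_mean) for i in range(n)) for j in range(p)]
    return solve(lhs, rhs), y_mean, col_means


def ridge_predict(x, fit):
    """Predict one line from its marker codes only (its phenotype is never used)."""
    b, y_mean, col_means = fit
    return y_mean + sum((xj - mj) * bj for xj, mj, bj in zip(x, col_means, b))


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def loo_accuracy(X, y, lam):
    """Leave each line out, train on the rest, predict it; correlate predictions with y."""
    preds = []
    for k in range(len(y)):
        fit = ridge_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam)
        preds.append(ridge_predict(X[k], fit))
    return pearson(preds, y), preds


# Hypothetical data: 8 lines, 3 markers coded -1/+1, noise-free y = 10 + 2*m1 - 1*m2.
X = [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
y = [10 + 2 * row[0] - row[1] for row in X]
r, preds = loo_accuracy(X, y, 0.01)
print(round(r, 4))
```

Including the predicted line's own phenotype in training (the "with phenotype" variant on the slide) would simply mean fitting on all lines instead of dropping line k.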

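32 GS without Phenotype vs. True Breeding Value (TV).

                 RR      BayesA   BayesB   Average prediction
Corr. to TV      0.629   0.628    0.587    0.634
R²               0.40    0.39     0.35     0.40

Reference accuracy of phenotypic selection: $r = 0.67 \approx \sqrt{h^2}$, with $h^2 = 0.44$. Response to selection: $R = i\,r\,\sigma_A$, where $i$ is the selection intensity, $r$ the accuracy, and $\sigma_A$ the additive genetic standard deviation.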
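The equation $R = i\,r\,\sigma_A$ makes the comparison between selection methods a comparison of accuracies, and dividing by cycle length gives gain per year. A minimal sketch using the accuracies on these slides; the selection intensity and $\sigma_A$ are hypothetical (both set to 1), and taking the cycle length as the years to parent selection in the self-pollinated timeline (7 years phenotypic/MAS, 3 years GS) is an assumption made only for illustration.

```python
import math


def response_to_selection(i, r, sigma_a):
    """Response to selection per cycle, R = i * r * sigma_A."""
    return i * r * sigma_a


H2 = 0.44                  # heritability of PHS reported on the slides
I_SEL, SIGMA_A = 1.0, 1.0  # hypothetical selection intensity and additive SD (scaled units)

accuracy = {
    "phenotype": math.sqrt(H2),     # r = sqrt(h^2), about 0.66
    "GS, markers only": 0.634,      # average-prediction correlation with TV
    "GS, markers + phenotype": 0.73,
}
per_cycle = {name: response_to_selection(I_SEL, r, SIGMA_A) for name, r in accuracy.items()}
for name, gain in per_cycle.items():
    print(f"{name}: R = {gain:.3f} sigma_A per cycle")

# Hypothetical cycle lengths taken from the timeline slide (years to parent selection).
per_year_pheno = per_cycle["phenotype"] / 7
per_year_gs = per_cycle["GS, markers only"] / 3
print(f"gain per year: phenotypic {per_year_pheno:.3f}, GS {per_year_gs:.3f}")
```

Even though markers-only GS is slightly less accurate per cycle, under these assumptions the shorter cycle more than doubles the gain per year.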

33 GS with Phenotype vs. True Breeding Value: correlation to TBV = 0.73; R² = 0.53.

34 Preliminary Conclusions. GS with markers + phenotype > phenotype > GS with markers only. RR-BLUP was more than 40 times faster computationally than the Bayesian models. With only 209 genotypes, GEBVs were comparable to phenotypic selection. GEBVs with phenotype have better precision than phenotype alone, which has implications for advanced testing.

35 Summary: Association Breeding and Genomic Selection. Association breeding: new alleles can be identified and characterized to determine their value; allelic values of previously identified alleles can be updated dynamically from advanced trial data as desired. Genomic selection: captures small-effect QTL and genetic relationships; can increase gain from selection and reduce advanced testing; requires a large number of markers and accurate prediction models. Both association breeding and genomic selection: genome saturation is not required (though it does improve prediction), and supplemental markers can focus on specific QTL regions and candidate genes. The most important advantages are a shorter selection cycle and lower associated phenotyping cost, resulting in greater gain per year.

36 Future Opportunities: develop economic assessments of GS in breeding strategies; predict epistatic effects; develop prediction models for different target environments and high-value traits; predict the utility of new germplasm from other sources.

37 Acknowledgements: USDA Soft Wheat Quality Lab, Wooster, OH; Embrapa; USDA Cooperative State Research, Education and Extension Service, Coordinated Agricultural Project; USDA National Needs Fellowship Grant 2005-38420-15785: provided fellowship for Elliot Heffner; provided assistantship for Flavio Breseghello.
