Genomic Evaluation with Many More Genotypes and Phenotypes
Paul VanRaden (1), Jeff O’Connell (2), George Wiggans (1), Kent Weigel (3)
(1) Animal Improvement Programs Lab, USDA, Beltsville, MD, USA
(2) University of Maryland School of Medicine, Baltimore, MD, USA
(3) University of Wisconsin Dept. Dairy Science, Madison, WI, USA
2010
9WCGALP, Leipzig, Germany, August 2010, Paul VanRaden

Topics
Methods to combine different marker densities and datasets
– More markers: 500,000 simulation
– More animals: 3,000 marker subset
– More breeds: multi-trait markers
– More traits: same genotype cost

Methods to Trace Inheritance
Few markers
– Pedigree needed
– Prob (paternal or maternal alleles inherited) computed within families
Many markers
– Can find matching DNA segments without pedigree
– Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers

[Figure: Haplotype Probabilities with Few Markers (12 SNP / chromosome)]

[Figure: Haplotype Probabilities with More Markers (50 SNP / chromosome)]

Haplotyping Program findhap.f90
Begin with population haplotyping
– Divide chromosomes into segments, ~250 SNP / segment
– List haplotypes by genotype match
– Similar to fastPHASE, IMPUTE, or long range phasing
End with pedigree haplotyping
– Detect crossover, fix noninheritance
– Impute nongenotyped ancestors
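
The segment step above can be illustrated with a minimal Python sketch. It is not taken from findhap.f90: the function name `segment_bounds` is hypothetical, and letting the last segment simply be shorter than 250 SNP is an assumption made for illustration.

```python
def segment_bounds(n_markers, segment_size=250):
    """Split marker indices 0..n_markers-1 into consecutive segments.

    Returns (start, end) pairs with end exclusive. The final segment is
    allowed to be shorter than segment_size (an assumption of this sketch).
    """
    return [
        (start, min(start + segment_size, n_markers))
        for start in range(0, n_markers, segment_size)
    ]


# Hypothetical chromosome with 600 markers.
for start, end in segment_bounds(600):
    print(f"segment covers markers {start}..{end - 1}")
```

Each segment can then be haplotyped on its own, which keeps the haplotype list for any one segment short.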

Recent Program Revisions
Improved imputation and reliability
Changes since January 2010
– Use known haplotype if second is unknown
– Use current instead of base frequency
– Combine parent haplotypes if crossover is detected
– Begin search with parent or grandparent haplotypes
– Store 2 most popular progeny haplotypes
– Simulated crossover rate increased

Coding of Alleles and Segments
Genotypes
– 0 = BB, 1 = AB or BA, 2 = AA
– 3 = B_, 4 = A_, 5 = __ (missing)
– Allele frequency used for missing
Haplotypes
– 0 = B, 1 = not known, 2 = A
Segment inheritance (example)
– Son has haplotype numbers 5 and 8
– Sire has haplotype numbers 8 and 21
– Son got haplotype number 5 from dam
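
The segment inheritance example can be written as a small Python sketch. The code tables repeat the slide; the function `haplotype_from_dam` and its handling of ambiguous or conflicting cases are hypothetical and only illustrate the bookkeeping, not the logic of findhap.f90.

```python
# Codes as listed on the slide.
GENOTYPE_CODES = {0: "BB", 1: "AB or BA", 2: "AA", 3: "B_", 4: "A_", 5: "__"}
HAPLOTYPE_CODES = {0: "B", 1: "not known", 2: "A"}


def haplotype_from_dam(son, sire):
    """Return the son's haplotype number that did not come from the sire.

    son and sire are pairs of haplotype numbers for one segment.
    Returns None when the sire shares neither haplotype (noninheritance)
    or shares both of two different haplotypes (ambiguous).
    """
    first_in_sire = son[0] in sire
    second_in_sire = son[1] in sire
    if first_in_sire and not second_in_sire:
        return son[1]
    if second_in_sire and not first_in_sire:
        return son[0]
    if first_in_sire and second_in_sire and son[0] == son[1]:
        return son[0]  # son is homozygous for a sire haplotype
    return None


# Slide example: son has haplotypes 5 and 8, sire has 8 and 21.
print(haplotype_from_dam((5, 8), (8, 21)))
```

For the slide's example the sire can only have supplied haplotype 8, so haplotype 5 is attributed to the dam.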

Most Frequent Haplotypes
1st segment of chromosome
[Table: haplotype listing with frequencies (%) not recoverable]
For efficiency, store haplotypes just once.
Most frequent haplotype in Holsteins had 4,316 copies = 0.0516 * 41,822 animals * 2 chromosomes each
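
The idea of storing each haplotype once and the arithmetic on this slide can be sketched in Python. The function `haplotype_frequencies` and the tiny example data are hypothetical; only the counts 4,316 and 41,822 come from the slide.

```python
from collections import Counter


def haplotype_frequencies(assignments):
    """Frequency of each haplotype number within one segment.

    assignments holds one (hap1, hap2) pair of haplotype numbers per
    animal, so each distinct haplotype is stored once and animals only
    carry its number.
    """
    counts = Counter(h for pair in assignments for h in pair)
    total = sum(counts.values())
    # most_common() orders haplotypes from most to least frequent.
    return {h: c / total for h, c in counts.most_common()}


# Hypothetical two-animal example.
print(haplotype_frequencies([(5, 8), (8, 21)]))

# Slide arithmetic: copies / (animals * 2 chromosomes each).
animals = 41_822
copies = 4_316
frequency = copies / (2 * animals)
print(round(frequency, 4))
```

The second calculation recovers the 0.0516 frequency quoted for the most frequent Holstein haplotype.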

Population Haplotyping Steps
Put first genotype into haplotype list
Check next genotype against list
Do any homozygous loci conflict?
– If haplotype conflicts, continue search
– If match, fill any unknown SNP with homozygote
– 2nd haplotype = genotype minus 1st haplotype
– Search for 2nd haplotype in rest of list
If no match in list, add to end of list
Sort list to put frequent haplotypes 1st
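
The steps above can be sketched in Python for a single segment. This is an illustration under stated assumptions, not the findhap.f90 code: all function names are hypothetical, only genotype codes 0, 1, 2 and 5 are handled, a genotype with no match is added with a count of 1, and the search for the 2nd haplotype starts at the position of the 1st match so that a homozygous animal can match the same haplotype twice.

```python
UNKNOWN = 1  # haplotype codes: 0 = B, 1 = not known, 2 = A


def compatible(hap, target):
    """True if no locus carries two known, different alleles."""
    return all(h == t or UNKNOWN in (h, t) for h, t in zip(hap, target))


def as_haplotype(genotype):
    """Homozygous loci give a known allele; other loci stay unknown."""
    return [g if g in (0, 2) else UNKNOWN for g in genotype]


def complement(genotype, hap):
    """2nd haplotype = genotype minus 1st haplotype."""
    out = []
    for g, h in zip(genotype, hap):
        if g in (0, 2):
            out.append(g)          # homozygous: same allele on both
        elif g == 1 and h != UNKNOWN:
            out.append(2 - h)      # heterozygous: the opposite allele
        else:
            out.append(UNKNOWN)    # missing genotype or unknown allele
    return out


def fill(hap, target):
    """Fill unknown alleles of hap in place from target."""
    for i, t in enumerate(target):
        if hap[i] == UNKNOWN:
            hap[i] = t


def population_haplotypes(genotypes):
    """Build a frequency-sorted haplotype list from segment genotypes."""
    haps, counts = [], []
    for genotype in genotypes:
        known = as_haplotype(genotype)
        first = next(
            (i for i, h in enumerate(haps) if compatible(h, known)), None
        )
        if first is None:
            haps.append(known)     # no match: add to end of list
            counts.append(1)
        else:
            fill(haps[first], known)
            counts[first] += 1
            second = complement(genotype, haps[first])
            match = next(
                (i for i in range(first, len(haps))
                 if compatible(haps[i], second)),
                None,
            )
            if match is None:
                haps.append(second)
                counts.append(1)
            else:
                fill(haps[match], second)
                counts[match] += 1
        # Keep the most frequent haplotypes at the front of the list.
        order = sorted(range(len(haps)), key=lambda i: -counts[i])
        haps = [haps[i] for i in order]
        counts = [counts[i] for i in order]
    return haps, counts


# Hypothetical 3-SNP segment for three animals.
haps, counts = population_haplotypes([[2, 2, 0], [2, 1, 1], [2, 0, 2]])
print(haps, counts)
```

Because heterozygous loci map to the "not known" haplotype code, a genotype can be placed in the list directly and refined later as homozygous animals resolve its unknown alleles.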

Check New Genotype Against List
1st segment of chromosome
Search for 1st haplotype that matches genotype:
[Table: haplotype listing with frequencies (%) not recoverable]
Get 2nd haplotype by removing 1st from genotype:
[Table: haplotype listing with frequencies (%) not recoverable]

Simulated 500K Tests
How many 500K genotypes needed?
Is computation affordable?
Two subsets of mixed 500K and 50K:
– Of 33,414 HO, only 1,406 (young) had 500K
– Also bulls > 99% reliability, total 3,726
Linkage generated in base population
– Efficient and similar to autoregressive
– Linkage affects gain from more markers

[Figure: Holstein Linkage Disequilibrium]

[Figure: Simulated Linkage]

Computer Requirements
500,000 markers, 33,414 animals

Step                                    Gbytes   CPU hours
Simulate genotypes                          39         1.8
Pop’n haplotypes                             2         1.2
Pedigree haplotypes                          3         1.8
Store genotypes                             13           -
Store haplotypes                             3           -
Iterate allele effects (for 5 traits)        8          30

Measures of Haplotyping Success
Does estimated = true genotype?
Does estimated = true linkage for adjacent heterozygous markers?
Does estimated = true paternity?
How many alleles remain missing?
What is the error rate (Druet, 2010)?
What is corr² (estimated, true genotype)?
Are resulting GEBVs reliable?
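
Two of these measures can be computed as in the following Python sketch. The function names and example genotypes are hypothetical, and the error rate here is simply the percentage of genotype codes that differ from the truth, which may not match the definition in Druet (2010).

```python
def percent_wrong(estimated, true):
    """Percentage of genotype codes that differ from the true codes."""
    wrong = sum(e != t for e, t in zip(estimated, true))
    return 100.0 * wrong / len(true)


def squared_correlation(estimated, true):
    """Squared Pearson correlation of estimated with true genotypes."""
    n = len(true)
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov * cov / (var_e * var_t)


# Hypothetical genotypes coded 0, 1, 2 for eight SNP of one animal.
true_genotypes = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_genotypes = [0, 1, 2, 1, 1, 0, 1, 2]

print(percent_wrong(imputed_genotypes, true_genotypes))
print(squared_correlation(imputed_genotypes, true_genotypes))
```

The squared correlation also works when missing genotypes are replaced by fractional values such as allele frequencies, where an exact-match error rate is less informative.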

500K Imputation Results
# 500K: 0; 1,406; 3,798; 33,414
Chips: 50K; 50K & 500K; 500K
Rows (percentages): Missing before; Missing after; Errors (young); Errors (old); Reliability; Gain vs. 50K
[Table: cell values not recoverable]

500K Imputation Results
# 500K: 0; 1,406; 3,798; 33,414
Chips: 50K; 50K & 500K; 500K
Rows (% wrong): Genotype (Yng, Old); Linkage (Yng, Old); Paternity (Yng, Old)
[Table: cell values not recoverable]

Imputation Summary
1,406 young animals genotyped at 500K
– REL gain 0.8% vs. 1.4% with all 500K
– Imputation better if ancestors also genotyped
– Could genotype additional reference bulls instead of re-genotyping bulls already done
32,008 animals imputed from 50K
– 10% SNP known before, 93% after
– 97-98% of 500K genotypes correct
– 0.839 squared correlation (estimated, true genotype)

Multi-Breed Genomic Evaluation
Treat allele effects as independent, same, or correlated, using data of 5,331 purebred Holsteins, 1,361 purebred Jerseys, and 506 purebred Brown Swiss

Protein Yield R²
Columns (breeds): Holstein; Jersey; Brown Swiss
Rows (SNP effects for breeds): None (PA); Independent; Same; Correlated
[Table: cell values not recoverable]
Optimum correlation was 0.3 with 43K markers, and would be larger with more markers

[Figure: Correlation with Single-Breed GEBV]

Fewer Markers, More Animals
Half of young animals assigned 3K
– Proven bulls, cows all had 43K
– Dams imputed using 43K and 3K
Half of ALL animals assigned 3K
– Could 3K reference animals help?
– 10,000 proven bulls yet to genotype
– Should cows with 3K be predictors?

Reliability from 3K, 43K Mixture
Chips: 3K; 3K and 43K; 43K
# 43K: N = 0; ½ All; ½ Young; 40,351
Rows: Missing % (Before, After); Reliability %; Rel - PA Rel
[Table: cell values not recoverable apart from the fragment ".0531" in the Missing % rows]

Correlations² of 3K and PA with 43K
Genotyped ancestors had 43K
Consistent gains across traits
Reliability gain from progeny with 3K was 79-87% of gain from 43K
– Gain % = [Corr(3K,43K)² - Corr(PA,43K)²] / [1 - Corr(PA,43K)²]
Large benefits for smaller cost
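
The gain formula above can be evaluated with a short Python sketch. The function name `gain_percent` and the input correlations are hypothetical illustration values, not results from the study. The ratio is multiplied by 100 to express it as a percentage.

```python
def gain_percent(corr_3k_43k, corr_pa_43k):
    """Share of the 43K gain over parent average captured by 3K, in %.

    corr_3k_43k: correlation of 3K evaluations with 43K evaluations
    corr_pa_43k: correlation of parent average (PA) with 43K evaluations
    """
    r2_3k = corr_3k_43k ** 2
    r2_pa = corr_pa_43k ** 2
    return 100.0 * (r2_3k - r2_pa) / (1.0 - r2_pa)


# Hypothetical correlations chosen only to show the calculation.
print(round(gain_percent(0.95, 0.70), 1))
```

A 3K evaluation that matched 43K exactly would give 100%, and one no better than parent average would give 0%.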

Conclusions - 1
Missing genotypes can be filled easily
– Population and pedigree haplotyping can both process long segments efficiently
– Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours
Haplotyping implemented for April 2010 routine U.S. evaluation
– Several recent improvements to accuracy
– Ready to include lower or higher density genotypes in evaluations

Conclusions - 2
More markers improved reliability < 2%
– 1,406 high density genotypes sufficient
– 32,008 other animals imputed from 50K to 500K in simulation
Fewer markers can decrease cost
More animals can greatly increase reliability and selection differential
Multi-breed model improves reliability only slightly (< 1%) at current density

Acknowledgments
Katie Olson computed the multi-breed genomic evaluation
Mel Tooker assisted with graphics and computation
Bob Schnabel helped improve marker locations on the map