Paul VanRaden and Chuanyu Sun Animal Genomics and Improvement Lab USDA-ARS, Beltsville, MD, USA National Association of Animal Breeders Columbia, MO, USA.


(1) Fast Imputation Using Medium- or Low-Coverage Sequence Data
Paul VanRaden and Chuanyu Sun
Animal Genomics and Improvement Lab, USDA-ARS, Beltsville, MD, USA
National Association of Animal Breeders, Columbia, MO, USA
10th World Congress on Genetics Applied to Livestock Production, Vancouver, Canada, August 19, 2014

(2) Topics
- Cost of chip vs. sequence data
  - Chips: nonlinear increase with SNP density
  - Sequence: linear increase with read depth
- Imputation methods for sequence data
  - Few programs designed for low read depth
- Value of including HD chip in sequence data

(3) Analysis of chip vs. sequence data
Chip data                | Sequence data
Genotypes are observed   | Genotype probabilities
AA, AB, BB (2, 1, 0)     | Counts of A, counts of B
Exact data, SNP subset   | Approximate data, all SNP
Impute only missing data | Impute all genotypes
3K, 6K, 50K, 77K, 777K   | 30 million SNPs + CNVs
Error rate < 0.05%       | Error rate 0.5% to 10%
Computation important    | Computation is crucial

(4) Imputation algorithm (findhap v4)
- Prior allele probabilities = population frequency
- Compute Prob(nA, nB | genotypes, errate)
- Test ancestor haplotype likelihoods first
- Find most likely 2 haplotypes from library
- Compute haplotype posteriors from priors
- Test long, then medium, then short segments
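The two central steps on this slide, computing Prob(nA, nB | genotype, errate) and finding the most likely pair of library haplotypes, can be sketched as follows. This is a minimal illustration and not the findhap code: it assumes a binomial read model in which a read shows allele A with probability 1 - err for AA, 0.5 for AB, and err for BB, and it searches every pair exhaustively. The names `read_loglik` and `best_haplotype_pair`, the coding of haplotypes as 1 = A and 0 = B, and the toy library are all hypothetical. Priors, ancestor haplotypes, and the long/medium/short segment passes are left out.

```python
import math
from itertools import combinations_with_replacement


def read_loglik(n_a, n_b, genotype, err):
    """log Prob(nA, nB | genotype, err); genotype = number of A alleles (0, 1, 2).

    Assumed binomial model, not taken from findhap itself.
    """
    # Probability that a single read shows allele A given the true genotype.
    p_a = {2: 1.0 - err, 1: 0.5, 0: err}[genotype]
    # Clamp so that err = 0 never produces log(0).
    p_a = min(max(p_a, 1e-12), 1.0 - 1e-12)
    return (math.log(math.comb(n_a + n_b, n_a))
            + n_a * math.log(p_a)
            + n_b * math.log(1.0 - p_a))


def best_haplotype_pair(counts, library, err):
    """Return (i, j, loglik) for the most likely pair of library haplotypes."""
    best = None
    for i, j in combinations_with_replacement(range(len(library)), 2):
        # The genotype at each SNP is the sum of the two haplotype alleles.
        ll = sum(read_loglik(n_a, n_b, library[i][k] + library[j][k], err)
                 for k, (n_a, n_b) in enumerate(counts))
        if best is None or ll > best[2]:
            best = (i, j, ll)
    return best


# Toy data: three library haplotypes over four SNPs (1 = A, 0 = B).
library = [(1, 1, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1)]
# Observed (A count, B count) at each SNP for one animal.
counts = [(1, 1), (3, 0), (1, 2), (0, 2)]
i, j, ll = best_haplotype_pair(counts, library, err=0.01)
```

With these toy counts the search selects haplotypes 0 and 1. An exhaustive search is quadratic in library size, which is presumably one reason the slide lists ancestor haplotypes as the first candidates to test.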

(5) Data sets and imputation tests
Data category / parameter   | Levels tested
Simulated sequenced bulls   | 250, 500, 1,000, 10,000
Read depths                 | 1, 2, 4, 8, 16
Error rates                 | 0%, 1%, 4%, 16%
Include HD chip in sequence | Yes or no
SNPs in sequence and HD     | 30 million and 600,000
Human chromosome 22         | 1,102 actual genomes
SNPs in sequence and HD     | 394,724 and 39,440

(6) Computation required
- Bulls: 250 sequenced HD, 1 chromosome
- Time (10 processors): findhap 10 min, Beagle v4 3 days
- Memory: findhap 5 Gbytes, Beagle <5 Gbytes
- Input data: findhap 0.5 Gbytes, Beagle 5 Gbytes
  - findhap: 2 bytes / SNP [A, B counts stored as hexadecimal]
  - Beagle: 20 bytes / SNP [Prob(AA), Prob(AB), Prob(BB)]
- Output data: findhap 1 byte vs. Beagle 20 bytes / SNP
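The storage and timing figures on this slide can be checked with a little arithmetic. The sketch below assumes 250 animals and roughly 1,000,000 SNPs on the chromosome (30 million SNPs spread over about 30 chromosomes); neither number is stated as such on the slide. It also assumes that "stored as hexadecimal" means one hex character per count, capped at 15, which is a guess at the file layout rather than a description of findhap's actual format. `encode_counts` and `decode_counts` are hypothetical helper names.

```python
def encode_counts(n_a, n_b, cap=15):
    """Pack A and B read counts into 2 hex characters (assumed layout)."""
    return format(min(n_a, cap), "x") + format(min(n_b, cap), "x")


def decode_counts(text):
    """Recover (A count, B count) from a 2-character hex string."""
    return int(text[0], 16), int(text[1], 16)


# Assumed problem size: 250 animals, about 1 million SNPs on one chromosome.
bulls = 250
snps = 1_000_000

findhap_input_bytes = bulls * snps * 2    # 2 bytes / SNP
beagle_input_bytes = bulls * snps * 20    # 20 bytes / SNP

# Run time ratio from the slide: 3 days vs. 10 minutes.
speedup = (3 * 24 * 60) / 10

print(encode_counts(3, 12), decode_counts("3c"))
print(findhap_input_bytes / 1e9, beagle_input_bytes / 1e9, speedup)
```

Under these assumptions the inputs come to 0.5 and 5 Gbytes, matching the slide, and the time ratio is 432, in line with the "up to 400 times faster" claim in the conclusions.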

(7) Accuracy of findhap vs. Beagle
Columns: Program | Depth | Sequence + HD (Correct, Corr'n) | Impute from HD (Correct, Corr'n)
Rows: Findhap at 8X and two other depths; Beagle at 8X and two other depths
[Table values not preserved in transcript]
One group of bulls had sequence + HD; 250 others were imputed from HD

(8) Accuracy from HD for bulls * depth
Sequenced bulls | Depth | Total depth
250             | 8X    | 2,000X
500             | 4X    | 2,000X
1,000           | 2X    | 2,000X
10,000          | 1X    | 10,000X
[Correct and Corr'n columns: values not preserved in transcript]
Sequences had 1% error; HD imputed using findhap

(9) Accuracy including HD in sequence
Columns: Read depth | Sequenced bulls, HD in sequence? (No, Yes) | Bulls with HD only, HD in sequence? (No, Yes)
Rows: five read depths, starting at 16X
[Table values not preserved in transcript]
Correlations of estimated with true genotypes for 500 bulls sequenced with 1% error and 250 bulls with HD only

(10) Imputation from 10K, 60K, 1X, or 2X
[Figure] Reference population is 500 bulls, 8X read depth, 1% error

(11) Sequenced human read depth * error
Columns: Read depth | Correct genotypes % at error rates 0%, 1%, 4%, 16% | Genotype correlation at error rates 0%, 1%, 4%, 16%
Rows: five read depths, starting at 16X
[Table values not preserved in transcript]
Humans were sequenced for 394,724 SNPs on chromosome 22
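The accuracy tables report two measures: the percentage of correct genotypes and the correlation of estimated with true genotypes. A minimal sketch of both is given below, assuming genotypes are coded 0, 1, 2 and that an estimated genotype counts as correct when it rounds to the true value; the slides do not spell out the exact definition used. The function names and the six example genotypes are made up for illustration and are not results from the study.

```python
import math


def percent_correct(true_genos, est_genos):
    """Percentage of estimated genotypes that round to the true genotype."""
    # int(e + 0.5) rounds non-negative values to the nearest integer.
    hits = sum(t == int(e + 0.5) for t, e in zip(true_genos, est_genos))
    return 100.0 * hits / len(true_genos)


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up example: true genotypes and imputed dosages for six SNPs.
true_genos = [2, 1, 0, 1, 2, 0]
est_genos = [1.9, 1.2, 0.1, 0.4, 2.0, 0.0]

pct = percent_correct(true_genos, est_genos)
corr = correlation(true_genos, est_genos)
```

The two measures can disagree: the fourth genotype here is called wrong after rounding, yet its dosage of 0.4 still contributes partial information to the correlation.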

(12) Software at
- Simulate genotypes (programs written 2007)
  - pedsim.f90, markersim.f90, genosim.f90
- Simulate A and B counts, Poisson plus error
  - geno2seq.f90
- Impute using haplotype likelihood ratios
  - findhap.f90 version 4
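The "Poisson plus error" step can be illustrated with a short simulation. The sketch below is not geno2seq.f90; it assumes the number of reads at a SNP is Poisson with mean equal to the read depth, that each read samples one of the animal's two alleles at random, and that each read is flipped to the other allele with probability `err`. The function names are hypothetical, and the Poisson draw uses Knuth's multiplication method so the example needs only the standard library.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(genotype, depth, err, rng):
    """Simulate (A count, B count) for one SNP.

    genotype is the number of A alleles (0, 1, or 2).
    """
    n_reads = poisson(rng, depth)
    p_a = genotype / 2.0          # chance a read comes from an A allele
    n_a = 0
    for _ in range(n_reads):
        is_a = rng.random() < p_a
        if rng.random() < err:    # sequencing error flips the allele
            is_a = not is_a
        n_a += int(is_a)
    return n_a, n_reads - n_a


rng = random.Random(2014)
# One heterozygous SNP at 4X average depth with 1% read error.
n_a, n_b = simulate_counts(genotype=1, depth=4, err=0.01, rng=rng)
```

At low depth many SNPs receive zero or one read, which is why genotypes from such data are described on the earlier slide as probabilities rather than observed values.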

(13) Actual HD genotype correlations²

(14) Simulated HD correlations²

(15) Conclusions
- High read depth is expensive (linear cost)
- Low read depth requires additional math
  - Haplotype probabilities | (A B counts, error)
- Imputation improved with findhap version 4
  - Up to 400 times faster than Beagle
  - findhap more accurate for low coverage
- Some gain from including HD in sequence

(16) Acknowledgments
- Jeff O’Connell and Derek Bickhart provided helpful advice on sequence analysis and software design and testing