WiggansARS Big Data Workshop – July 16, 2015 (1) George R. Wiggans Animal Genomics and Improvement Laboratory Agricultural Research Service, USDA Beltsville,


1 Big data in support of genetic improvement of dairy cattle
George R. Wiggans
Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350, USA
george.wiggans@ars.usda.gov
ARS Big Data Workshop – July 16, 2015
[Background graphic: rows of genotype codes (0, 1, 2)]

2 Mission
- Genetic improvement of dairy cattle for economically important traits
  - Yield (milk, fat, and protein)
  - Conformation (overall and individual traits)
  - Longevity (productive life)
  - Fertility (conception and pregnancy rates)
  - Calving (dystocia and stillbirth)
  - Disease resistance (mastitis)

3 Data types
- Identification information for animal, sire, and dam:
  - Name
  - ID number
  - Birth date
  - Breed
  - Herd
  - Country
- Animal genotypes from marker panels that range from 2,900 to 777,962 markers
[Image courtesy of Illumina, Inc.]

4 Data types (continued)
- Records for milk yield, fat percentage, protein percentage, and somatic cell count (once per month)
- Appraiser-assigned scores for 16 body and udder characteristics related to conformation (e.g., stature)
- Breeding records that include an indicator for conception success
- Calving difficulty scores and stillbirth occurrences

5 Data amounts
- Pedigree records: 71,974,045
- Animal genotypes: 1,035,590
- Lactation records (since 1960): 132,629,200
- Daily yield records (since 1990): 641,864,015
- Reproduction event records: 176,559,035
- Calving difficulty scores: 29,528,607
- Stillbirth scores: 19,567,198

6 Computing environment
- Computation server
  - 2.27 GHz CPU (32 cores, 64 threads)
  - 660 GB RAM
  - 2.7 TB local storage
- Database server
  - 3.4 GHz CPU (12 cores, 24 threads)
  - 264 GB RAM
  - 1.3 TB local storage
- Shared storage
  - 38 TB

7 Data management
- Variable-length segments for database rows to minimize space and the overhead of identifying data
- All marker genotypes for an animal stored in a single character large object (CLOB), one byte per genotype
- All breedings and all monthly milk yield and component information for a cow's lactation stored in variable-length character data types
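The one-byte-per-genotype layout on the data management slide can be illustrated with a small sketch. This is not the laboratory's actual code: the function names are hypothetical, and the coding (0, 1, 2 for allele counts, 5 for a missing call) is an assumption made only for illustration.

```python
# Assumed coding (not stated in the slides): 0, 1, 2 = copies of one allele,
# 5 = missing genotype call.
MISSING = 5
VALID_CODES = {0, 1, 2, MISSING}


def pack_genotypes(codes):
    """Pack genotype codes into one string, one character (byte) per marker."""
    for code in codes:
        if code not in VALID_CODES:
            raise ValueError(f"unexpected genotype code: {code}")
    return "".join(str(code) for code in codes)


def unpack_genotypes(packed):
    """Recover the list of integer genotype codes from the packed string."""
    return [int(ch) for ch in packed]


def genotype_at(packed, marker_index):
    """Read a single marker without unpacking the whole string."""
    return int(packed[marker_index])


# The packed string is what would be written to the CLOB column.
packed = pack_genotypes([0, 1, 2, 2, 5, 1])
```

Because every marker occupies exactly one character, the genotype at marker i sits at offset i, so a single marker can be read without parsing the rest of the string.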

8 Programming languages
- C
  - Database interface, including data editing
- FORTRAN
  - Calculation of genetic merit estimates
- SAS
  - Data preparation, checking, and delivery

9 Calculation schedule
- Triannual genetic merit estimates from processed phenotypic data
- Monthly genomic evaluations based on estimates of marker effects using genotypic data and triannual phenotype-based evaluations
- Weekly evaluations using marker effect estimates from monthly evaluations
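The weekly step reuses marker effects that were already estimated in the monthly run, so a newly genotyped animal can be evaluated without re-estimating anything. A deliberately simplified sketch of that idea follows. It assumes a purely additive model with every genotype called; the function name and all numbers are made up, and the real evaluations include further components that the slides do not describe.

```python
def genomic_prediction(genotypes, marker_effects, base=0.0):
    """Sum of (allele count x estimated marker effect) over all markers.

    genotypes      -- allele counts (0, 1, or 2), one per marker
    marker_effects -- effect estimates from an earlier (e.g., monthly) run
    base           -- hypothetical intercept or base value
    """
    if len(genotypes) != len(marker_effects):
        raise ValueError("genotypes and marker_effects must have equal length")
    return base + sum(g * e for g, e in zip(genotypes, marker_effects))


# Made-up effect estimates for four markers and one new animal's genotypes.
monthly_effects = [0.5, -0.25, 1.0, 0.0]
new_animal = [2, 1, 0, 1]
prediction = genomic_prediction(new_animal, monthly_effects)  # 0.75
```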

10 Transition to industry
- Council on Dairy Cattle Breeding
  - Database maintenance
  - Calculation and distribution of genetic merit estimates
  - Interface with evaluation users and data suppliers
- ARS
  - Research and development using data made available by the Council

11 Research resource
- Massive amount of genomic data → location of causal genetic variants
- Investigation of haplotypes never found in a homozygous state
  → Discovery of chromosomal abnormalities resulting in early embryonic death
- Investigation of sons of heterozygous sires
  → Specific markers associated with differences between sons by haplotype
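The reasoning behind "haplotypes never found in a homozygous state" can be made concrete with a little arithmetic. The sketch below assumes random mating and independent animals, which is a simplification; the slides do not say which statistical test was actually used, and the function names and example numbers are hypothetical.

```python
def expected_homozygotes(n_animals, haplotype_freq):
    """Expected number of homozygous animals under random mating (n * p^2)."""
    return n_animals * haplotype_freq ** 2


def prob_no_homozygotes(n_animals, haplotype_freq):
    """Probability of seeing zero homozygotes if the haplotype were harmless."""
    return (1.0 - haplotype_freq ** 2) ** n_animals


# Made-up example: 10,000 genotyped animals, haplotype frequency of 5%.
expected = expected_homozygotes(10_000, 0.05)    # about 25 animals
p_zero = prob_no_homozygotes(10_000, 0.05)       # vanishingly small
```

When roughly 25 homozygotes are expected and none are observed, chance is an implausible explanation, which is what points toward a lethal effect such as early embryonic death.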

12 Genetic merit of marketed Holstein bulls
[Figure: genetic merit of marketed Holstein bulls, annotated with average gains of $19.42/year, $47.95/year, and $87.49/year]

13 Working with sequence data
- Sequence available from the 1000 Bull Genomes Project, hosted in Australia
- Project funded by industry to sequence over 200 bulls to create a haplotype library
- A posteriori granddaughter design to locate chromosomal segments of interest from 71 bulls, each with over 100 genotyped and progeny-tested sons

14 Imputing sequence data
- Haplotype library supports imputation
- Genotypes from genotyping chips can be imputed to full sequence
- Lower accuracy of sequence data compared with chip genotypes is accommodated by working with dosages to represent allele content
- Findhap v4 (VanRaden) is fast and more accurate than Beagle at low coverage depth
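A dosage replaces a hard genotype call (0, 1, or 2) with the expected number of copies of an allele, which lets uncertain sequence calls carry their uncertainty forward. The sketch below shows the standard expected-value calculation from three genotype probabilities. It is not Findhap's code, and the function name and inputs are hypothetical.

```python
def allele_dosage(p_hom_ref, p_het, p_hom_alt):
    """Expected count of the alternate allele, between 0.0 and 2.0.

    The three arguments are the probabilities (or unnormalized weights) of
    carrying 0, 1, and 2 copies of the alternate allele.
    """
    total = p_hom_ref + p_het + p_hom_alt
    if total <= 0:
        raise ValueError("genotype probabilities must sum to a positive value")
    return (p_het + 2.0 * p_hom_alt) / total


# An uncertain low-coverage call: most likely heterozygous, but not certain.
dosage = allele_dosage(0.1, 0.6, 0.3)   # approximately 1.2

# A confident call collapses to the usual integer genotype.
confident = allele_dosage(0.0, 0.0, 1.0)
```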

15 Alignment of sequence data
- Alignment: determining the location of chromosomal segments provided by the sequencer
- Findmap: matches each segment against a library of haplotypes
  - Preserves low-frequency variants
  - Does not identify new variants
  - Uses a hash table to find variants, enabling rapid processing
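The hash-table idea can be sketched generically: index every short substring of the known haplotypes once, then place each sequenced segment with a constant-time dictionary lookup. This is not Findmap's implementation, which the slides do not detail. The function names, the substring length k, and the toy sequences are all assumptions.

```python
def build_index(haplotypes, k):
    """Map each k-base substring to the (haplotype id, start) pairs where it occurs."""
    index = {}
    for hap_id, sequence in haplotypes.items():
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            index.setdefault(kmer, []).append((hap_id, start))
    return index


def locate(segment, index, k):
    """Return candidate placements for a segment, keyed on its first k bases."""
    if len(segment) < k:
        return []
    return index.get(segment[:k], [])


# Toy library: two haplotypes that differ at a single base (position 4).
library = {"hapA": "ACGTACGGTC", "hapB": "ACGTTCGGTC"}
index = build_index(library, k=5)

hits = locate("TACGGTC", index, k=5)     # matches hapA only
shared = locate("CGGTC", index, k=5)     # present in both haplotypes
unknown = locate("GGGGGGG", index, k=5)  # not in the library, so no placement
```

A segment whose bases do not occur in the library returns no placement, which mirrors the slide's point that this style of matching finds known variants quickly but does not discover new ones.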

16 Accuracy of Findhap vs. Beagle*

                   Sequence + HD              Imputed from HD
Program  Depth  Correct (%)  Correlation  Correct (%)  Correlation
Findhap  8×     98.7         0.981        95.0         0.926
         4×     95.8         0.939        93.1         0.897
         2×     91.3         0.879        89.2         0.837
Beagle   8×     99.0         0.984        97.1         0.956
         4×     95.0         0.918        78.2         0.582
         2×     79.5         0.602        63.5         0.100

*250 bulls had sequence + HD; 250 others were imputed from HD
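The slide does not define how "Correct" and "Correlation" were computed. One plausible reading, shown here only as an assumption, is the percentage of imputed genotypes that match the true genotypes and the Pearson correlation between the two. The function names and the genotypes below are made up.

```python
from math import sqrt


def percent_correct(true_genotypes, imputed_genotypes):
    """Percentage of markers where the imputed genotype equals the true one."""
    matches = sum(1 for t, i in zip(true_genotypes, imputed_genotypes) if t == i)
    return 100.0 * matches / len(true_genotypes)


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up genotypes for eight markers; two of the imputed calls are wrong.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0]

pct = percent_correct(true_calls, imputed_calls)   # 75.0
r = pearson_correlation(true_calls, imputed_calls)
```

The two measures can diverge: the correlation also reflects how far off the wrong calls are, which may be why the table reports both.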

17 Data storage and backup
- Disk storage being added
  - Compression option being investigated
- Backup to tape, with weekly submission to off-site storage
- Internet2 connection expected
  - Will facilitate sharing of sequence data

18 Summary
- Highly successful program leading to annual increases in genetic merit for production efficiency
- Large database of phenotypic and genomic data provided by industry
- Research projects to determine the mechanism of genetic control of economically important traits
- Data processing techniques developed so that rapid turnaround could be realized

