Population-Scale Sequencing Data Enable Precise Estimates of Y-STR Mutation Rates
Thomas Willems, Melissa Gymrek, G. David Poznik, Chris Tyler-Smith, Yaniv Erlich
The American Journal of Human Genetics, Volume 98, Issue 5, Pages 919-933 (May 2016). DOI: 10.1016/j.ajhg.2016.04.001. Copyright © 2016 The American Society of Human Genetics

Figure 1. Method for Estimating Y-STR Mutation Rates. Schematic of our procedure for estimating Y-STR mutation rates. The method first genotypes Y-SNPs (step 1) and uses these calls to build a single Y-SNP phylogeny (step 2). This phylogeny provides the evolutionary context required for inferring Y-STR mutational dynamics; samples in the cohort occupy the leaves of the tree, and all other nodes represent unobserved ancestors. Steps 3–6 are then run on each Y-STR individually. After an STR genotyping tool is used for determining each sample’s maximum-likelihood genotype and the number of repeats in each read (step 3), an EM algorithm analyzes all of these repeat counts to learn a stutter model (step 4). In combination with the read-level repeat counts, this model is used for computing each sample’s genotype posteriors (step 5). After a mutation model is randomly initialized, Felsenstein’s pruning algorithm and numerical optimization are used to repeatedly evaluate and improve the likelihood of the model until convergence (step 6). The mutation rate in the resulting model provides the maximum-likelihood estimate.
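The likelihood evaluation in the final step can be illustrated with a small sketch of Felsenstein’s pruning algorithm. Everything here is an assumption made for illustration: the symmetric single-step mutation model, the nested-list tree encoding, the uniform prior at the root, the function names, and the coarse grid search that stands in for numerical optimization. It is not the mutation model or optimizer used in the paper.

```python
import math

def one_generation_matrix(n_alleles, mu):
    """Toy symmetric stepwise model (an assumption): with probability mu an
    allele gains or loses one repeat (mu/2 each way). Steps that would leave
    the allele range are not taken, so their mass stays on the diagonal."""
    P = [[0.0] * n_alleles for _ in range(n_alleles)]
    for i in range(n_alleles):
        for j in (i - 1, i + 1):
            if 0 <= j < n_alleles:
                P[i][j] = mu / 2.0
        P[i][i] = 1.0 - sum(P[i])  # remaining mass: no change
    return P

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, t):
    """Transition matrix over t generations, by repeated squaring."""
    n = len(P)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while t:
        if t & 1:
            result = mat_mul(result, P)
        P = mat_mul(P, P)
        t >>= 1
    return result

def partial_likelihood(node, leaf_vectors, P1):
    """Pruning recursion. A leaf is a sample name (str); an internal node is
    a list of (child, branch_length_in_generations) pairs."""
    if isinstance(node, str):
        return leaf_vectors[node]
    n = len(P1)
    L = [1.0] * n
    for child, t in node:
        child_L = partial_likelihood(child, leaf_vectors, P1)
        Pt = mat_pow(P1, t)
        for a in range(n):
            L[a] *= sum(Pt[a][b] * child_L[b] for b in range(n))
    return L

def log_likelihood(tree, leaf_vectors, mu, n_alleles):
    P1 = one_generation_matrix(n_alleles, mu)
    root = partial_likelihood(tree, leaf_vectors, P1)
    return math.log(sum(root) / n_alleles)  # uniform root prior (assumption)

# Hypothetical three-sample tree ((A:10, B:10):50, C:60), lengths in generations.
tree = [([("A", 10), ("B", 10)], 50), ("C", 60)]
# Leaf vectors over 5 allele indices, standing in for step-5 genotype posteriors.
leaves = {
    "A": [0.0, 0.0, 1.0, 0.0, 0.0],
    "B": [0.0, 0.1, 0.9, 0.0, 0.0],
    "C": [0.0, 0.0, 0.0, 1.0, 0.0],
}
# Coarse grid search in place of a real numerical optimizer.
grid = [1e-4, 1e-3, 1e-2, 1e-1]
best_mu = max(grid, key=lambda mu: log_likelihood(tree, leaves, mu, 5))
print(best_mu)
```

Recomputing the branch matrices for every candidate rate keeps the sketch short; a real implementation would likely cache them and work in log space to avoid underflow on large trees.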

Figure 2. Validating MUTEA by Using Simulations. STR sequencing reads with PCR stutter noise were simulated for a variety of sample sizes and mutation models (“simulation parameters” panels). Applying MUTEA (red line) to these reads led to relatively unbiased mutation-rate estimates (upper panel) with small SDs (second panel). As a negative control, we also applied a naive approach to correct for stutter noise (blue line). This approach computed genotype posteriors by using the fraction of supporting reads, resulting in markedly biased mutation-rate estimates.
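The contrast between the naive control and a stutter-aware calculation can be sketched for a single haploid sample. The stutter model below is an assumption chosen for illustration: a read matches the true allele unless it stutters down (probability `p_down`) or up (probability `p_up`), with a geometric step size governed by `rho`. The parameter values, the uniform prior over candidate alleles, and all function names are hypothetical, and the caption does not state that MUTEA uses exactly this form.

```python
import math
from collections import Counter

def naive_posteriors(read_repeat_counts):
    """Naive control: posterior of each allele = fraction of supporting reads."""
    counts = Counter(read_repeat_counts)
    total = len(read_repeat_counts)
    return {allele: c / total for allele, c in counts.items()}

def stutter_prob(observed, true, p_down, p_up, rho):
    """P(read shows `observed` repeats | true allele), assumed stutter model."""
    delta = observed - true
    if delta == 0:
        return 1.0 - p_down - p_up
    step_size = rho * (1.0 - rho) ** (abs(delta) - 1)  # geometric step size
    return (p_up if delta > 0 else p_down) * step_size

def model_posteriors(read_repeat_counts, alleles, p_down=0.05, p_up=0.01, rho=0.9):
    """Posterior over candidate alleles under a uniform prior (assumption)."""
    log_post = {
        a: sum(math.log(stutter_prob(r, a, p_down, p_up, rho))
               for r in read_repeat_counts)
        for a in alleles
    }
    m = max(log_post.values())  # subtract the max for numerical stability
    weights = {a: math.exp(v - m) for a, v in log_post.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Hypothetical sample: three reads with 12 repeats, one stutter read with 11.
reads = [12, 12, 12, 11]
naive = naive_posteriors(reads)
model = model_posteriors(reads, alleles=range(10, 15))
print(naive)
print(model)
```

In this toy case the naive approach assigns a quarter of the posterior mass to the 11-repeat allele, whereas the stutter-aware calculation treats the lone 11-repeat read as a likely artifact and concentrates nearly all mass on 12.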

Figure 3. Concordance of Mutation-Rate Estimates across Datasets. The heatmap in the upper right corner presents the correlation between log mutation rates obtained from two father-son capillary-based studies (“Ballantyne” [21] and “Burgarella” [39]) and those we obtained by using the 1000 Genomes WGS data (“1000 Genomes”), the Simons Genome WGS data (“SGDP”), and the capillary data available for samples in 1000 Genomes (“Powerplex”). Each cell indicates the number of markers involved in the comparison and the resulting R². Representative scatterplots for three of these comparisons depict the pair of estimates for each marker (cyan) and the x = y line (red). The black arrow in the comparison of SGDP and Ballantyne shows the effective lower limit of the Ballantyne et al. mutation-rate estimates.
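Each heatmap cell reports an R² between log mutation rates from two datasets. A minimal sketch of that calculation follows, assuming R² is the squared Pearson correlation of log10-transformed rates for the markers shared by both datasets. The rate values are made up for illustration and are not taken from the paper.

```python
import math

def r_squared_log_rates(rates_a, rates_b):
    """Squared Pearson correlation between log10 mutation rates.
    Both lists must cover the same markers in the same order."""
    xs = [math.log10(r) for r in rates_a]
    ys = [math.log10(r) for r in rates_b]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical per-marker rates (mutations per generation) from two sources.
pedigree_rates = [2.1e-3, 4.0e-4, 1.2e-3, 3.5e-3]
wgs_rates = [1.8e-3, 5.5e-4, 9.0e-4, 4.1e-3]
r2 = r_squared_log_rates(pedigree_rates, wgs_rates)
print(len(pedigree_rates), r2)  # number of markers compared, and R²
```

Because the correlation is computed on the log scale, a constant multiplicative offset between two datasets leaves R² unchanged; only the scatterplots against the x = y line would reveal such a bias.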

Figure 4. Distribution of Y-STR Mutation Rates. In red, we show the distribution of mutation rates across all STRs in this study. The set of loci with previously characterized mutation rates (orange) is substantially enriched with more-mutable loci. When stratified by motif length, loci with tetranucleotide motifs (dark blue) are the most mutable and are followed by loci with trinucleotide (light blue) and dinucleotide (green) motifs.

Figure 5. Sequence Determinants of Y-STR Mutability. Each panel plots the estimated log mutation rates (y axis) of STRs against either the major-allele length (x axis, left panels) or the length of the longest uninterrupted tract (x axis, right panels) for various sizes of repeat motifs (rows). The black lines represent the mutation rate predicted by a simple linear model. For a given allele length (left panels), Y-STRs with no interruptions to the repeat structure (blue) are generally more mutable than those with one or more interruptions (red). Whereas major-allele length alone is poorly correlated with mutation rate (left panels), the length of the longest uninterrupted tract (right panels) is strongly correlated regardless of the number of interruptions.
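The distinction between allele length and the longest uninterrupted tract can be made concrete with a short sketch. The function name and the example sequence are hypothetical, and tract length is measured here in repeat units. The caption does not specify how the authors computed it.

```python
def longest_uninterrupted_tract(sequence, motif):
    """Largest number of back-to-back exact copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    k = len(motif)
    best = 0
    # Try every start position so that runs in any reading frame are found.
    for start in range(len(sequence)):
        run = 0
        while sequence[start + run * k : start + (run + 1) * k] == motif:
            run += 1
        best = max(best, run)
    return best

# Hypothetical tetranucleotide allele: 5 GATA units, one interrupting GACA
# unit, then 3 more GATA units. The allele spans 9 units in total.
allele = "GATA" * 5 + "GACA" + "GATA" * 3
print(len(allele) // 4, longest_uninterrupted_tract(allele, "GATA"))
```

Two alleles of equal total length can therefore differ sharply in this statistic, which is consistent with the caption’s observation that interrupted loci are less mutable at a given allele length.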