Variant calling: number of individuals vs. depth of read coverage Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor.

Variant calling: number of individuals vs. depth of read coverage Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor Laboratory May

Single-base variant calling in 1000G data
1. SNP discovery (for potential follow-up genotyping)
2. Possibly using genotypes called from sequence directly for haplotype phasing (genotype imputation?)
Sample size x read coverage / individual = constant
What is the best sample size?
Not easy to answer only based on idealistic theoretical considerations
Simulation studies must model many effects to be realistic
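
The constraint "sample size x read coverage / individual = constant" can be explored with a deliberately idealized calculation of the kind the slide warns is not sufficient on its own. The sketch below is not from the talk: it assumes Hardy-Weinberg sampling of chromosomes, Poisson-distributed read depth, and a hypothetical rule that a heterozygous carrier is only detectable with at least k reads showing the alternate allele. All function names and parameters are illustrative.

```python
import math

def p_allele_in_sample(freq, n_individuals):
    """Probability that an allele of population frequency `freq` is carried
    by at least one of the 2n chromosomes sampled (idealized assumption)."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)

def p_at_least_k_alt_reads(coverage, k=2):
    """Probability that a heterozygous carrier shows >= k alternate-allele reads,
    assuming alt reads ~ Poisson(coverage / 2). The threshold k is hypothetical."""
    lam = coverage / 2.0
    p_fewer = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer

def tradeoff(total_coverage, freq, sample_sizes, k=2):
    """For a fixed sequencing budget (sample size x coverage per individual),
    tabulate both probabilities for each candidate sample size."""
    rows = []
    for n in sample_sizes:
        coverage = total_coverage / n
        rows.append((n, coverage,
                     p_allele_in_sample(freq, n),
                     p_at_least_k_alt_reads(coverage, k)))
    return rows

if __name__ == "__main__":
    # Hypothetical budget: 400x total, split across different sample sizes.
    for n, cov, p_present, p_seen in tradeoff(400, 0.01, [25, 50, 100, 200, 400]):
        print(f"n={n:4d} coverage={cov:5.1f}x "
              f"P(allele sampled)={p_present:.3f} P(>=2 alt reads | het)={p_seen:.3f}")
```

More individuals raise the chance that a rare allele is sampled at all, while fewer reads per individual lower the chance that a carrier is recognized. Effects such as mapping errors and base quality are ignored here, which is why the slide calls for realistic simulation.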

Variant discovery is a complex process
[Figure: seq. reads, fragments, samples, population; genotype priors, allele sampling likelihoods, base error probabilities; example reads aacgtCaggct / aacgtTaggct; genotypes G1, G2, G3]

Bayesian variant detection math
Priors: (1) nucleotide diversity; (2) allele frequency distribution; (3) specific diploid genotype layout
Allele sampling likelihoods: binomial distribution of the number of reads from each of the two chromosomes
Base error probabilities: likelihood that the called base faithfully represents the DNA fragment, calculated from the base quality values
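
The three ingredients above can be combined for a single diploid individual at a biallelic site, as in the minimal sketch below. It is an illustration under stated assumptions, not the GigaBayes implementation: base qualities are taken to be Phred-scaled, an erroneous base is assumed equally likely to be any of the other three nucleotides, each read is assumed to come from either chromosome with probability 1/2 (which is equivalent to summing over the binomial split of reads between the two chromosomes), and the prior (heterozygote = theta, homozygous alternate = theta / 2) is a common simplification chosen here for illustration.

```python
import math

def phred_to_error(quality):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-quality / 10)

def genotype_log_likelihoods(reads, ref, alt):
    """reads: list of (base, phred_quality) pairs covering one site.
    Returns {alt_copies: log P(reads | genotype)} for alt_copies in 0, 1, 2."""
    loglik = {}
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2  # chance a read was sampled from an alt chromosome
        total = 0.0
        for base, quality in reads:
            e = phred_to_error(quality)
            p_if_ref = 1 - e if base == ref else e / 3
            p_if_alt = 1 - e if base == alt else e / 3
            total += math.log((1 - frac_alt) * p_if_ref + frac_alt * p_if_alt)
        loglik[alt_copies] = total
    return loglik

def genotype_posteriors(reads, ref, alt, theta=0.001):
    """Posterior over genotypes under an assumed single-individual prior."""
    prior = {0: 1 - 1.5 * theta, 1: theta, 2: theta / 2}  # illustrative assumption
    loglik = genotype_log_likelihoods(reads, ref, alt)
    log_post = {g: math.log(prior[g]) + loglik[g] for g in loglik}
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}

if __name__ == "__main__":
    reads = [("C", 30)] * 5 + [("T", 30)] * 5
    print(genotype_posteriors(reads, ref="C", alt="T"))
```

With five reference and five alternate reads of quality 30, the heterozygous genotype dominates despite its small prior, because either homozygous genotype would require five independent base errors.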

SNP calling and genotyping
P(SNP) = total probability of all non-monomorphic genotype combinations
P(Gi) = marginal probability of the genotype of individual i
Consequence: data from other individuals influence the genotype call of a given individual (include illustration using the testProb program in the GigaBayes package).
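
The two quantities on this slide can be illustrated by brute-force enumeration over all genotype combinations for a handful of individuals. This is a sketch under assumptions and does not reproduce the testProb program: the allele-count prior P(k) = theta / k is a standard neutral-model simplification chosen here, the layout prior spreads P(k) evenly over chromosome configurations, "monomorphic" is taken to mean that every individual is homozygous for the same allele, and the per-individual likelihood values in the demo are made up.

```python
from itertools import product
from math import comb

def allele_count_prior(n_individuals, theta=0.001):
    """Assumed prior on the number k of alternate alleles among 2n chromosomes."""
    n_chrom = 2 * n_individuals
    prior = [0.0] * (n_chrom + 1)
    for k in range(1, n_chrom + 1):
        prior[k] = theta / k
    prior[0] = 1.0 - sum(prior[1:])
    return prior

def joint_posterior(likelihoods, theta=0.001):
    """likelihoods[i][g] = P(reads of individual i | g alt copies), g in 0, 1, 2.
    Returns {genotype_combination: posterior probability}."""
    n = len(likelihoods)
    k_prior = allele_count_prior(n, theta)
    weights = {}
    for combo in product((0, 1, 2), repeat=n):
        k = sum(combo)
        hets = combo.count(1)
        # Each het can be phased two ways, so a layout with `hets` heterozygotes
        # covers 2**hets of the comb(2n, k) equally likely chromosome configurations.
        prior = k_prior[k] * 2 ** hets / comb(2 * n, k)
        like = 1.0
        for i, g in enumerate(combo):
            like *= likelihoods[i][g]
        weights[combo] = prior * like
    total = sum(weights.values())
    return {combo: w / total for combo, w in weights.items()}

def p_snp(posterior):
    """Total probability of all non-monomorphic genotype combinations."""
    return sum(p for combo, p in posterior.items() if set(combo) not in ({0}, {2}))

def marginal(posterior, individual):
    """Marginal genotype probabilities P(Gi) for one individual."""
    out = [0.0, 0.0, 0.0]
    for combo, p in posterior.items():
        out[combo[individual]] += p
    return out

if __name__ == "__main__":
    ambiguous = [1e-3, 0.25, 1e-3]       # made-up values: weak evidence for a het
    strong_ref = [1.0, 1e-6, 1e-12]      # made-up values: clearly homozygous reference
    strong_het = [1e-12, 1e-3, 1e-12]    # made-up values: clearly heterozygous
    alone = joint_posterior([ambiguous, strong_ref, strong_ref])
    with_carriers = joint_posterior([ambiguous, strong_het, strong_het])
    print("P(het) for individual 0, others hom-ref:", marginal(alone, 0)[1])
    print("P(het) for individual 0, others het:    ", marginal(with_carriers, 0)[1])
    print("P(SNP), others het:", p_snp(with_carriers))
```

The same read evidence for individual 0 yields a much higher heterozygote probability when the other individuals clearly carry the allele, which is the consequence the slide points out. Enumeration grows as 3^n, so this form is only practical for a few individuals.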

Variant calling in simulated data: design Analysis by Aaron Quinlan (see poster at the Genome Meeting)

Estimated vs. population allele frequency

Allele frequency (cont’d)

SNP discovery sensitivity

Genotype density [figure: values per coverage level, beginning with 16x; remaining labels and values not recoverable]

Genotype density

Summary / Conclusions

Thanks