Variant calling: number of individuals vs. depth of read coverage Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor.


1 Variant calling: number of individuals vs. depth of read coverage. Gabor T. Marth, Boston College Biology Department. 1000 Genomes Meeting, Cold Spring Harbor Laboratory, May 5-6, 2008

2 Single-base variant calling in 1000G data
1. SNP discovery (for potential follow-up genotyping)
2. Possibly using genotypes called from sequence directly for haplotype phasing (genotype imputation?)
Sample size x read coverage / individual = constant
What is the best sample size?
Not easy to answer based only on idealized theoretical considerations
Simulation studies must model many effects to be realistic
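The trade-off on this slide can be made concrete with a small sketch. It assumes a fixed budget of 1600 (individuals times per-individual coverage), chosen only because it matches the designs listed on the Genotype density slide (100 @ 16x through 800 @ 2x); the function name `designs` is hypothetical.

```python
# Assumed fixed sequencing budget: individuals x per-individual coverage.
# 1600 is chosen to match the designs on the "Genotype density" slide.
TOTAL_COVERAGE = 1600


def designs(total, coverages):
    """Return (n_individuals, coverage) pairs whose product equals `total`."""
    return [(total // c, c) for c in coverages if total % c == 0]


for n, c in designs(TOTAL_COVERAGE, [16, 8, 4, 2]):
    print(f"{n} individuals @ {c}x")
```

Halving the coverage doubles the number of individuals that can be sequenced, which is the axis the simulations explore.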

3 Variant discovery is a complex process
[Figure: sequence reads (e.g. aacgtCaggct / aacgtTaggct) drawn from the fragments of each sample. Labels: seq. reads, samples, fragments, population genotype priors, allele sampling likelihoods, base error probabilities; genotypes G1, G2, G3]

4 Bayesian variant detection math
Priors: (1) Nucleotide diversity; (2) Allele frequency distribution; (3) Specific diploid genotype layout
Allele sampling likelihoods: Binomial distribution of the number of reads from each of the two chromosomes
Base error probabilities: Likelihood that the called base faithfully represents the DNA fragment, calculated from the base quality values
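A minimal sketch of how the last two ingredients might combine into a per-individual genotype likelihood. This is not the GigaBayes implementation. It assumes Phred-scaled base qualities (error probability 10^(-Q/10)), that each read comes from either chromosome with probability 1/2 (which is what gives rise to the binomial read counts), and that a miscalled base is equally likely to be any of the three other nucleotides. All function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred base quality to an error probability."""
    return 10 ** (-q / 10)


def read_likelihood(base, qual, genotype):
    """P(observed base | diploid genotype), averaging over the two chromosomes.

    Assumption: each chromosome is sampled with probability 0.5, and an
    erroneous call is uniform over the three other nucleotides.
    """
    e = phred_to_error(qual)
    total = 0.0
    for allele in genotype:
        total += 0.5 * ((1 - e) if base == allele else e / 3)
    return total


def genotype_likelihood(reads, genotype):
    """Product of per-read likelihoods; reads are (base, quality) pairs."""
    lik = 1.0
    for base, qual in reads:
        lik *= read_likelihood(base, qual, genotype)
    return lik


# Two C reads and two T reads, all at Q30
reads = [("C", 30), ("C", 30), ("T", 30), ("T", 30)]
for g in [("C", "C"), ("C", "T"), ("T", "T")]:
    print(g, genotype_likelihood(reads, g))
```

With balanced C and T reads the heterozygous genotype gets by far the largest likelihood; the priors then turn these likelihoods into posterior probabilities.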

5 SNP calling and genotyping
P(SNP) = total probability of all non-monomorphic genotype combinations
P(Gi) = marginal probability
Consequence: data from other individuals influence the genotype call of a given individual (include illustration using the testProb program in the GigaBayes package).
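The two quantities on this slide can be illustrated by brute-force enumeration over a handful of individuals. The sketch below is not testProb or GigaBayes. It assumes a biallelic site, codes each genotype as its number of alternate alleles (0, 1, 2), and uses a simplified prior: P(k) = theta / k on the total alternate-allele count k, split evenly over the genotype layouts with that count. The value of theta and all function names are hypothetical.

```python
from itertools import product


def joint_posterior(likelihoods, theta=0.001):
    """Posterior over genotype combinations across individuals.

    likelihoods[i][g] is P(reads of individual i | g alternate alleles).
    Assumed prior: P(k) = theta / k on the total alternate-allele count k,
    shared evenly among the genotype combinations with that count.
    """
    n = len(likelihoods)
    combos = list(product(range(3), repeat=n))
    p_k = {k: theta / k for k in range(1, 2 * n + 1)}
    p_k[0] = 1.0 - sum(p_k.values())
    n_layouts = {}
    for c in combos:
        n_layouts[sum(c)] = n_layouts.get(sum(c), 0) + 1
    post = {}
    for c in combos:
        weight = p_k[sum(c)] / n_layouts[sum(c)]
        for g, lik in zip(c, likelihoods):
            weight *= lik[g]
        post[c] = weight
    z = sum(post.values())
    return {c: w / z for c, w in post.items()}


def p_snp(post):
    """Total probability of all non-monomorphic genotype combinations."""
    n = len(next(iter(post)))
    return sum(p for c, p in post.items() if 0 < sum(c) < 2 * n)


def marginal(post, i):
    """Marginal genotype probabilities P(Gi) for individual i."""
    m = [0.0, 0.0, 0.0]
    for c, p in post.items():
        m[c[i]] += p
    return m


ambiguous = [0.5, 0.5, 0.01]    # reads cannot separate hom-ref from het
hom_ref = [1.0, 1e-6, 1e-9]     # clearly homozygous reference
het = [1e-6, 1.0, 1e-6]         # clearly heterozygous

among_ref = joint_posterior([ambiguous, hom_ref, hom_ref])
among_het = joint_posterior([ambiguous, het, het])
print(marginal(among_ref, 0)[1], marginal(among_het, 0)[1])
```

The same ambiguous reads yield a much higher heterozygote probability when the other individuals clearly carry the alternate allele, which is the consequence the slide points out.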

6 Variant calling in simulated data: design Analysis by Aaron Quinlan (see poster at the Genome Meeting)

7 Estimated vs. population allele frequency

8 Allele frequency (cont’d)

9 SNP discovery sensitivity

10 Genotype density
100 @ 16x: 0.975 +/- 0.121
200 @ 8x: 0.968 +/- 0.129
400 @ 4x: 0.924 +/- 0.151
800 @ 2x: 0.769 +/- 0.154

11 Genotype density

12 Summary / Conclusions

13 Thanks

