Published by Melvyn Newman. Modified over 6 years ago.
1
SNP Detection
Congtam Pham, 2/24/04, Dr. Marth’s Class
2
Reduced Representation Sequencing
Figure labels: 600, 500, 400; clone; analyze
3
Random Shotgun Reads Aligned to Whole Genome
Figure labels: entire genome; reads
4
Overlapping BAC Clones
5
ESTs may contain multiple exons
Figure label: whole genome
ESTs may be alternative splice variants of a single gene
6
Results: 1.42 million SNPs throughout the human genome
average density = 1 SNP / 1.3 kb
Validations:
Random samples of SNPs evaluated for allele frequency in independent population samples:
  95% polymorphic
  4% non-polymorphic (false positives)
  1% uniformly heterozygous
Random samples of SNPs studied in different ethnic groups:
  82% polymorphic in at least one ethnic group (>10% allele frequency)
  77% polymorphic in at least one ethnic group (>20% allele frequency)
7
Heterozygosity (π) of Chromosomes
Figure: π (×10⁻⁴) plotted by chromosome (Y, X, remaining); avg = 7.65
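For readers unfamiliar with π, it can be computed as the average number of pairwise differences per site among a set of aligned sequences. The sketch below is illustrative only: the function name `nucleotide_diversity` and the toy sequences are assumptions, not data from the slides. On this scale, a value of 7.65 × 10⁻⁴ corresponds to roughly 7.65 differences per 10,000 bases when two chromosomes are compared.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned, equal-length sequences.

    Hypothetical helper for illustration; assumes no gaps or missing bases.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (made-up data): four sequences of length 10.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACTTACGTAC"]
pi = nucleotide_diversity(sample)
print(f"pi = {pi:.4f}")
```

Real estimates are taken over far longer regions and must handle gaps, missing data, and sequencing errors, which this sketch ignores.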
8
Nucleotide Diversity
Figure labels: GC content; heterozygosity
The HLA locus on chromosome 6 is highly heterozygous
9
SNPs in the public domain: how useful are they?
Allele frequencies of SNPs found in dbSNP

                                          TSC SNPs     Overlap SNPs
# STSs that failed PCR and sequencing     (5.3%)       (14.8%)
Total characterized
SNPs not detected (a)                     (17.3%)      (16.8%)
Uncommon SNPs (b)                         (6.0%)       (7.1%)
Common SNPs in ≥1 population (c)          (76.7%)      (76.1%)
Common SNPs in ≥2 populations (c)         (52.4%)      (54.3%)
Common SNPs in ≥3 populations (c)         (27.0%)      (26.9%)

(a) monomorphic (only one of the 2 predicted alleles found in all 3 populations)
(b) minor allele frequency <20% in all 3 populations
(c) a SNP is “common” when minor allele frequency is ≥20%

Marth et al., 2001
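The table's categories follow from the footnote definitions, and a small sketch may make them concrete. The function names, the input layout (one pair of allele counts per population), and the example counts are assumptions made for illustration; only the 20% threshold and the category definitions come from the slide.

```python
def minor_allele_frequency(count_a, count_b):
    """Frequency of the rarer of two alleles in one population sample."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed in this population")
    return min(count_a, count_b) / total


def classify_snp(population_counts, common_threshold=0.20):
    """Classify a candidate SNP from per-population allele counts.

    population_counts: list of (count_allele_1, count_allele_2) tuples,
    one per population (hypothetical input format).
    Returns (category, number_of_populations_where_common).
    """
    total_a = sum(a for a, _ in population_counts)
    total_b = sum(b for _, b in population_counts)
    # Footnote a: only one of the two predicted alleles seen anywhere.
    if total_a == 0 or total_b == 0:
        return "not detected", 0

    mafs = [minor_allele_frequency(a, b) for a, b in population_counts]
    # Footnote c: common means minor allele frequency >= 20%.
    n_common = sum(maf >= common_threshold for maf in mafs)
    if n_common == 0:
        # Footnote b: minor allele frequency < 20% in every population.
        return "uncommon", 0
    return "common", n_common


# Made-up counts for three populations, 20 chromosomes each.
print(classify_snp([(12, 8), (19, 1), (20, 0)]))   # common in one population
print(classify_snp([(18, 2), (19, 1), (17, 3)]))   # uncommon everywhere
print(classify_snp([(20, 0), (20, 0), (20, 0)]))   # second allele never seen
```

With small samples, a truly rare allele can easily go unobserved, so "not detected" here does not by itself prove the candidate is a false positive.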
10
Conclusions
For researchers interested in using the publicly available candidate SNPs:
  66-70% chance that SNPs have <20% minor allele frequency
  50% chance that SNPs have ≥20% minor allele frequency
Major concern: candidate SNPs may not be true polymorphisms but rather duplicated regions of the genome with near-identical sequences
11
Quality and Completeness of SNP Databases
Table columns: # SNPs identified; % validated by resequencing. Rows: All SNPs; Double-hit SNPs.
The 12% non-validation rate may be due to:
  false-positive SNPs (errors in construction of SNP databases)
  rare variants
  false-negative SNPs (SNPs missed in resequencing)
  population-specific SNPs
Different rates of non-validation reported in other surveys may reflect true SNPs that were missed, or inherent differences between the sets of genes surveyed.
Double-hit SNPs are those for which both alleles have been seen more than once; they are therefore more likely to be common and better suited to the construction of haplotype maps.
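The double-hit criterion is simple enough to state as a check on observation counts. This is a minimal sketch under assumptions: the function name `is_double_hit` and the dictionary input (allele mapped to the number of independent observations) are hypothetical, chosen only to illustrate the definition given on the slide.

```python
def is_double_hit(allele_observations):
    """Return True when both alleles of a SNP were each seen more than once.

    allele_observations: dict mapping each of the two alleles to the number
    of times it was independently observed (hypothetical input format).
    """
    if len(allele_observations) != 2:
        raise ValueError("expected exactly two alleles")
    return all(count > 1 for count in allele_observations.values())


# Made-up examples.
print(is_double_hit({"A": 5, "G": 2}))  # both alleles seen at least twice
print(is_double_hit({"A": 6, "G": 1}))  # minor allele seen only once
```

Requiring a second observation of each allele filters out many sequencing errors and very rare variants, which is consistent with double-hit SNPs being more useful for haplotype maps.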