Polymorphism discovery informatics Gabor T. Marth Department of Biology Boston College Chestnut Hill, MA 02467.


1 Polymorphism discovery informatics Gabor T. Marth Department of Biology Boston College Chestnut Hill, MA 02467

2 Types of sequence variations: Substitution-type single-nucleotide polymorphisms (SNPs) are the most abundant form of sequence variation. Various insertion-deletion polymorphisms (INDELs) are also very common.

3 Are all substitutions SNPs? A substitution counts as a SNP only when it shows a systematic pattern of bi-allelism within the population examined.

4 What is SNP discovery? Comparative analysis of multiple sequences from the same region of the genome (redundant sequence coverage). It includes organizing the sequences relative to each other and determining whether sequence differences are sequencing artifacts or true polymorphisms.

5 Steps of SNP discovery: sequence clustering; paralog identification (cluster refinement); multiple alignment; SNP detection.

6 SNP discovery in diverse sequences: Many different types of sequences are available for polymorphism discovery: EST, WGS, BAC, BAC-end, and genome restriction fragments. Different sequence types differ radically in accuracy: genome sequence, 99.9–99.99%; single-pass sequence, 98–99%. Early methods of SNP discovery focused on specific sequence types.
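To make the accuracy gap concrete, the figures above can be put on the logarithmic scale commonly used for base quality values. The sketch below assumes the Phred convention, Q = -10 log10(error probability); the slides do not name a scale, so that choice and the function names are illustrative assumptions.

```python
import math

def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (e.g. 0.999) to a Phred-scaled quality value.

    Assumes the Phred convention Q = -10 * log10(P_error).
    """
    error = 1.0 - accuracy
    return -10.0 * math.log10(error)

def phred_to_error(q):
    """Convert a Phred-scaled quality value back to an error probability."""
    return 10.0 ** (-q / 10.0)

# Accuracy figures quoted on the slide.
for label, acc in [
    ("genome sequence, low end", 0.999),
    ("genome sequence, high end", 0.9999),
    ("single-pass sequence, low end", 0.98),
    ("single-pass sequence, high end", 0.99),
]:
    print(f"{label}: accuracy {acc} -> Q{accuracy_to_phred(acc):.1f}")
```

On this scale, finished genome sequence sits around Q30 to Q40, while single-pass reads sit around Q17 to Q20, which is why a mismatch in a single-pass read is much weaker evidence for a polymorphism.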

7 General SNP mining – PolyBayes: Sequence clustering simplifies to a database search against the genome reference. Paralogs are filtered by counting mismatches weighted by quality values. Multiple alignment is built by anchoring fragments to the genome reference. SNPs are detected by differentiating true polymorphism from sequencing error using quality values.
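The two quality-aware steps above can be illustrated with a toy model. This is not the PolyBayes implementation: the weighting scheme, the two-hypothesis model (monomorphic site versus a bi-allelic SNP with both alleles equally likely in each read), the default prior of 0.001, and all function names are assumptions chosen to keep the sketch short. Quality values are assumed to be Phred-scaled.

```python
def phred_to_error(q):
    """Phred-scaled quality value -> probability that the base call is wrong."""
    return 10.0 ** (-q / 10.0)

def weighted_mismatches(reference, read, quals):
    """Count mismatches against the reference, down-weighting low-quality bases.

    Each mismatch contributes (1 - P_error), so a mismatch at Q10 counts 0.9
    and one at Q30 counts 0.999. The weighting is an illustrative assumption.
    """
    total = 0.0
    for ref_base, read_base, q in zip(reference, read, quals):
        if ref_base != read_base:
            total += 1.0 - phred_to_error(q)
    return total

def base_likelihood(observed, true_base, q):
    """P(observed base | true base), spreading errors evenly over 3 other bases."""
    e = phred_to_error(q)
    return 1.0 - e if observed == true_base else e / 3.0

def snp_posterior(ref_base, alt_base, column, prior=0.001):
    """Posterior probability that an alignment column is a ref/alt SNP.

    column is a list of (base, quality) pairs, one per aligned read.
    prior is an assumed prior probability that a site is polymorphic.
    """
    like_mono = 1.0  # every read was sampled from the reference allele
    like_snp = 1.0   # each read was sampled from ref or alt with equal chance
    for base, q in column:
        l_ref = base_likelihood(base, ref_base, q)
        l_alt = base_likelihood(base, alt_base, q)
        like_mono *= l_ref
        like_snp *= 0.5 * l_ref + 0.5 * l_alt
    numerator = prior * like_snp
    return numerator / (numerator + (1.0 - prior) * like_mono)

# Two high-quality reads per allele: strong evidence for a SNP.
strong = snp_posterior("A", "C", [("A", 30), ("A", 30), ("C", 30), ("C", 30)])
# A single low-quality mismatch: more likely a sequencing error.
weak = snp_posterior("A", "C", [("A", 30), ("A", 30), ("A", 30), ("C", 5)])
print(f"strong evidence: {strong:.4f}, weak evidence: {weak:.6f}")
```

The point of the sketch is the direction of the effect: the same mismatch moves the posterior a lot when its quality value is high and very little when it is low.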

8 SNP validation: Direct re-sequencing and pooled sequencing. [Figure labels: African, Asian, Caucasian, Hispanic, CHM 1; discard / keep.] Validation experiments show that the SNP probability (SNP score) is accurate. The SNP score allows one to choose cutoff values that balance the false positive rate against the recovery of rare SNPs.
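If the SNP score behaves as a well-calibrated probability, as the validation result suggests, the cost of a given cutoff can be estimated directly from the scores. The sketch below is a minimal illustration under that assumption; the function name and the example scores are made up and are not data from the presentation.

```python
def summarize_cutoff(scores, cutoff):
    """Summarize the effect of keeping candidates with score >= cutoff.

    Treats each score as the probability that the candidate is a true SNP,
    so the expected number of false positives among kept candidates is the
    sum of (1 - score). Returns (kept, expected_false_positives, expected_rate).
    """
    kept = [p for p in scores if p >= cutoff]
    expected_fp = sum(1.0 - p for p in kept)
    rate = expected_fp / len(kept) if kept else 0.0
    return len(kept), expected_fp, rate

# Hypothetical candidate scores, for illustration only.
candidate_scores = [0.99, 0.9, 0.6, 0.3]

for cutoff in (0.2, 0.5, 0.95):
    kept, fp, rate = summarize_cutoff(candidate_scores, cutoff)
    print(f"cutoff {cutoff}: keep {kept}, expected false positives {fp:.2f} ({rate:.0%})")
```

Raising the cutoff lowers the expected false positive rate but discards more candidates, including true SNPs that received weak scores because few reads carried the minor allele.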

9 Genome-scale SNP mining projects: random shotgun reads from whole-genome libraries, aligned to the genome reference sequence; overlaps of large-insert clone sequences.

10 SNP genotyping
SNP discovery: which nucleotides in the genome are polymorphic?
SNP genotyping: which alleles does an individual carry at a nucleotide locus that is known to be polymorphic?
Region with two polymorphic sites (alleles a/c and g/t): aacgtttatgtgatt[a/c]ccagtaaa[g/t]tacggca
Person 1: aacgtttatgtgattaccagtaaattacggca and aacgtttatgtgattcccagtaaattacggca
Person 2: aacgtttatgtgattaccagtaaattacggca and aacgtttatgtgattcccagtaaagtacggca
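The distinction between discovery and genotyping can be shown on the four sequences from this slide. The sketch below first finds the positions where the sequences disagree (discovery) and then reports each person's pair of alleles at those positions (genotyping). It assumes error-free, pre-aligned sequences of equal length; the function names are illustrative.

```python
def polymorphic_sites(haplotypes):
    """Discovery: zero-based positions where the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*haplotypes)) if len(set(column)) > 1]

def genotype(person_haplotypes, sites):
    """Genotyping: the sorted pair of alleles a person carries at each known site."""
    return {site: tuple(sorted(h[site] for h in person_haplotypes)) for site in sites}

# The two sequences shown for each person on the slide.
people = {
    "person 1": ("aacgtttatgtgattaccagtaaattacggca",
                 "aacgtttatgtgattcccagtaaattacggca"),
    "person 2": ("aacgtttatgtgattaccagtaaattacggca",
                 "aacgtttatgtgattcccagtaaagtacggca"),
}

all_haplotypes = [h for pair in people.values() for h in pair]
sites = polymorphic_sites(all_haplotypes)
print("polymorphic sites:", sites)

for name, pair in people.items():
    for site, alleles in genotype(pair, sites).items():
        state = "heterozygous" if alleles[0] != alleles[1] else "homozygous"
        print(f"{name}, site {site}: {alleles[0]}/{alleles[1]} ({state})")
```

Person 1 is heterozygous (a/c) at the first site and homozygous (t/t) at the second, while person 2 is heterozygous at both. Real data would add sequencing error, which is where the quality-aware scoring from the earlier slides comes in.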

11 Genotyping by sequence: heterozygous peak; homozygous peak.

12 Genome variation landscape: nucleotide diversity on human chromosomes. [Figure axes: marker density, "sparse" to "dense"; allele frequency, "rare" to "common".]

