CS273a Lecture 8, Win07, Batzoglou
Evolution at the DNA level
  …ACGGTGCAGTTACCA…
  …AC----CAGTCCACCA…
Sequence edits: Mutation, Deletion
Rearrangements: Inversion, Translocation, Duplication
Evolutionary Rates
(figure: changes passed to the next generation, labeled "OK", "X", "X", "Still OK?")
Genome Evolution – Macro Events
  Inversions
  Deletions
  Duplications
Synteny maps: comparison of human and mouse
Orthology, Paralogy, Inparalogs, Outparalogs
Dog Genome
Building synteny maps
Recommended local aligners
  BLASTZ
    Most accurate, especially for genes
    Chains local alignments
  WU-BLAST
    Good tradeoff of efficiency/sensitivity
    Best command-line options
  BLAT
    Fast, less sensitive
    Good for comparing very similar sequences and finding a rough homology map
Index-based local alignment
  Dictionary: all words of length k (~10)
  Alignment initiated between words with alignment score at least T (typically T = k)
  Alignment: ungapped extensions until the score falls below a statistical threshold
  Output: all local alignments with score > statistical threshold
(figure: query words scanned against the DB)
Question: Using an idea from overlap detection, is there a better way to find all local alignments between two genomes?
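The dictionary-then-extend scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular aligner. It assumes exact-match seeds (the T = k case), a hypothetical +1/-1 match/mismatch scoring, an X-drop rule as the stopping condition for extension, and a fixed score cutoff standing in for the statistical threshold. The function names and all default parameters are illustrative.

```python
from collections import defaultdict

def build_index(db, k):
    """Dictionary mapping every word of length k in db to its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index

def extend(query, db, qi, di, k, match=1, mismatch=-1, xdrop=3):
    """Ungapped extension of an exact k-mer seed in both directions.

    Extension in a direction stops once the running score has dropped
    xdrop below the best score seen (an assumed stopping rule).
    Returns (best_score, query_start, db_start, length).
    """
    score = best = k * match
    right = 0
    q, d = qi + k, di + k
    while q < len(query) and d < len(db):        # extend to the right
        score += match if query[q] == db[d] else mismatch
        q += 1
        d += 1
        if score > best:
            best, right = score, q - (qi + k)
        elif best - score >= xdrop:
            break
    score = best
    left = 0
    q, d = qi - 1, di - 1
    while q >= 0 and d >= 0:                     # extend to the left
        score += match if query[q] == db[d] else mismatch
        if score > best:
            best, left = score, qi - q
        elif best - score >= xdrop:
            break
        q -= 1
        d -= 1
    return best, qi - left, di - left, left + k + right

def local_alignments(query, db, k=4, threshold=6):
    """All distinct ungapped local alignments scoring above threshold."""
    index = build_index(db, k)
    hits = set()
    for qi in range(len(query) - k + 1):
        for di in index.get(query[qi:qi + k], ()):
            hit = extend(query, db, qi, di, k)
            if hit[0] > threshold:
                hits.add(hit)
    return sorted(hits)

print(local_alignments("TTACGTACGGA", "CCACGTACGCC"))
```

Several seeds on the same diagonal extend to the same alignment, which is why the hits are collected in a set.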
Local Alignments
After chaining
Chaining local alignments
  1. Find local alignments
  2. Chain: O(N log N) L.I.S. (longest increasing subsequence)
  3. Restricted DP
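Step 2 can be illustrated with the classic O(N log N) longest-increasing-subsequence algorithm. The sketch below is a simplification: each local alignment is reduced to a single (x, y) anchor point, and every anchor counts equally, whereas a practical chainer would typically weight anchors by alignment score. The function name `chain` and the anchor representation are assumptions made for illustration.

```python
from bisect import bisect_left

def chain(anchors):
    """Longest chain of (x, y) anchors strictly increasing in both coordinates."""
    # Sort by x ascending; ties are broken by y descending so that two
    # anchors sharing the same x can never both enter the chain.
    pts = sorted(anchors, key=lambda a: (a[0], -a[1]))
    tails_y = []      # tails_y[n]: smallest final y over all chains of length n + 1
    tails_idx = []    # index in pts of the anchor that ends that chain
    prev = [-1] * len(pts)
    for i, (_, y) in enumerate(pts):
        pos = bisect_left(tails_y, y)            # binary search: O(log N)
        if pos > 0:
            prev[i] = tails_idx[pos - 1]
        if pos == len(tails_y):
            tails_y.append(y)
            tails_idx.append(i)
        else:
            tails_y[pos] = y
            tails_idx[pos] = i
    # Walk the back-pointers from the end of the longest chain.
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(pts[i])
        i = prev[i]
    return out[::-1]

print(chain([(1, 1), (2, 9), (3, 3), (4, 4), (5, 2), (6, 6)]))
```

The anchors kept in the chain then bound the regions in which step 3 runs its restricted dynamic programming.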
Progressive Alignment
When evolutionary tree is known:
  Align closest first, in the order of the tree
  In each step, align two sequences x, y, or profiles p_x, p_y, to generate a new alignment with associated profile p_result
Weighted version:
  Tree edges have weights, proportional to the divergence in that edge
  New profile is a weighted average of two old profiles
(figure: tree with leaves x, w, y, z)
Example
  Profile: (A, C, G, T, -)
  p_x = (0.8, 0.2, 0, 0, 0)
  p_y = (0.6, 0, 0, 0, 0.4)
  s(p_x, p_y) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A) + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -)
  Result: p_xy = (0.7, 0.1, 0, 0, 0.2)
  s(p_x, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -)
  Result: p_x- = (0.4, 0.1, 0, 0, 0.5)
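The profile arithmetic in the example can be checked with a short Python sketch. The pairwise scores used here (match +1, mismatch -1, gap -2, gap against gap 0) are hypothetical placeholders, since the slide leaves s(a, b) unspecified. The equal weights of 0.5 correspond to the unweighted case, and the function names are illustrative.

```python
ALPHABET = "ACGT-"

def s(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Hypothetical pairwise score; the slide does not fix these values."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def profile_score(px, py, score=s):
    """Expected pairwise score between two profile columns."""
    return sum(px[i] * py[j] * score(a, b)
               for i, a in enumerate(ALPHABET)
               for j, b in enumerate(ALPHABET))

def merge_profiles(px, py, wx=0.5, wy=0.5):
    """Weighted average of two profile columns (weights should sum to 1)."""
    return tuple(wx * a + wy * b for a, b in zip(px, py))

GAP_COLUMN = (0.0, 0.0, 0.0, 0.0, 1.0)   # a column that is entirely gap

p_x = (0.8, 0.2, 0.0, 0.0, 0.0)
p_y = (0.6, 0.0, 0.0, 0.0, 0.4)
print(profile_score(p_x, p_y))           # 0.48*s(A,A) + 0.12*s(C,A) + 0.32*s(A,-) + 0.08*s(C,-)
print(merge_profiles(p_x, p_y))          # approximately (0.7, 0.1, 0, 0, 0.2)
print(merge_profiles(p_x, GAP_COLUMN))   # approximately (0.4, 0.1, 0, 0, 0.5)
```

For the weighted version, wx and wy would be derived from the tree edge weights instead of being fixed at 0.5.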
Threaded Blockset Aligner
(figure labels: Human–Cow, HMR – CD, Restricted Area, Profile Alignment)
Neutral Substitution Rates
Reconstructing the Ancestral Mammalian Genome
  Human: C
  Baboon: C
  Cat: C
  Dog: G
(figure: ancestral nodes labeled "C", "C or G", "G")
Finding Conserved Elements (1)
Binomial method
  25-bp window in the human genome
  Binomial distribution of k matches in N bases given the neutral probability of substitution
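One way to read the binomial method is as a tail probability: how likely is it to see k or more matches in an N-base window if every base evolved neutrally? The sketch below assumes a gap-free pairwise alignment of human against one other species, and takes the per-base match probability to be 1 minus the neutral substitution probability. Both are simplifying assumptions, and the function names are illustrative.

```python
from math import comb

def binomial_tail(k, n, p_match):
    """P(X >= k) for X ~ Binomial(n, p_match)."""
    return sum(comb(n, i) * p_match**i * (1 - p_match)**(n - i)
               for i in range(k, n + 1))

def window_pvalues(human, other, p_sub, window=25):
    """Slide a window along two aligned, equal-length, gap-free sequences.

    Returns (start, matches, p_value) per window, where p_value is the
    probability of at least that many matches under the neutral model.
    """
    p_match = 1.0 - p_sub
    results = []
    for start in range(len(human) - window + 1):
        pairs = zip(human[start:start + window], other[start:start + window])
        k = sum(h == o for h, o in pairs)
        results.append((start, k, binomial_tail(k, window, p_match)))
    return results

# A perfectly conserved 25-bp window under an assumed neutral rate of 0.3.
seq = "ACGTACGTACGTACGTACGTACGTA"
print(window_pvalues(seq, seq, p_sub=0.3))
```

Small p-values mark windows that are more similar than neutral evolution would predict, which is the signal of a conserved element.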
Finding Conserved Elements (2)
Parsimony method
  Count minimum # of mutations explaining each column
  Assign a probability to this parsimony score given neutral model
  Multiply probabilities across 25-bp window of human genome
(figure: example alignment column A, C, A, A, G)
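The minimum number of mutations for a column can be computed with Fitch's small-parsimony algorithm, sketched below. The tree topology, the species names s1..s5, and the table mapping a parsimony score to a neutral probability are all hypothetical. The slide gives only the column A, C, A, A, G and does not say how the probabilities are derived.

```python
from math import prod

def fitch(tree, column):
    """Fitch small parsimony on a binary tree given as nested 2-tuples.

    Leaves are species names; column maps species name to its base.
    Returns (candidate ancestral bases, minimum number of mutations).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:                       # children agree: no extra mutation
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def window_probability(tree, columns, neutral_prob):
    """Product over columns of P(parsimony score | neutral model).

    neutral_prob is a hypothetical lookup table: score -> probability.
    """
    return prod(neutral_prob[fitch(tree, col)[1]] for col in columns)

# Assumed topology over five hypothetical species.
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
column = {"s1": "A", "s2": "C", "s3": "A", "s4": "A", "s5": "G"}
print(fitch(tree, column))
```

For this column and topology, the two off-consensus bases (C and G) each require one mutation, so the minimum is two with A at the root.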
Finding Conserved Elements
Finding Conserved Elements (3)
GERP
Phylo HMMs
(figure panels: HMM, Phylogenetic Tree Model, Phylo HMM)
Finding Conserved Elements (3)
How do the methods agree/disagree?
Statistical Power to Detect Constraint
(figure labels: L, N)
  C: cutoff # mutations
  D: neutral mutation rate
  : constraint mutation rate relative to neutral
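One possible reading of these quantities, offered only as a sketch: assume L is the element length in bases and N the number of compared genomes, each at neutral distance D substitutions per site from the reference, and call an element constrained when at most C mutations are observed. The slide does not define L or N, so this interpretation is an assumption, as are the Poisson approximation and the name `omega` for the relative constraint rate.

```python
from math import exp, factorial

def poisson_cdf(c, mean):
    """P(X <= c) for X ~ Poisson(mean)."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(c + 1))

def power_and_fpr(L, N, D, omega, C):
    """Power and false-positive rate of the rule 'at most C mutations'.

    Assumed model: mutations over L sites in N genomes are Poisson with
    mean L*N*D under neutrality and omega*L*N*D under constraint.
    """
    neutral_mean = L * N * D
    constrained_mean = omega * neutral_mean
    power = poisson_cdf(C, constrained_mean)   # constrained element passes the cutoff
    fpr = poisson_cdf(C, neutral_mean)         # neutral element passes by chance
    return power, fpr

# Hypothetical numbers: a 10-bp element, one genome at D = 0.5, five-fold constraint.
print(power_and_fpr(L=10, N=1, D=0.5, omega=0.2, C=0))
```

Under this model, increasing N or D raises the neutral mean and so lowers the false-positive rate for a fixed cutoff, which is the intuition behind sequencing more genomes to detect shorter elements.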