1
Phylogeny Tree Reconstruction
[Figure: two example trees on leaves 1, 2, 3, 4, 5]
2
Final Exam: 24-hour take-home exam. The questions will be more straightforward than those in the homeworks. Please email Michael and Serafim by Friday with your preferred day to take the exam. The exam starts Sunday, …, Thursday noon and ends Monday, ..., Friday noon.
3
Number of labeled unrooted tree topologies: How many possibilities are there for leaf 4?
[Figure: unrooted tree on leaves 1, 2, 3, with the candidate positions for leaf 4]
4
Number of labeled unrooted tree topologies: How many possibilities are there for leaf 4? For the 4th leaf, there are 3 possibilities, one for each edge of the 3-leaf tree.
5
Number of labeled unrooted tree topologies: How many possibilities are there for leaf 5? For the 5th leaf, there are 5 possibilities.
6
Number of labeled unrooted tree topologies: How many possibilities are there for leaf 6? For the 6th leaf, there are 7 possibilities.
7
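Number of labeled unrooted tree topologies: How many possibilities are there for leaf n? For the nth leaf, there are 2n – 5 possibilities: an unrooted binary tree with n – 1 leaves has 2(n – 1) – 3 = 2n – 5 edges, and the new leaf can be attached to any of them.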
8
Number of labeled unrooted tree topologies
Number of unrooted trees for n taxa: $(2n-5)(2n-7)\cdots 3 \cdot 1 = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
Number of rooted trees for n taxa: $(2n-3)(2n-5)(2n-7)\cdots 3 = \frac{(2n-3)!}{2^{n-2}(n-2)!}$
N = 10: #unrooted = 2,027,025; #rooted = 34,459,425
N = 30: #unrooted ≈ $8.7 \times 10^{36}$; #rooted ≈ $4.95 \times 10^{38}$
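A quick numerical check of these counts, assuming fully resolved (binary) trees. The product forms, the closed forms, and a brute-force enumeration by stepwise leaf insertion should all agree; the function names and the internal-node labels ("u0", "u1", ...) are hypothetical, chosen only for this sketch.

```python
from math import factorial

def count_unrooted(n):
    """(2n-5)(2n-7)...3*1 labeled unrooted binary topologies on n >= 3 leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

def count_rooted(n):
    """(2n-3)(2n-5)...3 labeled rooted binary topologies on n >= 2 leaves."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n-3
        count *= k
    return count

def unrooted_closed(n):
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_closed(n):
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def enumerate_unrooted(n):
    """Build every topology by stepwise addition: leaf k may go on any existing edge."""
    trees = [[(0, "u0"), (1, "u0"), (2, "u0")]]  # the single 3-leaf star
    for leaf in range(3, n):
        hub = f"u{leaf - 2}"  # new internal node created by this insertion
        trees = [
            edges[:e] + edges[e + 1:] + [(u, hub), (hub, v), (hub, leaf)]
            for edges in trees
            for e, (u, v) in enumerate(edges)
        ]
    return trees
```

For example, `count_unrooted(10)` reproduces the 2,027,025 figure, and `len(enumerate_unrooted(6))` gives 105, matching 7 · 5 · 3. Exact integer arithmetic is used so the closed forms can be compared without rounding.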
9
Search through tree topologies: Branch and Bound
Observation: adding an edge (a new leaf) to an existing tree can only increase the parsimony cost.
Enumerate all unrooted trees with at most N leaves as counter vectors $[i_3][i_5][i_7]\ldots[i_{2N-5}]$, where each $i_k$ can take values from 0 (no edge) to $k$: $i_k$ records the edge to which leaf $(k+5)/2$ is attached, or 0 if that leaf has not been added yet.
At each point, keep C = the smallest cost found so far for a complete tree.
Start B&B with the tree [1][0][0]…[0].
Whenever the cost of the current tree T is > C:
- T is not optimal.
- Any tree extending T with more edges is not optimal either, so increment by 1 the rightmost nonzero counter.
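A compact sketch of this search, under assumptions the slide leaves open. It walks the same tree space depth-first by recursion, attaching the next leaf to each existing edge in turn, instead of explicitly incrementing the counter vector. It scores trees with a unit-cost Sankoff-style small-parsimony DP, which is a swapped-in choice rather than a method prescribed here. All function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def column_cost(adj, states, root):
    """Unit-cost small parsimony (Sankoff-style DP) for one column; any node degree."""
    def visit(node, parent):
        if node in states:  # placed leaf: its state is fixed
            cost = {a: 0 if a == states[node] else math.inf for a in ALPHABET}
        else:
            cost = {a: 0 for a in ALPHABET}
        for child in adj[node]:
            if child != parent:
                sub = visit(child, node)
                for a in ALPHABET:
                    cost[a] += min(sub[b] + (a != b) for b in ALPHABET)
        return cost
    return min(visit(root, None).values())

def parsimony(edges, seqs, n_placed):
    """Parsimony cost of a (partial) tree containing leaves 0..n_placed-1."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return sum(
        column_cost(adj, {i: seqs[i][m] for i in range(n_placed)}, 0)
        for m in range(len(seqs[0]))
    )

def branch_and_bound(seqs):
    """Exact maximum-parsimony search over unrooted binary trees (>= 3 sequences)."""
    n = len(seqs)
    best = {"cost": math.inf, "edges": None}

    def extend(edges, k):  # leaves 0..k-1 are already placed
        cost = parsimony(edges, seqs, k)
        if cost > best["cost"]:
            return  # prune: adding more leaves can only raise the cost
        if k == n:
            if cost < best["cost"]:
                best["cost"], best["edges"] = cost, edges
            return
        new_node = n + k - 2  # internal nodes are numbered n, n+1, ...
        for e, (u, v) in enumerate(edges):  # attach leaf k to edge e
            rest = edges[:e] + edges[e + 1:]
            extend(rest + [(u, new_node), (new_node, v), (new_node, k)], k + 1)

    extend([(0, n), (1, n), (2, n)], 3)  # start from the 3-leaf star
    return best["cost"], best["edges"]
```

For instance, `branch_and_bound(["AA", "AA", "CC", "CC"])` returns cost 2, for the topology that groups the two A sequences together. The pruning test uses `cost > C` exactly as on the slide; any subtree of the search whose partial cost already exceeds the best complete tree is skipped.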
10
Bootstrapping to get the best trees
Main outline of the algorithm:
1. Select random columns from a multiple alignment, with replacement, so one column can appear several times.
2. Build a phylogenetic tree based on the random sample from (1).
3. Repeat (1) and (2) many (say, 1000) times.
4. Output the tree that is constructed most frequently.
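A minimal sketch of the four steps, assuming any tree builder that returns a hashable tree description. The `quartet_tree` builder is a hypothetical stand-in for four taxa only: it picks the split {a,b}|{c,d} minimizing the sum of within-pair Hamming distances, and is not meant as a recommended reconstruction method.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Step 1: draw as many columns as the alignment has, with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in picks) for seq in alignment]

def bootstrap_most_frequent(alignment, build_tree, replicates=1000, seed=0):
    """Steps 2-4: build a tree per replicate; return (most common tree, its count)."""
    rng = random.Random(seed)
    counts = Counter(
        build_tree(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return counts.most_common(1)[0]

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def quartet_tree(aln):
    """Stand-in builder for 4 taxa: split {a,b}|{c,d} minimizing d(a,b) + d(c,d)."""
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return min(
        splits,
        key=lambda s: hamming(aln[s[0][0]], aln[s[0][1]])
        + hamming(aln[s[1][0]], aln[s[1][1]]),
    )
```

The returned count also shows how often the winning tree appeared among the replicates. The fixed `seed` only makes the sketch reproducible.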
11
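Probabilistic Methods
A more refined measure of evolution along a tree than parsimony:
$P(x_1, x_2, x_{root} \mid t_1, t_2) = P(x_{root})\, P(x_1 \mid t_1, x_{root})\, P(x_2 \mid t_2, x_{root})$
If we use Jukes-Cantor, for example, with $x_1 = x_{root} = A$, $x_2 = C$, $t_1 = t_2 = 1$ (and $p_A = \tfrac14$):
$P = p_A \cdot \tfrac14(1 + 3e^{-4\alpha}) \cdot \tfrac14(1 - e^{-4\alpha}) = (\tfrac14)^3 (1 + 3e^{-4\alpha})(1 - e^{-4\alpha})$
[Figure: root $x_{root}$ with leaves $x_1$ and $x_2$ on branches of length $t_1$ and $t_2$]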
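The two-leaf example can be checked numerically. This sketch assumes the Jukes-Cantor form used above with rate parameter `alpha` and a uniform root prior of 1/4; the function names are hypothetical.

```python
import math

def jc_prob(b, a, t, alpha):
    """Jukes-Cantor P(b | a, t): 1/4(1 + 3e^(-4 alpha t)) if b == a, else 1/4(1 - e^(-4 alpha t))."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * decay) if a == b else 0.25 * (1 - decay)

def cherry_joint(x1, x2, x_root, t1, t2, alpha):
    """P(x1, x2, x_root | t1, t2) = P(x_root) P(x1 | t1, x_root) P(x2 | t2, x_root)."""
    return 0.25 * jc_prob(x1, x_root, t1, alpha) * jc_prob(x2, x_root, t2, alpha)
```

With `alpha = 0.1`, `cherry_joint("A", "C", "A", 1, 1, 0.1)` equals $(\tfrac14)^3(1 + 3e^{-0.4})(1 - e^{-0.4})$, matching the expression on the slide.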
12
Probabilistic Methods
If we know all internal labels $x_u$:
$P(x_1, x_2, \ldots, x_N, x_{N+1}, \ldots, x_{2N-1} \mid T, t) = P(x_{root}) \prod_{j \neq root} P(x_j \mid x_{parent(j)}, t_{j,parent(j)})$
Usually we don't know the internal labels, therefore
$P(x_1, x_2, \ldots, x_N \mid T, t) = \sum_{x_{N+1}} \sum_{x_{N+2}} \cdots \sum_{x_{2N-1}} P(x_1, x_2, \ldots, x_{2N-1} \mid T, t)$
[Figure: tree with root $x_{root} = x_{2N-1}$, leaves $x_1, x_2, \ldots, x_N$, and an internal node $x_u$]
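A direct, brute-force reading of the sum over internal labels, assuming Jukes-Cantor substitutions and a uniform root prior of 1/4. The tree is given as a hypothetical `parent` map from each non-root node to `(parent, branch_length)`; nodes follow the slide's numbering (leaves 1..N, root 2N-1).

```python
import itertools
import math

ALPHABET = "ACGT"

def jc_prob(b, a, t, alpha=1.0):
    decay = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * decay) if a == b else 0.25 * (1 - decay)

def joint_prob(labels, parent, alpha=1.0):
    """P(all labels | T, t) = P(x_root) * prod over non-root j of P(x_j | x_parent(j), t_j)."""
    p = 0.25  # P(x_root), uniform under the Jukes-Cantor stationary distribution
    for j, (par, t) in parent.items():
        p *= jc_prob(labels[j], labels[par], t, alpha)
    return p

def brute_force_likelihood(leaf_labels, parent, internal_nodes, alpha=1.0):
    """Sum the joint probability over all 4^(#internal) internal labelings."""
    total = 0.0
    for assignment in itertools.product(ALPHABET, repeat=len(internal_nodes)):
        labels = dict(leaf_labels)
        labels.update(zip(internal_nodes, assignment))
        total += joint_prob(labels, parent, alpha)
    return total
```

Usage for N = 3 (leaves 1, 2, 3; internal node 4; root 5): `brute_force_likelihood({1: "A", 2: "A", 3: "C"}, {1: (4, 0.2), 2: (4, 0.4), 4: (5, 0.3), 3: (5, 0.5)}, [4, 5])`. The loop runs over $4^{N-1}$ labelings, which is exponential in N and motivates a recursive computation.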
13
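Computing the Likelihood of a Tree
Define $P(L_k \mid a)$: the probability of the leaf labels in the subtree rooted at $x_k$, given that $x_k = a$. Then
$P(L_k \mid a) = \left(\sum_b P(L_i \mid b)\, P(b \mid a, t_{ki})\right) \left(\sum_c P(L_j \mid c)\, P(c \mid a, t_{kj})\right)$
[Figure: node $x_k$ with daughters $x_i$ and $x_j$ on branches of length $t_{ki}$ and $t_{kj}$]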
14
Felsenstein’s Likelihood Algorithm
To calculate $P(x_1, x_2, \ldots, x_N \mid T, t)$:
Initialization: Set $k = 2N - 1$.
Recursion: Compute $P(L_k \mid a)$ for all $a$.
  If $k$ is a leaf node: set $P(L_k \mid a) = \mathbf{1}(a = x_k)$.
  If $k$ is not a leaf node:
    1. Compute $P(L_i \mid b)$, $P(L_j \mid b)$ for all $b$, for daughter nodes $i, j$.
    2. Set $P(L_k \mid a) = \sum_{b,c} P(b \mid a, t_{ki})\, P(L_i \mid b)\, P(c \mid a, t_{kj})\, P(L_j \mid c)$.
Termination: Likelihood at this column $= P(x_1, x_2, \ldots, x_N \mid T, t) = \sum_a P(L_{2N-1} \mid a)\, P(a)$.
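The recursion and termination step can be sketched as follows, assuming Jukes-Cantor substitution probabilities and $P(a) = \tfrac14$. The tree format is hypothetical: `children[k] = [(i, t_ki), (j, t_kj)]` for each internal node k, with leaves identified by the keys of `leaf_labels`.

```python
import math

ALPHABET = "ACGT"

def jc_prob(b, a, t, alpha=1.0):
    """Jukes-Cantor substitution probability P(b | a, t)."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * decay) if a == b else 0.25 * (1 - decay)

def felsenstein(children, leaf_labels, root, alpha=1.0):
    """Likelihood of one alignment column, P(x_1, ..., x_N | T, t)."""
    def partial(k):
        # Returns {a: P(L_k | a)} for every nucleotide a.
        if k in leaf_labels:  # leaf: indicator 1(a = x_k)
            return {a: 1.0 if a == leaf_labels[k] else 0.0 for a in ALPHABET}
        (i, t_ki), (j, t_kj) = children[k]
        L_i, L_j = partial(i), partial(j)
        # The double sum over (b, c) factorizes into a product of two single sums.
        return {
            a: sum(jc_prob(b, a, t_ki, alpha) * L_i[b] for b in ALPHABET)
            * sum(jc_prob(c, a, t_kj, alpha) * L_j[c] for c in ALPHABET)
            for a in ALPHABET
        }

    L_root = partial(root)
    return sum(L_root[a] * 0.25 for a in ALPHABET)  # P(a) = 1/4 under Jukes-Cantor
```

Each internal node does $O(4 \times 4)$ work per daughter, so one column costs time linear in N, compared with the $4^{N-1}$ terms of the explicit sum over internal labels. Example: `felsenstein({5: [(4, 0.3), (3, 0.5)], 4: [(1, 0.2), (2, 0.4)]}, {1: "A", 2: "A", 3: "C"}, 5)`.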
15
Probabilistic Methods
Given M (ungapped) alignment columns of N sequences, define the likelihood of a tree:
$L(T, t) = P(\text{Data} \mid T, t) = \prod_{m=1}^{M} P(x_{1m}, \ldots, x_{Nm} \mid T, t)$
Maximum Likelihood Reconstruction: given data $X = (x_{ij})$, find a topology $T$ and length vector $t$ that maximize the likelihood $L(T, t)$.
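A deliberately simplified sketch of the maximization, for illustration only. It assumes Jukes-Cantor, a uniform root prior, an explicit list of candidate rooted topologies (nested tuples whose leaves are sequence indices), and a single branch length shared by every edge and chosen from a small grid. Real ML reconstruction optimizes each branch length and searches topologies far more cleverly. All names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def jc_prob(b, a, t, alpha=1.0):
    decay = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * decay) if a == b else 0.25 * (1 - decay)

def partial(node, column, t, alpha):
    """{a: P(L_node | a)} for a nested-tuple binary tree; leaves are sequence indices."""
    if isinstance(node, int):
        return {a: 1.0 if a == column[node] else 0.0 for a in ALPHABET}
    left, right = (partial(child, column, t, alpha) for child in node)
    return {
        a: sum(jc_prob(b, a, t, alpha) * left[b] for b in ALPHABET)
        * sum(jc_prob(c, a, t, alpha) * right[c] for c in ALPHABET)
        for a in ALPHABET
    }

def log_likelihood(tree, alignment, t, alpha=1.0):
    """log L(T, t) = sum over columns m of log P(x_1m, ..., x_Nm | T, t)."""
    total = 0.0
    for column in zip(*alignment):
        root = partial(tree, column, t, alpha)
        total += math.log(sum(0.25 * root[a] for a in ALPHABET))
    return total

def ml_search(alignment, topologies, t_grid):
    """Return the (topology, shared branch length) pair with the highest log-likelihood."""
    return max(
        ((tree, t) for tree in topologies for t in t_grid),
        key=lambda pair: log_likelihood(pair[0], alignment, pair[1]),
    )
```

Working with log-likelihoods turns the product over columns into a sum, which avoids numerical underflow for long alignments.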
16
Some new sequencing technologies
17
Molecular Inversion Probes
19
Single Molecule Array for Genotyping—Solexa
20
Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm
22
Nanopore Sequencing: Assembly
Resulting reads are likely to look different from Sanger reads:
- Long (perhaps 10,000 bp to 1,000,000 bp)
- High error rate (perhaps 10% to 30%)
- Two colors? (A / CTG, AT / CG, AG / CT)
How can we assemble under such conditions?
23
Pyrosequencing
24
Pyrosequencing on a chip: Mostafa Ronaghi, Stanford Genome Technologies Center; 454 Life Sciences
25
Pyrosequencing Signal
26
Pyrosequencing: Assembly
Resulting reads are likely to look different from Sanger reads:
- Short (currently 100 to 200 bp)
- Low error rates, except in homopolymeric runs (AAA…, CCC…, etc.)
- Currently, it is not known how to do paired reads on a chip
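Why homopolymer runs are the weak spot can be seen in an idealized flowgram model. This sketch assumes a noise-free signal exactly equal to the run length and a cyclic flow order of TACG; both are simplifying assumptions, and the function names are hypothetical. Nucleotides are flowed one at a time, and each flow incorporates the whole homopolymer run at the current position.

```python
def flowgram(read, flow_order="TACG"):
    """Idealized pyrosequencing: cycle through flow_order, one nucleotide per flow.

    Each flow records (base, signal), where the ideal signal is the length of the
    homopolymer run incorporated at that flow (0 if the base does not match).
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases not in the flow order")
    signals, pos = [], 0
    while pos < len(read):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals

def decode(signals):
    """Invert an error-free flowgram back into the read."""
    return "".join(base * run for base, run in signals)
```

Under this model a run such as AAAAA produces a single flow with signal 5, so a 5-long and a 6-long run differ only in the magnitude of one flow's light intensity; an imprecise intensity reading therefore turns directly into an insertion or deletion within the run.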
27
Polony Sequencing