CSCI2950-C Lecture 8 Molecular Phylogeny: Parsimony and Likelihood

1 CSCI2950-C Lecture 8 Molecular Phylogeny: Parsimony and Likelihood

2 Phylogenetic Trees How are these trees built from DNA sequences?
[Figure: example tree with leaves numbered 1 to 5] Leaves represent existing species. Internal vertices represent ancestors. The root represents the oldest evolutionary ancestor.

3 Phylogenetic Trees How are these trees built from DNA sequences?
[Figure: example tree with leaves numbered 1 to 5] Methods: Distance. Parsimony: minimum number of mutations. Likelihood: probabilistic model of mutations.

4 Outline
Last Lecture: Distance-based methods: additive distances, 4-point condition, UPGMA & neighbor joining
Today: Parsimony-based methods (Sankoff + Fitch’s algorithms), likelihood methods, perfect phylogeny

5 Weighted Small Parsimony Problem: Formulation
Input: Tree T with each leaf labeled by elements of a k-letter alphabet and a k x k scoring matrix (δ_ij). Output: Labeling of internal vertices of the tree T minimizing the weighted parsimony score.

6 Sankoff Algorithm Dynamic Programming
Calculate and keep track of a score for every possible label at each vertex: s_t(v) = minimum parsimony score of the subtree rooted at vertex v if v has character t. The score at each vertex is based on the scores of its children: s_t(parent) = min_i {s_i(left child) + δ_{i,t}} + min_j {s_j(right child) + δ_{j,t}}

7 Sankoff Algorithm (cont.)
Begin at leaves: If the leaf has the character in question, the score is 0. Else, the score is ∞.

8 Sankoff Algorithm (cont.)
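s_t(v) = min_i {s_i(u) + δ_{i,t}} + min_j {s_j(w) + δ_{j,t}}
[Table for child u: columns s_i(u), δ_{i,A}, sum; rows A, T, G, C; surviving entries: T 3, G 4, C 9]
s_A(v) = min_i {s_i(u) + δ_{i,A}} + min_j {s_j(w) + δ_{j,A}}, where the first term evaluates to 0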

9 Sankoff Algorithm (cont.)
s_t(v) = min_i {s_i(u) + δ_{i,t}} + min_j {s_j(w) + δ_{j,t}}
[Table for child w: columns s_j(w), δ_{j,A}, sum; rows A, T, G, C; surviving entries: T 3, G 4, C 9]
s_A(v) = min_i {s_i(u) + δ_{i,A}} + min_j {s_j(w) + δ_{j,A}} = 0 + 9 = 9

10 Sankoff Algorithm (cont.)
s_t(v) = min_i {s_i(u) + δ_{i,t}} + min_j {s_j(w) + δ_{j,t}}. Repeat for T, G, and C.

11 Sankoff Algorithm (cont.)
Repeat for right subtree

12 Sankoff Algorithm (cont.)
Repeat for root

13 Sankoff Algorithm (cont.)
Smallest score at root is minimum weighted parsimony score In this case, 9 – so label with T
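The bottom-up pass can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring matrix `DELTA` and the leaf labels A, C, T, G are not fully recoverable from the transcript (only the values 3, 4, 9 for the A column survive), so they are chosen here to reproduce the numbers quoted on the slides. The names `sankoff_scores`, `DELTA` and `ALPHABET` are hypothetical.

```python
import math

ALPHABET = "ATGC"

# Assumed symmetric scoring matrix; only the A column (0, 3, 4, 9)
# is visible in the transcript, the rest is an illustrative choice.
DELTA = {
    "A": {"A": 0, "T": 3, "G": 4, "C": 9},
    "T": {"A": 3, "T": 0, "G": 2, "C": 4},
    "G": {"A": 4, "T": 2, "G": 0, "C": 4},
    "C": {"A": 9, "T": 4, "G": 4, "C": 0},
}


def sankoff_scores(tree):
    """Return {t: s_t(v)} for the root v of `tree`.

    A tree is either a leaf (a single character) or a pair (left, right).
    """
    if isinstance(tree, str):
        # Leaf: 0 for the observed character, infinity for every other one.
        return {t: (0 if t == tree else math.inf) for t in ALPHABET}
    left, right = (sankoff_scores(child) for child in tree)
    # s_t(v) = min_i {s_i(u) + delta_{i,t}} + min_j {s_j(w) + delta_{j,t}}
    return {
        t: min(left[i] + DELTA[i][t] for i in ALPHABET)
        + min(right[j] + DELTA[j][t] for j in ALPHABET)
        for t in ALPHABET
    }


tree = (("A", "C"), ("T", "G"))  # assumed leaf labels
root_scores = sankoff_scores(tree)
best_score = min(root_scores.values())
```

With these assumed inputs the smallest root score is 9, attained by T, which agrees with the value quoted on the slide.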

14 Sankoff Algorithm: Traveling down the Tree
The scores at the root vertex have been computed by going up the tree. After the scores at the root vertex are computed, the Sankoff algorithm moves down the tree and assigns each vertex its optimal character.

15 Sankoff Algorithm (cont.)
9 is derived from 7 + 2, so the left child is T and the right child is T

16 Sankoff Algorithm (cont.)
And the tree is thus labeled…
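The downward pass can be sketched as follows. As before, this is a minimal illustration, not the lecture's own code: the matrix `DELTA` and the leaves A, C, T, G are assumptions chosen to match the quoted numbers (9 = 7 + 2), and the helper names `score` and `label` are hypothetical. Each vertex keeps its score table from the upward pass; going down, a child takes the character i that minimises s_i(child) + δ_{i,parent}.

```python
import math

ALPHABET = "ATGC"

# Assumed scoring matrix (illustrative; not fully given in the transcript).
DELTA = {
    "A": {"A": 0, "T": 3, "G": 4, "C": 9},
    "T": {"A": 3, "T": 0, "G": 2, "C": 4},
    "G": {"A": 4, "T": 2, "G": 0, "C": 4},
    "C": {"A": 9, "T": 4, "G": 4, "C": 0},
}


def score(tree):
    """Upward pass: annotate every vertex with its table of s_t values."""
    if isinstance(tree, str):
        s = {t: (0 if t == tree else math.inf) for t in ALPHABET}
        return {"s": s, "kids": []}
    kids = [score(child) for child in tree]
    s = {
        t: sum(min(k["s"][i] + DELTA[i][t] for i in ALPHABET) for k in kids)
        for t in ALPHABET
    }
    return {"s": s, "kids": kids}


def label(node, parent_char=None):
    """Downward pass: pick the character that realises the minimum."""
    if parent_char is None:
        # Root: take a character with the smallest score.
        char = min(ALPHABET, key=lambda t: node["s"][t])
    else:
        # Other vertices: minimise own score plus the cost of the edge to the parent.
        char = min(ALPHABET, key=lambda i: node["s"][i] + DELTA[i][parent_char])
    if not node["kids"]:
        return char
    return (char, [label(kid, char) for kid in node["kids"]])


labeled = label(score((("A", "C"), ("T", "G"))))
```

Ties are broken by alphabet order here; any tied character would give an equally good labeling.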

17 Fitch’s Algorithm Solves Small Parsimony problem
Published 4 years before Sankoff (1971) Assigns a set of letters to every vertex in the tree, S(v) S(l) = observed character for each leaf l

18 Fitch’s Algorithm: Example
[Figure: example tree with leaves labeled a, c, t, a; internal vertex sets {a,c} and {t,a}]

19 Fitch Algorithm 1) Assign a set of possible letters Sv to every vertex v, traversing the tree from leaves to root. For vertex v with children u and w: Sv = Su ∩ Sw if the intersection is non-empty, and Sv = Su ∪ Sw otherwise. E.g. if the node we are looking at has a left child labeled {A, C} and a right child labeled {A, T}, the intersection {A} is non-empty, so the node is given the set {A}; children labeled {A, C} and {G, T} would instead give the union {A, C, G, T}.

20 Fitch Algorithm (cont.)
2) Assign labels to each vertex, traversing the tree from root to leaves. Assign the root arbitrarily from its set of letters. For all other vertices, if its parent’s label is in its set of letters, assign it its parent’s label; else, choose an arbitrary letter from its set as its label.
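Both passes of Fitch's algorithm fit in a short Python sketch. The tree encoding (a leaf is a character, an internal vertex is a pair) and the function names `fitch_sets` and `fitch_labels` are assumptions made for illustration; "arbitrary" choices are resolved by taking the alphabetically smallest letter so the result is reproducible.

```python
def fitch_sets(tree):
    """Leaves-to-root pass.

    Returns (S_v, annotated children, number of mutations in the subtree).
    """
    if isinstance(tree, str):
        return {tree}, [], 0
    (su, ku, cu), (sw, kw, cw) = fitch_sets(tree[0]), fitch_sets(tree[1])
    kids = [(su, ku), (sw, kw)]
    common = su & sw
    if common:
        # Non-empty intersection: no extra mutation is needed at this vertex.
        return common, kids, cu + cw
    # Empty intersection: take the union and count one mutation.
    return su | sw, kids, cu + cw + 1


def fitch_labels(node_set, kids, parent=None):
    """Root-to-leaves pass: reuse the parent's label whenever it is allowed."""
    char = parent if parent in node_set else min(node_set)
    if not kids:
        return char
    return (char, [fitch_labels(s, k, char) for s, k in kids])


# Children sets {A, C} and {A, T} intersect in {A}, so the root set is {A}.
root_set, kids, mutations = fitch_sets((("A", "C"), ("A", "T")))
labeled = fitch_labels(root_set, kids)
```

The mutation count returned by the first pass is the parsimony score of the tree for this character.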

21 Fitch Algorithm (cont.)

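22 Fitch vs. Sankoff Fitch has an O(nk) runtime; Sankoff takes O(nk^2) for a general scoring matrix, which reduces to O(nk) for the unit-cost matrix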
Are they actually different? Let’s compare …

23 Fitch As seen previously:

24 Comparison of Fitch and Sankoff
As seen earlier, the scoring matrix for the Fitch algorithm is merely 0 on the diagonal and 1 everywhere else:
    A T G C
  A 0 1 1 1
  T 1 0 1 1
  G 1 1 0 1
  C 1 1 1 0
So let’s do the same problem using the Sankoff algorithm and this scoring matrix.

25 Sankoff

26 Sankoff vs. Fitch The Sankoff algorithm gives the same set of optimal labels as the Fitch algorithm. For the Sankoff algorithm, character t is optimal for vertex v if s_t(v) = min_{1≤i≤k} s_i(v). Let S_v = set of optimal letters for v. Then S_v = S_u ∩ S_w if the intersection is non-empty, and S_v = S_u ∪ S_w otherwise. This is also the Fitch recurrence, so with the Fitch scoring matrix the two algorithms are identical.

27 Large Parsimony Problem
Input: An n x m matrix M describing n species, each represented by an m-character string Output: A tree T with n leaves labeled by the n rows of matrix M, and a labeling of the internal vertices such that the parsimony score is minimized over all possible trees and all possible labelings of internal vertices

28 Large Parsimony Problem (cont.)
The search space is huge and grows rapidly with n: there are (2n – 3)!! possible rooted trees and (2n – 5)!! possible unrooted trees. The problem is NP-complete. Exhaustive search is only possible for small n (< 10). Hence, branch and bound or heuristics are used.
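To get a feel for how quickly these counts grow, the double factorials can be evaluated directly. This is a small sketch; the function names are chosen for illustration only.

```python
def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 (or 2)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def rooted_trees(n):
    """Number of rooted binary trees on n labeled leaves: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted binary trees on n labeled leaves: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


for n in (4, 10, 20):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Already at n = 10 there are 34,459,425 rooted and 2,027,025 unrooted topologies, which is why exhaustive search stops being practical around that size.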

29 Nearest Neighbor Interchange A Greedy Algorithm
A branch swapping algorithm. Only evaluates a subset of all possible trees. Defines a neighbor of a tree as one reachable by a nearest neighbor interchange: a rearrangement of the four subtrees defined by one internal edge. Only three different rearrangements per edge.
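A single interchange step can be sketched as follows, under simplifying assumptions: the four subtrees around the chosen internal edge are passed in directly, the tree is treated as rooted on that edge (the Fitch count does not depend on where the root is placed), and a single character is scored. The names `nni_arrangements` and `fitch_cost` are hypothetical.

```python
def fitch_cost(tree):
    """Return (Fitch set, number of mutations) for a rooted binary tree.

    A tree is a leaf character or a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree}, 0
    (su, cu), (sw, cw) = fitch_cost(tree[0]), fitch_cost(tree[1])
    if su & sw:
        return su & sw, cu + cw
    return su | sw, cu + cw + 1


def nni_arrangements(a, b, c, d):
    """The three ways to pair up four subtrees across one internal edge."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


# Four subtrees (here single leaves) around one internal edge.
subtrees = ("A", "A", "C", "C")
candidates = nni_arrangements(*subtrees)
costs = [fitch_cost(t)[1] for t in candidates]
best = min(candidates, key=lambda t: fitch_cost(t)[1])
```

A full search would repeat this step over every internal edge and move to the best-scoring neighbor until no neighbor improves the score.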

30 Nearest Neighbor Interchange

31 Nearest Neighbor Interchange
Start with an arbitrary tree and check its neighbors. Move to a neighbor if it provides the best improvement in parsimony score. No way of knowing if the result is the most parsimonious tree. Could be stuck in a local optimum.

32 Nearest Neighbor Interchange

33 Subtree Pruning and Regrafting Another Branch Swapping Algorithm

34 Tree Bisection and Reconnection Another Branch Swapping Algorithm
Most extensive swapping routine

35 Homoplasy Given:
1: CAGCAGCAG
2: CAGCAGCAG
3: CAGCAGCAGCAG
4: CAGCAGCAG
5: CAGCAGCAG
6: CAGCAGCAG
7: CAGCAGCAGCAG
Most would group 1, 2, 4, 5, and 6 as having evolved from a common ancestor, with a single mutation leading to the presence of 3 and 7

36 Homoplasy But what if this was the real tree?

37 Homoplasy 6 evolved separately from 4 and 5
Parsimony groups 4, 5, and 6 together as having evolved from a common ancestor Homoplasy: Independent (or parallel) evolution of same/similar characters Parsimony results minimize homoplasy, so if homoplasy is common, parsimony may give wrong results

38 Contradicting Characters
An evolutionary tree is more likely to be correct when it is supported by multiple characters. [Figure: tree of Human, Lizard, Frog and Dog; the MAMMALIA clade is supported by hair, a single bone in the lower jaw, lactation, etc.] Note: In this case, tails are homoplastic.

39 Perfect Phylogeny Evolutionary model Binary characters {0,1}
Each character changes state only once in evolutionary history (no homoplasy!). Tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. For simplicity, assume root = (0, 0, 0, 0, 0). How can one reconstruct such a tree? [Table: species A, B, C, D, E against binary traits; entries not recoverable]

40 The 4-gamete condition A column i partitions the set of species into two sets i0 and i1. A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous. Example: with column i as below, i0 = {A, B, C} and i1 = {D, E, F}, so i is heterogeneous w.r.t. {A, D, E}.
  species: A B C D E F
  i:       0 0 0 1 1 1

41 4 Gamete Condition There exists a perfect phylogeny if and only if for all pairs of columns (i, j), j is homogeneous w.r.t. i0 or i1. Equivalently, there exists a perfect phylogeny if and only if no pair of columns (i, j) contains all four of the rows (0,0), (0,1), (1,0), (1,1).
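The pairwise test translates directly into code. The sketch below assumes the character matrix is given as a list of rows (one 0/1 list per species); the function name `four_gamete_violation` and the example matrices are made up for illustration.

```python
from itertools import combinations


def four_gamete_violation(matrix):
    """Return the first pair of columns showing all four gametes, or None."""
    n_cols = len(matrix[0])
    for i, j in combinations(range(n_cols), 2):
        # Collect the distinct (value in column i, value in column j) rows.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return (i, j)
    return None


compatible = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
conflicting = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]
print(four_gamete_violation(compatible))   # None
print(four_gamete_violation(conflicting))  # (0, 1)
```

The check examines every pair of columns once, so it costs O(n m^2) for n species and m characters.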

42 4-gamete condition: proof
(only if) Every perfect phylogeny satisfies the 4-gamete condition: depending on which edge the mutation j occurs on, j must be homogeneous w.r.t. either i0 or i1. (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist? Need to give an algorithm… [Figure: tree split by the edge carrying mutation i into i0 and i1]

43 An algorithm for constructing a perfect phylogeny
We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later. In any tree, each node (except the root) has a single parent. It is sufficient to construct a parent for every node. In each step, we add a column and refine some of the nodes containing multiple children. Stop if all columns have been considered.

44 Inclusion Property For any pair of columns i, j:
i < j if and only if i1 ⊇ j1. Note that if i < j then the edge containing i is an ancestor of the edge containing j. [Figure: edge i above edge j]

45 Example Initially, there is a single clade r, and each node has r as its parent. [Figure: star tree with root r and leaves A, B, C, D, E]

46 Sort columns Sort columns according to the inclusion property: i < j if and only if i1 ⊇ j1. This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order.
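The sorting step can be sketched as below. The matrix is a made-up example for species A to E (rows) and five characters (columns), since the slide's own matrix did not survive extraction; `sort_columns` is a hypothetical name.

```python
def sort_columns(matrix):
    """Return column indices sorted by decreasing binary value.

    Each column is read top to bottom as a binary number, so row 1
    supplies the most significant bit.
    """
    n_cols = len(matrix[0])

    def as_number(j):
        return int("".join(str(row[j]) for row in matrix), 2)

    return sorted(range(n_cols), key=as_number, reverse=True)


# Assumed example: rows are species A..E, columns are characters 0..4.
matrix = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
order = sort_columns(matrix)
```

Here the column values are 20, 21, 10, 2 and 4, so the columns are visited in the order 1, 0, 2, 4, 3. A column whose set of 1s contains another column's set of 1s always has the larger value and therefore comes first.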

47 Add first column
In adding column i, check each edge and decide which side you belong to. Finally, add a node if you can resolve a clade. [Figure: root r with new node u; leaves B, D, A, C, E]

48 Adding other columns Add other columns on edges using the ordering property. [Figure: resulting tree rooted at r, with edges labeled by columns 1 to 5 and leaves A to E]
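One compact way to realise the construction is sketched below. Note that this is a keyword-tree (trie) variant rather than the clade-refinement steps described on the slides: each species follows, from the root, the edges of the characters it carries, taken in sorted column order. The matrix is a made-up example, and `build_tree` and `edge_counts` are hypothetical names. If a character ends up on more than one edge, the matrix does not admit a perfect phylogeny with an all-zero root.

```python
def build_tree(matrix, species):
    """Insert each species into a trie keyed by its mutated characters."""
    n_cols = len(matrix[0])
    # Decreasing binary value, row 1 as most significant bit.
    order = sorted(
        range(n_cols), key=lambda j: [row[j] for row in matrix], reverse=True
    )
    root = {"edges": {}, "species": []}
    for name, row in zip(species, matrix):
        node = root
        for j in order:
            if row[j] == 1:
                # Follow (or create) the edge carrying mutation j.
                node = node["edges"].setdefault(j, {"edges": {}, "species": []})
        node["species"].append(name)
    return root


def edge_counts(node, counts=None):
    """Count how many edges of the tree carry each character."""
    counts = {} if counts is None else counts
    for j, child in node["edges"].items():
        counts[j] = counts.get(j, 0) + 1
        edge_counts(child, counts)
    return counts


matrix = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
tree = build_tree(matrix, "ABCDE")
counts = edge_counts(tree)
is_perfect = all(c == 1 for c in counts.values())
```

For this example every character sits on exactly one edge: the root has an edge for character 1 leading to E, below it character 0 leads to A and then character 4 to C, while character 2 leads to B and then character 3 to D.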

49 Unrooted case Switch the values in each column, so that 0 is the majority element. Apply the algorithm for the rooted case
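The column switch can be sketched in a few lines. The function name and the example matrix are assumptions for illustration, and ties (exactly half ones) are left unflipped here, which is an arbitrary choice.

```python
def make_zero_majority(matrix):
    """Flip every column in which 1 is the strict majority value."""
    n_rows = len(matrix)
    flip = [
        2 * sum(row[j] for row in matrix) > n_rows
        for j in range(len(matrix[0]))
    ]
    return [
        [1 - value if f else value for value, f in zip(row, flip)]
        for row in matrix
    ]


matrix = [
    [1, 0],
    [1, 0],
    [0, 1],
]
normalised = make_zero_majority(matrix)
```

Column 0 has two 1s out of three rows and is flipped; column 1 is left alone. The rooted algorithm can then be run on the normalised matrix.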

50 Problems with Parsimony
Ignores branch lengths on trees. [Figure: two trees with leaves labeled A and a single leaf labeled C, differing in branch lengths] Same parsimony score. Mutation “more likely” on longer branch.
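To make "more likely on a longer branch" concrete, one can plug branch lengths into a substitution model. The slides do not name a model here, so the sketch below assumes the Jukes-Cantor model, under which the probability that the two ends of a branch of length t (expected substitutions per site) differ is 3/4 (1 - e^{-4t/3}). The function name `p_change` is hypothetical.

```python
import math


def p_change(t):
    """Jukes-Cantor probability that a site differs across a branch of length t.

    Assumption: t is measured in expected substitutions per site.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


short_branch = p_change(0.1)
long_branch = p_change(1.0)
print(short_branch, long_branch)
```

The probability grows with t and saturates at 3/4, so a likelihood method distinguishes the two placements of the mutation even though their parsimony scores are equal.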

51 Maximum Likelihood See Class Notes

52 Algorithm Summary
Method                     Input              Output    Category
Neighbor Joining           Distance matrix D  T, B      Distance based
UPGMA                      Distance matrix D  T, B      Distance based
Sankoff’s & Fitch’s Alg.   Characters, T      A, B      Parsimony
Perfect Phylogeny          Characters         A, B, T   Parsimony
Felsenstein                Characters, T, B   A         Probabilistic
T = tree topology, B = branch lengths, A = ancestral states

53 Gene Tree vs. Species Tree

54 Non-tree evolution Recombination, hybridization, horizontal gene transfer

55 Using Multiple Methods
Important to keep in mind that relying on a single method for phylogenetic analysis provides an incomplete picture. When different methods (parsimony, distance-based, etc.) all give the same result, it is more likely that the result is correct.

56 How Many Times Evolution Invented Wings?
Whiting et al. (2003) looked at winged and wingless stick insects

57 Reinventing Wings Previous studies had shown winged → wingless transitions. The wingless → winged transition is much more complicated (need to develop many new biochemical pathways). Used multiple tree reconstruction techniques, all of which required re-evolution of wings.

58 Most Parsimonious Evolutionary Tree of Winged and Wingless Insects
The evolutionary tree is based on both DNA sequences and presence/absence of wings Most parsimonious reconstruction gave a wingless ancestor

59 Will Wingless Insects Fly Again?
Since the most parsimonious reconstructions all required the re-invention of wings, it is most likely that wing developmental pathways are conserved in wingless stick insects

60 Phylogenetic Analysis of HIV Virus
Lafayette, Louisiana, 1994 – A woman claimed her ex-lover (who was a physician) injected her with HIV+ blood Records show the physician had drawn blood from an HIV+ patient that day But how to prove the blood from that HIV+ patient ended up in the woman?

61 HIV Transmission HIV has a high mutation rate, which can be used to trace paths of transmission. Two people who got the virus from two different people will have very different HIV sequences. Three different tree reconstruction methods (including parsimony) were used to track changes in two genes in HIV (gp120 and RT).

62 HIV Transmission Took multiple samples from the patient, the woman, and controls (unrelated HIV+ people). In every reconstruction, the woman’s sequences were found to have evolved from the patient’s sequences, indicating a close relationship between the two. Nesting of the victim’s sequences within the patient’s sequences indicated that the direction of transmission was from patient to victim. This was the first time phylogenetic analysis was used in a court case as evidence (Metzker et al., 2002).

63 Evolutionary Tree Leads to Conviction

64 Current popular methods
HUNDREDS of programs available! Some recommended programs: Discrete (parsimony-based): Rec-I-DCM3, Tandy Warnow and colleagues. Probabilistic: SEMPHY, Nir Friedman and colleagues.

65 Sources
Metzker et al. “Molecular evidence of HIV-1 transmission in a criminal case.” PNAS, 2002.
Whiting et al. “Loss and recovery of wings in stick insects.” Nature 421.
Serafim Batzoglou (Phylogeny slides)
V. Bafna (Perfect Phylogeny slides)

