
. Phylogenetic Trees Lecture 12 This class has been edited from Nir Friedman’s lecture which was available at www.cs.huji.ac.il/~nir. Pictures from Tal.


1 Phylogenetic Trees, Lecture 12
This class has been edited from Nir Friedman’s lecture, which was available at www.cs.huji.ac.il/~nir. Pictures from Tal Pupko's slides. Changes by Dan Geiger and Shlomo Moran. Based on pages in Durbin et al. (the black textbook).

2 Evolution
Evolution of new organisms is driven by:
- Diversity
  - Different individuals carry different variants of the same basic blueprint.
- Mutations
  - The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
- Selection bias

3 The Tree of Life
[Figure. Source: Alberts et al.]

4 Tree of life: a better picture
[Figure: after Ernst Haeckel, 1891]

5 Primate evolution
A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.

6 Morphological vs. Molecular
- Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
- Modern biological methods allow the use of molecular features:
  - gene sequences;
  - protein sequences.
- Analysis is based on homologous sequences (e.g., globins) in different species.
- Important for many aspects of biology:
  - classification;
  - understanding biological mechanisms.

7 Morphological topology
[Figure: tree over Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra. Based on McKenna and Bell, 1997]

8 From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).

9 Mitochondrial topology
[Figure: tree over Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia. Based on Pupko et al.]

10 Nuclear topology
[Figure: tree over Cetartiodactyla, Afrotheria, Chiroptera, Eulipotyphla, Glires, Xenarthra, Carnivora, Perissodactyla, Scandentia + Dermoptera, Pholidota, Primate. Tree by Madsen; based on a slide by Pupko et al.]

11 Theory of Evolution
- Basic idea:
  - Speciation events lead to the creation of different species.
  - Speciation is caused by physical separation into groups in which different genetic variants become dominant.
- Any two species share a (possibly distant) common ancestor.

12 Phylogenetic trees
- Leaves: present-day species
- Internal nodes: hypothetical most recent common ancestors
- Edge lengths: “time” from one speciation to the next
[Figure: example tree over Aardvark, Bison, Chimp, Dog, Elephant]

13 Dangers in Molecular Phylogenies
- Gene and protein sequences can be homologous for various reasons:
  - Orthologs: sequences that diverged after a speciation event; indicative of a new species.
  - Paralogs: sequences that diverged after a duplication event.
  - Xenologs: sequences that diverged after a horizontal transfer (e.g., by a virus).

14 Species Phylogeny vs. Gene Phylogenies
[Figure: speciation events and a gene duplication; leaves 1A, 2A, 3A, 3B, 2B, 1B]
Phylogenies can be constructed to describe the evolution of genes. The three species are termed 1, 2, 3, and there are two paralogous genes, A and B.

15 Dangers of Paralogs
[Figure: speciation events and a gene duplication; leaves 1A, 2A, 3A, 3B, 2B, 1B]
If we happen to consider only the sequences 1A, 2B, and 3A, we get a wrong tree that does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species. In the sequel we assume all given sequences are orthologs.

16 Types of Trees
A natural model to consider is that of rooted trees.
[Figure: rooted tree with the common ancestor at the root]

17 Types of trees
An unrooted tree represents a phylogeny without the root node. Depending on the model, data from present-day species do not distinguish between different placements of the root. In this example there are seven possible ways to place a root.

18 Rooted versus unrooted trees
[Figure: an unrooted tree on leaves a, b, c, which represents the three rooted trees Tree a, Tree b, and Tree c]
Slide by Tal Pupko.
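Each edge of an unrooted tree is one candidate position for the root, so an unrooted binary tree with n leaves (which has 2n − 3 edges) yields 2n − 3 rooted trees: three for the a, b, c tree above, and seven for a five-leaf binary tree, consistent with the count on the previous slide. The sketch below assumes leaves are the degree-1 nodes and that internal node names (X, Y, Z here) are hypothetical labels that never appear in the output; it subdivides each edge with a root and prints the result as a canonical nested-parentheses string.

```python
from collections import defaultdict

def root_placements(edges):
    """List every rooted tree obtained by placing the root on one edge of an unrooted tree.

    edges: (u, v) pairs; leaves are the degree-1 nodes. Children are sorted, so the
    nested-parentheses strings are canonical and identical trees compare equal.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def subtree(node, parent):
        # Read the tree away from `parent`; a node with no other neighbors is a leaf.
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node
        return "(" + ",".join(sorted(subtree(c, node) for c in children)) + ")"

    # Subdividing edge (u, v) with a root gives the two subtrees seen from either end.
    return ["(" + ",".join(sorted([subtree(u, v), subtree(v, u)])) + ")" for u, v in edges]

# Unrooted tree on leaves a, b, c (hypothetical internal node X): three rooted trees.
THREE = [("X", "a"), ("X", "b"), ("X", "c")]
# Hypothetical unrooted binary tree on five leaves: 2*5 - 3 = 7 edges, so 7 rootings.
FIVE = [("a", "X"), ("b", "X"), ("X", "Y"), ("c", "Y"), ("Y", "Z"), ("d", "Z"), ("e", "Z")]
```

For example, `root_placements(THREE)` contains `((b,c),a)`, `((a,c),b)` and `((a,b),c)`, one rooted tree per edge.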

19 Positioning Roots in Unrooted Trees
- We can estimate the position of the root by introducing an outgroup:
  - a set of species that are definitely distant from all the species of interest.
[Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant, with outgroup Falcon and the proposed root]
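A minimal sketch of outgroup rooting, assuming the simplest case of a single-leaf outgroup: the root is placed on the edge that joins the outgroup to the rest of the tree, and the ingroup is then read away from the outgroup. The tree below (leaves A to E, outgroup O, internal nodes u, v, w, x) is hypothetical and is not the tree in the figure.

```python
from collections import defaultdict

def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the edge joining a single-leaf outgroup to the rest.

    edges: (u, v) pairs of an unrooted tree. Returns the rooted ingroup tree as a
    nested-parentheses string (topology only), with the outgroup removed.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    (attach,) = adj[outgroup]  # a leaf outgroup has exactly one neighbor

    def newick(node, parent):
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node
        return "(" + ",".join(sorted(newick(c, node) for c in children)) + ")"

    # The root sits on edge (outgroup, attach); the ingroup hangs below `attach`.
    return newick(attach, outgroup)

# Hypothetical unrooted tree: ingroup leaves A-E, outgroup O.
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("O", "v"),
         ("v", "w"), ("C", "w"), ("w", "x"), ("D", "x"), ("E", "x")]
rooted = root_with_outgroup(EDGES, "O")
```

With a multi-species outgroup, the same idea applies to the edge separating the outgroup clade from the ingroup.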

20 Type of Data
- Distance-based
  - Input is a matrix of distances between species.
  - A distance can be the fraction of residues on which two species disagree, an alignment score between them, or …
- Character-based
  - Examine each character (e.g., residue) separately.
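The simplest distance above, the fraction of aligned residues on which two sequences disagree (often called the p-distance), can be computed directly from the aligned sequences on the "From sequences to a phylogenetic tree" slide. This is a sketch that assumes the sequences are already aligned to equal length; it applies no correction for multiple substitutions at the same site.

```python
from itertools import combinations

# Aligned sequences from the "From sequences to a phylogenetic tree" slide.
SEQS = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences disagree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Pairwise p-distances, keyed by (name1, name2) in input order."""
    names = list(seqs)
    return {(u, v): p_distance(seqs[u], seqs[v]) for u, v in combinations(names, 2)}

D = distance_matrix(SEQS)
```

Here Rat and Gorilla are identical (distance 0), Rat and Rabbit differ at 1 of 13 positions, and Rabbit and Cat differ at 4 of 13.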

21 Three Methods of Tree Construction
- Distance: a tree that recursively combines the two nodes at the smallest distance.
- Parsimony: a tree with the minimum total number of character changes between nodes.
- Maximum likelihood: finding the best Bayesian network of a tree shape. This is the method of choice nowadays. The best-known and most useful software package, PHYLIP, implements this method (among others).

22 Distance-Based Methods (first type of method)
Input: distance matrix between species.
Outline:
- Cluster species together.
- Initially, clusters are singletons.
- At each iteration, combine the two “closest” clusters to get a new one.

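23 UPGMA Clustering
- Let $C_i$ and $C_j$ be clusters; define the distance between them to be
  $d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p,q)$.
- When we combine two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for every other cluster $C_l$
  $d(C_k, C_l) = \frac{|C_i|\,d(C_i, C_l) + |C_j|\,d(C_j, C_l)}{|C_i| + |C_j|}$.
- Define a node k with the nodes of $C_i$ and $C_j$ as its daughters, and place k at height $d(C_i, C_j)/2$ above the leaves.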
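A minimal sketch of the UPGMA loop above, assuming a symmetric distance matrix given as a list of lists. The size-weighted update is the formula for $d(C_k, C_l)$, and each new node is placed at height $d(C_i, C_j)/2$. The output format (nested tuples `(left, right, height)` with leaf names at the bottom) is an illustrative choice, not something the slides prescribe.

```python
import itertools

def upgma(labels, matrix):
    """Cluster leaves with UPGMA.

    labels: leaf names; matrix: symmetric distance matrix (list of lists).
    Returns the root as nested tuples (left, right, height); leaves are label strings.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Current distance between every pair of active clusters, keyed by frozenset of ids.
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in itertools.combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # the two closest clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        k, next_id = next_id, next_id + 1
        # New node k sits at height d(C_i, C_j) / 2 above the leaves.
        clusters[k] = ((tree_i, tree_j, d_ij / 2), n_i + n_j)
        # Size-weighted average gives d(C_k, C_l) for every other active cluster C_l.
        for l in clusters:
            if l != k:
                d_il = dist.pop(frozenset((i, l)))
                d_jl = dist.pop(frozenset((j, l)))
                dist[frozenset((k, l))] = (n_i * d_il + n_j * d_jl) / (n_i + n_j)
    tree, _ = next(iter(clusters.values()))
    return tree

# Distances from a clock tree: (a,b) join at height 1, (c,d) at height 2, root at height 3.
CLOCK = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
```

On `CLOCK`, `upgma(["a", "b", "c", "d"], CLOCK)` first joins a and b at height 1, then c and d at height 2, and finally the two pairs at height 3, recovering the generating tree.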

24 Example
[Figure: UPGMA construction on five objects; the length of an edge is its (vertical) height; the heights d(7,8) and d(2,3) are marked]

25 Molecular clock
This phylogenetic tree has all leaves at the same level. When this property holds, the phylogenetic tree is said to satisfy a molecular clock: the time from a speciation event to the formation of the current species is identical along all paths (an assumption that is wrong in reality).
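Distances read off a tree with a molecular clock are ultrametric: for any three leaves, the two largest pairwise distances are equal, because the two leaves that join last are equally far from the third. Conversely, any ultrametric matrix can be realized by a clock tree. The sketch below checks this three-point condition with a small floating-point tolerance; both example matrices are hypothetical.

```python
import itertools
import math

def is_ultrametric(matrix, tol=1e-9):
    """True if, for every three leaves, the two largest pairwise distances are equal."""
    n = len(matrix)
    for i, j, k in itertools.combinations(range(n), 3):
        _, mid, top = sorted((matrix[i][j], matrix[i][k], matrix[j][k]))
        if not math.isclose(mid, top, rel_tol=0.0, abs_tol=tol):
            return False
    return True

# Clock tree: (a,b) join at height 1, (c,d) at height 2, root at height 3.
CLOCK = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
# Additive but not ultrametric: leaves at unequal distances from any root.
NO_CLOCK = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
```

For `NO_CLOCK`, the first three leaves have distances 5, 3, 6; the two largest (5 and 6) differ, so no clock tree realizes these distances.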

26 Molecular Clock
[Figure: UPGMA tree; leaves labeled 2, 3, 4, 1]
UPGMA constructs trees that satisfy a molecular clock, even if the true tree does not satisfy a molecular clock.

27 Restricted Correctness of UPGMA
Proposition: If the distance function is derived by adding edge distances in a tree T with a molecular clock, then UPGMA will reconstruct T.
Proof idea: Move a horizontal line from the bottom of T to the top. Whenever the line reaches an internal node of T, the algorithm creates that node.

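28 Additivity
A molecular clock defines additive distances; that is, the distances between objects can be realized by a tree, so that the distance between two leaves equals the sum of the edge lengths on the path between them.
[Figure: tree labeled a, b, c, i, j, k]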
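Additive distances are exactly the path lengths of a weighted tree. The sketch below computes them for every pair of leaves with a depth-first search; since a tree has a unique path between any two nodes, no shortest-path algorithm is needed. The four-leaf tree is hypothetical: leaves A and B hang off internal node E, C and D hang off F.

```python
from collections import defaultdict

def tree_distances(edges, leaves):
    """Additive distances: d(a, b) = total length of the unique tree path from a to b.

    edges: iterable of (u, v, length) triples describing an unrooted tree.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def lengths_from(src):
        # In a tree every node is reached by exactly one path, so a plain DFS suffices.
        out, stack = {src: 0}, [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in out:
                    out[nxt] = out[node] + w
                    stack.append(nxt)
        return out

    result = {}
    for a in leaves:
        from_a = lengths_from(a)
        for b in leaves:
            result[(a, b)] = from_a[b]
    return result

# Hypothetical tree: A, B hang off E (lengths 1, 4); C, D hang off F (lengths 1, 4); E-F = 1.
EDGES = [("A", "E", 1), ("B", "E", 4), ("E", "F", 1), ("C", "F", 1), ("D", "F", 4)]
D = tree_distances(EDGES, "ABCD")
```

For instance, `D[("A", "C")]` is 1 + 1 + 1 = 3 and `D[("B", "D")]` is 4 + 1 + 4 = 9.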

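29 Basic property of Additivity
- Suppose the input distances are additive.
- For any three leaves i, j, m, let k be the node where the paths between them meet, with $d(i,k) = a$, $d(j,k) = b$, $d(m,k) = c$. Then $d(i,j) = a + b$, $d(i,m) = a + c$, $d(j,m) = b + c$.
- Thus $d(k,m) = \tfrac{1}{2}\,\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$.
[Figure: leaves i, j, m joined at node k by edges of lengths a, b, c]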

30 Constructing additive trees: the neighbor-finding problem
- Can we use this fact to construct trees assuming only additivity (but not a molecular clock)? Yes. The formula shows that if we knew that i and j are neighboring leaves, then we could construct their parent node k and compute the distances from k to all other leaves m. We then remove nodes i, j and add k.
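The reduction step described above, once neighbors i and j are known, can be sketched as follows. It replaces i and j by their parent k and fills in $d(k,m) = \tfrac{1}{2}(d(i,m) + d(j,m) - d(i,j))$ for every other leaf m. The dict-of-dicts layout and the name `join_neighbors` are illustrative choices; the example distances come from a hypothetical tree where A and B hang off node E.

```python
def join_neighbors(dist, i, j, k):
    """Replace neighboring leaves i, j by their parent k in an additive distance table.

    dist: dict-of-dicts, dist[x][y] = d(x, y); k: name for the new node.
    Uses d(k, m) = (d(i, m) + d(j, m) - d(i, j)) / 2 for every other leaf m.
    """
    others = [m for m in dist if m not in (i, j)]
    reduced = {m: {n: dist[m][n] for n in others if n != m} for m in others}
    reduced[k] = {}
    for m in others:
        d_km = (dist[i][m] + dist[j][m] - dist[i][j]) / 2
        reduced[k][m] = reduced[m][k] = d_km
    return reduced

# Hypothetical tree: A, B hang off E (lengths 1, 4); C, D hang off F (lengths 1, 4); E-F = 1.
NAMES = "ABCD"
MATRIX = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
DIST = {u: {v: MATRIX[a][b] for b, v in enumerate(NAMES)} for a, u in enumerate(NAMES)}
reduced = join_neighbors(DIST, "A", "B", "E")
```

The recovered values match the tree: $d(E,C) = (3 + 6 - 5)/2 = 2$, which is the path E-F-C of length 1 + 1.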

31 Neighbor Finding
How can we find, from distances alone, that a pair of nodes i, j are neighboring leaves? The closest nodes are not necessarily neighbors.
[Figure: tree over leaves A, B, C, D]
Next we show one way to find neighbors from additive distances.

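32 Neighbor Finding
Theorem (Saitou & Nei): Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree. Here $D(i,j) = (L-2)\,d(i,j) - (r_i + r_j)$, where $r_i = \sum_k d(i,k)$ and L is the number of leaves.
[Figure: leaves i, j, m and internal nodes l, k; subtrees T_1, T_2]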
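The two previous slides can be checked on a small example. In the hypothetical four-leaf tree below (A and B hang off E with lengths 1 and 4, C and D hang off F with lengths 1 and 4, and E-F has length 1), the closest pair is A, C, which are not neighbors, while the pairs minimizing $D(i,j) = (L-2)\,d(i,j) - (r_i + r_j)$ are exactly the true neighbor pairs A, B and C, D.

```python
from itertools import combinations

# Additive distances of the hypothetical tree described above.
NAMES = ["A", "B", "C", "D"]
DIST = [[0, 5, 3, 6],
        [5, 0, 6, 9],
        [3, 6, 0, 5],
        [6, 9, 5, 0]]
L = len(NAMES)
r = [sum(row) for row in DIST]  # r_i = sum_k d(i, k)

def D(i, j):
    """Saitou-Nei criterion D(i,j) = (L-2) d(i,j) - (r_i + r_j)."""
    return (L - 2) * DIST[i][j] - (r[i] + r[j])

pairs = list(combinations(range(L), 2))
closest = min(pairs, key=lambda p: DIST[p[0]][p[1]])
closest_names = (NAMES[closest[0]], NAMES[closest[1]])
best = min(D(i, j) for i, j in pairs)
d_minimizers = [(NAMES[i], NAMES[j]) for i, j in pairs if D(i, j) == best]
```

Here $d(A,C) = 3$ is the smallest distance, yet $D(A,B) = D(C,D) = -24$ while every other pair scores $-22$.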

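33 Neighbor Joining Algorithm
- Set L to contain all leaves.
Iteration:
- Choose i, j such that D(i,j) is minimal.
- Create a new node k, and set $d(k,m) = \tfrac{1}{2}\,\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$ for every other m in L, with edge lengths $d(i,k) = \tfrac{1}{2}\bigl(d(i,j) + \tfrac{r_i - r_j}{|L|-2}\bigr)$ and $d(j,k) = d(i,j) - d(i,k)$.
- Remove i, j from L, and add k.
Terminate: when |L| = 2, connect the two remaining nodes.
[Figure: leaves i, j joined at the new node k; another leaf m]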
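A minimal sketch of the neighbor-joining loop, assuming a dict-of-dicts distance table. It uses the slides' criterion $D(i,j) = (|L|-2)\,d(i,j) - (r_i + r_j)$ with unnormalized $r_i$, the standard edge-length formula rewritten for that $r_i$, and the additivity formula for $d(k,m)$. Internal node names `n0`, `n1`, … are hypothetical, and ties are broken by iteration order.

```python
import itertools

def neighbor_joining(dist):
    """Neighbor joining on a dict-of-dicts distance table dist[u][v].

    Returns the edges (u, v, length) of the reconstructed unrooted tree.
    New internal nodes get hypothetical names "n0", "n1", ...
    """
    d = {u: dict(row) for u, row in dist.items()}  # working copy; keys form the set L
    edges, counter = [], 0
    while len(d) > 2:
        size = len(d)
        r = {u: sum(d[u][v] for v in d if v != u) for u in d}
        # Choose the pair minimizing D(i,j) = (|L|-2) d(i,j) - (r_i + r_j).
        i, j = min(itertools.combinations(d, 2),
                   key=lambda p: (size - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k, counter = f"n{counter}", counter + 1
        d_ik = (d[i][j] + (r[i] - r[j]) / (size - 2)) / 2
        edges += [(i, k, d_ik), (j, k, d[i][j] - d_ik)]
        # Distances from the new node k to every other node m in L.
        d[k] = {}
        for m in d:
            if m not in (i, j, k):
                d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        # Remove i and j from L.
        for gone in (i, j):
            del d[gone]
            for row in d.values():
                row.pop(gone, None)
    u, v = d  # |L| = 2: connect the two remaining nodes
    edges.append((u, v, d[u][v]))
    return edges

# Hypothetical tree: A, B hang off one node (lengths 1, 4), C, D off another (1, 4),
# internal edge of length 1.
NAMES = "ABCD"
MATRIX = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
DIST = {u: {v: MATRIX[a][b] for b, v in enumerate(NAMES)} for a, u in enumerate(NAMES)}
tree = neighbor_joining(DIST)
```

On these additive distances the sketch recovers every edge length of the generating tree: A and B attach to `n0` with lengths 1 and 4, C and D attach to `n1` with lengths 1 and 4, and `n0`-`n1` has length 1.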

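34 Neighbor Finding
Notation used in the proof:
- p(i,j) = the path from vertex i to vertex j; e.g., p(D,C) = (e_1, e_2, e_3) = (D, E, F, C).
- For a vertex i and an edge e = (i', j'): $N_i(e) = |\{k : e \text{ is on } p(i,k)\}|$, where k ranges over the leaves.
[Figure: tree with leaves A, B, C, D and internal nodes E, F; edges e_1, e_2, e_3 on the path from D to C]
For example, $N_D(e_1) = 3$, $N_D(e_2) = 2$, $N_D(e_3) = 1$, and $N_C(e_1) = 1$.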

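35 Neighbor Finding
[Figure: leaves i, j joined by a path through l, …, k; rest of T]
Lemma: For leaves i, j connected by a path (i, l, …, k, j),
$r_i - r_j = (L-2)\,\bigl(d(i,l) - d(k,j)\bigr) + \sum_{e \in p(l,k)} \bigl(N_i(e) - N_j(e)\bigr)\,d(e)$.
Notation: for e = (i,m), we denote d(i,m) by d(e).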
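The lemma rests on the identity $r_i = \sum_e N_i(e)\,d(e)$ over all edges, since each leaf k contributes the length of every edge on p(i,k). For an edge off p(i,j), i and j lie on the same side, so $N_i(e) = N_j(e)$ and the term cancels; a pendant edge of i or j contributes $\pm(L-2)\,d(e)$. The sketch below checks this numerically on a weighted version of the notation slide's tree. The attachment of A to E and B to F, and all edge weights, are assumptions chosen to reproduce the slide's $N_D$ and $N_C$ values.

```python
from collections import defaultdict

# Hypothetical weights; assumed topology: A and D hang off E, B and C hang off F.
EDGES = {("D", "E"): 3, ("A", "E"): 2, ("E", "F"): 1, ("F", "B"): 5, ("F", "C"): 4}

adj = defaultdict(list)
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
LEAVES = sorted(x for x in adj if len(adj[x]) == 1)

def near_side(src, edge):
    """Nodes reachable from src without crossing the given edge."""
    seen, stack = {src}, [src]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if {x, y} != set(edge) and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def N(src, edge):
    """N_src(e) = number of leaves k such that e lies on p(src, k)."""
    near = near_side(src, edge)
    return sum(1 for k in LEAVES if k not in near)

def r(i):
    """r_i = sum over leaves k of d(i, k), computed from tree path lengths."""
    dist, stack = {i: 0}, [i]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + EDGES.get((x, y), EDGES.get((y, x)))
                stack.append(y)
    return sum(dist[k] for k in LEAVES)
```

For example, `r("A")` is 0 + 5 + 8 + 7 = 20, and summing `N("A", e) * w` over all edges gives the same 20.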

36 Neighbor Finding
Proof of Theorem: Assume, by contradiction, that D(i,j) is minimal for some i, j that are not neighboring leaves. Let (i, l, ..., k, j) be the path from i to j. Let $T_1$ and $T_2$ be the subtrees rooted at l and k. Let |T| denote the number of leaves in T.
[Figure: path i, l, …, k, j; subtrees T_1 at l and T_2 at k]

37 Neighbor Finding
Case 1: i or j has a neighboring leaf. WLOG, j and m are such leaves (so m, like j, is attached to k).
[Figure: path i, l, …, k, j; leaf m attached to k; subtree T_2]
A. $D(i,j) - D(m,j) = (L-2)\,\bigl(d(i,j) - d(j,m)\bigr) - (r_i + r_j) + (r_m + r_j)$ {definition}
$\qquad = (L-2)\,\bigl(d(i,k) - d(k,m)\bigr) + r_m - r_i$ {figure}
B. $r_m - r_i \ge (L-2)\,\bigl(d(k,m) - d(i,l)\bigr) + (4-L)\,d(k,l)$ {lemma + figure}
(since for each edge $e \in p(k,l)$, $N_m(e) \ge 2$ and $N_i(e) \le L-2$, so $N_m(e) - N_i(e) \ge 4-L$).
Substituting B in A, and using $d(i,k) - d(i,l) = d(l,k)$:
$D(i,j) - D(m,j) \ge (L-2)\,\bigl(d(i,k) - d(i,l)\bigr) + (4-L)\,d(k,l) = 2\,d(k,l) > 0$,
contradicting the minimality assumption.

38 Neighbor Finding
Case 2: not Case 1. Then both $T_1$ and $T_2$ contain two neighboring leaves. We show that if D(i,j) is minimal, then we must have both $|T_1| > |T_2|$ and $|T_2| > |T_1|$, which is a contradiction; hence D(i,j) is not minimal.
[Figure: path i, l, …, k, j; subtrees T_1, T_2; leaves m, n and node p]
We prove that $|T_1| > |T_2|$ by assuming that $|T_1| \le |T_2|$ and reaching a contradiction. The proof that $|T_2| > |T_1|$ is similar. Let n, m be neighboring leaves in $T_1$, with common neighbor p.

39 Neighbor Finding
[Figure: path i, l, …, k, j; subtrees T_1, T_2; leaves m, n and node p]
A. $0 \le D(m,n) - D(i,j) = (L-2)\,\bigl(d(m,n) - d(i,j)\bigr) + (r_i + r_j) - (r_m + r_n)$
B. $r_j - r_m < (L-2)\,\bigl(d(j,k) - d(m,p)\bigr) + (|T_1| - |T_2|)\,d(k,p)$ (because $N_j(e) - N_m(e) < |T_1| - |T_2|$).
C. $r_i - r_n < (L-2)\,\bigl(d(i,k) - d(n,p)\bigr) + (|T_1| - |T_2|)\,d(l,p)$
Adding B and C, noting that $d(l,p) > d(k,p)$ and using the assumption $|T_1| - |T_2| \le 0$:
D. $(r_i + r_j) - (r_m + r_n) < (L-2)\,\bigl(d(i,j) - d(n,m)\bigr) + 2\,(|T_1| - |T_2|)\,d(k,p)$
Substituting D in the right-hand side of A:
$0 \le D(m,n) - D(i,j) < 2\,(|T_1| - |T_2|)\,d(k,p)$, hence $|T_1| - |T_2| > 0$, a contradiction.

