Published by Parker Parsells. Modified about 1 year ago.

1
Phylogenetic Trees, Lecture 12. This class has been edited from Nir Friedman’s lecture, which was available at www.cs.huji.ac.il/~nir. Pictures are from Tal Pupko's slides. Changes by Dan Geiger and Shlomo Moran. Based on pages 160-176 of Durbin et al. (the black textbook).

2
2 Evolution
Evolution of new organisms is driven by:
- Diversity: different individuals carry different variants of the same basic blueprint.
- Mutations: the DNA sequence can be changed by single-base changes, deletion or insertion of DNA segments, etc.
- Selection bias

3
3 The Tree of Life Source: Alberts et al

4
4 Tree of life: a better picture (after Ernst Haeckel, 1891)

5
5 Primate evolution
A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species; it is also called a phylogenetic tree.

6
6 Morphological vs. Molecular
- Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
- Modern biological methods allow the use of molecular features:
  - Gene sequences
  - Protein sequences
- The analysis is based on homologous sequences (e.g., globins) in different species.
- Important for many aspects of biology:
  - Classification
  - Understanding biological mechanisms

7
7 Morphological topology
(Figure: tree over Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra; based on McKenna and Bell, 1997.)

8
8 From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).

9
9 Mitochondrial topology
(Figure: tree over Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia; based on Pupko et al.)

10
10 Nuclear topology
(Figure: tree over Cetartiodactyla, Afrotheria, Chiroptera, Eulipotyphla, Glires, Xenarthra, Carnivora, Perissodactyla, Scandentia + Dermoptera, Pholidota, Primate; tree by Madsen; based on a Pupko et al. slide.)

11
11 Theory of Evolution
- Basic idea:
  - Speciation events lead to the creation of different species.
  - Speciation is caused by physical separation into groups in which different genetic variants become dominant.
- Any two species share a (possibly distant) common ancestor.

12
12 Phylogenetic trees
- Leaves: current-day species
- Nodes: hypothetical most recent common ancestors
- Edge lengths: “time” from one speciation to the next
(Figure: example tree over Aardvark, Bison, Chimp, Dog, Elephant.)

13
13 Dangers in Molecular Phylogenies
Gene and protein sequences can be homologous for various reasons:
- Orthologs: sequences that diverged after a speciation event; indicative of a new species.
- Paralogs: sequences that diverged after a duplication event.
- Xenologs: sequences that diverged after a horizontal transfer (e.g., by a virus).

14
14 Species Phylogeny and Gene Phylogenies
(Figure: a gene duplication followed by speciation events; leaves 1A, 2A, 3A, 3B, 2B, 1B.)
Phylogenies can be constructed to describe the evolution of genes. Here there are three species, termed 1, 2, 3, and two paralogous genes, A and B.

15
15 Dangers of Paralogs
(Figure: a gene duplication followed by speciation events; leaves 1A, 2A, 3A, 3B, 2B, 1B.)
If we happen to consider only the sequences 1A, 2B, and 3A, we get a wrong tree, one that does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species. In the sequel we assume all given sequences are orthologs.

16
16 Types of Trees
A natural model to consider is that of rooted trees.
(Figure: a rooted tree with the root labeled "Common Ancestor".)

17
17 Types of trees
An unrooted tree represents a phylogeny without the root node. Depending on the model, data from current-day species does not distinguish between different placements of the root. In this example there are seven possible ways to place a root.
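The count of seven can be checked with a small sketch, assuming (this is not stated on the slide) that the example is a binary unrooted tree on five leaves and that a root may be placed on any edge. Such a tree with n leaves has n - 2 internal nodes and therefore 2n - 3 edges. The function name is illustrative only.

```python
def count_root_placements(num_leaves: int) -> int:
    """Number of edges of an unrooted binary tree, i.e. places where a root can go.

    Assumes every internal node has degree 3 (a fully resolved tree).
    """
    if num_leaves < 2:
        raise ValueError("need at least two leaves")
    # n leaves + (n - 2) internal nodes = 2n - 2 nodes, hence 2n - 3 edges.
    return 2 * num_leaves - 3


print(count_root_placements(5))  # 7
```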

18
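18 Rooted versus unrooted trees
(Figure: an unrooted tree with positions a, b, c marked, next to the rooted trees Tree a, Tree b, Tree c.)
The unrooted tree represents the three rooted trees. Slide by Tal Pupko.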

19
19 Positioning Roots in Unrooted Trees
- We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.
(Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant with outgroup Falcon; the proposed root is marked.)

20
20 Type of Data
- Distance-based:
  - Input is a matrix of distances between species.
  - A distance can be the fraction of residues on which two sequences disagree, or an alignment score between them, or …
- Character-based:
  - Examine each character (e.g., residue) separately.
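As a rough illustration of the distance-based input, the sketch below computes the "fraction of residues they disagree on" for the four aligned fragments shown on the earlier sequence slide. The function name `p_distance` and the dictionary layout are choices made here for illustration, not part of the lecture.

```python
from itertools import combinations

# Aligned fragments from the "From sequences to a phylogenetic tree" slide.
sequences = {
    "Rat": "QEPGGLVVPPTDA",
    "Rabbit": "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat": "REPGGLVVPPTEG",
}


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Distance "matrix" stored as a dict keyed by species pairs.
distances = {
    (s, t): p_distance(sequences[s], sequences[t])
    for s, t in combinations(sequences, 2)
}

for (s, t), d in distances.items():
    print(f"{s:8s}{t:8s}{d:.3f}")
```

On such a short fragment Rat and Gorilla come out identical, which is one reason real analyses use much longer alignments.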

21
21 Three Methods of Tree Construction
- Distance: build a tree by recursively combining the two nodes at the smallest distance.
- Parsimony: find a tree with the minimum total number of character changes between nodes.
- Maximum likelihood: find the best Bayesian network with a tree shape. The method of choice nowadays. The well-known PHYLIP package implements this method (alongside distance and parsimony methods): http://evolution.genetics.washington.edu/phylip.html

22
22 Distance-Based (1st type of method)
Input: distance matrix between species.
Outline:
- Cluster species together.
- Initially, clusters are singletons.
- At each iteration, combine the two “closest” clusters to get a new one.

23
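23 UPGMA Clustering
Let $C_i$ and $C_j$ be clusters. Define the distance between them to be the average distance between their members:
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q).$$
When we combine two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for every other cluster $C_l$
$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}.$$
Define a node k with daughter nodes i and j, and place it at height $d(C_i, C_j)/2$.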
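The clustering loop can be sketched as follows. This is a minimal illustration, not the lecture's code: it assumes the size-weighted average update for merged clusters (as in Durbin et al.), stores leaf distances in a dict keyed by `frozenset` pairs, and returns the tree as nested `(left, right, height)` tuples. All names are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` with UPGMA.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree: a leaf name, or a (left, right, height) tuple.
    """
    # Each cluster id maps to (subtree, number of leaves in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {
        frozenset({i, j}): dist[frozenset({names[i], names[j]})]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # the two closest clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        height = d.pop(pair) / 2          # new node sits at half the distance
        for l in clusters:                # size-weighted average to other clusters
            dil = d.pop(frozenset({i, l}))
            djl = d.pop(frozenset({j, l}))
            d[frozenset({next_id, l})] = (ni * dil + nj * djl) / (ni + nj)
        clusters[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical clock-like distances on four leaves.
example = {
    frozenset(k): v
    for k, v in {
        ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
        ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    }.items()
}
tree = upgma(["A", "B", "C", "D"], example)
print(tree)
```

Ties between equally close pairs are broken by dictionary order here; a real implementation may want an explicit rule.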

24
24 Example
UPGMA construction on five objects. The length of an edge equals its (vertical) height.
(Figure: leaves 1-5 and internal nodes 6-9; node heights such as 0.5 d(2,3) and 0.5 d(7,8) are marked.)

25
25 Molecular clock
This phylogenetic tree has all leaves at the same level. When this property holds, the phylogenetic tree is said to satisfy a molecular clock. Namely, the time from a speciation event to the formation of the current species is identical for all paths (an assumption that is wrong in reality).
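One way to test whether a distance matrix could come from a tree with a molecular clock is the three-point (ultrametric) condition: in every triple of leaves, the two largest pairwise distances are equal. The slides do not state this condition; it is a standard equivalent formulation, used here as an assumption. The sketch below checks it on two hypothetical matrices.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Three-point condition: in every triple the two largest distances agree."""
    for a, b, c in combinations(names, 3):
        _, mid, top = sorted((
            dist[frozenset({a, b})],
            dist[frozenset({a, c})],
            dist[frozenset({b, c})],
        ))
        if top - mid > tol:
            return False
    return True


def as_dist(pairs):
    """Helper: turn {(a, b): value} into a dict keyed by frozenset pairs."""
    return {frozenset(k): v for k, v in pairs.items()}


# Hypothetical data: the first matrix is clock-like, the second is additive
# (it comes from a tree) but not clock-like.
clock_like = as_dist({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
                      ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10})
no_clock = as_dist({("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
                    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5})

print(is_ultrametric("ABCD", clock_like))  # True
print(is_ultrametric("ABCD", no_clock))    # False
```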

26
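26 Molecular Clock
(Figure: a tree on leaves 1, 2, 3, 4 that does not satisfy a molecular clock, and the tree that UPGMA builds from it, with leaves in the order 2, 3, 4, 1.)
UPGMA constructs trees that satisfy a molecular clock, even if the true tree does not satisfy a molecular clock.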

27
27 Restrictive Correctness of UPGMA
Proposition: If the distance function is derived by adding edge distances in a tree T with a molecular clock, then UPGMA will reconstruct T.
Proof idea: Move a horizontal line from the bottom of T to the top. Whenever the line reaches an internal node, the algorithm creates that node.

28
28 Additivity
A molecular clock defines additive distances, namely, distances between objects that can be realized by a tree.
(Figure: leaves i, j, k joined by edges labeled a, b, c.)

29
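29 Basic property of Additivity
- Suppose the input distances are additive.
- For any three leaves i, j, m (joined at an internal node k by edges of lengths a, b, c, respectively):
$$d(i,m) = a + c, \qquad d(j,m) = b + c, \qquad d(i,j) = a + b.$$
- Thus
$$d(k,m) = c = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr).$$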
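A worked instance may help here. Assume three leaves i, j, m meet at an internal node k, with edge lengths a (i to k), b (j to k) and c (k to m); this labeling is read off the figure residue and is an assumption. Then the three pairwise distances determine a, b and c. The function name is illustrative.

```python
def star_edge_lengths(d_ij: float, d_im: float, d_jm: float):
    """Recover (a, b, c) from additive distances among leaves i, j, m.

    a = d(i, k), b = d(j, k), c = d(k, m), where k is the node joining them.
    """
    a = (d_ij + d_im - d_jm) / 2
    b = (d_ij + d_jm - d_im) / 2
    c = (d_im + d_jm - d_ij) / 2
    return a, b, c


# Hypothetical edge lengths a=1, b=4, c=2 give d(i,j)=5, d(i,m)=3, d(j,m)=6.
print(star_edge_lengths(5, 3, 6))  # (1.0, 4.0, 2.0)
```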

30
30 Constructing additive trees: The neighbor finding problem
- Can we use this fact to construct trees assuming only additivity (but not a molecular clock)? Yes. The formula shows that if we know that i and j are neighboring leaves, then we can construct their parent node k and compute the distances from k to all other leaves m. We then remove nodes i, j and add k.

31
31 Neighbor Finding
How can we find, from distances alone, that a pair of nodes i, j are neighboring leaves? The closest nodes are not necessarily neighbors.
(Figure: a tree on leaves A, B, C, D.)
Next we show one way to find neighbors from additive distances.

32
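32 Neighbor Finding
Theorem (Saitou & Nei): Assume all edge weights are positive. Let L be the number of leaves, let $r_i = \sum_k d(i,k)$, and define
$$D(i,j) = (L-2)\,d(i,j) - (r_i + r_j).$$
If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.
(Figure: leaves i, j, m; internal nodes k, l; subtrees T_1, T_2.)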
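A small numeric check illustrates both points: the closest pair need not be neighbors, while the pair minimizing D is. The sketch assumes D(i,j) = (L-2) d(i,j) - (r_i + r_j) with r_i the sum of distances from i to all other leaves (the form used in the proof slides). The tree is hypothetical: A and B hang off one internal node with edges 1 and 4, C and D hang off another with edges 1 and 4, and the internal edge has length 1.

```python
from itertools import combinations

leaves = ["A", "B", "C", "D"]
# Path lengths in the hypothetical tree described above.
d = {
    frozenset(k): v
    for k, v in {
        ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
        ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
    }.items()
}


def nj_criterion(leaves, d):
    """D(i, j) = (L - 2) d(i, j) - (r_i + r_j) for every pair of leaves."""
    L = len(leaves)
    r = {i: sum(d[frozenset({i, k})] for k in leaves if k != i) for i in leaves}
    return {
        frozenset({i, j}): (L - 2) * d[frozenset({i, j})] - (r[i] + r[j])
        for i, j in combinations(leaves, 2)
    }


D = nj_criterion(leaves, d)
closest = min(d, key=d.get)                      # smallest raw distance
best = min(D.values())
minimisers = {pair for pair, value in D.items() if value == best}

print(sorted(closest))                           # ['A', 'C'], not neighbors
print(sorted(sorted(p) for p in minimisers))     # [['A', 'B'], ['C', 'D']]
```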

33
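33 Neighbor Joining Algorithm
Set L to contain all leaves.
Iteration:
- Choose i, j such that D(i,j) is minimal.
- Create a new node k, and set, for every other node m in L,
$$d(k,m) = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr).$$
- Remove i, j from L, and add k.
Terminate: when |L| = 2, connect the two remaining nodes.
(Figure: leaves i, j with new parent k, and another node m.)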
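The iteration can be sketched in Python as below. This is an illustrative version under stated assumptions: D(i,j) = (n-2) d(i,j) - (r_i + r_j) with r_i the sum of distances from i to the other current nodes; the branch lengths d(i,k) and d(j,k) use the standard neighbor-joining formula, which the slide text does not show; and new nodes are named `k0`, `k1`, ... (hypothetical names that must not clash with leaf names).

```python
from itertools import combinations


def neighbor_joining(leaves, dist):
    """Return the tree as a list of edges (u, v, length).

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    """
    d = dict(dist)
    nodes = list(leaves)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[frozenset({i, m})] for m in nodes if m != i) for i in nodes}
        # Pick the pair minimising D(i, j) = (n - 2) d(i, j) - (r_i + r_j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        k = f"k{next_id}"
        next_id += 1
        dij = d[frozenset({i, j})]
        # Branch lengths to the new parent (standard NJ formula, an assumption here).
        dik = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, k, dik), (j, k, dij - dik)]
        rest = [m for m in nodes if m not in (i, j)]
        for m in rest:
            d[frozenset({k, m})] = (d[frozenset({i, m})] + d[frozenset({j, m})] - dij) / 2
        nodes = rest + [k]
    a, b = nodes
    edges.append((a, b, d[frozenset({a, b})]))
    return edges


# Hypothetical additive distances: A, B are neighbors (edges 1 and 4),
# C, D are neighbors (edges 1 and 4), joined by an internal edge of length 1.
example = {
    frozenset(k): v
    for k, v in {
        ("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
        ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
    }.items()
}
edges = neighbor_joining(["A", "B", "C", "D"], example)
for u, v, length in edges:
    print(u, v, length)
```

On additive input like this, the recovered edge lengths match the tree the distances were read from.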

34
34 Neighbor Finding
Notations used in the proof:
- p(i,j) = the path from vertex i to vertex j. Example: P(D,C) = (e_1, e_2, e_3) = (D, E, F, C).
- For a vertex i and an edge e = (i’, j’): $N_i(e) = |\{k : e \text{ is on } p(i,k)\}|$.
  Example: $N_D(e_1) = 3$, $N_D(e_2) = 2$, $N_D(e_3) = 1$, $N_C(e_1) = 1$.
(Figure: a tree with leaves A, B, C, D and internal nodes E, F; the edges e_1, e_2, e_3 lie on the path from D to C.)
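The counts N_i(e) can be computed directly, which also lets one check the identity r_i = Σ_e N_i(e) d(e), assuming r_i denotes the sum of distances from leaf i to all other leaves. The sketch below uses a tree consistent with the example values above; which of A and B attaches to E or F, and all edge weights, are assumptions made for illustration.

```python
# Hypothetical weighted tree: D-E, A-E, E-F, F-C, F-B (adjacency with edge lengths).
adj = {
    "A": {"E": 2.0},
    "B": {"F": 3.0},
    "C": {"F": 1.0},
    "D": {"E": 4.0},
    "E": {"A": 2.0, "D": 4.0, "F": 5.0},
    "F": {"B": 3.0, "C": 1.0, "E": 5.0},
}


def leaves_beyond(adj, i, edge):
    """N_i(e): number of leaves k whose path from leaf i uses edge e = (u, v)."""
    u, v = edge
    seen, stack = {i}, [i]
    while stack:                      # explore from i without crossing e
        x = stack.pop()
        for y in adj[x]:
            if {x, y} != {u, v} and y not in seen:
                seen.add(y)
                stack.append(y)
    # Leaves that were not reached lie on the far side of e.
    return sum(1 for node in adj if node not in seen and len(adj[node]) == 1)


def path_lengths(adj, start):
    """Tree distances from `start` to every node."""
    dist, stack = {start: 0.0}, [start]
    while stack:
        x = stack.pop()
        for y, w in adj[x].items():
            if y not in dist:
                dist[y] = dist[x] + w
                stack.append(y)
    return dist


edges = sorted({tuple(sorted((x, y))) for x in adj for y in adj[x]})
leaves = [n for n in adj if len(adj[n]) == 1]

# For each leaf i: (r_i, sum over edges of N_i(e) * d(e)); the two should agree.
check = {}
for i in leaves:
    r_i = sum(path_lengths(adj, i)[k] for k in leaves if k != i)
    weighted = sum(leaves_beyond(adj, i, e) * adj[e[0]][e[1]] for e in edges)
    check[i] = (r_i, weighted)

print(leaves_beyond(adj, "D", ("D", "E")))  # 3
print(check["D"])                           # (28.0, 28.0)
```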

35
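35 Neighbor Finding
(Figure: leaves i, j attached to internal nodes l, k, with the rest of T in between.)
Notation: For e = (i,m), we denote d(i,m) by d(e).
Lemma: For leaves i, j connected by a path (i, l, …, k, j),
$$r_i - r_j = (L-2)\bigl(d(i,l) - d(j,k)\bigr) + \sum_{e \in P(l,k)} \bigl(N_i(e) - N_j(e)\bigr)\, d(e).$$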

36
36 Neighbor Finding
Proof of Theorem: Assume, for contradiction, that D(i,j) is minimal for i, j which are not neighboring leaves. Let (i, l, ..., k, j) be the path from i to j. Let T_1 and T_2 be the subtrees rooted at k and l, respectively. Let |T| denote the number of leaves in T.
(Figure: leaves i, j; internal nodes k, l; subtrees T_1, T_2.)

37
37 Neighbor Finding
Case 1: i or j has a neighboring leaf. WLOG j and m are such leaves.
(Figure: leaves i, j, m; internal nodes k, l; subtree T_2.)
A. By the definition of D, and then reading distances off the figure:
$$D(i,j) - D(m,j) = (L-2)\bigl(d(i,j) - d(j,m)\bigr) - (r_i + r_j) + r_m + r_j = (L-2)\bigl(d(i,k) - d(k,m)\bigr) + r_m - r_i.$$
B. By the Lemma and the figure:
$$r_m - r_i \ge (L-2)\bigl(d(k,m) - d(i,l)\bigr) + (4-L)\,d(k,l),$$
since for each edge $e \in P(k,l)$, $N_m(e) \ge 2$ and $N_i(e) \le L-2$, so $N_m(e) - N_i(e) \ge 4 - L$.
Substituting B in A:
$$D(i,j) - D(m,j) \ge (L-2)\bigl(d(i,k) - d(i,l)\bigr) + (4-L)\,d(k,l) = 2\,d(k,l) > 0,$$
contradicting the minimality assumption.

38
38 Neighbor Finding
Case 2: Not Case 1. Then both T_1 and T_2 contain two neighboring leaves. We show that if D(i,j) is minimal, then we must have both |T_1| > |T_2| and |T_2| > |T_1|, which is a contradiction; hence D(i,j) is not minimal.
(Figure: leaves i, j, m, n; internal nodes k, l, p; subtrees T_1, T_2.)
We prove that |T_1| > |T_2| by assuming that |T_1| ≤ |T_2| and reaching a contradiction. The proof that |T_2| > |T_1| is similar. Let n, m be neighboring leaves in T_1 (p denotes their parent in the figure).

39
39 Neighbor Finding
(Figure: leaves i, j, m, n; internal nodes k, l, p; subtrees T_1, T_2.)
A. $0 \le D(m,n) - D(i,j) = (L-2)\bigl(d(m,n) - d(i,j)\bigr) + (r_i + r_j) - (r_m + r_n)$
B. $r_j - r_m < (L-2)\bigl(d(j,k) - d(m,p)\bigr) + (|T_1| - |T_2|)\,d(k,p)$ (because $N_j(e) - N_m(e) < |T_1| - |T_2|$).
C. $r_i - r_n < (L-2)\bigl(d(i,k) - d(n,p)\bigr) + (|T_1| - |T_2|)\,d(l,p)$
Adding B and C, using $d(j,k) + d(i,k) = d(i,j)$ and $d(m,p) + d(n,p) = d(n,m)$, noting that $d(l,p) > d(k,p)$, and using the assumption $|T_1| - |T_2| \le 0$:
D. $(r_i + r_j) - (r_m + r_n) < (L-2)\bigl(d(i,j) - d(n,m)\bigr) + 2(|T_1| - |T_2|)\,d(k,p)$
Substituting D in the right-hand side of A:
$$0 \le D(m,n) - D(i,j) < 2(|T_1| - |T_2|)\,d(k,p),$$
hence $|T_1| - |T_2| > 0$, a contradiction.
