
1 Phylogenetic Analysis

2 2 Phylogenetic Analysis Overview Insight into evolutionary relationships Inferring or estimating these evolutionary relationships shown as branches of a tree Length and nesting reflects degree of similarity between any two items (in our case, sequences)

3 3 Phylogenetics and Cladistics Clade = a set of descendants from a single ancestor (from the Greek word for branch) Three basic assumptions –Any group of organisms is related by descent from a common ancestor –There is a bifurcating pattern of cladogenesis –Change in characteristics occurs in lineages over time

4 4 More default assumptions 1.Correct sequences and origins 2.Shared ancestral origin 3.Homologous sequences 4.No mixtures of nuclear and organellar sequences 5.Large enough taxa sampling size 6.Contains representative sequence variations 7.Sufficient sequence variations

5 5 Basic Terminology
Clade: a group of organisms or genes that includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor.
Taxon: any named group of organisms; not necessarily a clade.
Branches: branch lengths sometimes correspond to the degree of divergence.
Nodes: bifurcating branch points.
(Figure: two example trees, one in which branch lengths are not significant and one in which they are significant.)

6 6 Basic Definitions
Homologous: sequences inferred to share a common ancestor; in practice, sequences that share an arbitrary threshold level of similarity determined by alignment of matching bases.
Similarity: a quantifiable term that refers to a degree of relatedness between sequences, but does not necessarily reflect ancestry.
Orthologs: homologs produced by speciation; derived from a common ancestor; tend to have similar function.
Paralogs: homologs produced by gene duplication; derived within an organism; tend to have differing functions.
Xenologs: homologs resulting from horizontal gene transfer between two organisms; difficult to verify; variable function, but it tends to be similar.

7 7 Phylogenetic Analysis Overview Objective: –determine branch lengths and figure out how the tree should be drawn –Sequences that are most closely related are drawn as neighboring branches

8 8 Phylogenetic Analysis Overview Dependent upon good multiple sequence alignment programs Group sequences with similar patterns of substitutions in order to reconstruct a phylogenetic tree

9 9 Phylogenetic Analysis Overview Consider two sequences that are related –Ancestral sequence can be (partially) derived –With additional sequences, more information can be gathered to add to a correct derivation

10 10 Phylogenetic Analysis Overview Example: C-Terminal Motor Kinesin sequences –http://www.proweb.org/kinesin/BE4_Cterm.html

11 11 Practical use of phylogenetic analysis To prioritize the analysis of genes in the target family – give insight into protein functions

12 12 P. aeruginosa, a bacterium that is one of the top 3 causes of opportunistic infections, is noted for its antimicrobial resistance and resistance to detergents. Three homologous outer membrane proteins, OprJ, OprM and OprN, were identified as playing a role in this antimicrobial resistance.

13 13 Figure 14.2: Example of a gene-based phylogenetic tree that does not match the organismal phylogeny, suggesting that horizontal gene transfer has occurred. (Figure label: possible horizontal gene transfer.)

14 14 Uses of Phylogenetic Analysis Given a set of genes, determine which genes are likely to have equivalent functions Follow changes occurring in a rapidly changing species such as a virus –Example: influenza –Study of rapidly changing genes –Next year’s strain can be predicted –Flu vaccination can be developed

15 15 UCMP Glossary: Phylogenetics

16 16 Tree of Life Phylogenies describe how the evolution of species has occurred Image: http://microbialgenome.org/primer/tree.html

17 17 Tree of Life Traditionally, morphological characters (visible features) have been used to classify organisms –Living organisms –Fossil records Sequence data are beginning to take a larger role

18 18 Tree of Life Many different resources including: –NCBI taxonomy web sites –University of Arizona’s tree of life project

19 19 NCBI Taxonomy Web Site http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/ (taxonomy: the method, or the science, of classification)

20 20 Tree of Life http://tolweb.org/tree/phylogeny.html

21 21 Evolutionary Trees Two-dimensional graph showing evolutionary relationships among a set of items; the items can be organisms, genes, or sequences. Each unit is defined by a distinct branch on the tree.

22 22 Evolutionary Trees Leaves represent the units (taxa) being studied. Nodes and branches represent the relationships among the taxa. Two taxa derived from the same common ancestor will share a node in the graph.

23 23 Evolutionary Trees Length of each branch may be drawn according to the number of sequence-level changes that occurred. Distance may not be in direct relation to evolutionary time. Analyses that assume a uniform rate of mutation use the molecular clock hypothesis.

24 24 Rooted Trees One sequence (root) defined to be common ancestor of all of the other sequences A unique path leads from the root node to any other node Direction of path indicates evolutionary time Root chosen as a sequence thought to have branched off earliest

25 25 Rooted Trees If the molecular clock hypothesis holds, it is possible to predict a root. As the number of sequences increases, the number of possible rooted trees increases very rapidly. In most cases, a bifurcating binary tree is the best model to simulate evolutionary events.

26 26 Example Rooted Tree (from "Systematics and Molecular Phylogenetics") Image source: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

27 27 Unrooted Tree (Star) Indicates evolutionary relationships without revealing the location of the oldest ancestor. For the same set of sequences, there are fewer possible unrooted trees than rooted trees.
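
To make the growth in the number of trees concrete, here is a minimal sketch that counts bifurcating trees. It assumes the standard counting formulas for n labelled sequences, (2n-5)!! unrooted trees and (2n-3)!! rooted trees; the function names are illustrative and are not taken from the slides.

```python
def double_factorial(k: int) -> int:
    """Return k * (k-2) * (k-4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n >= 3 labelled sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return double_factorial(2 * n - 5)


def count_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 labelled sequences."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
# 3 1 3
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

Four sequences give three unrooted trees, which matches the maximum parsimony example later in the deck.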

28 28 Example Unrooted Tree Image source: http://www.shef.ac.uk/english/language/quantling/images/quantling1.jpg

29 29 Image: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

30 30 Methods for Determining Trees Three main methods: –maximum parsimony –distance –maximum likelihood

31 31 Maximum Parsimony Predicts evolutionary tree minimizing number of steps required to generate observed variation Multiple sequence alignment must first be obtained

32 32 Maximum Parsimony For each position, phylogenetic trees requiring the smallest number of evolutionary changes to produce the observed sequence changes are identified Trees that produce the smallest number of changes for all sequence positions are identified

33 33 Maximum Parsimony Time-consuming algorithm. Only works well if the sequences are strongly similar.

34 34 Maximum Parsimony Example
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
Four sequences, three possible unrooted trees

35 35 Maximum Parsimony Example Possible Trees: (1,2),(3,4); (1,3),(2,4); (1,4),(2,3)

36 36 Maximum Parsimony Example Some sites are informative, and other sites are not. An informative site has at least two different sequence characters, each present in at least two different sequences. Only the informative sites need to be considered.

37 37 Maximum Parsimony Example
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
Three informative columns (alignment positions 5, 7 and 9)
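
The informative columns can be picked out mechanically. The following is a minimal sketch, assuming the usual parsimony criterion that a column is informative when at least two different characters each occur in at least two sequences; the helper names are illustrative.

```python
from collections import Counter

# The four-sequence alignment from the slide, written without spaces.
alignment = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}


def is_informative(column: str) -> bool:
    """True if at least two characters each appear in two or more sequences."""
    counts = Counter(column)
    shared = [base for base, n in counts.items() if n >= 2]
    return len(shared) >= 2


def informative_columns(sequences):
    """Return the 1-based positions of the informative alignment columns."""
    columns = zip(*sequences)
    return [i + 1 for i, col in enumerate(columns) if is_informative("".join(col))]


print(informative_columns(list(alignment.values())))  # [5, 7, 9]
```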

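38 38 Maximum Parsimony Example
Informative columns (Column 1, Column 2, Column 3):
1 G G A
2 G G G
3 A C A
4 A C G
(Figure: each of the three possible trees, (1,2),(3,4); (1,3),(2,4); (1,4),(2,3), is drawn for Column 1, Column 2 and Column 3, with a mark on each branch where a substitution is required.)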
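
The substitution counts shown in the figure can be reproduced with a small script. This is a sketch only: it assumes Fitch's set-intersection rule for counting changes (the slides do not name a counting procedure), and it hard-codes the four-sequence case where each tree is two pairs of sequences joined by one internal branch.

```python
# The three informative columns for each sequence, as listed on the slide.
informative = {"1": "GGA", "2": "GGG", "3": "ACA", "4": "ACG"}

# Each unrooted tree is written as two pairs joined by an internal branch.
trees = {
    "(1,2),(3,4)": (("1", "2"), ("3", "4")),
    "(1,3),(2,4)": (("1", "3"), ("2", "4")),
    "(1,4),(2,3)": (("1", "4"), ("2", "3")),
}


def fitch_join(left: set, right: set):
    """Fitch rule: keep the intersection at no cost, else the union at cost 1."""
    common = left & right
    if common:
        return common, 0
    return left | right, 1


def column_cost(column: dict, tree) -> int:
    """Minimum number of substitutions one column needs on a four-leaf tree."""
    (a, b), (c, d) = tree
    left, cost_left = fitch_join({column[a]}, {column[b]})
    right, cost_right = fitch_join({column[c]}, {column[d]})
    _, cost_middle = fitch_join(left, right)
    return cost_left + cost_right + cost_middle


n_columns = len(next(iter(informative.values())))
scores = {}
for name, tree in trees.items():
    total = 0
    for i in range(n_columns):
        column = {seq_name: seq[i] for seq_name, seq in informative.items()}
        total += column_cost(column, tree)
    scores[name] = total

best = min(scores, key=scores.get)
print(scores)  # {'(1,2),(3,4)': 4, '(1,3),(2,4)': 5, '(1,4),(2,3)': 6}
print(best)    # (1,2),(3,4)
```

Under these assumptions the tree that groups sequences 1 and 2 needs the fewest changes over the informative columns.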

39 39 Distance Method Looks at the number of changes between each pair in a group of sequences Goal is to identify a tree that positions neighbors correctly and that also has branch lengths which reproduce the original data as closely as possible

40 40 Distance Method CLUSTALW uses the neighbor-joining method to build the guide tree for multiple sequence alignments. The PHYLIP suite of programs employs neighbor-joining methods –http://evolution.genetics.washington.edu/phylip.html

41 41 Distance Programs in Phylip NEIGHBOR: estimates phylogenies using either: –neighbor-joining (no molecular clock assumed) –unweighted pair group method with arithmetic mean (UPGMA) (molecular clock assumed)

42 42 Distance Analysis Distance score counted as –number of mismatched positions in the alignment –number of sequence positions that must be changed to generate the second sequence Success depends on the degree to which the distances among a set of sequences can be made additive on a predicted evolutionary tree

43 43 Example of Distance Analysis Consider the alignment:
A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT

44 44 Example of Distance Analysis Distances (number of mismatched positions between each pair of sequences A to D) can be shown as a table:
    A  B  C  D
A   -  3  7  8
B   -  -  6  7
C   -  -  -  3
D   -  -  -  -
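
As a minimal sketch, the pairwise distances for this alignment can be computed by counting mismatched positions. The function and variable names are illustrative; no correction for multiple substitutions at the same site is applied.

```python
from itertools import combinations

# The example alignment from the slides.
alignment = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def mismatches(s: str, t: str) -> int:
    """Count the aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


# One entry per unordered pair of sequence names.
distances = {
    (p, q): mismatches(alignment[p], alignment[q])
    for p, q in combinations(alignment, 2)
}

for pair, value in distances.items():
    print(pair, value)
# ('A', 'B') 3
# ('A', 'C') 7
# ('A', 'D') 8
# ('B', 'C') 6
# ('B', 'D') 7
# ('C', 'D') 3
```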

45 45 Example of Distance Analysis Using this information, a tree can be drawn: A and B join at one internal node, with branch lengths 2 (A) and 1 (B); C and D join at a second internal node, with branch lengths 1 (C) and 2 (D); the branch connecting the two internal nodes has length 4.
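
A quick way to check that a drawn tree reproduces the original data is to sum branch lengths along the path between each pair of sequences. The sketch below assumes the following reading of the figure, inferred from the pairwise mismatch counts rather than stated on the slide: A (length 2) and B (length 1) join at a node called X here, C (length 1) and D (length 2) join at a node called Y, and X and Y are connected by a branch of length 4. The node names X and Y are hypothetical.

```python
from itertools import combinations

# Assumed branch lengths; X and Y are hypothetical names for the internal nodes.
edges = {("A", "X"): 2, ("B", "X"): 1, ("X", "Y"): 4, ("C", "Y"): 1, ("D", "Y"): 2}

# Mismatch counts between the example sequences.
observed = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
}


def build_adjacency(edge_lengths):
    """Turn an edge list into a symmetric adjacency map."""
    adjacency = {}
    for (u, v), length in edge_lengths.items():
        adjacency.setdefault(u, {})[v] = length
        adjacency.setdefault(v, {})[u] = length
    return adjacency


def path_length(adjacency, start, goal, seen=None):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    if start == goal:
        return 0
    seen = (seen or set()) | {start}
    for neighbour, length in adjacency[start].items():
        if neighbour in seen:
            continue
        rest = path_length(adjacency, neighbour, goal, seen)
        if rest is not None:
            return length + rest
    return None  # goal is not reachable through this branch


adjacency = build_adjacency(edges)
tree_distances = {
    (p, q): path_length(adjacency, p, q) for p, q in combinations("ABCD", 2)
}
print(tree_distances == observed)  # True
```

Under these assumed branch lengths the distances are exactly additive on the tree, which is the situation in which distance methods work best.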

46 46 Fitch and Margoliash Algorithm (3 sequences) Distance table used Sequences combined in threes –define the branches of the predicted tree –calculate the branch lengths of the tree

47 47 Fitch and Margoliash Algorithm (3 sequences) 1) Draw unrooted tree with three branches originating from a common node: branches a, b and c lead to sequences A, B and C.

48 48 Fitch and Margoliash Algorithm (3 sequences) 2) Calculate lengths of tree branches algebraically:
distance from A to B = a + b = 22 (1)
distance from A to C = a + c = 39 (2)
distance from B to C = b + c = 41 (3)
subtracting (2) from (3) yields: (b + c) - (a + c) = 41 - 39, so b - a = 2 (4)
adding (1) and (4) yields: 2b = 24, so b = 12
substituting into (1): a + 12 = 22, so a = 10
substituting into (2): 10 + c = 39, so c = 29

49 49 Fitch and Margoliash Algorithm (3 sequences) 3) Resulting tree: three branches from a common node, with lengths a = 10 (to A), b = 12 (to B) and c = 29 (to C).
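
The algebra for three sequences reduces to closed-form expressions, obtained by solving a + b = d(A,B), a + c = d(A,C), b + c = d(B,C). A minimal sketch, with an illustrative function name:

```python
def three_point_branches(d_ab: float, d_ac: float, d_bc: float):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for (a, b, c)."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(three_point_branches(22, 39, 41))  # (10.0, 12.0, 29.0)
```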

50 50 Fitch and Margoliash Algorithm (5 sequences) Algorithm can be extended to more sequences. Consider the distances among sequences A, B, C, D and E. (Figure: distance table and a tree with branches labelled a to g.)

51 51 Summary of Fitch-Margoliash 1) Find the most closely related pair of sequences (A, B). 2) Treat the rest of the sequences as a composite. Calculate the average distance from A to all others, and from B to all others. 3) Use these values to calculate the lengths of the edges a and b.
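
Steps 1 to 3 can be sketched as follows, using the mismatch counts between the four example sequences A to D as input. This is an illustration under stated assumptions, not a full implementation: the tie between the pairs (A, B) and (C, D), which both have distance 3, is broken by taking the first pair found, and all names are illustrative.

```python
from itertools import combinations

# Mismatch counts between the four example sequences.
dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 7, frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 7, frozenset(("C", "D")): 3,
}
names = ["A", "B", "C", "D"]


def d(x: str, y: str) -> float:
    """Look up the distance between two sequences, in either order."""
    return dist[frozenset((x, y))]


# Step 1: most closely related pair (the first minimum wins on a tie).
first, second = min(combinations(names, 2), key=lambda pair: d(*pair))

# Step 2: average distance from each member of the pair to all other sequences.
rest = [n for n in names if n not in (first, second)]
avg_first = sum(d(first, r) for r in rest) / len(rest)
avg_second = sum(d(second, r) for r in rest) / len(rest)

# Step 3: three-point formulas with the composite as the third point.
branch_first = (d(first, second) + avg_first - avg_second) / 2
branch_second = (d(first, second) + avg_second - avg_first) / 2

print(first, second)                # A B
print(avg_first, avg_second)        # 7.5 6.5
print(branch_first, branch_second)  # 2.0 1.0
```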

52 52 Summary of Fitch-Margoliash 4) Treat A and B as a composite. Calculate the average distances between AB and each of the other sequences. Create a new distance table. 5) Identify next pair of related sequences and begin as with step 1. 6) Subtract extended branch lengths to calculate lengths of intermediate branches.
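
Steps 4 to 6 can be sketched in the same way for the four example sequences A to D. The sketch assumes that A and B were paired first and that their branch lengths are computed as in steps 1 to 3; in step 6 the mean of a and b is subtracted, because the composite distance averages over A and B. Variable names are illustrative.

```python
# Mismatch counts between the four example sequences.
dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 7, frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 7, frozenset(("C", "D")): 3,
}


def d(x: str, y: str) -> float:
    """Look up the distance between two sequences, in either order."""
    return dist[frozenset((x, y))]


rest = ["C", "D"]

# Steps 1-3, repeated so the sketch is self-contained: branches to A and B.
avg_a = sum(d("A", r) for r in rest) / len(rest)
avg_b = sum(d("B", r) for r in rest) / len(rest)
a = (d("A", "B") + avg_a - avg_b) / 2
b = (d("A", "B") + avg_b - avg_a) / 2

# Step 4: treat A and B as a composite and average their distances to the rest.
composite = {r: (d("A", r) + d("B", r)) / 2 for r in rest}

# Step 5: the next pair is (C, D); apply the three-point formulas to AB, C, D.
d_cd = d("C", "D")
c = (composite["C"] + d_cd - composite["D"]) / 2
d_branch = (composite["D"] + d_cd - composite["C"]) / 2
extended = (composite["C"] + composite["D"] - d_cd) / 2

# Step 6: subtract the part of the extended branch that lies beyond the AB node.
internal = extended - (a + b) / 2

print(composite)    # {'C': 6.5, 'D': 7.5}
print(c, d_branch)  # 1.0 2.0
print(extended)     # 5.5
print(internal)     # 4.0
```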

53 53 Summary of Fitch-Margoliash 7) Repeat the entire process with all possible pairs of sequences. 8) Calculate predicted distances between each pair of sequences for each tree to find the best tree.

54 54 Neighbor Joining Similar to Fitch-Margoliash. Pairs of sequences are chosen to give the best least-squares estimate of branch length.

55 55 Maximum Likelihood Calculates likelihood of a tree given an alignment Trees with least number of changes will be most likely

56 56 Maximum Likelihood (ML) Probability of each tree is product of mutation rates in each branch Likelihoods given by each column multiplied to give the likelihood of the tree

57 57 Maximum Likelihood (ML) Disadvantages: –Computationally intensive –Can only be done for a handful of sequences

58 58 Which Method to Choose? Depends upon the sequences that are being compared –strong sequence similarity: maximum parsimony –clearly recognizable sequence similarity: distance methods –all others: maximum likelihood

59 59 Distance, Parsimony and ML
Distance matrix: simply count the number of differences between two sequences.
Maximum Parsimony: search for a tree that requires the smallest number of changes to explain the differences observed among the taxa.
ML: evaluates the probability that the chosen evolutionary model has generated the observed data. A simple model is that changes between all nucleotides (or amino acids) are equally probable. The probabilities for all possible reconstructions are summed up to yield the likelihood for one particular site. The likelihood for the tree is the product of the likelihoods for all alignment positions in the dataset.
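
The ML description can be illustrated with a small sketch for four sequences. It makes several assumptions that are not in the slides: the Jukes-Cantor model (all changes equally probable), every branch fixed at 0.1 expected substitutions per site, and no optimisation of branch lengths, which real ML programs do perform. For each site the probabilities of all 16 reconstructions of the two internal nodes are summed; log site likelihoods are then added, which is equivalent to multiplying the site likelihoods.

```python
import math
from itertools import product

BASES = "ACGT"

# The four-sequence alignment used in the maximum parsimony example.
alignment = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}

# Each unrooted tree is written as two pairs joined by an internal branch.
trees = {
    "(1,2),(3,4)": (("1", "2"), ("3", "4")),
    "(1,3),(2,4)": (("1", "3"), ("2", "4")),
    "(1,4),(2,3)": (("1", "4"), ("2", "3")),
}


def jc69_prob(x: str, y: str, t: float) -> float:
    """Jukes-Cantor probability of observing y after time t, starting from x."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def site_likelihood(column: dict, tree, t: float) -> float:
    """Sum over all states (x, y) of the two internal nodes for one column."""
    (a, b), (c, d) = tree
    total = 0.0
    for x, y in product(BASES, repeat=2):
        total += (
            0.25  # equal base frequencies at the first internal node
            * jc69_prob(x, column[a], t) * jc69_prob(x, column[b], t)
            * jc69_prob(x, y, t)
            * jc69_prob(y, column[c], t) * jc69_prob(y, column[d], t)
        )
    return total


def log_likelihood(seqs: dict, tree, t: float = 0.1) -> float:
    """Log of the product of the site likelihoods over all alignment columns."""
    length = len(next(iter(seqs.values())))
    total = 0.0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += math.log(site_likelihood(column, tree, t))
    return total


log_likelihoods = {name: log_likelihood(alignment, tree) for name, tree in trees.items()}
for name, value in log_likelihoods.items():
    print(name, round(value, 3))
```

With these fixed branch lengths the tree (1,2),(3,4) has the highest log-likelihood, in agreement with the parsimony count for the same alignment.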

60 60 Which Method to Choose? Best to choose at least two approaches Compare the results – if they are similar, you can have more confidence

61 61 Difficulties With Phylogenetic Analysis Horizontal or lateral transfer of genetic material (for instance through viruses) makes it difficult to determine the phylogenetic origin of some evolutionary events. Genes under selective pressure can evolve rapidly, masking earlier changes that had occurred phylogenetically.

62 62 Difficulties With Phylogenetic Analysis Two sites within the compared sequences may be evolving at different rates. Re-arrangements of genetic material can lead to false conclusions. Duplicated genes can evolve along separate pathways, leading to different functions.

63 63 Here are some 264 phylogeny packages and 30 free servers

64 64 Exercise Multiple Sequence Alignment –Sequence Alignment: CLUSTALW –Sample sequences: found on E-learning system

65 65 Explanation on the parameters

66 66 Exercise

