An Introduction to Phylogenetics > Sequence 1 GAGGTAGTAATTAGATCCGAAA… > Sequence 2 GAGGTAGTAATTAGATCTGAAA… > Sequence 3 GAGGTAGTAATTAGATCTGTCA… Anton E.

1 An Introduction to Phylogenetics > Sequence 1 GAGGTAGTAATTAGATCCGAAA… > Sequence 2 GAGGTAGTAATTAGATCTGAAA… > Sequence 3 GAGGTAGTAATTAGATCTGTCA… Anton E. Weisstein Indiana State University March 11-14, 2004

2 Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

3 What is phylogenetics? Phylogenetics is the study of evolutionary relationships. Relationships among species: crocodiles birds lizards snakes rodents primates marsupials

4 What is phylogenetics? Relationships among species: crocodiles birds lizards snakes rodents primates marsupials This is an example of a phylogenetic tree.

5 What is phylogenetics? Relationships within species: HIV subtypes. [Figure: tree of HIV subtypes A, B, C, D, F, G, with tips labeled by country: Rwanda, Ivory Coast, Uganda, U.S., Italy, U.K., India, Rwanda, Ethiopia, S. Africa, Uganda, Tanzania, Romania, Brazil, Cameroon, Netherlands, Taiwan, Russia.]

6 So what is phylogenetics good for? Phylogenetics has direct applications to: Conservation (test wood, ivory, meat products for poaching); Agriculture (analyze specific differences between cultivars); Forensics (DNA fingerprinting); Medicine (determine specific biochemical function of cancer-causing genes).

7 HIV Example 1: Florida dentist case. 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

8 HIV Example 2: Viral evolution within patients (Markham et al., 1998). Studied 15 recently infected patients over a four-year period. Some patients stayed healthy; others quickly developed AIDS. Questions: Are differences in patient health due to infection with different viral strains? Does the virus evolve the same way in healthy patients vs. those who fall ill?

9 Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

10 Phylogenetic concepts: Interpreting a Phylogeny. [Figure: tree of Sequences A, B, C, D, and E drawn along a time axis.] Which sequence is most closely related to B? A, because B diverged from A more recently than from any other sequence. Physical position in tree is not meaningful! Only tree structure matters.

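11 Phylogenetic concepts: Rooted and Unrooted Trees. [Figure: a rooted tree of A, B, C, D drawn along a time axis, shown as equal to an unrooted tree of A, B, C, D with the root placed at X; a second unrooted tree of A, B, C, D marks the candidate root positions with question marks.]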

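12 How Many Trees?

| # sequences | # pairwise distances | Unrooted: # trees | Unrooted: # branches/tree | Rooted: # trees | Rooted: # branches/tree |
|---|---|---|---|---|---|
| 3 | 3 | 1 | 3 | 3 | 4 |
| 4 | 6 | 3 | 5 | 15 | 6 |
| 5 | 10 | 15 | 7 | 105 | 8 |
| 6 | 15 | 105 | 9 | 945 | 10 |
| 10 | 45 | 2,027,025 | 17 | 34,459,425 | 18 |
| 30 | 435 | 8.69 × 10^36 | 57 | 4.95 × 10^38 | 58 |
| $N$ | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ | $2N-2$ |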
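The general formulas in the last row of the slide can be checked numerically. The following is a minimal sketch; the function names are illustrative and do not come from the presentation.

```python
from math import factorial

def pairwise_distances(n):
    """Number of pairwise distances among n sequences: N(N-1)/2."""
    return n * (n - 1) // 2

def unrooted_trees(n):
    """Number of unrooted bifurcating trees: (2N-5)! / (2^(N-3) (N-3)!), for N >= 3."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_trees(n):
    """Number of rooted bifurcating trees: (2N-3)! / (2^(N-2) (N-2)!), for N >= 2."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n),
          unrooted_trees(n), 2 * n - 3,   # unrooted: trees, branches per tree
          rooted_trees(n), 2 * n - 2)     # rooted: trees, branches per tree
```

The counts grow so quickly that exhaustively scoring every tree is only feasible for small numbers of sequences.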

13 Tree Types. Evolutionary trees measure time. [Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, bats; scale bar: 50 million years.] Phylograms measure change. [Figure: rooted tree of the same taxa; scale bar: 5% change.]

14 Tree Properties. Ultrametricity: all tips are an equal distance from the root. [Figure: rooted tree with tips X and Y and branches a, b, c, d, e; a = b + c + d + e.] Additivity: distance between any two tips equals the total branch length between them. [Figure: rooted tree with tips X and Y and branches a, b, c, d, e; XY = a + b + c + d + e.] In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
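Both properties can be tested directly on a distance matrix. The sketch below uses two standard matrix tests that the slide does not name: the three-point condition for ultrametricity and the four-point condition for additivity. The two example matrices are hypothetical.

```python
from itertools import combinations

def make_matrix(pairs):
    """Build a symmetric lookup d[x][y] from a {(x, y): distance} dict."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for x, y, z in combinations(sorted(d), 3):
        dists = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(dists[2] - dists[1]) > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: in every quartet, the two largest pair sums are equal."""
    for w, x, y, z in combinations(sorted(d), 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical clock-like distances (all tips equidistant from the root).
clock_like = make_matrix({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
                          ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10})
# Hypothetical phylogram distances from unequal branch lengths.
phylogram = make_matrix({("A", "B"): 4, ("A", "C"): 5, ("A", "D"): 7,
                         ("B", "C"): 7, ("B", "D"): 9, ("C", "D"): 6})

print(is_ultrametric(clock_like), is_additive(clock_like))
print(is_ultrametric(phylogram), is_additive(phylogram))
```

Every ultrametric matrix is also additive, but not the reverse, which matches the slide's pairing of evolutionary trees with ultrametricity and phylograms with additivity.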

15 Tree Building Exercise. Ultrametricity: all tips are an equal distance from the root. [Figure: rooted tree with tips X and Y and branches a, b, c, d, e; a = b + c + d + e.] Using the distance matrix given, construct an ultrametric tree.
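One common way to build an ultrametric tree from distances is UPGMA (average-linkage clustering). The slide does not say which procedure the exercise intends, and its distance matrix is not reproduced in this transcript, so the sketch below uses UPGMA as an assumed method on a hypothetical matrix.

```python
from itertools import combinations

def upgma(taxa, distances):
    """Cluster taxa by UPGMA.

    distances maps frozenset({x, y}) to a distance.
    Returns (tree, heights): tree is a nested tuple, and heights maps each
    internal node (a nested tuple) to its height above the tips.
    """
    dist = dict(distances)
    sizes = {t: 1 for t in taxa}   # cluster -> number of tips it contains
    heights = {}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        heights[merged] = dist[frozenset((a, b))] / 2
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to every other cluster is the size-weighted average.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    (tree,) = sizes
    return tree, heights

# Hypothetical distance matrix (not the one from the exercise).
example = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree, heights = upgma(["A", "B", "C", "D"], example)
print(tree)
print(heights)
```

Because each node is placed at half the distance between the clusters it joins, every tip ends up the same distance from the root, which is the ultrametric property the exercise asks for.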

16 Phylogenetic Methods. Many different procedures exist. Three of the most popular: Neighbor-joining (minimizes distance between nearest neighbors); Maximum parsimony (minimizes total evolutionary change); Maximum likelihood (maximizes likelihood of observed data).

17 Comparison of Methods

| | Neighbor-joining | Maximum parsimony | Maximum likelihood |
|---|---|---|---|
| Data used | Uses only pairwise distances | Uses only shared derived characters | Uses all data |
| Criterion | Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values |
| Speed | Very fast | Slow | Very slow |
| Weakness | Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model |
| Best use | Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, homoplasy rare) | Good for very small data sets and for testing trees built using other methods |
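Distance methods such as neighbor-joining start from a matrix of pairwise distances. As a minimal sketch, the simplest such distance is the proportion of mismatched sites (the p-distance). The example uses only the 22 nucleotides of each sequence that are visible on the title slide; the sequences are truncated there, so these values are purely illustrative.

```python
from itertools import combinations

def mismatches(a, b):
    """Count sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    return mismatches(a, b) / len(a)

# Visible prefixes of the three title-slide sequences (truncated in the source).
sequences = {
    "Sequence 1": "GAGGTAGTAATTAGATCCGAAA",
    "Sequence 2": "GAGGTAGTAATTAGATCTGAAA",
    "Sequence 3": "GAGGTAGTAATTAGATCTGTCA",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
    print(name_a, name_b, mismatches(seq_a, seq_b), round(p_distance(seq_a, seq_b), 3))
```

Real analyses usually correct such raw distances for multiple substitutions at the same site; that correction is omitted here.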

18 Which procedure should we use? Neighbor-joining? Maximum parsimony? Maximum likelihood? All that we can! Each method has its own strengths. Use multiple methods for cross-validation. In some cases, none of the three gives the correct phylogeny!

19 Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

20 Phylogenetic concepts: Homology and Homoplasy. Homology: identical character due to shared ancestry (evolutionary signal). Homoplasy: identical character due to evolutionary convergence or reversal (evolutionary noise). [Figure: three example trees. Homology: lizards, snakes, rodents, primates, marked +hair. Homoplasy (convergence): birds, snakes, rodents, bats, marked +flight. Homoplasy (reversal): worms, lizards, snakes, marked +legs and –legs.]

21 Watching the Molecular Clock. Mutation occurs as a random (Poisson) process. If mutations accumulate at a constant rate over time and across all branches, the phylogeny is said to obey a molecular clock. [Figure: tree of samples dated 2000, 2001, and 2002; axis: % genetic difference.]
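The clock assumption can be illustrated with a small simulation. This is a sketch under stated assumptions: the mutation rate and branch durations are hypothetical, and a Poisson process is simulated by summing exponential waiting times between mutations.

```python
import random

def mutations_on_branch(rate, duration, rng):
    """Count the events of a Poisson process with the given rate over a branch.

    Waiting times between successive mutations are exponential with mean 1/rate.
    """
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed < duration:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

rng = random.Random(0)   # fixed seed so the run is repeatable
rate = 0.5               # hypothetical: mutations per year along one lineage

# Under a strict clock the same rate applies to every branch, so the expected
# number of mutations is rate * duration, with Poisson scatter around it.
for years in (1, 2):
    counts = [mutations_on_branch(rate, years, rng) for _ in range(10000)]
    print(years, "year(s): mean mutations =", sum(counts) / len(counts))
```

The scatter in individual counts shows why even a perfectly clock-like process produces trees that are only approximately ultrametric.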

22 Watching the Molecular Clock. Mutation occurs as a random (Poisson) process. If mutations accumulate at a constant rate over time and across all branches, the phylogeny is said to obey a molecular clock. BUT: Natural selection favors some mutations and eliminates others. Selection varies over time and across lineages. [Figure: tree of samples dated 2000, 2001, and 2002; axis: % genetic difference.]

23 Molecular Clocks for various genes. [Figure: % genetic divergence (25% to 100%) plotted against time since divergence (300 to 1500 Myr) for fibrinopeptides, hemoglobin, cytochrome c, and histone IV.]

24 Molecular Clocks for various genes. [Figure: % genetic divergence (25% to 100%) plotted against time since divergence (300 to 1500 Myr) for fibrinopeptides, hemoglobin, cytochrome c, and histone IV.] Why such different profiles? Variation in mutation rate? Variation in selection. Genes coding for some molecules are under very strong stabilizing selection.

25 How do different patterns of selection affect phylogenies? [Figure: three trees drawn along a time axis: selection for specific mutations; no selection (null hypothesis); selection for overall diversity.]

26 Trees are hypotheses about evolutionary history So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.

27 Tree Testing: Split Decomposition. Split decomposition is one method for testing a tree. Under this procedure, we choose exactly four taxa (A, B, C, D) and examine the topologies of all possible unrooted trees. How many such trees are there? [Figure: the three topologies, labeled A B C D, A D B C, and A C B D.] Only one of these topologies is right. How can we quantitatively assess the support for each tree?

28 Tree Testing: Split Decomposition. The correct tree should be approximately additive; the others usually will not. For each tree, we calculate split indices that estimate the length of the internal branch. For the tree that pairs A with B and C with D, the two indices are $\frac{(AC + BD) - (AB + CD)}{2}$ and $\frac{(AD + BC) - (AB + CD)}{2}$. If that tree is the right phylogeny and the distances are additive, both indices equal the length of the internal branch. Large split indices → long internal branch → topology strongly supported. Small split indices → short internal branch → topology weakly supported. Negative split indices → biologically impossible → topology probably wrong.

29 Tree Testing: Bootstrapping. Used to assess the support for individual branches. Randomly resample characters, with replacement. Repeat many times (1000 or more). How often does a specific branch appear? [Figure: tree of rat, human, turtle, fruit fly, oak, duckweed with branch support values 100, 98, 73.]
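The resampling step can be sketched as follows. To stay short, the example does not rebuild a full tree for each replicate; as a crude stand-in, it counts how often a given pair of taxa is the closest pair by mismatch count. The alignment and all function names are hypothetical.

```python
import random
from itertools import combinations

def resample_columns(alignment, rng):
    """Draw alignment columns (characters) at random, with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in columns) for n in names}

def closest_pair(alignment):
    """Return the (sorted) pair of taxa with the fewest mismatches.

    Ties are broken by alphabetical order of the pairs.
    """
    def mismatches(pair):
        a, b = alignment[pair[0]], alignment[pair[1]]
        return sum(x != y for x, y in zip(a, b))
    return min(combinations(sorted(alignment), 2), key=mismatches)

def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(closest_pair(resample_columns(alignment, rng)) == pair
               for _ in range(replicates))
    return hits / replicates

# Hypothetical four-taxon alignment.
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGTACGT",
    "C": "ACTTACGAACGAACCTACGT",
    "D": "GCTTACGAACGTACCTATGT",
}
print("support for (A, B):", bootstrap_support(alignment, ("A", "B")))
```

In a real analysis each replicate would be passed to the same tree-building method as the original data, and the support for a branch would be the share of replicate trees containing it.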

30 Tree Testing: Bootstrapping MacClade Example: Vertebrate evolution

31 Tree Testing Exercise. In the distance matrix given, which two taxa appear most closely related? Test this hypothesis by calculating split indices for each possible topology. [Figure: tree of taxa W, X, Y, Z.] Split index #1: $\frac{(WY + XZ) - (WX + YZ)}{2}$. Split index #2: $\frac{(WZ + XY) - (WX + YZ)}{2}$.
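The two split indices on this slide can be computed for any of the three topologies by reordering the taxa. This is a minimal sketch; the distance matrix from the exercise is not reproduced in this transcript, so the distances below are hypothetical (they come from an additive tree with an internal branch of length 2).

```python
def split_indices(d, w, x, y, z):
    """Return both split indices for the topology that pairs w with x and y with z.

    d maps frozenset({a, b}) to the distance between taxa a and b.
    """
    def dist(a, b):
        return d[frozenset((a, b))]
    within = dist(w, x) + dist(y, z)
    index1 = ((dist(w, y) + dist(x, z)) - within) / 2
    index2 = ((dist(w, z) + dist(x, y)) - within) / 2
    return index1, index2

# Hypothetical distances: tip branches W=1, X=3, Y=2, Z=4; internal branch = 2.
d = {
    frozenset(("W", "X")): 4, frozenset(("Y", "Z")): 6,
    frozenset(("W", "Y")): 5, frozenset(("X", "Z")): 9,
    frozenset(("W", "Z")): 7, frozenset(("X", "Y")): 7,
}

print("WX|YZ:", split_indices(d, "W", "X", "Y", "Z"))
print("WY|XZ:", split_indices(d, "W", "Y", "X", "Z"))
print("WZ|XY:", split_indices(d, "W", "Z", "X", "Y"))
```

For these distances only the topology pairing W with X gives two positive indices, both equal to the internal branch length; each alternative topology gives a negative index.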

32 Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

33 HIV Example 1: Florida dentist case 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist? HIV evolves so fast that transmission patterns can be reconstructed from viral sequence (molecular forensics). Compared viral sequence from the dentist, three of his HIV+ patients, and two HIV+ local controls.

34 Florida dentist case

35 So what do the results mean? 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses? Do we have enough data to be confident in our conclusions? What additional data would help? If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

36 HIV Example 2: Viral evolution within patients (Markham et al., 1998). Questions: Are differences in patient health due to infection with different viral strains? Does the virus evolve the same way in healthy patients vs. those who fall ill?

37 Results Viruses from patients who stayed healthy vs. those who fell ill did not form distinct evolutionary clusters. Each patient’s viruses formed a distinct evolutionary cluster, except for two. Viruses in patients who fell ill evolved faster than viruses in patients who stayed healthy. At many visits, the virus population seemed to have evolved from viruses not observed at the previous visit — the so-called “Lazarus effect.”

38 Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications V. Conclusions

39 Conclusions Phylogenetics is crucial to understanding many biological questions, including HIV transmission and prognosis. Phylogenetics draws heavily on subjects (graph theory, stochastic processes, complex algorithms) unfamiliar to many biologists. Need math/CS collaboration! Phylogenetic results require lots of interpretation. Seldom get a definitive answer. Phylogenetics has transformed our understanding and practice of biology.

