Phylogenetic Analysis

Presentation on theme: "Phylogenetic Analysis"— Presentation transcript:

1 Phylogenetic Analysis

2 Motivation Approaches
Motivation: the problem of explaining the evolutionary history of today's species.
How do species relate to one another in terms of common ancestors?
Nucleic acids and proteins also evolve.
Approaches: fossil records, phylogenetic trees.

3 General comments on phylogenetics
Phylogenetics is the branch of biology that deals with evolutionary relatedness.
It uses some measure of evolutionary relatedness, e.g. morphological features.
Phylogenetics on sequence data is an attempt to reconstruct the evolutionary history of those sequences.
Relationships between individual sequences are not necessarily the same as those between the organisms they are found in.
The ultimate goal is to be able to use sequence data from many sequences to give information about the phylogenetic history of organisms.
Phylogenetic relationships are usually depicted as trees, with branches representing ancestors of “children”; the tips at the bottom of the tree (individual organisms) are leaves. Individual branch points are nodes.

4 What is phylogenetic analysis and why should we perform it?
Phylogenetic analysis has two major components:
1. Phylogenetic inference or “tree building”: the inference of the branching orders, and ultimately the evolutionary relationships, between “taxa” (entities such as genes, populations, species, etc.).
2. Character and rate analysis: using phylogenies as analytical frameworks for rigorous understanding of the evolution of various traits or conditions of interest.

5 Examine the process of evolution What drives evolution?
Understanding mutation, gene flow and natural selection.
Examine the history of evolution: what has evolution done in the past?
Understanding how living organisms are related and how they have changed over time.
Aim:
The ultimate goal is to be able to use sequence data from many sequences to give information about the phylogenetic history of organisms.
To construct a visual representation (a tree) that describes the assumed evolution occurring between and among different groups (individuals, populations, species, etc.), and to study the reliability of the consensus tree.
Phylogenetic relationships are usually depicted as trees, with branches representing ancestors of “children”; the tips at the bottom of the tree (individual organisms) are leaves. Individual branch points are nodes.

6 Common Phylogenetic Tree Terminology
Terminal nodes (A, B, C, D, E): represent the taxa (genes, populations, species, etc.) used to infer the phylogeny.
Branches or lineages.
Internal nodes or divergence points: represent hypothetical ancestors of the taxa.
Ancestral node or root of the tree.

7 Parts of a Phylogenetic Tree
Branch Node Root Ingroup Outgroup

8 Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: tree with tips Taxon A, Taxon B, Taxon C, Taxon D, Taxon E]
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
The other dimension (along the branches) either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses.
These say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
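
The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming labels only (no branch lengths or trailing semicolon); the function names `parse_newick` and `leaf_sets` are illustrative and do not come from the slides.

```python
def parse_newick(text):
    """Parse a label-only nested-parentheses tree into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        # Otherwise read a leaf label up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaf_sets(node, found):
    """Return the leaves under `node`; append each internal node's leaf set (a clade) to `found`."""
    if isinstance(node, str):
        return frozenset([node])
    leaves = frozenset().union(*(leaf_sets(child, found) for child in node))
    found.append(leaves)
    return leaves


tree = parse_newick("((A,(B,C)),(D,E))")
found_clades = []
leaf_sets(tree, found_clades)
```

Here `found_clades` contains the groups {B, C}, {A, B, C} and {D, E} (plus the full set of taxa), which are the clades described in the text; {A, B} is not among them.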

9 In Phylogenetic trees
Leaves represent present-day species.
Interior nodes represent hypothesized ancestors.
We will only consider binary trees: edges split only into two branches (daughter edges).
Rooted trees have an explicit ancestor; the direction of time is explicit in these trees.
Unrooted trees do not have an explicit ancestor; the direction of time is undetermined in such trees.

10 A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
Which species are the closest living relatives of modern humans?
What were the origins of specific transposable elements?
Plus countless others.

11 Input data for Phylogenetic Reconstruction
Distance matrix
Character state matrix

12 Types of phylogenetic analysis methods
Phenetic: trees are constructed based on observed characteristics, not on evolutionary history.
Cladistic: trees are constructed based on fitting observed characteristics to some model of evolutionary history.
Distance methods.
Parsimony and Maximum Likelihood methods.

13 Distance methods
Distance methods start from a set of distances d_ij between each pair of sequences i, j in the dataset.
d_ij can be the fraction f of sites u where residues x_i and x_j differ, or it can be such a fraction weighted in some way (e.g. the Jukes-Cantor distance).
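
As a rough illustration of the two options, the sketch below computes the raw fraction of differing sites and the Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) f). It assumes gap-free DNA sequences of equal length; the function names and the two ten-base example sequences are illustrative only.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction f of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor(f):
    """Jukes-Cantor distance for an observed fraction f of differing sites."""
    if f >= 0.75:
        raise ValueError("the correction is undefined for f >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * f)


# Illustrative sequences (assumed for this example): 2 of 10 sites differ.
f = p_distance("ATGGCTATTC", "ATCGCTAGTC")
d = jukes_cantor(f)
```

The corrected distance is always at least as large as f, because it also estimates substitutions hidden by multiple changes at the same site.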

14 Parsimony methods
Parsimony methods are based on the idea that the most probable evolutionary pathway is the one that requires the smallest number of changes from some ancestral state.
For sequences, this implies treating each position separately and finding the minimal number of substitutions at each position.
Parsimony methods assign a cost to each tree available to the dataset, then screen those trees and select the most parsimonious.
Screening all the trees available to even a smallish dataset would take too much time; the branch-and-bound method builds trees with increasing numbers of leaves but abandons a topology as soon as the current partial tree costs more than the best complete tree found so far.

15 Example of parsimonious tree building
Tree on left requires only one change, tree on right requires two: left tree is most parsimonious
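
Counting the changes a tree requires at one site can be sketched with Fitch's algorithm, a standard small-parsimony method that the slides do not name. The code below assumes a rooted binary tree written as nested tuples; the taxon names and states are hypothetical, chosen so that one topology needs one change and the other needs two.

```python
def fitch(node, states):
    """Return (candidate ancestral states, minimum number of changes) for one site."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one substitution is required on a daughter branch.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical site: taxa t1 and t2 carry A, taxa t3 and t4 carry G.
site = {"t1": "A", "t2": "A", "t3": "G", "t4": "G"}
_, cost_grouped = fitch((("t1", "t2"), ("t3", "t4")), site)
_, cost_mixed = fitch((("t1", "t3"), ("t2", "t4")), site)
```

With these inputs the first topology costs one change and the second costs two, so the first is the more parsimonious for this site. For a full alignment the per-site costs are summed.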

16 Character State Matrix
A character has a finite number of states.
The taxonomic units for which we want to create a phylogeny are called objects (e.g. species, populations).
Every object has a state vector; all objects share the same characters but not necessarily the same states.

17 Character State Matrix M
M has n rows (objects).
M has m columns (characters).
M_ij denotes the state that object i has for character j.
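
A character state matrix maps naturally onto a list of rows. The sketch below is only an illustration: it uses the first four alignment columns of Species A, B and C from the sequence example in this presentation as n = 3 objects and m = 4 characters, and the helper names are assumptions.

```python
# Rows are objects, columns are characters (here: alignment positions).
objects = ["Species A", "Species B", "Species C"]
M = [
    ["A", "T", "G", "G"],  # state vector of Species A
    ["A", "T", "C", "G"],  # state vector of Species B
    ["T", "T", "C", "A"],  # state vector of Species C
]


def state(i, j):
    """State of object i for character j (0-based indices)."""
    return M[i][j]


def states_of_character(j):
    """Set of distinct states observed for character j across all objects."""
    return {row[j] for row in M}
```

A character whose state set has a single element, such as the second column here, does not distinguish the objects.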

18 Which species are the closest living relatives of modern humans?
[Figure: two alternative trees of humans, chimpanzees, bonobos, gorillas and orangutans, with time scales in MYA (labels 14 and 15-30)]
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are more closely related to humans than either is to gorillas.
The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.

19 A few examples of what can be learned from character analysis using phylogenies as analytical frameworks:
When did specific episodes of positive Darwinian selection occur during evolutionary history?
Which genetic changes are unique to the human lineage?
What was the most likely geographical location of the common ancestor of the African apes and humans?
Plus countless others.

20 The number of unrooted trees increases in a greater than exponential manner with number of taxa
(2N - 5)!! = # unrooted trees for N taxa

21 Inferring evolutionary relationships between the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: unrooted tree of taxa A, B, C, D with the root position marked, and the resulting rooted tree]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

22 Now, try it again with the root at another position:
[Figure: the same unrooted tree of taxa A, B, C, D with the root at a different position, and the resulting rooted tree]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

23 An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
[Figure: unrooted tree 1 (taxa A, B, C, D) with root positions 1 to 5, and the corresponding rooted trees 1a to 1e]
These trees show five different evolutionary relationships among the taxa!

24 There are two major ways to root trees:
By outgroup: uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g., α-globins to root β-globins).
By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.
[Figure: tree of taxa A, B, C, D with branch lengths 10, 3, 2, 2, 5] d(A,D) = 10 + 3 + 5 = 18; midpoint = 18 / 2 = 9
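
Once the two most distant taxa are known, placing the midpoint root is simple arithmetic along the path between them. The sketch below assumes that the path from A to D crosses branches of length 10, 3 and 5 in that order; this ordering is a reading of the figure labels, not something the slide states, and `midpoint_on_path` is an illustrative name.

```python
def midpoint_on_path(branch_lengths):
    """Locate the midpoint of a tip-to-tip path.

    Returns (index of the branch holding the midpoint,
             distance from the start of that branch).
    """
    half = sum(branch_lengths) / 2
    walked = 0.0
    for index, length in enumerate(branch_lengths):
        if walked + length >= half:
            return index, half - walked
        walked += length
    raise ValueError("path must contain at least one branch")


# Assumed path A -> D: branch lengths 10, 3 and 5 (total 18, midpoint at 9).
branch_index, offset = midpoint_on_path([10, 3, 5])
```

Under that assumption the root falls on the first branch, 9 units away from taxon A.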

25 Each unrooted tree theoretically can be rooted anywhere along any of its branches
[Figure: unrooted tree of taxa A, B, C, D, E, F with candidate root positions marked x]
(2N - 3)!! = # rooted trees for N taxa
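
To get a feel for how quickly these counts grow, the double factorials can be evaluated directly. This is a small sketch for strictly binary trees, with (2N - 5)!! unrooted and (2N - 3)!! rooted topologies; the function names are illustrative.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies for n_taxa >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted binary tree topologies for n_taxa >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)
```

For four taxa this gives 3 unrooted and 15 rooted trees, consistent with each unrooted four-taxon tree having five branches on which a root can be placed. For ten taxa there are already 2,027,025 unrooted trees.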

26 Molecular phylogenetic tree building methods:
Are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified as follows:
                           COMPUTATIONAL METHOD
DATA TYPE      Optimality criterion      Clustering algorithm
Characters     PARSIMONY
               MAXIMUM LIKELIHOOD
Distances      MINIMUM EVOLUTION         UPGMA
               LEAST SQUARES             NEIGHBOR-JOINING

27 Types of data used in phylogenetic inference:
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa         Characters
Species A    ATGGCTATTCTTATAGTACG
Species B    ATCGCTAGTCTTATATTACA
Species C    TTCACTAGACCTGTGGTCCA
Species D    TTGACCAGACCTGTGGTCCG
Species E    TTGACCAGTTCTCTAGTTCG
Distance-based methods: transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.
[Distance matrix over Species A, B, C, D, E; values not recovered]
Example 1: uncorrected “p” distance (= observed percent sequence difference).
Example 2: Kimura 2-parameter distance (estimate of the true number of substitutions between taxa).
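
The Kimura 2-parameter distance separates transitions (purine to purine, or pyrimidine to pyrimidine) from transversions. The following is a minimal sketch of the standard formula d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed proportions of transitions and transversions. It assumes gap-free sequences over A, C, G, T of equal length; the function names are illustrative, and the two sequences are Species A and Species B from the alignment in this slide.

```python
import math

PURINES = {"A", "G"}


def count_changes(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned DNA sequences."""
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # change within purines or within pyrimidines
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


def kimura_2p(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    n = len(seq_a)
    transitions, transversions = count_changes(seq_a, seq_b)
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


species_a = "ATGGCTATTCTTATAGTACG"
species_b = "ATCGCTAGTCTTATATTACA"
changes_ab = count_changes(species_a, species_b)
distance_ab = kimura_2p(species_a, species_b)
```

Species A and B differ at 4 of 20 sites (an uncorrected p distance of 0.2); the corrected distance comes out somewhat larger, as expected for an estimate that accounts for unobserved substitutions.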

28 Computational methods for finding optimal trees:
Exact algorithms: "guarantee" to find the optimal or "best" tree for the method of choice. Two types are used in tree building:
Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method.
Branch-and-bound search: eliminates the parts of the search tree that only contain suboptimal solutions.
Heuristic algorithms: approximate or “quick-and-dirty” methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so. Heuristic searches often operate by “hill-climbing” methods.

29 Exact searches become increasingly difficult, and eventually impossible, as the number of taxa increases:
[Figure: unrooted trees growing from three taxa (A, B, C) to six taxa (A to F) as taxa are added one at a time]
(2N - 5)!! = # unrooted trees for N taxa

30 Classification of phylogenetic inference methods
                           COMPUTATIONAL METHOD
DATA TYPE      Optimality criterion      Clustering algorithm
Characters     PARSIMONY
               MAXIMUM LIKELIHOOD
Distances      MINIMUM EVOLUTION         UPGMA
               LEAST SQUARES             NEIGHBOR-JOINING

31 Parsimony methods:
Optimality criterion: the ‘most-parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Advantages:
Are simple, intuitive, and logical (many possible by ‘pencil-and-paper’).
Can be used on molecular and non-molecular (e.g., morphological) data.
Can tease apart types of similarity (shared-derived, shared-ancestral, homoplasy).
Can be used for character analysis (can infer the exact substitutions) and rate analysis.
Can be used to infer the sequences of the extinct (hypothetical) ancestors.
Disadvantages:
Are simple, intuitive, and logical (derived from “Medieval logic”, not statistics!).
Can be fooled by high levels of homoplasy (‘same’ events).
Can become positively misleading in the “Felsenstein Zone”.
[See Stewart (1993) for a simple explanation of parsimony analysis, and Swofford et al. (1996) for a detailed explanation of various parsimony methods.]

32 Bootstrapping Evaluation of the tree reliability
n trees are built from bootstrapped alignments (n = 100, 1000 or 5000).
Count how many times a certain branch is reproduced; the values are reported as percentages.
If the assumptions the method is based on hold, you should always get the same tree from the bootstrapped alignments as you did originally.
The frequency of some feature of your phylogeny in the bootstrapped set gives some measure of the confidence you can have in this feature.
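
The resampling step can be sketched as follows. Alignment columns are drawn with replacement to form each bootstrapped alignment, and the support for a clade is the percentage of replicates in which it reappears. The tree-building step is deliberately left as a caller-supplied function, `infer_clades`, which is a hypothetical placeholder for whatever method is being evaluated; all names here are assumptions.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[c] for c in columns)
        for name, sequence in alignment.items()
    }


def bootstrap_support(alignment, infer_clades, clade, replicates=100, seed=0):
    """Percentage of bootstrap replicates whose inferred tree contains `clade`.

    `infer_clades` is a caller-supplied (hypothetical) function that builds a
    tree from an alignment and returns its clades as a set of frozensets.
    """
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(replicates)
        if clade in infer_clades(resample_columns(alignment, rng))
    )
    return 100.0 * hits / replicates
```

Fixing the seed makes a run repeatable; the number of replicates would be set to one of the values mentioned above.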

33 Parsimony methods
Parsimony methods are based on the idea that the most probable evolutionary pathway is the one that requires the smallest number of changes from some ancestral state.
For sequences, this implies treating each position separately and finding the minimal number of substitutions at each position.

34 Example of parsimonious tree building
Tree on left requires only one change, tree on right requires two: left tree is most parsimonious

35 Parsimony methods assign a cost to each tree available to the dataset, then screen those trees and select the most parsimonious.
Screening all the trees available to even a smallish dataset would take too much time; the branch-and-bound method builds trees with increasing numbers of leaves but abandons a topology as soon as the current partial tree costs more than the best complete tree found so far.

36 Phylogeny in medical forensics: HIV
A dentist who was infected with HIV was suspected of infecting some of his patients in the course of treatment.
HIV evolves very quickly (about 10^-3 substitutions per site per year).
It is possible to trace the history of infections among individuals by conducting a phylogenetic analysis of HIV sequences.
Samples were taken from the dentist, his patients, and other infected individuals in the community.
The study found that 5 patients had been infected by the dentist.
Source: Ou et al. Molecular epidemiology of HIV transmission in a dental practice. Science, 256:

37 Did the Florida Dentist infect his patients with HIV?
Phylogenetic tree of HIV sequences from the dentist, his patients, and local HIV-infected people:
[Figure: tree with tips Patient C, Patient A, Patient G, Patient B, Patient E, Patient A, DENTIST, Local control 2, Local control 3, Patient F, Local control 9, Local control 35, Local control 3, Patient D]
Yes (Patients A, B, C, E and G): the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.
No: Patient F and Patient D.
From Ou et al. (1992) and Page & Holmes (1998)

