Phylogenic trees..

Presentation on theme: "Phylogenic trees.."— Presentation transcript:

1 Phylogenic trees.

2 Phylogenetic Inference
- Taxonomy and phylogenetics
- Phylogenetic trees
- Cladistic versus phenetic analyses
- Homology and homoplasy
- Model of sequence evolution
- Tree building methods
- Phylogenetic networks
- Computer software and demos
DNA/RNA overview

3 Classifying Organisms
Nomenclature is the science of naming organisms.
- Names allow us to talk about groups of organisms.
- Scientific names were originally descriptive phrases, which was not practical.
Binomial nomenclature
- Developed by Linnaeus, a Swedish naturalist.
- Names are in Latin, formerly the language of science.
- Binomials are names consisting of two parts: the generic name is a noun, and the epithet is a descriptive adjective.
- Thus a species' name is two words, e.g. Homo sapiens.
Carolus Linnaeus ( )

4 Classifying Organisms
Taxonomy is the science of the classification of organisms.
- Taxonomy deals with the naming and ordering of taxa.
- The Linnaean hierarchy: 1. Kingdom 2. Division 3. Class 4. Order 5. Family 6. Genus 7. Species
- The difference between classification and identification
Evolutionary distance

5 Classifying Organisms
Systematics is the science of the relationships of organisms.
- Systematics is the science of how organisms are related and the evidence for those relationships.
- Systematics is divided primarily into phylogenetics and taxonomy.
- Speciation: the origin of new species from previously existing ones
  - anagenesis: one species changes into another over time
  - cladogenesis: one species splits to make two
Reconstruct evolutionary history: Phylogeny

6 Phylogenetics Phylogenetics is the science of the pattern of evolution.
A. Evolutionary biology is the study of the processes that generate diversity, while phylogenetics is the study of the pattern of diversity produced by those processes.
B. The central problem of phylogenetics:
  1. How do we determine the relationships between species?
  2. Use evidence from shared characteristics, not differences
  3. Use homologies, not analogies
  4. Use derived condition, not ancestral
    a. synapomorphy - shared derived characteristic
    b. plesiomorphy - ancestral characteristic
C. Cladistics is phylogenetics based on synapomorphies.
  1. Cladistic classification creates and names taxa based only on synapomorphies.
  2. This is the principle of monophyly
  3. monophyletic, paraphyletic, polyphyletic
  4. Cladistics is now the preferred approach to phylogeny
The phylogeny and classification of life as proposed by Haeckel (1866)

7 Phylogenetics Evolutionary theory states that groups of similar organisms are descended from a common ancestor. Phylogenetic systematics is a method of taxonomic classification that groups organisms based on their evolutionary history. It was developed by Hennig, a German entomologist, in 1950. Willi Hennig ( )

8 Phylogenetics Who uses phylogenetics? Some examples:
- Evolutionary biologists (e.g. reconstructing tree of life)
- Systematists (e.g. classification of groups)
- Anthropologists (e.g. origin of human populations)
- Forensics (e.g. transmission of HIV to a rape victim)
- Parasitologists (e.g. phylogeny of parasites, co-evolution)
- Epidemiologists (e.g. reconstruction of disease transmission)
- Genomics/Proteomics (e.g. homology comparison of new proteins)

9 Phylogenetic trees The central problem of phylogenetics:
How do we determine the relationships between taxa? In phylogenetic studies, the most convenient way of presenting evolutionary relationships among a group of organisms is the phylogenetic tree.

10 Phylogenetic trees
- Node: a branchpoint in a tree (a presumed ancestral OTU)
- Branch: defines the relationship between the taxa in terms of descent and ancestry
- Topology: the branching patterns of the tree
- Branch length (scaled trees only): represents the number of changes that have occurred in the branch
- Root: the common ancestor of all taxa
- Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all their descendants
- Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as individuals, populations, species, genera, or bacterial strains
Figure labels: Branch, Node, Clade, Root
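The definitions above can be made concrete with a small sketch. It assumes a rooted tree written as nested Python tuples, where a string is an OTU and a tuple is an internal node; the tree itself and the function names `leaves` and `clades` are invented for illustration and do not come from the slides.

```python
# Hypothetical rooted tree (illustration only): strings are OTUs,
# tuples are internal nodes holding their child subtrees.
tree = ((("A", "B"), "C"), ("D", "E"))


def leaves(node):
    """Return the set of OTU names found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Return one leaf set per internal node; each such set is a clade."""
    if isinstance(node, str):
        return []
    found = [leaves(node)]
    for child in node:
        found.extend(clades(child))
    return found


all_clades = clades(tree)
for clade in all_clades:
    print(sorted(clade))
```

Every internal node contributes exactly one clade (the node plus all of its descendants), so a group such as B + C, which leaves out A, never appears in the list.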

11 Phylogenetic trees There are many ways of drawing a tree

12 Phylogenetic trees There are many ways of drawing a tree

13 Phylogenetic trees There are many ways of drawing a tree
Bifurcation versus multifurcation (e.g. trifurcation)
Multifurcation (also called polytomy): a node in a tree that connects more than three branches. If the tree is rooted, then one of the branches represents an ancestral lineage and the remaining branches represent descendant lineages. A multifurcation may represent a lack of resolution because too few data are available for inferring the phylogeny (in which case it is said to be a soft multifurcation), or it may represent the hypothesized simultaneous splitting of several lineages (in which case it is said to be a hard multifurcation).

14 Phylogenetic trees Trees can be rooted or unrooted

15 Summary Trees can be scaled or unscaled (with or without branch lengths)

16 Phylogenetic trees Exercise: rooted/unrooted; scaled/unscaled
[Figure: example trees with panel labels A B C D E F]

17 Phylogenetic trees Possible evolutionary trees
Taxa (n)   Unrooted/rooted
2          1/1
3          1/3
4          3/15

18 Phylogenetic trees Possible evolutionary trees
Rooted: $\frac{(2n-3)!}{2^{n-2}(n-2)!}$
Unrooted: $\frac{(2n-5)!}{2^{n-3}(n-3)!}$
Taxa (n)   Rooted        Unrooted
2          1             1
3          3             1
4          15            3
5          105           15
6          945           105
7          10,395        945
8          135,135       10,395
9          2,027,025     135,135
10         34,459,425    2,027,025
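As a quick check of how fast these counts grow, the sketch below evaluates the standard formulas for bifurcating trees, (2n-3)!/(2^(n-2) (n-2)!) for rooted and (2n-5)!/(2^(n-3) (n-3)!) for unrooted trees. The function names are illustrative; treating n = 2 as a single unrooted tree is a convention taken from the previous slide, since the unrooted formula itself needs n >= 3.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # convention: two taxa joined by a single branch
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Note that the number of unrooted trees for n taxa equals the number of rooted trees for n - 1 taxa, because a root can be placed on any branch of an unrooted tree.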

19 Phylogenetic trees Rooting using outgroup(s)
- The outgroup should be a taxon known to be less closely related to the rest of the taxa (ingroups).
- It should ideally be as closely related as possible to the rest of the taxa while still satisfying the above condition.
- The root must be somewhere between the outgroup and the ingroup (either on a node or in a branch).
Note the outgroup is not the root or the ancestor itself!

20 Phylogenetics What are useful characters?
Use homologies, not analogies!
- Homology: common ancestry of two or more character states
- Analogy: similarity of character states not due to shared ancestry
- Homoplasy: a collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)
Homoplasy is a huge problem in morphological data sets, but in molecular data sets, too!
[Figure: Cactaceae and Euphorbiaceae]

21 Phylogenetics Molecular data and homoplasy: Orthologs vs. Paralogs
When comparing gene sequences, it is important to distinguish between genes that are the same gene in different organisms and genes that are merely similar.
- Orthologs are homologous genes in different species that derive from a single gene in their common ancestor and usually have equivalent functions.
- Paralogs are similar genes that are the result of a gene duplication.
- A phylogeny that includes both orthologs and paralogs is likely to be incorrect.
- Sometimes phylogenetic analysis is the best way to determine whether a new gene is an ortholog or a paralog of other known genes.

22 Phylogenetic methods Cladistics versus Phenetics
Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.
- Both phenetic and cladistic methods rely on data (objective methods).
  - Evolution is descent with modification, so the characteristics of organisms hold information about evolutionary relationships.
  - Objective analysis of character variation is the foundation of modern phylogenetics.
- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data.

23 Phenetics vs. cladistics
An example

24 Phenetics vs. cladistics
An example Three hypotheses:

25 Phenetics vs. cladistics
Phenetics (overall similarity)

26 Phenetics vs. cladistics
Cladistics (shared derived characters)

27 Phenetics vs. cladistics
The difference between the methods is more than academic: consider how different hypotheses might affect a search for natural products.
Figure labels: Phenetics, Cladistics

28 Phenetics vs. cladistics
Phenetics
- Relies on character data
- Faster algorithms
- Popular for molecular evolution
- Constructs phenograms without recourse to history
- Employs distance methods
- Each character difference is counted equally; large changes have large effects
Cladistics
- Relies on knowledge of ancestral relationships
- Good for physical traits
- Good for deeper levels of taxonomy
- All assumptions are difficult to satisfy for molecular data
- Constructs a cladogram considering possible evolutionary pathways
- Must specify ancestral and derived sequences
Cladistics is becoming the method of choice; it is considered to be more powerful and to provide more realistic estimates. However, it is slower than phenetic algorithms.

29 Phylogenetics Genes vs. Species
Relationships calculated from sequence data represent the relationships between genes; these are not necessarily the same as the relationships between species. Your sequence data may not have the same phylogenetic history as the species from which they were isolated. Different genes evolve at different speeds, and there is always the possibility of horizontal gene transfer (hybridization, vector-mediated DNA movement, or direct uptake of DNA).

30 Phylogenetic Inference
After working with sequences for a while, one develops an intuitive understanding that for a given gene, closely related organisms have similar sequences and more distantly related organisms have more dissimilar sequences. These differences can be quantified. Given a set of gene sequences, it should be possible to reconstruct the evolutionary relationship among genes and among organisms.

31 Phylogenetic Inference
Disclaimers: Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers. This area of computational biology is an intellectual minefield! Neither the theory nor the practical applications of any algorithms are universally accepted throughout the scientific community. The application of different software packages to a data set is very likely to give different answers; minor changes to a data set are also likely to profoundly change the result.

32 Phylogenetic Inference
Which gene to use? Different genes will be best suited to solve different problems:
- The RNA genomes of HIV viruses change so quickly that every person infected carries a different strain.
- Certain enzymes may evolve relatively fast, allowing phylogeographic studies of species distribution post-glaciation.
- Mitochondrial DNA has a relatively fast substitution rate (evolves quickly) and can be used to establish relatively recent divergence.
- For establishing 'deep phylogeny' we need genes that change very slowly (highly conserved ones).
- Different sequences accumulate changes at different rates; choose a level of variation that is appropriate to the group of organisms being studied.
- Proteins (or protein-coding DNAs) are constrained by natural selection.
- Some sequences are highly variable (rRNA spacer regions, immunoglobulin genes), while others are highly conserved (actin, rRNA coding regions).
- Different regions within a single gene can evolve at different rates (conserved vs. variable domains).

33 Phylogenetic Inference I
Are there correct trees? Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets. Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid? Unfortunately, it is impossible to ever conclusively state what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.

34 Phylogenetics What are useful characters?
Use homologies, not analogies!
- Homology: common ancestry of two or more character states
- Analogy: similarity of character states not due to shared ancestry
- Homoplasy: a collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)
Use derived condition, not ancestral
- Synapomorphy ("shared derived character"): homologous traits share the same character state because it originated in their immediate common ancestor
- Plesiomorphy ("shared ancestral character"): homologous traits share the same character state because they are inherited from a common distant ancestor
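A minimal sketch of how shared derived states might be picked out of a character matrix. It assumes binary characters polarized by an outgroup (the outgroup state is taken as ancestral) and assumes no homoplasy, which real data rarely guarantee. The matrix values, taxon names, and the function name `shared_derived` are all invented for illustration.

```python
# Hypothetical character matrix (all values invented): rows are taxa,
# columns are characters. The outgroup state is treated as ancestral.
matrix = {
    "outgroup": [0, 0, 0, 0],
    "taxon_A":  [1, 1, 0, 0],
    "taxon_B":  [1, 1, 0, 1],
    "taxon_C":  [1, 0, 0, 0],
}


def shared_derived(matrix, outgroup="outgroup"):
    """Map character index -> ingroup taxa sharing a derived state.

    A character is reported only if at least two ingroup taxa differ
    from the outgroup state, i.e. it is a candidate synapomorphy.
    """
    ancestral = matrix[outgroup]
    groups = {}
    for index, ancestral_state in enumerate(ancestral):
        carriers = sorted(
            taxon
            for taxon, states in matrix.items()
            if taxon != outgroup and states[index] != ancestral_state
        )
        if len(carriers) >= 2:
            groups[index] = carriers
    return groups


groups = shared_derived(matrix)
print(groups)
```

In this toy matrix, character 0 supports grouping A, B and C, character 1 supports grouping A with B, character 2 is ancestral everywhere and so is uninformative, and character 3 is derived in B alone and therefore groups nothing.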

35 Phenetics versus cladistics
Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.
- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).
Cladistics is becoming the method of choice; it is considered to be more powerful and to provide more realistic estimates. However, it is slower than phenetic algorithms.

36 Genetic Distance DNA distances
- Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences.
- Insertions/deletions are generally given a larger weight than replacements (gap penalties).
- It is possible to correct for multiple substitutions at a single site, which are common in distant relationships and for rapidly evolving sites.
- The distance matrix (rectangular or triangular), here for 7 taxa: Rat, Mouse, Rabbit, Human, Opossum, Chicken, Frog
Uncorrected (observed) distance: p-distance
Corrected (estimated) distance: d-distance
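The difference between the uncorrected p-distance and a corrected distance can be sketched as follows. The slide does not name a correction model, so the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) is used here purely as an assumed example. The sketch also skips alignment columns that contain a gap instead of applying a gap penalty, which is a simplification; the function names are illustrative.

```python
from math import log


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance (assumed model)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
d = jukes_cantor(p)
print(p, d)
```

The corrected value is always at least as large as the observed one, and the gap between them widens as sequences diverge, reflecting multiple substitutions at the same site that a simple count cannot see.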

37 Tree building methods
Genetic Distance
- Unweighted Pair Group (UPGMA)
- Neighbor-Joining
- Fitch & Margoliash
Character-State
- Maximum Parsimony
- Maximum Likelihood

38 Tree building (distance based)
UPGMA
- The simplest of the distance methods is the UPGMA (Unweighted Pair Group Method using Arithmetic averages).
- Many multiple alignment programs such as PILEUP use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm.
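The clustering procedure can be sketched in a few lines. This is a minimal illustration, not the implementation used by PILEUP or any other package: at each step the two closest clusters are joined, and the distance from the new cluster to every other cluster is the average over all member OTUs (weighted by cluster size). The four-taxon distances below are invented, as are the function and variable names.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a list of (cluster, height) tuples in merge order, where a
    cluster is a nested tuple and height is half the joining distance.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of OTUs in it
    d = dict(dist)                          # working copy of the distances
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        new = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((new, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[new] = size_a + size_b
        merges.append((new, height))
    return merges


# Hypothetical distances for four OTUs (invented for illustration).
pairwise = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10,
    frozenset(("C", "D")): 10,
}
merges = upgma(["A", "B", "C", "D"], pairwise)
for cluster, height in merges:
    print(cluster, height)
```

Because UPGMA places each node at half the joining distance, it implicitly assumes a constant rate of change along all lineages; when that assumption fails, the resulting tree can be misleading.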

39 UPGMA A B C D E F G - 63 94 79 111 96 47 67 16 83 100 23 58 89 106 62 107 92 43 20 102

40 UPGMA A B C D E F G - 63 94 79 111 96 47 67 16 83 100 23 58 89 106 62 107 92 43 20 102

41 UPGMA A B C E F DG - 63 94 79 67 16 83 23 58 89 62 84 35 88

42 UPGMA A B E F CDG - 63 67 16 23 58 62 61 64 74

