Introduction to Bioinformatics Molecular Phylogeny Lesson 5.


1 Introduction to Bioinformatics Molecular Phylogeny Lesson 5

2 2 Theory of Evolution: Life is monophyletic All organisms on Earth had a common ancestor. Any two organisms share a common ancestor in their past. Ancestor Descendant 1 Descendant 2

3 3 Theory of Evolution: Speciation events lead to the creation of different species (two species). Speciation is caused by physical separation into groups in which different genetic variants become dominant. Ancestor Descendant 1 Descendant 2

4 4 Ancestor

5 5

6 6

7 7 extinct extant 1 extant 2 The genetic distance between any two extant organisms is computable.

8 8 The differences between 1 and 2 are the result of changes on the lineage leading to descendant 1 + those on the lineage leading to descendant 2. descendant 1 descendant 2 ancestor

9 9 Thus, any set of species is related; this relationship is called a phylogeny. The relationships can be represented by a phylogenetic tree (or dendrogram).

10 10 5 MYA 120 MYA 1,500 MYA MYA = Million Years Ago

11 11 Phylogenetic Tree Terminology Graph composed of nodes & branches Each branch connects two adjacent nodes A B C D E F R

12 12 Phylogenetic Tree Terminology Nodes represent the taxonomic units. Taxonomic units = species/genes/individuals. Branch = relations among the taxonomic units (descent & ancestry). Branching pattern = topology. Branch lengths correspond to the number of substitutions: a longer branch means more substitutions.

13 13 Phylogenetic Tree Terminology Root. Branches. Internal node: hypothetical most recent common ancestor. Leaf (terminal node): current-day species or gene (“taxa”). [Figure: rooted tree with leaves A, B, C, D, E]

14 14 OTUs & HTUs OTUs = Operational Taxonomic Units –leaves of the tree HTUs = Hypothetical Taxonomic Units –internal nodes of the tree

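15 15 Trees [Figure: the Chimp, Human, Gorilla tree drawn with its leaves in different left-to-right orders; the drawings are marked as equal (=)]

16 16 Same thing [Figure: two drawings of one tree on s1, s2, s3, s4, s5, marked as equal (=)]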

17 17 Newick format A B C D E ((A,B),(C,(D,E)));
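A minimal sketch of how the Newick string on this slide maps onto a tree, assuming plain leaf names with no branch lengths or internal-node labels. The names `parse_newick` and `leaves` are illustrative and do not come from the lesson or from any library.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples.

    Leaves become strings, internal nodes become tuples of children.
    Assumption: no branch lengths, no internal labels, no whitespace.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """List the leaf names of a nested-tuple tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((A,B),(C,(D,E)));")
print(tree)          # (('A', 'B'), ('C', ('D', 'E')))
print(leaves(tree))  # ['A', 'B', 'C', 'D', 'E']
```

Each pair of parentheses corresponds to one internal node, so the nesting of the tuples mirrors the branching pattern of the drawn tree.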

18 18 Rooted vs. unrooted trees [Figure: a rooted tree and an unrooted tree on taxa 1, 2, 3]

19 19 Gorilla gorilla (Gorilla) Homo sapiens (human) Pan troglodytes (Chimpanzee) Gallus gallus (chicken)

20 20 3 possible UNROOTED trees: (1) Human, Chimp | Chicken, Gorilla; (2) Human, Gorilla | Chimp, Chicken; (3) Human, Chicken | Chimp, Gorilla. One of them is marked as the best tree.

21 21 Rooting based on a priori knowledge: Human Chimp Chicken Gorilla / Human Chimp Chicken Gorilla

22 22 Ingroup / Outgroup: INGROUP: Human, Chimp, Gorilla. OUTGROUP: Chicken.

23 23 Monophyletic groups (clades): A group is monophyletic (a clade) if it has a common ancestor and all the descendants of this ancestor are in the group.

24 24 Monophyletic groups Human Chimp Chicken Gorilla Gorilla + Human + Chimp are monophyletic.
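The clade definition can be checked mechanically on a rooted tree. The sketch below assumes the tree is stored as nested tuples and that the root lies between Chicken and the rest; `leaf_set` and `is_monophyletic` are illustrative names.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the members of `group`."""
    group = frozenset(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Assumed rooted topology: Chicken is the outgroup.
rooted = ((("Human", "Chimp"), "Gorilla"), "Chicken")

print(is_monophyletic(rooted, {"Gorilla", "Human", "Chimp"}))  # True
print(is_monophyletic(rooted, {"Human", "Gorilla"}))           # False
```

Human + Gorilla fails the test because their common ancestor also has Chimp as a descendant, so the group leaves out part of the clade.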

25 25 Non-monophyletic groups Whale Chimp Drosophila Zebra-fish Zebra-fish + Whale are not monophyletic: adaptation to water occurred more than once during evolution, independently (or was lost in the lineage leading to chimp).

26 26 Monophyletic groups: Human Chimp Chicken Gorilla Rat When an unrooted tree is given, you cannot know which groups are monophyletic; you can only say which are not. For example, Chicken + Rat might be monophyletic if the root were between Chicken + Rat and the rest. In fact, the real root of the tree is between Chicken and the rest, hence Chicken and Rat are not monophyletic. Human and Gorilla, however, are not monophyletic no matter where the root is.

27 27 What data can be used? (1) Molecular data (DNA, RNA, proteins) (2) Morphological data (living or fossilized organisms)

28 28 Advantages of molecular data: Heritable entities Characters’ description is unambiguous Molecular data are amenable to quantitative treatment Can assess evolutionary relationship among distantly related organisms (ribosomal RNA) More abundant data (bacteria, algae)

29 29 What can we learn from a phylogenetic tree? Determining the closest relatives of the organism that you are interested in.

30 30 Example 1: Which species are closest to Human? Pre-molecular analysis: the great apes (chimpanzee, gorilla & orangutan) are separate from the human. Molecular analysis: the chimpanzee is more closely related to the human than the gorilla is. [Figure: two trees on Human, Chimpanzee, Gorilla, Orangutan, one per analysis]

31 31 Example 2 : Guilty Sequence - scientists map a murder weapon “In 1998, a Louisiana doctor was convicted of attempting to murder his ex-girlfriend, a nurse. The murder weapon was a syringe of HIV-infected blood drawn from a patient under the doctor's care.”

32 32 History of the virus: Phylogenetic analysis of the RT region. The smaller set of boxed sequences represents the sequences from the victim, and the larger set of boxed sequences represents the patient plus victim sequences. LA denotes viral sequences from control HIV-1 infected individuals. Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297. ©2002 National Academy of Sciences, U.S.A.

33 33 Species trees and Gene trees Species trees - representing the evolutionary relationships among species (the speciation process). Gene trees – Different genes may have different evolutionary history.

34 34 What is Homology? Before Darwin, homology was defined morphologically, as similarity between properties in various species. Example: Bats and butterflies both fly, but the structures are different. Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike. Conclusions: 1. Bat wings and butterfly wings are not homologous. 2. Bat wings and whale flippers are homologous.

35 35 Homology Interpretation: from Darwin to the 21st Century Darwin (1859): Homology is a result of descent with modifications from a common ancestor. Modern genetics: Homology is determined by genes. Two sequences are homologous if they are similar and share a common ancestor (similarity by itself is not enough). Large enough similarities typically imply homology.

36 36 Homolog A gene related to a second gene by descent from a common ancestral DNA sequence.

37 37 Orthologs Homologous sequences are orthologous if they were separated by a speciation event: If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

38 38 Orthologs Orthologs will typically have the same or similar function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

39 39 Orthologs [Figure: a speciation event splitting an ancestor into descendants]

40 40 Paralogs Homologous sequences are paralogous if they were separated by a gene duplication event: If a gene in an organism is duplicated, then the two copies are paralogous.

41 41 Paralogs Orthologs will typically have the same or similar function. This is not always true for paralogs: because the original selective pressure no longer acts on one copy of the duplicated gene, that copy is free to mutate and acquire new functions.

42 42 Paralogs [Figure: a gene duplication event]

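43 43 Orthologs & Paralogs [Figure: a duplication followed by a speciation into Species a and Species b; gene copies separated by the duplication are labeled Paralogs, copies separated by the speciation are labeled Orthologs]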
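The two definitions reduce to one rule: look at the event at which two homologous genes last shared an ancestor. The sketch below assumes a gene tree whose internal nodes are already annotated as "speciation" or "duplication"; the gene names a1, a2, b1, b2 and the function names are hypothetical.

```python
def genes(node):
    """Return the set of gene names below a node of the gene tree."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return genes(left) | genes(right)


def relationship(tree, g1, g2):
    """Classify two genes by the event at their most recent common ancestor."""
    if g1 == g2 or not {g1, g2} <= genes(tree):
        raise ValueError("expected two different genes that are both in the tree")
    event, left, right = tree
    for child in (left, right):
        if {g1, g2} <= genes(child):
            # Both genes sit in the same subtree, so their common ancestor is lower.
            return relationship(child, g1, g2)
    return "orthologs" if event == "speciation" else "paralogs"


# Assumed scenario: one duplication, then a speciation into species a and b.
gene_tree = (
    "duplication",
    ("speciation", "a1", "b1"),  # copy 1 in species a and species b
    ("speciation", "a2", "b2"),  # copy 2 in species a and species b
)

print(relationship(gene_tree, "a1", "b1"))  # orthologs
print(relationship(gene_tree, "a1", "a2"))  # paralogs
print(relationship(gene_tree, "a1", "b2"))  # paralogs
```

Note that a1 and b2 come out as paralogs even though they are in different species, because the event that separated them was the duplication.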

44 44 How many rooted trees? TR = “TREE ROOTED”. N=2, TR(2) = 1: (a,b). N=3, TR(3) = 3: ((a,b),c), ((a,c),b), ((b,c),a). N=4, TR(4) = 15. [Figure: the 15 rooted trees on a, b, c, d]
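One way to see where 1, 3 and 15 come from is to build the trees by adding one taxon at a time to every branch of every smaller tree, including the branch above the root. This is a sketch of that idea with illustrative function names, not a method stated in the lesson.

```python
def insert_leaf(tree, leaf):
    """Yield every rooted tree obtained by attaching `leaf` to one branch of `tree`."""
    # Attach on the branch above this node (for the root, this makes a new root).
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def rooted_trees(taxa):
    """Return all rooted binary trees on `taxa` as nested tuples."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insert_leaf(tree, leaf)]
    return trees


def canonical(tree):
    """Order-independent form of a tree, used to confirm there are no duplicates."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


for n in (2, 3, 4):
    trees = rooted_trees(list("abcd"[:n]))
    distinct = {canonical(tree) for tree in trees}
    print(n, len(trees), len(distinct))
```

A rooted tree with n leaves has 2n-1 attachment points, so each tree on 3 taxa gives 5 trees on 4 taxa, and 3 x 5 = 15.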

45 45 Number of possible trees:

Number of taxa    Number of rooted trees    Number of unrooted trees
 2                             1                           1
 3                             3                           1
 4                            15                           3
 5                           105                          15
 6                           945                         105
 7                        10,395                         945
 8                       135,135                      10,395
 9                     2,027,025                     135,135
10                    34,459,425                   2,027,025
11                   654,729,075                  34,459,425
12                13,749,310,575                 654,729,075

46 46 Number of possible trees: $N_{\text{rooted}} = \dfrac{(2n-3)!}{2^{\,n-2}\,(n-2)!}$, $N_{\text{unrooted}} = \dfrac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$
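The two formulas are easy to evaluate directly. The sketch below uses illustrative function names and treats n = 2 for unrooted trees as a special case (a single branch), since the formula itself would need (-1)!.

```python
from math import factorial


def n_rooted(n):
    """Number of rooted binary trees on n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of unrooted binary trees on n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # special case: one branch joining the two taxa
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 13):
    print(f"{n:>2} {n_rooted(n):>18,} {n_unrooted(n):>16,}")

print(f"{n_rooted(20):,}")  # 8,200,794,532,637,891,559,375
```

The number of unrooted trees on n taxa equals the number of rooted trees on n-1 taxa, which is why the two columns of the table are shifted copies of each other.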

47 47 Evolution is a historical process. Only one historical narrative is true. Of the 8,200,794,532,637,891,559,375 possibilities for 20 taxa, 1 possibility is true and 8,200,794,532,637,891,559,374 are false. Truth is one, falsehoods are many.

48 48 How do we know which of the 8,200,794,532,637,891,559,375 trees is true? We don’t, we infer by using decision criteria.

49 49 Methods

50 50 Approach 1 - Distance methods. Two steps: –Compute a distance between every pair of sequences from the MSA (multiple sequence alignment). –Find the tree that agrees most with the distance table. Approach 2 - Character state methods. Input: multiple sequence alignment. Algorithms: –Maximum parsimony (MP) –Maximum likelihood (ML)

51 51 Step 1: Distance estimation There are different methods to compute the distance between any two sequences. For example, one can take into account different probabilities for transitions and transversions. Example distance table for OTUs A, B, C, D:

OTU    A    B    C
B      8
C      7    9
D     12   14   11
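As one concrete way to treat transitions and transversions differently, the sketch below computes the uncorrected proportion of differing sites (p-distance) and the Kimura two-parameter distance. The slide does not name a specific model, so the choice of Kimura's model, the function names and the example sequences are all assumptions made for illustration.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ (no correction)."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q))."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


s1 = "ACGTACGTAC"  # made-up sequences
s2 = "ACGTGCGTAT"
print(count_differences(s1, s2))  # (10, 2, 0)
print(p_distance(s1, s2))         # 0.2
print(k2p_distance(s1, s2))
```

The corrected distance is larger than the p-distance because it also accounts for substitutions that hit the same site more than once.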

52 52 Step 2: From a distance table to a tree Algorithms: –UPGMA –Neighbor Joining (NJ)
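The slide only names UPGMA, so here is a hedged sketch of the standard procedure: repeatedly merge the two closest clusters and recompute distances as size-weighted averages. It returns the topology only (no branch lengths), and the distance values and function name are illustrative.

```python
from itertools import combinations


def upgma(distances):
    """Cluster OTUs by UPGMA.

    distances: dict mapping (name1, name2) to a distance.
    Returns the rooted topology as nested tuples.
    """
    d = {frozenset(pair): float(value) for pair, value in distances.items()}
    size = {}  # active cluster -> number of OTUs it contains
    for pair in distances:
        for name in pair:
            size[name] = 1
    while len(size) > 1:
        # Pick the pair of active clusters with the smallest distance.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        for other in size:
            if other in (a, b):
                continue
            # Average of the two old distances, weighted by cluster size.
            d[frozenset((merged, other))] = (
                size[a] * d[frozenset((a, other))]
                + size[b] * d[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    (tree,) = size
    return tree


# Illustrative distances between four OTUs.
example = {
    ("A", "B"): 8, ("A", "C"): 7, ("B", "C"): 9,
    ("A", "D"): 12, ("B", "D"): 14, ("C", "D"): 11,
}
print(upgma(example))  # ('D', ('B', ('A', 'C')))
```

Unlike Neighbor Joining, UPGMA always joins the pair with the smallest raw distance, so it implicitly assumes that all lineages evolve at the same rate.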

53 53 Neighbor Joining (NJ) –Reconstructs an unrooted tree –Calculates branch lengths –Based on star decomposition. In each stage, the two nearest nodes of the tree are chosen and defined as neighbors in our tree. This is done recursively until all of the nodes are paired together.

54 54 What are neighbours? Neighbours are defined as a pair of OTUs that have one internal node connecting them. Neighbors, we are … [Figure: unrooted tree with leaves A, B, C, D] A and B are neighbours, C and D are neighbours, but A and C are not neighbours.

55 55 Which pair is closest? Average distance of node i from all other nodes: $r_i = \sum_{k} d_{ik} / (N-2)$. Distance of i, j relative to the rest: $M_{ij} = d_{ij} - (r_i + r_j)$.
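A small sketch of this selection step, assuming distances are stored per unordered pair. The example distances are made up: they come from a hypothetical tree in which A and B are neighbours and C and D are neighbours, but B and D sit on long branches, so the smallest raw distance is between the non-neighbours A and C.

```python
from itertools import combinations


def nj_criterion(names, d):
    """Return (r, M) for the NJ selection step.

    r[i] = sum_k d[i, k] / (N - 2)
    M[i, j] = d[i, j] - (r[i] + r[j])
    d maps frozenset({i, j}) to the distance between i and j.
    """
    n = len(names)
    r = {
        i: sum(d[frozenset((i, k))] for k in names if k != i) / (n - 2)
        for i in names
    }
    m = {
        frozenset((i, j)): d[frozenset((i, j))] - (r[i] + r[j])
        for i, j in combinations(names, 2)
    }
    return r, m


names = ["A", "B", "C", "D"]
# Made-up additive distances (branches: A=1, B=5, C=1, D=5, internal=1).
d = {
    frozenset("AB"): 6, frozenset("AC"): 3, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 11, frozenset("CD"): 6,
}
r, m = nj_criterion(names, d)
for pair in sorted(m, key=m.get):
    print(sorted(pair), d[pair], m[pair])
```

Here the pairs A, B and C, D get the smallest M (-14), while A, C, which has the smallest raw distance (3), gets -13. Subtracting r corrects for the long branches.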

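56 56 [Figure: worked NJ example. Distance table for OTUs A, B, C, D, E next to a star tree; B and D are joined first into (B,D), giving a reduced distance table for A, (B,D), C, E]

57 57 [Figure: worked NJ example, continued. From the reduced table for A, (B,D), C, E, the nodes C and E are joined into (C,E), giving the final unrooted tree with the pairs (B,D) and (C,E) and the leaf A]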
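The join-and-reduce loop of the worked example can be sketched as follows. This is a compact version of the usual Neighbor Joining formulas under stated assumptions, not the lesson's own code: the branch-length and distance-update formulas are the standard ones, and the five-taxon distances are made up from a hypothetical tree rather than taken from the slide's table.

```python
from itertools import combinations


def neighbor_joining(names, distances):
    """Build an unrooted tree by Neighbor Joining and return it as a Newick string.

    distances: dict mapping (name1, name2) to a distance, for at least 3 OTUs.
    """
    d = {frozenset(pair): float(value) for pair, value in distances.items()}
    nodes = list(names)  # active nodes, each stored as a Newick fragment
    if len(nodes) < 3:
        raise ValueError("need at least 3 OTUs")

    def dist(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(dist(i, k) for k in nodes if k != i) / (n - 2) for i in nodes}
        # Join the pair with the smallest M_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: dist(*p) - r[p[0]] - r[p[1]])
        b_i = 0.5 * (dist(i, j) + r[i] - r[j])  # branch from i to the new node
        b_j = dist(i, j) - b_i                  # branch from j to the new node
        u = f"({i}:{b_i:g},{j}:{b_j:g})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (dist(i, k) + dist(j, k) - dist(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Three nodes left: they meet at one internal node.
    x, y, z = nodes
    b_x = 0.5 * (dist(x, y) + dist(x, z) - dist(y, z))
    b_y = dist(x, y) - b_x
    b_z = dist(x, z) - b_x
    return f"({x}:{b_x:g},{y}:{b_y:g},{z}:{b_z:g});"


# Made-up additive distances for a hypothetical tree ((A,B),C,(D,E)).
example = {
    ("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 6, ("A", "E"): 10,
    ("B", "C"): 8, ("B", "D"): 7, ("B", "E"): 11,
    ("C", "D"): 7, ("C", "E"): 11, ("D", "E"): 6,
}
print(neighbor_joining(list("ABCDE"), example))
```

Because these distances fit a tree exactly, the output should group A with B and D with E and reproduce the branch lengths used to generate them. With real data the distances are only approximately tree-like.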

58 58 Advantages and disadvantages of NJ Advantages –is fast and thus suited for large datasets and for bootstrap analysis –permits lineages with largely different branch lengths –permits correction for multiple substitutions Disadvantages –sequence information is reduced –gives only one possible tree –strongly dependent on the model of evolution used.

