CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:

1 CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu HAPPY CHINESE NEW YEAR

2 Outline: Introduction to evolution. What is phylogeny and phylogenetics. Applications of phylogenetics. Algorithms for phylogenetic inference.

3 How did life evolve on earth? Courtesy of the Tree of Life project. An international effort to understand how life evolved on earth. Biomedical applications: drug design, protein structure and function prediction, biodiversity.

4 Evolution Evolution of new organisms is driven by: (1) Mutations ◦ The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc. (2) Selection bias.

5 Theory of Evolution Basic idea: ◦ Speciation events lead to the creation of different species. ◦ Speciation is caused by physical separation into groups where different genetic variants become dominant. Any two species share a (possibly distant) common ancestor.

6 Primate evolution A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species; it is also called a phylogenetic tree.

7 DNA Sequence Evolution [Figure: a tree of DNA sequences on a timeline from -3 mil yrs to today. The ancestral sequence AAGACTT (-3 mil yrs) gives rise to TGGACTT and AAGGCCT (-2 mil yrs), then to AGGGCAT, TAGCCCT, and AGCACTT (-1 mil yrs), and finally to the present-day sequences TAGCCCA, TAGACTT, AGCGCTT, AGCACAA, and AGGGCAT.]

8 Morphological vs. Molecular Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc. Modern biological methods allow the use of molecular features: ◦ Gene sequences ◦ Protein sequences ◦ Whole genome sequences, e.g. rearrangements

9 Morphological topology (Based on McKenna and Bell, 1997) [Figure: tree over Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra]

10 From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use (e.g. mitochondrial vs. nuclear proteins).

11 Mitochondrial topology (Based on Pupko et al.,) [Figure: tree over Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia]

12 Phylogenetic trees Leaves: current-day species (or taxa, plural of taxon). Internal vertices: hypothetical common ancestors. Edge lengths: “time” from one speciation to the next. [Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
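
The leaf / internal vertex / edge length vocabulary maps directly onto a small data structure. The sketch below is only an illustration: the `Node` class, the `leaves` and `root_to_leaf` helpers, the topology, and every edge length are hypothetical choices, not taken from the slide's figure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A vertex of a rooted phylogenetic tree."""
    name: str = ""          # taxon label for a leaf; empty for a hypothetical ancestor
    length: float = 0.0     # length of the edge from the parent to this vertex
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

def leaves(node):
    """Return the leaf labels below `node`, left to right."""
    if node.is_leaf():
        return [node.name]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found

def root_to_leaf(node, target, so_far=0.0):
    """Sum of edge lengths from `node` down to the leaf `target`, or None if absent."""
    if node.is_leaf():
        return so_far if node.name == target else None
    for child in node.children:
        total = root_to_leaf(child, target, so_far + child.length)
        if total is not None:
            return total
    return None

# Hypothetical topology and edge lengths, chosen so every leaf is equally far from the root.
root = Node(children=[
    Node(length=2.0, children=[Node("Aardvark", 3.0), Node("Elephant", 3.0)]),
    Node(length=1.0, children=[
        Node("Bison", 4.0),
        Node(length=1.5, children=[Node("Chimp", 2.5), Node("Dog", 2.5)]),
    ]),
])

print(leaves(root))
print(root_to_leaf(root, "Chimp"))
```

If edge lengths are read as elapsed time, all present-day leaves sit at the same distance from the root, which is why the made-up lengths here were chosen to sum to the same value on every root-to-leaf path.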

13 Types of Trees A natural model to consider is that of rooted trees. [Figure: rooted tree with the root labeled "Common Ancestor"]

14 Types of trees An unrooted tree represents the same phylogeny without the root node. Depending on the model, data from current-day species does not distinguish between different placements of the root.

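15 Rooted versus unrooted trees [Figure: an unrooted tree with candidate root positions a, b, and c, shown with Tree a, Tree b, and Tree c.] The unrooted tree represents the three rooted trees.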

16 What is phylogenetics? Phylogenetics is the study of evolutionary relationships among and within species. ◦ Inference of trees from data ◦ Interpreting the evolutionary tree ◦ Application of evolutionary trees [Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

17 What is phylogenetics? [Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials] This is an example of a phylogenetic tree.

18 Applications of phylogenetics Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist? Conservation: How much gene flow is there among local populations of island foxes off the coast of California? Medicine: What are the evolutionary relationships among the various prion-related diseases? HIV case

19 Applications of phylogenetics 1. Forensics Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

20 Phylogenetic analysis

21 So what do the results mean? 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses? Do we have enough data to be confident in our conclusions? What additional data would help? If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

22 Applications of phylogenetics 2. Conservation How much gene flow is there among local populations of island foxes off the coast of California?

23 http://bioquest.org/bedrock/ Wayne, R. K., Morin, P. A. 2004. Conservation Genetics in the New Molecular Age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)

24 Applications of phylogenetics 3. Medicine What are the evolutionary relationships among the various prion-related diseases?

25 Inferring Phylogenies Trees can be inferred from: ◦ Morphology of the organisms ◦ Sequence comparison
Example:
Orc:    ACAGTGACGCCCCAAACGT
Elf:    ACAGTGACGCTACAAACGT
Dwarf:  CCTGTGACGTAACAAACGA
Hobbit: CCTGTGACGTAGCAAACGA
Human:  CCTGTGACGTAGCAAACGA
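
One simple way to compare the example sequences is to count mismatched positions (the Hamming distance), which works here because all five sequences have the same length. This is a minimal sketch, not the method prescribed by the lecture; the names `SEQS`, `hamming`, and `distance_table` are made up for illustration.

```python
from itertools import combinations

# Sequences copied from the slide; each is 19 bases long.
SEQS = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def distance_table(seqs):
    """Hamming distance for every unordered pair of named sequences."""
    return {(p, q): hamming(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}

for (p, q), d in distance_table(SEQS).items():
    print(f"{p:>6} vs {q:<6} {d}")
```

The pairs with the fewest differences (Hobbit and Human are identical, Dwarf differs from them at one position, Orc and Elf differ at two) are the natural candidates for grouping together in a tree.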

26 How Many Trees? (assuming bifurcation only) [Table to fill in: for 3, 4, 5, 6, 10, 30, and N sequences, the number of pairwise distances, and the number of trees and of branches per tree for unrooted and for rooted trees.]

27 How Many Trees?
                                   Unrooted trees                 Rooted trees
# sequences  # pairwise distances  # trees       # branches/tree  # trees       # branches/tree
3            3                     1             3                3             4
4            6                     3             5                15            6
5            10                    15            7                105           8
6            15                    105           9                945           10
10           45                    2,027,025     17               34,459,425    18
30           435                   8.69 × 10^36  57               4.95 × 10^38  58
For N sequences: $\frac{N(N-1)}{2}$ pairwise distances; $\frac{(2N-5)!}{2^{N-3}(N-3)!}$ unrooted trees with $2N-3$ branches each; $\frac{(2N-3)!}{2^{N-2}(N-2)!}$ rooted trees with $2N-2$ branches each.
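
The closed-form counts on this slide are easy to check numerically. The sketch below evaluates them with exact integer arithmetic; the function names are hypothetical, and bifurcating trees are assumed as on the previous slide.

```python
from math import factorial

def num_unrooted(n):
    """Number of unrooted bifurcating trees on n labeled leaves: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def num_rooted(n):
    """Number of rooted bifurcating trees on n labeled leaves: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def num_pairwise(n):
    """Number of pairwise distances among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2

for n in (3, 4, 5, 6, 10, 30):
    print(n, num_pairwise(n), num_unrooted(n), 2 * n - 3, num_rooted(n), 2 * n - 2)
```

Each unrooted tree has 2N - 3 branches and a root can be placed on any one of them, so `num_rooted(n)` equals `num_unrooted(n) * (2 * n - 3)`. The rapid growth of both counts is the reason exhaustive search over trees is only feasible for a handful of sequences.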

28 Phylogenetic Methods Many different procedures exist. Three of the most popular: Neighbor-joining: minimizes distance between nearest neighbors. Maximum parsimony: minimizes total evolutionary change. Maximum likelihood: maximizes likelihood of observed data.
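
To make "minimizes total evolutionary change" concrete, the sketch below scores two candidate trees by the minimum number of base changes each one requires. It uses Fitch's small-parsimony algorithm, which the slides do not name, so treat that choice as an assumption. The sequences are the ones from the "Inferring Phylogenies" slide; the two topologies and all function names are hypothetical.

```python
SEQS = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}

def fitch_site(tree, column):
    """Return (candidate ancestral bases, minimum changes) for one alignment column.

    `tree` is a leaf name or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):               # leaf: its observed base, no change needed
        return {column[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, column)
    right_set, right_cost = fitch_site(right, column)
    common = left_set & right_set
    if common:                              # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Total minimum number of changes over all columns of the alignment."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total

# Two hypothetical rooted topologies over the same five taxa.
tree_a = (("Orc", "Elf"), ("Dwarf", ("Hobbit", "Human")))
tree_b = (("Orc", "Dwarf"), ("Elf", ("Hobbit", "Human")))

print(parsimony_score(tree_a, SEQS), parsimony_score(tree_b, SEQS))
```

Maximum parsimony would prefer `tree_a`, since it explains the same data with fewer changes. A full method also has to search over topologies, which this sketch does not attempt.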

29 Comparison of Methods
Neighbor-joining: Very fast. Easily trapped in local optima. Good for generating tentative tree, or choosing among multiple trees.
Maximum parsimony: Slow. Assumptions fail when evolution is rapid. Best option when tractable (<30 taxa, strong conservation).
Maximum likelihood: Very slow. Highly dependent on assumed evolution model. Good for very small data sets and for testing trees built using other methods.

30 Distance-based tree construction Goal: a weighted tree that realizes the distances between the objects. Given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny that best “fits” the distances.

31 Distance Matrix Given n species, we can compute the n x n distance matrix $D_{ij}$. $D_{ij}$ may be defined as the edit distance between a gene in species i and the same gene in species j, where the gene of interest is sequenced for all n species.
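
As a rough illustration of building such a matrix, the sketch below fills an n x n table with edit distances. It assumes the unit-cost (Levenshtein) variant of edit distance, where each substitution, insertion, or deletion costs 1; the slide does not fix the costs. The function names and the three short "gene" strings are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b, by dynamic programming."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # match or substitute
            ))
        prev = curr
    return prev[-1]

def distance_matrix(seqs):
    """Symmetric n x n matrix D with D[i][j] = edit_distance(seqs[i], seqs[j])."""
    n = len(seqs)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(seqs[i], seqs[j])
    return D

# Hypothetical toy "genes", one per species.
genes = ["AAGACTT", "TGGACTT", "AAGGCCT"]
for row in distance_matrix(genes):
    print(row)
```

Only the upper triangle is computed, since the distance is symmetric and the diagonal is zero. That is the N(N - 1)/2 pairwise distances counted earlier in the lecture.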

32 Distances in Trees Edges may have weights reflecting: ◦ Number of mutations on the evolutionary path from one species to another ◦ Time estimate for evolution of one species into another. In a tree T, we often compute $d_{ij}(T)$, the length of the path between leaves i and j.

33 Distance in Trees: an Example $d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$ [Figure: weighted tree with the path between leaves i and j highlighted]
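
The path sum above can be reproduced in code. The slide's figure is not available here, so the tree below is a hypothetical reconstruction: only the five weights on the path between leaves 1 and 4 come from the slide, while the internal vertex names, the other leaves, and their edge weights are invented.

```python
from collections import defaultdict

def build_tree(edges):
    """Undirected adjacency map {vertex: {neighbour: weight}} from (u, v, weight) triples."""
    tree = defaultdict(dict)
    for u, v, w in edges:
        tree[u][v] = w
        tree[v][u] = w
    return tree

def path_length(tree, start, goal):
    """Length of the unique path between two vertices of a weighted tree."""
    stack = [(start, None, 0)]              # (vertex, vertex we came from, distance so far)
    while stack:
        vertex, parent, dist = stack.pop()
        if vertex == goal:
            return dist
        for neighbour, weight in tree[vertex].items():
            if neighbour != parent:         # a tree has no cycles, so never step back
                stack.append((neighbour, vertex, dist + weight))
    raise ValueError("goal is not reachable from start")

# Leaves are integers, internal vertices are letters (all hypothetical labels).
EDGES = [
    (1, "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", 4, 12),  # path from the slide
    ("a", 2, 11), ("b", 3, 10), ("c", 5, 9), ("d", 6, 8),                         # invented side branches
]
TREE = build_tree(EDGES)

print(path_length(TREE, 1, 4))
```

Because a tree contains exactly one path between any two vertices, a plain depth-first walk that never steps back to the vertex it came from is enough; no shortest-path algorithm is needed.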

34 Fitting Distance Matrix Given n species, we can compute the n x n distance matrix $D_{ij}$. Evolution of these genes is described by a tree that we don’t know. We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$.
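
The slide does not say what "best fits" means formally. One common choice, given here only as an assumption, is a least-squares criterion that compares the tree path lengths $d_{ij}(T)$ with the observed distances $D_{ij}$:

```latex
\min_{T} \; \sum_{1 \le i < j \le n} \bigl( d_{ij}(T) - D_{ij} \bigr)^{2}
```

The fit is perfect when $d_{ij}(T) = D_{ij}$ for every pair of leaves i and j, in which case the sum is zero.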

35 Summary: Evolution and phylogeny. Concepts of phylogenetics. Applications of phylogenetics. Categories of phylogenetic inference algorithms. Next lecture: Detailed algorithms for phylogenetic inference.

36 Acknowledgement Anonymous authors

