Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.

1 Chapter 10 Phylogenetic Basics

2 Molecular evolution and molecular phylogenetics
Similarities and divergence between biological sequences are often represented by phylogenetic trees. Phylogenetics is the study of the evolutionary history of organisms. In the Victorian era it was based on fossil data, but more recently it has been based on molecular data: the sequences of biological polymers provide a history of changes.
Advantages of molecular phylogenetics: molecular data are more numerous than fossils, no sampling bias is involved, and more robust phylogenetic trees can be constructed.

3 Major assumptions
The sequences used must be homologous. Phylogenetic divergence is assumed to be bifurcating (i.e., forking). Each position in the sequence is assumed to have evolved independently. The variability must be informative enough to construct unambiguous trees.

4 Terminology
[Figure: labelled tree illustrating branch, taxon, node, root node, clade, monophyletic lineage, dichotomy and polytomy]

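5 [Figure: taxa A, B, C and D drawn as an unrooted tree and as a rooted tree]
An unrooted tree carries no knowledge of the common ancestor: it shows only relative relationships, with no evolutionary direction. To root an unrooted tree, use an outgroup (a distant relative, e.g., a bird for a mammal tree) or midpoint rooting (placing the root at the midpoint of the path between the two most divergent taxa).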
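Midpoint rooting can be sketched in a few lines of Python. This is a minimal sketch, assuming the unrooted tree is stored in a hypothetical adjacency map (`node -> {neighbour: branch length}`) in which leaves are the nodes with exactly one neighbour. It finds the pair of leaves with the longest path between them and returns the branch (u, v) on which the root falls, together with its distance from u. The function names are illustrative only.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical representation of an unrooted tree: node -> {neighbour: branch length}
Tree = dict[str, dict[str, float]]


def path_lengths(tree: Tree, start: str) -> tuple[dict[str, float], dict[str, str | None]]:
    """Distance from `start` to every node, plus parent pointers for recovering paths."""
    dist: dict[str, float] = {start: 0.0}
    parent: dict[str, str | None] = {start: None}
    stack = [start]
    while stack:  # depth-first traversal; paths in a tree are unique
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(tree: Tree) -> tuple[str, str, float]:
    """Return (u, v, offset): the root lies on branch u-v, `offset` away from u."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(leaves) < 2:
        raise ValueError("need at least two leaves")
    dists = {leaf: path_lengths(tree, leaf)[0] for leaf in leaves}
    # The two most divergent taxa: the leaf pair with the longest path between them
    a, b = max(combinations(leaves, 2), key=lambda pair: dists[pair[0]][pair[1]])
    dist, parent = path_lengths(tree, a)
    path = [b]
    while path[-1] != a:  # walk parent pointers back from b to a
        path.append(parent[path[-1]])
    path.reverse()
    half = dist[b] / 2
    for u, v in zip(path, path[1:]):
        if dist[v] >= half:  # first branch along the path that reaches the midpoint
            return u, v, half - dist[u]
    raise RuntimeError("unreachable for a connected tree")


tree = {
    "A": {"X": 1.0}, "B": {"X": 2.0}, "C": {"Y": 1.0}, "D": {"Y": 4.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"C": 1.0, "D": 4.0, "X": 1.0},
}
print(midpoint_root(tree))  # ('Y', 'D', 0.5)
```

In this toy tree B and D are the most divergent taxa (path length 7), so the root is placed 3.5 units from B, which falls 0.5 units along the branch from Y to D.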

6 Gene phylogeny versus species phylogeny
The objective of constructing molecular phylogenetic trees is to reconstruct the evolutionary history and relationships between species or organisms. However, the rate at which a gene evolves may not mirror that of the species, and genes may arrive by horizontal transfer. An internal node in a molecular (gene) phylogenetic tree can represent a gene duplication, whereas in a species phylogenetic tree it represents a speciation event. Obtaining an accurate species phylogeny from molecular data therefore requires phylogenetic analysis of several gene or protein families.

7 Forms of tree representation
[Figure: the same five-taxon tree (A to E) drawn as a cladogram, with non-scaled branches, and as a phylogram, with branch lengths scaled]

8 Newick format
[Figure: the tree of taxa A to E drawn without and with branch lengths]
Topology only: (((B,C),A),(D,E));
With branch lengths: (((B:1,C:2),A:2),(D:1.2,E:2.4));
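The Newick strings on this slide can be read with a short recursive-descent parser. This is a sketch, assuming only the subset of Newick used here (taxon names, nested parentheses, optional `:length`, optional trailing `;`); quoted labels, comments and embedded whitespace are not handled. `Node`, `parse_newick`, `leaves` and `to_newick` are hypothetical names, not a library API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float | None = None  # branch length to the parent, if given
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    text = text.strip().removesuffix(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":  # internal node: parse its children
            pos += 1
            while True:
                node.children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        start = pos  # optional label (taxon name)
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"trailing text at position {pos}")
    return root


def leaves(node: Node) -> list[str]:
    """Taxon names in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def to_newick(node: Node, top: bool = True) -> str:
    """Serialise back to Newick; the ';' terminator is added at the top level only."""
    s = ""
    if node.children:
        s = "(" + ",".join(to_newick(c, top=False) for c in node.children) + ")"
    s += node.name
    if node.length is not None:
        s += f":{node.length:g}"
    return s + (";" if top else "")


t = parse_newick("(((B:1,C:2),A:2),(D:1.2,E:2.4));")
print(leaves(t))     # ['B', 'C', 'A', 'D', 'E']
print(to_newick(t))  # (((B:1,C:2),A:2),(D:1.2,E:2.4));
```

The same parser reads the topology-only string, in which every `length` simply stays `None`.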

9 Finding a tree may be difficult
The number of possible tree topologies grows very rapidly with the number of taxa $n$:
Rooted trees: $N_R = \frac{(2n-3)!}{2^{n-2}(n-2)!}$
Unrooted trees: $N_U = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
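The two counting formulas on this slide can be evaluated exactly with integer arithmetic. A small sketch with illustrative function names; it also makes visible that the number of unrooted trees for n taxa equals the number of rooted trees for n-1 taxa, which follows directly from the formulas.

```python
from math import factorial


def n_rooted(n: int) -> int:
    """Rooted bifurcating topologies for n labelled taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n: int) -> int:
    """Unrooted bifurcating topologies for n labelled taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 5, 10):
    print(n, n_rooted(n), n_unrooted(n))
# 4 15 3
# 5 105 15
# 10 34459425 2027025
```

With only ten taxa there are already over two million unrooted topologies, which is why exhaustive search quickly becomes impractical.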

10 Procedure to construct a tree
(1) Choose molecular markers; (2) perform a multiple sequence alignment; (3) choose a model of evolution; (4) determine a tree-building method; (5) assess tree reliability.

11 Choice of molecular markers
DNA retains smaller changes (only 4 nucleotides). To study closely related organisms, use DNA; for human population studies, use non-coding mitochondrial sequences. For more widely divergent groups, use rRNA or protein sequences; when comparing bacteria with eukaryotes, use conserved protein sequences.
Proteins are more conserved owing to the degeneracy of codons. Nucleotides at different codon positions evolve at different rates, and DNA sequences are biased by codon preferences. Two random DNA sequences can reach 50% identity if gaps are allowed, whereas random protein sequences reach only about 10%. Gaps in protein-coding DNA sequences may be biologically meaningless. Protein-based phylogeny is therefore generally preferable to nucleotide-based phylogeny. However, DNA provides data on synonymous and non-synonymous substitutions, which give information on positive and negative selection.
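The distinction between synonymous and non-synonymous substitutions can be illustrated with the standard genetic code. The sketch below is deliberately simple and hypothetical: it assumes two aligned, gap-free coding sequences written in A/C/G/T, compares them codon by codon, and classifies only codons that differ at exactly one position. It does not normalise by the number of synonymous and non-synonymous sites, as formal selection analyses do.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_single_hit_changes(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, non-synonymous) codon differences at exactly one position."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need aligned coding sequences whose length is a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # identical codons, or codons with several differences (skipped here)
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1  # same amino acid: synonymous
        else:
            nonsyn += 1  # amino acid (or stop) changes: non-synonymous
    return syn, nonsyn


# CTT->CTC keeps Leu (synonymous); GAA->GAT turns Glu into Asp (non-synonymous)
print(count_single_hit_changes("CTTGAAAAA", "CTCGATAAA"))  # (1, 1)
```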

12 Alignment
A correct alignment is crucial; otherwise errors propagate into the tree. Use a modern package such as T-Coffee, but manual verification and editing remain essential. Secondary structure can serve as a guide in alignment (e.g., PRALINE). Non-homologous regions may have to be removed, which is a subjective decision. Indels are often removed, although gapped regions may belong to signature indels and contain phylogenetic information.
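Removing gap-containing columns is one common way to handle indels before tree building. A minimal sketch, assuming the alignment is held as a hypothetical dict of equal-length strings with `-` as the gap character; `max_gap_fraction` is an illustrative parameter. Raising it above zero keeps partially gapped columns, which matters when those gaps may be signature indels.

```python
def trim_gappy_columns(alignment: dict[str, str], max_gap_fraction: float = 0.0) -> dict[str, str]:
    """Drop alignment columns whose fraction of gaps ('-') exceeds max_gap_fraction."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


aln = {"s1": "AC-GT", "s2": "ACTGT", "s3": "A--GT"}
print(trim_gappy_columns(aln))       # {'s1': 'AGT', 's2': 'AGT', 's3': 'AGT'}
print(trim_gappy_columns(aln, 0.5))  # {'s1': 'ACGT', 's2': 'ACGT', 's3': 'A-GT'}
```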

13 Multiple substitutions
The number of differences between two aligned sequences is an indication of their evolutionary distance … or is it? What about A->T->G->C (three substitutions, but only one observed difference)? Or G->C->G (two substitutions, no observed difference)? Such multiple substitutions and convergences obscure the true evolutionary distance; this is known as homoplasy. Statistical models are needed to correct for homoplasy.
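A quick simulation shows how multiple substitutions hide themselves. This sketch assumes a simple model in which each substitution hits a uniformly random site and replaces its base with one of the other three bases at random (the equal-probability assumption of the Jukes-Cantor model); it compares the number of substitutions that actually occurred with the number of differences that can be observed. The function name is hypothetical.

```python
import random


def observed_differences(length: int, n_substitutions: int, seed: int = 0) -> int:
    """Apply random substitutions to a random sequence; count sites that end up different."""
    rng = random.Random(seed)
    ancestor = [rng.choice("ACGT") for _ in range(length)]
    descendant = ancestor.copy()
    for _ in range(n_substitutions):
        site = rng.randrange(length)
        # replace the current base with one of the three other bases
        descendant[site] = rng.choice([b for b in "ACGT" if b != descendant[site]])
    return sum(a != d for a, d in zip(ancestor, descendant))


length = 10_000
for actual in (1_000, 5_000, 10_000, 30_000):
    observed = observed_differences(length, actual)
    print(f"{actual} substitutions -> {observed} observed differences (p = {observed / length:.3f})")
```

As the number of substitutions grows, the observed proportion of differing sites levels off near 0.75 instead of increasing without limit, because later hits overwrite or reverse earlier ones.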

14 Jukes-Cantor model
Assumes that all substitutions occur with the same probability:
$d_{AB} = -\frac{3}{4}\ln\left[1 - \frac{4}{3}p_{AB}\right]$
where $d_{AB}$ is the evolutionary distance and $p_{AB}$ is the observed sequence difference. For two 10-nucleotide sequences that differ at three positions, $p_{AB} = 0.3$ and
$d_{AB} = -\frac{3}{4}\ln\left[1 - \frac{4}{3}(0.3)\right] = 0.38$
The model is mostly suited to closely related sequences.
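The Jukes-Cantor correction can be sketched in Python as follows. The function names are illustrative, and `p_distance` assumes pairwise deletion, i.e. any column with a gap in either sequence is ignored; that is an assumption of this sketch, not something the slide specifies.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring any column with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p)."""
    if not 0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Two 10-nucleotide sequences differing at three positions, as on the slide
p = p_distance("ACGTACGTAC", "ACGTTCGTGG")
print(p, round(jukes_cantor(p), 2))  # 0.3 0.38
```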

15 Kimura model
$d_{AB} = -\frac{1}{2}\ln(1 - 2p_{ti} - p_{tv}) - \frac{1}{4}\ln(1 - 2p_{tv})$
where $d_{AB}$ is the evolutionary distance between two aligned sequences A and B, $p_{ti}$ is the observed frequency of transitions and $p_{tv}$ is the observed frequency of transversions. If a 30% difference is due to 20% transitions and 10% transversions:
$d_{AB} = -\frac{1}{2}\ln(1 - 2 \times 0.2 - 0.1) - \frac{1}{4}\ln(1 - 2 \times 0.1) = 0.4$
For protein sequences, one can use a PAM substitution matrix, which includes evolutionary information. Kimura model for proteins: $d = -\ln(1 - p - 0.2p^2)$, where $p$ is the observed pairwise distance.
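Both Kimura distances can be computed directly from their formulas. A minimal sketch, assuming gap-free DNA written in A/C/G/T; transitions are taken as purine-purine (A/G) or pyrimidine-pyrimidine (C/T) changes and every other difference as a transversion. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Observed proportions of transitions (A<->G, C<->T) and transversions."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or not set(seq1 + seq2) <= set("ACGT"):
        raise ValueError("need aligned, gap-free A/C/G/T sequences of equal length")
    ti = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti / len(seq1), tv / len(seq1)


def kimura_2p(p_ti: float, p_tv: float) -> float:
    """d = -(1/2) ln(1 - 2 p_ti - p_tv) - (1/4) ln(1 - 2 p_tv)."""
    a, b = 1 - 2 * p_ti - p_tv, 1 - 2 * p_tv
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: too many differences")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def kimura_protein(p: float) -> float:
    """Kimura's approximation for proteins: d = -ln(1 - p - 0.2 p^2)."""
    x = 1 - p - 0.2 * p * p
    if x <= 0:
        raise ValueError("distance undefined for this p")
    return -math.log(x)


# 2 transitions (A->G, G->A) and 1 transversion (C->A) over 10 sites
p_ti, p_tv = ti_tv_proportions("ACGTACGTAC", "GCATACGTAA")
print(p_ti, p_tv, round(kimura_2p(p_ti, p_tv), 2))  # 0.2 0.1 0.4
```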

16 Among-site variation
In DNA, the mutation rate differs by codon position; in proteins, there are functional constraints. A proportion of positions have invariant rates and the others variable rates. The rates at variable sites are assumed to follow a $\gamma$ distribution with shape parameter $\alpha$.
$\gamma$-corrected Jukes-Cantor: $d_{AB} = \frac{3}{4}\alpha\left[\left(1 - \frac{4}{3}p_{AB}\right)^{-1/\alpha} - 1\right]$
$\gamma$-corrected Kimura: $d_{AB} = \frac{\alpha}{2}\left[(1 - 2p_{ti} - p_{tv})^{-1/\alpha} + \frac{1}{2}(1 - 2p_{tv})^{-1/\alpha} - \frac{3}{2}\right]$
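The γ-corrected distances can be sketched as two small functions. This sketch assumes the shape parameter α is already known (estimating it is a separate problem), and uses the γ-corrected Kimura formula in the form (α/2)[(1 - 2p_ti - p_tv)^(-1/α) + (1/2)(1 - 2p_tv)^(-1/α) - 3/2], which reduces to the uncorrected Kimura distance as α grows large. The function names are hypothetical.

```python
import math


def gamma_jukes_cantor(p: float, alpha: float) -> float:
    """d = (3/4) alpha [(1 - 4p/3)^(-1/alpha) - 1]."""
    x = 1 - 4 * p / 3
    if x <= 0 or alpha <= 0:
        raise ValueError("need p < 0.75 and alpha > 0")
    return 0.75 * alpha * (x ** (-1 / alpha) - 1)


def gamma_kimura_2p(p_ti: float, p_tv: float, alpha: float) -> float:
    """d = (alpha/2) [(1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2]."""
    a, b = 1 - 2 * p_ti - p_tv, 1 - 2 * p_tv
    if a <= 0 or b <= 0 or alpha <= 0:
        raise ValueError("distance undefined for these inputs")
    return (alpha / 2) * (a ** (-1 / alpha) + 0.5 * b ** (-1 / alpha) - 1.5)


# Smaller alpha (stronger rate variation) gives a larger corrected distance;
# a very large alpha approaches the uncorrected formulas.
for alpha in (0.5, 1.0, 1e6):
    print(alpha, gamma_jukes_cantor(0.3, alpha), gamma_kimura_2p(0.2, 0.1, alpha))
print(-0.75 * math.log(1 - 0.4))  # uncorrected Jukes-Cantor distance for p = 0.3
```

With α = 1 the corrected Jukes-Cantor distance for p = 0.3 is 0.5, compared with about 0.38 without the correction.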

