Phylogeny of the Human Protein Tyrosine Kinases
Dr John Smith, School of Medical Education, Liverpool, L69 3GE, UK





Abstract
The tyrosine kinases form a well-conserved family of enzymes with a high degree of homology and well-known relationships. This makes it possible to reconstruct the pathway of their evolution. By working backwards from the sequences of existing enzymes, a possible sequence for the prototype tyrosine kinase has been constructed. The sequences inferred for intermediate ancestors will aid study of their functional and developmental relationships.

Introduction
The tyrosine kinase family of enzymes is essentially, though not entirely, restricted to the metazoa [1] and is involved in intercellular signalling pathways. It is of interest because of the high degree of conservation of its catalytic domain and the relatively large number of members [2]. Together, these properties make it possible to reconstruct the pathway of evolution of the family. The stringent requirements on amino acid positioning for catalytic activity have produced highly conserved regions (accessed 10/5/05) in which change is so slow that much of the evolutionary pathway can be inferred. The question addressed here is: how much information about the evolution of the protein tyrosine kinases is preserved in existing sequences?

Methods
Protein tyrosine kinase domains were selected from the Swissprot database and arranged into families on the basis of sequence homology. These families corresponded to those defined using extracellular structure [3]. For convenience, a family tree available on a commercial website was used (accessed 6/03).
Assuming that each branch point represented a gene duplication event, the immediate ancestral gene, as it was at the time of duplication, was given a name (fig 1). Its sequence was determined as the consensus of its progeny, using its nearest neighbour as an outgroup to decide which amino acid was the original where those of the progeny differed (‘x’ was used where this could not be determined). This required the amino acid sequences of the gene products to be aligned. To align them, the sequences were first ‘piled up’ to locate conserved stretches and variable inserts. Initially, the Clustal alignment of kinases from the NCBI Conserved Domain Database was used to give each amino acid a number in the (longest aggregate) sequence, with manual adjustments where necessary to improve the fit. For each amino acid position, an evolutionary tree was constructed by using successive neighbours, or derived neighbours, as outgroups. The final stem sequence (S1) was first rooted with M3K9 as the outgroup, then refined using a TKL stem sequence derived in the same way, with S1 of the TKL family as the final outgroup. Where ‘x’s accumulated, a tentative assignment was made by looking for amino acids that appeared in progeny on both sides of a divide. Finally, the tree requiring the fewest mutations overall was chosen. Where two solutions were equally parsimonious, it was assumed that the same mutation had occurred twice during the evolution of the family, rather than a forward mutation that was subsequently reversed. Only those amino acids present in essentially all the sequences were used. Insertions and deletions were treated by the same rules used to decide parental amino acids.

Results
Deriving a stem sequence.
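The two per-site rules in Methods (keep the progeny residue that matches the outgroup, and prefer the tree needing the fewest mutations) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's actual procedure: sequences are assumed to be pre-aligned strings of equal length with ‘-’ for gaps, the tree is a hypothetical nested-tuple encoding of duplications, and mutations are counted with Fitch's small-parsimony algorithm, a standard method swapped in here for illustration. Fitch's intersection step loosely mirrors the idea of keeping a residue that appears in progeny on both sides of a divide; the sketch only counts changes and does not implement the text's tie-break favouring a repeated mutation over a reversal.

```python
from typing import Dict, Set, Tuple, Union

UNKNOWN = "x"  # marker used in the text when the parental residue cannot be decided

# Hypothetical tree encoding: a leaf is a sequence name; an internal node is a
# pair (left_subtree, right_subtree), i.e. one assumed gene duplication.
Tree = Union[str, Tuple["Tree", "Tree"]]


def infer_parent(child_a: str, child_b: str, outgroup: str) -> str:
    """Infer the sequence at a duplication node from its two progeny.

    Per aligned site: if the progeny agree, keep that residue; if they differ,
    keep whichever one matches the outgroup; otherwise mark the site 'x'.
    Gap characters ('-') are handled exactly like residues (an assumption).
    """
    if not len(child_a) == len(child_b) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    parent = []
    for a, b, o in zip(child_a, child_b, outgroup):
        if a == b or a == o:
            parent.append(a)
        elif b == o:
            parent.append(b)
        else:
            parent.append(UNKNOWN)
    return "".join(parent)


def fitch(node: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one site on a fixed binary tree.

    Returns the candidate residue set for the node and the minimum number
    of changes needed within its subtree.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    shared = set_l & set_r
    if shared:
        # A residue present on both sides of the divide: no extra change.
        return shared, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total minimum number of changes summed over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy, made-up sequences (not data from the study).
print(infer_parent("KAVLS", "KGVMT", "KGILW"))  # KGVLx
toy_tree = (("A", "B"), ("C", "D"))
print(tree_length(toy_tree, {"A": "KA", "B": "KA", "C": "RG", "D": "KA"}))  # 2
```

Comparing `tree_length` across candidate topologies and keeping the smallest value is one way to express "the tree requiring the fewest mutations overall".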
A publicly available family tree showing the sequence similarity between protein kinase domains, derived from public sequences and gene prediction methods detailed elsewhere [4], was used as the basis for the reconstruction described in Methods. Each branch point, representing the terminal state of a gene product before gene duplication, was given a name. The sequences of the immediate precursors of existing gene products were deduced as described in Methods, and these sequences were in turn used to deduce the sequences of their ancestors. In this way, a putative stem sequence for the protein tyrosine kinases was derived (fig 1).

Refining the tree.
The number of amino acid changes between each sequence and its progeny allowed a length to be assigned to each branch of the tree, putatively giving a relative time scale to the phylogenetic chart (fig 2). Some amino acid positions were clearly more variable than others (fig 3). In particular, the inserted stretch (in part of protein tyrosine kinase subfamily D) between amino acids (not shown) was so variable that the parental sequences could not be derived with any certainty, and these regions were excluded when calculating branch lengths. The overall distance between the stem origin and each final sequence increased with the number of notional gene duplications involved in its derivation (fig 4). The slope of the correlation corresponded to approximately 5 amino acids per additional gene duplication.

Discussion
When sequences of extant proteins are aligned, some alignments are tentative, and manual ‘tidying’ is often necessary. There is evidence of multiple changes at some loci (fig 4): for example, TrC shows 240 mutations in the course of its evolution from S1 but differs from it at only 124 amino acids.
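The branch-length bookkeeping in Results, and the gap noted for TrC between mutations counted along a lineage and direct differences from S1, can be illustrated with the Python sketch below. It assumes aligned, equal-length strings; the function names and toy sequences are hypothetical, and ordinary least squares is assumed as one way the slope in fig 4 might be estimated, since the text does not say how the correlation was fitted.

```python
from typing import FrozenSet, List, Sequence

UNKNOWN = "x"  # undetermined parental residue, as in the text


def differences(parent: str, child: str,
                excluded: FrozenSet[int] = frozenset()) -> int:
    """Branch length: number of sites where parent and child differ.

    Sites marked 'x' in either sequence, and any excluded positions
    (e.g. a hypervariable insert), are not counted.
    """
    return sum(
        1
        for i, (p, c) in enumerate(zip(parent, child))
        if i not in excluded and p != c and UNKNOWN not in (p, c)
    )


def path_length(lineage: List[str],
                excluded: FrozenSet[int] = frozenset()) -> int:
    """Sum of branch lengths along a lineage [stem, ..., extant sequence]."""
    return sum(differences(a, b, excluded) for a, b in zip(lineage, lineage[1:]))


def ls_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up lineage: site 0 changes twice, so the path records more mutations
# than a direct comparison of the end sequence with the stem reveals.
lineage = ["AAAA", "GAAA", "TAAC"]
print(path_length(lineage))                  # 3 mutations along the path
print(differences(lineage[0], lineage[-1]))  # 2 differences from the stem

# Made-up points: (duplications in ancestry, distance from stem).
print(ls_slope([1, 2, 3, 4], [10, 15, 20, 25]))  # 5.0
```

Whenever the path total exceeds the direct count, at least one site must have changed more than once along that lineage, which is the pattern the TrC figures point to.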
However, if the constancy of certain amino acid sequences indicates their functional consistency [5,6], then the least certain amino acid assignments are also the least important. Sequences could be refined by using multiple species to avoid the effect of modern ‘noise’, that is, recent mutation. The ancestral DNA sequence at the branch point of the mammals is claimed to be discernible to 98% at the nucleotide level, even in non-coding regions [7]. The further back the reconstruction reaches, the more helpful other ‘primitive’ species would be. ‘Primitive’ species, however, being smaller, tend to have shorter generation times and hence faster evolution. Thus, C. elegans is primitive in generally having only one member of each tyrosine kinase subfamily, but its sequences themselves are more derived. The putative sequences of the intermediate and stem tyrosine kinases will allow those molecules to be constructed, attached where appropriate to modern external domains, and this will provide insight into their former functions. This will confirm functional inferences that would otherwise have to be obtained by statistical prediction [6]. This is of interest in interpreting the role of tyrosine kinases in the evolution of multicellular and tissue interactions and in embryonic development, and their effects when inappropriately expressed in cancer.

References
1. King, N. & Carroll, S.B. A receptor tyrosine kinase from choanoflagellates: molecular insights into early animal evolution. Proceedings of the National Academy of Sciences USA 98 (2001).
2. Manning, G., Plowman, G.D., Hunter, T. & Sudarsanam, S. Evolution of protein kinase signaling from yeast to man. Trends in Biochemical Sciences 27 (2002).
3. Fantl, W.J., Johnson, D.E. & Williams, L.T. Signalling by receptor tyrosine kinases. Annual Review of Biochemistry 62 (1993).
4. Manning, G., Whyte, D.B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298 (2002).
5. Gu, X. Statistical methods for testing functional divergence after gene duplication. Molecular Biology and Evolution 16 (1999).
6. Gu, J. & Gu, X. Natural history and functional divergence of protein tyrosine kinases. Gene 317 (2003).
7. Blanchette, M., Green, E.D., Miller, W. & Haussler, D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Research 14 (2004).

Additional Information
The full set of derived sequences for each ancestral protein may be found at:

Fig 1: The Putative Stem Sequence for the Protein Tyrosine Kinases. Amino acids shown in bold black are invariant in all derived sequences; amino acids shown in red are present in both immediately derived sequences; and those in blue are present in one immediately derived sequence and in a sequence derived from the other immediately derived sequence.

Fig 2: Graded Evolutionary Tree of the Human Tyrosine Kinases. The kinase domains, labelled as in fig 1, are plotted according to their evolutionary distance from their respective ancestral forms, measured as the number of mutations observed.

Fig 3: Frequency of Mutation at Each Amino Acid Site. For each position, the number of mutations observed during the evolution of the kinases is recorded. The maximum number of observable mutations that could take place at a site is 176.

Fig 4: Effect of Number of Gene Duplication Events on Final Evolutionary Distance. The total number of amino acid differences of each kinase domain from the deduced stem sequence is plotted against the number of gene duplication events in its ancestry. A positive correlation is observed, corresponding to approximately 5 extra mutations per event.
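The per-site tally summarised in Fig 3 amounts to counting, at each aligned position, the branches of the tree on which the residue changed. A minimal Python sketch, assuming each branch is supplied as a hypothetical (parent, child) pair of aligned, equal-length sequences and that sites marked ‘x’ are skipped:

```python
from typing import List, Tuple

UNKNOWN = "x"  # undetermined residue; such sites are not counted (assumption)


def per_site_changes(branches: List[Tuple[str, str]]) -> List[int]:
    """Count, for each aligned position, the branches on which it changed.

    Each branch is a (parent, child) pair of aligned sequences. In this
    sketch the largest possible count at a site equals len(branches).
    """
    length = len(branches[0][0])
    counts = [0] * length
    for parent, child in branches:
        for i, (p, c) in enumerate(zip(parent, child)):
            if p != c and UNKNOWN not in (p, c):
                counts[i] += 1
    return counts


# Made-up branches from a toy tree (not data from the study).
toy_branches = [("AAAA", "GAAA"), ("GAAA", "TAAC"), ("AAAA", "AAAC")]
print(per_site_changes(toy_branches))  # [2, 0, 0, 2]
```

In this sketch, the ceiling on any site's count is simply the number of branches on which a change could be observed.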