Phylogeny of the Human Protein Tyrosine Kinases
Dr John Smith
School of Medical Education, Liverpool, L69 3GE, UK
email: jasmith@liv.ac.uk

Abstract
The tyrosine kinases form a well-conserved family of enzymes with a high degree of homology and well-known relationships. This makes it possible to reconstruct the pathway of their evolution. By working backwards from the sequences of existing enzymes, a possible sequence for the prototype tyrosine kinase has been constructed. The sequences inferred for intermediate ancestors should aid the study of the family's functional and developmental relationships.

Introduction
The tyrosine kinase family of enzymes is essentially, though not entirely, restricted to the metazoa 1 and is involved in intercellular signalling pathways. It is of interest because of the high degree of conservation of its catalytic domain and the relatively large number of its members 2. Together, these properties make it possible to reconstruct the pathway by which the family evolved. The stringent positional requirements for catalytic activity have produced highly conserved regions (http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC00100 accessed 10/5/05) in which change is so slow that much of the evolutionary pathway may be inferred. The question addressed here is: how much information about the evolution of the protein tyrosine kinases is preserved in existing sequences?

Methods
Protein tyrosine kinase domains were selected from the Swiss-Prot database (http://www.ncbi.nlm.nih.gov/entrez/) and grouped into families on the basis of sequence homology. These groupings corresponded to the families defined by extracellular domain structure 3. For convenience, a family tree available on a commercial website was used (http://www.cellsignal.com/reference/kinase/tk.asp accessed 6/03).
Assuming that each branch point represented a gene duplication event, the immediate ancestral gene, as it was at the time of duplication, was given a name (fig 1). Its sequence was determined as the consensus of its progeny, with its nearest neighbour used as an outgroup to decide which amino acid was the original wherever the progeny differed (‘x’ was used where this could not be determined). This required the amino acid sequences of the gene products to be aligned. To do so, sequences were ‘piled up’ to locate conserved stretches and variable inserts. Initially, the Clustal alignment of the NCBI Conserved Domain Database for kinases (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi) was used to give each amino acid a position number in the longest aggregate sequence, with manual adjustments made where necessary to improve the fit. For each amino acid position, an evolutionary tree was constructed using successive neighbours, or derived neighbours, as outgroups. The final stem sequence (S1) was first rooted with M3K9 as the outgroup, then refined using a TKL stem sequence derived in the same way, with S1 of the TKL family as the final outgroup. Where ‘x’s accumulated, a tentative assignment was made by looking for amino acids that appeared in progeny on both sides of a divide. Finally, the tree requiring the fewest mutations overall was chosen. Where two reconstructions were equally parsimonious, it was assumed that the same mutation had occurred twice during the development of the family, rather than a forward mutation that was subsequently reversed. Only amino acids present in essentially all the sequences were used. Insertions and deletions were treated by the same rules used to decide parental amino acids.

Results
Deriving a stem sequence.
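The per-site rule described in Methods for deriving each ancestor (consensus of the two progeny, with the nearest neighbour as outgroup, and ‘x’ where undecidable) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the sequences are already aligned to equal length, treats a gap character ‘-’ like any other residue, and uses hypothetical function names. It does not attempt the later steps of resolving accumulated ‘x’s or choosing between equally parsimonious trees.

```python
# Hypothetical sketch of the consensus-with-outgroup rule from Methods.
# Assumes pre-aligned sequences of equal length; '-' (gap) is handled like
# any other character, echoing the rule that indels were treated the same way.

UNDETERMINED = "x"


def ancestral_residue(left: str, right: str, outgroup: str) -> str:
    """Infer the parental residue at one aligned site from two progeny and an outgroup."""
    if left == right:
        # Both progeny agree: assume the residue was inherited unchanged.
        return left
    if outgroup in (left, right):
        # Progeny differ: the residue shared with the outgroup is taken as original.
        return outgroup
    # Neither progeny residue is supported by the outgroup.
    return UNDETERMINED


def ancestral_sequence(left: str, right: str, outgroup: str) -> str:
    """Apply the per-site rule column by column across an alignment."""
    if not (len(left) == len(right) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        ancestral_residue(a, b, o) for a, b, o in zip(left, right, outgroup)
    )


# Toy usage with made-up fragments (not real kinase data):
print(ancestral_sequence("GKGSFG", "GRGSYG", "GKGAFG"))  # → GKGSFG
print(ancestral_sequence("HRDL", "HKDL", "HQDL"))        # → HxDL
```

Applying the same function repeatedly, with each derived ancestor becoming a "progeny" sequence one level further up the tree, mirrors the working-backwards procedure described above.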
A publicly available family tree showing the sequence similarity between protein kinase domains, derived from public sequences and gene prediction methods detailed elsewhere 4, was used as the basis for the reconstruction described in Methods. Each branch point, representing the terminal state of a gene product prior to gene duplication, was given a name. The sequences of the immediate precursors of existing gene products were deduced as described in Methods, and these sequences were in turn used to deduce the sequences of their ancestors. Using this family tree, a putative stem sequence for the protein tyrosine kinases was derived (fig 1).

Refining the tree.
The number of amino acid changes between each sequence and its progeny was used to assign a length to each branch of the tree, putatively giving a relative time scale to the phylogenetic chart (fig 2). Some amino acid positions were clearly more variable than others (fig 3). In particular, the inserted stretch between amino acids 94 and 95 (not shown), present in part of protein tyrosine kinase sub-family D, was so variable that the parent sequences could not be derived with any degree of certainty; this region was therefore not used to calculate branch lengths. The overall distance between the stem origin and each final sequence increased with the number of notional gene duplications involved in its derivation (fig 4). The slope of the correlation corresponded to approximately 5 amino acids per additional gene duplication.

Discussion
When sequences of extant proteins are aligned, some alignments are tentative, and manual ‘tidying’ is often necessary. There is evidence of multiple changes at some loci (fig 4); for example, TrC shows 240 mutations in the course of its evolution from S1 but differs from it at only 124 amino acid positions.
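The gap between 240 mutations and 124 differences arises because a branch length counts changes between each parent and its immediate progeny, so the summed path from S1 records every change, whereas a direct comparison with S1 cannot see a site that changed more than once. A minimal sketch, assuming aligned sequences and hypothetical function names, computes both quantities; ignoring sites marked ‘x’ is an added assumption, loosely following the exclusion of uncertain regions from branch lengths.

```python
# Hypothetical sketch: branch length as the number of differing aligned sites,
# and the summed path length along a lineage versus the direct distance.


def differences(a: str, b: str) -> int:
    """Count aligned sites where two sequences differ, skipping undetermined 'x' sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for p, q in zip(a, b) if p != q and "x" not in (p, q))


def path_length(lineage: list[str]) -> int:
    """Sum branch lengths along a lineage ordered from stem to descendant."""
    return sum(differences(parent, child) for parent, child in zip(lineage, lineage[1:]))


# Toy lineage (invented, not real data): site 2 changes K -> R -> Q.
lineage = ["AKLE", "ARLE", "AQLE", "AQLD"]
print(path_length(lineage))                   # → 3 (three changes along the path)
print(differences(lineage[0], lineage[-1]))   # → 2 (direct comparison sees only two)
```

The double change at the second site is counted twice along the path but only once in the direct comparison, which is the same effect, on a small scale, as the TrC example.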
However, if the constancy of certain amino acid sequences reflects their functional importance 5,6, then the least certain amino acid assignments are also the least important. Sequences might be refined by using multiple species, to reduce the effect of modern ‘noise’ (recent mutation). The ancestral DNA sequence at the branch point of the mammals is claimed to be discernible to 98% at the nucleotide level, even in non-coding regions 7. The further back the reconstruction reaches, the more helpful other ‘primitive’ species would be. ‘Primitive’ species, however, being smaller, tend to have shorter generation times and hence faster evolutionary change. Thus C. elegans is primitive in generally having only one member of each tyrosine kinase subfamily, but its sequences themselves are more derived. The putative sequences of the intermediate and stem tyrosine kinases will allow these molecules to be constructed, attached where appropriate to modern external domains, which should provide insight into their former functions. This would allow experimental confirmation of functional inferences that otherwise rest on statistical prediction 6. Such work is relevant to interpreting the role of tyrosine kinases in the evolution of multicellular and tissue interactions, in embryonic development, and in cancer, where they may be inappropriately expressed.

References
1. King, N. & Carroll, S.B. A receptor tyrosine kinase from choanoflagellates: molecular insights into early animal evolution. PNAS 98, 15032-15037 (2001).
2. Manning, G., Plowman, G.D., Hunter, T. & Sudarsanam, S. Evolution of protein kinase signaling from yeast to man. Trends in Biochemical Sciences 27, 514-520 (2002).
3. Fantl, W.J., Johnson, D.E. & Williams, L.T. Signalling by receptor tyrosine kinases. Ann. Rev. Biochem. 62, 453-481 (1993).
4. Manning, G., Whyte, D.B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298, 1912-1934 (2002).
5. Gu, X. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16, 1664-1674 (1999).
6. Gu, J. & Gu, X. Natural history and functional divergence of protein tyrosine kinases. Gene 317, 49-57 (2003).
7. Blanchette, M., Green, E.D., Miller, W. & Haussler, D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Research 14, 2412-2423 (2004).

Additional Information
The full set of derived sequences for each ancestral protein may be found at: http://pcwww.liv.ac.uk/~jasmith/kinases.htm

Fig 1: The Putative Stem Sequence for the Protein Tyrosine Kinases. Amino acids shown in bold black are invariant in all derived sequences; those shown in red are present in both immediately derived sequences; and those shown in blue are present in one immediately derived sequence and in a sequence derived from the other.

Fig 2: Graded Evolutionary Tree of the Human Tyrosine Kinases. The kinase domains, labelled as in figure 1, are plotted according to their evolutionary distance from their respective ancestral forms, measured as the number of mutations observed.

Fig 3: Frequency of Mutation at Each Amino Acid Site. For each position, the number of mutations observed during the evolution of the kinases is recorded. The maximum number of observable mutations at a site is 176.

Fig 4: Effect of Number of Gene Duplication Events on Final Evolutionary Distance. The total number of amino acid differences of each kinase domain from the deduced stem sequence is plotted against the number of gene duplication events in its ancestry. A positive correlation is observed, corresponding to approximately 5 extra mutations per event.
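The figure of approximately 5 extra mutations per gene duplication (Results and Fig 4) is the slope of a fitted line. The source does not say how the line was fitted; the sketch below assumes an ordinary least-squares fit, and the function name and toy numbers are illustrative only, not the author's analysis or data.

```python
# Hypothetical sketch: ordinary least-squares slope of distance from the stem
# sequence (y) against the number of gene duplications in each lineage (x).


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Return the least-squares slope of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented points lying exactly on a line with slope 5:
print(ols_slope([1, 2, 3], [10, 15, 20]))  # → 5.0
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` from the standard library returns the same slope together with an intercept.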