Assignment of PROSITE motifs to topological regions: Application to a novel database of well characterised transmembrane proteins Tim Nugent 6 Month Report 11-05-07
Identification of Transmembrane Regions To generate data for a hydropathy plot, the protein sequence is scanned with a moving window of 19-21 residues. At each position, the mean hydrophobic index of the amino acids within the window is calculated and plotted at the midpoint of the window. [Figure: Aquaporin]
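The moving-window calculation can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the report: the Kyte-Doolittle scale is an assumed choice (the slide does not name a hydrophobicity scale), and the function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed; the slide does not say which scale was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (midpoint, mean_hydropathy) pairs; midpoints are 1-based residue positions."""
    if window < 1 or window > len(sequence):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        # Report the value at the centre residue of the window.
        profile.append((start + half + 1, mean))
    return profile
```

Sustained peaks in the resulting profile, roughly the width of the window, are the candidate transmembrane helices; the cut-off used to call a peak depends on the scale and is not specified here.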
The Positive Inside Rule Hydrophobic residues: Val, Phe, Ile, Leu, Met. Positively charged residues: Lys, Arg, His. Cytoplasmic loops are enriched in positively charged residues: the 'positive-inside rule' of von Heijne.
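As a rough illustration of how the rule can orient a predicted topology, the sketch below counts positively charged residues in alternating loops and assigns the more positive side to the cytoplasm. The function names and the simple majority vote are assumptions made for this example; real predictors weight loop length and position more carefully.

```python
POSITIVE = set("KRH")  # Lys, Arg, His, as listed on the slide


def count_positive(segment):
    """Number of positively charged residues in a loop sequence."""
    return sum(1 for aa in segment.upper() if aa in POSITIVE)


def predict_n_terminal_side(loops):
    """Guess the location of the N-terminus from the positive-inside rule.

    loops: non-membrane segments in N- to C-terminal order. Consecutive
    loops lie on opposite sides of the membrane, so loops[0::2] share the
    N-terminal side. Returns "in", "out" or "undetermined".
    """
    n_side = sum(count_positive(loop) for loop in loops[0::2])
    other_side = sum(count_positive(loop) for loop in loops[1::2])
    if n_side > other_side:
        return "in"
    if n_side < other_side:
        return "out"
    return "undetermined"
```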
Assembling a novel data set of transmembrane proteins To study and predict features of transmembrane proteins, a high quality data set containing sequences with experimentally confirmed TM regions is essential. The data set was based on the widely used Möller test set (2001). Additional data were collected from MPTOPO, OPM, SWISSPROT and the literature. Sequences were searched against the PDB with BLAST to identify entries for which the TM region had complete structural coverage. This set was then homology reduced at the 40% sequence identity level. The final data set contains 141 sequences, all with available structures, verifiable topology and known N-terminal locations: 111 alpha-helical proteins and 30 beta-barrel proteins.
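The slide does not say which procedure was used for homology reduction. One common approach is a greedy filter, sketched below under that assumption: a sequence is kept only if it falls below the identity threshold against every sequence already kept. The `identity` callable is hypothetical and stands in for whatever pairwise alignment tool supplies percent identities.

```python
def homology_reduce(seq_ids, identity, threshold=40.0):
    """Greedy redundancy filter (an assumed method, not necessarily the report's).

    seq_ids:   sequence identifiers in priority order (earlier ones are preferred).
    identity:  callable (id_a, id_b) -> percent sequence identity.
    threshold: sequences at or above this identity to a kept one are dropped.
    """
    kept = []
    for seq_id in seq_ids:
        if all(identity(seq_id, other) < threshold for other in kept):
            kept.append(seq_id)
    return kept
```

Because the filter is greedy, the result depends on input order; sorting by structure resolution or annotation quality first is one way to make sure the best-characterised member of each family survives.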
Assignment of PROSITE motifs to topological regions We next explored whether motifs from the PROSITE database could be used as constraints in subsequent topology prediction steps, by identifying a bias in their inside/outside frequency. [Figure labels: Extracellular, Cytoplasm]
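A minimal sketch of this kind of scan follows, assuming each protein's topology is available as a per-residue label string of the same length as the sequence ("i" inside, "o" outside, "M" membrane). That encoding and both function names are assumptions for illustration. The converter handles the core PROSITE pattern syntax only and does not cover anchors placed inside square brackets.

```python
import re
from collections import Counter


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    regex = pattern.strip().rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}  -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")    # x(4) -> x{4}
    regex = regex.replace("x", ".")                      # x    -> any residue
    return regex.replace("<", "^").replace(">", "$")     # terminal anchors


def tally_motif_by_region(sequence, topology, pattern):
    """Count motif matches per topological region; matches spanning regions are 'mixed'."""
    if len(sequence) != len(topology):
        raise ValueError("sequence and topology must have the same length")
    # A lookahead with a capture group reports overlapping matches as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    counts = Counter()
    for match in regex.finditer(sequence):
        labels = set(topology[match.start(1):match.end(1)])
        counts[labels.pop() if len(labels) == 1 else "mixed"] += 1
    return counts
```

Summing these counts over the whole data set gives, for each motif, an inside versus outside frequency; a motif found almost exclusively on one side is a candidate constraint for topology prediction.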
A Bioperl module to draw transmembrane proteins
Conclusions I have achieved my major goal for the first 6 months: to create a high quality data set of transmembrane proteins of known topology. My second goal, to scan the novel data set against motif and domain databases to identify signatures that are consistently located on either inside or outside loops, has also been completed. In collaboration with Dr Sara Mole (MRC Laboratory for Molecular Cell Biology), I have begun an analysis of CLN3 (Batten's Disease protein) with a view to predicting the protein's topology using a combination of computational and experimental evidence. I have written a Perl module that creates graphical representations of transmembrane proteins given the positions of their transmembrane helices and the location of the N-terminus. This module has been accepted by the Bioperl project and will be available in the next release.