Progress in Transmembrane Protein Research 12 Month Report Tim Nugent
Assignment of PROSITE motifs to topological regions
We explored whether motifs from the PROSITE database could be used as constraints in subsequent topology prediction steps, by identifying a bias in their inside/outside frequency. (Figure labels: Extracellular, Cytoplasm.)
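The inside/outside bias mentioned on this slide can be illustrated with a minimal sketch. It assumes each motif match has already been labelled as "inside" or "outside" from a known topology; the function name `motif_bias`, the pseudocount, and the motif names are hypothetical and are not taken from the presentation.

```python
import math
from collections import Counter

def motif_bias(observations, pseudocount=1.0):
    """Return {motif: (n_inside, n_outside, log2_odds)} from (motif, side) pairs.

    A positive log2 odds means the motif is seen more often inside,
    a negative value means it is seen more often outside.
    """
    inside = Counter()
    outside = Counter()
    for motif, side in observations:
        if side == "inside":
            inside[motif] += 1
        elif side == "outside":
            outside[motif] += 1
        else:
            raise ValueError(f"unknown side: {side!r}")

    result = {}
    for motif in set(inside) | set(outside):
        n_in, n_out = inside[motif], outside[motif]
        # The pseudocount (an assumption) avoids division by zero for
        # motifs that were only ever observed on one side.
        log_odds = math.log2((n_in + pseudocount) / (n_out + pseudocount))
        result[motif] = (n_in, n_out, log_odds)
    return result

# Hypothetical observations: placeholder motif names, not PROSITE results.
example = (
    [("MOTIF_A", "inside")] * 7
    + [("MOTIF_A", "outside")]
    + [("MOTIF_B", "outside")] * 3
)
bias = motif_bias(example)
```

A motif with a strongly skewed log-odds value would be a candidate constraint for a later topology prediction step; how strong the skew must be is a choice this sketch does not make.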
Alpha-helical protein PROSITE motif assignments
Using PROSITE motifs to enhance topology prediction
CLN3 Topology Prediction
Model is in agreement with all published experimental data. Potential amphipathic helix: bias in hydrophobic/polar residue placement. Two arginine residues in close proximity: possible anion channel?
Using Support Vector Machines for Topology Prediction
Earlier approaches relied on physicochemical properties such as hydrophobicity to identify transmembrane helices (e.g. Kyte-Doolittle). More recently, methods based on machine learning algorithms such as hidden Markov models (e.g. TMHMM, PHOBIUS) and neural networks (MEMSAT3) have been developed. They have achieved significant improvements in prediction accuracy (~80%). However, none of the top-scoring methods use SVMs. While hidden Markov models and neural networks may have multiple outputs, SVMs are binary classifiers. To deal with TM topology prediction, multiple SVMs will have to be combined, e.g. TM helix / Loop; Inside Loop / Outside Loop; Signal Peptide / TM helix; Re-entrant Loop / TM helix.
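For context on the hydrophobicity-based baseline, the following is a minimal sketch of a Kyte-Doolittle sliding-window scan. The scale values are the published Kyte-Doolittle hydropathy indices; the 19-residue window and the 1.6 threshold are commonly quoted defaults used here as assumptions, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h >= threshold]
```

A scan like this only flags hydrophobic stretches; it says nothing about which side of the membrane each loop lies on, which is one reason the later methods listed on the slide go further.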
Helix / Loop SVM Prediction Accuracy
TM helix / Loop SVM: PSI-BLAST profiles, normalised by Z-score; 29-residue sliding window; 3rd-order polynomial kernel function.
Matthews Correlation Coefficient (MCC) = 0.75; Precision = 0.86; Recall = 0.32.
TP = 8384; FP = 1355; TN = (not given); FN = 1969.
Comparison: Kyte-Doolittle MCC = 0.64; MEMSAT3 MCC = 0.76.
Overlap of at least 37 sequences between the Möller dataset and the novel training set.
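The accuracy measures on this slide follow from the confusion-matrix counts, and a short sketch makes the definitions explicit. The function names are hypothetical. Applied to the counts as transcribed, the precision reproduces the 0.86 on the slide, while the recall comes out near 0.81 rather than the listed 0.32, so one of those figures may have been garbled in transcription. The MCC cannot be recomputed here because the TN count is absent, so it is shown only on toy counts.

```python
import math

def precision(tp, fp):
    """Fraction of predicted helix residues that are truly helix."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true helix residues that were predicted as helix."""
    return tp / (tp + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Counts as they appear in the transcript (TN is not available).
slide_precision = precision(8384, 1355)
slide_recall = recall(8384, 1969)

# Toy counts (hypothetical) showing a perfect classifier.
toy_mcc = mcc(tp=5, fp=0, tn=5, fn=0)
```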
SVM Results – Particulate Methane Monooxygenase subunit C
SVM Results – Cytochrome b6f subunit A
Further work
Expand training set: ~45 sequences to add. Additional sequences where the TMHs are known but the topology is not can be used to train the Helix/Loop classifier.
Parameter optimisation: window size, kernel type.
Signal peptide SVM.
Re-entrant loop SVM.
Combine SVM raw scores/probabilities into a topology.
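The last item, combining per-residue SVM scores into a topology, is left open in the presentation. One simple possibility, sketched below under stated assumptions, is to threshold the Helix/Loop scores into segments and then alternate the loop sides. This is not the author's method: the function names, the score threshold of 0.0, the minimum helix length of 15, and the choice of which side the first loop is on are all hypothetical. In practice the first-loop side would presumably come from the Inside Loop / Outside Loop classifier.

```python
def scores_to_segments(scores, threshold=0.0, min_len=15):
    """Return (start, end) pairs, end exclusive, for runs of scores above
    the threshold that are at least min_len residues long."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments

def assign_loops(segments, length, first_loop_inside=True):
    """Label regions as helix or loop, flipping the loop side at each helix."""
    labels = []
    inside = first_loop_inside
    pos = 0
    for start, end in segments:
        labels.append(("inside" if inside else "outside", pos, start))
        labels.append(("helix", start, end))
        inside = not inside
        pos = end
    labels.append(("inside" if inside else "outside", pos, length))
    return labels

# Hypothetical raw scores: one long positive run and one short one
# that is discarded by the minimum-length filter.
scores = [-1.0] * 5 + [1.0] * 20 + [-1.0] * 5 + [1.0] * 3 + [-1.0] * 2
segments = scores_to_segments(scores)
topology = assign_loops(segments, len(scores))
```

A hard threshold discards the confidence information in the raw scores; a dynamic-programming or probabilistic combination would be an alternative design, which this sketch does not attempt.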