Structural Classification and Prediction of Reentrant Regions in Alpha-Helical Transmembrane Proteins: Application to Complete Genomes, Håkan Viklund et al.


Structural Classification and Prediction of Reentrant Regions in Alpha-Helical Transmembrane Proteins: Application to Complete Genomes
Håkan Viklund, Erik Granseth and Arne Elofsson
Journal of Molecular Biology 2006 Aug 18;361(3)
Tim Nugent, BugF, 8th March 2007

Structural regions of alpha-helical proteins
The number of solved alpha-helical TM structures has recently increased rapidly, revealing a structural complexity comparable to that of globular proteins. The most prominent features of TM proteins are membrane-spanning alpha-helices, which are connected by loop regions.

Substructures
Several other functionally and structurally important substructures exist. One is the interface helix, which lies parallel to the membrane in the membrane-water interface region. Another is the reentrant region: a part of a loop region that penetrates the membrane but enters and exits on the same side.

Definition and properties of reentrant regions
Reentrant regions are defined as sequences that start and end on the same side of the membrane and penetrate it to a depth of between 3 Å and 25 Å. Sequence stretches with a depth of between 1.5 Å and 3 Å are also defined as reentrant regions if residue depth increases monotonically on the entrance side of the deepest residue and decreases monotonically on the exit side, and there is a clear turn in the membrane. Classification was performed by visual inspection.
A set of 79 transmembrane proteins with known 3D structure was obtained from the Membrane Protein Structure database and the Protein Data Bank, and homology-reduced at 30% sequence similarity. Based on the definition, this set contains:
– 36 reentrant regions
– 302 transmembrane regions
– 80 interface helix regions
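The depth rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure (the slides state that classification was done by visual inspection). The function name `is_reentrant`, the per-residue depth list, the inclusive handling of the 3 Å and 25 Å boundaries, and the `clear_turn` flag (standing in for the visual judgement of a turn in the membrane) are all assumptions.

```python
from typing import Sequence


def is_reentrant(depths: Sequence[float],
                 same_side: bool = True,
                 clear_turn: bool = False) -> bool:
    """Apply the depth-based definition to one candidate segment.

    depths     -- penetration depth (in Å) of each residue, measured from the
                  membrane surface on which the segment enters (assumed input)
    same_side  -- True if the segment starts and ends on the same side
    clear_turn -- hypothetical flag for "a clear turn in the membrane",
                  which the slides say was judged by visual inspection
    """
    if not same_side or len(depths) == 0:
        return False

    depths = list(depths)
    max_depth = max(depths)

    # Main rule: penetration between 3 Å and 25 Å (boundaries assumed inclusive).
    if 3.0 <= max_depth <= 25.0:
        return True

    # Shallow rule: 1.5 Å to 3 Å, monotonic on both sides of the deepest residue.
    if 1.5 <= max_depth < 3.0 and clear_turn:
        k = depths.index(max_depth)
        entering = all(a <= b for a, b in zip(depths[:k], depths[1:k + 1]))
        exiting = all(a >= b for a, b in zip(depths[k:], depths[k + 1:]))
        return entering and exiting

    return False
```

Monotonicity is checked non-strictly here (equal neighbouring depths are allowed), which is a design choice rather than something stated in the slides.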

Region comparison
The fraction of irregular secondary structure elements is larger in reentrant regions than in regular TM helices. The average fraction of helical residues in reentrant regions is 57%, with a clear correlation between helical content and region length (correlation coefficient = 0.75).

Three classes of reentrant regions can be identified
The classes are based on secondary structure. A helix must be at least 5 residues long; shorter helical stretches are treated as coil.
– Helix-Coil-Helix
– Helix-Coil or Coil-Helix
– Coil / irregular secondary structure
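As a rough illustration of the three-way split, the sketch below counts helices of at least 5 residues in a secondary-structure string. The input encoding (`H` for helix, any other character for coil), the function name `classify_reentrant`, and the decision to classify purely by the number of qualifying helices are assumptions made for this example; the slides do not describe how the assignment was automated, if at all.

```python
import re

MIN_HELIX_LENGTH = 5  # from the slide: shorter helical stretches count as coil


def classify_reentrant(ss: str) -> str:
    """Assign a reentrant region to one of the three classes.

    ss -- secondary-structure string for the region, 'H' = helix,
          anything else = coil (assumed encoding)
    """
    # Keep only helical runs long enough to count as a helix.
    helices = [run for run in re.findall(r"H+", ss)
               if len(run) >= MIN_HELIX_LENGTH]

    if len(helices) >= 2:
        return "helix-coil-helix"
    if len(helices) == 1:
        return "helix-coil or coil-helix"
    return "coil"
```

For example, `classify_reentrant("HHHHCCCCHHHHHH")` treats the leading four-residue helical run as coil and therefore reports a single helix.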

Region length vs penetration depth

Amino acid composition of reentrant regions and PCA

Identification and prediction of reentrant regions
The authors developed TOP-MOD, a hidden Markov model-based method that classifies the residues of a TM sequence into four structural classes: M (membrane), R (reentrant), I (interface helix) and L (loop).
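To make the idea of HMM-based residue classification concrete, here is a minimal Viterbi decoder over four states named M, R, I and L. It is a generic textbook sketch, not TOP-MOD: the real model architecture and parameters are not given in the slides, so the two-letter alphabet (`h` for hydrophobic, `p` for polar) and every probability below are made-up placeholders.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return the most probable state path for a non-empty observation string."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    back = []  # one dict of back-pointers per position after the first

    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans[r][s]))
            score[s] = prev[best] + log(trans[best][s]) + log(emit[s][symbol])
            pointers[s] = best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical toy model: placeholder numbers, NOT the TOP-MOD parameters.
STATES = "MRIL"
START = {s: 0.25 for s in STATES}
TRANS = {r: {s: 0.7 if r == s else 0.1 for s in STATES} for r in STATES}
EMIT = {
    "M": {"h": 0.9, "p": 0.1},
    "R": {"h": 0.6, "p": 0.4},
    "I": {"h": 0.5, "p": 0.5},
    "L": {"h": 0.2, "p": 0.8},
}

if __name__ == "__main__":
    print(viterbi("pppphhhhhhhhpppp", STATES, START, TRANS, EMIT))
```

The decoder works in log space to avoid numerical underflow on long sequences, which matters once the observation string is a full protein rather than a toy example.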

Distinguishing reentrant regions from loop and interface helix regions
Reentrant regions are believed to form relatively late in the folding process, after the initial translocation and the formation of the membrane-spanning helices. Their emergence can be visualised as a process in which parts of inter-TM regions are pulled into the membrane. To test this, the inter-TM parts of each sequence were cut out and TOP-MOD was used to classify the regions in these subsequences.
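Cutting out the inter-TM parts can be sketched as follows, assuming each protein comes with a per-residue label string in which `M` marks TM helix residues and any other character marks non-membrane residues. The label format and the function name `inter_tm_segments` are assumptions for this example; N- and C-terminal tails are excluded because they do not connect two TM helices.

```python
import re


def inter_tm_segments(sequence: str, labels: str):
    """Return (start_index, subsequence) for every stretch between two TM helices.

    sequence -- amino acid sequence
    labels   -- per-residue labels of the same length; 'M' = TM helix residue
                (assumed encoding), anything else = outside the membrane
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")

    segments = []
    # A run of non-M labels that is flanked by M on both sides.
    for match in re.finditer(r"(?<=M)[^M]+(?=M)", labels):
        segments.append((match.start(), sequence[match.start():match.end()]))
    return segments
```

Each returned subsequence could then be passed on its own to a region classifier, which is the experiment the slide describes.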

Predicting reentrant regions at the whole-sequence level
Up to this point, TOP-MOD had only been tested on the sequences connecting TM helices. Next, its ability to distinguish between the different types of structural region at the whole-sequence level was evaluated. First, sequences for which the approximate location of the TM regions was considered known were analysed: using sequence labels, the central residues of the membrane regions were constrained to the HMM compartment that models membrane regions. Second, the topology predictor PRODIV-TMHMM was used as a pre-processor to predict the location of the TM helices.

Scanning for reentrant regions in E. coli, S. cerevisiae and H. sapiens
Using TOP-MOD and PRODIV-TMHMM, the TM proteins of E. coli, S. cerevisiae and H. sapiens were scanned to make a preliminary estimate of how often reentrant regions occur in these genomes. The fraction of TM proteins predicted to contain reentrant regions is at least 10% in all three genomes. To avoid false positives, sensitivity was set fairly low, which suggests that the true fraction may be even higher.

Scanning for reentrant regions in E. coli, S. cerevisiae and H. sapiens
The fraction of proteins predicted to contain reentrant regions increases linearly with the number of predicted TM regions. The fraction is lower in two TM-number categories: 7-TM proteins (GPCRs) and 12-TM proteins (major facilitator superfamily transporters).
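The per-category fractions discussed here amount to a simple grouped tally. The sketch below assumes the scan results are available as `(number_of_predicted_TM_regions, has_predicted_reentrant_region)` pairs; that input format and the function name are hypothetical, and no real genome data is included.

```python
from collections import defaultdict


def reentrant_fraction_by_tm_count(proteins):
    """Map each TM-region count to the fraction of proteins with a reentrant region.

    proteins -- iterable of (n_tm, has_reentrant) pairs (assumed input format)
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for n_tm, has_reentrant in proteins:
        totals[n_tm] += 1
        hits[n_tm] += int(has_reentrant)
    return {n: hits[n] / totals[n] for n in sorted(totals)}
```

Plotting the returned fractions against the TM count would be one way to look for the linear trend and for categories, such as 7-TM and 12-TM proteins, that fall below it.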

Proteins of a particular molecular function with predicted reentrant regions
Each sequence was mapped to Pfam, an HMM-based domain library. Earlier literature suggests that reentrant loops are found primarily in passive transporter proteins. These data suggest that their occurrence in active transporters is higher than previously thought.

Conclusions
For at least the last 10 years, the dominant non-experimental way of obtaining structural information about alpha-helical TM proteins has been topology prediction. As more 3D structures have been solved, it has become apparent that TM proteins are often too complex to fit into the helix / inside loop / outside loop scheme, in which consecutive loops always lie on opposite sides of the membrane. This suggests that a finer-grained nomenclature, as well as finer-grained methods, is needed to study these proteins. Ways forward:
– Define more detailed substructures.
– Predict the structure directly using ab initio methods.
– Solve more 3D structures.