V6 SS 2006 Membrane Bioinformatics – Part II
V6 – Secondary Structure of TM proteins
Suggested reading for this lecture: Appl. Bioinf. 1, 21 (2002)
- Introduction
- Prediction of secondary structure elements
- Performance on test sets

Introduction
Membrane proteins are crucial for survival:
- they are key components for cell-cell signaling
- they mediate the transport of ions and solutes across the membrane
- they are crucial for recognition of self.
The pharmaceutical industry preferentially targets membrane-bound receptors. Particularly important is the large superfamily of G protein-coupled receptors (GPCRs): receptors for hormones, neurotransmitters, growth factors, light and odor-related ligands. More than 50% of prescription drugs act on GPCRs.

Topology of Membrane Proteins
Inside the lipid bilayer, the protein backbone cannot form hydrogen bonds with the aliphatic chains of the phospholipid molecules.
→ The backbone atoms need to form H-bonds among each other.
→ They adopt either α-helical or β-sheet conformations.

Topology of Membrane Proteins
http://www.biologie.uni-konstanz.de/folding/Structure%20gallery%201.html

History of membrane protein structure determination
- 1984: bacterial reaction center (Nobel Prize to Michel, Deisenhofer and Huber, 1988)
- 1990: EM map of bacteriorhodopsin (Henderson); 1997: high-resolution structure by Lücke; now several intermediates of the photocycle
- 1992: porin (complete β-barrel)
- 1998: halorhodopsin
- 1995: cytochrome c oxidase
- 1998: F1 ATPase (Nobel Prize to John Walker, 1997)
- 1998: KcsA ion channel (Nobel Prize to Roderick MacKinnon, 2003)
- 2000: aquaporin
- 2000: rhodopsin (Palczewski)
- 2002: SERCA Ca2+ ATPase (Toyoshima)
- 2003: voltage-gated ion channel
- 2005: Na+/H+ antiporter (Hunte)

Lipid bilayer simplifies the prediction problem
TM proteins are forced into two classes: α-helical or β-sheet. α-helices are typically tilted by 10–45° with respect to the membrane normal. The hydrophobic lipid bilayer reduces three-dimensional structure formation almost to a 2D problem.

Predicting TM helix location
Hydrophobicity scales provide simple criteria to predict membrane helices. TM helices (TMH) can be predicted from the distinctive patterns of hydrophobic (TM) and polar (non-membrane or water-soluble) regions within the sequence.
Observed patterns:
(1) TM helices are predominantly apolar and 12-35 residues long.
(2) Globular regions between TMH are typically shorter than 60 residues.
(3) Most TMH proteins have a specific distribution of the positively charged amino acids arginine and lysine, the "positive-inside rule" (Gunnar von Heijne): connecting "loop" regions on the inside of the membrane carry more positive charges than "loop" regions on the outside.
(4) Long globular regions (> 60 residues) differ in their composition from those globular regions subject to the "inside-out rule".

Kyte-Doolittle hydrophobicity scale (1982)
- Assign a hydropathy value to each amino acid.
- Use a sliding window to identify membrane regions: sum the hydropathy values over all w residues in the window of length w.
- Use a threshold T to assign a segment as a predicted membrane helix.
A window of w = 19 residues discriminated best between membrane and globular proteins. A threshold of T > 1.6 was suggested for the average over 19 residues.
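The sliding-window procedure can be sketched in a few lines of Python. This is a minimal illustration, not the original program: the function names and the rule for merging overlapping windows are assumptions made here, while the hydropathy values are those of the published Kyte-Doolittle scale and w = 19, T = 1.6 come from the slide.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(seq, w=19):
    """Average hydropathy of every window of length w (empty if seq is shorter than w)."""
    values = [KD[aa] for aa in seq]
    if len(values) < w:
        return []
    total = sum(values[:w])
    averages = [total / w]
    for i in range(w, len(values)):
        # Slide the window by one residue: add the new value, drop the oldest.
        total += values[i] - values[i - w]
        averages.append(total / w)
    return averages


def predict_tm_segments(seq, w=19, threshold=1.6):
    """Return (start, end) spans, 0-based and end-exclusive, covered by windows above threshold.

    Merging overlapping or adjacent windows into one span is an assumption
    of this sketch, not part of the published method.
    """
    segments = []
    for start, avg in enumerate(window_averages(seq, w)):
        if avg > threshold:
            end = start + w
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments
```

Because every window that exceeds the threshold contributes all of its residues, the merged span is usually longer than the actual hydrophobic stretch; a real predictor would trim it afterwards.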

More refined indices
One drawback of pure hydropathy-based methods is that they fail to discriminate accurately between membrane regions and highly hydrophobic globular segments.
PRED-TMR algorithm: combines hydropathy with the propensities of finding certain amino acid residues at the termini of TM helices.
Other hydrophobicity scales:
- Wimley & White: based on partition experiments of peptides between water/lipid bilayer and water/octanol
- TMFinder (Liu & Deber scale): based on HPLC retention times of peptides, combined with non-polar phase helicity.
http://blanco.biomol.uci.edu/hydrophobicity_scales.html

Folding of helical membrane proteins
White, FEBS Lett. 555, 116 (2003)

Hydrophobicity Scales
White, FEBS Lett. 555, 116 (2003)

Translocon-assisted folding of TM proteins?
White, FEBS Lett. 555, 116 (2003)
Upper picture (model!): the newly synthesized polypeptide chain of a membrane protein is inserted from the ribosome into the membrane via interaction with a TM complex, the "translocon" (EM map shown).
Lower picture: experiment largely supports the concerted view.
What determines insertion into the membrane?

Integration of H-segments into the microsomal membrane
Hessa et al., Nature 433, 377 (2005)
Ingenious experiment! Introduce a marker that shows whether helix segment H is inserted into the membrane or not.
a, Wild-type Lep has two N-terminal TM segments (TM1 and TM2) and a large luminal domain (P2). H-segments were inserted between residues 226 and 253 in the P2-domain. Glycosylation acceptor sites (G1 and G2) were placed in positions 96–98 and 258–260, flanking the H-segment. For H-segments that integrate into the membrane, only the G1 site is glycosylated (left), whereas both the G1 and G2 sites are glycosylated for H-segments that do not integrate into the membrane (right).
b, Membrane integration of H-segments with the Leu/Ala composition 2L/17A, 3L/16A and 4L/15A. Bands of unglycosylated protein are indicated by a white dot; singly and doubly glycosylated proteins are indicated by one and two black dots, respectively.

Insertion determined by simple physical chemistry
Hessa et al., Nature 433, 377 (2005)
Measure the fraction of singly glycosylated (f_1g) vs. doubly glycosylated (f_2g) Lep molecules.
c, ΔG_app values for H-segments with 2–4 Leu residues. Individual points for a given n show ΔG_app values obtained when the position of Leu is changed.
d, Mean probability of insertion (p) for H-segments with n = 0–7 Leu residues.
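How the two measured fractions relate to ΔG_app and to the insertion probability p can be sketched as follows. The sketch assumes the usual two-state treatment: an apparent equilibrium constant K_app = f_1g / f_2g and ΔG_app = -RT ln K_app. The function names and the default temperature of 298.15 K are assumptions made here for illustration; the slide does not state the temperature.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_app(f1g, f2g, temperature=298.15):
    """Apparent free energy of insertion in kcal/mol.

    f1g: fraction of singly glycosylated molecules (H-segment inserted)
    f2g: fraction of doubly glycosylated molecules (H-segment not inserted)
    temperature: in kelvin; the default is an assumption of this sketch.
    """
    k_app = f1g / f2g
    return -R_KCAL * temperature * math.log(k_app)


def insertion_probability(dg, temperature=298.15):
    """Two-state probability of insertion, p = K_app / (1 + K_app)."""
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temperature)))
```

With equal fractions of singly and doubly glycosylated molecules, ΔG_app is zero and p is 0.5; negative ΔG_app values correspond to segments that insert more often than not.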

Biological and biophysical ΔG_aa scales
Hessa et al., Nature 433, 377 (2005)
a, ΔG_app^aa scale derived from H-segments with the indicated amino acid placed in the middle of the 19-residue hydrophobic stretch. Only Ile, Leu, Phe and Val really favor membrane insertion. All polar and charged residues are strongly unfavorable.
b, Correlation between ΔG_app^aa values measured in vivo and in vitro.
c, Correlation between ΔG_app^aa and the Wimley–White water/octanol free energy scale for partitioning of peptides.

Positional dependencies in ΔG_app
Hessa et al., Nature 433, 377 (2005)
a, Symmetrical H-segment scans with pairs of Leu (red), Phe (green), Trp (pink) or Tyr (light blue) residues. The Leu scan is based on symmetrical 3L/16A H-segments with a Leu-Leu separation of one residue (sequence shown at the top; the two red Leu residues are moved symmetrically outwards) up to a separation of 17 residues. For the Phe scan, the composition of the central 19 residues of the H-segments is 2F/1L/16A, for the Trp scan it is 2W/2L/15A, and for the Tyr scan it is 2Y/3L/14A. The ΔG_app value for the 4L/15A H-segment GGPGAAALAALAAAAALAALAAAGPGG is also shown (dark blue).
b, Red lines show ΔG_app values for symmetrical scans of 2L/17A (triangles), 3L/16A (circles), and 4L/15A (squares) H-segments.
c, Same as b but for a symmetrical scan with pairs of Ser residues in H-segments with the composition 2S/4L/13A.
Tyr and Trp are favorable in the interface region.

Using observed amino acid propensities
As more and more 3D structures became available, it became possible to train statistical approaches on the observed frequencies of amino acids in membrane proteins vs. non-membrane proteins. The concept is similar to that used in secondary structure prediction for globular proteins.
TMpred: uses statistical amino acid preferences for scoring.
SPLIT (Juretic et al.):
- uses amino acid preferences for the "state" membrane helix, derived from a data set of integral membrane proteins with partially known secondary structure
- combines these with preferences for β-strand, turn and non-regular secondary structure based on sets of soluble proteins with known structure.
This method can identify shorter, unstable or movable membrane helices.

Incorporating more information: TopPred
TopPred (von Heijne 1992) predicts the complete topology of membrane proteins by using
- hydrophobicity analysis
- automatic generation of possible topologies
- ranking of these topologies by the positive-inside rule.
TopPred uses a sliding trapezoid window to detect segments of outstanding hydrophobicity. The two bases of the trapezoid are 11 and 21 residues long. Among the candidate segments, TopPred accepts as TM helices those that yield the optimal difference between the number of positively charged residues on the inside and on the outside.
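The ranking step by the positive-inside rule can be illustrated with a small Python sketch. It is not TopPred itself: it assumes the helix positions are already given, counts Lys and Arg over entire loops regardless of loop length, and only decides which side the N-terminus lies on. The function names are hypothetical.

```python
def count_positive(segment):
    """Number of Lys (K) and Arg (R) residues in a sequence segment."""
    return sum(segment.count(aa) for aa in "KR")


def loops_from_helices(seq, helices):
    """Split seq into the loops around the given helix spans.

    helices: sorted, non-overlapping (start, end) spans, 0-based, end-exclusive.
    Returns len(helices) + 1 loop strings (possibly empty).
    """
    loops, prev_end = [], 0
    for start, end in helices:
        loops.append(seq[prev_end:start])
        prev_end = end
    loops.append(seq[prev_end:])
    return loops


def orient_by_positive_inside(seq, helices):
    """Return (location of N-terminus, charge difference) for the better orientation.

    Loops alternate sides of the membrane, so loops 0, 2, 4, ... share the
    side of the N-terminus. The side with more K+R is taken to be inside.
    Ties are resolved as 'in', an arbitrary choice of this sketch.
    """
    loops = loops_from_helices(seq, helices)
    n_side = sum(count_positive(loop) for loop in loops[0::2])
    other_side = sum(count_positive(loop) for loop in loops[1::2])
    n_term = "in" if n_side >= other_side else "out"
    return n_term, abs(n_side - other_side)
```

The returned charge difference plays the role of the ranking score: when several candidate helix sets are possible, the one with the largest difference would be preferred.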

Improvements from dynamic programming: MEMSAT
MEMSAT (1994) implemented statistical tables (log likelihoods) compiled from well-characterized TM proteins and a dynamic programming algorithm to recognize membrane topology models by expectation maximisation.
Residues are classified as being in one of 5 structural states:
- L_i: inside loop
- L_o: outside loop
- H_i: inside helix end
- H_m: helix middle
- H_o: outside helix end.
Helix end caps are defined to span 4 adjacent residues (one helical turn).
Procedure: compile the propensities of the amino acids for the 5 states, then calculate the score of relating a given sequence to a predicted topology. Dynamic programming guarantees that the optimal score is found.

Using evolutionary information
It is known from predicting secondary structures of globular proteins that using multiple sequence alignment information improves prediction accuracy significantly.
PHDhtm: predicts the location and topology of TM helices by a system of neural networks. It was later combined with dynamic programming.

Using evolutionary information
TMAP (1996): uses propensity values determined for segments of 21 consecutive residues in transmembrane segments (P_m), and for the flanking 4-residue caps of TM helices (P_e). Residues with high P_m tend to be hydrophobic; residues with high P_e tend to be polar and basic.
Compute the compositional difference in the protein segments exposed to the two surfaces of a membrane for 12 important residues:
- mostly at the outside of membranes: Asn, Asp, Gly, Phe, Pro, Trp, Tyr, Val
- mostly inside: Ala, Arg, Cys, Lys.
Use the consensus over these 12 residues to predict topology.

Using grammatical rules
The lipid bilayer constrains the structure of the membrane-passing regions of proteins in many ways. TMHMM (Sonnhammer et al. 1998, Krogh et al. 2001) and HMMTOP (Tusnady & Simon 1998, 2001) implement Hidden Markov Models.
TMHMM: uses a cyclic model with 7 states for
- TM helix core
- TM helix caps on the N- and C-terminal side
- non-membrane region on the cytoplasmic side
- 2 non-membrane regions on the non-cytoplasmic side (for short and long loops, to account for different membrane insertion mechanisms)
- a globular domain state in the middle of each non-membrane region

Using grammatical rules
HMMTOP: uses a hidden Markov model distinguishing 5 structural states
- inside non-membrane region
- inside TMH-cap
- membrane helix
- outside TMH-cap
- outside non-membrane region
This model is similar to MEMSAT.
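How a cyclic state grammar plus dynamic programming yields a topology can be shown with a deliberately small toy model. The sketch below is neither TMHMM nor HMMTOP: it has only 4 states (inside loop, helix running in to out, outside loop, helix running out to in), residues are reduced to three classes, and all probabilities are hypothetical values chosen by hand. The Viterbi algorithm then finds the most probable state path.

```python
import math

STATES = ["i", "M_io", "o", "M_oi"]

# All probabilities below are hypothetical, chosen by hand for illustration.
START = {"i": 0.5, "o": 0.5}
# Cyclic grammar: i -> M_io -> o -> M_oi -> i (plus self-transitions).
TRANS = {
    "i": {"i": 0.90, "M_io": 0.10},
    "M_io": {"M_io": 0.95, "o": 0.05},
    "o": {"o": 0.90, "M_oi": 0.10},
    "M_oi": {"M_oi": 0.95, "i": 0.05},
}
LOOP_IN = {"positive": 0.30, "hydrophobic": 0.20, "other": 0.50}
LOOP_OUT = {"positive": 0.10, "hydrophobic": 0.20, "other": 0.70}
HELIX = {"positive": 0.02, "hydrophobic": 0.80, "other": 0.18}
EMIT = {"i": LOOP_IN, "o": LOOP_OUT, "M_io": HELIX, "M_oi": HELIX}


def residue_class(aa):
    """Reduce the 20 amino acids to three coarse classes (a simplification)."""
    if aa in "KR":
        return "positive"
    if aa in "AVLIMFWC":
        return "hydrophobic"
    return "other"


def log(p):
    """Logarithm that maps probability 0 to minus infinity."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(seq):
    """Most probable state path for seq under the toy model, as a list of state names."""
    if not seq:
        return []
    obs = [residue_class(aa) for aa in seq]
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: log(START.get(s, 0.0)) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, -math.inf
            for p in STATES:
                cand = score[p] + log(TRANS[p].get(s, 0.0))
                if cand > best:
                    best_prev, best = p, cand
            new_score[s] = best + log(EMIT[s][symbol])
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Using two separate helix states is what encodes the grammar: a helix that starts from an inside loop can only end in an outside loop, so loop sides alternate automatically. The positive-inside rule enters through the higher emission probability of Lys/Arg in the inside loop state.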

Availability of prediction methods
Many of these servers are also available through the meta-server META-PP at the site of Burkhard Rost.

Prediction accuracy
Authors have often claimed that their methods are > 90% accurate. However, Chen and Rost argue that most authors have significantly overestimated the accuracy of their methods.
(1) There are not enough high-resolution structures to allow a statistically significant analysis. Training and test sets may share members or contain homologous members. Using low-resolution experiments, e.g. gene fusion, is no workaround: low-resolution experiments differ from high-resolution structures almost as much as prediction methods do.
(2) All methods optimise some parameters. Methods perform much better on the proteins for which they were developed than on new proteins.

Prediction accuracy
(3) Methods using evolutionary information failed due to the surprising fact that membrane helices are not entirely conserved across species. This is surprising since it implies either that those proteins do not perform similar cellular functions (e.g. GPCRs), or that the same function can in some cases be realized with a different number of membrane regions.
(4) Levels of prediction accuracy often cannot be compared appropriately between methods, since they are frequently based on different measures of prediction accuracy and on different data sets.

Most methods get number of helices right
All methods based on advanced algorithms tend to underestimate TM helices (%obs > %prd).
Notes to the results table (table not reproduced in this transcript):
a Data set: sequence-unique subset of 36 high-resolution TM helical proteins from PDB. This is the largest subset of all 105 high-resolution membrane chains that fulfils the condition that no pair in the set has significant sequence similarity as defined in Rost (1999).
b Methods.
c Per-segment accuracy:
- Q_ok: percentage of proteins for which all TM helices are predicted correctly (allowed deviation of up to 3 residues)
- Q_htm^%obs: percentage of all observed helices that are correctly predicted
- Q_htm^%prd: percentage of all predicted helices that are correctly predicted
- TOPO: percentage of proteins for which the topology (orientation of helices) is correctly predicted (empty for methods that do not predict topology).
d Per-residue accuracy:
- Q_2: percentage of correctly predicted residues in two states: membrane helix / non-membrane helix
- Q_2T^%obs: percentage of all observed TMH residues that are correctly predicted
- Q_2T^%prd: percentage of all predicted TMH residues that are correctly predicted
- Q_2N^%obs: percentage of all observed non-TMH residues that are correctly predicted
- Q_2N^%prd: percentage of all predicted non-TMH residues that are correctly predicted.
e ERROR: the estimates for per-segment accuracy resulted from a bootstrap experiment with M = 100 and K = 18; the estimates for per-residue accuracy were obtained by standard deviations over Gaussian distributions for the respective score.
f Numbers in italics: two standard deviations below the numerically highest value in each column (set in bold letters).
NOTE: all methods are tested on the same set of proteins. However, the numbers are NOT from a cross-validation experiment, i.e. some methods may have used some of the proteins for training. Generally, the accuracy of newer methods is more likely to be overestimated than that of older ones. In particular, HMMTOP2, TMHMM1, and WW were developed using ALL the proteins listed here.
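The per-residue measures defined in note d can be computed as in the following sketch. It assumes that the observed and predicted states are given as equal-length strings with 'T' for TMH residues and 'N' for all others; this encoding, the function name and the dictionary keys are choices made here, not part of the evaluation paper.

```python
def per_residue_scores(observed, predicted):
    """Per-residue two-state accuracy measures, in percent.

    observed, predicted: equal-length strings over {'T', 'N'}
    ('T' = TM-helix residue, 'N' = non-TM-helix residue).
    A measure whose denominator is zero is returned as NaN.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    pairs = list(zip(observed, predicted))
    both_t = sum(o == "T" and p == "T" for o, p in pairs)
    both_n = sum(o == "N" and p == "N" for o, p in pairs)

    def pct(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "Q2": pct(both_t + both_n, len(pairs)),
        "Q2T_obs": pct(both_t, observed.count("T")),   # share of observed TMH residues found
        "Q2T_prd": pct(both_t, predicted.count("T")),  # share of predicted TMH residues that are right
        "Q2N_obs": pct(both_n, observed.count("N")),
        "Q2N_prd": pct(both_n, predicted.count("N")),
    }
```

In this sketch, a prediction that marks fewer residues as TMH than observed can never have Q2T_obs above Q2T_prd, since both share the same numerator.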

Prediction accuracy
About 86% of the TMH residues predicted by the best methods are correctly predicted.
Assume that we consider a prediction of a membrane helix correct if the predicted and the observed helical regions differ by less than 3 residues.
→ The best current methods correctly predict all membrane helices for 70–75% of all proteins.
However, the topology is predicted correctly for only about half of all proteins. The best method, HMMTOP2, had all proteins listed in its training set. Simple hydrophobicity scales are less accurate than advanced methods.

All methods confuse TM helices with signal peptides
Signal peptides that are cleaved off secreted proteins usually contain stretches of hydrophobic residues resembling membrane helices. The most accurate specialists for membrane prediction (TMHMM and PHDhtm) falsely predict about 30–40% of all signal peptides as TM helices. Simple hydrophobicity scales predict more than 90% of the signal peptides as TM helices.

Many methods predict TM helices in globular proteins
Simple hydrophobicity scales reach levels close to 100% false positives. Advanced methods (SOSUI, TMHMM1, PHDhtm) predict TM helices in less than 2% of all globular proteins.
Different methods predict similar numbers of TM proteins in genomes: about 10–30%. The overall content of TM proteins in genomes of different complexity is similar. However, eukaryotes have significantly more proteins with > 10 TM helices than all other species. The distribution also differs:
- eukaryotes have more 7TM proteins (receptors)
- prokaryotes have more 6TM and 12TM proteins (ABC transporters).

Future directions
- Meta servers yield improved predictions: > 90% correct topologies can be obtained by a simple majority vote between the results of various methods.
- TM helix prediction and signal peptide prediction should be combined.
- Useful: databases for particular families of TM proteins and sequence motifs, e.g. the GPCR database.
- Membrane-specific substitution matrices improve database searches, e.g. PHAT by Henikoff & Henikoff improved alignments of TM proteins.
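A majority vote over topology predictions can be sketched as below. The representation of a topology as a (number of helices, location of N-terminus) tuple and the function name are assumptions of this sketch, and it uses a strict majority (more than half of the votes); an actual meta server may compare predictions in more detail, e.g. by helix positions.

```python
from collections import Counter


def majority_topology(predictions):
    """Return the topology predicted by more than half of the methods, else None.

    predictions: list of hashable topology descriptions, one per method,
    e.g. (number_of_helices, "in" or "out" for the N-terminus).
    """
    if not predictions:
        return None
    topology, votes = Counter(predictions).most_common(1)[0]
    if 2 * votes > len(predictions):
        return topology
    return None
```

For example, if two methods report (7, "out") and a third reports (6, "in"), the vote returns (7, "out"); with only one vote each, no consensus is reported.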

Summary
- TM helices are typically continuous stretches of mostly hydrophobic residues.
- Simple methods based on summing up hydrophobicities work reasonably, but not very well.
- Advanced methods include additional features such as the "positive-inside rule".
- The currently most successful methods are based on Hidden Markov Models or Neural Networks.
- Performance should be evaluated using carefully separated training and test sets.
- It is possible to discriminate signal peptides and TM helices.
- Only Split 4.0 may detect short, non-membrane-spanning helices.
