Secondary Structure Prediction
Protein Analysis Workshop 2008
Bioinformatics group, Institute of Biotechnology, University of Helsinki
Hung Ta


Overview
- Hierarchy of protein structure.
- Introduction to structure prediction: different approaches; prediction of 1D strings of structural elements.
- Server/software review: COILS, MPEx, …
- The PredictProtein metaserver.

Proteins
Proteins play a crucial role in virtually all biological processes and have a broad range of functions. Protein structure leads to protein function.

Hierarchy of Protein Structure

Primary Structure: a Linear Arrangement of Amino Acids
An amino acid has several structural components: a central carbon atom (Cα), an amino group (NH2), a carboxyl group (COOH), a hydrogen atom (H), and a side chain (R). There are 20 amino acids. A peptide bond is formed when the carboxyl group of one amino acid binds to the amino group of the adjacent amino acid. The primary structure of a protein is simply the linear arrangement, or sequence, of the amino acid residues that compose it.

Secondary Structure: Core Elements of Protein Architecture
Secondary structure results from the folding of localized parts of a polypeptide chain. α-helices and β-sheets are the major internal supportive elements and make up about 60 percent of the polypeptide chain; the remainder forms coils and turns.

α-Helix
- Hydrogen-bonded.
- 3.6 residues per turn.
- Axial dipole moment.
- Side chains point outward.
- Average length is 10 amino acids (3 turns).
- Typically rich in alanine, glutamine, leucine and methionine, and poor in proline, glycine, tyrosine and serine.

β-Sheet
A β-sheet is formed by hydrogen bonds between β-strands, which are short polypeptide segments (5-8 residues). If adjacent β-strands run in the same direction, the sheet is parallel. If adjacent β-strands run in opposite directions, the sheet is anti-parallel.
[Figure: ribbon diagrams of a parallel and an anti-parallel sheet.]

Turns, loops, coils…
A turn, composed of 3-4 residues, forms a sharp bend that redirects the polypeptide backbone back toward the interior. A loop is similar to a turn but can form a longer bend. Turns and loops help large proteins fold into compact structures. A random coil is a class of conformations that indicate an absence of regular secondary structure.

Tertiary Structure: Overall Folding of the Polypeptide Chain
The tertiary structure is stabilized by hydrophobic interactions between the nonpolar side chains and by hydrogen bonds involving polar side chains and peptide bonds.

Quaternary Structure: Arrangement of Multiple Folded Protein Molecules
Examples: hemoglobin, DNA polymerase.

Structure Prediction
GPSRYIVDL… ?
Structure prediction is of high importance in medicine (for example, in drug design) and in biotechnology (for example, in the design of novel enzymes).

Structure Prediction
Why: the experimental methods, X-ray crystallography and NMR spectroscopy, are very time-consuming and relatively expensive.
Challenges:
- The number of possible structures is extremely large.
- The physical basis of protein structural stability is not fully understood.
This lecture discusses the prediction of protein secondary structure.

Secondary Structure Prediction
Primary:   MSEGEDDFPRKRTPWCFDDEHMC
Secondary: CCHHHHHHCCCCEEEEEECCCCC
Why: the first level of structural organization.
The task: assign each amino acid (aa) one of the states H: α-helix, E: β-strand, T: turn, C: coil.
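The per-residue state string on this slide can be read as a list of structural segments. The following is a minimal sketch of that reading, using the two strings from the slide; the function name `segments` and the 1-based position convention are assumptions made for illustration.

```python
from itertools import groupby

STATE_NAMES = {"H": "alpha-helix", "E": "beta-strand", "T": "turn", "C": "coil"}

def segments(secondary):
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive (an illustrative convention).
    """
    runs = []
    position = 1
    for state, group in groupby(secondary):
        length = len(list(group))
        runs.append((state, position, position + length - 1))
        position += length
    return runs

primary = "MSEGEDDFPRKRTPWCFDDEHMC"
secondary = "CCHHHHHHCCCCEEEEEECCCCC"

# One state letter per residue, so both strings must have equal length.
assert len(primary) == len(secondary)

for state, start, end in segments(secondary):
    print(f"{STATE_NAMES[state]:12s} {start:2d}-{end:2d} {primary[start - 1:end]}")
```

A prediction method then only has to produce the state string; segment boundaries follow from it.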

Secondary Structure Prediction
Single-residue statistical analysis (Chou-Fasman, 1974):
- For each amino acid type, assign its "propensity" to be in a helix, β-sheet, or coil.
- Based on 15 proteins of known conformation, 2473 amino acids in total.
- Limited accuracy: ~55-60% on average.
- Not used any more.
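The idea of a per-residue propensity can be sketched as follows. This is not the full Chou-Fasman procedure (which also has nucleation and extension rules); it only computes propensities as a ratio of frequencies and assigns each residue its highest-propensity state. The two training examples are invented toy data, not the 15 proteins mentioned above, and the function names are hypothetical.

```python
from collections import Counter

def propensities(examples, state):
    """Propensity of each amino acid for `state`:
    (fraction of that amino acid found in the state) /
    (fraction of all residues found in the state)."""
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in examples:
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    total = sum(aa_total.values())
    in_state = sum(aa_in_state.values())
    if in_state == 0:
        return {}
    background = in_state / total
    return {aa: (aa_in_state[aa] / aa_total[aa]) / background for aa in aa_total}

def predict(seq, tables):
    """Assign each residue the state with the highest propensity (simplification)."""
    return "".join(
        max(tables, key=lambda s: tables[s].get(aa, 0.0)) for aa in seq
    )

# Invented toy training set: (sequence, known secondary structure).
examples = [
    ("AAELLKGPGS", "HHHHHHCCCC"),
    ("VIVYTGNGKV", "EEEEECCCEE"),
]
tables = {state: propensities(examples, state) for state in "HEC"}
print(predict("AELLGV", tables))
```

A propensity above 1 means the amino acid occurs in that state more often than an average residue does; with only a single residue considered at a time, the limited accuracy quoted above is not surprising.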

Secondary Structure Prediction
Segment-based statistics:
- Look for correlations within windows of neighbouring amino acids (aa).
- Many algorithms have been tried.
- Best-performing approach: neural networks.
  Input: a number of protein sequences with their known secondary structure.
  Output: a trained network that predicts secondary structure elements for given query sequences.
  Accuracy < 70%.
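Window-based methods present the network with one residue together with its neighbours. A minimal sketch of that input encoding is shown below; the window length of 13, the one-hot scheme, and the function names are assumptions for illustration, since the slide does not give them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(aa):
    """Encode one residue as 21 bits: 20 amino acids plus one padding/unknown slot."""
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(aa)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec

def window_features(seq, window=13):
    """Return one feature vector per residue, built from a centred window.

    `window` must be odd; sequence ends are padded with '-'.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = "-" * half + seq + "-" * half
    features = []
    for i in range(len(seq)):
        chunk = padded[i:i + window]
        features.append([bit for aa in chunk for bit in one_hot(aa)])
    return features

feats = window_features("MSEGEDDFPRKRTPWCFDDEHMC")
print(len(feats), "residues,", len(feats[0]), "inputs per residue")
```

Each feature vector would be paired with the known state (H, E, or C) of the central residue for training; the network itself is not shown here.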

POPULAR SERVERS FOR DEALING WITH SECONDARY STRUCTURES
- Coiled-coils
- Transmembrane helices
- Secondary structure
- Metaservers

Prediction of coiled-coils
Coiled-coils are generally solvent-exposed, multi-stranded helix structures. Helix periodicity and solvent exposure impose a special pattern, the heptad repeat … abcdefg …, in which positions a and d are typically hydrophobic and the remaining positions are typically hydrophilic.
[Figure: helical diagram of 2 interacting helices in a two-stranded coiled-coil, from the Wikipedia Leucine zipper article.]
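The heptad pattern can be checked on a sequence by trying all seven possible registers and counting hydrophobic residues at the a and d positions. The sketch below does only that; it is far simpler than what a coiled-coil server does, and the hydrophobic residue set and the function name are assumptions.

```python
HYDROPHOBIC = set("AILMFVWY")  # illustrative choice of hydrophobic residues

def best_heptad_frame(seq):
    """Return (fraction, offset) for the register with the most hydrophobic
    residues at positions a and d.

    `offset` is the index of the first residue assigned to position a.
    """
    best = (0.0, 0)
    for offset in range(7):
        # (i - offset) % 7 == 0 is position a, == 3 is position d.
        core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best[0]:
            best = (fraction, offset)
    return best

# Idealised three-heptad test sequence with leucine at a and d.
print(best_heptad_frame("LEELKKK" * 3))
```

A high fraction in one register and low fractions in the others is the kind of periodicity a coiled-coil predictor looks for.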

The COILS server at EMBnet
COILS compares a sequence to a database of known, parallel two-stranded coiled-coils, and derives a similarity score. By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
Options: scoring matrices, window size (score may vary), weighting options.
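One way to turn a score into a probability by comparing two score distributions is sketched below. It assumes both distributions are Gaussian and that globular residues are more common than coiled-coil residues by a fixed ratio. The means, standard deviations, and the 30:1 ratio are placeholder assumptions chosen for illustration, not parameters taken from the COILS program.

```python
import math

def gaussian(x, mean, sd):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def coiled_coil_probability(score, cc=(1.6, 0.25), glob=(0.8, 0.2),
                            glob_to_cc_ratio=30.0):
    """Probability that `score` came from the coiled-coil distribution.

    cc and glob are (mean, sd) pairs; all default values are hypothetical.
    """
    g_cc = gaussian(score, *cc)
    g_glob = gaussian(score, *glob)
    return g_cc / (glob_to_cc_ratio * g_glob + g_cc)

for s in (0.8, 1.2, 1.6):
    print(f"score {s:.1f} -> probability {coiled_coil_probability(s):.3f}")
```

Because the result depends on where the score falls relative to both distributions, changing the scoring matrix or window size shifts the scores and therefore the probabilities.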

COILS Limitations
The program works well for parallel two-stranded structures that are solvent-exposed but runs progressively into problems with the addition of more helices, their antiparallel orientation and their decreasing length. The program fails entirely on buried structures.

COILS Demo
Let us submit the following sequence to the COILS server at EMBnet:
>1jch_A
VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQ
IAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVP
MSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQ
GGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNY
ERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPM
AGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAE
NNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKG
RKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL

Transmembrane Region Prediction
Transmembrane regions:
- Usually contain residues with hydrophobic side chains (the surface must be hydrophobic).
- Are usually ~20 residues long, and can be up to 30 if not perpendicular through the membrane.
Methods:
- Hydropathy plots (historical, better methods now available).
- Threading (TMpred, MEMSAT).
- Hidden Markov Model (TMHMM).
- Neural Network (PHDhtm).

Hydropathy Plots (Kyte-Doolittle)
The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side chain. An average hydropathy value is computed for each position in the query sequence over a sliding window; a window length of 19 is usually chosen for predicting membrane-spanning regions.
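The windowed average described above can be sketched in a few lines. The table holds the Kyte-Doolittle hydropathy index for the 20 standard amino acids; the function name and the choice to report only positions with a complete window are assumptions made for this sketch.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Return (position, mean hydropathy) for every full window.

    `position` is the 1-based index of the window's central residue.
    Non-standard residue letters raise KeyError.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    profile = []
    for center in range(half, len(seq) - half):
        win = values[center - half:center + half + 1]
        profile.append((center + 1, sum(win) / window))
    return profile

# Start of the RCEM_RHOVI sequence used in the demo slides.
seq = "ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHF"
position, value = max(hydropathy_profile(seq), key=lambda item: item[1])
print(f"most hydrophobic window is centred on residue {position}")
```

Plotting the second element of each pair against the first gives the hydropathy plot; sustained peaks are candidate membrane-spanning segments.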

Hydropathy Plot Servers
Let us submit the following sequence to:
- Membrane Explorer (also available as standalone MPEx),
- Grease.
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIA
AFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWL
MAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSE
GVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFG
GDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLT
GTFVDNWYLWCVKHG AAPDYPAYLPATPDPASLPGAPK

Hydropathy Plot
The larger the number, the more hydrophobic the amino acid.

TMPred
Method summary: TMPred scans a candidate sequence for matches to a sequence scoring matrix, obtained by aligning the sequences of all transmembrane alpha-helical regions that are known from structures. These sequences are collected in a database called TMBase.
Remark: the authors do not suggest this method for genomic sequences. Automatic methods are recommended instead, e.g. TMHMM, PHDhtm.

TMPred Server
Let us submit RCEM_RHOVI again, this time to the TMPred server at EMBnet:
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIA
AFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWL
MAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSE
GVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFG
GDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLT
GTFVDNWYLWCVKHG AAPDYPAYLPATPDPASLPGAPK

PredictProtein
A server that lets you obtain a wide range of information based on your sequence, including structure predictions and motif or domain searches. The predictions are based on several methods.

The PredictProtein meta-server
For sequence analysis, structure and function prediction. When you submit any protein sequence, PredictProtein retrieves similar sequences in the database and predicts aspects of protein structure and function:
- SEG: finds low-complexity regions.
- ProSite: database of functional motifs, i.e. biologically relevant short patterns.
- ProDom: a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
- PROFsec (PHDsec): secondary structure.
- PROFacc (PHDacc): solvent accessibility.
- PHDhtm: transmembrane helices.
The sequence database is scanned for similar sequences (Blast, Psi-Blast). Multiple sequence alignment profiles are generated by weighted dynamic programming (MaxHom).

PredictProtein Demo
Let's submit the following sequence to PredictProtein:
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTL
IRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDI
ALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVD
YAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGC
NVTRKPTVFTRVSAYISWINNVIASN
For a list of mirror sites:

Results

Low-complexity regions
Marked by 'X'.
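The effect of marking low-complexity regions with 'X' can be imitated with a simple entropy filter. The sketch below is not the SEG algorithm; it only masks every window whose Shannon entropy falls at or below a cutoff. The window length, the cutoff, and the function names are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of `segment`."""
    counts = Counter(segment)
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace residues in every low-entropy window with 'X'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

# A serine-rich stretch is masked, a compositionally diverse one is not.
print(mask_low_complexity("SSSSSSSSSSSS"))
print(mask_low_complexity("ACDEFGHIKLMN"))
```

Masked regions are usually excluded from database searches because their biased composition produces spurious matches.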

Secondary structure prediction results

References
Documentation: COILS, TMPred, MPEx.
Articles:
- B. Rost: Evolution teaches neural networks. In Scientific applications of neural nets. Ed. J.W. Clark, T. Lindenau, M.L. Ristig, (1999).
- D.T. Jones: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J. Mol. Biol. 292, (1999).
- B. Rost: Prediction in 1D: Secondary Structure, Membrane Helices, and Accessibility. In Structural Bioinformatics (reference below).
Books:
- P.E. Bourne, H. Weissig: Structural Bioinformatics. Wiley-Liss.
- A. Tramontano: Protein Structure Prediction. Wiley-VCH.