Proteins Secondary Structure Predictions

Presentation transcript:

Proteins Secondary Structure Predictions. Structural Bioinformatics.

Structure Prediction: Motivation. Better understand protein function. Broaden homology: detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence). Explain disease: explain the effect of mutations; design drugs.

Myoglobin: the first high-resolution protein structure. Solved in 1958 by Max Perutz and John Kendrew of Cambridge University, who won the 1962 Nobel Prize in Chemistry. "Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates."

MERFGYTRAANCEAP… Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible). However, we can predict the secondary structure with relatively high precision.

What do we mean by secondary structure? Secondary structure elements are the building blocks of the protein structure.

What do we mean by secondary structure? Secondary structure is usually divided into three categories: alpha helix, beta strand (sheet), and anything else (turn/loop).

Alpha Helix: Pauling (1951). A consecutive stretch of 5-40 amino acids (average 10) in a right-handed spiral conformation, with 3.6 amino acids per turn, stabilized by H-bonds. (Figure label: one turn = 3.6 residues, 5.4 Å.)

Beta Strand: Pauling and Corey (1951). Different polypeptide chains run alongside each other and are linked together by hydrogen bonds. Each section is called a β-strand, and consists of 5-10 amino acids.

Beta Sheet: The strands lie adjacent to each other, forming a beta-sheet. (Figure labels: antiparallel, 3.47 Å and 4.6 Å; parallel, 3.25 Å and 4.6 Å.)

Loops: Connect the secondary structure elements. Have various lengths and shapes. Located at the surface of the folded protein, and may therefore have an important role in biological recognition processes.

Three dimensional Tertiary Structure Describes the packing of alpha-helices, beta-sheets and random coils with respect to each other on the level of one whole polypeptide chain

(Figure: secondary and tertiary structure, shown for RBP and globin.)

How do the (secondary and tertiary) structures relate to the primary protein sequence?

Sequence and structure. Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen). Protein structure is more conserved than protein sequence and more closely related to function.

How (can) different amino acid sequences determine similar protein structure? Lesk and Chothia 1980

The Globin Family

Different sequences can result in similar structures (PDB entries 1ecd and 2hhd).

By comparing sequences and structures, we can learn about the important features which determine structure and function.

The Globin Family

Why is Proline 36 conserved in the whole globin family?

Where are the gaps? The gaps in the pairwise alignment map to the loop regions.

How are remote homologs related in terms of their structure? (Figure labels: retinol-binding protein, odorant-binding, apolipoprotein D, RBD, β-lactoglobulin.)

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159

(Figure: the retinol-binding protein and β-lactoglobulin.)

Structure Prediction: Motivation. Hundreds of thousands of gene sequences have been translated to proteins (GenBank, SW, PIR), but only about 50,000 protein structures have been solved. Experimental methods are time-consuming and not always possible. Goal: predict protein structure based on sequence information.

Prediction Approaches. Two-stage: 1. Primary (sequence) to secondary structure. 2. Secondary to tertiary structure. One-stage: primary to tertiary structure.

According to the most simplified model: In a first step, the secondary structure is predicted based on the sequence. The secondary structure elements are then arranged to produce the tertiary structure, i.e. the structure of a protein chain. For molecules which are composed of different subunits, the protein chains are arranged to form the quaternary structure.

Secondary Structure Prediction: Given a primary sequence ADSGHYRFASGFTYKKMNCTEAA, what secondary structure will it adopt?

Secondary Structure Prediction Methods: Chou-Fasman / GOR method (based on amino acid frequencies); machine learning methods (PHDsec and PSIpred); HMM (Hidden Markov Model).

Chou and Fasman (1974). Success rate of 50%.
The table gives the propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or beta sheet, so it acts as a breaker).

Name            P(a)   P(b)   P(turn)
Alanine          142     83      66
Arginine          98     93      95
Aspartic Acid    101     54     146
Asparagine        67     89     156
Cysteine          70    119     119
Glutamic Acid    151     37      74
Glutamine        111    110      98
Glycine           57     75     156
Histidine        100     87      95
Isoleucine       108    160      47
Leucine          121    130      59
Lysine           114     74     101
Methionine       145    105      60
Phenylalanine    113    138      60
Proline           57     55     152
Serine            77     75     143
Threonine         83    119      96
Tryptophan       108    137      96
Tyrosine          69    147     114
Valine           106    170      50

Secondary Structure Method Improvements: 'sliding window' approach. Most alpha helices are ~12 residues long; most beta strands are ~6 residues long. Look at all windows of size 6/12 and calculate a score for each window. If the score exceeds a threshold, predict that the window is an alpha helix/beta sheet. Example sequence: TGTAGPOLKCHIQWMLPLKK
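The sliding-window idea can be sketched in a few lines of Python. This is only an illustration, not the original Chou-Fasman procedure: the helix propensities are the P(a) values from the Chou and Fasman table in this transcript, while the function names, the use of the window mean as the score, the threshold of 100, and the neutral value for unknown letters are all assumptions made for the example.

```python
# Chou-Fasman helix propensities P(a), one-letter codes,
# copied from the table in this transcript.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70,
    "E": 151, "Q": 111, "G": 57, "H": 100, "I": 108,
    "L": 121, "K": 114, "M": 145, "F": 113, "P": 57,
    "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def window_scores(seq, propensity, window):
    """Return the mean propensity of every full window of the given size."""
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        # Letters missing from the table get a neutral 100 (an assumption).
        total = sum(propensity.get(aa, 100) for aa in segment)
        scores.append(total / window)
    return scores


def predict(seq, propensity, window, threshold, label):
    """Label every residue covered by a window scoring above the threshold."""
    calls = ["-"] * len(seq)
    for start, score in enumerate(window_scores(seq, propensity, window)):
        if score > threshold:
            for i in range(start, start + window):
                calls[i] = label
    return "".join(calls)


if __name__ == "__main__":
    seq = "TGTAGPOLKCHIQWMLPLKK"  # example sequence from the slide
    print(seq)
    # Window of 12 for helices; the threshold of 100 is an assumed cutoff.
    print(predict(seq, P_ALPHA, window=12, threshold=100, label="H"))
```

The same two functions could be reused for beta strands by passing a P(b) table and a window of 6.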

Improvements since the 1980s: adding information from conservation in the MSA; smarter algorithms (e.g. machine learning, HMM). Success rate: 75%-80%.

Machine learning approach for predicting secondary structure (PHD, PSIpred). Step 1: Generate a multiple sequence alignment by searching the query against SwissProt. (Figure: query aligned with several subject sequences.)

Step 2: Additional sequences are added using a profile. We end up with an MSA which represents the protein family. (Figure: query, seed MSA, aligned subject sequences.)

Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure. (Figure: query, seed MSA, known structures, machine learning approach.)
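As a rough illustration of what a "sequence profile" of a family can look like, the sketch below computes per-column residue frequencies from an alignment. It is not how PHD or PSIpred build their profiles; the function name and the toy alignment are hypothetical, and real tools add sequence weighting and pseudocounts that are left out here.

```python
from collections import Counter


def column_profile(msa):
    """Per-column residue frequencies of an alignment, ignoring gaps ('-')."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty frequency table.
        profile.append(
            {aa: n / total for aa, n in counts.items()} if total else {}
        )
    return profile


if __name__ == "__main__":
    toy_msa = ["ADSG", "ADTG", "AE-G"]  # hypothetical toy alignment
    for position, freqs in enumerate(column_profile(toy_msa), start=1):
        print(position, freqs)
```

A learning method would then take a window of such columns around each residue as its input features.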

HMM approach for predicting secondary structure (SAM). An HMM enables us to calculate the probability of assigning a sequence to a secondary structure. Sequence: TGTAGPOLKCHIQWML; structure: HHHHHHHLLLLBBBBB; p = ?

Annotations on the probability table: beginning with an α-helix; α-helix followed by α-helix; the probability of observing alanine as part of a β-sheet; the probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15. The table is built from a large database of known secondary structures.

The above table enables us to calculate the probability of assigning a secondary structure to a protein. Example: sequence TGQ, structure HHH: p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021
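The TGQ / HHH example can be reproduced with a short Python sketch. The six numbers are taken from the example; reading them as start probability, emission, transition, emission, transition, emission (in that order) is an assumption based on the usual HMM factorisation, and the function and dictionary names are made up for illustration.

```python
# Values from the TGQ / HHH example; their roles are assumed as commented.
START = {"H": 0.45}                                  # P(path begins in helix)
TRANS = {("H", "H"): 0.8}                            # P(helix follows helix)
EMIT = {"H": {"T": 0.041, "G": 0.028, "Q": 0.0635}}  # P(residue | helix)


def path_probability(seq, states, start, trans, emit):
    """Joint probability of a sequence and one given state path."""
    if len(seq) != len(states) or not seq:
        raise ValueError("sequence and state path must be non-empty and equal length")
    # First position: start probability times emission.
    p = start[states[0]] * emit[states[0]][seq[0]]
    # Each later position: transition from the previous state, then emission.
    for i in range(1, len(seq)):
        p *= trans[(states[i - 1], states[i])]
        p *= emit[states[i]][seq[i]]
    return p


if __name__ == "__main__":
    print(path_probability("TGQ", "HHH", START, TRANS, EMIT))
```

Scoring every possible state path this way is exponential in sequence length, which is why HMM tools use dynamic programming (for example the Viterbi algorithm) to find the most probable path instead.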

Secondary structure prediction
AGADIR - An algorithm to predict the helical content of peptides
APSSP - Advanced Protein Secondary Structure Prediction Server
GOR - Garnier et al, 1996
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
JUFO - Protein secondary structure prediction from sequence (neural network)
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel University
SOPMA - Geourjon and Deléage, 1995
SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
DLP - Domain linker prediction at RIKEN