Protein secondary structure prediction
Why 2nd structure prediction?
The problem:
Seq:  RPLQGLVLDTQLYGFPGAFDDWERFMRE
Pred: CCCCCHHHHHCCCCEEEECCHHHHHHCC

Presentation transcript:


Some historical landmarks
- 1st generation, 70s (~50-60% accuracy): single-residue statistics, explicit rules. Chou & Fasman 1974, GOR
- 2nd generation, 80s (~60-70% accuracy): single-residue statistics, nearest neighbors, neural networks (more use of local interactions). GOR3 1987, Levin et al. 1986, Qian & Sejnowski 1988, Holley & Karplus
- 3rd generation, 90s (~78% accuracy): neural networks with homologous sequence information. PHD 1993, PSIPRED 1999, SSPRO 2000
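The accuracy figures above are per-residue scores. A minimal sketch of such a score, assuming it is the fraction of residues whose predicted state (H, E or C) matches the observed one (often called Q3); the observed string below is hypothetical and only serves to exercise the function:

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("empty prediction")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Prediction string from the title slide.
predicted = "CCCCCHHHHHCCCCEEEECCHHHHHHCC"
# Hypothetical observed states, invented for illustration only.
observed = "CCCCHHHHHHCCCCEEEECCCHHHHHCC"

print(f"Q3 = {q3_accuracy(predicted, observed):.3f}")
```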

Chou-Fasman method
- Straight statistical approach
- Conformational propensity, e.g. helical propensity
- Categorize each amino acid, e.g. helix former, helix breaker, helix indifferent
- Find nucleation sites: short sequences with a high concentration of one category
- Extend the nucleation sites until a threshold is reached
- Handle overlaps
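The nucleate-and-extend idea can be sketched for helices only. This is a simplified illustration, not the full published procedure: the window sizes (6 residues with at least 4 formers, 4-residue extension window) and the 1.0 threshold are assumptions, the propensity values are approximate and should be checked against the published table, and overlaps are handled here simply by taking the union of all helical regions.

```python
# Approximate helix propensities P(a); illustrative values, verify before real use.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def mean_propensity(segment: str) -> float:
    return sum(HELIX_PROPENSITY[aa] for aa in segment) / len(segment)


def find_helices(seq: str, window: int = 6, min_formers: int = 4,
                 extend_window: int = 4, threshold: float = 1.0) -> str:
    """Return a string with 'H' for predicted helix and '-' elsewhere."""
    n = len(seq)
    helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation: enough helix formers inside the window.
        formers = sum(HELIX_PROPENSITY[aa] > threshold
                      for aa in seq[start:start + window])
        if formers < min_formers:
            continue
        left, right = start, start + window  # half-open interval [left, right)
        # Extend right while the trailing window that includes the new
        # residue keeps a mean propensity at or above the threshold.
        while right < n and mean_propensity(
                seq[right - extend_window + 1:right + 1]) >= threshold:
            right += 1
        # Extend left in the same way.
        while left > 0 and mean_propensity(
                seq[left - 1:left - 1 + extend_window]) >= threshold:
            left -= 1
        # Overlap handling (simplified): union of all helical regions.
        for i in range(left, right):
            helix[i] = True
    return "".join("H" if h else "-" for h in helix)


print(find_helices("GGGGEEAALLKKGGGG"))
```

A fuller version would run the same procedure with strand propensities and resolve overlapping helix and strand regions by comparing their average propensities.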

Chou-Fasman method: conformational parameters
(Table from Krane and Raymer's book)
What is the drawback of the method?

Introduction to neural networks
- A perceptron
- An analogy: an apple and orange sorter
- Threshold unit: classifies a vector of inputs
- Weights! How to get them?
- A self-learning system, using a training data set

             Shape (x1)  Texture (x2)  Color (x3)
Apple (+1)   Round       Hard          Red
Orange (-1)  Round       Soft          Yellow
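A minimal sketch of the apple and orange sorter as a threshold unit. The numeric encoding of the features (round = 1, hard = 1, soft = -1, red = 1, yellow = -1) and the use of the classic perceptron update rule to obtain the weights are assumptions made for illustration; the slide does not specify either.

```python
def predict(weights, bias, x):
    """Threshold unit: +1 (apple) if the weighted sum is non-negative, else -1 (orange)."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else -1


def train_perceptron(samples, epochs=10, rate=1.0):
    """Learn weights from (features, target) pairs with the perceptron rule."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            if predict(weights, bias, x) != target:
                # Move the decision boundary toward the misclassified example.
                weights = [w + rate * target * xi for w, xi in zip(weights, x)]
                bias += rate * target
    return weights, bias


# Assumed encoding: (shape, texture, color).
apple = (1, 1, 1)      # round, hard, red
orange = (1, -1, -1)   # round, soft, yellow
training_set = [(apple, 1), (orange, -1)]

weights, bias = train_perceptron(training_set)
print(weights, bias, predict(weights, bias, apple), predict(weights, bias, orange))
```

Shape is the same for both fruits, so it carries no information; the learned weights separate the classes through texture and color.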

Basics in neural network (1 unit only)
- Problem with the weights: they may not fit the examples exactly, so minimize an error function
- Modify the threshold unit a little bit: step function vs. continuous threshold function of the activation a

Basics in neural network (1 unit only)
- Squared error function E(w)
- Minimize the error E(w) using the gradient descent method
- Weight update in each step
- Learning rate
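A sketch of gradient descent for a single continuous unit. The slide's formulas did not survive, so the specific choices here are assumptions: a logistic sigmoid as the continuous threshold function, E(w) = 1/2 * sum of (y - t)^2 over the training set, targets coded as 0 and 1, and a constant first input acting as the bias.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def output(weights, x):
    """Continuous threshold unit: sigmoid of the weighted input sum."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def squared_error(weights, data):
    return 0.5 * sum((output(weights, x) - t) ** 2 for x, t in data)


def gradient_step(weights, data, rate):
    """One update w <- w - rate * dE/dw, using dy/da = y * (1 - y)."""
    grad = [0.0] * len(weights)
    for x, t in data:
        y = output(weights, x)
        delta = (y - t) * y * (1.0 - y)
        for i, xi in enumerate(x):
            grad[i] += delta * xi
    return [w - rate * g for w, g in zip(weights, grad)]


# Toy data: (bias input, shape, texture, color) -> target.
data = [((1, 1, 1, 1), 1.0), ((1, 1, -1, -1), 0.0)]
w = [0.0, 0.0, 0.0, 0.0]
initial_error = squared_error(w, data)
for _ in range(100):
    w = gradient_step(w, data, rate=0.5)
final_error = squared_error(w, data)
print(initial_error, final_error)
```

The learning rate controls the step size: too large and the error can oscillate, too small and convergence is slow.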

Basic neural network in secondary structure prediction
[Figure from Kneller et al., JMB 1990. Labels: inputs x1 to x4; weights w11 to w14; outputs y1 to y3; errors E1 to E3; formulas for the activation a1, the output y1 and the error E1.]
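The three formulas in the figure are missing from this transcript. One plausible reading, consistent with the figure labels (four inputs, weights w_ij, three outputs), is given below; the sigmoid σ, the target value t_i and the factor 1/2 are assumptions, not taken from the slide.

```latex
a_i = \sum_{j=1}^{4} w_{ij}\, x_j ,
\qquad
y_i = \sigma(a_i) = \frac{1}{1 + e^{-a_i}} ,
\qquad
E_i = \tfrac{1}{2}\,\bigl(y_i - t_i\bigr)^2
```

With i = 1 this gives a_1 = w_11 x_1 + w_12 x_2 + w_13 x_3 + w_14 x_4, which matches the weights shown in the figure.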

Multi-layer neural network
- Complete neural network: a set of continuous threshold units interconnected in a topology
- The output of some units is the input of other units
[Figure: input units (x: x1 to x4), hidden units (y), output units (z)]
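A minimal sketch of a forward pass through such a network, using the slide's naming (x for inputs, y for hidden units, z for outputs). The layer sizes, the weight values and the sigmoid activation are hypothetical, chosen only to make the example runnable.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weight_rows):
    """Each row of weights feeds one continuous threshold unit."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)))
            for row in weight_rows]


def forward(x, hidden_weights, output_weights):
    y = layer(x, hidden_weights)   # hidden units read the inputs
    z = layer(y, output_weights)   # output units read the hidden units
    return y, z


# Hypothetical weights: 4 inputs -> 2 hidden units -> 3 outputs.
hidden_weights = [[0.5, -0.2, 0.1, 0.0],
                  [-0.3, 0.8, 0.0, 0.4]]
output_weights = [[1.0, -1.0],
                  [-1.0, 1.0],
                  [0.2, 0.2]]

y, z = forward([1.0, 0.0, 1.0, 0.0], hidden_weights, output_weights)
print(y, z)
```

For secondary structure prediction the three outputs would naturally correspond to helix, strand and coil, with the largest output taken as the predicted state.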

PHD method (Rost B. & Sander C., JMB 1993)
- Uses a profile from a multiple sequence alignment
- Multiple layers
- Accuracy >70%
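The profile idea can be sketched as per-column amino acid frequencies computed from aligned sequences. This is a generic illustration, not PHD's actual input encoding; the toy alignment and the choice to ignore gap characters are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(aligned):
    """Return one {amino acid: frequency} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count residues in this column, skipping gaps.
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile


# Toy alignment, invented for illustration.
profile = alignment_profile(["ACD", "AC-", "GCD"])
print(profile[0]["A"], profile[0]["G"], profile[1]["C"])
```

Feeding each position as a vector of 20 frequencies instead of a single residue lets the network see which substitutions evolution tolerates at that position.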

Protein Folding Problem
- What is the protein folding problem?
- A protein folds into a unique 3D structure under physiological conditions
- 3D structure is a key to understanding functional mechanism
- Rational drug design
- 3D structure prediction

Protein Folding Problem
- Hard?
- Sampling conformational space
- Secondary structures (SS) offer simplicity
- Side chains filling the space
- May not be a random search
- Free energy (ΔG) = interaction energy - entropic energy
- Can it be done?

Protein Folding Problem
Experimental findings
- A protein does not start folding from the end
- SS seem to fold early
- Hydrophobic amino acids in the core
- Hydrophilic amino acids on the surface
Energy function approximation
- Physics based (bond length, bond angle, pair interactions)
- Statistics based

Scope of the problem
- The majority of newly solved protein structures share a certain level of similarity with a known structure
- Certain families of proteins have no or few structures solved
- Human genes: ~20k
- Structural genomics initiative

Protein structure prediction
- Comparative modeling: >30% sequence identity
- Fold recognition, formerly known as threading: twilight zone, <25% sequence identity
- Ab initio: new fold
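A minimal sketch of how these thresholds might guide the choice of method, given a pairwise alignment between the target and the best available template. The identity definition (matches divided by positions where both sequences have a residue) is one of several conventions, and the handling of the 25-30% range, which the slide leaves open, is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_method(identity: float) -> str:
    if identity > 30.0:
        return "comparative modeling"
    if identity < 25.0:
        # Ab initio applies when no known fold fits at all.
        return "fold recognition (threading), or ab initio for a new fold"
    return "borderline: try comparative modeling and fold recognition"


identity = percent_identity("ACDE", "ACDF")
print(identity, suggest_method(identity))
```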

CASP
- Experimentally solved structure
- Predicted structure
- Compare and rank
CASP, e.g.
- Skolnick (2003) Proteins: 53: p
- Ginalski (2003) Proteins: 53: p
- Zhang, Y. "Template-based modeling and free modeling by I-TASSER in CASP7 (pages 108-117)", Proteins, 69, S8, P (2007).
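One simple way to compare a predicted structure with the experimental one is the root-mean-square deviation (RMSD) over matching atoms. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order; CASP assessment relies on additional measures such as GDT_TS, so this is only a stand-in for the "compare and rank" step. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rank_models(models, experimental):
    """Sort (name, coordinates) pairs from best (lowest RMSD) to worst."""
    return sorted(models, key=lambda model: rmsd(model[1], experimental))


# Hypothetical C-alpha coordinates in angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_1 = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.5, 0.0)]

ranking = rank_models([("model_1", model_1), ("model_2", model_2)], experimental)
print([name for name, _ in ranking])
```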

Comparative Modeling
1. Search for structures
2. Select templates
3. Align target sequence with structures
4. Build model
5. Evaluate model
Sequence identity vs. structure overlap (Fig)

Comparative Modeling
- Search for structures:
  - pairwise sequence alignment against the database
  - multiple sequence alignment -> profile
  - fold assignment / threading: use structure information in the comparison
- Select template: sequence similarity, evolutionary relationship, environment, resolution
- Sequence alignment (target and template): standard methods with tuning

Ab Initio Prediction
- Challenges: search space, energy function
- Reduction in search space:
  - use a lattice
  - use simplified amino acids
  - use building blocks available in nature
- Energy function: physics based; statistics based (empirical)
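To illustrate how a lattice and simplified amino acids shrink the problem, here is a sketch of the two-letter HP lattice model, which is not named on the slide and is used here as an assumed example. Residues are either hydrophobic (H) or polar (P), the chain lives on a 2D square lattice, and the energy is minus the number of H-H contacts between residues that are lattice neighbors but not chain neighbors.

```python
def hp_energy(sequence: str, coords) -> int:
    """Energy of an HP chain placed on a 2D square lattice."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice position is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")
    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))
```

Even in this reduced model the number of self-avoiding conformations grows exponentially with chain length, which is why the search strategy matters as much as the energy function.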

Ab initio 3D structure prediction
An example: ROSETTA
- Simons KT, Kooperberg C, Huang E, Baker D; J Mol Biol. (1997) 268,
- Schonbrun J, Wedemeyer W, Baker D; Current Opinion in Structural Biology, (2002), 12:
ROSETTA
- narrows the search: uses available local structure
- statistics-based energy function
- one of the top few ab initio methods in CASP4

ROSETTA: segment matching
Observations:
- Analysis of 9-a.a. segments in the structure database
- Distribution of the conformations of 9-mers
Main idea of the method:
- Build a segment conformational library (fragment library for 3-mers and 9-mers)
- Put the pieces together better (energy function and search space)
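A toy sketch of the fragment library idea, not ROSETTA's actual implementation. It makes two loud simplifications: conformations are represented by secondary structure strings instead of backbone geometry, and fragments are retrieved by exact sequence match instead of a similarity score. The database entries are invented.

```python
from collections import defaultdict


def build_fragment_library(database, sizes=(3, 9)):
    """Map every k-mer sequence to the conformations observed for it."""
    library = {k: defaultdict(list) for k in sizes}
    for sequence, conformation in database:
        if len(sequence) != len(conformation):
            raise ValueError("sequence and conformation lengths differ")
        for k in sizes:
            for i in range(len(sequence) - k + 1):
                library[k][sequence[i:i + k]].append(conformation[i:i + k])
    return library


def candidate_fragments(library, target: str, k: int):
    """For each window of the target, list the conformations seen for that k-mer."""
    return [(i, library[k].get(target[i:i + k], []))
            for i in range(len(target) - k + 1)]


# Hypothetical structure database: (sequence, per-residue conformation).
database = [("ACDEFGHIKL", "CCHHHHHHCC"),
            ("ACDKLM", "EEECCC")]
library = build_fragment_library(database)
print(candidate_fragments(library, "ACDE", 3))
```

The same 3-mer can appear with several conformations; the assembly step then searches over combinations of candidates, guided by the energy function.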

Model Building
- Assembly of rigid bodies: dissect the structure into core, loops and side chains
- Satisfy spatial constraints (Fig.): derive spatial constraints, then find a structure that optimizes all the constraints
- Spatial constraints are generated from: the input alignment; general spatial preferences found in known structures; a molecular force field