Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]

Presentation transcript:

Structure Prediction

Tertiary protein structure: protein folding
Three main approaches:
[1] Experimental determination (X-ray crystallography, NMR)
[2] Comparative modeling (based on homology)
[3] Ab initio (de novo) prediction (Dr. Ingo Ruczinski at JHSPH)

Experimental approaches to protein structure
[1] X-ray crystallography
-- Used to determine 80% of structures
-- Requires high protein concentration
-- Requires crystals
-- Able to trace amino acid side chains
-- Earliest structure solved was myoglobin
[2] NMR
-- Magnetic field applied to proteins in solution
-- Largest structures: 350 amino acids (40 kD)
-- Does not require crystallization

Steps in obtaining a protein structure
-- Target selection
-- Obtain, characterize protein
-- Determine, refine, model the structure
-- Deposit in database

X-ray crystallography: sperm whale myoglobin

PDB, April 08, 2008: 50,000 proteins, with 25 new experimentally determined structures each day
[Chart: new PDB structures, split into new folds and old folds]

Example 1wey

Ab initio protein prediction
Starts with an attempt to derive secondary structure from the amino acid sequence
– Predicts the likelihood that a subsequence will fold into an alpha-helix, beta-sheet, or coil, using physicochemical parameters or HMMs and ANNs
– Accurately predicts about 3/4 of all local structures

Structure Characteristics

Beta Sheets

Ab Initio Prediction

Secondary structure prediction
-- Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α-helices, β-sheets, and turns. Proline: occurs at turns, but not in α-helices.
-- GOR (Garnier, Osguthorpe, Robson): related algorithm
-- Modern algorithms: use multiple sequence alignments and achieve a higher success rate (about 70-75%)
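The frequency idea behind these methods can be sketched in a few lines. This is a simplified illustration, not the published Chou-Fasman procedure (it has no nucleation or extension rules). The training pairs are hypothetical toy data rather than PDB-derived values, and the function names and the window size of 3 are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy training data: (sequence, per-residue state) pairs.
# H = helix, E = strand, C = coil/turn. These are NOT real PDB statistics.
TRAINING = [
    ("AEELLKKA", "HHHHHHHH"),
    ("VIVTVYGP", "EEEEEECC"),
    ("GPNGSAEL", "CCCCCHHH"),
]

def propensities(training):
    """Return table[state][aa]: frequency of aa within the state divided by
    its overall frequency. Values above 1 mean the residue favors the state."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, states in training:
        for aa, st in zip(seq, states):
            aa_counts[aa] += 1
            state_counts[st] += 1
            pair_counts[(st, aa)] += 1
    total = sum(aa_counts.values())
    table = {}
    for st in state_counts:
        table[st] = {}
        for aa in aa_counts:
            freq_in_state = pair_counts[(st, aa)] / state_counts[st]
            freq_overall = aa_counts[aa] / total
            table[st][aa] = freq_in_state / freq_overall
    return table

def predict(seq, table, window=3):
    """Assign each residue the state with the highest mean propensity over a
    centered window. Residues unseen in training get a neutral value of 1.0."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        best = max(
            table,
            key=lambda st: sum(table[st].get(aa, 1.0) for aa in segment) / len(segment),
        )
        out.append(best)
    return "".join(out)

table = propensities(TRAINING)
print(table["H"]["P"])        # proline never appears in a helix in the toy data
print(predict("PGPG", table))
```

In the toy data proline only ever occurs in coil positions, so its helix propensity comes out as zero, which mirrors the slide's remark about proline.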

Table

Frequency Domain

Neural Networks

Training the Network
Use PDB entries with validated secondary structures
Measures of accuracy
– Q3 score: the percentage of residues in the protein whose state is correctly predicted (training against it pushes the network toward predicting the most abundant structure)
– You get 50% if you just predict everything to be a coil
– Most methods get around 60% with this metric
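As a minimal sketch of the Q3 measure, the function below counts the fraction of residues whose predicted state (H, E or C) matches the observed one. The function name and the example strings are made up for illustration; the observed string is chosen to be half coil so that the all-coil baseline mentioned on the slide is visible.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty structure strings")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical observed structure in which half of the residues are coil.
observed = "CCHHHHCCEECC"
all_coil = "C" * len(observed)

print(q3(all_coil, observed))        # baseline: predict coil everywhere
print(q3("CCHHHHCCEECC", observed))  # perfect prediction
```

The all-coil baseline shows why Q3 alone can be misleading: a predictor that never finds a helix or strand still scores as well as the coil fraction of the data.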

Correlation Coefficient
How well do the predictions for coils, helices and beta-sheets correlate with the real structures?
This ignores what we really want to get to
– If the real structure has 3 coils, do we predict 3 coils?
Segment overlap score (Sov) gives credit for how protein-like the predicted structure is, but it is correlated with Q3
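The slide does not say which correlation coefficient is meant. A common choice for per-state evaluation is the Matthews correlation coefficient, so the sketch below assumes that one. The function name is illustrative.

```python
import math

def state_correlation(predicted, observed, state):
    """Matthews correlation coefficient for one state (e.g. 'H'), treating
    each residue as a binary in-state / not-in-state classification."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(state_correlation("HHCC", "HHCC", "H"))  # perfect agreement
print(state_correlation("CCCC", "HHCC", "H"))  # no helix predicted at all
```

Unlike Q3, this value stays at zero for a predictor that labels everything as coil, because such a predictor carries no information about where helices occur.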

Artificial Neural Network Predicts Structure at this point

Danger
-- You may train the network on your training set, but it may not generalize to other data
-- Perhaps we should train several ANNs and then let them vote on the structure
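The voting idea can be sketched as a per-residue majority vote over the outputs of several predictors. The networks themselves are not modeled here; the input strings stand in for their predictions, and the tie-breaking rule (the earliest predictor wins) is an assumption made for the example.

```python
from collections import Counter

def jury_vote(predictions):
    """Combine equal-length state strings by majority vote at each position.
    Ties go to the state proposed by the earliest predictor in the list."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winner = next(s for s in column if counts[s] == top)
        consensus.append(winner)
    return "".join(consensus)

# Hypothetical outputs from three separately trained networks.
print(jury_vote(["HHHCC", "HHCCC", "HECCC"]))
```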

PHD: Profile network from HeiDelberg
-- A family of related sequences (a multiple alignment) is used as input instead of just the new sequence
-- On the first level, a window of length 13 around the residue is used
-- The window slides down the sequence, making a prediction for each residue
-- The input includes the frequency of amino acids occurring in each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment)
-- The second level takes these predictions from neural networks that are centered on neighboring residues
-- The third level does a jury selection
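The first-level input described here can be sketched as follows: turn the alignment into per-column amino acid frequencies, then cut a 13-column window around each residue. This shows only how the input is built, not the network, and it leaves out any further input features the real method may use. The five aligned sequences are invented, and zero-padding at the sequence ends is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment):
    """One 20-value frequency vector per alignment column (gaps are ignored)."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c in AMINO_ACIDS]
        n = len(residues)
        profile.append([residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS])
    return profile

def sliding_windows(profile, width=13):
    """One flattened input vector per residue: the frequency vectors of the
    `width` columns centered on it, zero-padded past the sequence ends."""
    half = width // 2
    pad = [0.0] * len(AMINO_ACIDS)
    windows = []
    for i in range(len(profile)):
        rows = [
            profile[j] if 0 <= j < len(profile) else pad
            for j in range(i - half, i + half + 1)
        ]
        windows.append([v for row in rows for v in row])
    return windows

# Hypothetical alignment of 5 sequences, as in the slide's example.
alignment = [
    "MKTAYIAKQRQISFV",
    "MKTAYLAKQRQLSFV",
    "MRTAYIAKQKQISFV",
    "MKSAYIAKQRQISFV",
    "MKTAFIAKQRQISFI",
]
profile = column_frequencies(alignment)
windows = sliding_windows(profile)
print(len(windows), len(windows[0]))  # one vector per residue, 13 * 20 values each
```

Each window would then be fed to the first-level network to produce a prediction for the central residue.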

PHD Predicts 4 Predicts 6 Predicts 5

Fold recognition (structural profiles) Attempts to find the best fit of a raw polypeptide sequence onto a library of known protein folds A prediction of the secondary structure of the unknown is made and compared with the secondary structure of each member of the library of folds
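A rough sketch of the comparison step: given a predicted secondary-structure string for the unknown, rank library folds by how similar their secondary-structure strings are. The fold names and strings are hypothetical. Using `difflib.SequenceMatcher` as the similarity measure is an assumption chosen for brevity (it tolerates length differences) and is not the scoring used by any particular fold-recognition server.

```python
from difflib import SequenceMatcher

def rank_folds(predicted_ss, library):
    """Return (score, fold_name) pairs sorted from best to worst match.
    Score is SequenceMatcher's similarity ratio, between 0 and 1."""
    scored = [
        (SequenceMatcher(None, predicted_ss, fold_ss).ratio(), name)
        for name, fold_ss in library.items()
    ]
    return sorted(scored, reverse=True)

# Hypothetical fold library: name -> secondary-structure string.
library = {
    "all_alpha": "CHHHHHHHHCCHHHHHHHC",
    "all_beta": "CEEEEECCEEEEECCEEEC",
    "alpha_beta": "CEEEECCHHHHHHCCEEEC",
}

query_ss = "CHHHHHHHHCCHHHHHHHC"  # predicted structure of the unknown
for score, name in rank_folds(query_ss, library):
    print(f"{name}: {score:.2f}")
```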

Threading Takes the fold recognition process a step further: – Empirical-energy functions for residue pair interactions are used to mount the unknown onto the putative backbone in the best possible manner
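The scoring step in threading can be illustrated with a toy pair potential. Everything specific below is an assumption made for the sketch: the energy function (a flat reward for hydrophobic-hydrophobic contacts), the fold definitions as lists of contacting positions, and gapless placement of the sequence on the backbone. Real empirical energy functions are derived from known structures and are far more detailed.

```python
HYDROPHOBIC = set("AVILMFWC")

def pair_energy(a, b):
    """Hypothetical toy potential: hydrophobic contacts are favorable."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def threading_energy(sequence, contacts, offset=0):
    """Sum pair energies over the fold's contacts with the sequence placed
    on the backbone starting at `offset` (no gaps allowed)."""
    return sum(
        pair_energy(sequence[offset + i], sequence[offset + j])
        for i, j in contacts
    )

def best_fold(sequence, folds):
    """Return (energy, fold_name) for the lowest-energy gapless threading,
    or None if no fold fits within the sequence."""
    scored = []
    for name, (length, contacts) in folds.items():
        if length > len(sequence):
            continue
        energy = min(
            threading_energy(sequence, contacts, off)
            for off in range(len(sequence) - length + 1)
        )
        scored.append((energy, name))
    return min(scored) if scored else None

# Hypothetical folds: name -> (backbone length, contacting position pairs).
folds = {
    "fold_a": (5, [(0, 2), (2, 4), (0, 4)]),
    "fold_b": (5, [(1, 3)]),
}
print(best_fold("LKVKL", folds))
```

Here fold_a places the three hydrophobic residues in mutual contact, so it receives the lower energy and is selected.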

Fold recognition by threading
[Figure: the query sequence is given a compatibility score against each fold in the library (Fold 1, Fold 2, Fold 3, ..., Fold N)]

CASP

SCOP SCOP: Structural Classification of Proteins.

CATH
CATH: Protein Structure Classification
Class (C), Architecture (A), Topology (T) and Homologous superfamily (H)