Visualisation/prediction of 3D structures. Recognition ability is the basis of biological function. 3D structure is key for recognition.

Presentation transcript:

Visualisation/prediction 3D structures

Recognition ability is the basis of biological function. 3D structure is key for recognition.

Objectives
- Visualize and understand 3D structures and their interactions
- Derive structure-function relationships
- Predict 3D structure

Total entries

Protein folds

Structure prediction

Aim
- Structure prediction tries to build 3D models of proteins that are useful for understanding structure-function relationships.

The protein folding problem
- The information for the 3D structure is encoded in the protein sequence
- Proteins fold into their native structure within seconds
- Native structures are both thermodynamically stable and kinetically accessible

Ab-initio prediction
- Prediction from sequence (AVVTW...GTTWVR) using first principles

Ab-initio prediction
- “In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “ab-initio prediction of structure”
- 1 μs folding simulation of a model protein (Duan-Kollman: Science, 277, 1793, 1998)
- Reversible folding simulations of peptides, nanosecond range (Daura et al., Angew. Chem., 38, 236, 1999)
- Distributed folding simulations of villin (36 residues) (Zagrovic et al., JMB, 323, 927, 2002)

... the bad news ...
- It is not possible to extend simulations to the “seconds” range
- Simulations are limited to small systems and to fast folding/unfolding events in known structures
  - steered dynamics
  - biased molecular dynamics
- Simplified systems
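The gap between simulated and real folding times can be made concrete with a little arithmetic. The sketch below counts integration steps for a given simulated duration; the 2 fs timestep is an assumption (a common choice in all-atom molecular dynamics), not a figure taken from the slides.

```python
# Assumption: a 2 fs integration timestep, a common choice in all-atom MD.
TIMESTEP_FS = 2

# Femtoseconds per time unit.
FS_PER_UNIT = {"ns": 10**6, "us": 10**9, "ms": 10**12, "s": 10**15}


def steps_needed(duration, unit):
    """Number of integration steps needed to simulate `duration` of `unit`."""
    return duration * FS_PER_UNIT[unit] // TIMESTEP_FS


for unit in ("ns", "us", "s"):
    print(f"1 {unit}: {steps_needed(1, unit):.1e} steps")
```

Under this assumption, going from a microsecond to a second multiplies the number of steps by a factor of one million, which is why the “seconds” range is out of reach for brute-force simulation.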

Results from ab-initio
- Example: an E. coli protein predicted at 7.6 Å (CASP3, H. Scheraga)
- Average error 5 Å - 10 Å
- Function cannot be predicted
- Long simulations
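Errors such as “7.6 Å” are usually reported as a root-mean-square deviation (RMSD) between predicted and experimental atom positions; that reading of the slide is an assumption. A minimal sketch, assuming both structures are already superimposed and list the same atoms in the same order:

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates, for illustration only.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(rmsd(model, native))
```

In practice the structures are first optimally superimposed (for example with the Kabsch algorithm), a step this sketch leaves out.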

Comparative modelling
- The most efficient way to predict protein structure is to compare with known 3D structures

Basic concept
- In a given protein, 3D structure is a more conserved characteristic than sequence
  - Some amino acids are “equivalent” to each other
  - Evolutionary pressure allows only amino acid substitutions that keep the 3D structure largely unaltered
- Two proteins of “similar” sequences must have the “same” 3D structure

Possible scenarios
1. Homology can be recognized using sequence comparison tools or protein family databases (BLAST, Clustal, Pfam, ...). Structural and functional predictions are feasible.
2. Homology exists but cannot be recognized easily (PSI-BLAST, threading). Low-resolution fold predictions are possible. No functional information.
3. No homology. 1D predictions. Sequence motifs. Limited functional prediction. Ab-initio prediction.

fold prediction

3D struc. prediction

1D prediction
- Prediction is based on averaging amino acid properties
- AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS (average over a window)

1D prediction. Properties
- Secondary structure propensities
- Hydrophobicity
- Accessibility
- ...
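Window averaging can be sketched in a few lines. The example below smooths a per-residue hydrophobicity value over a sliding window; the Kyte-Doolittle scale and the window length of 7 are assumptions chosen for illustration, since the slides do not name a scale or a window size.

```python
# Kyte-Doolittle hydropathy scale (assumed here; the slides name no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale, window=7):
    """Average a per-residue property over a centred sliding window.

    Returns one value per position that has a full window around it,
    so the result has len(sequence) - window + 1 entries.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[aa] for aa in sequence]
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


sequence = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"  # sequence shown on the slide
profile = window_average(sequence, KYTE_DOOLITTLE)
for position, value in enumerate(profile, start=4):
    print(position, sequence[position - 1], round(value, 2))
```

Any other per-residue table (secondary structure propensity, accessibility) can be passed as `scale` in the same way.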

Propensities
- Chou-Fasman, Biochemistry 17
- α-helix, β-strand, turn
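A propensity in the Chou-Fasman sense compares how often a residue type occurs in a given conformation with how often that conformation occurs overall. The sketch below computes such ratios from annotated sequences; the tiny data set is hypothetical and far too small to give meaningful values.

```python
from collections import Counter


def propensities(sequence, states):
    """Propensity of each residue type for each conformational state.

    P(aa, state) = (fraction of `aa` residues found in `state`)
                   / (fraction of all residues found in `state`)
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and state string must have equal length")
    total = len(sequence)
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))
    return {
        (aa, state): (pair_counts[(aa, state)] / aa_counts[aa])
        / (state_counts[state] / total)
        for aa in aa_counts
        for state in state_counts
    }


# Hypothetical training data: H = helix, E = strand, C = coil/turn.
seq = "AAELLKVVTVGG"
obs = "HHHHHHEEEECC"
table = propensities(seq, obs)
print(round(table[("A", "H")], 2), round(table[("V", "E")], 2))
```

Values above 1 indicate that a residue favours the state, values below 1 that it avoids it.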

Some programs
- BCM PSSP - Baylor College of Medicine
- Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
- GOR I (Garnier et al, 1978) [At PBIL or at SBDS]
- GOR II (Gibrat et al, 1987)
- GOR IV (Garnier et al, 1996)
- HNN - Hierarchical Neural Network method (Guermeur, 1997)
- Jpred - A consensus method for protein secondary structure prediction at University of Dundee
- nnPredict - University of California at San Francisco (UCSF)
- PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
- PSA - BioMolecular Engineering Research Center (BMERC) / Boston
- PSIpred - Various protein structure prediction methods at Brunel University
- SOPM (Geourjon and Deléage, 1994)
- SOPMA (Geourjon and Deléage, 1995)
- AGADIR - An algorithm to predict the helical content of peptides

1D prediction
- Original methods: 1 sequence and uniform parameters (25-30%)
- Early improvements: parameters specific to protein classes
- Present methods use sequence profiles obtained from multiple alignments and neural networks to extract parameters (70-75%, 98% for transmembrane helices)

PredictProtein (PHD)
1. Building of a multiple alignment using Swiss-Prot, PROSITE, and domain databases
2. 1D prediction from the generated profile using neural networks
3. Fold recognition
4. Confidence evaluation

PredictProtein: available information
- Signal peptides: SignalP
- O-glycosylation: NetOglyc
- Chloroplast import signal: ChloroP
- Consensus secondary structure: JPRED
- Transmembrane: TMHMM, TOPPRED
- SwissModel

Methods for remote homology
- Homology can be recognized using PSI-BLAST
- Fold prediction is possible using threading methods
- Accurate 3D prediction is not possible: no structure-function relationship can be inferred from the models

Threading
- The unknown sequence is “folded” onto a number of known structures
- Scoring functions evaluate the fit between sequence and structure according to statistical functions and sequence comparison

ATTWV....PRKSCT > SELECTED HIT

Query:
Sequence             ATTWV....PRKSCT
Pred. sec. struc.    HHHHH....CCBBBB
Pred. accessibility  eeebb....eeebeb
Known structures:
Sequence             GGTV....ATTW     ATTVL....FFRK
Obs. SS              BBBB....CCHH     HHHB.....CBCB
Obs. Acc.            EEBE.....BBEB    BBEBB....EBBE

Technical aspects
- Alignment: dynamic programming (Needleman & Wunsch, 1970)
- Scoring function: $w_{seq} P_{seq} + w_{str} (P_{SS} + P_{AC})$
- $P_{seq}$: Dayhoff matrix; $P_{SS}$ and $P_{AC}$: probability model on predicted SS and AC

Threading accuracy
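The two ingredients above, global alignment by dynamic programming and a score that mixes sequence and structural terms, can be sketched together. Everything numeric here is an assumption made for illustration: a simple match/mismatch value stands in for the Dayhoff matrix, agreement indicators stand in for the probability models, and the weights and gap penalty are arbitrary.

```python
def annotate(sequence, sec_struct, accessibility):
    """Bundle residue, secondary structure and accessibility per position."""
    return list(zip(sequence, sec_struct, accessibility))


def position_score(query_pos, template_pos, w_seq=1.0, w_str=0.5):
    """w_seq * P_seq + w_str * (P_SS + P_AC) with toy stand-in terms."""
    p_seq = 1.0 if query_pos[0] == template_pos[0] else -1.0  # not Dayhoff
    p_ss = 1.0 if query_pos[1] == template_pos[1] else 0.0
    p_ac = 1.0 if query_pos[2].lower() == template_pos[2].lower() else 0.0
    return w_seq * p_seq + w_str * (p_ss + p_ac)


def needleman_wunsch(query, template, score=position_score, gap=-2.0):
    """Global alignment; returns (best score, list of index pairs).

    A pair (i, j) aligns query[i] with template[j]; None marks a gap.
    """
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(
            query[i - 1], template[j - 1]
        ):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Hypothetical fragments in the style of the slide.
query = annotate("ATTWV", "HHHHH", "eeebb")
template = annotate("ATTVL", "HHHBC", "BBEBB")
print(needleman_wunsch(query, template))
```

In a threading run the same alignment would be repeated against every structure in the library and the hits ranked by score.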

3D-PSSM steps
- Building of 1D/superfamily profile
- Building of 3D/superfamily profile
- Determine/predict secondary structure and accessibility
- Best score from
  1. Structure vs. query PSSM
  2. Query vs. 1D-PSSM structures
  3. Query vs. 3D-PSSM structures

Comparative modelling
- Good for sequence identity > 30%
- Accuracy is very high for sequence identity > 60%

Reminder
- The model must be USEFUL
- Only the “interesting” regions of the protein need to be modelled

Expected accuracy
- Strongly dependent on the quality of the sequence alignment
- Strongly dependent on the identity with “template” structures. Very good structures if identity > 60-70%.
- Quality of the model is better in the backbone than in the side chains
- Quality of the model is better in conserved regions
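Because the expected accuracy depends so strongly on identity with the template, it is worth computing it explicitly from the alignment. The sketch below uses the thresholds quoted on the slides (30% and 60%); the choice of denominator (aligned columns without gaps) is an assumption, and other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def expected_quality(identity):
    """Map identity to the rough expectations stated on the slides."""
    if identity > 60:
        return "very good model expected"
    if identity > 30:
        return "comparative modelling applicable"
    return "remote homology: threading or fold recognition"


# Hypothetical target/template alignment.
target = "ATTWVPRK-SCT"
template = "ATTVLPRKGSCT"
identity = percent_identity(target, template)
print(round(identity, 1), expected_quality(identity))
```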

Steps
1. Alignment of template structures
2. Alignment of unknown sequence against template alignment
3. Build structure of conserved regions (SCR)
4. Build unconserved regions (usually “loops”)

Optimization
1. Optimize side chain conformation
   - Energy minimization restricted to standard conformers and VdW energy
2. Optimize everything
   - Global energy minimization with restraints
   - Molecular dynamics

Quality test
- Energy does not discriminate between a correct and a wrong model
- The structure must be “chemically correct” to use it in quantitative predictions
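One very simple “chemical correctness” check is the distance between consecutive Cα atoms, which is close to 3.8 Å for the usual trans peptide bond. The sketch below flags pairs that deviate from that value; the tolerance of 0.3 Å is an arbitrary assumption, and real validation tools check far more (bond angles, torsions, clashes).

```python
import math


def ca_distance_outliers(ca_coords, ideal=3.8, tolerance=0.3):
    """Return (i, i + 1, distance) for consecutive C-alpha pairs whose
    distance differs from `ideal` by more than `tolerance` (in Å)."""
    outliers = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(distance - ideal) > tolerance:
            outliers.append((i, i + 1, round(distance, 2)))
    return outliers


# Hypothetical C-alpha trace with one broken link between residues 1 and 2.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 6.0, 0.0)]
print(ca_distance_outliers(trace))
```

Cis peptide bonds (mostly before proline) give shorter Cα-Cα distances and would be flagged by this check, so flagged pairs need inspection rather than automatic rejection.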

Prediction software
- SwissModel (automatic)
- SwissModel Repository
- 3D-JIGSAW (M. Sternberg)
- Modeller (A. Sali)
- MODBASE (A. Sali)
- cgi/index.cgi

Results: spdbv

Final test
- The model must explain experimental data (i.e. differences between the unknown sequence and the templates) and be useful for understanding function.