

Structure-Sequence alignment: “Structure is better preserved than sequence.” Query sequence (“Me!”): MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE, to be matched against non-redundant templates of structures.

How can we match a sequence and a structure? Query: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Sequence: similar sequences take this structure, e.g. MVNGLILNGKTK AEKVFQYANDNGVDGEWTYTE (but remember: sequence is less preserved than structure…).
Solvation: which amino acids are buried? Trp (W): probably not here!
Pair interaction: how well do the amino acids get along? (Does positive hate positive? Maybe not…?)
More: secondary structure prediction, secondary structure constraints (β-strands forming β-sheets…), etc.

“An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences” David T. Jones (1999) “What a good presentation!” B. Raveh (2003)

GenTHREADER overview. For each template (in the Brookhaven PDB):
1. Construct a profile sequence.
2. Align it with the query sequence.
3. Calculate structural parameters (“to be continued…”).
4. Send the parameters to a well-trained NEURAL NETWORK (like PSIPRED…).
OUTPUT: match confidence & alignment.

STAGE 1: Building a profile for each template
1. Start with the sequence of the template peptide: “MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTC”
2. Run BLASTP on the OWL non-redundant protein sequence data bank, with the sequence as input.
3. Take all sequences with E-value < …
4. Align them using MULTAL, a multiple sequence alignment method.
5. Construct a sequence profile based on the BLOSUM50 matrix.
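
Step 5 can be sketched in Python. This is a minimal illustration, not GenTHREADER's code: it assumes a simple average-score profile (the score of residue type a in a column is the mean substitution score between a and the residues observed in that column), skips gaps, applies no sequence weighting, and uses a hypothetical `toy_matrix` in place of the real BLOSUM50 table.

```python
from typing import Dict, List

# Type aliases used only in this sketch.
Matrix = Dict[str, Dict[str, int]]
Profile = List[Dict[str, float]]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_matrix(alphabet: str = AMINO_ACIDS, match: int = 5, mismatch: int = -1) -> Matrix:
    """HYPOTHETICAL placeholder substitution matrix (NOT BLOSUM50): match/mismatch only."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet} for a in alphabet}


def build_profile(alignment: List[str], matrix: Matrix) -> Profile:
    """Average-score profile: for each column and residue type a, the mean of
    matrix[a][x] over the residues x in that column. Gaps ('-') are skipped and
    no sequence weighting is applied (both simplifying assumptions)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile: Profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        column = {
            a: (sum(matrix[a][x] for x in residues) / len(residues) if residues else 0.0)
            for a in matrix
        }
        profile.append(column)
    return profile


# Usage on a hypothetical three-sequence alignment of template homologues.
alignment = ["MTYK", "MTFK", "MS-K"]
profile = build_profile(alignment, toy_matrix())
```

With a real BLOSUM50 table, the same averaging lets a column that mixes T and S still score S well, which is the point of using a profile rather than the bare template sequence.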

STAGE 2: Align the query sequence with each template profile
Query: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
SCORE = ?
Length of the alignment itself = ?
Length of the template profile = ?
Length of the query sequence = ?
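
A minimal sketch of stage 2, assuming a Smith-Waterman local alignment with a linear gap penalty; GenTHREADER's actual alignment algorithm and parameters may differ. `one_hot_profile` is a hypothetical stand-in for a real template profile, and the sketch collects the four quantities listed above.

```python
from typing import Dict, List, Tuple

Profile = List[Dict[str, float]]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_profile(template: str, match: float = 5.0, mismatch: float = -1.0) -> Profile:
    """HYPOTHETICAL stand-in for a real profile: rewards only the template residue."""
    return [{a: (match if a == t else mismatch) for a in AMINO_ACIDS} for t in template]


def align_to_profile(query: str, profile: Profile, gap: float = 4.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment of a sequence to a profile with a linear
    gap penalty. Returns the best score and matched (query, profile) index pairs."""
    n, m = len(query), len(profile)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + profile[j - 1].get(query[i - 1], 0.0)
            H[i][j] = max(0.0, diag, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + profile[j - 1].get(query[i - 1], 0.0):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1  # query residue aligned to a gap
        else:
            j -= 1  # profile position aligned to a gap
    pairs.reverse()
    return best, pairs


def alignment_length(pairs: List[Tuple[int, int]]) -> int:
    """Alignment columns: matches plus query-only and profile-only gap columns."""
    if not pairs:
        return 0
    q_span = pairs[-1][0] - pairs[0][0] + 1
    p_span = pairs[-1][1] - pairs[0][1] + 1
    return q_span + p_span - len(pairs)


query = "MTYKL"
profile = one_hot_profile("GGMTYKLGG")
score, pairs = align_to_profile(query, profile)
features = {
    "score": score,
    "alignment_length": alignment_length(pairs),
    "profile_length": len(profile),
    "query_length": len(query),
}
```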

STAGE 3: Calculate (some) structural parameters. In stage 2, the sequence was aligned to a profile of the template structure. The aligned sequence is now imposed on the 3D structure of the template and used to calculate ENERGY POTENTIALS.
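
Imposing the alignment on the template amounts to giving each aligned query residue the structural environment of its template partner. The sketch below uses hypothetical names and coordinates and keeps only Cβ positions, which is what Cβ-based potentials need; query residues that fall in gaps get no position in this simplification.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]
Threaded = List[Tuple[str, int, Coord]]  # (query residue, template position, Cβ coordinate)


def thread_onto_template(query: str, pairs: List[Tuple[int, int]], template_cb: List[Coord]) -> Threaded:
    """Place each aligned query residue at the Cβ position of its template partner.
    Query residues in insertions (not in `pairs`) are dropped in this sketch."""
    threaded: Threaded = []
    for qi, ti in pairs:
        threaded.append((query[qi], ti, template_cb[ti]))
    return threaded


# Hypothetical alignment (query index, template index) and template Cβ coordinates.
template_cb = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = thread_onto_template("MTY", [(0, 1), (2, 2)], template_cb)
```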

STAGE 3: structural parameters (cont.): E-Pair (pair interaction potential)
An energy potential for the probability of the interactions observed in this structure:
1. The distance and sequence separation between certain atoms of two different amino acids are measured (Cβ–Cβ, Cβ–N, Cβ–O, etc.).
2. Statistics of known structures were gathered and weighted.
3. The observed interactions are compared to these statistics.
4. An energy potential is calculated.
In essence: the smaller E-Pair, the better. (Figure: a pair interaction between amino acids 39 and 157.)
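
A simplified sketch of such a pair potential, assuming only Cβ–Cβ distances (no Cβ–N or Cβ–O terms), 1 Å distance bins, raw sequence separation, no weighting, and RT evaluated at an assumed 298.15 K. Each (residue pair, separation, distance bin) class gets E = -RT ln(f_ab / f), so classes seen more often than average receive negative energies, in line with "the smaller E-Pair, the better". The training counts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3       # gas constant, kcal/(mol*K)
RT = R_KCAL * 298.15       # ASSUMED temperature of 298.15 K

PairKey = Tuple[Tuple[str, ...], int, int]  # (sorted residue pair, separation, distance bin)


def distance_bin(d: float, width: float = 1.0) -> int:
    return int(d // width)


def train_pair_potential(observations: List[Tuple[str, str, int, int]]) -> Dict[PairKey, float]:
    """E = -RT ln( f_ab(s, d) / f(s, d) ): f_ab is the distribution of
    (separation, distance bin) for residue pair ab, f the same over all pairs."""
    pair_total: Counter = Counter()
    pair_class: Counter = Counter()
    class_total: Counter = Counter()
    for a, b, s, d in observations:
        ab = tuple(sorted((a, b)))  # treat (K, E) and (E, K) as the same pair
        pair_total[ab] += 1
        pair_class[(ab, s, d)] += 1
        class_total[(s, d)] += 1
    n = len(observations)
    potential: Dict[PairKey, float] = {}
    for (ab, s, d), count in pair_class.items():
        f_ab = count / pair_total[ab]
        f_all = class_total[(s, d)] / n
        potential[(ab, s, d)] = -RT * math.log(f_ab / f_all)
    return potential


def e_pair(model, potential) -> float:
    """Sum pair energies over all residue pairs of a threaded model
    [(residue, template_position, cb_coord), ...]; unseen classes score 0."""
    total = 0.0
    for (a, i, ca), (b, j, cb) in combinations(model, 2):
        key = (tuple(sorted((a, b))), abs(i - j), distance_bin(math.dist(ca, cb)))
        total += potential.get(key, 0.0)
    return total


# Hypothetical statistics: K-E pairs at separation 3 are enriched around 5 Å.
observations = [("K", "E", 3, 5)] * 3 + [("K", "E", 3, 9)] + [("A", "A", 3, 9)] * 4
potential = train_pair_potential(observations)
model = [("K", 0, (0.0, 0.0, 0.0)), ("E", 3, (5.5, 0.0, 0.0))]
energy = e_pair(model, potential)
```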

STAGE 3: structural parameters (cont.): E-Solv (solvation potential)
Degree of burial (DOB) of an amino acid: “the number of other Cβ atoms located within 10 Å of the residue’s Cβ atom”.
In general, hydrophobic amino acids like to be buried, safely away from water; hydrophilic amino acids might like the outside world better.
The DOB of each amino acid is calculated and compared to its statistical occurrence:
$\Delta E_{\text{solv}}(AA, r) = -RT \ln \left( f(AA, r) / f(r) \right)$
where r is the degree of burial, f(AA, r) is the frequency of burial r for amino acid type AA, and f(r) is the frequency of burial r over all residues.
(Figure: the Cβ atoms within 10 Å of a residue’s Cβ atom.)
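
The DOB count and the solvation formula translate directly into code. A minimal sketch with hypothetical training observations and an assumed RT at 298.15 K; unseen (amino acid, burial) combinations simply score 0, where a real potential would need smoothing or pseudocounts.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3       # gas constant, kcal/(mol*K)
RT = R_KCAL * 298.15       # ASSUMED temperature of 298.15 K

Coord = Tuple[float, float, float]


def degree_of_burial(cb: List[Coord], radius: float = 10.0) -> List[int]:
    """DOB: number of other Cβ atoms within `radius` Å of each residue's Cβ."""
    return [
        sum(1 for j, cj in enumerate(cb) if j != i and math.dist(ci, cj) <= radius)
        for i, ci in enumerate(cb)
    ]


def train_solvation_potential(observations: List[Tuple[str, int]]) -> Dict[Tuple[str, int], float]:
    """ΔE_solv(AA, r) = -RT ln( f(AA, r) / f(r) ), with f(AA, r) the frequency of
    burial r among residues of type AA and f(r) its frequency over all residues."""
    aa_total = Counter(aa for aa, _ in observations)
    aa_burial = Counter(observations)
    burial_total = Counter(r for _, r in observations)
    n = len(observations)
    return {
        (aa, r): -RT * math.log((count / aa_total[aa]) / (burial_total[r] / n))
        for (aa, r), count in aa_burial.items()
    }


def e_solv(residues: str, dob: List[int], potential) -> float:
    """Sum the solvation energy of each residue at its burial level."""
    return sum(potential.get((aa, r), 0.0) for aa, r in zip(residues, dob))


# Hypothetical data: two nearby Cβ atoms and one isolated one.
cb = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
dob = degree_of_burial(cb)
# Hypothetical statistics: L is usually buried (DOB 1), K usually exposed (DOB 0).
observations = [("L", 1)] * 3 + [("L", 0)] + [("K", 0)] * 3 + [("K", 1)]
potential = train_solvation_potential(observations)
energy = e_solv("LLK", dob, potential)
```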

STAGE 4: Send it all to the (trained) neural network. The output is a score between 0 and 1, translated to a confidence level (LOW, MEDIUM, HIGH or CERTAIN).
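
A sketch of stage 4 under loud assumptions: a one-hidden-layer sigmoid network whose architecture, inputs and weights are placeholders (not GenTHREADER's trained network), and HYPOTHETICAL score cut-offs for the four confidence levels, since the slide gives no numbers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def network_score(
    features: Sequence[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
) -> float:
    """One-hidden-layer feed-forward network with sigmoid units; output in (0, 1)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b) for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# HYPOTHETICAL cut-offs: the real score ranges are not given on the slide.
THRESHOLDS: List[Tuple[float, str]] = [(0.9, "CERTAIN"), (0.7, "HIGH"), (0.5, "MEDIUM")]


def confidence(score: float, thresholds: List[Tuple[float, str]] = THRESHOLDS) -> str:
    """Map a network score to the first level whose cut-off it reaches."""
    for cutoff, label in thresholds:
        if score >= cutoff:
            return label
    return "LOW"


# Hypothetical, already-normalized inputs (e.g. alignment score, E-Pair, E-Solv, length).
features = [0.8, -0.5, -0.2, 0.3]
# With all-zero (untrained) weights every unit outputs sigmoid(0) = 0.5.
untrained = network_score(features, [[0.0] * 4, [0.0] * 4], [0.0, 0.0], [0.0, 0.0], 0.0)
```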


Who trains the neural network? Representatives of the different fold types in CATH were taken (“T-level”). Their CAT numbers were used to compare chain pairs: 383 pairs shared a common domain fold (= should give a positive answer). The network was trained with these pairs.
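
The positive/negative labelling can be sketched as a comparison of CATH codes truncated to the T-level (Class.Architecture.Topology). Chain names and codes below are hypothetical, and the rule that a pair is positive when any of their domains share a topology is an assumption based on "shared a common domain fold".

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Topology = Tuple[str, str, str]


def topology(cath_code: str) -> Topology:
    """Class.Architecture.Topology prefix (the 'T-level') of a code like '3.30.70.270'."""
    parts = cath_code.split(".")
    if len(parts) < 3:
        raise ValueError(f"not a CATH code down to T-level: {cath_code!r}")
    return parts[0], parts[1], parts[2]


def label_pairs(chains: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Label every chain pair 1 if they share at least one domain topology, else 0."""
    folds: Dict[str, Set[Topology]] = {cid: {topology(c) for c in codes} for cid, codes in chains.items()}
    return [(a, b, int(bool(folds[a] & folds[b]))) for a, b in combinations(sorted(folds), 2)]


# Hypothetical chain identifiers and CATH domain codes.
chains = {
    "chainA": ["3.30.70.270"],
    "chainB": ["3.30.70.100", "1.10.10.10"],
    "chainC": ["2.40.50.140"],
}
labels = label_pairs(chains)
```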

Neural network – black box?

Confidence assignment (figure with the levels LOW, MEDIUM, HIGH and CERTAIN)

GenTHREADER: what to do with it? Results on a ‘classic’ test set of 68 proteins:
High true-positive rate: 73.5% correctly recognized, 48.5% with CERTAIN confidence.
Extremely reliable: every “CERTAIN” prediction was correct.
Fast, automatic method.
For 22 of the 68 proteins, the alignment is over 50% accurate.
Let’s go analyze the Mycoplasma genitalium genome with it!
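
As a quick arithmetic check, the percentages on this slide convert back to protein counts (out of 68) as follows:

```latex
\begin{aligned}
0.735 \times 68 &\approx 50 && \text{proteins correctly recognized} \\
0.485 \times 68 &\approx 33 && \text{proteins recognized with CERTAIN confidence} \\
22 / 68 &\approx 0.32 && \text{fraction with alignments over 50\% accurate}
\end{aligned}
```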

Mycoplasma genitalium genome analysis: ONE DAY ONLY! Whole-genome analysis with GenTHREADER

ORF MG276 of M. genitalium: spotting a remote homologue
MG276 is an adenine phosphoribosyltransferase (but this information is not given to GenTHREADER).
1HGX is a template of another phosphoribosyltransferase. It has only 10% sequence identity with our MG276!
GenTHREADER found it as a CERTAIN match; E-Pair saved the situation!
But how do we know it’s true? (Figure: the 1HGX template.)

Ligand-binding site of the 1HGX template (figure, with the substrate labeled)

ORF MG276 of M. genitalium: supporting evidence for 1HGX as a template
1. Substrate-binding sites are preserved.
2. The secondary structure prediction of MG276 is similar.
3. We cheated all along…

ORF MG353 of M. genitalium: an ORF with no known function
MG353: no homologues found in databases.
1HUE is a template of a “histone-like” protein, with very low sequence similarity to our MG353.
GenTHREADER found it as a CERTAIN match.
Striking similarity in the DNA-binding region despite the overall low sequence similarity.

GenTHREADER improvements (McGuffin & Jones, May 2003): PSI-BLAST, PSIPRED (secondary structures), and more…
Some results: (figure)

AB-INITIO FOLDING: ROSETTA (Simons et al. 1997, 1999; Bystroff & Baker 1998; Bonneau et al. 2001)
Prediction of a protein fold from scratch?
Method I: physically simulate protein folding. Problem: CPU time; practical for short peptides.
Method II: check the probability of all possible conformations. Problem: an infinite search space.
Solution: use Mother Nature to decrease the search space.
(Example sequence shown on the slide: APKFFRGGNWKMNGKRSLG ELIHTLGDAKLSADTEVVCGI APSITEKVVFQETKAIADNKD WSKVEVHESRIYGGSVTNCK ELASQHDVDGFLVGGASLKP VDGFLHALAEGLGVDINAKH)

Decreasing the search space using elements from short peptides:
1. Take fragments of short peptides (3 to 9 residues long).
2. Join them together.
3. Keep the secondary structures constant.
4. “Play” with the angles of loop residues.
RESULT: 200,000 decoy structures.
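
A toy sketch of fragment insertion with Metropolis Monte Carlo, following the slide's description rather than Rosetta's actual code: the conformation is a list of (phi, psi) angles, secondary-structure residues are frozen, and only loop angles change. `toy_energy`, the fragment library and the frozen mask are all hypothetical; each independent run (seed) yields one decoy.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Angles = List[Tuple[float, float]]  # (phi, psi) per residue, in degrees


def fragment_insertion_mc(
    start: Angles,
    library: Dict[int, List[Angles]],
    frozen: Sequence[bool],
    energy: Callable[[Angles], float],
    steps: int = 1000,
    kT: float = 1.0,
    seed: int = 0,
) -> Angles:
    """Metropolis Monte Carlo over fragment insertions. library[pos] holds candidate
    fragments starting at residue pos; frozen residues keep their angles."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    positions = sorted(library)
    for _ in range(steps):
        pos = rng.choice(positions)
        fragment = rng.choice(library[pos])
        trial = list(current)
        for offset, angles in enumerate(fragment):
            i = pos + offset
            if i < len(trial) and not frozen[i]:
                trial[i] = angles  # copy fragment angles into loop residues only
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill, uphill with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
    return current


def toy_energy(angles: Angles) -> float:
    """HYPOTHETICAL energy favouring psi = 140 degrees; stands in for a real score."""
    return sum((psi - 140.0) ** 2 for _, psi in angles) / 1000.0


helix = (-60.0, -45.0)
start = [helix] * 6
frozen = [True, True, False, False, True, True]  # residues 2-3 form the loop
library = {p: [[(-120.0, 140.0)] * 3, [helix] * 3] for p in range(4)}
decoys = [fragment_insertion_mc(start, library, frozen, toy_energy, steps=200, kT=1e-9, seed=s) for s in range(3)]
```

The tiny kT makes the demo effectively greedy; at realistic temperatures uphill moves are sometimes accepted, which is what lets a search escape local minima.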

In addition: I-Sites prediction. 13 local-structure 3D motifs with sequence profiles:
Strong independence of motifs (fold-initiation sites?)
Complements secondary structure

Find the correct fold for a given sequence (back to threading…)
P(structure), sequence-independent: secondary structure packing; strand hydrogen bonding; strand assembly in sheets; structure compactness; frequency of I-Sites 3D motifs; etc.
P(sequence | structure): solvation; secondary structure vs. amino acid (proline in a helix, etc.); pair interaction; I-Sites prediction for this sequence (3D motifs), which did not contribute to performance; etc.
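
The two groups of terms fit together through Bayes' rule. For a fixed query sequence, P(sequence) is the same for every candidate structure, so candidates can be ranked by the sum of the two log terms:

```latex
\begin{aligned}
P(\text{structure} \mid \text{sequence})
  &= \frac{P(\text{structure})\, P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}
  \;\propto\; P(\text{structure})\, P(\text{sequence} \mid \text{structure}) \\
\text{score}(\text{structure})
  &= \ln P(\text{structure}) + \ln P(\text{sequence} \mid \text{structure})
\end{aligned}
```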

RESULTS in CASP4: Baker’s a winner… (figure: native structures vs. predicted models)