Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.

PSI-BLAST-based Secondary Structure Prediction (PSIPRED)
Three stages:
1) Generation of a sequence profile
2) Prediction of the initial secondary structure
3) Filtering of the predicted structure

PSIPRED
PSIPRED uses multiple aligned sequences for prediction and a training set of folds with known structure. A two-stage neural network predicts structure from the position-specific scoring matrices (PSSMs) generated by PSI-BLAST (Jones, 1999):
1) The first network converts a window of 15 amino acids (with a flag for positions beyond the chain terminus) into raw scores for h (helix), e (strand/sheet) and c (coil).
2) The second network filters the output of the first. For example, an output of hhhhehhhh might be converted to hhhhhhhhh.
PSIPRED can reach a Q3 value (the percentage of residues correctly assigned to one of the three states) of 70-78%, which may be close to the highest achievable.
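The Q3 score and the filtering example can be made concrete with a short sketch. Q3 is the percentage of residues whose predicted three-state label matches the observed one. The smoothing function is only a stand-in for the second stage: it is a hypothetical fixed rule that reproduces the hhhhehhhh example, whereas PSIPRED's filter is a trained neural network.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E, C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted.upper(), observed.upper()))
    return 100.0 * correct / len(observed)


def smooth_isolated(states: str) -> str:
    """Hypothetical toy filter (not PSIPRED's method): a single-residue state
    flanked on both sides by the same different state takes that state,
    e.g. HHEHH -> HHHHH. Neighbours are read from the unmodified input,
    so one change cannot trigger another."""
    s = list(states)
    for i in range(1, len(s) - 1):
        if states[i - 1] == states[i + 1] != states[i]:
            s[i] = states[i - 1]
    return "".join(s)
```

For example, `smooth_isolated("hhhhehhhh")` returns `"hhhhhhhhh"`, and `q3("HHHHEEC", "HHHCEEC")` gives about 85.7 (6 of 7 residues correct).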

Neural networks
Computer neural networks are based on a simulation of adaptive learning in networks of real neurons. Neurons connect to each other via synaptic junctions, which are either stimulatory or inhibitory. Adaptive learning involves forming or suppressing the right combinations of stimulatory and inhibitory synapses so that a set of inputs produces an appropriate output.

Neural Networks (cont. 1)
In the computer version of a neural network, a set of inputs (the amino acids in the sequence) is transmitted through a network of connections. At each layer, the inputs are numerically weighted and the combined result is passed to the next layer. Ultimately a final output, a decision of helix, sheet or coil, is produced.
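The layer-by-layer weighting described above can be sketched as a tiny feed-forward pass: each unit forms a weighted sum of its inputs plus a bias, applies a sigmoid, and passes the result on; the output unit with the highest activation is taken as the decision. The layer sizes, the sigmoid activation and all weights below are hypothetical and untrained; they only illustrate the data flow, not PSIPRED's actual network.

```python
import math
from typing import List

Matrix = List[List[float]]
STATES = ("helix", "sheet", "coil")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """One fully connected layer: weights[j][i] links input i to unit j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(inputs: List[float], hidden_w: Matrix, hidden_b: List[float],
            out_w: Matrix, out_b: List[float]) -> str:
    """Input layer -> hidden layer -> three output units; pick the largest."""
    hidden = layer(inputs, hidden_w, hidden_b)
    outputs = layer(hidden, out_w, out_b)
    return STATES[outputs.index(max(outputs))]
```

With hypothetical weights `hidden_w = [[1.0, -1.0], [-1.0, 1.0]]`, `hidden_b = [0.0, 0.0]`, `out_w = [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]` and `out_b = [0.0, 0.0, 0.0]`, the input `[1.0, 0.0]` is classified as helix and `[0.0, 1.0]` as sheet.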

Neural Networks (cont. 2)
Of the training set (proteins with known structures), 90% was used to train the network and the remaining 10% was used to evaluate its performance during the training session.
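A 90/10 split like the one described can be sketched as below. The slide does not say how proteins were assigned to each part, so the random shuffle with a fixed seed is an assumption, and the protein identifiers in the example are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(items: Sequence[T], train_fraction: float = 0.9,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data (reproducibly, via `seed`) and split it
    into a training part and a held-out evaluation part."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

For 50 hypothetical proteins, `train_validation_split([f"protein_{i}" for i in range(50)])` returns 45 training and 5 evaluation entries with no overlap. Using a private `random.Random` instance keeps the split reproducible without touching the global random state.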

Neural Networks (cont. 3)
During the training phase, selected sets of proteins of known structure are scanned, and whenever the decisions are incorrect the software adjusts the input weightings to produce the desired result. Training runs are repeated until the success rate is maximized. Careful selection of the training set is an important aspect of this technique: the set must contain as wide a range of different fold types as possible, without duplicated structural types that may bias the decisions.
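The idea of adjusting weights only when a decision is wrong, and repeating runs until the success rate stops improving, is captured most simply by the classic multi-class perceptron rule sketched below. This rule is chosen purely as an illustration; it is not the training procedure PSIPRED itself uses, and the toy feature vectors in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], str]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(data: List[Example], labels: Sequence[str],
                     epochs: int = 20, lr: float = 1.0) -> Dict[str, List[float]]:
    """Multi-class perceptron: weights change only when a decision is wrong.
    `lr` is the step size of each correction."""
    n = len(data[0][0])
    weights = {lab: [0.0] * n for lab in labels}
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            guess = max(labels, key=lambda lab: score(weights[lab], x))
            if guess != y:
                errors += 1
                for i, xi in enumerate(x):
                    weights[y][i] += lr * xi      # pull the correct class up
                    weights[guess][i] -= lr * xi  # push the wrong guess down
        if errors == 0:  # every training example is now classified correctly
            break
    return weights
```

On three toy examples such as `([1, 0, 0, 1], "H")`, `([0, 1, 0, 1], "E")` and `([0, 0, 1, 1], "C")` (the last feature acting as a bias), the loop stops after the third pass, once no example is misclassified.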

Neural Networks (cont. 4)
An additional component of the PSIPRED procedure is sequence alignment with similar proteins. The rationale is that some amino acid positions in a sequence contribute more to the final structure than others. (This has been demonstrated by systematic mutation experiments in which each position in a sequence is in turn substituted by a spectrum of amino acids: some positions are remarkably tolerant of substitution, while others have unique requirements.) To predict secondary structure accurately, one should therefore place less weight on the tolerant positions, which clearly contribute little to the structure, and more weight on the intolerant positions.
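One simple way to turn an alignment into "tolerant" versus "intolerant" position weights is per-column Shannon entropy: a column where every sequence keeps the same residue (low entropy) is treated as intolerant and gets a weight near 1, a highly variable column gets a weight near 0. This entropy weighting is an assumption made for illustration; PSIPRED obtains this information through the PSI-BLAST profile instead. The alignment in the example is hypothetical.

```python
import math
from collections import Counter
from typing import List

MAX_ENTROPY = math.log2(20)  # 20 amino acids, all equally frequent


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return MAX_ENTROPY
    counts = Counter(residues)
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def position_weights(alignment: List[str]) -> List[float]:
    """Weight near 1 for conserved (substitution-intolerant) columns,
    near 0 for highly variable (tolerant) columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    weights = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        weights.append(1.0 - column_entropy(column) / MAX_ENTROPY)
    return weights
```

For the alignment `["MKVL", "MRVI", "MKAL", "MEVF"]`, the invariant first column gets weight 1.0, and the mostly conserved third column (V, V, A, V) outweighs the more variable second and fourth columns.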

Figure labels: input of 15 groups of 21 units (1 unit for each amino acid plus one specifying the chain end); each row specifies an amino acid position and provides information on tolerant or intolerant positions; three outputs (helix, strand or coil); filtering network.
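The 15 x 21 input layer in the figure can be sketched as a function that flattens a window of profile rows centred on one residue. The window and unit counts come from the slide; the assumption that each profile row holds 20 already-scaled scores (for example one row of a PSI-BLAST PSSM), and that positions past a terminus get zeros plus a 1 in the end unit, are illustrative choices.

```python
from typing import List, Sequence

WINDOW = 15          # residues per window (from the slide)
UNITS = 21           # 20 amino-acid units + 1 "beyond the chain end" unit
HALF = WINDOW // 2


def encode_window(profile: Sequence[Sequence[float]], center: int) -> List[float]:
    """Flatten a 15 x 21 input window centred on residue `center`.

    `profile[i]` is assumed to hold 20 (already scaled) scores for residue i.
    """
    inputs: List[float] = []
    for pos in range(center - HALF, center + HALF + 1):
        if 0 <= pos < len(profile):
            row = list(profile[pos])
            if len(row) != UNITS - 1:
                raise ValueError("each profile row needs 20 values")
            inputs.extend(row + [0.0])                    # inside the chain
        else:
            inputs.extend([0.0] * (UNITS - 1) + [1.0])    # past a terminus
    return inputs
```

For the first residue of a 10-residue chain, the returned vector has 315 values, and the end unit is set in the 7 groups that fall before the N-terminus.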

Example of Output from PSIPRED

Workshop

3D structure data
The largest 3D structure database is the Protein Data Bank (PDB). It contains over 33,000 records, each holding 3D coordinates for macromolecules. 80% of the records were obtained from X-ray diffraction studies, 15% from NMR, and the rest from other methods and theoretical calculations.

Part of a record from the PDB:
ATOM 1 N ARG A N
ATOM 2 CA ARG A C
ATOM 3 C ARG A C
ATOM 4 O ARG A O
ATOM 5 CB ARG A C
ATOM 6 CG ARG A C
ATOM 7 CD ARG A C
ATOM 8 NE ARG A N
ATOM 9 CZ ARG A C
ATOM 10 NH1 ARG A N
ATOM 11 NH2 ARG A N
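PDB ATOM records are fixed-width: fields are identified by column position, not by whitespace, so splitting on spaces breaks when adjacent fields run together. The sketch below reads the main fields using the standard PDB column positions; the sample line is a hypothetical arginine N atom with made-up coordinates, not taken from a real entry.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed PDB columns
    (1-based columns in comments, 0-based Python slices in code)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # cols 7-11
        name=line[12:16].strip(),      # cols 13-16
        res_name=line[17:20].strip(),  # cols 18-20
        chain_id=line[21],             # col 22
        res_seq=int(line[22:26]),      # cols 23-26
        x=float(line[30:38]),          # cols 31-38
        y=float(line[38:46]),          # cols 39-46
        z=float(line[46:54]),          # cols 47-54
        element=line[76:78].strip(),   # cols 77-78
    )


# Hypothetical record (coordinates invented for illustration).
SAMPLE_LINE = "ATOM      1  N   ARG A   1      11.104   6.134  -6.504  1.00 20.00           N"
```

`parse_atom_line(SAMPLE_LINE)` yields an `Atom` with name `N`, residue `ARG`, chain `A`, residue number 1 and x = 11.104.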

Steps to tertiary structure prediction
Comparative protein modeling extrapolates a new structure from related family members. Steps:
1. Identification of modeling templates
2. Alignment
3. Model building

Identification of modeling templates
A cutoff value is chosen for the FastA or BLAST search (an E-value of 10^-5). Up to ten templates can be used, but the one with the highest sequence similarity to the target sequence (lowest E-value) serves as the reference template. The Cα atoms of the templates are selected for superimposition, which generates a structurally corrected multiple sequence alignment.
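The selection rule on this slide (E-value cutoff, at most ten templates, lowest E-value as reference) can be sketched as follows. The `Hit` record, its field names and the template identifiers are hypothetical; treating a hit exactly at the cutoff as passing is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of a search hit
    e_value: float


def select_templates(hits: List[Hit], cutoff: float = 1e-5,
                     max_templates: int = 10) -> Tuple[Hit, List[Hit]]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first,
    at most `max_templates`; the first one is the reference template."""
    passing = sorted((h for h in hits if h.e_value <= cutoff),
                     key=lambda h: h.e_value)[:max_templates]
    if not passing:
        raise ValueError("no template passes the E-value cutoff")
    return passing[0], passing
```

Given hits with E-values 1e-3, 1e-30 and 2e-8, the first is discarded, and the 1e-30 hit becomes the reference with the 2e-8 hit as a second template.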

Alignment
The “common core” of the target sequence is threaded onto the template structure using only the alpha carbons (Cα).
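Threading the common core onto the template's alpha carbons amounts to walking the pairwise alignment and giving each target residue the Cα position of the template residue it is aligned with. Target residues opposite a gap get no coordinate and are left for later loop building. The alignment strings (with `-` as the gap character) and the coordinates in the example are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def thread_onto_template(target_aln: str, template_aln: str,
                         template_ca: Sequence[Coord]) -> List[Optional[Coord]]:
    """Return one entry per target residue: the aligned template C-alpha
    position, or None where the template has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    coords: List[Optional[Coord]] = []
    t_idx = 0  # index into the template's residues (and C-alpha list)
    for a, b in zip(target_aln, template_aln):
        if a != "-":
            coords.append(template_ca[t_idx] if b != "-" else None)
        if b != "-":
            t_idx += 1
    return coords
```

For the alignment `MK-LV` / `MKAL-`, the target residues M, K and L take the Cα positions of template residues 1, 2 and 4, while V has no template partner and gets `None`.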

Framework construction

Building the model
Framework construction: the position of each atom in the target is averaged from the corresponding atoms in the templates. Portions of the target sequence that do not match the template are constructed with a “spare part” algorithm. Each loop is defined by its length and by the Cα atom coordinates of the four amino acids preceding and following it.
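Averaging atom positions only makes sense once the templates share a common frame, so the sketch below first superposes each template onto the reference template and then takes the atom-by-atom mean. The superposition uses the Kabsch least-squares method, which is our choice for illustration; equal weighting of all templates and row-by-row correspondence of the input arrays are further assumptions.

```python
from typing import Sequence

import numpy as np


def kabsch_fit(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return `mobile` (N x 3) rigidly superposed onto `reference` (N x 3),
    minimizing the RMSD between corresponding rows (Kabsch method)."""
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    p = mobile - mob_c
    q = reference - ref_c
    h = p.T @ q                                   # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    correction = np.diag([1.0, 1.0, d])           # rule out a reflection
    rotation = vt.T @ correction @ u.T
    return p @ rotation.T + ref_c


def average_framework(templates: Sequence[np.ndarray]) -> np.ndarray:
    """Superpose every template onto the first (reference) one and average
    corresponding atom coordinates with equal weights."""
    reference = np.asarray(templates[0], dtype=float)
    fitted = [reference] + [kabsch_fit(np.asarray(t, dtype=float), reference)
                            for t in templates[1:]]
    return np.mean(fitted, axis=0)
```

If the second template is just a rotated and translated copy of the reference, `kabsch_fit` recovers the reference coordinates and the averaged framework equals the reference.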

Building the model
Completing the backbone: a library of PDB entries is consulted to add carbonyl and amino groups. The 3D coordinates come from a separate library of pentapeptide backbone fragments, which are fitted onto the target's Cα atoms. For the central tripeptide, the coordinates of each backbone atom (N, C, C(O)) are averaged over the fitted fragments.
Side chains are added from a table of the most probable rotamers, which depend on the backbone conformation.
Model refinement: energy minimization.
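The slide names energy minimization as the final refinement step without specifying a force field. As a toy illustration only, the sketch below minimizes a single made-up energy term, a harmonic restraint holding consecutive Cα atoms about 3.8 Å apart, by steepest descent. Real refinement uses a full molecular-mechanics force field; the energy function, force constant `k`, step size and iteration count here are all assumptions.

```python
import numpy as np

IDEAL_CA_CA = 3.8  # approximate distance between consecutive C-alpha atoms (angstrom)


def bond_energy(coords: np.ndarray, k: float = 1.0) -> float:
    """Toy energy: sum of k * (d - 3.8)^2 over consecutive C-alpha pairs."""
    dists = np.linalg.norm(coords[1:] - coords[:-1], axis=1)
    return float(k * np.sum((dists - IDEAL_CA_CA) ** 2))


def bond_gradient(coords: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Analytic gradient of `bond_energy` with respect to every coordinate."""
    diffs = coords[1:] - coords[:-1]
    dists = np.linalg.norm(diffs, axis=1)
    # dE/d(diff) for each bond, then applied with opposite signs to its two atoms
    g_bond = (2.0 * k * (dists - IDEAL_CA_CA) / dists)[:, None] * diffs
    grad = np.zeros_like(coords)
    grad[1:] += g_bond
    grad[:-1] -= g_bond
    return grad


def minimize(coords: np.ndarray, step: float = 0.05, n_steps: int = 500,
             tol: float = 1e-8) -> np.ndarray:
    """Steepest descent: move every atom a small step down the gradient."""
    x = np.array(coords, dtype=float)
    for _ in range(n_steps):
        g = bond_gradient(x)
        if np.max(np.abs(g)) < tol:
            break
        x -= step * g
    return x
```

Starting from a slightly distorted four-atom trace, `minimize` returns coordinates whose consecutive Cα distances are all close to 3.8 Å, with a lower `bond_energy` than the starting model.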