PREDICTING PROTEIN STRUCTURE AND BEYOND …. P. V. Balaji Biotechnology Center I.I.T., Bombay.

Organization of the talk
1. Why predict the structure?
2. Methods for structure prediction
3. What next?

Genome Size is not Proportional to the Complexity of the Organism
[Figure: size of the genome plotted against complexity of the organism]

Molecular Logic of Life is the Same
Biochemically, all living things (animals, plants, bacteria, viruses, etc.) are remarkably similar.
English: 26-letter alphabet, only one grammar, extremely diverse literature
Genome: 4-letter alphabet, only one grammar, extremely diverse organisms

Genome Sequencing and Analysis: One of the Key Steps in Deciphering the Logic of Life
Even minute details have to be analyzed; moving a single comma changes the meaning:
"Hang him, not let him go" versus "Hang him not, let him go"
Humans: NeuNAc (N-acetyl, terminal -CH3) versus Chimpanzees: NeuNGc (N-glycolyl, terminal -CH2OH)

Innovations in Technology Have Made Genome Sequencing a Routine Affair
Genome sequencing completed: ~70 organisms; in the pipeline: several more.
"… it is unlikely that the base sequence of more than a few percent of such a complex DNA will ever be determined …" (C. W. Schmid & W. R. Jelinek, Science, June 1982)

One Aspect of Genome Sequence Analysis is to Assign Functions to Proteins (Reverse Genetics)
Proteins are the workhorses of the cell and are involved in every aspect of living systems.

Function of a Protein can be Defined at Different Levels
Example: lysozyme
Biochemical level: hydrolyzes a C-O (glycosidic) bond
Physiological level: breaks down the bacterial cell wall
Cellular level: defense against infection
Different analysis tools provide functions at different levels.

Hallmark of Proteins: Specificity
Proteins know exactly which small molecule (ligand) they should bind to or interact with, and also which part of a macromolecule they should bind to.

Origin of Specificity
[Figure: 1ruv.pdb]
Function is critically dependent on structure.

Structure Structure – Key to Dissect Function Interaction Interfaces Crystal Packing Functional Oligomerization Location of Mutants Conserved Residues SNPs Evolutionary Relationships Fold Relative Juxtaposition Catalytic Clusters Motifs Catalytic Mechanism Clefts (active sites) Antigenic Sites, surface patches Surface Shape & Charge Dynamics (breathing)

Christian B. Anfinsen: Nobel Prize in Chemistry (1972)
Sequence Determines Structure
Sequence (residues 1 to 124): KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES LADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTT QANKHIIVACEGNPYVPVHFDASV
[Figure: 1ruv.pdb]
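The sequence on this slide is that of bovine pancreatic ribonuclease A, the protein at the center of Anfinsen's experiments. A small Python sketch (constant and function names are illustrative, not from the talk) makes the combinatorics behind "sequence determines structure" concrete: the 124 residues contain 8 cysteines, and if they paired at random there would be (8 - 1)!! = 7 x 5 x 3 x 1 = 105 possible disulfide arrangements, only one of them native.

```python
# Sequence as shown on the slide (residues 1-124), taken here to be bovine RNase A.
RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES"
    "LADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTT"
    "QANKHIIVACEGNPYVPVHFDASV"
)


def cysteine_positions(seq):
    """Return 1-based positions of cysteine (C) residues."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]


def disulfide_pairings(num_cys):
    """Count the ways to pair num_cys cysteines into disulfide bonds: (n - 1)!!."""
    if num_cys % 2:
        raise ValueError("need an even number of cysteines to pair them all")
    count = 1
    for k in range(num_cys - 1, 0, -2):
        count *= k
    return count


cys = cysteine_positions(RNASE_A)
print(len(RNASE_A), cys, disulfide_pairings(len(cys)))
# prints: 124 [26, 40, 58, 65, 72, 84, 95, 110] 105
```

Random pairing would make the native arrangement a 1-in-105 outcome; that the chain nonetheless finds it is the kind of observation the slide summarizes.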

Sequence → (?) → Structure → Function: the goal of functional genomics
How does sequence specify structure? This is the protein folding problem (the "second half of the genetic code"). Because it remains unsolved, structure has to be determined experimentally.

Experimental Methods of Structure Determination
X-ray crystallography: provides a static picture. Challenges: obtaining crystals that diffract; solubilization of the over-expressed protein.
Nuclear magnetic resonance (NMR) spectroscopy: provides a dynamic picture. Challenges: the size limit is a major factor; solubilization of the over-expressed protein.

Limitations of Experimental Methods: Consequences
Total number of proteins, including ORFs: ~700,000
Annotated proteins in the databank: ~100,000
Proteins with known structure: ~5,000! (the dataset available for analysis)
An ORF (open reading frame) is a region of a genome that may code for a protein. ORFs have been identified by whole-genome sequencing efforts; ORFs with no known function are termed orphans.
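The ORF definition can be made concrete with a deliberately simplified Python sketch (the function name and the min_codons threshold are illustrative assumptions): it scans the three forward reading frames for an ATG start codon followed in frame by a stop codon (TAA, TAG or TGA). Real gene finders also scan the reverse complement, allow alternative start codons and use statistical models; none of that is attempted here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find simple ORFs (ATG ... stop) in the three forward reading frames.

    Returns sorted (start, end) pairs: 0-based, end-exclusive, stop codon included.
    min_codons counts the start and stop codons. Simplifications: forward strand
    only, ATG is the only start codon, and an ORF that never reaches a stop is dropped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


seq = "CCATGAAATTTTAGCC"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
# prints: 2 14 ATGAAATTTTAG
```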

Structural Biology Consortia: A Brute-Force Approach Towards Structure Elucidation
Employ battalions of PhDs and postdoctoral fellows; aim to solve about 400 structures a year through large-scale expression and crystallization attempts.
+ Enhances the statistical base for inferring sequence-structure relationships.
- Basic strategies remain the same: no (known) new tricks. "Unrelenting" proteins (those that resist expression or crystallization) will be ignored.

Predicting Protein Structure: 1. Comparative Modeling (formerly, homology modeling)
Query, structure unknown (?), to be modeled (1alc):
KQFTKCELSQNLYDIDGYGRIALPELICTMF HTSGYDTQAIVENDESTEYGLFQISNALWCK SSQSPQSRNICDITCDKFLDDDITDDIMCAK KILDIKGIDYWIAHKALCTEKLEQWLCEKE
Template of known structure (8lyz):
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK FESNFNTQATNRNTDGSTDYGILQINSRWWCND GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
The two share similar sequences, i.e., they are homologous, so 8lyz is used as the template to model 1alc.

Comparative Modeling
Basis: structure is much more conserved than sequence during evolution. The higher the sequence similarity, the higher the confidence in the modeled structure.
Limitation: applicability is limited, because a large number of proteins and ORFs have no similarity to any protein of known structure.
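Since confidence in a comparative model rises with query-template similarity, the first practical step is to align the two sequences and measure how similar they are. The Python sketch below is a textbook Needleman-Wunsch global alignment with a simple match/mismatch/linear-gap score; those score values are illustrative assumptions (practical tools use substitution matrices such as BLOSUM62 and affine gap penalties). Percent identity is computed over aligned, non-gap columns, which is only one of several conventions in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative only.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


print(global_align("KVFGR", "KVGR"))
# prints: ('KVFGR', 'KV-GR', 2)
```

To apply it to the two sequences on the comparative modeling slide, remove the spaces first, since the slide prints them in blocks.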

Predicting Protein Structure: Alternative Methods
Threading (fold recognition) and ab initio prediction both depend heavily on the analysis of known protein structures.
In addition, establishing the sequence-structure relationship is also important.
Input from people trained in statistics, pattern recognition and related areas of computer science is critical.

Statistical Analysis of Protein Structures: Microenvironment Characterization
Describe structures at multiple levels of detail using a comprehensive set of properties:
Atom-based properties: type, hydrophobicity, charge
Residue-based properties: type, hydrophobicity
Chemical group: hydroxyl, amide, carbonyl, etc.
Secondary structure: α-helix, β-strand, turn, loop
Other properties: van der Waals (VDW) volume, B-factor, mobility, solvent accessibility
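As a hedged illustration of residue-based microenvironment properties, the Python sketch below describes each residue by how many other residues lie within a distance cutoff (a crude burial proxy) and by the mean Kyte-Doolittle hydropathy of those neighbors. Representing each residue by a single point (e.g. its C-alpha atom), the 8 Å cutoff and the function name are all assumptions made for brevity.

```python
import math

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def describe_microenvironments(residues, cutoff=8.0):
    """Describe each residue by its neighbors within `cutoff` angstroms.

    residues: list of (one-letter code, (x, y, z)) tuples, one point per residue
    (assumed here to be the C-alpha atom).
    """
    descriptions = []
    for i, (aa, xyz) in enumerate(residues):
        neighbors = [
            other_aa
            for j, (other_aa, other_xyz) in enumerate(residues)
            if j != i and math.dist(xyz, other_xyz) <= cutoff
        ]
        mean_hydropathy = (
            sum(KYTE_DOOLITTLE[a] for a in neighbors) / len(neighbors)
            if neighbors else None
        )
        descriptions.append({
            "index": i,
            "residue": aa,
            "self_hydropathy": KYTE_DOOLITTLE[aa],
            "neighbor_count": len(neighbors),  # crude burial proxy
            "mean_neighbor_hydropathy": mean_hydropathy,
        })
    return descriptions


# Hypothetical three-residue example (coordinates in angstroms).
example = [("A", (0.0, 0.0, 0.0)), ("K", (3.8, 0.0, 0.0)), ("L", (20.0, 0.0, 0.0))]
for d in describe_microenvironments(example):
    print(d)
```

Atom-level, chemical-group and secondary-structure descriptors from the list above would be added as further fields in the same per-residue record.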

Predicting Protein Structure: 2. Threading or Fold Recognition
Basis: it is estimated that there are only around 1000 to … stable folds in nature. Irrespective of its amino acid sequence, a protein has to adopt one of these folds.
Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds: the best sequence-fold alignment is selected using a fitness (scoring) function.
This is an NP-complete problem.
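One common way to write such a fitness function is sketched below; this generic formulation is an assumption, since the talk does not specify its scoring function. An alignment a places query residue s_i at template position a(i), env(p) is the structural environment of template position p, and d(p, q) is the spatial distance between two template positions. Both sums run over query residues aligned to template positions (not to gaps):

```latex
F(a) = \sum_{i} f_{\mathrm{env}}\bigl(s_i,\ \mathrm{env}(a(i))\bigr)
     + \sum_{i<j} f_{\mathrm{pair}}\bigl(s_i,\ s_j,\ d(a(i), a(j))\bigr),
\qquad a^{*} = \arg\max_{a} F(a)
```

With only the first (environment) term, the best alignment can be found by dynamic programming in O(nm) time for a query of length n and a template of m positions. It is the pairwise term, combined with variable-length gaps, that couples distant positions and makes the general threading problem NP-complete.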

Fold of a Protein
Refers to the spatial arrangement of its secondary structural elements (α-helices and β-strands).
[Figures: 1l45.pdb, α/β-barrel; 4bcl.pdb, β-barrel; 1mbl.pdb, α/β-sandwich]

Threading: Basic Strategy
[Figure: a query sequence is placed onto each template in a library of folds; the fit of the sequence and its spatial interactions on the template are scored, followed by scoring and selection of the best fold]
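The basic strategy can be sketched in a few lines of Python under strong simplifying assumptions: each fold in the library is reduced to a string of per-position environments ('B' buried, 'E' exposed), residues are split into hydrophobic and polar, and the query is threaded without gaps, so only an environment term is scored and pairwise interactions are ignored. The fold names and environment strings are hypothetical; real threading programs use gapped alignments and statistically derived potentials.

```python
# Crude two-class residue grouping (an assumption for this sketch).
HYDROPHOBIC = set("AVILMFWC")


def position_score(residue, env):
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    buried = env == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(query, fold_env):
    """Best ungapped placement of query on one fold: (score, offset), or None if it cannot fit."""
    best = None
    for offset in range(len(fold_env) - len(query) + 1):
        window = fold_env[offset:offset + len(query)]
        s = sum(position_score(r, e) for r, e in zip(query, window))
        if best is None or s > best[0]:
            best = (s, offset)
    return best


def recognize_fold(query, library):
    """Thread query onto every fold; return (score, fold_name, offset), best first."""
    hits = []
    for name, fold_env in library.items():
        result = thread(query, fold_env)
        if result is not None:  # None: query longer than this fold
            hits.append((result[0], name, result[1]))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical two-fold library.
library = {"fold_A": "BBBEEE", "fold_B": "EEEBBB"}
print(recognize_fold("VILKDE", library))
# prints: [(6, 'fold_A', 0), (-6, 'fold_B', 0)]
```

Dropping the pairwise term is what keeps this sketch trivially tractable; it illustrates the query, library, scoring and selection steps rather than a realistic scoring function.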

Predicting Protein Structure: 3. Ab Initio Methods
[Flowchart: sequence → secondary structure prediction → tertiary structure (low-energy structures, mean-field potentials) → energy minimization → validation → predicted structure]
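Real ab initio methods search an enormous conformational space using mean-field (knowledge-based) potentials and energy minimization. The same generate-then-select idea can be shown on a standard toy problem, swapped in here purely for illustration: the 2D hydrophobic-polar (HP) lattice model, in which each residue is H or P, a conformation is a self-avoiding walk on a square lattice, and every H-H contact between residues that are not chain neighbors contributes -1 to the energy. The Python sketch below enumerates all conformations exhaustively, which is feasible only for very short chains; the function names are hypothetical.

```python
# Unit steps on the 2D square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(seq, coords):
    """-1 for every pair of H residues in lattice contact that are not chain neighbors."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Looking only in +x and +y counts each lattice contact exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_hp(seq):
    """Exhaustively enumerate self-avoiding walks; return (lowest energy, coordinates)."""
    n = len(seq)
    best_energy, best_coords = None, None

    def grow(coords, occupied):
        nonlocal best_energy, best_coords
        if len(coords) == n:
            energy = hp_energy(seq, coords)
            if best_energy is None or energy < best_energy:
                best_energy, best_coords = energy, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            step = (x + dx, y + dy)
            if step not in occupied:  # self-avoidance
                coords.append(step)
                occupied.add(step)
                grow(coords, occupied)
                coords.pop()
                occupied.remove(step)

    # Fixing the first bond along +x removes rotated copies of each conformation.
    first = [(0, 0), (1, 0)][:n]
    grow(first, set(first))
    return best_energy, best_coords


print(fold_hp("HPPH"))
```

Exhaustive enumeration grows exponentially with chain length, which is why practical methods replace it with heuristic search followed by energy minimization and validation.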

Small molecules and/or metal ions are an integral part of certain proteins (e.g., 1a6g.pdb). Predicting the structure of such proteins is an entirely different challenge.

Proof of the Pudding: CASP Meetings
CASP (Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction), CASP4: predictions, not post-dictions
Easy and medium targets: ~100% success
Hard targets: ~50% success
A significant increase from CASP3

OK, I can predict the structure correctly! Is that it?
Well, no!!
A strict structure-function correlation exists only for a subset of proteins. Some folds (ferredoxin, TIM barrel, …) are very popular: several protein families, with diverse functions, adopt these folds.
Despite high similarity in sequence and structure, proteins may act on different substrates (and hence have different functions) because of subtle changes in the active site (e.g., β1→3-GalT and β1→3-GlcNAcT).
Detailed biochemical characterization is required.

Inferring Function from Structure: Caveats
Similar structure, mutually exclusive functions: lysozyme and α-lactalbumin (8lyz.pdb, 1alc.pdb)
Same function, completely different structures: carbonic anhydrases from M. thermophila and mouse (1thj.pdb, 1dmx.pdb)
"Moonlighting" proteins: one structure(?), multiple functions
Glyceraldehyde 3-phosphate dehydrogenase: glycolysis; binding protein for plasmin, fibronectin and lysozyme; transcriptional control of gene expression, DNA replication and repair; flocculation
Gal1p: a kinase as well as a regulator of GAL gene expression; Gal3p is 70% similar but does not have kinase activity

Same fold, different oligomerization
[Figure panels: dimerization and tetramerization; examples: ConA, PNA, GSIV]

Ligand-Induced Conformational Changes are Quite Common
Binding of the first substrate redefines the active site and creates the binding pocket for the second substrate and the metal ion.
[Figure: flexible loop, before and after substrate binding]

Take-Home Message
Predicting protein structure is a key component of genome sequence analysis.
Structure is a very important link in deciphering function.
Are new tools required? Or is a larger training dataset required?

Acknowledgements
The organizers, for giving me this opportunity
Sujatha and Jayadeva Bhat, for helping me put together this talk
A Few Useful Links