EBI is an Outstation of the European Molecular Biology Laboratory. Validation & Structure Quality.

Ground rules for Bioinformatics
• Don't always believe what programs tell you: they're often misleading and sometimes wrong!
• Don't always believe what databases tell you: they're often misleading and sometimes wrong!
• Don't always believe what lecturers tell you: they're often misleading and sometimes wrong!
• In short, don't be a naive user.
• When computers are applied to biology, it is vital to understand the difference between mathematical and biological significance.
• Computers don't do biology; they do sums quickly!

Validation
Definition: (1) the act of validating; finding or testing the truth of something; (2) the cognitive process of establishing a valid proof.
Assessing the quality of a model is called validation. Validation needs to be done both by the producers of models (crystallographers, NMR spectroscopists, electron microscopists, etc.) and by their users (biologists, enzymologists, medicinal chemists, etc.).

Some Truths
• Never trust a structure at face value.
• Any structure is only as good as the experimental data that went into its determination.
• Just because a structure is published in Nature, Cell or Science does not mean it is free of flaws.

Errors in Structures
• Completely wrong: wrong trace (incorrect fold of the protein); register errors, where the traced chain is out of step with the sequence order.
• Partial errors: incorrectly built loops; wrong residues built into the structure (e.g., proline instead of aspartic acid).
• Bad data quality: bad geometry and stereochemistry; incorrect positioning of ligands etc. due to lack of experimental evidence.
• FRAUD!!

Some Quality Indicators
Some data quality indicators for structures are:
1. Ramachandran plot
2. Geometry and stereochemistry
3. R-factor/free R-factor (structures from X-ray crystallography)
4. Correlation between experimental data and structure
5. Resolution of the data upon which the structure is based (structures from X-ray crystallography)

Ramachandran Plot
A Ramachandran plot is a graph of the backbone dihedral angles φ (phi) and ψ (psi) of each amino acid residue in a protein. Because of steric hindrance, mainly between backbone atoms and the amino acid side chains, only certain combinations of angles are allowed in a folded protein. Plotting these angles for the individual residues therefore indicates how well the structure has been determined. Residues whose angles deviate from the allowed values are called outliers, and they usually indicate bad geometry.
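
The Python below is a minimal sketch of where each point on a Ramachandran plot comes from. It computes a dihedral angle from four atomic positions using the standard atan2 formula. The function names are illustrative, not those of any particular validation program. For φ, the four atoms would be C(i-1), N, CA and C of residue i. For ψ, they would be N, CA, C of residue i and N(i+1). The coordinates would come from a structure file; parsing is not shown.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product a x b of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

With toy coordinates (not real protein atoms), `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))` gives 0 (cis), and moving the last point to `(-1, 0, 1)` gives ±180 (trans). The sign follows the usual convention: viewed along p1 to p2, a clockwise rotation is positive.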

Ramachandran Plot
[Figures: a real-life example, in which all non-glycine residues are in allowed regions, and a standard plot showing where the different secondary structures fall on the plot.]

Validation
Ideally, there should be no outliers in the Ramachandran plot, except for glycine and proline, which are "special" amino acids. However, the scientist depositing the structure may have a rational explanation for some outliers (always refer to the publication!). Expect more than 85-90% of residues to fall in the red (most favoured) regions of the plot.
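
The 85-90% rule of thumb is simple arithmetic over per-residue region labels. The sketch below assumes those labels have already been assigned by a validation program. The labels `"favoured"`, `"allowed"` and `"outlier"` are hypothetical placeholders, not any tool's actual output format. Glycine and proline are skipped, following the slide's treatment of them as special cases.

```python
from collections import Counter


def ramachandran_summary(residues, special=("GLY", "PRO")):
    """Return (percent of counted residues in the favoured region, number of outliers).

    `residues` is an iterable of (residue_name, region) pairs, where `region`
    is one of the hypothetical labels "favoured", "allowed" or "outlier".
    Residues named in `special` (glycine and proline by default) are skipped.
    """
    counts = Counter(region for name, region in residues
                     if name.upper() not in special)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues left after excluding Gly/Pro")
    return 100.0 * counts["favoured"] / total, counts["outlier"]


# Made-up labels for 21 residues, purely for illustration.
example = [("ALA", "favoured")] * 18 + [("LEU", "allowed"), ("SER", "outlier"),
                                       ("GLY", "outlier")]
pct, n_out = ramachandran_summary(example)
print(f"{pct:.1f}% favoured, {n_out} outlier(s), above 85%: {pct > 85.0}")
```

Here 18 of the 20 counted residues are favoured (90.0%). The glycine outlier is ignored, so only the serine counts as an outlier.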

Geometry and Stereochemistry
This is supposed to be phenylalanine and should look like: [figure] BUT… [figure]

Geometry and Stereochemistry
This is supposed to be a sugar and should look like: [figure] BUT… [figure]

Geometry and Stereochemistry
This is supposed to be another sugar (sucrose) and should look like: [figure] BUT… [figure]

Geometry and Stereochemistry
• Always look at the structure in a graphical viewer.
• Look at the geometry section of PDB files (REMARK 500).
• Use tools like PDBeAnalysis and PDBsum to analyze structures.
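
The sketch below pulls the REMARK 500 section out of a legacy PDB-format file, relying only on the fixed-column layout of REMARK records. The function name is hypothetical. It does not apply to PDBx/mmCIF files, which carry no REMARK records.

```python
def remark_500_lines(pdb_text):
    """Return the text of the REMARK 500 (geometry and stereochemistry) records."""
    records = []
    for line in pdb_text.splitlines():
        # PDB format: columns 1-6 hold the record name, 8-10 the remark number,
        # and the remark text starts at column 12.
        if line[:6] == "REMARK" and line[7:10].strip() == "500":
            records.append(line[11:].rstrip())
    return records


# A made-up three-line excerpt, not taken from a real entry.
sample = "\n".join([
    "REMARK 500",
    "REMARK 500 GEOMETRY AND STEREOCHEMISTRY",
    "REMARK 465 MISSING RESIDUES",
])
for text in remark_500_lines(sample):
    print(text)
```

For a real entry you would pass in the text of a downloaded `.pdb` file instead of `sample`. The REMARK 465 line in the sample is skipped because only remark number 500 is kept.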

R-Factor/Correlation
• The R-factor is a measure of the agreement between the crystallographic model and the experimental X-ray diffraction data.
• The free R-factor is calculated in the same way, but between the structure and a subset of the data that was excluded from the structure calculation process.
• In a good structure, the difference between the free R-factor and the R-factor (ΔR) should be less than 5%.
• Correlation measures the overall agreement between the structure and the available data. A good structure should have an overall correlation in excess of 90%.
• Look at the R-factors on the Atlas pages in the tutorials!!! See the PDBe Atlas pages for experimental correlations in crystal structures.
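
The conventional crystallographic R-factor is R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|, summed over reflections. The free R-factor is the same quantity computed only over the excluded test reflections. The Python below is a toy sketch of these definitions and of the ΔR < 5% check. It assumes the calculated amplitudes are already on the same scale as the observed ones; real refinement programs handle scaling and use thousands of reflections. For correlation, it uses a plain Pearson coefficient as a stand-in. The correlation reported on the PDBe Atlas pages may be computed differently.

```python
import math


def r_factor(pairs):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over (Fobs, Fcalc) pairs."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in pairs)
    den = sum(abs(fo) for fo, _ in pairs)
    return num / den


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections into working and free (test) sets; return (R, R-free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    return r_factor(work), r_factor(test)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Four made-up reflections; the last one is flagged as part of the free set.
f_obs = [100.0, 200.0, 300.0, 400.0]
f_calc = [110.0, 190.0, 300.0, 380.0]
is_free = [False, False, False, True]

r, r_free = r_work_and_free(f_obs, f_calc, is_free)
delta_r = r_free - r
print(f"R = {r:.3f}, R-free = {r_free:.3f}, delta R = {delta_r:.3f}, "
      f"delta R < 0.05: {delta_r < 0.05}, CC = {pearson(f_obs, f_calc):.3f}")
```

Keeping the free reflections out of the working set is what makes R-free an independent check. A model over-fitted to the working data lowers R but not R-free, so ΔR grows.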

Resolution
Resolution is an indicator of the level of detail available in the data used to determine a structure by X-ray crystallography. Higher resolution (a lower number) means that more detail is available.
• Low resolution: > 3.0 Å
• Medium resolution: 1.8-3.0 Å
• High resolution: 1.0-1.8 Å
• Atomic resolution: < 1.0 Å
Not all parts of the structure are at the same resolution…
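
The bins above can be written as a small lookup. This is a sketch: the function name is hypothetical, and which bin the exact boundary values (1.0, 1.8 and 3.0 Å) fall into is an arbitrary choice made here, since the slide does not say.

```python
def resolution_class(d_min):
    """Classify a resolution d_min (in Å) into the bins listed on the slide.

    Smaller d_min means higher resolution. Boundary values are assigned to the
    better-resolution bin except 1.0 Å, which counts as high rather than atomic
    (an assumption).
    """
    if d_min <= 0:
        raise ValueError("resolution must be a positive number of Å")
    if d_min < 1.0:
        return "atomic"
    if d_min <= 1.8:
        return "high"
    if d_min <= 3.0:
        return "medium"
    return "low"


for d in (0.9, 1.5, 2.5, 3.5):
    print(d, resolution_class(d))
```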

So what do you look for…
• Higher-resolution structures, where more than one is available.
• Good geometry and stereochemistry (look at the Ramachandran plot).
• Lower R-factor and ΔR (free R-factor minus R-factor).
• A high correlation coefficient between the experimental data and the structure.
• Complete structures (pay attention to the sequence and how much of it is represented in the structure), with no sequence conflicts.
• Structures with ligands bound may be more useful for analysis than apo-form structures.
Note: these are general guidelines that may help you choose the best structure for your analysis when more than one structure of the same protein is available.
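
As a rough sketch, the guidelines above can be turned into a sort key for ranking several candidate structures of the same protein. Everything here is hypothetical: the `Candidate` fields, the entry names and values, and the priority order (resolution, then R-factor, then ΔR, then completeness, then ligand presence). Correlation and sequence conflicts are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one structure of the protein of interest."""
    name: str
    resolution: float         # Å; lower is better
    r_factor: float
    r_free: float
    fraction_modelled: float  # fraction of the sequence present in the model
    has_ligand: bool

    @property
    def delta_r(self) -> float:
        """ΔR = free R-factor minus R-factor."""
        return self.r_free - self.r_factor


def rank(candidates):
    """Sort candidates from most to least preferable.

    Compares resolution first, then R-factor, ΔR, completeness (higher first)
    and ligand presence (bound first); this priority order is an assumption.
    """
    return sorted(candidates, key=lambda c: (c.resolution, c.r_factor, c.delta_r,
                                             -c.fraction_modelled, not c.has_ligand))


# Made-up values for three hypothetical entries.
entries = [
    Candidate("entry_a", 2.4, 0.19, 0.23, 0.95, True),
    Candidate("entry_b", 1.9, 0.18, 0.22, 0.80, False),
    Candidate("entry_c", 1.9, 0.17, 0.21, 0.99, True),
]
print([c.name for c in rank(entries)])
```

Tuples compare element by element, so each later criterion only breaks ties in the earlier ones. A weighted score would be an alternative if, for example, a much better ΔR should outweigh a small difference in resolution.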

General Evaluation Criteria
Be sceptical and cynical! When you are searching for information, you need to judge its quality and suitability. Think critically about each piece of information you find and how you found it.
Relevance:
• Does the information you have found adequately support your research?
• Does it answer the question, or support one of your arguments?
• How general or specific is the information about the topic?

Some programs for structure validation:
• PROCHECK
• WHATCHECK
• JCSG Validation
• PDBeAnalysis: Validation

Wrong Structures!!
[Figures: PDB entry 1PHY and PDB entry 2PHY.]

Wrong Structures
[Figures: PDB entry 1PTE and PDB entry 3PTE.]

PDB entry 1PF4: “…were incorrect in both the hand of the structure and the topology. Thus, the biological interpretations based on the inverted models for MsbA are invalid.”

PDB entry 1F83: “However, because of the lack of clear and continuous electron density for the peptide in the complex structure, the paper is being retracted.”

UAB Researcher Involved in Fraud!
After a thorough examination of the available data, which included a re-analysis of each structure alleged to have been fabricated, the committee found a preponderance of evidence that structures 1BEF, 1CMW, 1DF9/2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0 were more likely than not falsified and/or fabricated and recommended that they be removed from the public record. The former employee was H.M. Krishna Murthy, who was found by the Investigation Committee to be solely responsible for the fraudulent data.
The coordinates for 2HR0 do not form a connected network of molecules in the crystal lattice.

What do you get as a Structural Biologist?
• The best chances of winning a Nobel Prize (1946, 1962, 1962, 1972, 1982, 1988, 1991, 1997, 2002, 2003, 2006, 2009), all in X-ray crystallography except…
• Islands in the Antarctic:
  Perutz Glacier 67° 37' S, 66° 25' W
  Bragg Islands 66° 28' S, 66° 27' W
  Shull Rocks 66° 27' S, 66° 40' W
  Pauling Islands
  Bernal Islands