Dictionaries and Ontologies in Structural Biology.

Presentation transcript:

Dictionaries and Ontologies in Structural Biology

Scope of Ontology: PDB Exchange Dictionary
Meta Data
- Experimental information
- Molecular description
- Structural description
Coordinates
- Macromolecule
- Ligands
- Solvent

History of Project
- 1990: mmCIF project begins
- 1992: NDB serves as testbed
- 1998: PDB adopts mmCIF as core data representation
- 2001: PDB Exchange Dictionary incorporates X-ray, NMR and cryoEM
- 2003: Direct translation of mmCIF data and dictionaries into XML (PDBML)

Challenges in Creating an Ontology
- Appropriate coverage and level of detail
- Acquiring and organizing expert input
- Getting consensus
- Evolution with the science
- Creating a rigorous syntax that can be translated (e.g., mmCIF -> XML)
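The last point, a syntax rigorous enough to be translated mechanically, can be illustrated with a minimal sketch. It assumes a category is held as a list of row dictionaries and maps it to XML with key items as attributes and the remaining items as child elements. The element naming only loosely resembles PDBML; the function name `category_to_xml`, the row values, and the omission of XML namespaces are all assumptions made for illustration, not the actual translator or schema.

```python
import xml.etree.ElementTree as ET


def category_to_xml(category, rows, key_items):
    """Translate one table-like category into an XML element tree.

    Hypothetical helper: key items become attributes, all other
    items become child elements of the row element.
    """
    container = ET.Element(f"{category}Category")
    for row in rows:
        record = ET.SubElement(container, category)
        for name, value in row.items():
            if name in key_items:
                record.set(name, value)
            else:
                ET.SubElement(record, name).text = value
    return container


# Illustrative data only; not taken from a real PDB entry.
rows = [{"id": "1", "type": "GATAN 673"}]
xml_text = ET.tostring(
    category_to_xml("em_detector", rows, key_items={"id"}),
    encoding="unicode",
)
print(xml_text)
```

Because the mapping depends only on the category name, the item names, and which items are keys, the same routine works for any category, which is the practical benefit of an explicit, regular syntax.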

mmCIF (PDB Exchange) is an Ontology
- Relationships among data items are explicit

Features of Dictionary
- Data Items
  - Definitions
  - Examples
  - Data types
  - Ranges or enumerations
- Simple organization
  - Tables and columns (categories)
  - Related data item sets (subcategories)
  - Chapters (category groups)
- Associations
  - Parent-child relationships
  - Interdependencies/exclusivity
- Methods

Dictionary Definition Example

save__em_detector.type
_item_description.description
;              The detector type used for recording images.
               Usually film or CCD camera.
;
_item.name                  '_em_detector.type'
_item.category_id           em_detector
_item.mandatory_code        no
_item_type.code             line
loop_
_item_enumeration.value
'KODAK SO163 FILM'
'GATAN 673'
'GATAN 676'
'TVIPS TEMCAM F224'
'TVIPS FASTSCAN F114'
PROSCAN
AMT
save_

Slide callouts: Controlled vocabulary, Data type, Schema, Semantics
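A minimal sketch of how a definition like the one above could drive validation. The `ItemDefinition` class and `validate` function are hypothetical names introduced here, not part of any PDB software; the sketch assumes that the `line` type means single-line text and that the unquoted tokens PROSCAN and AMT are separate enumeration values, as whitespace-delimited CIF syntax would read them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical in-memory form of one dictionary save frame."""
    name: str
    category_id: str
    mandatory: bool
    type_code: str
    enumeration: frozenset = frozenset()


EM_DETECTOR_TYPE = ItemDefinition(
    name="_em_detector.type",
    category_id="em_detector",
    mandatory=False,            # _item.mandatory_code no
    type_code="line",           # _item_type.code line
    enumeration=frozenset({
        "KODAK SO163 FILM", "GATAN 673", "GATAN 676",
        "TVIPS TEMCAM F224", "TVIPS FASTSCAN F114",
        "PROSCAN", "AMT",       # assumed to be two separate values
    }),
)


def validate(defn, value):
    """Return a list of problems for one value of a data item."""
    problems = []
    if value is None:
        if defn.mandatory:
            problems.append(f"{defn.name} is mandatory but missing")
        return problems
    # Assumption: 'line' values must not span multiple lines.
    if defn.type_code == "line" and "\n" in value:
        problems.append(f"{defn.name} must be a single line")
    if defn.enumeration and value not in defn.enumeration:
        problems.append(f"{defn.name}: '{value}' is not an allowed value")
    return problems


print(validate(EM_DETECTOR_TYPE, "GATAN 673"))
print(validate(EM_DETECTOR_TYPE, "UNKNOWN CCD"))
```

The first call reports no problems; the second reports that the value is outside the controlled vocabulary. A missing value is accepted here because the item is not mandatory.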

Dictionary Definition Example

save__struct_biol.id
_item_description.description
;              The value of _struct_biol.id must uniquely identify a record
               in the STRUCT_BIOL list.

               Note that this item need not be a number; it can be any
               unique identifier.
;
_item.name                  '_struct_biol.id'
_item.category_id           struct_biol
_item.mandatory_code        yes
_item_type.code             line
loop_
_item_linked.child_name
_item_linked.parent_name
'_struct_biol_gen.biol_id'       '_struct_biol.id'
'_struct_biol_keywords.biol_id'  '_struct_biol.id'
'_struct_biol_view.biol_id'      '_struct_biol.id'
'_struct_ref.biol_id'            '_struct_biol.id'
save_

Slide callouts: Parent-child (foreign key) relationships, Data type, Schema, Semantics
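The parent-child links declared above behave like foreign keys, so they can be checked mechanically. The following is a minimal sketch, assuming the data are held as a mapping from category name to a list of row dictionaries; the helper names `split_item` and `find_orphans` and the sample rows are hypothetical.

```python
# Child/parent pairs copied from the _item_linked loop in the definition.
LINKS = [
    ("_struct_biol_gen.biol_id", "_struct_biol.id"),
    ("_struct_biol_keywords.biol_id", "_struct_biol.id"),
    ("_struct_biol_view.biol_id", "_struct_biol.id"),
    ("_struct_ref.biol_id", "_struct_biol.id"),
]


def split_item(item_name):
    """Split '_category.attribute' into (category, attribute)."""
    category, _, attribute = item_name.lstrip("_").partition(".")
    return category, attribute


def find_orphans(data, links):
    """Return (child item, value) pairs that have no matching parent row."""
    orphans = []
    for child, parent in links:
        parent_cat, parent_attr = split_item(parent)
        child_cat, child_attr = split_item(child)
        parent_values = {row[parent_attr] for row in data.get(parent_cat, [])}
        for row in data.get(child_cat, []):
            value = row.get(child_attr)
            if value is not None and value not in parent_values:
                orphans.append((child, value))
    return orphans


# Illustrative data only: biol_id "2" has no parent in struct_biol.
data = {
    "struct_biol": [{"id": "1"}],
    "struct_biol_gen": [{"biol_id": "1"}, {"biol_id": "2"}],
    "struct_ref": [{"biol_id": "1"}],
}
print(find_orphans(data, LINKS))
```

Categories absent from the data are simply skipped, so the same link table can be applied to files that populate only some of the child categories.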

Molecular Description
- Macromolecular sequence
- Macromolecular source
- Detailed chemical descriptions of monomers
- Detailed chemical descriptions of ligands and solvent

Molecular Hierarchy (diagram labels): Molecular Description; Macromolecular Polymer Sequence; Biological Source; Non-polymer Chemical Details; Molecular Component Dictionary

Structural Description
- Coordinates of the experimental subunit
- Symmetry operations required to build functional assemblies
- Structural annotation
  - Secondary structure
  - Hydrogen bonding classification
  - Base pairs and base pair steps
  - Backbone torsions and base morphology
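Building a functional assembly from the experimental subunit amounts to applying each symmetry operation, a rotation matrix R and a translation vector t, to every coordinate: x' = R x + t. A minimal sketch in plain Python follows; the function names, the two-fold operation, and the coordinates are illustrative assumptions, not values from any entry.

```python
def apply_operation(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate."""
    transformed = []
    for point in coords:
        transformed.append(tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed


def build_assembly(coords, operations):
    """Return one transformed copy of the subunit per operation."""
    return [apply_operation(coords, r, t) for r, t in operations]


# Identity keeps the deposited subunit as it is.
IDENTITY = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
# Hypothetical two-fold rotation about z followed by a shift along x.
TWO_FOLD_Z = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (10.0, 0.0, 0.0))

subunit = [(1.0, 2.0, 3.0)]
assembly = build_assembly(subunit, [IDENTITY, TWO_FOLD_Z])
print(assembly)
```

Storing only the subunit plus the operations keeps the archive compact while still letting software regenerate the full assembly on demand.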

Structural Hierarchy (diagram labels): Molecular Description; Experimental Subunits; Functional Units; Atomic Coordinates; Secondary Structure; Hydrogen Bonding; Base Pairs; Base Pair Steps; Backbone Torsions; Base Morphology

Connection between Molecular and Structure Descriptions
- Macromolecular sequences are explicitly aligned to experimentally determined chemical sequences
- Monomers, ligands and solvent are matched with chemical descriptions in the PDB molecular components dictionary
Diagram labels: Molecular Description; Structural Description
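One way to picture the explicit alignment between the declared sequence and what was observed in the experiment is a position-by-position comparison. The sketch below assumes the sequence is a list of monomer identifiers and the observed residues are a mapping from 1-based sequence position to the monomer seen in the coordinates; the function name and data are hypothetical.

```python
def compare_sequence(sequence, observed):
    """Compare the declared sequence with residues seen in the coordinates.

    Returns (missing, mismatches): positions with no coordinates, and
    (position, declared, observed) triples where the monomers differ.
    """
    missing = []
    mismatches = []
    for position, monomer in enumerate(sequence, start=1):
        seen = observed.get(position)
        if seen is None:
            missing.append(position)
        elif seen != monomer:
            mismatches.append((position, monomer, seen))
    return missing, mismatches


# Illustrative data only: residue 1 is unobserved, residue 4 disagrees.
sequence = ["MET", "ALA", "GLY", "SER"]
observed = {2: "ALA", 3: "GLY", 4: "THR"}
missing, mismatches = compare_sequence(sequence, observed)
print(missing, mismatches)
```

Keeping the correspondence explicit means disordered or unmodeled residues show up as gaps rather than silently shifting the numbering.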

Relationships with other Resources
- Sequence database correspondences
  - Domain/family annotation
  - Functional annotation (GO/EC/OMIM)
- Structural database correspondences
  - SCOP/CATH/RNAML structural classifications
  - Functional annotation
- Citation and related literature

Supporting Software Tools
Dictionaries, Data Files and Databases
- Validating Parsers for Files and Dictionaries (CIFPARSE)
- Dictionary access and presentation tools (CIFOBJ)
- File format translation tools (MAXIT, CIFTr)
- PDB Validation Suite
- Data acquisition and editor tool (ADIT)
- Database Builder, Loader (mmCIFLOADER)
- XML translation tool
- Data extraction and merging tools (PDB_EXTRACT)

Availability
- WWW and CDROM Distribution
- Source and Binary Distributions
- Open Source License
- Supported on Linux, IRIX, ALPHA, SUNOS, and Mac OSX

Structure Related Data Dictionaries (diagram labels): DDL2; mmCIF; RNAML; Ligand data; NMR; Cryo-EM; Modeling; Crystallization; Symmetry; Image data; BioSync; Protein Production

Access
- RCSB Protein Data Bank Site
- RCSB/PDB Beta Data Site
- RCSB/PDB Dictionary Resource Site
- RCSB/PDB Deposition Site
- PDBML site
- RCSB/PDB Software Download Site