Martin Golebiewski, Scientific Databases and Visualization Group, EML Research, Heidelberg. 2nd BioModels.net Training Camp, 13-15th of January 2007, Manchester, UK


Annotating SABIO-RK: Integration of MIRIAM and SBO

Why did we develop SABIO-RK?
- Biochemical model simulations need experimental reaction kinetics data.
- Kinetic parameter values depend strongly on environmental conditions (temperature, pH, concentrations of reactants and modifiers, etc.).
- Enzyme characteristics vary between organisms, tissues and cellular locations.
- Kinetic parameters are only interpretable together with their corresponding kinetic laws.
- Most databases do not link experimental kinetic data for single reactions to the complete set of information listed above.
- Data must be easily accessible and interchangeable (data export for exchange).
Aim: a database that collects and standardizes kinetic data, relates the data to its biochemical, environmental and experimental context, cross-links corresponding data, and associates it with external resources, so that the data become comparable and accessible in standard formats.

Database population and access
SABIO-RK:
- Merges information about biochemical reactions and pathways, mainly collected from other databases (e.g. KEGG), with corresponding kinetic data manually extracted from literature (including the environmental context)
- Is curated manually, assisted by semi-automatic tools (e.g. lists of values)
- Unifies, systematically structures and interrelates the data
- Can be accessed through a web-based user interface and through web services
- Supports export of the data in SBML for exchange
- Links entities and expressions to complementary databases and ontologies

Database population: data extraction
Data source: kinetic data contained in publications
- Text with non-local, highly scattered information
- Tables, formulas, graphs, pictures
- Some information is given only as a reference
Problems:
- No 1:1 relation between the paper and the input mask
- No controlled vocabulary (e.g. different names for one compound or enzyme), which leads to fuzzy descriptions
[Figure: full-text publication next to the SABIO-RK input interface]

Problems in the database population
Missing or only partial information in the data source:
- Incomplete reactions (products not mentioned)
- Assay conditions missing or given only by reference to another paper
- Kinetic law equation (or fitting equation) not described
Multiplicity of kinetic law types:
- No real standard is used in publications (or even available, except SBO), so notations vary and refer to several kinetic theories
Parameter units:
- Multiple definitions (e.g. katal or Unit for enzyme activities)
- Different compositions (e.g. µmol/s or µmol/(s*mg) for Vmax)
- Wrong parameter unit (e.g. 1/s for Vmax)
Identification of compounds, reactions and enzymes:
- Ambiguous descriptions of chemical compounds or enzymes (e.g. missing stereochemical information for stereoisomers, simplifying trivial names, ...)

Data integration problems
e.g. parameter units:
- 1 U = the amount of enzyme which catalyses the transformation of 1 µmol of the substrate per minute under standard conditions
[Figure: parameter units as reported in publications, annotated with "= nmol/(min*mg)" and "= U/mg"]
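
The unit definition above can be turned into a small conversion helper. The following is a minimal Python sketch and not part of SABIO-RK; the function names are hypothetical. It assumes only the slide's definition of 1 U = 1 µmol/min and the SI definition of the katal (1 kat = 1 mol/s).

```python
from fractions import Fraction

# 1 U = 1 µmol/min (definition quoted on the slide).
# 1 kat = 1 mol/s (SI definition), so 1 U = 1e-6 mol / 60 s.
U_IN_KATAL = Fraction(1, 10**6) / 60


def units_to_nanokatal(activity_in_u):
    """Convert an enzyme activity from U to nkat (hypothetical helper)."""
    return Fraction(activity_in_u) * U_IN_KATAL * 10**9


def u_per_mg_to_nmol_per_min_mg(specific_activity):
    """Convert a specific activity from U/mg to nmol/(min*mg).

    1 U = 1 µmol/min = 1000 nmol/min, so only the prefix changes.
    """
    return specific_activity * 1000


if __name__ == "__main__":
    print(u_per_mg_to_nmol_per_min_mg(2))  # 2000
    print(units_to_nanokatal(60))          # 1000
```

Exact fractions are used for the katal conversion so that the factor 1/60 does not introduce rounding noise before values are stored or compared.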

Annotations and controlled vocabularies
[Schema diagram; "->" marks an attribute annotated to an external resource or controlled vocabulary]
- Infosource: PubMed ID, title, authors, journal
- Kinetic Law: type -> SBO, equation
- Environment: buffer, pH, temperature
- Reactant, Modifier (Species): compound name (given in publication), role (e.g. substrate, inhibitor) -> SBO, cellular location -> Gene Ontology, comments (modifications etc.)
- Kinetic Parameter: name, type (e.g. Km, kcat) -> SBO, value (range), standard deviation, comment
- Reaction: stoichiometry, EC classification, enzyme variant
- General Information: organism -> NCBI-ID, tissue, pathway, comments
- Unit: defined as SBML Unit
- Compound: recommended name, synonymic names, IDs in external databases (e.g. KEGG, ChEBI), additional information
- Protein complex: UniProt IDs
- Other labels in the diagram: SBO-ID; relationship labels "for a", "determined under", "parameter units", "corresponding species", "participate in", "belongs to", "refers to", "reported for", "from a", "catalyzes"
- Legend: annotations to external resources; controlled vocabulary

Annotations of entities in SABIO-RK
Annotations shown to the user:
- Chemical compounds to KEGG Compound and ChEBI
- Enzymatic activities to ExPASy, KEGG, IntEnz, IUBMB and Reactome (query links in the user interface based on the EC enzyme classification)
- Enzyme protein complexes to UniProt/Swiss-Prot
- Cellular locations (compartments etc.) to Gene Ontology (as query link)
- Publications (data sources) to PubMed
Annotations integrated in SABIO-RK but not yet implemented for the output:
- Organisms to NCBI Taxonomy
- Kinetic law types and parameter types to SBO (Systems Biology Ontology)
- Species role (substrate, product, modifier, etc.) to SBO
- Reactions to KEGG Reaction
More annotations following the MIRIAM standard are planned.

Controlled vocabularies in SABIO-RK
- Unambiguously identify entities or terms
- Facilitate the search, interpretation and comparison of the data
- Permit matching with other database resources based on shared vocabulary
- Facilitate the integration of different database entries into kinetic models
Lists of values (LOV) in the input interface:
- Species (compounds) and species roles (e.g. substrate, product, modifier, ...)
- Biochemical reactions and pathways
- Organisms (NCBI Taxonomy), tissues and cellular locations
- Kinetic law types (e.g. "Competitive inhibition" or "Sequential ordered Bi Bi")
- Parameter types (e.g. Km, kcat, Vmax, Ki, Kd, rate constant, pH, pK, ...)
- Parameter units (e.g. mM, µM, 1/s, nmol/min, U/(h*mg), ...)
- Corresponding species for kinetic parameters (as for Km, Ki or concentrations)

Other notation standards in SABIO-RK
Semi-controlled notation standards:
- Kinetic law equation (checked for mathematical correctness when entered)
- Enzyme variants (e.g. wildtype, mutant E540K, wildtype isoenzyme PFKL, ...)
- Protein complex of the enzyme: e.g. (Q6UG02)*4 for a homotetramer
- Recombinant enzymes: e.g. "expressed in Escherichia coli BL21(DE3)"
- Buffer composition in the experimental setup
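
The protein complex notation above is regular enough to be read by a program. The following is a minimal Python sketch, not SABIO-RK code: the function name is hypothetical, the accession pattern is deliberately loose (any run of upper-case letters and digits), and accepting several consecutive "(accession)*count" groups for hetero-oligomers is an assumption that the slide does not confirm.

```python
import re

# One subunit group: "(ACCESSION)*COUNT", e.g. "(Q6UG02)*4".
# The accession pattern is intentionally loose; it is not the official
# UniProt accession grammar.
SUBUNIT = re.compile(r"\(([A-Z0-9]+)\)\*(\d+)")


def parse_complex(notation):
    """Return a list of (accession, copy number) pairs (hypothetical helper)."""
    subunits = [(acc, int(count)) for acc, count in SUBUNIT.findall(notation)]
    if not subunits:
        raise ValueError(f"not a recognised complex notation: {notation!r}")
    return subunits


if __name__ == "__main__":
    print(parse_complex("(Q6UG02)*4"))  # [('Q6UG02', 4)]
```

Returning plain pairs keeps the accession available for linking to UniProt while the copy number describes the stoichiometry of the complex.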

Controlled vocabularies in SABIO-RK
[Figure: list of values (LOV) in the SABIO-RK input interface]

Identifying chemical compounds
Every chemical compound can have multiple synonymous descriptions, e.g.:
- Trivial name and systematic chemical description:
  Valproic acid = 2-Propylpentanoic acid
- Different parts of the molecule can be considered as the lead structure:
  Acetyl phenol = Phenylacetate
- Aberrant order of the substituents of a lead structure (prefixes):
  2-Amino-6-methyl-4-pyrimidol = 6-Methyl-2-amino-4-pyrimidol
- Description of substituents as prefix (like amino-) or suffix (like -amine):
  1-(4-Iodo-2,5-dimethoxyphenyl)-2-aminopropane = 1-(4-iodo-2,5-dimethoxy-phenyl)propan-2-amine
  3,17-Dioxoandrost-4-ene = 4-Androstene-3,17-dione
- Different nomenclature systems (e.g. aberrant order of the morphemes):
  2-Amino-6-methyl-4-pyrimidol = 2-Amino-6-methylpyrimidin-4-ol
  2-Methylpropan-2-ol = 2-Hydroxy-2-methyl-propane

Normalization of compound names
Goals:
- Compare and link databases by the names of chemical compounds, i.e. detect synonyms while disregarding orthographic and (minor) morpho-syntactic variance in naming
- Match chemical compound names against existing synonym lists (e.g. ChEBI, PubChem) to identify synonyms whose differences do not arise from orthographic variation, such as trivial names versus systematic names

Normalization of compound names
CompoundID:
IUPAC Name: 2-phenylpropanoic acid
Canonical SMILES: CC(C1=CC=CC=C1)C(=O)O
Synonyms:
- Hydratropic acid
- 2-Phenylpropionic acid
- 2-Phenylpropanoic acid
- alpha-Phenylpropioic acid
- alpha-Methylphenylacetic acid
- .alpha.-Phenylpropionic acid
- alpha-Methylbenzeneacetic acid
- Benzeneacetic acid, .alpha.-methyl-
- .alpha.-Methylphenylacetic acid
- .alpha.-Methylbenzeneacetic acid
- ALPHA-PHENYLPROPIONIC ACID
- Benzeneacetic acid, alpha-methyl-
- (S)-alpha-Methylbenzeneacetic acid
- Benzeneacetic acid, .alpha.-methyl-, (S)-
- Benzeneacetic acid, .alpha.-methyl-, (R)-
- Benzeneacetic acid, alpha-methyl-, (R)-
- Benzeneacetic acid, alpha-methyl-, (S)-
ID: 20986, NAME: alpha-Phenylpropionate
Normalized Name: alpha-phenylpropionate
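
Several of the synonyms above differ only in case, hyphenation, spacing or the ".alpha." spelling of Greek letters. The following minimal Python sketch shows how such purely orthographic variants could be collapsed onto one key. It is an illustration under stated assumptions, not the normalization procedure used for SABIO-RK: the function names and the specific rules are hypothetical, and it does not attempt the trivial-versus-systematic matching or the acid/anion mapping seen in the normalized name.

```python
import re
from collections import defaultdict


def orthographic_key(name):
    """Build a comparison key that ignores case, spacing and hyphenation."""
    key = name.lower()
    # ".alpha." style Greek-letter markup -> plain "alpha" (assumed rule).
    key = re.sub(r"\.(alpha|beta|gamma|delta)\.", r"\1", key)
    # Drop whitespace and hyphens, which vary freely between sources.
    key = re.sub(r"[\s\-]+", "", key)
    return key


def group_by_key(names):
    """Group names that share the same orthographic key."""
    groups = defaultdict(list)
    for name in names:
        groups[orthographic_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    synonyms = [
        "alpha-Phenylpropionic acid",
        ".alpha.-Phenylpropionic acid",
        "ALPHA-PHENYLPROPIONIC ACID",
        "Hydratropic acid",
    ]
    for key, members in group_by_key(synonyms).items():
        print(key, members)
```

In this sketch the first three names fall into one group, while "Hydratropic acid" stays separate; linking a trivial name like that to the systematic name still requires a synonym list such as those from ChEBI or PubChem.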

Linguistically assisted compound analysis
[Figure: systematic compound name, structure, classification]

Access to SABIO-RK
Available interfaces:
- Web-based user interface for browsing and searching the data manually
- Web services (API access) that can be called automatically by external tools, e.g. by other databases or by simulation programs for biochemical network models
Both interfaces support the export of the data in SBML.

SABIO-RK user interface: Query

SABIO-RK user interface: Query result

SABIO-RK user interface: Reaction

SABIO-RK user interface: Enzyme

SABIO-RK user interface: database entry with kinetic data

SBML export from SABIO-RK

- Reactions are coupled in exported SBML files: every species is defined only once in the exported file, even if several reactions refer to it
- Export of layout information in SBML, using the SBML layout extension, to draw reaction maps
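
The coupling rule (one species definition shared by all reactions that refer to it) can be sketched as follows. This is a minimal Python illustration, not the SABIO-RK exporter: the function name, input format and species identifiers are hypothetical, and the result is only a fragment using SBML element names (listOfSpecies, listOfReactions, speciesReference), not a complete, valid SBML document.

```python
import xml.etree.ElementTree as ET


def build_model(reactions):
    """Build a model fragment in which each species is declared once.

    reactions: list of (reaction_id, [reactant ids], [product ids]).
    """
    model = ET.Element("model")
    species_list = ET.SubElement(model, "listOfSpecies")
    reaction_list = ET.SubElement(model, "listOfReactions")
    declared = set()  # species ids that already have a <species> element

    for reaction_id, reactants, products in reactions:
        reaction = ET.SubElement(reaction_list, "reaction", id=reaction_id)
        for tag, members in (("listOfReactants", reactants),
                             ("listOfProducts", products)):
            refs = ET.SubElement(reaction, tag)
            for species_id in members:
                if species_id not in declared:
                    declared.add(species_id)
                    ET.SubElement(species_list, "species", id=species_id)
                # Every use only references the single shared definition.
                ET.SubElement(refs, "speciesReference", species=species_id)
    return model


if __name__ == "__main__":
    model = build_model([
        ("R1", ["glucose", "ATP"], ["G6P", "ADP"]),
        ("R2", ["G6P"], ["F6P"]),
    ])
    print(ET.tostring(model, encoding="unicode"))
```

Here G6P is the product of R1 and the reactant of R2, so it is referenced twice but declared once, which is what links the two reactions into one network.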

Web service methods
SABIO-RK API access:
- Integration in simulation tools
- Cross-linking with other databases
- Several possible entry points
- Supports data export in SBML

Data in SABIO-RK: statistics (as of 09/01/2007)
- PubMed records: 923
- Organisms: 312
- Pathways: 90
- Reactions: 9600
- Enzymes: 416
Measured parameters:
- Enzyme activities (rate constant, kcat or Vmax): 8118
- Km (Michaelis constant): 8701
- Ki (inhibition constant): 1774

Data in SABIO-RK: statistics

Conclusions
- SABIO-RK is a web-accessible database containing biochemical reaction kinetics data for systems biologists and experimenters
- Merges general reaction information retrieved from external databases with kinetic data manually extracted from literature
- Manual curation of the data with some semi-automatic support
- High degree of interrelation within the database
- Type of kinetics, modes of inhibition or activation and corresponding equations are shown with their parameters, measured values and experimental conditions
- Access through a web-based user interface or through web services (API)
- Export of the data in SBML from both interfaces
- Controlled vocabulary used and content annotated to ontologies and external resources

Future goals
- Information about detailed reaction mechanisms (elementary reaction steps)
- Expansion of the data export functions (more data, more annotations)
- Tools for information extraction and data integration
- Expanded usage of annotations and controlled vocabularies
- Extension of the database model to store signaling reactions
- Convince scientists to directly insert their kinetic data into SABIO-RK

SABIO-RK project team
... and many more: students, colleagues at EML Research and other collaborators.
Financial support:

Workshop invitation
Workshop "Storage and Annotation of Reaction Kinetics Data"
May 21-23, 2007, Heidelberg, Germany
Topics:
- Data generation
- Data storage and integration
- Data annotation
- Data usage