RDF based on Integration of Pathway Database and Gene Ontology SNU OOPSLA LAB. 2005 DongHyuk Im.

Contents  Introduction  Pathway Database  Enzyme Database  Gene Ontology  Related Works  Our Approach  Supporting Function  Data Transformation  Integration of KEGG, Enzyme, Gene Ontology  Querying using SeRQL

Pathway?  Most chemical reaction mechanisms are translated from a compound(substrate) to a compound(product) by enzyme acting  Importance  to comparison and analyze pathways in order to understand the process of creating compounds and the evolutive relevance between organisms  Drug Discovery

Pathway
(Figures: Map: Glycolysis / Gluconeogenesis; Map: Aquifex aeolicus)

Enzyme Database  EC number  Recommended name  Alternative names(if any)  Catalytic activity  Cofactors (if any)  Pointers to the SWISS-PORT entrie(s) that correspond to the enzyme (if any)  Pointers to disease(s) associated with a deficiency of the enzyme (if any)

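Enzyme Hierarchy
(Figure: a tree rooted at [*] with top-level classes [1], [2], [3]; [2] has children [2.1], [2.2], [2.3]; [2.2] has children [2.2.1], [2.2.2], [2.2.3]; a fourth level of entries [ ] lies below these.)
- Four levels
  - EC number, e.g. … is a member of the top-level group [1]
  - The leftmost number identifies the highest level
- Siblings ([ ] and [ ]): similar reactions in a pathway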
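The four-level EC hierarchy can be read as a Dewey-style tree: an entry's parent is obtained by dropping its last component, and two entries are siblings when they share the same parent. A small sketch of that reading; the function names are illustrative and not part of any enzyme database API.

```python
from typing import Optional


def ec_parts(ec: str) -> list[str]:
    """Split an EC number such as "2.2.1" into its components."""
    parts = ec.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"EC number must have 1-4 levels: {ec!r}")
    return parts


def parent(ec: str) -> Optional[str]:
    """Class one level up; None for a top-level class such as "2"."""
    parts = ec_parts(ec)
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def ancestors(ec: str) -> list[str]:
    """All enclosing classes from the top level down, e.g. "2.2.1" -> ["2", "2.2"]."""
    parts = ec_parts(ec)
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def are_siblings(a: str, b: str) -> bool:
    """Distinct entries with the same parent (read on the slide as similar reactions)."""
    return a != b and parent(a) == parent(b)
```

For example, `are_siblings("2.2.1", "2.2.3")` holds because both sit directly under `2.2`, while `2.1` and `2.2.1` are not siblings.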

Gene Ontology

KEGG

- To computerize all aspects of cellular functions in terms of the pathway of interacting molecules or genes
- To maintain gene catalogs for all organisms and link each gene product to a pathway component
- To organize a database of all chemical compounds in the cell and link each compound to a pathway component
- To develop computational technologies for pathway comparison, reconstruction, and analysis

Why RDF Integration?  Pathway data model : DAG  RDF is a good model for representing pathway  RDF data model : DAG  Need integration of multiple knowledge sources available from internet : one of the major problems in biologists  RDF is a good model for same standard  Enzyme, GO : hierarchy structure  RDF is a good model for representing hierarchy structure  GO annotation is important  Enzymes(proteins) in certain pathway need GO annotation

Related Works  KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res.  YeastHub: a semantic web case for integrating data in the life science domain, 2005, Bioinformatics  LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res.  Gene Ontology: tool for the unification biology, the Gene Ontology Consortium, 2000, Nature Genetics.

Our System’s Supporting  KEGG  Search compound  Path prediction  Search Enzyme  Our system’s function to add  Integration Query (pathway+enzyme+GO)  Relaxation Query using GO hierarchy  Searching pathway using enzyme information

Search Compounds
(Screenshot: compound search; Compound: C00668; target)

Pathway Prediction Tool
(Screenshot labels: compound; relaxation query using enzyme hierarchy)
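Path prediction can be read as a shortest-path search from a source compound to a target compound over the reaction graph. The sketch below uses breadth-first search and, as one possible form of the relaxation mentioned on the slide, accepts an optional EC class so that only reactions whose enzyme falls under that class are followed. This is an illustration under those assumptions, not the algorithm of KEGG's tool; the graph format (compound -> list of (product, EC number)) and the toy data in the usage are hypothetical.

```python
from collections import deque
from typing import Optional


def predict_path(graph, source, target, ec_prefix: Optional[str] = None):
    """Breadth-first search for a shortest chain of reactions from source to target.

    graph maps a compound to a list of (product, ec_number) pairs. If ec_prefix
    is given (e.g. "1"), only reactions whose EC number lies under that class
    are followed. Returns a list of (compound, ec, next_compound) steps, or
    None if the target is unreachable.
    """
    def allowed(ec: str) -> bool:
        if ec_prefix is None:
            return True
        want = ec_prefix.split(".")
        return ec.split(".")[: len(want)] == want   # component-wise Dewey prefix

    previous = {source: None}          # compound -> (predecessor, ec) used to reach it
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        if compound == target:
            steps = []
            while previous[compound] is not None:
                pred, ec = previous[compound]
                steps.append((pred, ec, compound))
                compound = pred
            return steps[::-1]
        for product, ec in graph.get(compound, []):
            if product not in previous and allowed(ec):
                previous[product] = (compound, ec)
                queue.append(product)
    return None
```

With a toy graph `{"A": [("B", "1.1.1.1"), ("C", "2.7.1.1")], "B": [("D", "1.2.1.1")], "C": [("D", "2.7.1.2")]}`, the unrestricted search goes A, B, D, while `ec_prefix="2"` forces the route through C.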

Search Enzyme
(Screenshot: enzyme search; Enzyme:)

From Pathway to Gene Ontology
(Screenshot: select enzyme)

Data Translation for Integration
(Diagram: KGML data is converted with XSLT into KEGG RDF data; KEGG RDF data, Enzyme RDF data, and GO RDF data are stored in GENOS storage; a step labeled "Adding GO ID"; XSLT:)

KEGG RDF Data (1/2)
  <Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
  <Rectangle k:name=" " k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="170" k:y="1039" k:width="45" k:height="17"/>
  <Circle k:name="C00033" k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="102" k:y="971" k:width="8" k:height="8"/>
(Slide callouts: gene entry, enzyme entry, compound entry, no information)
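The slides perform the KGML-to-RDF step with XSLT. As a rough stand-in, the Python sketch below walks KGML `<entry>` elements and emits one element per `<graphics>` record in the shape shown above (`Rectangle`/`Circle` with `k:`-prefixed attributes). It assumes KGML's usual layout, where each `<entry>` carries a `<graphics>` child with `name`, `fgcolor`, `bgcolor`, `type`, `x`, `y`, `width`, and `height`; the `k:` namespace URI is a placeholder, and the project's actual stylesheet may differ.

```python
import xml.etree.ElementTree as ET

K_NS = "http://example.org/kegg#"   # placeholder URI for the k: prefix
ET.register_namespace("k", K_NS)

# KGML graphics types mapped to the element names used in the RDF output.
SHAPES = {"rectangle": "Rectangle", "circle": "Circle"}
COPIED = ["name", "fgcolor", "bgcolor", "x", "y", "width", "height"]


def kgml_to_shapes(kgml_text: str) -> list[ET.Element]:
    """Turn each <entry>/<graphics> in a KGML document into a shape element."""
    root = ET.fromstring(kgml_text)
    shapes = []
    for entry in root.iter("entry"):
        g = entry.find("graphics")
        if g is None or g.get("type") not in SHAPES:
            continue                    # skip entries without a supported shape
        shape = ET.Element(SHAPES[g.get("type")])
        for attr in COPIED:
            value = g.get(attr)
            if value is not None:
                # Namespaced attribute; serializes as k:<attr>="...".
                shape.set(f"{{{K_NS}}}{attr}", value)
        shapes.append(shape)
    return shapes
```

`ET.tostring(shape, encoding="unicode")` then yields markup such as `<Rectangle xmlns:k="..." k:name="aldH1" .../>`; the entry `name`/`type` attributes could be copied the same way if the gene/enzyme/compound distinction is needed.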

KEGG RDF Data (2/2)
(Figure callouts: Relation, Reaction)

How to Process KEGG Pathway  Problem  GENOS(Sesame) does not support multiple graph  KEGG data consists of multiple documents  Ex) map00010.rdf, aae00010.rdf …  Solution  Using namespace, we can distinguish maps  When Storing pathway data, pathway’s map name is added as a namespace in resource table of GENOS

Processing Pathway Data
  ….
  <Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>   (conflict)

Resources table of GENOS:
  ID   NameSpace   Localname
  1    …           …
  2    …           Glycolysis/…
  3    aae#        00010_1
  4    …           aq_186
  5    …
  6    aae#        00020_1
  7
  8    map#        00010_1
  9    ….

Triples table of GENOS:
  Subject   Predicate   Object
  …         …           …
  3         …           …
  6         …           …
  8         …           …
  …         …           …
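The namespace workaround can be mimicked with two in-memory tables shaped like the GENOS resources and triples tables on the slide. Because the map name is part of the namespace, the same local name (for example `00010_1`) coming from `aae00010.rdf` and `map00010.rdf` receives two distinct resource IDs instead of colliding. The class, its methods, the shared `k#` predicate namespace, and the `linksTo` predicate in the usage are hypothetical; this is not the GENOS or Sesame API.

```python
PRED_NS = "k#"   # hypothetical shared namespace for predicates


class ResourceStore:
    """Toy resources table (ID, namespace, localname) plus a triples table of IDs."""

    def __init__(self) -> None:
        self.resources: dict[tuple[str, str], int] = {}   # (namespace, localname) -> ID
        self.triples: list[tuple[int, int, int]] = []     # (subject, predicate, object)

    def resource_id(self, namespace: str, localname: str) -> int:
        """Return the ID of a resource, adding a new row if it is unseen."""
        key = (namespace, localname)
        if key not in self.resources:
            self.resources[key] = len(self.resources) + 1
        return self.resources[key]

    def add(self, map_name: str, subject: str, predicate: str, obj: str) -> None:
        """Store a triple from one map; subject and object live in that map's namespace."""
        ns = map_name + "#"
        self.triples.append((
            self.resource_id(ns, subject),
            self.resource_id(PRED_NS, predicate),
            self.resource_id(ns, obj),
        ))

    def triples_of_map(self, map_name: str) -> list[tuple[int, int, int]]:
        """Recover one map's triples by selecting subjects in its namespace."""
        ns = map_name + "#"
        ids = {rid for (n, _), rid in self.resources.items() if n == ns}
        return [t for t in self.triples if t[0] in ids]
```

Selecting by namespace gives back a per-map view, which is what named graphs would otherwise provide.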

Integrating Databases
(Diagram: enzymes linked to Gene Ontology through the enzyme number and the GO ID)
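One way to realize this link is a join on the EC number: read an EC-to-GO mapping and attach the matching GO IDs to every enzyme that appears in a pathway. The sketch assumes mapping lines in the style of the Gene Ontology's `ec2go` file (`EC:<number> > GO:<term name> ; GO:<id>`, with `!` comment lines); the sample lines, pathway name, and IDs in the usage are placeholders, not real mappings.

```python
import re
from collections import defaultdict

# Assumed line format: "EC:<number> > GO:<term name> ; GO:<7-digit id>"
LINE = re.compile(r"^EC:(\S+)\s*>\s*GO:(.*?)\s*;\s*(GO:\d{7})\s*$")


def parse_ec2go(lines):
    """Build EC number -> list of GO IDs, skipping comments and unparsable lines."""
    mapping = defaultdict(list)
    for line in lines:
        if line.startswith("!"):
            continue                      # header/comment lines
        m = LINE.match(line.strip())
        if m:
            mapping[m.group(1)].append(m.group(3))
    return dict(mapping)


def annotate_pathway(pathway_enzymes, ec2go):
    """Attach GO IDs to each (pathway, EC number) pair; unmapped ECs get []."""
    return {(pw, ec): ec2go.get(ec, []) for pw, ec in pathway_enzymes}
```

An EC number may map to several GO terms, so the join keeps a list per enzyme rather than a single ID.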

Relaxation Querying using SeRQL
(Figure: compounds C1 and C2 linked by enzyme E1; E1 is relaxed to E1.*)
  SELECT C1, C2
  FROM Path_EXP
  WHERE E1 LIKE "1.*"
- Dewey order, e.g. 1.1 and 1.2 are children of 1
- Use prefix matching; SeRQL subclassof
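The relaxation amounts to a Dewey-order prefix test: `E1 LIKE "1.*"` keeps every enzyme whose EC number lies under class 1. A Python sketch of the same filter over (compound, enzyme, compound) steps is below; the input format is an assumption. Comparing whole components avoids a string pitfall: a naive `startswith("1.1")` would also accept `1.10.3.1`, which is not a child of `1.1`.

```python
def under_class(ec: str, ec_class: str) -> bool:
    """True if ec equals ec_class or is one of its descendants in Dewey order."""
    want = ec_class.split(".")
    return ec.split(".")[: len(want)] == want


def relaxed_query(steps, ec_class: str):
    """SELECT C1, C2 over steps (C1, E1, C2) WHERE E1 lies under ec_class."""
    return [(c1, c2) for c1, e1, c2 in steps if under_class(e1, ec_class)]
```

Relaxing further is just a matter of dropping components from `ec_class` (for example from `1.1` to `1`), which widens the result set one hierarchy level at a time.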

Considering Performance
  aae:aq_018   path:aae03010
  aae:aq_020   path:aae03010
  aae:aq_021   path:aae00400
  ….
  eco:b1236    path:eco00052
  eco:b1236    path:eco00500
  eco:b1236    path:eco00520
  ….
(Columns: Genes, Map. Source: KEGG pathway list, using genes_index)
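The listing pairs each gene with the pathway maps it appears in. Loading it once into two dictionaries (gene to maps, map to genes) turns "which pathways contain this gene?" into a dictionary lookup instead of a scan over all pathway documents. The sketch assumes whitespace-separated `gene map` pairs as displayed on the slide; the actual layout of KEGG's `genes_index` file is not shown, so the parser is an assumption.

```python
from collections import defaultdict


def build_index(lines):
    """Index gene/pathway pairs given as "gene map" lines, in both directions."""
    gene_to_maps = defaultdict(set)
    map_to_genes = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                      # skip blank, elided, or malformed lines
        gene, pathway = parts
        gene_to_maps[gene].add(pathway)
        map_to_genes[pathway].add(gene)
    return gene_to_maps, map_to_genes
```

Using the rows above, `gene_to_maps["eco:b1236"]` gives the three `eco` maps, and `map_to_genes["path:aae03010"]` gives `aae:aq_018` and `aae:aq_020`.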

Schedule  Implementation (~11/30)  Integrated Databases  Query Processor for pathway  Simple UI (Web :JSP)  Complete Paper (~12/10)