RDF based on Integration of Pathway Database and Gene Ontology SNU OOPSLA LAB. 2005 DongHyuk Im.


1 RDF based on Integration of Pathway Database and Gene Ontology SNU OOPSLA LAB. 2005 DongHyuk Im

2 Contents
- Introduction
- Pathway Database
- Enzyme Database
- Gene Ontology
- Related Works
- Our Approach
- Supporting Function
- Data Transformation
- Integration of KEGG, Enzyme, Gene Ontology
- Querying using SeRQL

3 Pathway?
- Most chemical reactions convert one compound (the substrate) into another compound (the product) through the action of an enzyme
- Importance
  - Comparing and analyzing pathways helps us understand how compounds are created and how organisms are related through evolution
  - Drug discovery

4 Pathway
Map: Glycolysis / Gluconeogenesis
Map: Aquifex aeolicus

5 Enzyme Database
- EC number
- Recommended name
- Alternative names (if any)
- Catalytic activity
- Cofactors (if any)
- Pointers to the SWISS-PROT entry or entries that correspond to the enzyme (if any)
- Pointers to disease(s) associated with a deficiency of the enzyme (if any)

6 Enzyme Hierarchy
[*]
[1] [2] [3]
[2.1] [2.2] [2.3]
[2.2.1] [2.2.2] [2.2.3]
[2.2.2.1] [2.2.2.2] [2.2.2.3]
- Four levels
- EC number
  - Ex) 1.1.1.1 is a member of the top-level group [1]
  - The leftmost number identifies the highest level
  - [2.4.2.3] and [2.4.2.4] (siblings): similar reactions in the pathway
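The four-level hierarchy can be read directly off the dotted EC number. The following is a minimal sketch, not part of the original slides; the function names `ancestors` and `are_siblings` are made up for illustration.

```python
def ancestors(ec_number):
    """Return the enclosing groups of an EC number, top level first."""
    parts = ec_number.split(".")
    # Every proper prefix of the dotted number names one enclosing group.
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def are_siblings(ec_a, ec_b):
    """Two EC numbers are siblings when they differ only in the last component."""
    a, b = ec_a.split("."), ec_b.split(".")
    return len(a) == len(b) and a[:-1] == b[:-1] and a[-1] != b[-1]


if __name__ == "__main__":
    print(ancestors("1.1.1.1"))
    print(are_siblings("2.4.2.3", "2.4.2.4"))
```

Because group membership is a pure prefix relation, no separate lookup table is needed to walk up the tree.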

7 Gene Ontology

8 KEGG

9
- To computerize all aspects of cellular functions in terms of the pathway of interacting molecules or genes
- To maintain gene catalogs for all organisms and link each gene product to a pathway component
- To organize a database of all chemical compounds in the cell and link each compound to a pathway component
- To develop computational technologies for pathway comparison, reconstruction, and analysis

10 Why RDF Integration?
- Pathway data model: directed graph
  - RDF is a good model for representing pathways
  - RDF data model: directed graph
- Multiple knowledge sources available on the Internet need to be integrated: one of the major problems for biologists
  - RDF provides a common standard model for all of them
- Enzyme, GO: hierarchical structure
  - RDF is a good model for representing hierarchies
- GO annotation is important
  - Enzymes (proteins) in a given pathway need GO annotation

11 Related Works
- KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res.
- YeastHub: a semantic web use case for integrating data in the life sciences domain, 2005, Bioinformatics
- LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res.
- Gene Ontology: tool for the unification of biology, the Gene Ontology Consortium, 2000, Nature Genetics

12 Our System's Supporting Functions
- Functions provided by KEGG
  - Search compound
  - Path prediction
  - Search enzyme
- Functions our system adds
  - Integration query (pathway + enzyme + GO)
  - Relaxation query using the GO hierarchy
  - Searching pathways using enzyme information

13 Search Compounds Compound : C00668 target

14 Pathway Prediction Tool compound Relaxation query using enzyme hierarchy

15 Search Enzyme Enzyme : 5.3.1.9

16 From Pathway to Gene Ontology Select enzyme

17 Data Translation for Integration
- KGML Data -> XSLT -> KEGG RDF Data
- KEGG RDF Data, Enzyme RDF Data, GO RDF Data -> GENOS Storage
- Adding GO ID
- XSLT: http://www.w3.org/2005/02/13-KEGG/

18 KEGG RDF Data (1/2)
Gene entry: <Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
Enzyme entry: <Rectangle k:name="1.2.1.5" k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="170" k:y="1039" k:width="45" k:height="17"/>
Compound entry: <Circle k:name="C00033" k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="102" k:y="971" k:width="8" k:height="8"/>
No information
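The elements on this slide carry only drawing attributes, so the kind of each entry has to be inferred from its shape and name. Below is a rough sketch of one way to do that. It is an illustration rather than the authors' method: the namespace URI bound to `k:`, the wrapping `<entries>` element, and the name patterns are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the slide does not show what k: is bound to.
K_NS = "http://example.org/kegg#"

# The three elements from the slide, trimmed and wrapped in an assumed root element.
SAMPLE = f"""<entries xmlns:k="{K_NS}">
  <Rectangle k:name="aldH1" k:bgcolor="#BFFFBF"/>
  <Rectangle k:name="1.2.1.5" k:bgcolor="#FFFFFF"/>
  <Circle k:name="C00033" k:bgcolor="#FFFFFF"/>
</entries>"""

EC_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.\d+$")   # four dotted integers
COMPOUND_PATTERN = re.compile(r"^C\d{5}$")          # "C" followed by five digits


def classify(element):
    """Guess the entry kind from the element tag and its k:name attribute."""
    name = element.get(f"{{{K_NS}}}name", "")
    if element.tag == "Circle" and COMPOUND_PATTERN.match(name):
        return "compound"
    if element.tag == "Rectangle" and EC_PATTERN.match(name):
        return "enzyme"
    if element.tag == "Rectangle":
        return "gene"  # assumed fallback: any other rectangle is a gene
    return "unknown"


def classify_all(xml_text):
    """Return (name, kind) pairs for every child of the root element."""
    root = ET.fromstring(xml_text)
    return [(el.get(f"{{{K_NS}}}name"), classify(el)) for el in root]


if __name__ == "__main__":
    for name, kind in classify_all(SAMPLE):
        print(name, kind)
```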

19 KEGG RDF Data(2/2) Relation Reaction

20 How to Process KEGG Pathway
- Problem
  - GENOS (Sesame) does not support multiple graphs
  - KEGG data consists of multiple documents
  - Ex) map00010.rdf, aae00010.rdf …
- Solution
  - Using namespaces, we can distinguish maps
  - When storing pathway data, the pathway's map name is added as a namespace in the resources table of GENOS

21 Processing Pathway Data
….
<Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
conflict

resources table of GENOS
ID | NameSpace | Localname
1 | … | …
2 | … | Glycolysis/…
3 | aae# | 00010_1
4 | … | aq_186
5 | … |
6 | aae# | 00020_1
7 | |
8 | map# | 00010_1
9 | …. |

triples table of GENOS
Subject | Predicate | Object
… | … | …
3 | … | …
6 | … | …
8 | … | …
… | … | …
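The namespace idea can be sketched as a small in-memory resources table: keying each resource by (namespace, local name) keeps `00010_1` from the `aae` map distinct from `00010_1` in the reference map. This is only an illustration of the idea; `ResourceTable` and `intern` are hypothetical names and do not reflect the actual GENOS or Sesame schema.

```python
class ResourceTable:
    """Toy stand-in for a resources table keyed by (namespace, localname)."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def intern(self, namespace, localname):
        """Return the ID for a resource, allocating a new one on first sight."""
        key = (namespace, localname)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


if __name__ == "__main__":
    table = ResourceTable()
    organism_node = table.intern("aae#", "00010_1")
    reference_node = table.intern("map#", "00010_1")
    # Same local name, different namespaces, so the IDs differ.
    print(organism_node, reference_node)
```

Triples then refer to resources by ID only, so statements loaded from different map documents never collide even though they share one graph.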

22 Integrating Databases Enzyme number GO ID
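The link between the enzyme number and the GO ID amounts to a join on the EC number. The sketch below shows that join on toy data. The GO identifiers are deliberate placeholders, the pathway membership rows are illustrative, and `9.9.9.9` is a made-up EC number used only to show an enzyme with no mapping.

```python
# Placeholder mapping from EC number to GO IDs (not real GO identifiers).
EC_TO_GO = {
    "5.3.1.9": ["GO:EXAMPLE_A"],
    "1.2.1.5": ["GO:EXAMPLE_B", "GO:EXAMPLE_C"],
}

# Illustrative (pathway, EC number) rows; membership is assumed for the example.
PATHWAY_ENZYMES = [
    ("map00010", "5.3.1.9"),
    ("map00010", "1.2.1.5"),
    ("map00010", "9.9.9.9"),
]


def annotate(pathway_enzymes, ec_to_go):
    """Join pathway enzymes with GO IDs; enzymes without a mapping are dropped."""
    return [
        (pathway, ec, go_id)
        for pathway, ec in pathway_enzymes
        for go_id in ec_to_go.get(ec, [])
    ]


if __name__ == "__main__":
    for row in annotate(PATHWAY_ENZYMES, EC_TO_GO):
        print(row)
```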

23 Relaxation Querying using SeRQL
- Diagram labels: C1, E1, C2; relaxed enzyme: E1.*
- SELECT C1, C2 FROM Path_EXP WHERE E1 LIKE "1.*"
- Dewey order
  - Ex. 1.1 and 1.2 are children of 1
- Use prefix
- SeRQL subclassof
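Prefix-based relaxation over Dewey-ordered EC numbers can be imitated outside SeRQL. The sketch below works on toy (C1, E1, C2) edges; the compound labels `C_a` to `C_e` are hypothetical, and the functions are a stand-in for the query processor, not its implementation. The prefix test stops at a dot boundary so that `1.1` does not match `1.10.3.1`.

```python
# Toy reaction edges: (substrate, enzyme EC number, product). Labels are made up.
EDGES = [
    ("C_a", "1.1.1.1", "C_b"),
    ("C_b", "1.2.1.5", "C_c"),
    ("C_c", "1.10.3.1", "C_d"),
    ("C_d", "5.3.1.9", "C_e"),
]


def matches_prefix(ec_number, prefix):
    """True when ec_number equals prefix or lies below it in the hierarchy."""
    return ec_number == prefix or ec_number.startswith(prefix + ".")


def relax(ec_number, levels_up=1):
    """Drop trailing components to widen a query, keeping at least one."""
    parts = ec_number.split(".")
    keep = max(1, len(parts) - levels_up)
    return ".".join(parts[:keep])


def query(edges, prefix):
    """Return (C1, C2) pairs whose enzyme falls under the given prefix."""
    return [(c1, c2) for c1, ec, c2 in edges if matches_prefix(ec, prefix)]


if __name__ == "__main__":
    print(query(EDGES, "1.1.1.1"))            # exact enzyme
    print(query(EDGES, relax("1.1.1.1", 3)))  # relaxed to the whole class 1
```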

24 Considering Performance
KEGG: Pathway List, using genes_index
Genes | Map
aae:aq_018 | path:aae03010
aae:aq_020 | path:aae03010
aae:aq_021 | path:aae00400
….
eco:b1236 | path:eco00052
eco:b1236 | path:eco00500
eco:b1236 | path:eco00520
….
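A gene-to-pathway index of this kind can be loaded once into a dictionary so that later lookups avoid scanning every map. The sketch below assumes a whitespace-separated, two-column layout for genes_index; the actual file format is not shown on the slide.

```python
from collections import defaultdict

# Rows taken from the slide; the two-column, whitespace-separated layout is assumed.
GENES_INDEX = """\
aae:aq_018 path:aae03010
aae:aq_020 path:aae03010
aae:aq_021 path:aae00400
eco:b1236 path:eco00052
eco:b1236 path:eco00500
eco:b1236 path:eco00520
"""


def build_index(text):
    """Map each gene ID to the list of pathway maps it appears in."""
    index = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene, pathway = fields
        index[gene].append(pathway)
    return dict(index)


if __name__ == "__main__":
    index = build_index(GENES_INDEX)
    print(index["eco:b1236"])
```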

25 Schedule
- Implementation (~11/30)
  - Integrated databases
  - Query processor for pathways
  - Simple UI (Web: JSP)
- Complete paper (~12/10)

