OASIS Environment (Omics Analysis for microbial organisms) Internet Data Base Lab, SNU 2005, 12.



Contents
- Introduction
- System architecture and component databases
  - Gene Ontology
  - GO Annotation
  - KEGG Pathway
  - Protein-Protein Interaction
  - Subcellular Localization DB
  - PubMed DB
  - Blast DB
- Available applications and issues
  - Common Gateway
  - Pathway Application
  - PPI Application
  - Subcellular Localization
  - Semantic Similarity Search
  - GO Application
- References
- Conclusion
- Appendix

Introduction(1/6)
- Omics
  - "-omics" is a suffix commonly attached to biological subfields to describe very large-scale data collection and analysis. It denotes the study of the whole 'body' of some definable set of entities.
- Genomics
  - The study of the structure and function of large numbers of genes simultaneously
- Proteomics
  - The study of the structure and function of proteins, including the way they work and interact with each other inside cells
[Figure labels: object; omics viewpoints]

Introduction(2/6)
- Need for an omics analysis system
  - Many biological databases hold information on individual genes or proteins
  - Relations or networks among these pieces of information can reveal new facts or insights
  - Many tools and DBs exist for each area, such as pathway, PPI, and subcellular localization
  - Integrating these analyses can show another picture of biological phenomena
[Figure labels: Analysis 1; Analysis 1.5; Analysis 2; Analysis 1+2]

Introduction(3/6)

Introduction(4/6)
- Microbial organisms
  - Many fully sequenced genomes (228 completed, 669 ongoing)
  - A small number of genes
    - Influenza (1,700), Yeast (6,000), Fly (13,000), Human (25,000)
    - Microbial organisms have low information complexity
  - A large amount of information
    - Share of genes with known function: microbial organisms (50%), human (5%)
  - A good starting point for bioinformatics research

Introduction(5/6)
- Project
  - Participants
    - IDB Lab., SNU
    - Laboratory of Plant Genomics, KRIBB
      - Cheol-Goo Hur (Ph.D., Director)
      - Mi Kyoung Lee
  - Goals
    - Implement a basic framework for omics research
    - Create databases for microbial organisms
    - Gain new insight into biological data through analysis applications
  - Related projects
    - CJ project, KRIBB genome X project
    - The system will be validated through these projects
    - A new genome can be analyzed in the OASIS environment

Introduction(6/6)
- Omics projects in Korea
  - The center for functional analysis of human genome: 1999~2010, 170 billion won, KRIBB
  - Crop functional genomics center: 2001~2011, 100 billion won, SNU
  - Microbial genomics & applications: 2002~2012, 100 billion won, KRIBB
  - Functional proteomics center: 2002~2012, 100 billion won, KIST
  - Supported by the Ministry of Science and Technology


System architecture (Databases)
[Diagram labels]
- Databases: KEGG pathway, PPI DB, Subcellular Localization DB
- Gene Ontology aspects: Biological process, Molecular function, Cellular component
- GO Annotation DB (UniProt): GO annotation
- Blast DB: Sequence matching
- PubMed: Biomedical Literature
- Storage: RDF storage, RDBMS

Gene Ontology(1/2)
- GO works as a dictionary
  - It describes only the definitions of terms and the relationships between terms
- We need the relationships between gene products
- We need other useful information about gene products
  - Biological process: KEGG pathway database
  - Molecular function: PPI database
  - Cellular component: Subcellular localization database

Gene Ontology(2/2)
- Example term: mitochondrion inheritance
  - Definition: The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
- We will analyze information about gene products using Gene Ontology

GO Annotation DB (1/2)
[Diagram labels]
- Input data: GOA, Other DB, Gene Ontology
- GO Annotation DB: Gene product, Annotation data
- RDF Publish

GO Annotation DB (2/2)
- GOA (three sample records; GO and taxon identifiers were lost in extraction)
  - UniProt  P05100  3MG1_ECOLI  GO:  GOA:interpro  IEA  P  protein  taxon:  UniProt
  - UniProt  P05100  3MG1_ECOLI  GO:  GOA:spkw  IEA  P  protein  taxon:  UniProt
  - UniProt  P05100  3MG1_ECOLI  GO:  GOA:spkw  IEA  P  protein  taxon:  UniProt
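To make the record layout above concrete, here is a minimal sketch of reading one GOA record. It assumes the tab-separated, 15-column gene association layout (DB, object ID, symbol, qualifier, GO ID, reference, evidence, with, aspect, name, synonym, object type, taxon, date, assigned by). The `GoaRecord` type, the function name, and every value in the sample line are hypothetical placeholders, not data from the OASIS databases.

```python
from typing import NamedTuple


class GoaRecord(NamedTuple):
    """Subset of the columns of one gene association line (hypothetical type)."""
    db: str
    object_id: str
    symbol: str
    go_id: str
    reference: str
    evidence: str
    aspect: str
    object_type: str
    taxon: str
    assigned_by: str


def parse_goa_line(line: str) -> GoaRecord:
    """Split one tab-separated line and keep the columns shown on the slide."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    return GoaRecord(
        db=cols[0],
        object_id=cols[1],
        symbol=cols[2],
        go_id=cols[4],
        reference=cols[5],
        evidence=cols[6],
        aspect=cols[8],
        object_type=cols[11],
        taxon=cols[12],
        assigned_by=cols[14],
    )


# Placeholder line: the accession, GO ID, taxon, and date are invented.
sample = "\t".join([
    "UniProt", "P00000", "EXAMPLE_ECOLI", "", "GO:0000000", "GOA:interpro",
    "IEA", "", "P", "", "", "protein", "taxon:0", "20050101", "UniProt",
])
record = parse_goa_line(sample)
print(record.symbol, record.go_id, record.evidence)
```

Keeping only the named columns mirrors what the slide displays; a loader for the real annotation DB would probably retain all columns.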

KEGG Pathway(1/3)
- Kyoto Encyclopedia of Genes and Genomes
  - Bioinformatics Center, Kyoto University
- Pathway
  - Network of interacting proteins used to carry out biological functions such as metabolism and signal transduction
  - Metabolic pathways themselves are already well characterized
- Relations
  - Compound-Enzyme-Compound relation
  - Protein-Enzyme relation
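The Compound-Enzyme-Compound relation can be read as a directed graph whose edges are labeled by enzymes. The sketch below is one possible in-memory representation with a breadth-first search for a conversion route. The identifiers (`C1`, `E1`, ...) and function names are hypothetical and do not come from KEGG.

```python
from collections import defaultdict, deque

# Hypothetical (substrate, enzyme, product) relations.
relations = [
    ("C1", "E1", "C2"),
    ("C2", "E2", "C3"),
    ("C2", "E3", "C4"),
]


def build_graph(relations):
    """Index the relations by substrate compound."""
    graph = defaultdict(list)
    for substrate, enzyme, product in relations:
        graph[substrate].append((enzyme, product))
    return graph


def find_route(graph, start, goal):
    """Breadth-first search; returns the list of steps, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, steps = queue.popleft()
        if compound == goal:
            return steps
        for enzyme, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, steps + [(compound, enzyme, product)]))
    return None


graph = build_graph(relations)
print(find_route(graph, "C1", "C4"))
```

A Protein-Enzyme relation could be added as a second mapping from enzyme identifiers to the proteins that carry that activity.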

KEGG Pathway(2/3)

KEGG Pathway(3/3)
- EC number to GO term mapping, in the form: EC: > GO:ribokinase activity ; GO:
  - This mapping is provided by the GO Consortium
- Alternatively, a protein can be mapped to GO through GOA
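A mapping line of the form "EC:number > GO:term name ; GO:id" can be split with plain string operations. This is a minimal sketch that assumes comment lines start with "!"; the function name and the identifiers in the example line are hypothetical placeholders, since the real numbers are not preserved in the slide text.

```python
def parse_ec2go_line(line):
    """Return (ec_number, go_term_name, go_id), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    ec_number, sep, rest = line.partition(" > ")
    if not sep:
        raise ValueError(f"missing ' > ' separator: {line!r}")
    name_part, sep, go_id = rest.rpartition(" ; ")
    if not sep:
        raise ValueError(f"missing ' ; ' separator: {line!r}")
    # The term name carries a "GO:" prefix in this file layout.
    if name_part.startswith("GO:"):
        name_part = name_part[len("GO:"):]
    return ec_number.strip(), name_part.strip(), go_id.strip()


# Placeholder identifiers, invented for illustration.
example = "EC:0.0.0.0 > GO:example kinase activity ; GO:0000000"
print(parse_ec2go_line(example))
```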

Protein-Protein Interaction(1/2)
- Protein-protein interaction
  - Proteins work together
  - If protein A is involved in function X and we obtain evidence that protein B functionally associates with A, then B is likely also involved in X
- Databases
  - Experimental data
  - In-silico prediction
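The rule "if B associates with A and A is involved in X, then B is involved in X" can be sketched as a single pass over scored interactions. This is an illustration under stated assumptions, not the prediction method of any PPI database: the protein names, the scores, and the `min_score` threshold of 0.4 are all hypothetical.

```python
def predict_functions(interactions, known, min_score=0.4):
    """Transfer known functions across interactions scoring at least min_score.

    interactions: iterable of (protein_a, protein_b, score)
    known: dict mapping protein -> set of known functions
    Returns dict mapping protein -> {function: best supporting score}.
    """
    predicted = {}
    for a, b, score in interactions:
        if score < min_score:
            continue
        # Interactions are undirected, so transfer in both directions.
        for source, target in ((a, b), (b, a)):
            for function in known.get(source, ()):
                if function in known.get(target, ()):
                    continue  # already known, nothing to predict
                best = predicted.setdefault(target, {})
                best[function] = max(best.get(function, 0.0), score)
    return predicted


interactions = [("A", "B", 0.8), ("B", "C", 0.3), ("A", "D", 0.5)]
known = {"A": {"X"}}
print(predict_functions(interactions, known))
```

Only direct partners of annotated proteins receive predictions here; propagating further through the network would need an explicit rule for combining scores along a path.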

Protein-Protein Interaction(2/2) gene cluster 0.4

Subcellular localization DB
- Subcellular localization
  - Location in a cell
  - If two proteins are located at the same site in a cell, they are likely to have the same function
- PSORT is a computer program for the prediction of protein localization sites in cells
  - Human Genome Center, University of Tokyo
  - Simon Fraser University, Canada
  - Input: amino acid sequence, source of the sequence
  - Output: the likelihood that the input protein is localized at each candidate site, with additional information

PubMed DB
- PubMed
  - PubMed is a service of the National Library of Medicine that includes over 15 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s
  - Every article has a PubMed ID (PID)
  - Gene annotations usually have PIDs
  - We can download the abstracts freely

Blast DB
- Basic Local Alignment Search Tool (BLAST)
  - The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches
  - We need our own local BLAST DB
- To do
  - Download the sequence file
  - Format the BLAST DB
  - Set up an interface for BLAST search


System Architecture (Applications)
[Diagram labels]
- Pathway: mapping, prediction, visualization
- GO: mapping, visualization (GOGuide)
- Protein interaction: prediction, visualization
- Cellular localization: prediction
- Common Applications: Semantic Similarity Search, Blast Search, PubMed information

Common gateway(1/2): Query Interface
Data source            | Description                 | Select source | Properties
GO                     | Gene ontology               |               | definition
PPI                    | Protein-protein interaction |               | Gene cluster
Cellular Localization  | Cellular component          |               |
Pathway                | Metabolic pathway           |               |
Literature             | PubMed                      |               |

Common gateway(2/2)
- Properties to search
  - GO definition: Cell growth
  - PPI probability: 0.8
- Properties to display
  - GO tree
  - PPI network

Pathway Applications(1/3)  Pathway

Pathway Applications(2/3) Unknown gene New pathway

Pathway Applications(3/3)
- Issues
  - Searching the pathway
  - Mapping existing information to the pathway
  - Predicting the unknown pathway of a protein
  - Microarray gene expression analysis

PPI Applications(1/3)  Protein-Protein interaction

PPI Applications(2/3)

PPI Applications(3/3)
- Issues
  - Database construction
    - Sequence-based prediction
    - Genome-based prediction
    - Structure-based prediction
  - Comparisons between experimental methods and computational methods
  - Microarray analysis

Subcellular localization Applications(1/2)  Cellular component prediction

Subcellular localization Applications(2/2)
- Issues
  - Construction of databases
  - Comparison between machine learning approaches
  - Multiple locations problem
  - Using literature or protein function annotation

Semantic Similarity Search
- Input
  - Information about a gene product
  - Keyword, sequence, ID
- Output
  - Similar gene products
- Issues
  - GP Similarity
    - Calculate the functional similarity between gene products based on their annotation information
  - GORank
    - Retrieve gene products that are similar to a given gene product, in descending order of similarity
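As a rough illustration of similarity scoring followed by ranking, the sketch below compares gene products by the overlap of their annotation terms after adding all ancestor terms, then sorts in descending order. The measure is a plain Jaccard overlap chosen for brevity; it is an assumption, not the GP Similarity or GORank method, and not the information-content measures studied by Lord et al. The toy ontology and gene product names are hypothetical.

```python
# Toy ontology: term -> list of parent terms (hypothetical identifiers).
parents = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1"],
    "t4": ["t1"],
}

# Toy annotations: gene product -> set of directly annotated terms.
annotations = {
    "gpA": {"t3"},
    "gpB": {"t4"},
    "gpC": {"t2"},
    "gpD": {"t3"},
}


def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def gp_similarity(terms_a, terms_b, parents):
    """Jaccard overlap of the ancestor-expanded annotation sets."""
    a = with_ancestors(terms_a, parents)
    b = with_ancestors(terms_b, parents)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query, annotations, parents):
    """Other gene products sorted by similarity to the query, highest first."""
    query_terms = annotations[query]
    scored = [
        (gp, gp_similarity(query_terms, terms, parents))
        for gp, terms in annotations.items()
        if gp != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_similar("gpA", annotations, parents)
print(ranked)
```

Expanding to ancestors lets two products annotated with sibling terms (`t3` and `t4`) score higher than products that share only the root.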

GO Applications(1/2)

GO Applications(2/2)
- Issues
  - Gene Ontology serves as a standard for interpreting various analysis results
  - Mapping analysis results to GO
  - GO browsing, clustering

PubMed Information


References(1/2)
- The Gene Ontology Consortium, “Creating the gene ontology resource: design and implementation”, Genome Research, 2001
- Kanehisa, M. et al, “The KEGG resource for deciphering the genome”, Nucleic Acids Research, 2004
- Bairoch, A. et al, “The Universal Protein Resource (UniProt)”, Nucleic Acids Research, 2005
- Camon, E. et al, “The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology”, Nucleic Acids Research, 2005
- Cheung, K.-H. et al, “YeastHub: a semantic web use case for integrating data in the life science domain”, Bioinformatics, 2005

References(2/2)
- Bowers, P. M. et al, “Prolinks: a database of protein functional linkages derived from coevolution”, Genome Biology, 2004
- von Mering, C. et al, “STRING: known and predicted protein-protein associations, integrated and transferred across organisms”, Nucleic Acids Research, 2005
- Gardy, J. L. et al, “PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria”, Nucleic Acids Research, 2003
- Lord, P. W. et al, “Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation”, Bioinformatics, 2003


Conclusion(1/3)
- Research with OASIS environment
  - Visualization of the information network
  - Offering various network components
[Figure labels: A series of genes or proteins; OASIS; Information network]

Conclusion(2/3)
- Research with OASIS environment (cont’d)
  - Prediction of the unknown information
[Figure labels: Information network; Locating information object or new network; Problem solving]

Conclusion(3/3)
- Experimental environment for RDF processing and bioinformatics research
  - RDF is suitable for data integration and graph representation
  - Each application can be improved independently
- We expect to gain a new angle on the biological data through the integrated analysis tools
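The point about RDF and data integration can be illustrated with plain subject-predicate-object triples: facts from different sources share one shape, so a question that spans sources becomes a chain of pattern matches. This sketch uses ordinary Python tuples rather than an RDF library, and all identifiers and predicate names are hypothetical, not the OASIS schema.

```python
# Facts from different hypothetical sources, all in one triple shape.
triples = {
    ("gp:P1", "annotatedWith", "go:T1"),        # annotation source
    ("gp:P1", "inPathway", "path:M1"),          # pathway source
    ("gp:P2", "interactsWith", "gp:P1"),        # interaction source
    ("gp:P2", "locatedIn", "loc:cytoplasm"),    # localization source
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def pathways_of_partners(triples, protein):
    """Join interaction and pathway facts: pathways of a protein's partners."""
    partners = {t[2] for t in match(triples, s=protein, p="interactsWith")}
    return sorted(
        t[2]
        for partner in partners
        for t in match(triples, s=partner, p="inPathway")
    )


print(match(triples, s="gp:P1"))
print(pathways_of_partners(triples, "gp:P2"))
```

In an RDF store the same join would typically be written as a graph query; the tuple version only shows why a uniform triple shape makes cross-database questions easy to express.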


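Appendix(1/4)
- Person in charge of each component
  - Pathway: DongHyuk Im (임동혁), DongHee Lee (이동희)
  - PPI: Sangwon Yoo (유상원), Hoyoung Jeong (정호영), Taewhi Lee (이태휘)
  - Subcellular localization: 정준원, 박형우
  - Similarity Search using GOA: 김기성, 김철한
  - GOGuide: reused as is
- Build the integrated interface after each component is completed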

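Appendix(2/4)
- Plan for December through February
  - Pathway team
    - Complete the RDF-based pathway component: December
    - Reflect KRIBB requirements: December to January
    - Future research topics
      - Similar pathway research
      - Visualization on pathway
      - Query performance
  - PPI team
    - Build a DB based on the techniques used in Prolinks: December
    - Build the search interface: December to January
    - Measure DB quality: January to February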

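Appendix(3/4)
- Future research topics (PPI team, continued)
  - Compare and measure the quality of each DB and derive the parts they have in common
  - Comparative analysis of DB construction algorithms
  - Addition of new techniques
- Similarity Search (GORank) team
  - GORank UI work: query input and result display
  - GORank administration functions: index construction, similarity calculation, etc.
  - RDF publish implementation: publish GO and protein annotation information as RDF
  - Future research topics
    - Apply GORank to a GO Annotation verification tool or to clustering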

Appendix(4/4)
- Subcellular Localization team
  - Build the PSORT DB by December
  - Study PSORT and localization prediction techniques
  - Study localization prediction techniques based on data associations within the system built in the lab