Bioinformatics Capstone Project

The design and implementation of a system that integrates pathway data from KEGG and genome sequence data from NCBI
Xiang (Sean) Zhou
Advisor: Prof. Sun Kim
Bioinformatics Capstone Project, Indiana University
9/21/2018

Outline
Background, Methods, Sample Results, Online Demonstration, Future Direction

Why Study Metabolic Pathways?
One of the challenges in life science is to uncover the fundamental design principles that provide the common underlying structure and function in all cells and microorganisms [2]. Metabolic pathway networks serve as a tool for reaching this goal.

Metabolic Pathway
Definition of a metabolic pathway: a series of enzyme-catalyzed chemical reactions within a cell that results in the removal of a molecule from the environment to be used or stored by the cell, or in the initiation of another metabolic pathway [1]. Alternatively, a pathway is a linked set of biochemical reactions, linked in the sense that the product of one reaction is a reactant of, or an enzyme that catalyzes, a subsequent reaction [4].
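The second definition can be made concrete by checking whether a list of reactions forms a linked set in that sense. The check asks whether each reaction after the first consumes a product of some earlier reaction, either as a reactant or as its catalyzing enzyme. This is a minimal sketch. The `Reaction` record, which holds an enzyme name plus sets of reactant and product names, is a hypothetical layout and not a KEGG or NCBI data format. The example reactions are the first three steps of glycolysis.

```python
from typing import FrozenSet, List, NamedTuple


class Reaction(NamedTuple):
    """Hypothetical minimal reaction record (not a KEGG or NCBI format)."""
    enzyme: str
    reactants: FrozenSet[str]
    products: FrozenSet[str]


def rxn(enzyme: str, reactants: set, products: set) -> Reaction:
    """Convenience constructor that freezes the compound sets."""
    return Reaction(enzyme, frozenset(reactants), frozenset(products))


def is_linked(pathway: List[Reaction]) -> bool:
    """True if each reaction after the first uses a product of some earlier
    reaction, either as a reactant or as its catalyzing enzyme."""
    for i in range(1, len(pathway)):
        inputs = pathway[i].reactants | {pathway[i].enzyme}
        if not any(earlier.products & inputs for earlier in pathway[:i]):
            return False
    return True


# First three steps of glycolysis, written with the toy record above.
glycolysis_start = [
    rxn("hexokinase", {"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),
    rxn("phosphoglucose isomerase", {"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    rxn("phosphofructokinase", {"fructose-6-phosphate", "ATP"},
        {"fructose-1,6-bisphosphate", "ADP"}),
]
# Two reactions that share no compound, so they do not form a linked set.
unlinked = [
    glycolysis_start[0],
    rxn("lactate dehydrogenase", {"pyruvate", "NADH"}, {"lactate", "NAD+"}),
]

print(is_linked(glycolysis_start))  # True
print(is_linked(unlinked))          # False
```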

Why Is It So Difficult to Study Metabolism Across Multiple Genomes?
The metabolism of a single organism is too large to be grasped by one mind; E. coli, for example, has a metabolism involving over 850 substances and 1,500 reactions. At the same time, genome projects keep generating large amounts of sequence data.

Figure: A sample metabolic pathway [3].

Pathway Database (DB)
A pathway DB is a bioinformatics database that describes biochemical pathways and their component reactions, enzymes, and substrates [4].
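In its simplest form, a pathway DB maps each pathway to its reactions. Each reaction carries its enzyme, identified here by an EC number, along with its substrates and products. The sketch below assumes a hypothetical in-memory layout, not KEGG's actual schema. The `build_compound_index` helper shows one query such a layout supports: finding every pathway that involves a given compound. The pathway contents are deliberately abbreviated fragments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Reaction:
    ec_number: str        # enzyme that catalyzes the reaction
    substrates: Set[str]
    products: Set[str]


def build_compound_index(pathways: Dict[str, List[Reaction]]) -> Dict[str, Set[str]]:
    """Map each compound to the names of the pathways whose reactions
    consume or produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, reactions in pathways.items():
        for reaction in reactions:
            for compound in reaction.substrates | reaction.products:
                index[compound].add(name)
    return dict(index)


# Abbreviated, illustrative pathway fragments.
pathways = {
    "glycolysis": [
        Reaction("2.7.1.1", {"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),
        Reaction("2.7.1.40", {"phosphoenolpyruvate", "ADP"}, {"pyruvate", "ATP"}),
    ],
    "pyruvate metabolism": [
        Reaction("1.1.1.27", {"pyruvate", "NADH"}, {"lactate", "NAD+"}),
    ],
}

index = build_compound_index(pathways)
print(sorted(index["pyruvate"]))                  # ['glycolysis', 'pyruvate metabolism']
print([r.ec_number for r in pathways["glycolysis"]])  # ['2.7.1.1', '2.7.1.40']
```

A shared compound such as pyruvate is what connects one pathway to another, which is why a compound index is a natural companion to the per-pathway records.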

Current Pathway DBs
KEGG (Kyoto Encyclopedia of Genes and Genomes): the most comprehensive metabolic pathway DB.
EcoCyc: Encyclopedia of Escherichia coli K-12 Genes and Metabolism.
CGAP (Cancer Genome Anatomy Project): pathways on the CGAP web site are obtained directly from BioCarta and KEGG.
WIT: has since become a commercial DB.

Disadvantages of Current DBs
They are "static": all data are pre-computed and stored in the DBs, so users have limited flexibility in choosing the genomes and pathways they are interested in. Users can study only one genome at a time and cannot compare pathways across different genomes simultaneously.

Motivation
Create a system in which users can select the genomes and pathways they are interested in and perform sequence analysis freely. The system enables multi-genome pathway comparison, and results are generated on demand according to the user's needs.
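Here is a rough sketch of the multi-genome comparison. The function builds a presence/absence table for one pathway across user-selected genomes at query time, instead of reading pre-computed results. The genome names and their EC-number annotations are made up for illustration. `compare_pathway` is a hypothetical helper, not part of PLATCOM.

```python
from typing import Dict, List, Set


def compare_pathway(pathway_ecs: List[str],
                    annotations: Dict[str, Set[str]],
                    selected: List[str]) -> Dict[str, Dict[str, bool]]:
    """Presence/absence of each pathway enzyme (EC number) in each
    user-selected genome, computed when the query is made."""
    return {ec: {g: ec in annotations.get(g, set()) for g in selected}
            for ec in pathway_ecs}


# Hypothetical annotations: genome -> EC numbers assigned to its genes.
annotations = {
    "genome_A": {"2.7.1.1", "5.3.1.9", "2.7.1.11"},
    "genome_B": {"2.7.1.1", "2.7.1.11"},
}
upper_glycolysis = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
selected = ["genome_A", "genome_B"]

table = compare_pathway(upper_glycolysis, annotations, selected)
for ec, row in table.items():
    print(ec, *("+" if row[g] else "-" for g in selected))
# 2.7.1.1 + +
# 5.3.1.9 + -
# 2.7.1.11 + +
```

A "-" entry marks an enzyme with no annotated gene in that genome. Such gaps are natural starting points for a sequence-based search for missing genes.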

Data Sources
KEGG, NCBI GenBank, and PLATCOM genome comparison data

The Challenge
Genome names and gene names differ slightly between KEGG and NCBI GenBank, and the IDs used in the two DBs are entirely different. In addition, some protein IDs (pids) in KEGG are outdated. Integrating the two DBs is therefore not trivial.
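One way to handle these mismatches is to normalize names before comparing them. A second step is to fall back from ID matching to name matching when an ID is outdated. The sketch below makes several assumptions:

- Each gene entry is a simple (protein ID, gene name) pair.
- Protein IDs may carry an `accession.version` suffix.
- The IDs such as `prot001.1` and the helpers `normalize_name`, `strip_version`, and `match_genes` are all hypothetical.

Entries that match neither way are returned for manual review.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalize_name(name: str) -> str:
    """Lower-case and drop punctuation/whitespace so that slightly
    different spellings (e.g. 'E. coli K-12' vs 'E.coli K12') compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def strip_version(accession: str) -> str:
    """Drop a trailing '.N' version suffix, if present."""
    return accession.split(".", 1)[0]


def match_genes(kegg_genes: List[Tuple[str, str]],
                genbank_genes: List[Tuple[str, str]]
                ) -> Tuple[Dict[str, str], List[str]]:
    """Map KEGG entries to GenBank entries: first by version-less protein ID,
    then (e.g. for an outdated ID) by normalized gene name."""
    by_pid = {strip_version(pid): pid for pid, _ in genbank_genes}
    by_name = {normalize_name(name): pid for pid, name in genbank_genes}
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for pid, name in kegg_genes:
        hit: Optional[str] = (by_pid.get(strip_version(pid))
                              or by_name.get(normalize_name(name)))
        if hit is None:
            unmatched.append(pid)
        else:
            mapping[pid] = hit
    return mapping, unmatched


# Hypothetical entries: prot999.1 stands in for an outdated KEGG pid.
kegg_genes = [("prot001.1", "thrA"), ("prot999.1", "thrB"), ("prot500.1", "xyzQ")]
genbank_genes = [("prot001.2", "thrA"), ("prot002.1", "thrB")]

print(normalize_name("E. coli K-12") == normalize_name("E.coli K12"))  # True
mapping, unmatched = match_genes(kegg_genes, genbank_genes)
print(mapping)    # {'prot001.1': 'prot001.2', 'prot999.1': 'prot002.1'}
print(unmatched)  # ['prot500.1']
```

Matching on the version-less ID comes first because it is the stricter evidence. The name fallback is looser and could pair unrelated genes that happen to share a name, so its hits would need checking.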

The Unique Features of Our System
Easy to maintain: only the latest datasets from KEGG and NCBI GenBank need to be downloaded.
Flexibility: sequence analysis is based on whatever combination of genomes and pathways the user chooses, and everything is computed on the fly.
Integration: the KEGG and NCBI GenBank DBs are integrated for the purpose of sequence analysis.

Methods
FASTA, ClustalW, HMMER, and a series of modules
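FASTA is a heuristic for finding high-scoring local alignments between a query and database sequences. The sketch below computes the exact Smith-Waterman local alignment score that such heuristics approximate; it is not FASTA itself. For brevity it assumes a flat match/mismatch score and a linear gap penalty. A real protein search would instead use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score between sequences a and b
    (Smith-Waterman, linear gap penalty, simple match/mismatch scoring)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("MKTAYIAKQR", "MKTAYLAKQR"))  # 17
print(smith_waterman("GGACGTGG", "ACGT"))          # 8
print(smith_waterman("AAAA", "TTTT"))              # 0
```

Filling the full matrix costs time proportional to the product of the sequence lengths. That cost is why FASTA first looks for short exact word matches and only then aligns the most promising regions.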

Infrastructure
A query protein sequence; a pathway; a reference genome; genomes of interest; protein information; pathway information; search for missing genes
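The search for missing genes can be sketched as follows. Consider each pathway enzyme that is annotated in the reference genome but not in a genome of interest. For each such enzyme, the reference protein is used as a query against that genome's proteins.

In this hypothetical sketch, candidates are ranked by the number of shared k-tuples (short words of length k). This is only the word-matching idea that FASTA starts from, not a full search. A real pipeline would hand the candidates to FASTA or HMMER. The sequences, identifiers, function names, and the `min_shared` threshold are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_candidates(query: str, proteome: Dict[str, str],
                    k: int = 3, min_shared: int = 2) -> List[Tuple[str, int]]:
    """Rank target proteins by the number of k-tuples shared with the query."""
    q = kmers(query, k)
    hits = [(pid, len(q & kmers(seq, k))) for pid, seq in proteome.items()]
    return sorted((h for h in hits if h[1] >= min_shared),
                  key=lambda h: h[1], reverse=True)


def search_missing_genes(pathway_ecs: List[str],
                         reference: Dict[str, str],
                         target_annotated: Set[str],
                         target_proteome: Dict[str, str]
                         ) -> Dict[str, List[Tuple[str, int]]]:
    """For each pathway enzyme present in the reference genome (EC -> protein)
    but not annotated in the target genome, rank candidate target proteins."""
    return {ec: rank_candidates(reference[ec], target_proteome)
            for ec in pathway_ecs
            if ec in reference and ec not in target_annotated}


# Toy data: made-up sequences and identifiers.
reference = {"2.7.1.1": "MSKGTLLEVNGHAAKV", "5.3.1.9": "MKTAYIAKQRQISFVK"}
target_annotated = {"2.7.1.1"}
target_proteome = {"orf1": "MKTAYIAKQRQLSFVK", "orf2": "GGGGGGGGGGGG"}

result = search_missing_genes(["2.7.1.1", "5.3.1.9"], reference,
                              target_annotated, target_proteome)
print(result)  # {'5.3.1.9': [('orf1', 11)]}
```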

PLATCOM-Metabolic Pathway Division

Sample Result (1)

Sample Result (2)

Sample Result (3)

Online Demonstration
PLATCOM: A Platform for Computational Comparative Genomics

Future Direction
Use conserved domains to perform HMM searches. Enable sequence alignment and pattern search. Connect to other DBs, such as protein-protein interaction DBs and PDB. Improve performance by using a dynamic cache.
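A dynamic cache could keep results for repeated (genome, pathway) requests so they are not recomputed on every visit. The sketch below uses Python's `functools.lru_cache`, which evicts the least recently used entries once `maxsize` is reached. The slide does not say which caching scheme was planned. The LRU policy, the `pathway_report` function, and the call counter are therefore assumptions for illustration.

```python
from functools import lru_cache

compute_calls = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=256)
def pathway_report(genome: str, pathway: str) -> str:
    """Hypothetical stand-in for an on-the-fly analysis of one pathway
    in one genome (e.g. running the sequence searches)."""
    global compute_calls
    compute_calls += 1
    return f"analysis of {pathway} in {genome}"


pathway_report("genome_A", "glycolysis")  # computed
pathway_report("genome_A", "glycolysis")  # served from the cache
pathway_report("genome_B", "glycolysis")  # different key, computed
print(compute_calls)                # 2
print(pathway_report.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=256, currsize=2)
```

Results depend on the downloaded KEGG and GenBank datasets. The cache would therefore need to be emptied whenever new datasets are installed, for example with `pathway_report.cache_clear()`.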

References
[1] http://www.free-definition.com/
[2] Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.-L. (2000). The large-scale organization of metabolic networks. Nature, 407:651-654.
[3] http://www.genome.ad.jp/kegg/pathway.html
[4] Karp, P. D. (2001). Pathway databases: a case study in computational symbolic theories. Science, 293:2040-2044.

Acknowledgments
Professor Sun Kim, Kwangmin Choi, Arvind Gopu