1 Bioinformatics Capstone Project
The design and implementation of a system that integrates pathway data from KEGG and genome sequence data from NCBI
Xiang (Sean) Zhou
Advisor: Prof. Sun Kim
Bioinformatics Capstone Project, Indiana University
9/21/2018

2 Outline
Background
Methods
Sample results
Online demonstration
Future direction

3 Why do we want to study metabolic pathways?
One of the challenges in life science is to uncover the fundamental design principles that underlie the common structure and function of all cells and microorganisms [2]. Metabolic pathway networks serve as a tool for achieving this goal.

4 Metabolic Pathway
Definition of a metabolic pathway:
A series of enzyme-catalyzed chemical reactions within a cell that results either in the removal of a molecule from the environment to be used or stored by the cell, or in the initiation of another metabolic pathway [1]. A pathway is a linked set of biochemical reactions, linked in the sense that the product of one reaction is a reactant of, or an enzyme that catalyzes, a subsequent reaction [4].
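The second definition above implies that a pathway can be treated as a directed graph in which reaction A points to reaction B when a product of A is a reactant of B. The following Python sketch is illustrative only and is not taken from the project; the reaction IDs and the toy glycolysis data are hypothetical, and it covers only the product-to-reactant link, not the case where a product is an enzyme.

```python
from itertools import permutations

# Hypothetical toy data: the first three steps of glycolysis,
# written as reaction id -> (reactants, products).
REACTIONS = {
    "R1": ({"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),
    "R2": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "R3": ({"fructose-6-phosphate", "ATP"}, {"fructose-1,6-bisphosphate", "ADP"}),
}


def link_reactions(reactions):
    """Return directed edges (a, b) where some product of reaction a
    is a reactant of reaction b."""
    edges = set()
    for a, b in permutations(reactions, 2):
        products_a = reactions[a][1]
        reactants_b = reactions[b][0]
        if products_a & reactants_b:  # a shared compound links the two reactions
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    print(sorted(link_reactions(REACTIONS)))  # [('R1', 'R2'), ('R2', 'R3')]
```

In real networks, ubiquitous cofactors such as ATP and ADP would link nearly every reaction to every other, so they would likely need to be filtered out before building the graph.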

5 Why is it so difficult to study metabolism in multiple genomes?
The metabolism of a single organism is too large to be grasped by one mind; for example, the metabolism of E. coli involves over 850 substances and 1,500 reactions. Meanwhile, genome projects keep generating large amounts of sequence data.

6 A sample metabolic pathway [3]

7 Pathway Database (DB)
A pathway DB is a bioinformatics DB that describes biochemical pathways and their component reactions, enzymes, and substrates [4].
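As a rough illustration of the kind of record such a DB holds, the sketch below models a pathway as a list of reactions, each with its substrates, products, and catalyzing enzymes identified by EC number (here those of hexokinase, glucose-6-phosphate isomerase, and phosphofructokinase). The class and method names and the pathway ID are hypothetical and do not reflect the schema of KEGG or any other DB.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    reaction_id: str
    substrates: frozenset[str]
    products: frozenset[str]
    enzymes: frozenset[str]  # EC numbers of the enzymes that catalyze this reaction


@dataclass
class Pathway:
    pathway_id: str
    name: str
    reactions: list[Reaction] = field(default_factory=list)

    def enzymes(self) -> set[str]:
        """All EC numbers that appear anywhere in the pathway."""
        return {ec for r in self.reactions for ec in r.enzymes}

    def reactions_consuming(self, compound: str) -> list[str]:
        """IDs of the reactions that use `compound` as a substrate."""
        return [r.reaction_id for r in self.reactions if compound in r.substrates]


# Hypothetical toy pathway (first steps of glycolysis).
glycolysis = Pathway("P1", "glycolysis (toy subset)", [
    Reaction("R1", frozenset({"glucose", "ATP"}),
             frozenset({"glucose-6-phosphate", "ADP"}), frozenset({"2.7.1.1"})),
    Reaction("R2", frozenset({"glucose-6-phosphate"}),
             frozenset({"fructose-6-phosphate"}), frozenset({"5.3.1.9"})),
    Reaction("R3", frozenset({"fructose-6-phosphate", "ATP"}),
             frozenset({"fructose-1,6-bisphosphate", "ADP"}), frozenset({"2.7.1.11"})),
])

print(sorted(glycolysis.enzymes()))           # ['2.7.1.1', '2.7.1.11', '5.3.1.9']
print(glycolysis.reactions_consuming("ATP"))  # ['R1', 'R3']
```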

8 Current Pathway DBs
KEGG (Kyoto Encyclopedia of Genes and Genomes): the most comprehensive metabolic pathway DB.
EcoCyc: Encyclopedia of Escherichia coli K-12 Genes and Metabolism.
CGAP (Cancer Genome Anatomy Project): pathways on the CGAP web site are obtained directly from BioCarta and KEGG.
WIT: has since become a commercial DB.

9 Disadvantages of current DBs
They are “static”: all data are pre-computed and stored in the DBs.
Users have limited flexibility in choosing the genomes and pathways of interest.
Users can study only one genome at a time, so they cannot compare a pathway across different genomes simultaneously.

10 Motivation
Create a system in which:
users can select the genomes and pathways of interest and perform sequence analysis freely;
pathways can be compared across multiple genomes;
results are generated on demand, according to the user's needs.

11 Data Sources
KEGG
NCBI GenBank
PLATCOM genome comparison data

12 The Challenge
In KEGG and NCBI GenBank:
Genome names and gene names differ slightly.
The IDs used in the two DBs are completely different.
Some of the protein IDs (pids) in KEGG are outdated.
Thus, integrating the two DBs is not trivial.
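The slide does not say how the two DBs were reconciled. One plausible approach, sketched below in Python purely as an assumption, is to normalize genome and gene names before joining them, and to flag KEGG protein IDs that no longer appear in the current GenBank data as possibly outdated. It assumes each KEGG row carries an NCBI protein ID; the function names, records, and IDs are all hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Lower-case and strip everything except letters and digits, so that
    slightly different spellings ('K-12' vs 'K12', 'dnaA' vs 'dna-A') match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def integrate(kegg_rows, genbank_rows):
    """Map KEGG protein IDs to GenBank protein IDs via normalized
    (genome, gene) names, and list KEGG IDs missing from GenBank."""
    index = {(normalize(g), normalize(n)): pid for g, n, pid in genbank_rows}
    current_pids = set(index.values())
    mapping, outdated = {}, []
    for genome, gene, kegg_pid in kegg_rows:
        key = (normalize(genome), normalize(gene))
        if key in index:
            mapping[kegg_pid] = index[key]
        if kegg_pid not in current_pids:
            outdated.append(kegg_pid)  # candidate for an out-of-date pid
    return mapping, outdated


# Hypothetical rows: (genome name, gene name, protein id)
kegg = [
    ("Escherichia coli K-12 MG1655", "thrA", "P0001"),
    ("Escherichia coli K-12 MG1655", "dnaA", "P9999"),
]
genbank = [
    ("Escherichia coli K12 MG1655", "ThrA", "P0001"),
    ("Escherichia coli K12 MG1655", "dna-A", "P0002"),
]

mapping, outdated = integrate(kegg, genbank)
print(mapping)   # {'P0001': 'P0001', 'P9999': 'P0002'}
print(outdated)  # ['P9999']
```

Aggressive normalization can also merge names that genuinely differ, so matches found this way would probably need further checking, for example by comparing the protein sequences.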

13 The unique features of our system
Easy to maintain: only the latest datasets from KEGG and NCBI GenBank need to be downloaded.
Flexibility: sequence analysis is based on the combination of genomes and pathways the user chooses, and everything is computed on the fly.
Integration of the KEGG and NCBI GenBank DBs for sequence analysis.
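Computing results on the fly for a user-chosen set of genomes can be as simple as building the comparison table at request time instead of storing it. The Python sketch below assumes each genome's annotation is available as a set of EC numbers; the function name, genome names, and data are hypothetical.

```python
def compare_pathway(pathway_enzymes, annotations, selected_genomes):
    """Return {EC number: {genome: present?}} for the user's genomes only.

    pathway_enzymes:  set of EC numbers in the chosen pathway
    annotations:      genome name -> set of EC numbers annotated in that genome
    selected_genomes: genomes picked by the user, in display order
    """
    return {
        ec: {g: ec in annotations[g] for g in selected_genomes}
        for ec in sorted(pathway_enzymes)
    }


# Hypothetical annotations for three genomes.
annotations = {
    "genome_A": {"2.7.1.1", "5.3.1.9", "2.7.1.11"},
    "genome_B": {"2.7.1.1", "5.3.1.9"},
    "genome_C": {"5.3.1.9"},
}
pathway = {"2.7.1.1", "5.3.1.9", "2.7.1.11"}

# The user picks genome_A and genome_B; genome_C is never touched.
table = compare_pathway(pathway, annotations, ["genome_A", "genome_B"])
for ec, row in table.items():
    print(ec, row)
```

Because nothing is pre-computed, adding a genome only means adding its annotation set; the table for any selection is derived when it is requested.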

14 Methods
FASTA
ClustalW
HMMER
A series of modules

15 Infrastructure
[Diagram. Components: a query protein sequence, a pathway, a reference genome, genomes of interest, protein information, pathway information, and search for missing genes.]
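The "search for missing genes" step is not detailed on the slide. A minimal sketch, assuming a gene counts as missing when a pathway enzyme present in the reference genome has no annotation in a genome of interest, is to collect the reference genome's proteins for those enzymes as query sequences for a similarity search (for example with FASTA or HMMER, as listed under Methods). The function name, protein IDs, and data are hypothetical, and the search itself is not shown.

```python
def missing_gene_queries(pathway_enzymes, reference_proteins, target_ecs):
    """List (EC number, reference protein id) pairs to use as search queries.

    pathway_enzymes:    set of EC numbers in the chosen pathway
    reference_proteins: EC number -> protein ids in the reference genome
    target_ecs:         set of EC numbers annotated in the genome of interest
    """
    queries = []
    # Enzymes in the pathway that the genome of interest appears to lack.
    for ec in sorted(pathway_enzymes - target_ecs):
        # Only enzymes the reference genome actually has can yield queries.
        for pid in reference_proteins.get(ec, []):
            queries.append((ec, pid))
    return queries


pathway = {"2.7.1.1", "5.3.1.9", "2.7.1.11"}
reference = {"2.7.1.1": ["ref_p1"], "5.3.1.9": ["ref_p2"], "2.7.1.11": ["ref_p3", "ref_p4"]}
target = {"2.7.1.1", "5.3.1.9"}

print(missing_gene_queries(pathway, reference, target))
# [('2.7.1.11', 'ref_p3'), ('2.7.1.11', 'ref_p4')]
```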

16 PLATCOM-Metabolic Pathway Division

17 Sample Result (1)

18 Sample Result (2)

19 Sample Result (3)

20 Online Demonstration
PLATCOM: A Platform for Computational Comparative Genomics

21 Future Direction
Use conserved domains to perform HMM searches.
Enable sequence alignment and pattern search.
Connect to other DBs, such as protein-protein interaction DBs and the PDB.
Improve performance by using a dynamic cache.

22 References
Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.-L. (2000). The large-scale organization of metabolic networks. Nature, 407.
Karp, P. D. (2001). Pathway databases: a case study in computational symbolic theories. Science, 293.

23 Acknowledgments
Professor Sun Kim
Kwangmin Choi
Arvind Gopu

