1
陳虹瑋, National Yang-Ming University, Bioinformatics Program, Genome Engineering Lab
2
The Newest
3
What do they evaluate?
They seek to determine the accuracy of different computational methods for predicting metabolic pathways:
- A comparison of different reference PGDBs using the same early prediction algorithm (the resulting PGDBs are called HpyCyc-1 and HpyCyc-2)
- A comparison of different prediction algorithms using MetaCyc as the reference PGDB (the resulting PGDBs are called HpyCyc-2A and HpyCyc-2B)
- A comparison of the HpyCyc-2B predictions with a manual prediction for H. pylori
- False positives
4
What is a PGDB (Pathway/Genome Database)?
A database that describes the genome of an organism (its chromosome(s), genes, and genome sequence), the product of each gene, the biochemical reaction(s) catalyzed by each gene product, the substrates of each reaction, and the organization of reactions into pathways.
A PGDB is not only a database; it also comes with software for working with it.
- MetaCyc database: a PGDB containing metabolic data for more than 150 organisms, including EcoCyc.
- EcoCyc database: a PGDB for the organism E. coli. Most of the information in EcoCyc is derived from the biomedical literature.
PGDBs are built on a frame-based knowledge system, with flat files used as input.
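To make the relationships in this definition concrete, here is a minimal sketch of a PGDB as an in-memory model: genes have products, products catalyze reactions, and reactions are grouped into pathways. The class and field names (`Reaction`, `Gene`, `Pathway`, `PGDB`, `genes_in_pathway`) are hypothetical and are not the frame-based schema that Pathway Tools actually uses; chromosomes and sequences are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reaction:
    rxn_id: str                  # hypothetical reaction identifier
    ec_number: Optional[str]     # EC number, if one is assigned
    substrates: List[str]
    products: List[str]


@dataclass
class Gene:
    name: str
    product: str                 # name of the gene product (e.g. an enzyme)
    catalyzes: List[Reaction] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction]


@dataclass
class PGDB:
    organism: str
    genes: List[Gene] = field(default_factory=list)
    pathways: List[Pathway] = field(default_factory=list)

    def genes_in_pathway(self, pathway: Pathway) -> List[str]:
        """Names of genes whose products catalyze at least one reaction of the pathway."""
        ids = {r.rxn_id for r in pathway.reactions}
        return [g.name for g in self.genes if any(r.rxn_id in ids for r in g.catalyzes)]


# Usage with made-up identifiers:
pfk = Reaction("RXN-1", "2.7.1.11", ["fructose 6-phosphate", "ATP"], ["fructose 1,6-bisphosphate", "ADP"])
db = PGDB("toy organism", genes=[Gene("geneA", "6-phosphofructokinase", [pfk])])
db.pathways.append(Pathway("toy pathway", [pfk]))
print(db.genes_in_pathway(db.pathways[0]))  # ['geneA']
```

Linking genes to reactions (rather than directly to pathways) is what lets a pathway be reached from a gene and vice versa.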
5
Pathway Tools Software
Pathway Tools is software used to construct, update, visualize, query, and analyze PGDBs. Its three components are:
- The Pathway/Genome Navigator supports querying, visualization, and analysis of PGDBs.
- The Pathway/Genome Editors support interactive updating and refinement of PGDBs.
- The PathoLogic pathway-prediction program supports automated creation of a PGDB and prediction of the metabolic pathway complement of an organism.
6
The PathoLogic Program
PathoLogic predicts the metabolic pathways of an organism from its annotated genome and produces a new PGDB. Its first input is the annotated genome in GenBank format. The second input required by PathoLogic is a reference pathway database, such as EcoCyc or MetaCyc.
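The enzyme information PathoLogic needs from the GenBank file lives in the `/product` and `/EC_number` qualifiers of each CDS feature. The sketch below pulls those out with plain regular expressions, assuming the standard feature-table column layout (feature keys starting in column 6, qualifiers in column 22). The function name `parse_cds_products` is hypothetical, and a full GenBank parser such as Biopython's would be the more robust choice in practice.

```python
import re
from typing import Dict, List, Tuple

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S")            # feature key in column 6, e.g. "CDS"
QUALIFIER_RE = re.compile(r"^ {21}/(\w+)(?:=(.*))?$")  # qualifier in column 22, e.g. /product=


def parse_cds_products(genbank_text: str) -> List[Tuple[List[str], List[str]]]:
    """Return (product names, EC numbers) for every CDS feature in a GenBank record."""
    cds_list: List[Dict[str, List[str]]] = []
    current = None   # qualifiers of the CDS being read, or None outside a CDS
    last_key = None  # qualifier that a continuation line belongs to
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            break  # the sequence follows; the feature table is over
        feature = FEATURE_RE.match(line)
        if feature:
            current = {"product": [], "EC_number": []} if feature.group(1) == "CDS" else None
            if current is not None:
                cds_list.append(current)
            last_key = None
            continue
        if current is None:
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier:
            last_key = qualifier.group(1)
            if last_key in current:
                current[last_key].append((qualifier.group(2) or "").strip('"'))
        elif line.startswith(" " * 21) and last_key in current:
            # continuation line of a multi-line quoted value
            current[last_key][-1] += " " + line.strip().strip('"')
    return [(c["product"], c["EC_number"]) for c in cds_list]


# A tiny made-up record, built line by line so the column layout is exact.
F, Q = " " * 5, " " * 21
SAMPLE = "\n".join([
    "FEATURES             Location/Qualifiers",
    F + "source          1..2000",
    Q + '/organism="toy organism"',
    F + "CDS             100..500",
    Q + '/gene="geneA"',
    Q + '/product="6-phosphofructokinase"',
    Q + '/EC_number="2.7.1.11"',
    F + "CDS             complement(600..1200)",
    Q + '/product="putative glucose-6-phosphate',
    Q + 'isomerase"',
    "ORIGIN",
])
print(parse_cds_products(SAMPLE))
```

Each resulting pair (names, EC numbers) corresponds to the `(name, ecnum)` input of the enzyme-to-reaction matching step; note that the second CDS has no EC number, so only its name can be used.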
7
The PathoLogic Algorithm
Linking enzymes to reactions: the algorithm matches the enzyme names and EC numbers listed in the annotated genome against the reference database. The matching is based on the functions assigned to individual genes by the genome center that annotated the genome. The procedure LinkEnzymesToReactions(name, ecnum) accepts as input one or more alternative gene product names for a single enzymatic activity.
8
The PathoLogic Algorithm (cont.)
It also accepts an EC number from a single GenBank coding region, and returns as output up to two reactions that correspond to that enzyme activity. The algorithm is as follows:
9
The PathoLogic Algorithm Flow
Input: (name, EC number)
- Convert the input name to its canonical form E.
- Match by EC number: look for a reaction with that EC number in MetaCyc. If one is found, store it in R1.
- Build a hash table H that maps each MetaCyc enzyme name, converted to canonical form, to its reaction.
- Look up E in H. If it matches, store the reaction in R2; if not, compute variant forms of E and look them up.
10
The PathoLogic Algorithm Flow (cont.)
- If R1 and R2 differ, report to the user that the enzyme name and the EC number are inconsistent.
- If R1 exists, create a connection within the DB between the current enzyme and R1.
- If R2 exists, create a connection within the DB between the current enzyme and R2.
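The two flow slides can be sketched in Python as follows. This is an illustrative reconstruction, not PathoLogic's code: the names `canon`, `build_name_table`, `link_enzymes_to_reactions`, `ec_index`, and the reaction IDs are hypothetical, the canonicalization rule is an assumption, and "create a connection" is modelled simply as returning the linked reactions. A variant-form generator can be passed in through the `variant_forms` parameter.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Set


def canon(name: str) -> str:
    """Assumed canonical form: lowercase, punctuation (except hyphens) to spaces, collapsed whitespace."""
    cleaned = re.sub(r"[^\w\s-]", " ", name.lower())
    return " ".join(cleaned.split())


def build_name_table(reference_names: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Hash table H: canonical reference enzyme name -> reactions it catalyzes."""
    table: Dict[str, Set[str]] = {}
    for name, reactions in reference_names.items():
        table.setdefault(canon(name), set()).update(reactions)
    return table


def link_enzymes_to_reactions(
    names: Iterable[str],
    ecnum: Optional[str],
    ec_index: Dict[str, Set[str]],
    table: Dict[str, Set[str]],
    variant_forms: Callable[[str], List[str]] = lambda name: [],
    report: Callable[[str], None] = print,
) -> Set[str]:
    names = list(names)
    # R1: reactions matched through the EC number, if the GenBank entry has one.
    r1: Set[str] = set(ec_index.get(ecnum, set())) if ecnum else set()
    # R2: reactions for the first name found in H; each name is tried before its variants.
    r2: Set[str] = set()
    for name in names:
        candidates = [name] + variant_forms(name)
        hits = [table[canon(c)] for c in candidates if canon(c) in table]
        if hits:
            r2 = set(hits[0])
            break
    # Assumption: report only when both lookups succeeded but disagree.
    if r1 and r2 and r1 != r2:
        report(f"Enzyme name(s) {names} and EC number {ecnum} are inconsistent")
    # Link the enzyme to the reactions in R1 and in R2.
    return r1 | r2


# Usage with made-up reaction IDs:
ec_index = {"2.7.1.11": {"PFK-RXN"}}
H = build_name_table({"6-phosphofructokinase": {"PFK-RXN"}, "glucose-6-phosphate isomerase": {"PGI-RXN"}})
print(link_enzymes_to_reactions(["6-Phosphofructokinase"], "2.7.1.11", ec_index, H))  # {'PFK-RXN'}
```

Keeping R1 and R2 separate until the end is what makes the consistency check possible: when name and EC number point to different reactions, both links are kept and the user is warned.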
11
Why compute variant forms?
When computing variant forms of E, the program attempts to remove extraneous text that is frequently found in GenBank files, such as:
- prefix and suffix words added to the enzyme name, like "putative", "probable", "alpha chain", and "large subunit";
- parenthesized gene names that follow the product name in some GenBank entries.
Even so, they found that 10-20% of the enzymes in a given genome are not identified because their names are not found in H; these must be matched manually.
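A sketch of the variant-form step, assuming a small hard-coded word list built only from the examples on this slide. The function name `variant_forms` and the rule order (strip one trailing parenthesized term, then known prefixes, then known suffixes) are illustrative assumptions, not PathoLogic's actual rule set.

```python
import re
from typing import List

# Illustrative word lists taken from the slide's examples; a real list would be longer.
PREFIXES = ["putative", "probable"]
SUFFIXES = ["alpha chain", "large subunit"]


def variant_forms(name: str) -> List[str]:
    """Return progressively cleaned-up variants of a GenBank product name."""
    variants: List[str] = []
    current = name.strip()
    # 1. Drop a parenthesized gene name that follows the product name, e.g. "... (pfkA)".
    no_paren = re.sub(r"\s*\([^()]*\)\s*$", "", current)
    if no_paren != current:
        current = no_paren
        variants.append(current)
    # 2. Drop leading qualifier words such as "putative".
    for prefix in PREFIXES:
        if current.lower().startswith(prefix + " "):
            current = current[len(prefix) + 1:].strip()
            variants.append(current)
    # 3. Drop trailing chain/subunit words.
    for suffix in SUFFIXES:
        if current.lower().endswith(" " + suffix):
            current = current[: -len(suffix) - 1].strip()
            variants.append(current)
    return variants


print(variant_forms("putative 6-phosphofructokinase (pfkA)"))
# ['putative 6-phosphofructokinase', '6-phosphofructokinase']
```

Each variant is then canonicalized and looked up in H; names that still find no match after this cleanup are the 10-20% left for manual work.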
12
Inferring pathways
Once the matching process is complete, the program has inferred a set of reactions expected to occur in the target organism. The remaining task is to determine which of the reference pathways are likely to be present in the organism.
13
The evidence for inferring pathways
If there is evidence for some, but not all, of the reactions in a pathway, there are three possible interpretations:
1. The pathway is not present in the target organism.
2. A variant form of the pathway is present in the target organism that uses some, but not all, of the steps of the pathway as described in the reference database.
3. The pathway is present in the target organism, but the genes for the missing reaction steps either were not found by the name matcher or have not yet been identified in the genome.
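The inference step on these two slides starts from the set of inferred reactions and asks, for each reference pathway, how much of it is supported. The sketch below only tabulates that evidence (supported steps, total steps, missing steps); the slides do not give PathoLogic's exact decision rule, so no threshold is applied. The function name `pathway_evidence` and the toy pathway and reaction IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def pathway_evidence(
    pathways: Dict[str, List[str]], inferred: Set[str]
) -> Dict[str, Tuple[int, int, List[str]]]:
    """Map each reference pathway with at least one supported reaction to
    (supported reactions, total reactions, reactions still missing)."""
    report: Dict[str, Tuple[int, int, List[str]]] = {}
    for name, reactions in pathways.items():
        supported = [r for r in reactions if r in inferred]
        if not supported:
            continue  # no evidence at all: not a candidate in this sketch
        missing = [r for r in reactions if r not in inferred]
        report[name] = (len(supported), len(reactions), missing)
    return report


reference = {"toy-pathway-1": ["R1", "R2", "R3"], "toy-pathway-2": ["P1", "P2"]}
print(pathway_evidence(reference, {"R1", "R3"}))
# {'toy-pathway-1': (2, 3, ['R2'])}
```

A pathway like `toy-pathway-1`, with evidence for some steps and a missing `R2`, is exactly the case where the three interpretations above have to be weighed.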
14
Result: Comparison of HpyCyc-1 with HpyCyc-2A
They expected that using MetaCyc as the reference DB would make it possible to infer additional pathways that are not found in E. coli. With MetaCyc as the reference database, 135 pathways were predicted, as opposed to 77 pathways when EcoCyc was used as the reference DB.
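A quick worked check of the size of this difference, using only the two pathway counts above:

```latex
135 - 77 = 58, \qquad \frac{58}{77} \approx 0.75
```

That is, the MetaCyc-based prediction contained about 75% more pathways than the EcoCyc-based one.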
15
Result: Comparison of HpyCyc-2A with HpyCyc-2B
They created a new version of PathoLogic, called PathoLogic 2, containing an algorithm that identifies false-positive pathways. The enhanced algorithm removes only pathways that they believe to be false-positive predictions, based on the heuristic criteria on the next slide. HpyCyc-2B contains 98 pathways, almost 30% fewer than HpyCyc-2A.
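The "almost 30%" figure can be checked against the pathway counts of HpyCyc-2A (135 pathways, from the previous result) and HpyCyc-2B (98 pathways):

```latex
\frac{135 - 98}{135} = \frac{37}{135} \approx 0.274
```

i.e. a reduction of roughly 27%.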
16
Heuristic criteria for identifying false-positive pathways:
- The pathway contained evidence for no unique reactions.
- The pathway was classified as a biosynthetic pathway and was missing one or more steps from its end.
- The pathway was classified as a degradation pathway and was missing one or more steps from its beginning.
- The pathway consisted of more than two reactions but contained evidence for only a single reaction.
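The four criteria can be sketched as a filter over predicted pathways. Several assumptions are made here and are not stated on the slide: each pathway is a linear, ordered list of reaction IDs (first step first); "unique" is read as "not occurring in any other predicted pathway"; a pathway is flagged if any one criterion holds; and the names `is_false_positive`, `prune`, and the pathway class strings are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def is_false_positive(pathway: List[str], kind: str, inferred: Set[str],
                      other_pathways: List[List[str]]) -> bool:
    """Apply the four heuristic criteria to one predicted pathway."""
    supported = [r for r in pathway if r in inferred]
    # Criterion 1: no supported reaction is unique to this pathway.
    elsewhere = set().union(*other_pathways)
    if all(r in elsewhere for r in supported):
        return True
    # Criterion 2: biosynthetic pathway missing its final step(s).
    if kind == "biosynthesis" and pathway[-1] not in inferred:
        return True
    # Criterion 3: degradation pathway missing its first step(s).
    if kind == "degradation" and pathway[0] not in inferred:
        return True
    # Criterion 4: more than two reactions, but evidence for only one.
    return len(pathway) > 2 and len(supported) == 1


def prune(predicted: Dict[str, Tuple[str, List[str]]], inferred: Set[str]) -> List[str]:
    """Return the names of predicted pathways that survive the filter."""
    kept = []
    for name, (kind, reactions) in predicted.items():
        others = [rxns for other, (_, rxns) in predicted.items() if other != name]
        if not is_false_positive(reactions, kind, inferred, others):
            kept.append(name)
    return kept


predicted = {
    "toy-biosynthesis": ("biosynthesis", ["A1", "A2", "A3"]),
    "toy-incomplete-biosynthesis": ("biosynthesis", ["B1", "B2", "B3"]),  # last step missing
    "toy-degradation": ("degradation", ["C1", "C2"]),                     # first step missing
    "toy-sparse": ("other", ["D1", "D2", "D3"]),                          # one step supported
    "toy-shared": ("other", ["A1", "E2"]),                                # only a shared step
}
inferred = {"A1", "A2", "A3", "B1", "B2", "C2", "D1"}
print(prune(predicted, inferred))  # ['toy-biosynthesis']
```

Uniqueness here is measured against all other predicted pathways in one pass; an implementation could instead re-evaluate it after each removal, which may keep more pathways.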
17
Result: Comparison of HpyCyc-2B with the Manual Analysis
We compared the 98 pathways predicted in HpyCyc-2B with the results of a manual analysis of the pathways of H. pylori.
18
Discussion
Since the enzyme-to-reaction matching procedure is fairly conservative, we did not expect it to make many incorrect matches. The GenBank file contains few or no EC numbers.
19
Comparison with other pathway-prediction algorithms
The comparison is hampered by two factors. First, published prediction algorithms are not clearly described. Second, KEGG lie ……(about EC number and prediction).
About WIT: it is not clear whether the pathway-prediction process is automated or manual. WIT does seem to be much more selective in its predictions; it did not predict any obviously incorrect photosynthetic pathways for H. pylori. However, WIT failed to predict pathways such as glycolysis or the Entner-Doudoroff pathway.
20
Conclusion
- This study validates the usefulness of PathoLogic as a tool for the metabolic analysis of an organism's annotated genome.
- False positives increased when the reference database was changed from EcoCyc to MetaCyc.
- False positives can be decreased by PathoLogic 2.
- The predictions exceed the expert analysis in comprehensiveness.