Introduction to the Pathway Tools Software
David Walsh and Simon Eng
bigDATA Workshop, May 29, 2010
Outline
1. Metabolic annotation from genomic data
2. Genome/pathway annotation with Pathway Tools and MetaCyc
3. Pathway/genome database (PGDB) exploration
4. Potential applications in environmental 'omics
Metabolic Annotation Workflow
1. Assembly of DNA sequence from an organism
2. Identification of protein-encoding genes: GLIMMER, FGENESB
3. Functional annotation of genes: InterProScan, BLAST (COG, KEGG, etc.)
4. Inference of metabolic pathways: Microsoft Excel, Illustrator, web browser
5. Generation of a cellular/community metabolic workflow: Illustrator
(Diagram key: automated vs. manual steps)

Pathway Tools is a software suite for working with pathway/genome databases.
Pathway/Genome Databases (a.k.a. PGDBs) integrate genomic data with detailed functional annotations (including metabolic pathways)
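To make "integrate genomic data with functional annotations" concrete, here is a minimal sketch of the links a PGDB records: genes encode enzymes that catalyze reactions, and reactions make up pathways. The class names, fields, and identifiers are hypothetical simplifications, not the Pathway Tools schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified PGDB-style records. Real Pathway Tools frames
# hold many more attributes (citations, compartments, regulation, ...).
@dataclass
class Gene:
    gene_id: str
    product: str  # product name assigned during functional annotation


@dataclass
class Reaction:
    reaction_id: str
    ec_number: str | None = None
    enzymes: list[Gene] = field(default_factory=list)  # genes whose products catalyze it


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction] = field(default_factory=list)


def pathways_for_gene(gene_id: str, pathways: list[Pathway]) -> list[str]:
    """Follow gene -> reaction -> pathway links to list a gene's pathways."""
    return [
        p.name
        for p in pathways
        if any(g.gene_id == gene_id for r in p.reactions for g in r.enzymes)
    ]


# Usage with made-up identifiers:
arg_a = Gene("argA", "N-acetylglutamate synthase")
step1 = Reaction("RXN-1", "2.3.1.1", [arg_a])
pathway = Pathway("arginine biosynthesis", [step1])
print(pathways_for_gene("argA", [pathway]))  # ['arginine biosynthesis']
```

Because the links run in both directions, the same structure can answer the reverse question (which genes support a given pathway), which is what pathway-level views of a genome rely on.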
The Three Tiers of PGDBs
- Tier 1: literature derived; 2 PGDBs (MetaCyc and EcoCyc)
- Tier 2: computationally derived; 15 PGDBs (HumanCyc, AraCyc, and other model organisms)
- Tier 3: no curation; ~400 PGDBs
Pathway Tools and PGDBs
- Pathway Tools components: PGDB viewer, PGDB editor, PathoLogic
- PGDBs: MetaCyc, EcoCyc, YourCyc
PGDB Viewer: Genome Overview
PGDB Viewer: Pathway Details
PGDB Viewer: Cellular Overview
MetaCyc is a continuously curated PGDB of reference pathways from organisms encompassing all domains of life
MetaCyc Statistics by Year (figure adapted from …)
Series shown: metabolic pathways, reactions, enzymes, genes, chemical compounds, organisms, citations
The Metabolic Hierarchy of PGDBs
Metabolic pathways in PGDBs are organized in hierarchies in which each pathway can have superpathways, subpathways, and variants.
Example: superpathway of arginine and ornithine degradation
- Superclasses: Degradation/Utilization/Assimilation > Amino Acids Degradation > Arginine Degradation; Superpathways
- Subpathways: 4-aminobutyrate degradation I; arginine degradation III (arginine decarboxylase/agmatinase pathway); superpathway of arginine, putrescine, and 4-aminobutyrate degradation; superpathway of ornithine degradation; putrescine degradation II; putrescine degradation I
- Variants: arginine degradation II (AST pathway)
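Because a pathway can sit under several classes at once (here both "Arginine Degradation" and "Superpathways"), the class hierarchy behaves like a directed acyclic graph rather than a simple tree. The sketch below, which assumes a hypothetical child-to-parents dictionary built from the slide's example, collects every class above a pathway.

```python
# Class links taken from the slide's example; storing them as a
# child -> parents dictionary is an assumption made for this sketch.
CLASS_PARENTS = {
    "superpathway of arginine and ornithine degradation": [
        "Arginine Degradation",
        "Superpathways",
    ],
    "Arginine Degradation": ["Amino Acids Degradation"],
    "Amino Acids Degradation": ["Degradation/Utilization/Assimilation"],
}


def ancestors(node, parents):
    """Collect every class above `node`; a pathway may have several parents."""
    found = []
    stack = list(parents.get(node, []))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.append(cls)
            stack.extend(parents.get(cls, []))
    return found


def in_class(node, cls, parents=CLASS_PARENTS):
    """True if `cls` is anywhere above `node` in the hierarchy."""
    return cls in ancestors(node, parents)


sp = "superpathway of arginine and ornithine degradation"
print(sorted(ancestors(sp, CLASS_PARENTS)))
print(in_class(sp, "Amino Acids Degradation"))  # True
```

Walking all parent links (instead of a single parent) is what lets one pathway show up when browsing either "Amino Acids Degradation" or "Superpathways".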
New metabolic pathways can be created in Pathway Tools at any time, so PGDBs are dynamic rather than static.
PathoLogic predicts metabolic pathways in a PGDB based on reference PGDBs
PathoLogic reads GenBank files and nucleotide FASTA files; it also has its own native input format.
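Annotations from tools such as InterProScan or BLAST are often converted into PathoLogic's native format before import. The sketch below assumes that format is a tab-separated attribute/value layout (ID, NAME, STARTBASE, ENDBASE, FUNCTION, PRODUCT-TYPE, EC, GO) with "//" ending each record; treat those attribute names as assumptions and check them against the Pathway Tools documentation for your version. The helper name `to_pf_record` and the gene ID are hypothetical.

```python
def to_pf_record(gene_id, start, end, function, ec_numbers=(), go_terms=()):
    """Format one gene as a PathoLogic-format record (assumed layout).

    Each line is ATTRIBUTE<TAB>value; a line containing only '//'
    ends the record.
    """
    lines = [
        f"ID\t{gene_id}",
        f"NAME\t{gene_id}",
        f"STARTBASE\t{start}",
        f"ENDBASE\t{end}",
        f"FUNCTION\t{function}",
        "PRODUCT-TYPE\tP",  # assumed code for a protein-coding gene
    ]
    lines += [f"EC\t{ec}" for ec in ec_numbers]  # one line per EC number
    lines += [f"GO\t{go}" for go in go_terms]  # one line per GO term
    lines.append("//")
    return "\n".join(lines) + "\n"


# Usage: one hypothetical ORF from a fosmid annotated as arginase.
print(to_pf_record("fosmid1_orf12", 1, 900, "arginase", ["3.5.3.1"]), end="")
```

Writing one record per gene and concatenating them gives a single file that can sit alongside the nucleotide sequence of the same contig.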
PathoLogic infers metabolic pathways from functional annotations:
- product names
- EC numbers
- GO terms
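Before pathways can be inferred, each annotated gene has to be matched to reactions in the reference database. The sketch below illustrates the idea with a hypothetical lookup table standing in for MetaCyc: it tries EC numbers first, then falls back to an exact, case-insensitive product-name match. PathoLogic's real matching is more elaborate, and GO-term matching is omitted here.

```python
# Hypothetical reference tables standing in for MetaCyc; reaction IDs are made up.
EC_TO_REACTION = {
    "3.5.3.1": "rxn:arginase",
    "4.1.1.19": "rxn:arginine-decarboxylase",
    "3.5.3.11": "rxn:agmatinase",
}
NAME_TO_REACTION = {
    "arginase": "rxn:arginase",
    "arginine decarboxylase": "rxn:arginine-decarboxylase",
    "agmatinase": "rxn:agmatinase",
}


def assign_reactions(annotations):
    """Map gene annotations to reference reactions.

    EC numbers are tried first; if none match, the product name is
    compared (lowercased, exact match) against known enzyme names.
    Genes with no match are left out of the result.
    """
    assigned = {}
    for gene_id, ann in annotations.items():
        rxns = {EC_TO_REACTION[ec] for ec in ann.get("ec", []) if ec in EC_TO_REACTION}
        if not rxns:
            name = ann.get("product", "").strip().lower()
            if name in NAME_TO_REACTION:
                rxns = {NAME_TO_REACTION[name]}
        if rxns:
            assigned[gene_id] = sorted(rxns)
    return assigned


annotations = {
    "orf1": {"product": "Arginase", "ec": []},
    "orf2": {"product": "hypothetical protein", "ec": ["4.1.1.19"]},
    "orf3": {"product": "hypothetical protein"},
}
print(assign_reactions(annotations))
```

Here orf1 is matched by name, orf2 by EC number, and orf3 stays unassigned, which is why vague product names such as "hypothetical protein" limit pathway inference.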
PathoLogic applies an iterative algorithm that keeps track of candidate and undecided pathways
How PathoLogic Works
1. Determine candidate and undecided pathways
2. Prune pathways using "keep tests" and "delete tests"
3. Keep the remaining candidate and undecided pathways
How PathoLogic Works
Iterative algorithm that tracks candidate and undecided pathways
"Keep tests"
- "Mostly present" pathways
- Pathways with unique reactions present
- Pathways with "key reactions" present
"Delete tests"
- "Mostly absent" pathways
- Biosynthetic pathways lacking final steps
- Degradative pathways missing initial steps
- Pathways missing "key reactions"
Demonstration on two rRNA-containing fosmids of an uncultivated organism
Resources: Pathway Tools, MetaCyc