Biocyc.org Identify Pathway Hole Fillers Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified in the PGDB. holes.

Slides:



Advertisements
Similar presentations
In Silico Primer Design and Simulation for Targeted High Throughput Sequencing I519 – FALL 2010 Adam Thomas, Kanishka Jain, Tulip Nandu.
Advertisements

SRI International Bioinformatics Comparative Analysis Q
Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
SRI International Bioinformatics 1 Orthology-Based Multi-PGDB Curation Tools Suzanne Paley Pathway Tools Workshop 2010.
Overviews and Omics Viewers. SRI International Bioinformatics Introduction Each overview is a genome-scale diagram of a different aspect of the cellular.
Overview of the Pathway Tools Software and Pathway/Genome Databases.
Profiles for Sequences
Assuming normally distributed data! Naïve Bayes Classifier.
Introduction to the Pathway Tools Software David Walsh and Simon Eng bigDATA Workshop—May 29, 2010.
©CMBI 2005 Search tools Google, MRS, SRS. ©CMBI 2004 Search tools SRS = Sequence Retrieval System MRS = Maarten’s Retrieval System Google = Thé best generic.
陳虹瑋 國立陽明大學 生物資訊學程 Genome Engineering Lab. Genome Engineering Lab The Newest.
Genome Annotation BCB 660 October 20, From Carson Holt.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
BLAST: Basic Local Alignment Search Tool Urmila Kulkarni-Kale Bioinformatics Centre University of Pune.
Enzymatic Function Module (KEGG, MetaCyc, and EC Numbers)
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Chapter Seven Advanced Shell Programming. 2 Lesson A Developing a Fully Featured Program.
Public Resources (II) – Analysis tools  Web-based analysis tools – easy to use, but often with less customization options.  Stand-alone analysis tools.
Multiple testing correction
SRI International Bioinformatics 1 Pathway Tools: Recent Developments GMOD Meeting, June 2006.
Metagenomic Analysis Using MEGAN4
Overviews, Omics Viewers, and Object Groups. SRI International Bioinformatics Introduction Each overview is a genome-scale diagram of cellular machinery.
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
Overviews and Omics Viewers. SRI International Bioinformatics Introduction Each overview is a genome-scale diagram of cellular machinery l Cellular Overview.
BINF6201/8201 Hidden Markov Models for Sequence Analysis
BASys: A Web Server for Automated Bacterial Genome Annotation Gary Van Domselaar †, Paul Stothard, Savita Shrivastava, Joseph A. Cruz, AnChi Guo, Xiaoli.
The BioCyc Collection of Pathway/Genome Databases Alexander Shearer Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org.
SRI International Bioinformatics 1 Recent Developments in Pathway Tools GMOD Workshop November ‘07 Suzanne Paley Bioinformatics Research Group SRI International.
NGS Bioinformatics Workshop 1.5 Tutorial – Genome Annotation April 5th, 2012 IRMACS Facilitator: Richard Bruskiewich Adjunct Professor, MBB.
I529: Lab5 02/20/2009 AI : Kwangmin Choi. Today’s topics Gene Ontology prediction/mapping – AmiGo –
PathoLogic Pathway Predictor
The consistency Checker, or Overhauling a PGDB By Ron Caspi.
1 SRI International Bioinformatics GO Term Integration and Curation in Pathway Tools and EcoCyc Ingrid M. Keseler Bioinformatics Research Group SRI International.
Metagenomic Analysis Using MEGAN4 Peter R. Hoyt Director, OSU Bioinformatics Graduate Certificate Program Matthew Vaughn iPlant, University of Texas Super.
Whole Genome Repeat Analysis Package A Preliminary Analysis of the Caenorhabditis elegans Genome Paul Poole.
Cellular Overview and Omics Viewer. SRI International Bioinformatics The Cellular Overview Diagram A way to quickly visualize an organism’s metabolism.
SRI International Bioinformatics 1 SmartTables & Enrichment Analysis Peter Karp SRI Bioinformatics Research Group September 2015.
WFM 6311: Climate Risk Management © Dr. Akm Saiful Islam WFM 6311: Climate Change Risk Management Akm Saiful Islam Lecture-7:Extereme Climate Indicators.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
I. Prolinks: a database of protein functional linkage derived from coevolution II. STRING: known and predicted protein-protein associations, integrated.
Introduction to biological molecular networks
DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.
SRI International Bioinformatics 1 Editing Pathway/Genome Databases Ron Caspi.
SRI International Bioinformatics 1 Pathway Tools Features Available Only in the Desktop Version PathoLogic.
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder. 2.Start a CMD window,
Welcome to the combined BLAST and Genome Browser Tutorial.
SRI International Bioinformatics Selected PathoLogic Refining Tasks Creation of Protein Complexes Assignment of Modified Proteins Operon Prediction.
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
Recent Developments and Future Directions in Pathway Tools Peter D. Karp SRI International.
PROTEIN IDENTIFIER IAN ROBERTS JOSEPH INFANTI NICOLE FERRARO.
DNA / protein sequence analysis 第九組成員: 吳宇軒 侯卜夫 朱子豪 王俊偉
CACI Proprietary Information | Date 1 PD² SR13 Client Upgrade Name: Semarria Rosemond Title: Systems Analyst, Lead Date: December 8, 2011.
bacteria and eukaryotes
Comparative Analysis in BioCyc
Why Create a PGDB? Perform pathway analyses as part of a genome project Analyze omics data Create a central public information resource for the organism,
Bioinformatics Capstone Project
A Community Effort to Model the Human Microbiome
Genome Center of Wisconsin, UW-Madison
Predicting Active Site Residue Annotations in the Pfam Database
Comparative Analysis Q
Annotation Presentation
Basic Local Alignment Search Tool
Explore Evolution: Instrument for Analysis
False discovery rate estimation
Overview of the Pathway Tools FBA Module
Part II SeqViewer AraCyc Help
BLAST Slides adapted & edited from a set by
BLAST Slides adapted & edited from a set by
Presentation transcript:

Biocyc.org Identify Pathway Hole Fillers Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified in the PGDB. holes are indicated by purple lines L-aspartate iminoaspartate quinolinate nicotinate nucleotide deamido-NAD NAD quinolinate synthetase nadA pyrophosphorylase nadC NAD+ synthetase NH 3 -dependent CC

Biocyc.org organism 1 enzyme A organism 2 enzyme A organism 5 enzyme A organism 4 enzyme A organism 3 enzyme A organism 8 enzyme A organism 7 enzyme A organism 6 enzyme A Step 1: collect query isozymes of function A based on EC# Step 2: BLAST against target genome Step 3 & 4: Consolidate hits and evaluate evidence Candidates Gene X Gene Y Gene Z gene Y gene X gene Z target genome Algorithm for identifying candidates and consolidating data…

Biocyc.org Features used to calculate the probability that a protein has the desired function… Best E-valueBest E-value Avg. rankAvg. rank Avg % alignedAvg % aligned Number of query sequences alignedNumber of query sequences aligned Candidate in same directon as another pathway gene?Candidate in same directon as another pathway gene? Candidate is adjacent to a gene that catalyzes an adjacent reaction?Candidate is adjacent to a gene that catalyzes an adjacent reaction? Candidate catalyzes another pathway reaction?Candidate catalyzes another pathway reaction?

Biocyc.org Use Bayesian classifier to evaluate candidates protein has function A AverageRank BestE-valueAverage % aligned # Query Sequences Same Gene? Adjacent Gene? Gene from Operon?

Biocyc.org Computing P(has function) Apply Bayes’ rule: protein has function A Average Rank Same Operon? Compute probability distributions, i.e., P(evidence|true) and P(evidence|false), from the “known” reactions in the database. e.g., Same operon? 0.96 (TN)0.76 (FN)no 0.04 (FP)0.24 (TP)yes False hit ~Has-Fn(A) True hit Has-Fn(A) In operon?

Biocyc.org Computing P(has function) Example: Candidate X has avg-rank 1.5 and is in a directon with another pathway gene. From training data: P(average-rank = 1.5 | has-function) = 0.40 P(average-rank = 1.5 |  has-function) = 0.03 P(pathway-directon = true | has-function) = 0.24 P(pathway-directon = true |  has-function) = 0.04 P(has-function) = (4.1% of candidates in training data are true hits) protein has function A Average Rank Same Operon?

Biocyc.org Steps that must be completed before running the Pathway Hole Filler Install BLAST executable (should already be installed on training room machines)Install BLAST executable (should already be installed on training room machines) Prepare BLAST protein dbPrepare BLAST protein db –Need FASTA format genome nucleotide sequence (see me if you have something different, like ESTs, or have no nucleotide sequence data file) In general, the more pathways in your PGDB, the more candidates the pathway hole filler will have to findIn general, the more pathways in your PGDB, the more candidates the pathway hole filler will have to find

Biocyc.org Conceptual stages of the pathway hole filler 1. Prepare training data for Bayes classifier Collect feature data for known rxns in PGDB Collect feature data for known rxns in PGDB Calculate probability distributions for classifier Calculate probability distributions for classifier 2. Identify and evaluate candidates Collect feature data for each candidate Use classifier to determine P(has-function) 3. Choose holes to fill in KB Either select all above a cut-off or manually review candidates

Biocyc.org Navigating to the Pathway Hole Filler

Biocyc.org Step 1: Prepare Training Data… Calculate training data from your organism or use existing training data… Once Step 1 has been completed, the training data are saved and can be reused (even in another Pathway Tools session). If using existing data from E. coli the training data are based on data from the literature.

Biocyc.org Step 2: Identify & Evaluate Candidates…

Biocyc.org Step 2: Identify & Evaluate Candidates Select reactions from a list Select pathways from a list

Biocyc.org Modes of operation… Fully automatic No interaction required from userNo interaction required from user All default values usedAll default values used –Prepare training data – all known rxns in KB –Identify and evaluate candidates – all pathways with pathway holes –Choose holes to fill in KB – all holes with P>0.9 filled Evidence code: “Automatic inference from sequence similarity”Evidence code: “Automatic inference from sequence similarity”

Biocyc.org Modes of operation… Wizard Wizard prompts user for training data source and for which holes to make predictions. Wizard runs Steps 1 & 2, then prompts user to complete Step 3. Power-user mode User must proceed through each step in order. Program still prompts user for required parameters, but each step must be completed before advancing to next step.

Biocyc.org Step 3: Choose Holes to Fill in KB

Biocyc.org

Output from Pathway Hole Filler - from “Prepare Training Data” step ROOT/ptools-local/pgdbs/user/ORGIDcyc/VERSION/data/ (e.g., ROOT/ptools-local/pgdbs/user/caulocyc/1.0/data/) rxn-list = data retrieved from ORGID for calculating training datarxn-list = data retrieved from ORGID for calculating training data priors/ = directory containing training data that is loaded when using existing data from ORGIDpriors/ = directory containing training data that is loaded when using existing data from ORGID These files contain the training data computed in Step 1. If either file is available, the user may use “existing” training data in Step 1.These files contain the training data computed in Step 1. If either file is available, the user may use “existing” training data in Step 1. * Each file is overwritten each time you run this step.

Biocyc.org Output from Pathway Hole Filler - from “Identify and Evaluate Candidates” step ROOT/ ptools-local/pgdbs/user/ ORGIDcyc/VERSION/reports/ (e.g., ROOT/ptools-local/pgdbs/user/caulocyc/1.0/reports/) ORGID_filled-holes.html = the list of holes that user selected to fill in the KB in Step 3.ORGID_filled-holes.html = the list of holes that user selected to fill in the KB in Step 3. ORGIDholesX-Y.html (e.g., CAULOholes0-10.html)ORGIDholesX-Y.html (e.g., CAULOholes0-10.html) blasterrors.log = log of each rxn describing whether or not any candidates were foundblasterrors.log = log of each rxn describing whether or not any candidates were found hole-data = file containing data found for each rxn, used to generate list in “Choose holes to fill in KB” dialogue. If this file is available, step 3 can be initiated without repeating Step 2.hole-data = file containing data found for each rxn, used to generate list in “Choose holes to fill in KB” dialogue. If this file is available, step 3 can be initiated without repeating Step 2. * Each file is overwritten each time you run this step.

Biocyc.org Reference for the Pathway Hole Filler Green, ML and Karp, PD. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 2004, 5:76.

Biocyc.org Pathway Hole Filler Demo (1) Prerequisites: HpyCyc installedHpyCyc installed BLAST installed and workingBLAST installed and working For EcoCyc, the data/priors/ directory neededFor EcoCyc, the data/priors/ directory neededDemo: Using Power User mode, to save timeUsing Power User mode, to save time Select HpyCycSelect HpyCyc Refine->PHF->Step 1: Prepare Training DataRefine->PHF->Step 1: Prepare Training Data In popup, select HpyCyc and 2-3 reactionsIn popup, select HpyCyc and 2-3 reactions

Biocyc.org Pathway Hole Filler Demo (2) once more:once more: Refine->PHF->Step 1: Prepare Training Data Refine->PHF->Step 1: Prepare Training Data In popup, select EcoCyc and say Yes toIn popup, select EcoCyc and say Yes to use existing Training Data use existing Training Data Refine->PHF->Step 2: Identify CandidatesRefine->PHF->Step 2: Identify Candidates –In popup, select Pathways from a List –Select Pyridnucsyn-Pwy Refine->PHF->Step 3: Choose Holes to Fill in KBRefine->PHF->Step 3: Choose Holes to Fill in KB