PathoLogic: More about Matching Enzyme Names to Reactions

Slides:



Advertisements
Similar presentations
Editing Pathway/Genome Databases. SRI International Bioinformatics Pathway Tools Paradigm Separate database from user interface Navigator provides one.
Advertisements

SRI International Bioinformatics Data Import / Export Markus Krummenacker Bioinformatics Research Group SRI, International Q
Unbalanced Reactions by Markus Krummenacker Q
SRI International Bioinformatics Comparative Analysis Q
RETRIEVING DATA FROM FCC LICENSE DATABASE Steps for obtaining query results, and importing it into MS Excel Spreadsheet.
SRI International Bioinformatics 1 The consistency Checker, or Overhauling a PGDB By Ron Caspi.
Chapter 14 Getting to First Base: Introduction to Database Concepts.
Microsoft ® Office Excel ® 2007 Training Get started with PivotTable ® reports [Your company name] presents:
Microsoft ® Office Excel ® 2007 Training Get started with PivotTable ® reports Guangzhou Newelink Technology Co,. Ltd.
陳虹瑋 國立陽明大學 生物資訊學程 Genome Engineering Lab. Genome Engineering Lab The Newest.
Subsystem Approach to Genome Annotation National Microbial Pathogen Data Resource Claudia Reich NCSA, University of Illinois, Urbana.
Enzymatic Function Module (KEGG, MetaCyc, and EC Numbers)
PathoLogic Pathway Predictor. SRI International Bioinformatics Inference of Metabolic Pathways Pathway/Genome Database Annotated Genomic Sequence Genes/ORFs.
Remy Product Catalog Product Search/Cross References Product Details Application Search Model Search Service Parts.
Regular Expressions. String Matching The problem of finding a string that “looks kind of like …” is common  e.g. finding useful delimiters in a file,
1 Introducing Windows Backup There are different methods for starting Windows 2000 Backup. Requirements for running Windows 2000 Backup All users can back.
1 SRI International Bioinformatics Advanced PGDB Editing: Regulation GO Terms Ingrid M. Keseler Bioinformatics Research Group SRI International
XP New Perspectives on Microsoft Office Access 2003 Tutorial 12 1 Microsoft Office Access 2003 Tutorial 12 – Managing and Securing a Database.
First Screen : First window form will always remain open, for the user to select menu options. 1.
Computer Programming for Biologists Class 5 Nov 20 st, 2014 Karsten Hokamp
DNA Replication Lecture 7. DNA Replication  Synthesis of two new DNA duplexes based on complementary base sequences with parental DNA.  Is progressive,
SRI International Bioinformatics 1 Advanced Editing of Pathway/Genome Databases Ron Caspi.
Regular Expressions.
SRI International Bioinformatics 1 Recent Pathway Tools Performance Enhancements (Versions 13.0 to 14.5) Bioinformatics Research Group SRI International.
The consistency Checker, or Overhauling a PGDB By Ron Caspi.
WHAT IS A DATABASE? A DATABASE IS A COLLECTION OF DATA RELATED TO A PARTICULAR TOPIC OR PURPOSE OR TO PUT IT SIMPLY A GENERAL PURPOSE CONTAINER FOR STORING.
SRI International Bioinformatics 1 Submitting pathway to MetaCyc Ron Caspi.
PIRSF Classification System PIRSF: Evolutionary relationships of proteins from super- to sub-families Homeomorphic Family: Homologous proteins sharing.
Microsoft ® Office Excel 2003 Training Using XML in Excel SynAppSys Educational Services presents:
SRI International Bioinformatics 1 SmartTables & Enrichment Analysis Peter Karp SRI Bioinformatics Research Group September 2015.
Blackboard 8: Grade Center This workshop is for existing users of Blackboard interested in keeping track of student grades online. Blackboard replaced.
1 Chapter 4: Creating Simple Queries 4.1 Introduction to the Query Task 4.2 Selecting Columns and Filtering Rows 4.3 Creating New Columns with an Expression.
Exploring Microsoft Access Chapter 7 Building Applications: The Switchboard, Macros, and Prototyping.
Chapter 3 Automating Your Work. It is frustrating when you have to type the same passage of text repeatedly. For example your name and address. Word includes.
SRI International Bioinformatics 1 Editing Pathway/Genome Databases Ron Caspi.
Word Create a basic TOC. Course contents Overview: table of contents basics Lesson 1: About tables of contents Lesson 2: Format your table of contents.
Linux Operations and Administration
Welcome to Gramene’s RiceCyc (Pathways) Tutorial RiceCyc allows biochemical pathways to be analyzed and visualized. This tutorial has been developed for.
SCIENCE, GEOGRAPHY AND ICT Unit Overview Healthy Eating – Are We What We Eat?
Entity Relationship Diagram (ERD). Objectives Define terms related to entity relationship modeling, including entity, entity instance, attribute, relationship.
SRI International Bioinformatics 1 Pathway Tools Features Available Only in the Desktop Version PathoLogic.
SRI International Bioinformatics Selected PathoLogic Refining Tasks Creation of Protein Complexes Assignment of Modified Proteins Operon Prediction.
OPERATORS IN C CHAPTER 3. Expressions can be built up from literals, variables and operators. The operators define how the variables and literals in the.
PathoLogic Pathway Predictor
Comparative Analysis in BioCyc
Why Create a PGDB? Perform pathway analyses as part of a genome project Analyze omics data Create a central public information resource for the organism,
by Markus Krummenacker June 2011
Bioinformatics Research Group
The Pathway Tools Schema
Building Metabolic Models
How to Administer a PGDB
Comparative Analysis Q
Advanced PGDB Editing: Regulation GO Terms
BioCyc Update Notifications Suzanne Paley Pathway Tools Workshop 2018
Microsoft Office Access 2003
Incremental PathoLogic
Propagating Changed Annotation and Pathway Information
Click ‘browse’ to search your device for
Annotation Presentation
Getting to First Base: Introduction to Database Concepts
Advanced PGDB Editing: Gene Ontology (GO) Terms
Getting to First Base: Introduction to Database Concepts
The MultiOmics Explainer
SRI Bioinformatics Research Group
Sample Question Styles
Unbalanced Reactions by Markus Krummenacker Q
Presentation transcript:

PathoLogic: More about Matching Enzyme Names to Reactions

Inputs MetaCyc is the primary reference PGDB. (Most) name/reaction associations in MetaCyc will be available to the name matcher You can specify additional reference PGDBs using the Organism -> Specify Reference PGDB(s) menu item – useful if there is a manually curated PGDB for a closely related organism. Several additional name/reaction mapping files.

Mapping Files Allow you to specify additional name-reaction mappings not present in PGDBs aic-export/pathway-tools/pathologic/11.5/data/enzyme- mappings.dat: global mappings provided by us ptools-local/local-enzyme-mappings.dat: your local mappings; apply to all new PGDBs; persist between Ptools upgrades ptools- local/pgdbs/user/ORGIDcyc/VERSION/input/enzyme- mappings.dat: for current PGDB only

Mapping File Format Tab-delimited; two required columns, one optional Column 1: name Column 2: space-separated list of reactions associated with name Column 3 (optional): flag indicating whether name is ambiguous (T/NIL)

Overview of name matching The name matcher runs in three phases: Phase I: Build a table indexing the names in MetaCyc (and other ref. PGDBs) and the associated reactions; names checked for ambiguity Phase II: Look up protein names from the annotated genome in the table Phase III: Analyze nonmatching enzymes – look for “probable enzymes” and possible matches

Phase I: Build Index Protein function names can be stored in multiple places in MetaCyc. The name matcher indexes names from: reaction frames (e.g., official names assigned by EC) enzymatic reaction frames enzyme frames (provided that the enzyme is monofunctional and the name contains “ase” mapping files (described above) Names in gene frames are not indexed.

Phase I: Build Index Each name can be associated with multiple reactions. In some cases, the name is ambiguous. A pair of reactions having the same name are ambiguous if: the reactions' EC numbers (if any) don't match (partial match is okay, e.g., 1.2.3.- with 1.2.3.4) no enzyme in MetaCyc catalyzes both reactions Ambiguous matches are presented to the user for review. (“Assign Probable Enzymes”)

Phase II: Look up names In the simplest case, a protein has one function with one name. If the name exactly matches a name in the table, associate the protein with the reactions for that name. (Spaces and punctuation are ignored.) Some proteins have multiple functions with multiple names. How are they handled? Multiple functions are treated separately – each can give a matching set of reactions. Multiple names for a function are considered together – the first name that matches determines the reaction set.

But wait, there's more! The name matcher doesn't just check the exact name given in the annotation file. If the original name doesn't match, it tries a variety of “alternative” names: remove common prefixes and suffixes, such as “putative”, “probable”, “hypothetical”, “homolog”, “family protein”, etc. remove “subunit ___”, “small chain”, etc. (But see the Create Protein Complexes task) remove some gene names: “abcD”, short parenthesized words

Phase III: Nonmatching names If a name can't be matched, even in an alternative form, we try to decide whether it is a “likely metabolic enzyme”. This is true if: the name contains “ase” the name doesn't contain “RNA”, except “tRNA” the name doesn't match a list of nonmetabolic enzyme names (aic- export/pathway-tools/pathologic/VERSION/data/metabolic-enzyme-ruleout-words.dat) the name doesn't match a list of nonspecific enzyme names (aic- export/pathway-tools/pathologic/VERSION/data/nonspecific-enzyme-names.dat)

Phase III: Nonmatching names Likely metabolic enzymes can be reviewed in the “Assign Probable Enzymes” task under “Refine” Right click on an enzyme to get information about that enzyme, include a list of suggested reactions. Suggested reactions are found by approximate matching to MetaCyc names You can also split an enzyme name into separate names, flag for future research, or reject (nonmetabolic / nonspecific) See also name matching report: ptools- local/pgdbs/user/ORGIDcyc/VERSION/reports/name-matching-report.txt