The Consistency Checker, or Overhauling a PGDB, by Ron Caspi


PGDB Atrophy
Your PGDB started out all smooth and shiny… but after a few years, it looks more like this.

It’s time for an overhaul!
- Update the genome annotation
- Propagate updates from the reference DB (MetaCyc)
- Re-run the name matcher
- Rescore pathways
- Re-run the transcription unit predictor
- Run the consistency checker
- Create protein complexes
- Re-run the Transport Inference Parser

Propagating Updates From a Reference PGDB
Invoke from the Tools menu (Propagate MetaCyc Data Updates). If your PGDB was created using a different reference PGDB, you can select it instead of MetaCyc.

Propagating Data Updates
Data updates are broken into three sections: compounds, reactions, and pathways.

Propagating Compound Data
For compounds, the software looks for differences in chemical structures and in the data stored in the different slots.
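The slot comparison described here can be pictured with a small sketch. This is not Pathway Tools code: it assumes, purely for illustration, that a frame is a plain dictionary mapping slot names to lists of values, and the slot names used are hypothetical.

```python
def diff_slots(local_frame, reference_frame):
    """Return {slot: (local_values, reference_values)} for slots that differ.

    Both frames are assumed to be dicts of slot name -> list of values
    (a simplification chosen for this sketch, not the real frame model).
    """
    differences = {}
    for slot in sorted(set(local_frame) | set(reference_frame)):
        local_values = set(local_frame.get(slot, ()))
        reference_values = set(reference_frame.get(slot, ()))
        if local_values != reference_values:
            differences[slot] = (sorted(local_values), sorted(reference_values))
    return differences


# Hypothetical compound frame in the local PGDB and in the reference PGDB.
local = {"COMMON-NAME": ["L-tryptophan"], "SYNONYMS": ["trp"]}
reference = {
    "COMMON-NAME": ["L-tryptophan"],
    "SYNONYMS": ["trp", "W"],
    "CHEMICAL-FORMULA": ["C11H12N2O2"],
}

for slot, (mine, theirs) in diff_slots(local, reference).items():
    print(slot, mine, theirs)
```

Each reported slot is a candidate for review; whether to accept the reference value remains a curator's decision, as the next slide explains.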

Inspecting Differences
When you click the “select for update” button, you can review the differences and decide what to do for each case.

Propagating Reaction Data
For reactions, the software looks for differences in the reaction equation, as well as in the data stored in the different slots.

Objects Not Present In The Reference Database
When the software finds objects in the PGDB that are missing from the reference database, you can click the “Examine” button next to each one to see the details. The software will try to find merge candidates for these objects.

Propagating Pathway Data
For pathways, the software looks for differences in the topology of the pathway, as well as in the data stored in the different slots.
When a pathway is present in your PGDB but not in the reference PGDB, there are two likely reasons: either you created it (in which case you will probably want to keep it), or it was deemed incorrect or redundant in MetaCyc (in which case you will want to delete it).
To make life easier: when modifying pathways in your PGDB, change the frame ID!

The Consistency Checker
Consistency checking should be performed routinely (every few months), and the problems it reports should be addressed.

Automatic and Manual Tasks
- I recommend running the automatic tasks first.
- I recommend running individual tasks one at a time.
- When you mouse over a task’s name, you will see documentation for that particular task in the bottom window pane.

Consistency Checker Output
The output appears in the right pane, and is also saved to a text file in the reports directory. The name and location of the file are printed at the end of the output.

Automatic Tasks: Check all links
This tool looks at:
- Inverse links (compound-reaction, gene-protein, etc.)
- Pathway links
- Ghost reactions in pathways
- Pathways included in other pathways
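To make the idea of an inverse link concrete, here is a minimal sketch. It assumes the two directions of a relationship (for example gene to protein and protein to gene) are available as plain dictionaries; the function name and the frame IDs are hypothetical, and the real tool works on the PGDB itself.

```python
def missing_inverse_links(forward, inverse):
    """Return (source, target) pairs where source -> target exists in
    `forward` but target -> source is absent from `inverse`."""
    problems = []
    for source, targets in forward.items():
        for target in targets:
            if source not in inverse.get(target, ()):
                problems.append((source, target))
    return sorted(problems)


# Hypothetical data: GENE-2 points at PROT-2, but PROT-2 does not point back.
gene_to_product = {"GENE-1": ["PROT-1"], "GENE-2": ["PROT-2"]}
product_to_gene = {"PROT-1": ["GENE-1"], "PROT-2": []}

print(missing_inverse_links(gene_to_product, product_to_gene))
# [('GENE-2', 'PROT-2')]
```

Calling the function a second time with the arguments swapped checks the other direction.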

Automatic Tasks: Check all links
Warnings are not necessarily errors, but should be checked. For example, PWY-21 is completely redundant with P142-PWY and should be deleted.
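One way to think about a pathway that is "included in" another is that its reaction set is a subset of the other pathway's reaction set. The sketch below makes that assumption explicit; it is not how Pathway Tools is known to implement the check, and the reaction IDs are invented placeholders (only the two pathway IDs come from the slide).

```python
def included_pathways(pathways):
    """Return sorted (contained, containing) pairs of pathway IDs where every
    reaction of the first pathway also occurs in the second.

    `pathways` maps a pathway ID to a set of reaction IDs.
    """
    pairs = []
    for small_id, small_rxns in pathways.items():
        for large_id, large_rxns in pathways.items():
            if small_id != large_id and small_rxns and small_rxns <= large_rxns:
                pairs.append((small_id, large_id))
    return sorted(pairs)


# Placeholder reaction sets, for illustration only.
pathways = {
    "PWY-21": {"RXN-A", "RXN-B"},
    "P142-PWY": {"RXN-A", "RXN-B", "RXN-C"},
    "PWY-X": {"RXN-D"},
}

print(included_pathways(pathways))
```

A hit is only a warning: a sub-pathway inside a legitimate super-pathway is expected, so each pair still needs a curator's judgment.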

More Automatic Tasks
- Verify pathways for duplicate reactions.
- Verify replicon components and positions: ensures all genes exist, sorts based on position.
- Validate GO terms: updates the GO terms using the latest version of GO-KB, removes obsolete ones.
- Change compound names to string IDs: mostly applies to legacy data, where enzyme regulators may have been entered as text strings.
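The duplicate-reaction check is the simplest of these to illustrate. A minimal sketch, assuming a pathway's reactions are available as a plain list of IDs (the IDs below are hypothetical):

```python
from collections import Counter


def duplicate_reactions(reaction_ids):
    """Return the sorted reaction IDs that occur more than once in the list."""
    counts = Counter(reaction_ids)
    return sorted(rxn for rxn, n in counts.items() if n > 1)


# Hypothetical reaction list of one pathway; RXN-2 was added twice.
pathway_reactions = ["RXN-1", "RXN-2", "RXN-3", "RXN-2"]
print(duplicate_reactions(pathway_reactions))  # ['RXN-2']
```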

Yet More Automatic Tasks
- Run miscellaneous checks: formatting glitches in names, validity of superpathways, clears values of computed slots, deletes temporary frames created by breaks when the pathway editor runs.
- Update proteins: molecular weights recalculated from sequence.
- Check compound structures for redundant bonds.

Automatic Tasks: Recompute database statistics
It’s the only way to change the numbers on the home page.

Manual Tasks: the Constraint Checker
Flags constraint violations. For example, if a slot is supposed to contain only compound frame IDs, but a different type of frame is listed among its values, the constraint checker identifies and flags the offending value. The opposite is true as well: the checker will flag that frame as present in a slot of a frame that is not supposed to have such a value. (This means errors are often listed multiple times, under different frames.)
The checker also flags cardinality violations, for example cases where more than one value is present in a slot that is only allowed to have a single value.
This tool usually requires the most time and effort for correcting the problems.
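The two kinds of violation described above (wrong value type and too many values) can be sketched as follows. The constraint table, slot names, and frame IDs are all hypothetical; in a real PGDB the constraints come from the schema rather than from a hand-written dictionary.

```python
def check_constraints(frame_id, slots, frame_types, constraints):
    """Return a list of (frame_id, slot, kind, message) violations.

    slots:       slot name -> list of values for this frame
    frame_types: frame ID -> class name (used for the type check)
    constraints: slot name -> {"allowed_type": str or None,
                               "max_values": int or None}
    """
    violations = []
    for slot, values in slots.items():
        rule = constraints.get(slot)
        if rule is None:
            continue  # no constraint declared for this slot
        max_values = rule.get("max_values")
        if max_values is not None and len(values) > max_values:
            violations.append(
                (frame_id, slot, "cardinality",
                 f"{len(values)} values, at most {max_values} allowed"))
        allowed = rule.get("allowed_type")
        if allowed is not None:
            for value in values:
                if frame_types.get(value) != allowed:
                    violations.append(
                        (frame_id, slot, "type", f"{value} is not a {allowed}"))
    return violations


# Hypothetical schema fragment and data.
constraints = {
    "LEFT": {"allowed_type": "Compound", "max_values": None},
    "COMMON-NAME": {"allowed_type": None, "max_values": 1},
}
frame_types = {"CPD-1": "Compound", "PROT-1": "Protein"}
reaction_slots = {"LEFT": ["CPD-1", "PROT-1"], "COMMON-NAME": ["name a", "name b"]}

for violation in check_constraints("RXN-1", reaction_slots, frame_types, constraints):
    print(violation)
```

The sketch reports each problem once, under the frame that holds the slot; the real checker, as noted above, may list the same error under several frames.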

Constraint Checker Error Reports (example 1)
Obviously, this frame used to be classified as a protein, but has been converted at some point to a chemical compound. Thus, it should no longer contain a Modified-Protein slot.

Fixing The Problem
The problematic slot shows up in blue. To solve the problem, highlight the attached value and remove it.

Constraint Error Reports (example 2)
The problem here is that CPLX-2, a modified form of CPLX-1, has not been classified as a modified protein. The solution is to open CPLX-2 in the Protein Editor and classify it as a modified protein.

More Manual Tasks
- Verify all reactions and compounds: finds orphan enzymatic reaction frames (missing a protein, a reaction, or both); finds orphan reactions that are not associated with any other objects; looks for duplicate compounds.
- Generate reaction balance report.
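As an illustration of what a reaction balance report has to compute, the sketch below compares element totals on the two sides of a reaction. It assumes each compound's formula is given as a dictionary of element counts and it ignores charge; the data layout and function names are choices made for this example, not the tool's own.

```python
from collections import Counter


def element_totals(side, formulas):
    """Sum element counts over one side of a reaction.

    side:     list of (compound_id, stoichiometric_coefficient)
    formulas: compound_id -> {element: count}
    """
    totals = Counter()
    for compound, coefficient in side:
        for element, count in formulas[compound].items():
            totals[element] += coefficient * count
    return totals


def imbalance(left, right, formulas):
    """Return {element: left_total - right_total} for unbalanced elements."""
    left_totals = element_totals(left, formulas)
    right_totals = element_totals(right, formulas)
    elements = sorted(set(left_totals) | set(right_totals))
    return {e: left_totals[e] - right_totals[e]
            for e in elements if left_totals[e] != right_totals[e]}


formulas = {"H2": {"H": 2}, "O2": {"O": 2}, "H2O": {"H": 2, "O": 1}}

# 2 H2 + O2 -> 2 H2O is balanced; H2 + O2 -> H2O has one oxygen too many on the left.
print(imbalance([("H2", 2), ("O2", 1)], [("H2O", 2)], formulas))  # {}
print(imbalance([("H2", 1), ("O2", 1)], [("H2O", 1)], formulas))  # {'O': 1}
```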

Frame Reference Errors
Looking at that pathway’s comment, we find that the FRAME construct is missing the last bar.
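A check for this kind of error can be sketched with a regular expression. The sketch assumes that a frame reference in a comment is written as `|FRAME: ID|`, optionally with a quoted label before the closing bar; that syntax is an assumption made for the example, and the frame IDs are hypothetical.

```python
import re

# Every place where a frame reference begins.
FRAME_START = re.compile(r"\|FRAME:")
# A complete reference: opening bar, ID, optional quoted label, closing bar.
WELL_FORMED = re.compile(r'\|FRAME:\s*[^\s|"]+(?:\s+"[^"]*")?\|')


def unterminated_frame_refs(comment):
    """Return a short snippet for each FRAME construct that is not closed."""
    bad = []
    for start in FRAME_START.finditer(comment):
        if WELL_FORMED.match(comment, start.start()) is None:
            bad.append(comment[start.start():start.start() + 30])
    return bad


comment = "Derived from |FRAME: TRP| via |FRAME: CPD-123 and then secreted."
print(unterminated_frame_refs(comment))
```

Only the second reference is reported, because the first one has its closing bar.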

More Manual Tasks
- Fix references between polypeptides and genes: adds the gene value to modified proteins that lack it, adds a capitalized gene name to the synonyms list, scans that list for duplicates, flags orphan genes and proteins.
- Check pathway reactions and validate EC numbers: checks the PREDECESSORS slot of pathway frames, flags references to deleted and transferred EC numbers.
- Check transcription units: looks for invalid frames, transcription units with no genes, with genes in different directions, etc.
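The transcription unit checks listed above lend themselves to a short sketch. It assumes that each gene's direction is known as "+" or "-" in a simple lookup table; the function name and all IDs are hypothetical.

```python
def check_transcription_unit(tu_id, gene_ids, gene_strands):
    """Return a list of problem descriptions for one transcription unit.

    gene_ids:     genes listed in the transcription unit
    gene_strands: gene ID -> "+" or "-" for every gene known to the database
    """
    if not gene_ids:
        return [f"{tu_id}: no genes"]
    problems = []
    unknown = [g for g in gene_ids if g not in gene_strands]
    if unknown:
        problems.append(f"{tu_id}: unknown genes {unknown}")
    strands = {gene_strands[g] for g in gene_ids if g in gene_strands}
    if len(strands) > 1:
        problems.append(f"{tu_id}: genes in different directions")
    return problems


# Hypothetical genes and transcription units.
gene_strands = {"GENE-1": "+", "GENE-2": "+", "GENE-3": "-"}
print(check_transcription_unit("TU-1", ["GENE-1", "GENE-2"], gene_strands))
print(check_transcription_unit("TU-2", ["GENE-1", "GENE-3"], gene_strands))
print(check_transcription_unit("TU-3", [], gene_strands))
```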

Even More Manual Tasks
- Check citations: tries to find formatting problems, reports PubMed citations that have not been imported, provides statistics.
- Check external database link IDs: flags frames that are linked to the same external DB entry by links that are supposed to be unique.
- Check HTML tags: looks for formatting errors in HTML within comments.
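The external link check amounts to grouping frames by the external entry they point to and reporting any entry claimed by more than one frame. A minimal sketch, assuming links are available as (frame, database, accession) tuples; the database name and accessions are placeholders.

```python
from collections import defaultdict


def shared_external_links(links):
    """Return {(database, accession): [frame IDs]} for every external entry
    that is linked from more than one frame."""
    owners = defaultdict(set)
    for frame_id, database, accession in links:
        owners[(database, accession)].add(frame_id)
    return {key: sorted(frames)
            for key, frames in owners.items() if len(frames) > 1}


# Placeholder links: two proteins claim the same external entry.
links = [
    ("PROT-1", "EXAMPLE-DB", "ACC-0001"),
    ("PROT-2", "EXAMPLE-DB", "ACC-0001"),
    ("PROT-3", "EXAMPLE-DB", "ACC-0002"),
]
print(shared_external_links(links))
```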

And when you finish, take pride in your newly renovated PGDB!