How pathway databases were created and curated
Peifen Zhang, Plant Metabolic Network (PMN)

About PMN

PMN is a network of plant metabolic pathway databases and a database curation community:
– A plant reference database, PlantCyc: genes, enzymes and pathways consolidated from all plant species
– A collection of single-species pathway databases, Pathway/Genome Databases (PGDBs): genes, enzymes and pathways in a particular species
– A community for data curation: curators at databases (PMN, Gramene, SGN, etc.) and researchers in the plant biochemistry field
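To make the relationship between the reference database and the single-species PGDBs concrete, here is a minimal Python sketch that reduces each PGDB to a set of pathway names and merges them into a reference view recording which species each pathway comes from. The function name `consolidate` and the data layout are hypothetical; real PlantCyc consolidation works on full gene, enzyme and pathway records, not on name sets.

```python
from collections import defaultdict


def consolidate(pgdbs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge per-species pathway sets into one reference view.

    Returns a mapping from pathway name to the set of species whose PGDB
    contains it, loosely mimicking how a reference database gathers
    pathways from many plant species. Hypothetical, simplified model.
    """
    reference = defaultdict(set)
    for species, pathways in pgdbs.items():
        for pathway in pathways:
            reference[pathway].add(species)
    return dict(reference)


if __name__ == "__main__":
    # Illustrative input only; these sets are not real database contents.
    species_dbs = {
        "Arabidopsis thaliana": {"phenylalanine biosynthesis"},
        "Coffea arabica": {"phenylalanine biosynthesis", "caffeine biosynthesis"},
    }
    for pathway, species in sorted(consolidate(species_dbs).items()):
        print(pathway, sorted(species))
```

Keeping the species set per pathway is what lets a reference view answer "which plants is this pathway known from?" without duplicating the pathway itself.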

Prediction of PGDBs: why
– Huge amounts of sequence data are generated by genome and EST projects
– Prediction puts individual genes into a metabolic network
– The network can then be used to visualize and analyze large experimental data sets, discover missing enzymes, design metabolic engineering, and conduct comparative and evolutionary studies

Creation of PGDBs: how
– Manual extraction of pathways from the literature, assigning genes/enzymes to pathways
– Computational assignment of genes/enzymes to reference pathways, followed by manual validation/correction and further curation

Prediction of PGDBs: how
– Annotated sequences with molecular function
– A reference database (such as MetaCyc and PlantCyc)
– PathoLogic (part of the Pathway Tools software)

PathoLogic
[Figure: an annotated genome (DNA sequences, gene calls, gene functions, e.g. AT1G69370 = chorismate mutase) is matched against MetaCyc to produce a PGDB. Example pathway: chorismate → prephenate → L-arogenate → L-phenylalanine, catalyzed by chorismate mutase, prephenate aminotransferase and arogenate dehydratase.]
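The matching step in the figure can be sketched as follows. This is a deliberately simplified, hypothetical stand-in, not PathoLogic's actual algorithm: genes are mapped to reactions through their annotated enzyme function, and a reference pathway is predicted when at least a chosen fraction of its reactions has a gene. Only AT1G69370 comes from the slide; the other gene IDs, the dictionary layouts, the function names and the `min_fraction` cutoff are placeholders.

```python
# Simplified, hypothetical stand-in for PathoLogic-style pathway inference.
# It is NOT the PathoLogic algorithm, which weighs additional criteria.

# Genome annotation: gene ID -> annotated enzyme function.
# AT1G69370 comes from the slide; the other two IDs are placeholders.
ANNOTATION = {
    "AT1G69370": "chorismate mutase",
    "placeholder_gene_2": "prephenate aminotransferase",
    "placeholder_gene_3": "arogenate dehydratase",
}

# Reference database (MetaCyc-like): enzyme function -> reaction it catalyzes.
FUNCTION_TO_REACTION = {
    "chorismate mutase": "chorismate -> prephenate",
    "prephenate aminotransferase": "prephenate -> L-arogenate",
    "arogenate dehydratase": "L-arogenate -> L-phenylalanine",
}

# Reference pathways: pathway name -> its reactions.
REFERENCE_PATHWAYS = {
    "L-phenylalanine biosynthesis (via arogenate)": [
        "chorismate -> prephenate",
        "prephenate -> L-arogenate",
        "L-arogenate -> L-phenylalanine",
    ],
}


def assign_reactions(annotation, function_to_reaction):
    """Map each reaction to the genes whose annotated function catalyzes it."""
    reaction_genes = {}
    for gene, function in annotation.items():
        reaction = function_to_reaction.get(function)
        if reaction is not None:
            reaction_genes.setdefault(reaction, []).append(gene)
    return reaction_genes


def predict_pathways(reaction_genes, reference_pathways, min_fraction=0.5):
    """Predict a pathway when enough of its reactions have an assigned gene.

    min_fraction is an arbitrary illustrative cutoff, not a PathoLogic setting.
    Returns pathway name -> fraction of reactions with at least one gene.
    """
    predicted = {}
    for name, reactions in reference_pathways.items():
        if not reactions:
            continue
        present = [r for r in reactions if r in reaction_genes]
        fraction = len(present) / len(reactions)
        if fraction >= min_fraction:
            predicted[name] = fraction
    return predicted


if __name__ == "__main__":
    genes_by_reaction = assign_reactions(ANNOTATION, FUNCTION_TO_REACTION)
    print(predict_pathways(genes_by_reaction, REFERENCE_PATHWAYS))
```

Separating gene-to-reaction assignment from pathway inference mirrors the two layers in the figure: gene functions first, pathways second.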

A snapshot of AraCyc
Arabidopsis genome
– 27,235 protein-coding genes
AraCyc
– 6158 enzyme-coding genes
– 2733 genes are assigned to reactions
– 1914 genes are assigned to pathways
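The snapshot numbers translate into coverage fractions as shown below, assuming the 2733 and 1914 genes are subsets of the 6158 enzyme-coding genes. The `percent` helper is only for illustration.

```python
# AraCyc snapshot figures as given on the slide.
PROTEIN_CODING_GENES = 27_235
ENZYME_CODING_GENES = 6_158
GENES_IN_REACTIONS = 2_733
GENES_IN_PATHWAYS = 1_914


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("enzyme-coding share of genome:", percent(ENZYME_CODING_GENES, PROTEIN_CODING_GENES))
    print("enzyme-coding genes with reactions:", percent(GENES_IN_REACTIONS, ENZYME_CODING_GENES))
    print("enzyme-coding genes in pathways:", percent(GENES_IN_PATHWAYS, ENZYME_CODING_GENES))
```

With these figures, about 22.6% of protein-coding genes are enzyme-coding; of those, about 44.4% are assigned to reactions and about 31.1% to pathways.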

Currently available PGDBs

Species      Database          Status
Arabidopsis  TAIR              Substantial curation
Rice         Gramene           Some curation
Sorghum      Gramene           No curation
Medicago     Noble Foundation  Some curation
Tomato       SGN               Some curation
Potato       SGN               No curation
Pepper       SGN               No curation
Tobacco      SGN               No curation
Petunia      SGN               No curation
Coffee       SGN               No curation

Prediction of new PGDBs by PMN
Prioritization criteria: available sequences, economic impact
– High priority: maize, poplar, soybean, wheat
– Second priority: cotton, grape, sugarcane, sunflower, switchgrass…

A quality database REQUIRES manual validation and curation

Validation: pruning false-positive predictions
– Pathways not operating in plants or not in a target species, e.g. glycogen biosynthesis, C4 photosynthesis, caffeine biosynthesis
– Pathways operating via a different route, e.g. phenylalanine biosynthesis in bacteria vs. in plants
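One way to picture the first kind of pruning is a filter over predicted pathways that consults curated notes on where each pathway operates. The two lookup tables below are illustrative and incomplete (C4 photosynthesis is listed only for maize and sorghum, caffeine biosynthesis only for coffee), and the name `prune` is hypothetical. The second case, a pathway predicted via the wrong route, is not handled here: it calls for replacing the bacterial variant with the plant variant rather than simply deleting it.

```python
# Illustrative, incomplete curation notes (assumptions for this sketch).
NOT_IN_PLANTS = {"glycogen biosynthesis"}
RESTRICTED_TO = {
    "C4 photosynthesis": {"Zea mays", "Sorghum bicolor"},
    "caffeine biosynthesis": {"Coffea arabica"},
}


def prune(predicted: list[str], species: str) -> tuple[list[str], list[str]]:
    """Split predicted pathways into (kept, removed) for a target species.

    A pathway is removed if it is known not to operate in plants at all,
    or if it is restricted to particular species that exclude the target.
    """
    kept: list[str] = []
    removed: list[str] = []
    for pathway in predicted:
        if pathway in NOT_IN_PLANTS:
            removed.append(pathway)
        elif pathway in RESTRICTED_TO and species not in RESTRICTED_TO[pathway]:
            removed.append(pathway)
        else:
            kept.append(pathway)
    return kept, removed


if __name__ == "__main__":
    predictions = [
        "glycogen biosynthesis",
        "C4 photosynthesis",
        "caffeine biosynthesis",
        "L-phenylalanine biosynthesis",
    ]
    print(prune(predictions, "Arabidopsis thaliana"))
```

Returning the removed pathways alongside the kept ones leaves a record a curator can review instead of silently discarding predictions.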

Validation: adding evidence and literature support

Pathways are supported by different kinds of evidence
– Pathways supported by molecular data: enzymes and genes
– Pathways based on radiotracer experiments: no enzymes or genes
– Expert hypothesis ("paper chemistry")
– Pure computational prediction
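A curator could record these evidence types as an ordered enumeration. The ordering from weakest (pure computational prediction) to strongest (molecular data) is an assumption made for this sketch, as are the names `Evidence`, `best_support` and `computational_only`. The last function lists pathways that still rest on computational prediction alone, i.e. candidates for validation.

```python
from enum import IntEnum
from typing import Optional


class Evidence(IntEnum):
    """Evidence types from the slide; the numeric order is an assumption."""
    COMPUTATIONAL_PREDICTION = 1  # pure computational prediction
    EXPERT_HYPOTHESIS = 2         # "paper chemistry"
    RADIOTRACER = 3               # tracer data, no enzymes or genes
    MOLECULAR = 4                 # enzymes and genes identified


def best_support(evidence: set[Evidence]) -> Optional[Evidence]:
    """Return the strongest evidence type recorded, or None if there is none."""
    return max(evidence) if evidence else None


def computational_only(pathways: dict[str, set[Evidence]]) -> list[str]:
    """List pathways whose only support is computational prediction."""
    return sorted(
        name
        for name, evidence in pathways.items()
        if evidence == {Evidence.COMPUTATIONAL_PREDICTION}
    )


if __name__ == "__main__":
    pathways = {
        "phenylalanine biosynthesis": {Evidence.MOLECULAR},
        "hypothetical pathway X": {Evidence.COMPUTATIONAL_PREDICTION},
    }
    print(computational_only(pathways))
```

Using `IntEnum` makes the ordering explicit and lets `max` pick the strongest support directly.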

Correcting pathway diagrams

Curating missing pathways
What information is curated from the literature:
– Pathway: diagram, summary, evidence, citations
– Reaction: co-substrates, EC number
– Compound: name and synonyms, structure
– Enzyme: coding gene, physical and biochemical properties, evidence, comments, citations
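The items listed above map naturally onto a small record model. The following Python dataclasses are a hypothetical sketch, not the actual Pathway Tools schema; storing the structure as a string (e.g. SMILES) is also an assumption. The `missing_items` helper shows how such a model lets a curator see which pieces of information still need to be extracted from the literature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Compound:
    name: str
    synonyms: list[str] = field(default_factory=list)
    structure: Optional[str] = None  # e.g. a SMILES string (assumption)


@dataclass
class Enzyme:
    name: str
    coding_gene: Optional[str] = None
    properties: dict[str, str] = field(default_factory=dict)  # physical/biochemical
    evidence: list[str] = field(default_factory=list)
    comments: str = ""
    citations: list[str] = field(default_factory=list)


@dataclass
class Reaction:
    substrates: list[Compound]
    products: list[Compound]
    co_substrates: list[Compound] = field(default_factory=list)
    ec_number: Optional[str] = None
    enzymes: list[Enzyme] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction]
    summary: str = ""
    evidence: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)
    diagram: Optional[str] = None


def missing_items(pathway: Pathway) -> list[str]:
    """List curation items that are still empty in a pathway record."""
    gaps: list[str] = []
    if not pathway.summary:
        gaps.append("pathway summary")
    if not pathway.citations:
        gaps.append("pathway citations")
    for i, rxn in enumerate(pathway.reactions, start=1):
        if rxn.ec_number is None:
            gaps.append(f"reaction {i}: EC number")
        for compound in rxn.substrates + rxn.products:
            if compound.structure is None:
                gaps.append(f"compound {compound.name}: structure")
        for enzyme in rxn.enzymes:
            if enzyme.coding_gene is None:
                gaps.append(f"enzyme {enzyme.name}: coding gene")
    return gaps


if __name__ == "__main__":
    step = Reaction(
        substrates=[Compound("chorismate")],
        products=[Compound("prephenate")],
        enzymes=[Enzyme("chorismate mutase", coding_gene="AT1G69370")],
    )
    draft = Pathway("L-phenylalanine biosynthesis", reactions=[step])
    for item in missing_items(draft):
        print(item)
```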

Sources of literature
– PubMed, SciFinder
– Specialized journals (e.g. Phytochemistry)
– Books in specialized fields (e.g. alkaloids)

Curation workflow
– Identify a pathway
– Find details of reactions: structure of substrates, EC number
– Find details of enzymes: physical & chemical properties, coding gene
– Data entry: enzymes, reactions, species; draw the pathway diagram

Current curation priorities
– Big economic impact: bio-energy production (e.g. cell wall components), industrial materials (e.g. rubber), medicinal metabolites
– Under-represented domains, e.g. quinones and volatiles

The importance of community contribution: why we need your help
A mountain of information
– 17 million citations in PubMed alone
– 4208 citations in PlantCyc
Curators must triage the most up-to-date and most relevant references, then synthesize and extract information from individual papers.

The importance of community contribution: why we need your help
– Limited human resources: curators (3 at PMN, 1 at SGN, 1 at Gramene)
– Limited expertise: a curator trained as a molecular biologist may be familiar with one particular pathway, but certainly not with all pathways

How you can help
– Expedite data coverage: submit a pathway, an enzyme, or a set of compounds
– Enhance data accuracy: report errors
– Tell us your ideas and needs for new features and functionality

Data submission forms

Reporting errors to us

The PMN project, us and you
[Figure: MetaCyc and PlantCyc connected to species databases: AraCyc and those for poplar, wheat, maize, tomato, rice, Medicago, sugarcane, other…]