Part I: Tips and techniques from curators. Kate Dreher, TAIR, AraCyc, PMN, Carnegie Institution for Science
Sometimes, one gene isn't enough... Scientists often want to work with several genes or proteins that are related by a common feature. TAIR (and the PMN) offer some basic tools to create customized data sets (e.g., lists of genes or proteins), to add more information to data sets, and to analyze data sets.
Creating customized data sets using TAIR and the PMN. Data sets can be based on many different criteria: overall sequence alignment (DNA or protein); sequence motifs (DNA or protein); protein domains and biochemical properties; gene/protein function (subcellular location, molecular function, biological process); expression pattern; biochemical pathway; mapping region; phenotype; gene families. How do you generate these data sets?
Creating customized data sets: TAIR. Data sets can be obtained at TAIR using several strategies: advanced search pages and data-mining tools.
Creating customized data sets: PMN. Data sets can be obtained at the PMN using several strategies. What is the PMN? The Plant Metabolic Network (PMN) is the home of AraCyc, the Arabidopsis metabolic pathway database. The PMN maintains a set of metabolic pathway databases for Arabidopsis and other plants, provides tools to analyze metabolic data, and generates new metabolic pathway databases for crops and other important plants.
Metabolic pathway data in AraCyc at the PMN. AraCyc pathway pages contain several types of data: pathway, enzyme, gene, reaction, compound, and evidence code.
Metabolic pathway data in AraCyc at the PMN. Pathway pages contain curated comments and useful links.
Creating customized data sets: PMN. Data sets can be obtained at the PMN using several strategies: the advanced search page, metabolic pathway pages, and data-mining tools (coming soon).
Enhancing customized data sets. Additional information can be obtained for your data set using the bulk data retrieval tool and FTP files.
Customized data sets: case studies. You have mapped a mutation that disrupts flower development to a region of chromosome 1. What are some good candidates in the mapping interval? First, get a list of all the genes in the mapping interval and find candidates involved in flower development. Next, find all the associated gene function (GO) and expression (PO) annotations for the candidate genes. Finally, obtain gene confidence scores for all associated gene models to choose a sequence for complementation.
Customized data sets: Flower development. Get a list of all the genes in the mapping interval that are involved in flower development (figure labels: PVV4.1NCC1).
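The interval-plus-annotation filter on this slide can be sketched offline once you have gene coordinates and GO term names. This is a minimal sketch, not TAIR's search: the Gene record, its fields, and the gene IDs and coordinates in the example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gene:
    locus: str                 # hypothetical identifier
    chromosome: int
    start: int                 # 1-based start coordinate (bp)
    end: int                   # 1-based end coordinate (bp), inclusive
    go_terms: set[str] = field(default_factory=set)


def genes_in_interval(genes, chromosome, start, end):
    """Return genes that overlap [start, end] on the given chromosome."""
    return [g for g in genes
            if g.chromosome == chromosome and g.start <= end and g.end >= start]


def candidates(genes, chromosome, start, end, term="flower development"):
    """Genes in the interval annotated with the given GO term name."""
    return [g for g in genes_in_interval(genes, chromosome, start, end)
            if term in g.go_terms]


if __name__ == "__main__":
    genes = [
        Gene("geneA", 1, 1_000, 2_500, {"flower development"}),
        Gene("geneB", 1, 3_000, 4_000, {"response to oxidative stress"}),
        Gene("geneC", 1, 9_000, 9_800, {"flower development"}),
        Gene("geneD", 2, 1_500, 2_000, {"flower development"}),
    ]
    for g in candidates(genes, chromosome=1, start=500, end=5_000):
        print(g.locus)  # prints geneA
```

Overlap, rather than full containment, is used so that genes straddling an interval boundary are not silently dropped.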
Customized data sets: Flower development. Example annotations for AT1G09000. GO terms: MAP kinase kinase kinase activity, cellular component unknown, flower development, kinase activity, response to oxidative stress. PO terms: embryo, flower, leaf, petal differentiation and expansion stage, root, seed, shoot apex, whole plant, D bilateral stage, E expanded cotyledon stage, F mature embryo stage. Then choose gene models to express for complementation experiments...
Customized data sets: Flower development. Obtain gene confidence scores for all associated gene models.
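As a hedged illustration of using confidence scores to pick one model per locus, the sketch below assumes that each gene model ID has the form locus.N (as in AGI model names such as AT1G09000.1) and that a higher numeric score means higher confidence. The function name, IDs, and scores are hypothetical.

```python
def best_models(scores: dict[str, float]) -> dict[str, str]:
    """Keep the highest-scoring gene model for each locus.

    `scores` maps gene-model IDs of the form '<locus>.<N>' to a numeric
    confidence score (hypothetical values; higher is assumed to be better).
    Ties keep the model seen first.
    """
    best: dict[str, str] = {}
    for model, score in scores.items():
        locus = model.rsplit(".", 1)[0]  # 'geneA.2' -> 'geneA'
        if locus not in best or score > scores[best[locus]]:
            best[locus] = model
    return best


if __name__ == "__main__":
    example = {"geneA.1": 3, "geneA.2": 5, "geneB.1": 4}
    print(best_models(example))  # {'geneA': 'geneA.2', 'geneB': 'geneB.1'}
```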
Creating customized data sets. You work on a transcription factor that affects jasmonic acid (JA) biosynthesis. Do JA biosynthetic genes share common sequences in their promoters? Obtain a list of all the genes involved in JA biosynthesis, get their upstream promoter sequences, and search for over-represented DNA sequences in those promoters.
Customized data sets: JA biosynthesis. Find the JA biosynthesis genes at the PMN (search term shown: jasmonic acid).
Customized data sets: JA biosynthesis. Take this gene list to TAIR to get upstream sequences.
Customized data sets: JA biosynthesis. Get upstream promoter sequences.
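TAIR returns upstream sequences directly; the sketch below only shows the strand logic involved if you extract them yourself. It assumes you already have a chromosome sequence as a string and the gene's 1-based forward-strand coordinates. The function names and the 1000 bp default are illustrative choices, not TAIR settings.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_sequence(chrom_seq: str, start: int, end: int, strand: str,
                      length: int = 1000) -> str:
    """Return up to `length` bp immediately upstream of a feature.

    `start` and `end` are 1-based, inclusive coordinates on the forward
    strand (start <= end). The result is written 5'->3' on the feature's
    own strand and is truncated at the chromosome ends.
    """
    if strand == "+":
        # Upstream of a plus-strand feature lies to the left of `start`.
        return chrom_seq[max(0, start - 1 - length):start - 1]
    if strand == "-":
        # Upstream of a minus-strand feature lies to the right of `end`.
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    chrom = "AAAACCCCGGGGTTTT"
    print(upstream_sequence(chrom, 9, 12, "+", length=4))  # CCCC
    print(upstream_sequence(chrom, 1, 4, "-", length=4))   # GGGG
```

For a minus-strand gene, "upstream" is to the right of the gene on the forward strand, which is why that branch takes the region after `end` and reverse-complements it.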
Customized data sets: JA biosynthesis. Search for over-represented or prevalent DNA sequences in the promoters: use the Motif Analyzer in TAIR to identify common 6-mers. Loci shown on the slide: AT1G69490, AT1G48270, AT1G11870, AT1G12820.
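To show what "over-represented 6-mers" means, here is a minimal sketch that ranks each 6-mer by its frequency in the query promoters divided by its frequency in a background set, with a pseudocount. This simple ratio is a stand-in chosen for illustration; it is not the statistic TAIR's Motif Analyzer reports. It counts the forward strand only, and the sequences in the example are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every k-mer (forward strand only), skipping windows with N."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def enriched_kmers(query, background, k=6, top=10, pseudocount=1.0):
    """Rank query k-mers by (query frequency) / (background frequency)."""
    q = kmer_counts(query, k)
    b = kmer_counts(background, k)
    q_total = sum(q.values()) or 1
    b_total = sum(b.values()) or 1
    scores = {
        kmer: ((n + pseudocount) / q_total) / ((b[kmer] + pseudocount) / b_total)
        for kmer, n in q.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    promoters = ["ATCGGTAC", "GTCGGTAC"]   # hypothetical query promoters
    genome_bg = ["AAAAAAAA"]               # hypothetical background
    print([kmer for kmer, _ in enriched_kmers(promoters, genome_bg, top=2)])
    # ['TCGGTA', 'CGGTAC']
```

In practice the background would be promoters of many unrelated genes, so that k-mers common everywhere (e.g., AT-rich runs) are not reported as specific.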
Creating customized data sets. You are studying a protein with an exciting new domain: Thr-x-Ala-x-Ile-x-Arg. Are there other TxAxIxR proteins? Do they share additional domains? Find all of the proteins that have the TxAxIxR domain, then identify all of the other domains found in those proteins.
Customized data sets: TxAxIxR proteins. Find all of the proteins that have the TxAxIxR domain.
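A local equivalent of this pattern search can be sketched with a regular expression, treating each "x" as any residue. The sketch assumes protein sequences in FASTA text; the parser, function names, and example proteins are hypothetical. A lookahead is used so that overlapping matches are all reported.

```python
import re

# Thr-x-Ala-x-Ile-x-Arg, where x is any residue; the lookahead finds overlapping hits.
TXAXIXR = re.compile(r"(?=(T.A.I.R))")


def parse_fasta(text):
    """Minimal FASTA parser: {first word of header: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def find_motif(proteins):
    """Return {protein_id: [1-based start positions]} for proteins with a match."""
    hits = {}
    for pid, seq in proteins.items():
        starts = [m.start() + 1 for m in TXAXIXR.finditer(seq.upper())]
        if starts:
            hits[pid] = starts
    return hits


if __name__ == "__main__":
    fasta = ">p1 example\nMTKAVIQRG\n>p2\nMAAAAA\n"
    print(find_motif(parse_fasta(fasta)))  # {'p1': [2]}
```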
Customized data sets: TxAxIxR proteins. Identify all of the other domains found in those proteins.
Analyzing data sets. Sometimes you want to analyze data sets, and we have a few analysis tools for that. Here, "analyze" means displaying data in a visual manner with a few statistics; the data must be cleaned beforehand. To display quantitative metabolic data on genes, enzymes, or compounds, use the Omics Viewer. To look for over-represented annotations in a list of genes or proteins (for example, all the genes up-regulated in a mutant, or all of the proteins found in the ovule), use the GO categorization tool.
GO categorization. Classify your list of genes or proteins using GO annotations.
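One common way to ask whether a GO term is over-represented in a gene list is a one-sided hypergeometric test. The sketch below illustrates that technique; it is not a description of how TAIR's GO categorization tool works. It assumes a background mapping of every gene to its GO terms, applies no multiple-testing correction, and uses hypothetical genes and annotations in the example.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(gene_list, annotations):
    """One-sided over-representation test for each GO term in `gene_list`.

    `annotations` maps every background gene to its set of GO terms.
    Returns (term, k, K, p) tuples sorted by p-value; no multiple-testing
    correction is applied.
    """
    genes = set(gene_list) & set(annotations)
    N, n = len(annotations), len(genes)
    background = Counter(t for terms in annotations.values() for t in terms)
    in_list = Counter(t for g in genes for t in annotations[g])
    results = [(t, k, background[t], hypergeom_sf(k, N, background[t], n))
               for t, k in in_list.items()]
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    annotations = {
        "g1": {"flower development"}, "g2": {"flower development"},
        "g3": {"root development"}, "g4": {"root development"},
    }
    print(go_enrichment(["g1", "g2"], annotations))
    # [('flower development', 2, 2, 0.16666666666666666)]
```

With real data the list would contain many terms, so a correction such as Bonferroni or Benjamini-Hochberg should be applied before calling any term enriched.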
...or use a tool at AmiGO (listed on the hand-out).
Putting TAIR and the PMN to work for you. Use TAIR to find detailed information for specific genes or proteins: locus, gene model, and protein pages (many sections, many data types, many external links); GBrowse (many tracks); and the new gene confidence scores released as part of TAIR9. Use TAIR and the PMN to generate and work with customized data sets: create lists of genes and proteins and add data to them using the specific and advanced search pages, motif analysis tools, and FTP files with large data sets; visualize and analyze data using the Omics Viewer (PMN) and GO categorization (TAIR). If you're having trouble getting any information you want...
We are here to help! Please visit us and ask questions at the Curation Booth. Workshop Part II: practice sets and individual help.
Acknowledgements: TAIR, AraCyc, and the PMN
Current Curators:
- Tanya Berardini (lead curator, functional annotation)
- David Swarbreck (lead curator, structural annotation)
- Peifen Zhang (Director and lead curator, metabolism)
- A. S. Karthikeyan (curator)
- Philippe Lamesch (curator)
- Donghui Li (curator)
- Rajkumar Sasidharan (curator)
Recent Past Contributors:
- Debbie Alexander (curator)
- Christophe Tissier (curator)
- Hartmut Foerster (curator)
Tech Team Members:
- Bob Muller (Manager)
- Larry Ploetz (Systems Administrator)
- Raymond Chetty
- Anjo Chi
- Vanessa Kirkup
- Cynthia Lee
- Tom Meyer
- Shanker Singh
- Chris Wilks
Metabolic Pathway Software:
- Peter Karp and the SRI group
Eva Huala (Director and Co-PI)
Sue Rhee (PI and Co-PI)
Part I: Tips and techniques from curators. Bonus slides... Kate Dreher, TAIR, AraCyc, PMN, Carnegie Institution for Science
Customized data sets: Flower development. Find all the associated GO terms and PO terms and get their evidence codes.
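Once annotations are downloaded in bulk, grouping them per candidate gene makes the evidence codes easy to compare. The sketch below assumes a hypothetical four-column tab-delimited layout (locus, ontology, term, evidence code) with "#" comment lines; TAIR's actual download files have their own column layouts, so adapt the column indices to the file you use. Loci are upper-cased so that IDs match regardless of case.

```python
import csv
import io
from collections import defaultdict


def annotations_by_gene(tsv_text, loci):
    """Group rows of a hypothetical 4-column TSV (locus, ontology, term,
    evidence) by locus, keeping only the requested loci."""
    wanted = {locus.upper() for locus in loci}
    grouped = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip short rows and comment/header lines.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        locus, ontology, term, evidence = (cell.strip() for cell in row[:4])
        if locus.upper() in wanted:
            grouped[locus.upper()].append((ontology, term, evidence))
    return dict(grouped)


if __name__ == "__main__":
    tsv = (
        "#locus\tontology\tterm\tevidence\n"
        "geneA\tGO\tflower development\tIMP\n"
        "geneA\tPO\tflower\tIEP\n"
        "geneB\tGO\tkinase activity\tISS\n"
    )
    print(annotations_by_gene(tsv, ["geneA"]))
    # {'GENEA': [('GO', 'flower development', 'IMP'), ('PO', 'flower', 'IEP')]}
```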
Customized data sets: JA biosynthesis. Obtain a list of all the genes involved in JA biosynthesis.
Another option: use the pathway page.
Customized data sets: JA biosynthesis. Experimental results provide a more detailed sequence: (A or T)C(A or C or G)TCGGT(G or T)A.
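In standard IUPAC nucleotide codes this consensus is WCVTCGGTKA (W = A or T, V = A, C, or G, K = G or T). A minimal sketch of scanning promoter sequences for it on both strands follows; the function names and example sequences are hypothetical, and the IUPAC-to-regex table is written out here rather than taken from any library.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_regex(consensus):
    """Compile an IUPAC consensus into a regex that finds overlapping matches."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in consensus.upper()) + "))")


def find_sites(seq, consensus):
    """Return (strand, 1-based leftmost position on `seq`) for every match."""
    seq = seq.upper()
    pattern = consensus_regex(consensus)
    hits = [("+", m.start() + 1) for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    width = len(consensus)
    # A hit at index i of the reverse complement covers forward
    # positions len(seq) - i - width + 1 .. len(seq) - i (1-based).
    hits += [("-", len(seq) - m.start() - width + 1) for m in pattern.finditer(rc)]
    return hits


if __name__ == "__main__":
    print(find_sites("GGACATCGGTGAGG", "WCVTCGGTKA"))  # [('+', 3)]
    print(find_sites("CCTCACCGATGTCC", "WCVTCGGTKA"))  # [('-', 3)]
```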