Databases to Support Disease-Focused Research: Type 1 Diabetes, Huntington’s Disease
Nat Goodman, Institute for Systems Biology, January 2003

Presentation transcript:


VanBUG, January 2003, Nat Goodman

Slide 1: The Basic Idea
- Database (website) to support research of scientists working on diseases of interest
- Key challenge: make it useful!
- Data must be
  - relevant to current research
  - rigorously accurate
  - timely
  - coordinated with other databases
- Steering committee provides scientific direction
- Also, easy-to-use, yadda, yadda, yadda

Slide 2: What Else is Like This?
- Other disease-focused websites
  - Alzheimer Research Forum (Alzforum)
  - ALS Therapy Development Foundation (ALS-TDF)?
- Technology / disease databases
  - Stanford breast cancer microarray website
  - Any others?
- Model organism databases
  - MGD, FlyBase, WormBase, TAIR, SGD, …
- Protein family databases
  - GPCRs, cytochrome P450s, …
- Locus-specific databases
  - HLA, CF, …
- Alliance for Cellular Signaling (AfCS)-Nature Gateway

Slide 3: Potential Data Scope
- Genomic regions
- Genes & proteins
  - functional summaries
  - curated sequences, genomic context, structures
  - orthologs, families, multiple alignments
- Microarray results
- Genotypes
- Protein-protein interactions
- Pathway models
- Empirical results on hot topics
- Reagents
  - antibodies, mouse models, clones, constructs, …
- Therapeutic studies
  - drug, transplantation, gene transfer
  - molecular, cellular, lower organism, mouse, other mammals
  - clinical
- Patient information
  - clinical & pathologic features
- Biomarkers
- Literature scanning and alerting
- Reports of negative and “ho-hum” results
- Lay explanations

Slide 4: Practical Concerns
- Too much data: prioritize!
  - Steering committee to the rescue
- Too much overlap: collaborate!
  - Alzforum, RefSeq, GO, Stanford HOPES!!!, OMIM, BIND, MGD (?), MEDLINE
- Too much software: reuse!
  - Alzforum, other collaborating databases
  - PubCrawler, GBrowse, BioPerl
  - Generic Model Organism Database (GMOD)

Slide 5: Some Differences Between Projects
- Genomics
  - Type 1 Diabetes: ~17 susceptibility regions
  - HD: single gene disorder
- Genes
  - Type 1 Diabetes: several hundred genes in susceptibility regions
  - HD: ~40 huntingtin (Htt) interactors; ~100 genes of interest
- Microarray
  - Type 1 Diabetes: a few datasets available
  - HD: Hereditary Disease Array Group led by Jim Olson; others?
- Genotyping
  - Type 1 Diabetes: consortium for fine-scale mapping
  - HD: two efforts to map age-of-onset modifiers
- Therapies
  - Type 1 Diabetes: coordinated program for islet cell transplantation; gene & drug therapy; pharma, too!
  - HD: semi-coordinated program for drug screening; separate clinical studies; orphan disease

Slide 6: First Data Scope for HD Website
- Large scale datasets
  - Mouse & molecular drug screening
  - Protein-protein interactions (Hughes, Myriad Proteomics)
  - Protein abundance in cerebrospinal fluid (Watts, ISB)
- Gene list
  - Human, mouse, rat orthologs
  - Sequences
  - Functional summaries
- Empirical results
  - Example: Htt interaction with transcription factors (binding, transcriptional activity, cell death)
- Reagents
  - Antibodies
  - Genetic constructs
- Pathway models
  - Hypothesized disease mechanisms
  - Example: Htt & CREB-mediated transcription

Slide 7: Pathway Model (Wild type)
Normal CREB-mediated transcription
Software: VisualCell™ from Gene Network Sciences

Slide 8: Pathway Model (Diseased)
Software: VisualCell™ from Gene Network Sciences

Slide 9: Steering Committee Response

Slide 10: Steering Committee Guidelines
- Peer-review!
- Connect everything to literature
- Rigorously scrutinized, but diverse, science
- Data – “just the facts, Ma’am” – not conjecture
- Hypotheses presented as such – not as fact

Slide 11: My Response
Hmm… this is kinda narrow for a community website

Slide 12: Compromise
- Community information: non-reviewed
- Primary datasets: non-reviewed
- Core: reviewed scientific material, tied to literature; steering committee in charge!
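
The three tiers on this slide can be pictured as a small data model. What follows is a minimal sketch, not the site's actual schema: the `Tier` enum, the `Item` record, and the `problems` check are all hypothetical names introduced here. The only rule it encodes is the one stated in the talk: core material must be reviewed and tied to literature, while community information and primary datasets are accepted without review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    CORE = "core"                # reviewed, tied to literature
    PRIMARY_DATASET = "primary"  # non-reviewed
    COMMUNITY = "community"      # non-reviewed


@dataclass
class Item:
    """Hypothetical record for one piece of website content."""
    title: str
    tier: Tier
    reviewed: bool = False
    citations: List[str] = field(default_factory=list)  # literature identifiers


def problems(item: Item) -> List[str]:
    """Return the reasons an item fails its tier's rules (empty list means OK)."""
    issues: List[str] = []
    if item.tier is Tier.CORE:
        if not item.reviewed:
            issues.append("core material must be reviewed")
        if not item.citations:
            issues.append("core material must be tied to literature")
    return issues
```

Keeping the rule in one function means non-core tiers pass untouched, which matches the slide's split between a strictly reviewed core and looser satellite content.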

Slide 13: Current Core Data Scope
- Comprehensive bibliography
- Milestone papers
  - Annotation by curators & committee
  - User comments
- Published drug screens in mouse
  - Bibliography & dataset
- Mouse models
  - Bibliography & dataset
- Antibodies
- Published microarray studies
  - Bibliography, lists of changed genes, links to full datasets
- Gene list
  - Bibliography
  - Human, mouse, rat orthologs
  - Sequences
  - Htt interactions
  - Short functional descriptions

Slide 14: Current Core Services
- Genome / gene browser
  - View genes in human, mouse, rat syntenic regions
  - Accesses UC Santa Cruz DAS server plus local databases
  - All standard Santa Cruz information visible here, too
  - Based on GBrowse – collaboration with L. Stein
- Literature alerting
  - Specify MEDLINE queries
  - Can include our bibliographies
  - System runs periodically to get new hits
  - Based on PubCrawler – collaboration with K. Wolfe, K. Hokamp
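
The literature-alerting service on this slide (stored queries, periodic runs, only new hits reported) can be sketched roughly as below. This is not PubCrawler's code or interface. It is a minimal illustration that assumes a hypothetical `search` callable, which takes a query string and returns citation identifiers; the `Alerter` class and its method names are likewise invented for this example.

```python
from typing import Callable, Dict, Iterable, List, Set


class Alerter:
    """Remembers which hits each stored query has already reported."""

    def __init__(self, search: Callable[[str], Iterable[str]]):
        self.search = search  # hypothetical backend: query -> citation IDs
        self.seen: Dict[str, Set[str]] = {}

    def run(self, queries: Iterable[str]) -> Dict[str, List[str]]:
        """Run every query once and return only hits not reported before."""
        new_hits: Dict[str, List[str]] = {}
        for query in queries:
            seen = self.seen.setdefault(query, set())
            fresh: List[str] = []
            for hit in self.search(query):
                if hit not in seen:
                    seen.add(hit)  # also drops duplicates within one run
                    fresh.append(hit)
            if fresh:
                new_hits[query] = fresh
        return new_hits
```

In a deployment, `run` would presumably be triggered on a schedule and `search` would wrap whatever MEDLINE interface the site uses; passing the backend in as a callable keeps the new-hit logic testable without network access.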

Slide 15: Current Satellite Data Scope
- News
  - Like news in Science and Nature
- Forum
  - Interviews with leading scientists
  - Live discussions on hot topics with subsequent transcripts
  - Web delivery of presentations
  - Mini-reviews derived from above
- Calendar of events
  - Conferences, etc.
- Contact info for HD researchers
  - With permission!
- Lay explanations
  - For major sections, at least
- Primary datasets
  - Protein-protein interactions (Hughes)
  - Protein abundance in CSF (Watts)

Slide 16: Help From Our Friends
- All bibliographies: Alzforum (citation database)
- Comprehensive bibliography: Alzforum (scanning & librarian)
- Mouse models: Alzforum (database)
- Antibodies: Alzforum (database & curator)
- Published microarray studies: HDAG (data & review)
- Gene list
  - MGD (orthologs, we hope)
  - RefSeq (sequences, descriptions)
  - GO (annotations)
  - BIND (Htt interactions)
- News, forum, calendar, contacts: Alzforum
- Lay explanations: HOPES
- Primary datasets: Myriad, ISB (data)

Slide 17: Software Architecture
Diagram labels:
- Perl / CGI scripts
- RefSeq, MGD (?), BIND
- local databases
- citations, antibodies, mouse models, news & things
- Alzforum
- Other friends
- web delivery
- Delivery by FTP & API, too

Slide 18: Genome Browser Screenshot

Slide 19: Alzforum Home Page

Slide 20: Alzforum Papers of the Week

Slide 21: Alzforum Mouse Model List

Slide 22: A Few Words About IP
- Open source
- Open data
- Strong privacy

Slide 23: Four Rules for a Successful Website
1. Too much data: Prioritize!
   - What will be most useful?
   - Rely on scientific experts
2. Too much software: Reuse!
   - Lots of great software available
   - Developers willing to help
3. Too much overlap: Collaborate!
   - Many databases welcome this
   - Less work – better product – more fun!
4. Obsess on quality
   - Bad data wastes everyone’s time

Slide 24: Acknowledgements
- ISB Project Team: George Lake, Michelle Whiting, Paul Edlefsen, Robert Hubley
- HDF: Carl Johnson, Minka van Beuzekom
- Steering Committee
  - Carl Johnson and Minka van Beuzekom, HDF
  - Dan Goldowitz, University of Tennessee
  - Emma Hockly, Guy’s Hospital
  - Bruce Kristal, Cornell University
  - Marcy MacDonald, Massachusetts General Hospital
  - Ray Truant, McMaster University
- Alzforum: June Kinoshita
- RefSeq: Kim Pruitt
- GO Consortium: Evelyn Camon
- HOPES: Bill Durham
- HDAG: Jim Olson
- Myriad Proteomics: Bob Hughes
- ISB: Julian Watts