Data Integration, Gene Ontology, and the Mouse* Joel Richardson, Ph.D. Mouse Genome Informatics Group The Jackson Laboratory Bar Harbor, Maine 04609 *


1 Data Integration, Gene Ontology, and the Mouse*
Joel Richardson, Ph.D.
Mouse Genome Informatics Group
The Jackson Laboratory
Bar Harbor, Maine
* Not necessarily in that order.

2 We have the human sequence: OK, now what?
- One species is not enough:
  - model organisms (one strain is not enough)
  - comparative studies
- The sequence is just the beginning
  - sequence variants
  - gene regulation and interaction networks
  - non-coding functional elements
  - environmental effects
- Genotype to phenotype

3 The Mouse
- the premier animal model for studying human disease
- > 95% same genes
- same diseases, similar reasons (e.g., cancer, hypertension, diabetes, osteoporosis, …)
- 1000s lab strains, diff. characteristics
- precise genetic control

4 The Jackson Laboratory
- Private nonprofit research institution (est. 1929)
- Studying mouse as a model of human biology and disease
- National Cancer Research Center
- Supplier of laboratory strains to researchers worldwide
- Areas: metabolism, development, cancer, immune response

5 Bar Harbor, ME 04609

6 Mouse Genome Informatics (MGI)
- Consortium of NIH-funded projects
- Housed at TJL
- Integrates and disseminates public data resources covering selected aspects of mouse biology
- First program project funding 1989
- > $10M/y total, >60 people
- Online since 1994.

7

8 MGI Concept Map
[Diagram nodes: Genes and other loci, Expression Data, Mapping Data, Molecular Fragments, DNA and Protein Sequences, Strains, Phenotypes, Anatomy, Genotypes, Alleles, References, Accession IDs, Variants]

9 Integration in MGI
- Identifying objects.
- Resolving or noting discrepancies.
- Integration is key to knowledge discovery in the age of genomics
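The two integration steps named on this slide, identifying objects and resolving or noting discrepancies, can be illustrated with a minimal Python sketch. It is not MGI's implementation: the source names, accession IDs, gene symbols, and record layout are all hypothetical. Records are grouped by a shared accession ID, the first value seen for each field is kept, and any later disagreement is noted rather than silently overwritten.

```python
def integrate(records_by_source):
    """Group records by accession ID and note fields on which sources disagree.

    Hypothetical record layout: each record is a dict with an "accession" key
    plus arbitrary descriptive fields.
    """
    merged = {}          # accession -> {field: (value, source)}
    discrepancies = []   # (accession, field, kept (value, source), conflicting (value, source))
    for source, records in records_by_source.items():
        for rec in records:
            entry = merged.setdefault(rec["accession"], {})
            for field, value in rec.items():
                if field == "accession":
                    continue
                if field in entry and entry[field][0] != value:
                    # Keep the first value seen, but record the conflict.
                    discrepancies.append(
                        (rec["accession"], field, entry[field], (value, source))
                    )
                else:
                    entry.setdefault(field, (value, source))
    return merged, discrepancies


# Made-up records from two made-up providers.
SOURCES = {
    "source_x": [{"accession": "ACC:0001", "symbol": "GeneA", "chromosome": "1"}],
    "source_y": [
        {"accession": "ACC:0001", "symbol": "GeneA", "chromosome": "2"},
        {"accession": "ACC:0002", "symbol": "GeneB", "chromosome": "X"},
    ],
}

merged, discrepancies = integrate(SOURCES)
```

Here the two sources agree on the symbol for ACC:0001 but not on its chromosome, so the conflict lands in `discrepancies`, where a curator could review it.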

10 The Power Of Integration: Queries
- What transcription factors are expressed in a 2-cell embryo and not in a blastocyst?
  - integration of multiple expression assay data sets and data types.
  - standardization of anatomical references and developmental stages
- What development QTLs contain these TFs?
  - integration of expression data and mapping data
  - genetic map result of integrating lots of mapping data
- What strains are distinguished by SNPs in this region?
- And so on…
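The first query on this slide reduces to a set difference once expression results from different assays share one vocabulary of anatomical structures. The Python sketch below assumes exactly that. The gene names, assay labels, and records are invented placeholders, not MGI data or its schema, and "not in a blastocyst" is read here as "no positive result recorded for blastocyst", which is only one possible interpretation.

```python
from typing import NamedTuple


class ExpressionResult(NamedTuple):
    gene: str        # gene symbol (placeholder names, not real data)
    structure: str   # standardized anatomy/stage term
    assay: str       # assay type the result came from
    detected: bool   # True if expression was observed


# Hypothetical records standing in for integrated assay data.
RESULTS = [
    ExpressionResult("GeneA", "2-cell embryo", "RT-PCR", True),
    ExpressionResult("GeneA", "blastocyst", "in situ", False),
    ExpressionResult("GeneB", "2-cell embryo", "in situ", True),
    ExpressionResult("GeneB", "blastocyst", "RT-PCR", True),
    ExpressionResult("GeneC", "2-cell embryo", "RT-PCR", True),
]

# Hypothetical set of genes annotated as transcription factors.
TRANSCRIPTION_FACTORS = {"GeneA", "GeneB"}


def expressed_in(structure, results):
    """Genes with at least one positive result in the given structure, any assay."""
    return {r.gene for r in results if r.structure == structure and r.detected}


def tfs_in_first_not_second(first, second, results, tfs):
    """Transcription factors detected in `first` with no positive result in `second`."""
    return (expressed_in(first, results) - expressed_in(second, results)) & tfs


hits = tfs_in_first_not_second(
    "2-cell embryo", "blastocyst", RESULTS, TRANSCRIPTION_FACTORS
)
```

The comparison only works because every record uses the same structure terms regardless of assay type, which is the standardization point the slide makes.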

11 The MGI System (from 40,000 feet)
[Diagram labels: MGI RDBMS Web Files Data Downloads Literature Curation SQL Load scripts Editing Interface Servlets CGI Scripts Files Report Scripts]

12 MGI in Context
[Diagram labels: MGI db Scientific Literature Mutagenesis Centers GenBank LocusLink Unigene TIGR DoTS OMIM Ensembl GO Interpro SwissProt ATCC RIKEN Anatomy RPCI RatMap NIA MGC I.M.A.G.E. NCBI RefSeq]

13 Integration relies on Standard Vocabularies
- Structured vocabularies
  - The common semantic frameworks
  - Structured into is-a/part-of hierarchies
- Evidence-based annotation
  - Associations of vocabulary terms with objects
  - Evidence (codes), citations, etc., decorate the associations
- Structured annotations and queries
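A small Python sketch can make the pieces on this slide concrete: terms arranged in an is-a/part-of hierarchy, annotations that attach a term to an object together with an evidence code and a citation, and a structured query that follows the hierarchy. The hierarchy below is a toy, not the actual Gene Ontology structure, and the gene names and reference IDs are placeholders. IDA and ISS are used as examples of GO evidence codes.

```python
from dataclasses import dataclass

# Toy hierarchy: child term -> list of (relationship, parent term).
PARENTS = {
    "transcription factor activity": [("is-a", "DNA binding")],
    "DNA binding": [("is-a", "molecular function")],
    "nucleolus": [("part-of", "nucleus")],
    "nucleus": [("part-of", "cell")],
}


@dataclass(frozen=True)
class Annotation:
    obj: str        # annotated object, e.g. a gene (placeholder names)
    term: str       # vocabulary term
    evidence: str   # evidence code decorating the association
    reference: str  # citation decorating the association (placeholder IDs)


ANNOTATIONS = [
    Annotation("GeneA", "transcription factor activity", "IDA", "REF:1"),
    Annotation("GeneB", "DNA binding", "ISS", "REF:2"),
    Annotation("GeneC", "nucleolus", "IDA", "REF:3"),
]


def ancestors(term):
    """All terms reachable from `term` by following is-a or part-of links upward."""
    seen = set()
    stack = [term]
    while stack:
        for _relationship, parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(term, annotations, evidence=None):
    """Objects annotated to `term` or to any of its descendants.

    If `evidence` is given, only annotations with one of those codes count.
    """
    return {
        a.obj
        for a in annotations
        if (a.term == term or term in ancestors(a.term))
        and (evidence is None or a.evidence in evidence)
    }


dna_binders = annotated_to("DNA binding", ANNOTATIONS)
dna_binders_direct_assay = annotated_to("DNA binding", ANNOTATIONS, {"IDA"})
```

Querying for "DNA binding" also returns the object annotated to the more specific child term, and the optional evidence filter shows why keeping codes on each association is useful.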

14 Structured Vocabularies in MGI
- Gene Ontology (GO)
  - Functional gene annotations
- Mammalian Phenotype (MP)
  - Annotations to genotypes (e.g. knockouts)
- Mouse Anatomical Dictionary
  - Annotations of expression
- Other standardized, non-structured vocabularies
  - Mouse strains
  - cell lines
  - clone libraries
  - tissues
  - lots of smaller ones

15 Challenges
- Domain very difficult to frame
  - Huge variability, variety of data, formats, providers, update schedules & semantics, etc…
- Biologists and Computer Scientists think differently.
  - communication is paramount, but difficult
- Rapid changes, e.g., in last 10 years:
  - genetic crosses -> YAC/BAC mapping -> RH mapping -> genome sequence
  - northern blots -> microarrays -> MPSS

16 System Evolution
- The system is a software ecosystem
- Maintenance is the cost of success
- Changes and cost/benefit
- If it ain’t broke, don’t fix it
- Commitments/agenda/priorities

17 Credits
Richard Baldarelli, Matt Baya, Jon Beal, Dale Begley, Judy Blake, John Boddy, Dirck Bradt, Carol Bult, Nancy Butler, Donna Burkart, Jeff Campbell, Lori Corbani, Rebecca Corey, Sharon Cousins, Diane Dahmen, Harold Drabkin, Janan Eppig, Jackie Finger, David Garippa, Lucette Glass, Carroll Goldsmith, Pat Grant, Terry Hayamizu, David Hill, Jim Kadin, Ben King, Debbie Krupke, Moyha Lennon-Pierce, Jill Lewis, Ira Lu, Cathy Lutz, Lois Maltais, Prita Mani, Mike McCrossin, Louise McKenzie, David Miers, Daniel Modrusan, Dieter Naf, Li Ni, Janice Ormsby, Sridhar Ramachandran, Deborah Reed, Joel Richardson, Martin Ringwald, David Shaw, Bob Sinclair, Cynthia Smith, Connie Smith, Paul Szauter, Leslie Trombley, Pierre Vanden Borre, Michael Walker, Linda Washburn, Josh Winslow, Iry Witham, Sophia Zhu

