ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED.

Presentation transcript:

ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED V, Tokyo 2002

Talk structure
- Introducing ArrayExpress
- Submission methods
- Ontologies and annotation
- The future

Infrastructure at the EBI
[Diagram: ArrayExpress (Oracle); MIAMExpress (MySQL); EBI Expression Profiler; other public microarray databases (GEO, CIBEX); external bioinformatic databases; MAGE-ML submissions and pipelines from array manufacturers, LIMS, microarray software, data analysis software and local MIAMExpress installations (MAGE-ML export, MAGE-ML files); queries and data analysis over the web (www)]

ArrayExpress  Public database for gene expression data, one of three including CIBEX and GEO  Aims to store well annotated (MIAME compliant) and well structured data  MIAME  Recorded info should be sufficient to interpret and replicate the experiment  Information should be structured so that querying and automated data analysis and mining are feasible* *Brazma et al,.Nature Genetics, 2001

ArrayExpress conceptual model
[Diagram entities: Experiment, Publication, External links, Hybridisation, Array, Sample, Source (e.g., Taxonomy), Gene (e.g., EMBL), Normalisation, Data]
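The entities named on this slide can be sketched as a small set of linked record types. This is only an illustration: the slide's diagram did not survive extraction, so the relationships below (an experiment holding hybridisations, each pairing one sample with one array) and all field names are assumptions, not the actual ArrayExpress schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    name: str
    source_ref: Optional[str] = None  # assumed: external link, e.g. a taxonomy entry


@dataclass
class Array:
    name: str
    gene_refs: List[str] = field(default_factory=list)  # assumed: e.g. EMBL accessions


@dataclass
class Hybridisation:
    sample: Sample          # assumed: one sample per hybridisation
    array: Array            # assumed: one array per hybridisation
    data: List[float] = field(default_factory=list)


@dataclass
class Experiment:
    title: str
    publication: Optional[str] = None
    normalisation: Optional[str] = None
    hybridisations: List[Hybridisation] = field(default_factory=list)

    def samples(self) -> List[Sample]:
        """Return the samples used across all hybridisations, in order."""
        return [h.sample for h in self.hybridisations]


# Hypothetical usage with made-up names.
liver = Sample("liver RNA", source_ref="taxonomy entry (placeholder)")
chip = Array("example array")
expt = Experiment("example experiment", hybridisations=[Hybridisation(liver, chip, [1.5, 0.7])])
sample_names = [s.name for s in expt.samples()]
```

Keeping the hybridisation as the object that joins a sample to an array is one way to let queries start from any of experiment, array or sample.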

ArrayExpress details
- Database schema derived from MAGE-OM
- Standard SQL; we use Oracle
- Validating data loader for MAGE-ML
- Web interface
  - Queries: experiment, array, sample
  - Browsing: views on experiment
- Object model-based query mechanism, automatic mapping to SQL

Data Submission to ArrayExpress
- Via MAGE-ML pipeline
- From LIMS/local database
- Current direct MAGE-ML submitters: Sanger, TIGR, Paul Spellman, Affymetrix
- Via external tools, e.g. BASE, J-Express
- Via MIAMExpress
- Accepts Array, Experiment and Protocol submissions
- Provides accession numbers

ArrayExpress curation effort
- User support and help documentation
- Curation at source (not destination)
- Support on ontologies and CVs
- Minimize free text, removal of synonyms
- MIAME encouragement
- Help on MAGE-ML
- Goal: to provide high-quality, well-annotated data to allow automated data analysis

Public Data Access
- Data export as tab-delimited file
- Export to Expression Profiler
- As MAGE-ML, from query interface
- Arrays exportable as tab-delimited file

Submission Tool

MIAMExpress submission and annotation tool
- Based on MIAME concepts and questionnaire
- Perl-CGI, MySQL database
- Experiment, Array, Protocol submissions
- Generic annotation tool
- Exports MAGE-ML

What makes up a submission?
[Diagram: Experiment; repeated Array, Sample and Hybridisation units; Protocols; Normalisation; Final data]

Array Definition Format
- Tab-delimited file format that can be parsed to MAGE-ML
- Defines relationships between features and sequences
- Provides sequence annotation
- Example in the Agenda, open letter to journals
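As a rough illustration of how a tab-delimited array description can relate features to sequences, here is a minimal parsing sketch. The column names (`feature_id`, `sequence_id`, `annotation`) and the example rows are hypothetical and are not the real Array Definition Format headers; the sketch only shows the feature-to-sequence grouping idea.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited content; real ADF column headers differ.
EXAMPLE_ADF = (
    "feature_id\tsequence_id\tannotation\n"
    "F001\tSEQ_A\thypothetical gene A\n"
    "F002\tSEQ_A\thypothetical gene A\n"
    "F003\tSEQ_B\thypothetical gene B\n"
)


def parse_adf(text):
    """Group feature ids by sequence id and collect one annotation per sequence."""
    features_by_sequence = defaultdict(list)
    annotation_by_sequence = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        features_by_sequence[row["sequence_id"]].append(row["feature_id"])
        annotation_by_sequence[row["sequence_id"]] = row["annotation"]
    return dict(features_by_sequence), annotation_by_sequence


features, annotations = parse_adf(EXAMPLE_ADF)
```

Several features can point at the same sequence, which is why the sketch keeps a list of features per sequence rather than a one-to-one mapping.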

Example BioSequence annotation

Sample annotation
- Gene expression data only have meaning in the context of detailed sample descriptions
- If the data are going to be interpreted by independent parties, sample information has to be searchable and in the database
- Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description
- None of this is trivial

What Does an Ontology Do?
- Captures knowledge
- Creates a shared understanding
  - between humans and for computers
- Makes knowledge machine processable
- Makes meaning explicit
  - by definition and context
- It is more than a controlled vocabulary, it has structure
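The last point, that an ontology has structure where a controlled vocabulary is a flat list of terms, can be shown with a toy example. The terms and the single `is_a` relation below are illustrative assumptions and are not taken from the MGED Ontology.

```python
# A controlled vocabulary is just a set of allowed terms.
CONTROLLED_VOCABULARY = {"liver", "kidney", "organ", "organism part"}

# An ontology adds relations between terms; here a toy child -> parent "is_a" map.
IS_A = {
    "liver": "organ",
    "kidney": "organ",
    "organ": "organism part",
}


def ancestors(term):
    """Walk the is_a chain upwards and return all broader terms, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def matches(annotation, query):
    """True if a sample annotated with `annotation` should be returned for `query`."""
    return annotation == query or query in ancestors(annotation)


liver_ancestors = ancestors("liver")
liver_is_organ = matches("liver", "organ")
organ_is_liver = matches("organ", "liver")
```

With only the vocabulary, a search for "organ" would miss samples annotated as "liver"; the `is_a` structure is what lets the broader query find them.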

MGED Biomaterial (sample) Ontology
- Under construction
  - by MGED OWG
  - using OILed
- Motivated by MIAME and coordinated with the database model (mapping available)
- We are extending classes, providing constraints, defining terms, and adding terms
- Extension to include other MAGE-OM required ontology entries and CVs

MGED BioMaterial Internal Terms

Internal and External Terms combined

Examples of external ontologies and CVs
- NCBI taxonomy database
- Jackson Lab mouse strains and genes
- Edinburgh mouse atlas anatomy
- HUGO nomenclature for human genes
- Chemical and compound ontologies, e.g. CAS
- TAIR
- FlyBase anatomy
- GO
- GOBO ontologies

Example Annotation  Sample source and treatment description, and its correct annotation using the MGED BioMaterial Ontology classes and corresponding external references: “Seven week old C57BL/6N mice were treated with fenofibrate. Liver was dissected out, RNA prepared”

MGED BioMaterial Ontology classes: BioMaterialDescription, Biosource Property, Organism, Age, DevelopmentStage, Sex, StrainOrLine, BiosourceProvider, OrganismPart, BioMaterialManipulation, EnvironmentalHistory, CultureCondition, Temperature, Humidity, Light, PathogenTests, Water, Nutrients, Treatment, CompoundBasedTreatment (Compound) (Treatment_application) (Measurement)

Instances: 7 weeks after birth; Female; Charles River, Japan; 22 ± 2 °C; 55 ± 5%; 12 hours light/dark cycle; Specified pathogen free conditions; ad libitum; MF, Oriental Yeast, Tokyo, Japan; in vivo, oral gavage; 100mg/kg body weight

External References: NCBI Taxonomy; Mouse Anatomical Dictionary; International Committee on Standardized Genetic Nomenclature for Mice; International Committee on Standardized Genetic Nomenclature for Mice; Mouse Anatomical Dictionary; ChemIDplus

Referenced values: Mus musculus musculus id: ; Stage 28; C57BL/6; Liver; Fenofibrate, CAS
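One way to picture the result of this annotation is as a list of (ontology class, value, external reference) records in place of the free-text sentence. The sketch below uses class names and values from the slide, but the slide's table was flattened in extraction, so the pairing of each value with a class and reference is an assumption and only the more obvious pairings are included. The `Annotation` type and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    ontology_class: str                 # MGED BioMaterial Ontology class name
    value: str                          # instance value for this sample
    external_ref: Optional[str] = None  # external resource, where one applies


# Pairings below are inferred from the flattened slide, not confirmed by it.
sample_annotations = [
    Annotation("Organism", "Mus musculus musculus", "NCBI Taxonomy"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("Sex", "Female"),
    Annotation("StrainOrLine", "C57BL/6",
               "International Committee on Standardized Genetic Nomenclature for Mice"),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Compound", "Fenofibrate", "ChemIDplus"),
    Annotation("Treatment_application", "in vivo, oral gavage"),
    Annotation("Measurement", "100mg/kg body weight"),
]


def find(annotations: List[Annotation], ontology_class: str) -> List[Annotation]:
    """Return all annotations made with the given ontology class."""
    return [a for a in annotations if a.ontology_class == ontology_class]


def externally_referenced(annotations: List[Annotation]) -> List[str]:
    """Return the class names whose values point at an external resource."""
    return [a.ontology_class for a in annotations if a.external_ref is not None]


organ = find(sample_annotations, "OrganismPart")[0].value
referenced = externally_referenced(sample_annotations)
```

Structured this way, a question such as "which samples come from liver?" becomes a lookup on `OrganismPart` instead of a text search over the original sentence.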

Future  Data acquisition for ArrayExpress  Update ArrayExpress to newest MAGE-OM  V2.0 MIAMExpress, domain specific, portable  Further ontology development and integration into tools, use of OWL  Curation tools (other than grep, Perl scripts)  Improved query interface for AE  ArrayExpress update tool

Resources  Schemas for both ArrayExpress and MIAMExpress, access to code  MAGE-ML examples, Arrays, Expts, Protocols  MIAME glossary, MAGE-MIAME-ontology mappings  List of ontology resources from MGED pages  Help in establishing pipelines  MGED software MAGEstk  Curation, help and advice

Acknowledgments  Microarray Informatics Team, EBI, esp. Alvis Brazma, Ugis Sarkans, Mohammad Shojatalab, Ele Holloway, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone  Chris Stoeckert, U. Penn.  Members of MGED  Sanger Institute - Rob Andrews, Jurg Bahler, Adam Butler, Kate Rice,  EMBL Heidelberg - Wilhelm Ansorge, Martina Muckenthaler,Thomas Preiss

“I think you should be more explicit here in step two”
‘The miracle of microarray data analysis’, Genome Biology 2001, 2 (9)