European Bioinformatics Institute MGED Society Establishing the infrastructure for sharing microarray data Alvis Brazma European Bioinformatics Institute.

European Bioinformatics Institute MGED Society Establishing the infrastructure for sharing microarray data Alvis Brazma European Bioinformatics Institute EMBL-EBI Microarray Gene Expression Data Society

Outline
- Establishing the infrastructure for sharing microarray data: MGED, MIAME, MAGE-ML, databases
- Microarray informatics at the EBI

Microarrays - a tool for the golden age of genome discoveries

Some questions for the golden age of genomics
- How does gene expression differ between cell types?
- How does gene expression change as the organism develops and cells differentiate?
- How does gene expression differ between a normal and a diseased (e.g., cancerous) cell?
- How does gene expression change when a cell is treated with a drug?
- How is gene expression regulated: which genes regulate which, and how?

Potential amounts of microarray data
- Experiments:
  - ~ [number not preserved] genes in a human genome
  - ~ 320 cell types in a human organism
  - 2000 compounds for screening
  - 2 concentrations
  - 3 time points
  - 5 replicates
- Data: ~ [number not preserved] data points → 1 terabyte
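The estimate on this slide is a plain product of the experimental dimensions. The sketch below multiplies them out. The gene count and the bytes-per-data-point figure are not preserved in the transcript, so `ASSUMED_GENES` and `ASSUMED_BYTES_PER_POINT` are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
def potential_data_points(genes, cell_types=320, compounds=2000,
                          concentrations=2, time_points=3, replicates=5):
    """Multiply out the experimental dimensions listed on the slide."""
    experiments = (cell_types * compounds * concentrations
                   * time_points * replicates)
    return genes * experiments


# Hypothetical values: the slide's own figures did not survive extraction.
ASSUMED_GENES = 30_000
ASSUMED_BYTES_PER_POINT = 2

points = potential_data_points(ASSUMED_GENES)
terabytes = points * ASSUMED_BYTES_PER_POINT / 1e12
print(f"{points:.2e} data points, about {terabytes:.2f} TB")
```

Under these assumed values the product lands on the order of a terabyte, which is the scale the slide quotes.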

Making microarray data available to the public
- Authors' web sites
- Local, lab-based public databases (Stanford University, Whitehead, …)
- Journal web sites
- There is wide community consensus that public repositories for microarray data are needed, analogous to DDBJ/EMBL/GenBank for sequence data

Which data to share?
[Diagram labels: raw data (array scans); spots and quantitations (quantitation matrices); gene expression data matrix (genes by samples, gene expression levels)]

Annotations
[Diagram labels: gene expression matrix (samples, genes); gene expression levels (problem 2); sample annotations; problem 1; gene annotations]

[Diagram labels: source, sample, treatment, RNA extract, labelled nucleic acid, hybridisation, array, design, elements (spots), protocols, image, quantitation matrix, sample annotation, gene annotation]

[Diagram: five copies of the chain sample, RNA extract, labelled nucleic acid, hybridisation, array (design, elements/spots); their gene expression measurements are combined by transformation and integration into the gene expression data matrix of one experiment]

Problem 4
- The nature and structure of the gene expression data and annotations described above are complex
- For public repositories to make maximum use of these data, standards for representing and communicating them should be established

Standards for microarray data
- Understanding of, and agreement on, what data and annotations should be provided
- Standard controlled vocabularies (ontologies) that can be used in such annotations
- A standard format for the exchange of annotated data
- Understanding of how to compare different datasets

A Microarray Gene Expression Database meeting was organised in Cambridge, UK, in November 1999 to discuss these problems

MGED 1: some participants
- Affymetrix
- DDBJ
- DKFZ
- EMBL
- Gene Logic
- Incyte
- Max Planck Institute
- NCGR
- NHGRI
- Sanger Centre
- Stanford University
- Uni Pennsylvania
- Uni Washington, Seattle
- Whitehead Institute

MGED working groups
- Experiment annotation
- Data exchange format and modelling
- Ontologies
- Data normalisation and transformations
- Queries

MGED meetings
- MGED 2, Heidelberg, May 2000
- MGED 3, Stanford University, April 2001
- MGED 4, Boston, February 2002
- MGED 5, Tokyo, September 2002

The MGED Society was founded in June 2002
- The Microarray Gene Expression Data (MGED) Society is an international organisation for facilitating the sharing of functional genomics and proteomics array data
- Board of 17 directors

MGED standards
- Annotation content: MIAME
- Data representation and exchange format: MAGE-OM (MAGE-ML), jointly with OMG

MIAME: Minimum Information About a Microarray Experiment
An attempt to outline the minimum information required to interpret unambiguously, and potentially to reproduce and verify, an array-based gene expression experiment

MGED standards

[Diagram: five copies of the chain sample, RNA extract, labelled nucleic acid, hybridisation, array (design, elements/spots), combined by normalization and integration into the gene expression data matrix of one experiment]
MIAME: the content (annotation) of all boxes and lines should be given

MIAME 'checklist' for authors and reviewers
- Experimental design
- Samples used, RNA extraction and labelling
- Hybridisation
- Measurement data and specifications
  - (Raw images)
  - Image quantitation (data and specification)
  - Gene expression data matrix (data and transformations)
- Array design

MIAME 'checklist'
- An open letter was sent to the journals last week: all the information in the MIAME 'checklist' should be made available as a requirement for accepting publications
- The Lancet has indicated that it will adopt the MIAME checklist as a requirement
- Nature will adjust its policy in line with the MIAME recommendations

A need for a supporting infrastructure
- MIAME itself will not solve the problem
- A standard format is needed for representing and exchanging this information

MGED standards 2
- Data exchange format: MicroArray Gene Expression Markup Language (MAGE-ML), an XML-based file format able to capture all the information required by MIAME
- Based on the object model MAGE-OM (Paul Spellman, Michael Miller, Jason Stewart, Ugis Sarkans, …)
- Adopted by OMG as a standard for microarrays

UML Packages of MAGE
[Diagram labels: Treatment, Transformation, BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity, BQS]

MAGE – an example diagram

Use case of MAGE: ArrayExpress architecture
[Diagram components: ArrayExpress (Oracle), browser, MIAMExpress, MAGE-ML (DTD), MAGE-OM, MAGE-ML (doc), data loader, Velocity template engine, Castor object/relational mapping, web page templates, Java servlets, Tomcat]

MGED standards 3
- MGED ontologies: organism part, cell type, diseased state, genotype, chemical compounds (Chris Stoeckert, Helen Parkinson, Susanna Sansone, …)
- Symposium "Standards and Ontologies for Functional Genomics", November 17-20, Cambridge, UK

MGED standards 4
- Data transformation and normalisation (Cathy Ball, John Quackenbush, Gavin Sherlock, …)

Infrastructure for sharing microarray data
- Standard for experiment annotation
- Standard for data exchange
- Public repositories
- Local databases and LIMS
- Ways of comparing the data

ArrayExpress: a MIAME/MAGE supportive public repository for microarray data at EBI
[Diagram labels: ArrayExpress, MIAMExpress, Expression Profiler, MAGE-ML, Internet, www; submissions; queries, analysis]

Microarray data sharing infrastructure
[Diagram labels: public repositories; data submissions; data queries, retrieval, and analysis; array descriptions (from manufacturers); data analysis software; MIAMExpress local installations; LIMS; other databases; links labelled MAGE-ML, www, and html]

MIAME/MAGE supportive software
- Sanger Institute LIMS (MIDAS)
- TIGR LIMS
- Gene Traffic (Iobion)
- Affymetrix
- MAXDB (Manchester)
- Rosetta Resolver (Rosetta Biosoftware)
- Base (Lund)
- J-Express (Molmine)
- MIAMExpress (EBI)
- ArrayExpress (EBI)

Acknowledgements
- MGED board: Cathy Ball (Stanford), Helen Causton (Imperial Col), Terry Gaasterland (Rockefel), Jason Gonzales (Iobion), Pascal Hingamp (Marseille), Barbara Jasny (Science), Helen Parkinson (EBI), John Quackenbush (TIGR), Martin Ringwald (Jackson), Gavin Sherlock (Stanford), Paul Spellman (Berkeley), Jason Stewart (Open Inf), Chris Stoeckert (Uni Penns), Yoshio Tateno (DDBJ), Ron Taylor (Colorado), Charles Troup (Agilent)
- MGED supporters: Rob Andrews (Sanger), Wilhelm Ansorge (EMBL), Mike Cherry (Stanford), Peter Dansky (Affymetrix), David Hancock (Manchester), Frank Holstege (Utrecht), Michael Miller (Rosetta), Kate Rice (Sanger), Christian Schwager (EMBL), Joe White (TIGR), Rick Young (MIT)
- EBI Microarray Team: Niran Abeygunawardena, Helen Parkinson, Philippe Rocca-Serra, Susanna Sansone, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo

Microarray informatics at the EBI
- ArrayExpress (Helen Parkinson)
- Expression Profiler data analysis tool and promoter analysis (Jaak Vilo)
- Reconstructing and analysing gene networks

Gene networks: graphs in which nodes are genes and arcs are relationships

Different ways to build a gene network
- G1 → G2: the product of gene G1 is a transcription factor that binds to the promoter of gene G2 (physical interaction network)
- G1 → G2: the disruption of gene G1 changes the expression level of gene G2 (data interpretation network)
- G1 → G2: gene G2 is mentioned in a paper about gene G1 (literature network)

Data for over 200 gene disruptions in yeast (Hughes et al., Cell 102, 2000)

Discretization of the data
The normalized expression log(ratios) are discretized using different thresholds γ = 2σ, 2.1σ, …, 4σ:
- X < −γ ⇒ d(X) = −1
- −γ ≤ X ≤ γ ⇒ d(X) = 0
- X > γ ⇒ d(X) = 1
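The three-way rule above can be sketched in a few lines. The function names `discretize` and `thresholds` are illustrative, and the sketch assumes the log(ratio) values and the threshold γ are both expressed in units of the standard deviation σ.

```python
def discretize(x, gamma):
    """Map a normalized log(ratio) x to -1, 0 or 1 for threshold gamma."""
    if x < -gamma:
        return -1
    if x > gamma:
        return 1
    return 0  # -gamma <= x <= gamma: no significant change


def thresholds(start=2.0, stop=4.0, step=0.1):
    """Threshold series 2.0, 2.1, ..., 4.0 (in units of sigma)."""
    count = round((stop - start) / step)
    return [round(start + i * step, 1) for i in range(count + 1)]


profile = [-3.2, -2.0, 0.4, 2.0, 2.7]
for gamma in (2.0, 3.0):
    print(gamma, [discretize(x, gamma) for x in profile])
```

Values that fall exactly on a threshold are mapped to 0, matching the non-strict inequalities in the middle case.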

Gene disruption network
[Diagram: expression matrix with rows gene A, gene B, gene C, gene D and columns for the disruption mutants ΔA, ΔB, ΔC, next to the derived network on nodes A, B, C, D]
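One way to read the diagram: draw an arc from a disrupted gene to every other gene whose discretized expression change is non-zero in that mutant. The sketch below follows that reading. The input layout (a dict keyed by disrupted gene), the function name and the toy numbers are assumptions for illustration, not data from the talk.

```python
def build_disruption_network(log_ratios, gamma):
    """Return {(disrupted_gene, affected_gene): sign} for all changes beyond gamma.

    log_ratios maps each disrupted gene to a dict of
    {gene: normalized log(ratio) measured in that mutant}.
    """
    edges = {}
    for mutant, profile in log_ratios.items():
        for gene, x in profile.items():
            if gene == mutant:
                continue  # the disrupted gene itself is not a target
            sign = -1 if x < -gamma else (1 if x > gamma else 0)
            if sign != 0:
                edges[(mutant, gene)] = sign
    return edges


# Toy data (hypothetical): two mutants measured over four genes.
toy = {
    "A": {"A": -5.0, "B": 2.5, "C": 0.3, "D": -3.1},
    "B": {"A": 0.1, "B": -4.0, "C": 2.2, "D": 0.0},
}
network = build_disruption_network(toy, gamma=2.0)
print(network)
```

Keeping the sign on each arc preserves whether the disruption raised or lowered the target's expression; dropping it gives a plain directed graph.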

Mutation network for S. cerevisiae

Mutation network, γ = 2, filtered for the genes marked in red (mating)
Thomas Schlitt, Johan Rung

Comparison to literature network derived from YPD
Result: the overlap between the calculated networks and the YPD graph is always larger than the overlap between randomised networks and the YPD graph
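A comparison of this kind can be sketched as counting shared arcs and repeating the count for randomised networks. The slide does not say how the networks were randomised, so the sketch assumes the simplest scheme: random directed graphs on the same genes with the same number of arcs. All function names and the toy edge sets are hypothetical.

```python
import random


def edge_overlap(network, reference):
    """Number of directed arcs present in both edge collections."""
    return len(set(network) & set(reference))


def random_network(nodes, n_edges, rng):
    """Random directed graph without self-loops (assumed randomisation scheme)."""
    nodes = sorted(nodes)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(nodes, 2)  # two distinct genes
        edges.add((a, b))
    return edges


def overlap_vs_random(calculated, reference, trials=200, seed=0):
    """Observed overlap and the largest overlap seen among random networks."""
    rng = random.Random(seed)
    nodes = {gene for edge in calculated for gene in edge}
    observed = edge_overlap(calculated, reference)
    best_random = max(
        edge_overlap(random_network(nodes, len(calculated), rng), reference)
        for _ in range(trials)
    )
    return observed, best_random


calculated = {("A", "B"), ("B", "C"), ("C", "D")}
literature = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
observed, best_random = overlap_vs_random(calculated, literature)
print(observed, best_random)
```

A degree-preserving shuffle would be a stricter null model; the uniform scheme here is only the easiest to state.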

Network modularity
- Is there one "big" dominant connected component and possibly a number of small components, or are there several components of comparable size?
- Can the network be broken down into several components of comparable size by removing nodes of high degree (i.e., nodes with many incoming or outgoing edges)?
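Both questions can be explored by listing component sizes before and after deleting the highest-degree nodes. The sketch below is a minimal version under two assumptions: components are taken as weakly connected (arc direction ignored), and degree counts incoming plus outgoing arcs. The function names and the toy network are illustrative.

```python
from collections import defaultdict


def component_sizes(edges, nodes):
    """Sizes of weakly connected components, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def remove_top_degree(edges, fraction):
    """Drop the given fraction of nodes with the highest in+out degree."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    nodes = set(degree)
    k = int(len(nodes) * fraction)
    hubs = set(sorted(nodes, key=lambda n: (-degree[n], n))[:k])
    kept = [(a, b) for a, b in edges if a not in hubs and b not in hubs]
    return kept, nodes - hubs


# Toy network (hypothetical): one hub H plus a single extra arc.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("C", "D")]
all_nodes = {gene for edge in edges for gene in edge}
full = component_sizes(edges, all_nodes)
kept_edges, kept_nodes = remove_top_degree(edges, fraction=0.2)
pruned = component_sizes(kept_edges, kept_nodes)
print(full, pruned)
```

In the toy case a single dominant component falls apart once the hub is removed, which is the pattern the second question asks about.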

Number of connected components in the networks
[Table: for each threshold γ (starting at 2.0), the sizes of the largest and second-largest components and the total number of components, for the full network and with 1%, 5%, and 10% removed; values not preserved]

Other opinions
- Wagner, 2002 (Genome Res): there exist many independent modules
- Featherstone, 2002 (Bioessays): there is only one giant module
- All depends on the definition of the 'module'

Disruption network properties
- The in-degree and out-degree of genes are distributed according to a power law
- There are no obvious modules in this particular network
- 'Local' networks make sense (J. Rung, T. Schlitt et al., to appear in the ECCB special issue of Bioinformatics)

Microarray Informatics at the EBI
Gaurab Mukherjee, Alvis Brazma, Gonzalo Garcia Lara, Ugis Sarkans, Koichi Tazaki, Ahmet Ociamen, Helen Parkinson, Mohammadreza Shojatalab, Thomas Schlitt, Katja Kivinen, Misha Kapushesky, Ele Holloway, Nastja Samsonova, Philippe Rocca-Serra, Johan Rung, Niran Abeygunawardena, Susanna Sansone, Jaak Vilo