Microarray Data Standardisation. Microarray Gene Expression Database group (MGED), December 2000.


Public data repositories for microarray data
There is a growing consensus in the life science community on the need for public repositories of gene expression data, analogous to DDBJ/EMBL/GenBank for sequences.

Some of the reasons:
Gradually building up gene expression profiles for various organisms, tissues, cell types, developmental stages and states, and under the influence of various compounds
Building up systematic knowledge about gene functions and networks through links to other genomics databases
Comparison of profiles, and access to and analysis of data by third parties
Cross-validation of results and platforms (quality control)

Systematic gene expression profiling initiatives in the public domain
The International Life Sciences Institute (ILSI) is coordinating a program undertaken by ~25 pharmaceutical and food companies to generate toxicity-related gene expression data under defined experimental conditions:
– evaluate gene expression profiles in standardised test systems following exposure to toxicants
– relate changes in gene expression to other measures of toxicity

Microarray data handling and analysis - a major bottleneck
(Calculations by Jerry Lanfear)
Experiments:
– [...] genes in human
– 320 cell types
– 2000 compounds
– 3 time points
– 2 concentrations
– 2 replicates
Data:
– 8 x [...] data-points
– 1 x [...] = 1 petaB of data
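The arithmetic behind this estimate can be sketched as follows. The human gene count is missing from the transcript, so the value of 100,000 used here is an assumption made only for illustration, as are the names `ASSUMED_HUMAN_GENES` and `count_data_points`. Under that assumption the product of the listed factors is 7.68 x 10^11, which is at least consistent with the leading "8 x" that survives on the slide.

```python
# Factors listed on the slide; the human gene count is an ASSUMPTION
# (the slide's own figure is missing from this transcript).
ASSUMED_HUMAN_GENES = 100_000


def count_data_points(genes, cell_types=320, compounds=2000,
                      time_points=3, concentrations=2, replicates=2):
    """Multiply out the experimental grid to get the number of measurements."""
    return (genes * cell_types * compounds
            * time_points * concentrations * replicates)


total_points = count_data_points(ASSUMED_HUMAN_GENES)

# Storage per measurement implied by a total of 1 petabyte (10**15 bytes).
bytes_per_point = 10**15 / total_points

print(f"data points: {total_points:.2e}")
print(f"implied bytes per data point: {bytes_per_point:.0f}")
```

A budget of roughly a kilobyte per data point is plausible only because the raw data are images rather than single numbers.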

Expression data repository projects
Public repositories in the making:
– GEO - NCBI
– GeneX - NCGR
– ArrayExpress - EBI
In-house databases - Stanford, MIT, University of Pennsylvania
Organism-specific databases - Mouse in Jackson
Proprietary databases - Gene Logic, NCI

Difficulties
Raw data are images
What is needed for higher-level analysis and mining is a gene expression matrix (genes/samples/gene expression levels)
– lack of standard measurement units for gene expression
– lack of standards for sample annotation

Raw data - images
Treated sample labeled red (Cy5)
Control sample labeled green (Cy3)
Competitive hybridization onto chip
Red dot - gene overexpressed in treated sample
Green dot - gene underexpressed in treated sample
Yellow - equally expressed
Intensity - “absolute” level
red/green - ratio of expression
– 2: 2x overexpressed
– 0.5: 2x underexpressed
log2(red/green) - “log ratio”
– 1: 2x overexpressed
– -1: 2x underexpressed
[Image: cDNA plotted microarray, Stanford University (yeast, 1997)]
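The ratio and log ratio described on this slide can be sketched in a few lines. The function names and the `threshold` parameter are hypothetical and only illustrate the idea; the slide does not prescribe any cut-off.

```python
import math


def log_ratio(red, green):
    """Return log2(red/green) for one spot; both intensities must be positive."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


def call(red, green, threshold=1.0):
    """Classify a spot by its log ratio (threshold is an illustrative choice)."""
    lr = log_ratio(red, green)
    if lr >= threshold:
        return "overexpressed"
    if lr <= -threshold:
        return "underexpressed"
    return "roughly equal"


# Red twice as bright as green gives a log ratio of 1; the reverse gives -1.
print(log_ratio(200.0, 100.0), log_ratio(100.0, 200.0))
print(call(400.0, 100.0), call(500.0, 500.0))
```

The log scale makes over- and underexpression symmetric around zero, which the plain ratio (2 versus 0.5) is not.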

Gene expression matrix
[Figure: matrix with axes labeled Samples and Genes; entries are gene expression levels]

What we would like to have
– gene expression levels expressed in some standard units (e.g. molecules per cell)
– reliability measure associated with each value (e.g. standard deviation)
What we do have
– each experiment using different units
– no reliability information
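As a minimal sketch of the "value plus reliability measure" idea, each cell of the matrix could hold a mean and a standard deviation computed from replicate measurements. The gene and sample names, the numbers, and the `summarise` helper are all made up for illustration.

```python
from statistics import mean, stdev


def summarise(replicates):
    """Return (mean, sample standard deviation) for replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to estimate reliability")
    return mean(replicates), stdev(replicates)


# Hypothetical replicate measurements, keyed by (gene, sample).
replicate_values = {
    ("gene_a", "sample_1"): [2.0, 4.0, 6.0],
    ("gene_a", "sample_2"): [1.0, 1.0, 1.0],
}

# Each matrix cell now carries an expression level and a reliability measure.
matrix = {key: summarise(values) for key, values in replicate_values.items()}

for (gene, sample), (level, spread) in matrix.items():
    print(gene, sample, level, spread)
```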

Comparing expression data

Measurement units
In the long term:
– standard controls for experiments (on chips and in the samples)
– replicate measurements
Temporary solution:
– storing intermediate analysis results (including the images) and annotations of how they were obtained, i.e., the evidence

Comparing expression data - problem 2
How do gene names relate across different data matrices?
How do samples relate across different data matrices?
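One way to picture the gene-name problem is as a mapping step: two matrices can only be compared row by row once their local gene names are translated to a shared identifier. The sketch below assumes such a mapping already exists; all names and identifiers in it are hypothetical.

```python
def align(matrix_a, matrix_b, names_a, names_b):
    """Pair up rows of two matrices that refer to the same gene.

    matrix_*: dict of local gene name -> list of expression values
    names_*:  dict of local gene name -> shared identifier (assumed given)
    Genes without a mapping, or present in only one matrix, are dropped.
    """
    by_id_a = {names_a[g]: v for g, v in matrix_a.items() if g in names_a}
    by_id_b = {names_b[g]: v for g, v in matrix_b.items() if g in names_b}
    shared = sorted(by_id_a.keys() & by_id_b.keys())
    return {gene_id: (by_id_a[gene_id], by_id_b[gene_id]) for gene_id in shared}


# Two labs use different local names for the same genes.
lab_a = {"a_gene_1": [1.0, 2.0], "a_gene_2": [0.5, 0.4]}
lab_b = {"b_gene_x": [1.1, 1.9], "b_gene_y": [3.0, 3.1]}
map_a = {"a_gene_1": "ID_001", "a_gene_2": "ID_002"}
map_b = {"b_gene_x": "ID_001", "b_gene_y": "ID_003"}

aligned = align(lab_a, lab_b, map_a, map_b)
print(aligned)
```

The same reasoning applies to samples, where the shared vocabulary has to come from controlled sample annotation rather than gene identifiers.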

Sample annotation
Gene expression data have meaning only in the context of the experimental conditions of the target system
Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample annotation
Sample annotations in current public databases are typically useless

In the long term
Standard units for gene expression measurements
Standards for sample annotation

More immediate actions
Understand what information about microarray experiments should be captured to make the descriptions reasonably self-contained
Develop a data exchange format able to capture this minimum information
Develop recommendations on how data should be normalised and what controls should be used

MGED group
The MGED group is an open discussion group initially established at the Microarray Gene Expression Database meeting MGED 1 (14-15 November, 1999, Cambridge, UK). The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. The underlying goal is to facilitate the establishment of gene expression data repositories, the comparability of gene expression data from different sources, and the interoperability of different gene expression databases and data analysis software. Since 1999 the group has had two general meetings, and the third one is planned for 2001. For more see

MGED participants, including:
Affymetrix, Berkeley, DDBJ, DKFZ, EMBL, Gene Logic, Incyte, Max Planck Institute, NCBI, NCGR, NHGRI, Sanger Centre, Stanford, Uni Pennsylvania, Uni Washington, Whitehead Institute

Working groups
Microarray experiment annotations and minimum information standards (A. Brazma)
XML-data communication standards and interfaces (P. Spellman)
Ontology for sample description (M. Bittner)
Cross-platform comparison and normalisation (F. Holstege, R. Bumgarner)
Future user group - queries, query languages and data mining (M. Vingron)

MGED state of the art
Formulation of the “minimum information about a microarray experiment” (MIAME) to ensure its interpretability and reproducibility
Data exchange format based on XML, the microarray markup language (MAML), submitted to OMG in November

MIAME six parts:
1. Experimental design: the set of the hybridisation experiments as a whole
2. Array design: each array used and each element (spot) on the array
3. Samples: samples used, the extract preparation and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantitation, specifications
6. Controls: types, values, specifications
see for details
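The six parts lend themselves to a simple completeness check. The sketch below is not part of MIAME itself: the key names, the dictionary layout, and the `missing_parts` helper are assumptions chosen only to show how a submission tool might flag gaps.

```python
# The six MIAME parts from the slide, as hypothetical dictionary keys.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(description):
    """Return the MIAME parts that are absent or empty in a description."""
    return [part for part in MIAME_PARTS if not description.get(part)]


# A made-up, partially filled experiment description.
experiment = {
    "experimental_design": "time course, 3 time points",
    "array_design": "array layout reference",
    "samples": "",  # present but empty, so still reported as missing
    "measurements": "image files and quantitation tables",
}

print(missing_parts(experiment))
```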

MIAME concepts
MIAME is aimed at a co-operative data submitter
Concept of “qualifier, value, source” lists, where source is either user defined or an external reference
Reusable information can be referenced, but should be provided at least once (array descriptions, standard protocols)
Raw data should be reported, together with the authors' interpretations
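A “qualifier, value, source” list might be represented as below. This is a sketch under assumptions: the `Annotation` class, the use of `None` for a user-defined source, and the example entries are illustrative and not taken from the MIAME document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One (qualifier, value, source) triplet."""
    qualifier: str
    value: str
    # None means the term is user defined; otherwise the name of the
    # external reference (e.g. a controlled vocabulary) it comes from.
    source: Optional[str] = None

    def is_user_defined(self):
        return self.source is None


# Hypothetical sample description as a list of triplets.
sample_description = [
    Annotation("organism", "Saccharomyces cerevisiae", "NCBI Taxonomy"),
    Annotation("growth condition", "heat shock, 20 min"),
]

user_defined = [a.qualifier for a in sample_description if a.is_user_defined()]
print(user_defined)
```

Keeping the source next to each value lets a database distinguish terms that can be resolved against an external vocabulary from free text.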

MAML
MAML is an XML-based data exchange format able to capture MIAME-compliant information
The work is still in progress; the first draft has been submitted to OMG as a data exchange standard for microarray data

MAML concepts
Annotations + data; data can be given as a set of external 2D matrices
Data format independent of any particular scanner or image analysis software
Sample and treatment can be represented as a DAG
Concept of composite images and composite spots

Sample and treatment representation
[Figure labels: Sample 1, Sample 2, Sample 3; Array 1, Array 2; Treatments]
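The idea of representing samples and treatments as a directed acyclic graph (DAG) can be sketched with Python's standard `graphlib` module (Python 3.9+). The edges below are invented for illustration, since the figure's actual connections are not recoverable from the transcript; each entry maps a node to the nodes it was derived from by some treatment or hybridisation.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of nodes it is derived from (hypothetical edges)
derived_from = {
    "sample_2": {"sample_1"},             # e.g. sample_1 after a treatment
    "sample_3": {"sample_1"},             # e.g. sample_1 after another treatment
    "array_1": {"sample_2"},              # hybridisation of one sample
    "array_2": {"sample_2", "sample_3"},  # two samples on one array
}


def processing_order(graph):
    """Return the nodes so that every node follows the nodes it derives from."""
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph):
    """True if the derivation graph has no cycles."""
    try:
        processing_order(graph)
        return True
    except CycleError:
        return False


print(processing_order(derived_from))
print(is_dag(derived_from))
```

A graph rather than a flat table is needed because one sample can feed several treatments and one array can combine several samples.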

Expression matrix - raw and processed
[Figure labels: Samples, Genes, Gene expression levels; Images, Spots, Spot/Image quantitations]

Microarray image analysis data representation
Images: primary images; composite images (e.g., green/red ratios)
Spots: primary spots; composite spots
Quantitations

MAML future
The NOMAD microarray LIMS system will export data in MAML format
ArrayExpress and GEO will import data in MAML format
We hope that OMG will accept MAML as the industry standard
We hope that MAML will become a de facto standard

MGED steering committee
Meeting in Bethesda on 17 Nov 2000
MIAME accepted; a publication urging journals and funding agencies to adopt it will be prepared
MGED will become an ISCB Special Interest Group
Next general MGED meeting at Stanford, March 29-31

Top level object model for gene expression database