Published by Osborne Harvey. Modified over 9 years ago.
1
EMBL Outstation — The European Bioinformatics Institute MIAME and ArrayExpress - a standard for microarray data annotation and a database to store it Helen Parkinson Microarray Informatics Team European Bioinformatics Institute Hinxton
2
Four parts of my talk
– Microarray data standards
– Ontologies for gene expression data
– ArrayExpress - a public database for microarray data
– Analysis tools at the EBI
3
The size of the datasets
– Experiments:
    – ~100 000 different transcripts in human
    – ~320 cell types
    – 2000 compounds
    – 3 time points
    – 2 concentrations
    – 2 replicates
– Data:
    – 8 × 10^11 data points
    – 1 × 10^15 bytes = 1 petabyte for Affymetrix (data from Jerry Lanfear)
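The data-point estimate on this slide is the product of the listed experimental factors. A minimal sketch of that arithmetic follows; the dictionary keys are labels chosen here for illustration, and the bytes-per-data-point figure is an inference from the two totals on the slide, not something the talk states.

```python
from math import prod

# Experimental factors as listed on the slide (approximate counts).
factors = {
    "transcripts": 100_000,
    "cell_types": 320,
    "compounds": 2000,
    "time_points": 3,
    "concentrations": 2,
    "replicates": 2,
}

# Total number of measurements if every combination is assayed.
data_points = prod(factors.values())
print(f"data points: {data_points:.2e}")

# The slide quotes 1 x 10^15 bytes in total; dividing gives the implied
# storage per data point (an inference, assuming both totals describe
# the same experiment design).
total_bytes = 10**15
bytes_per_point = total_bytes / data_points
print(f"implied bytes per data point: {bytes_per_point:.0f}")
```

The product is 7.68 × 10^11, which the slide rounds to 8 × 10^11.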
4
Microarray data
– Microarrays are widely used in experiments and are already producing massive amounts of data
– These data have to be stored in a well-organised and standard way if they are to be accessed and analysed by the wider research community
– There is a general consensus that a public repository for microarray data is needed
– It is much less clear what exactly should be stored in such a repository
5
A gene expression database from the data analyst’s point of view
[Figure: a gene expression matrix indexed by genes and samples, whose cells hold gene expression levels, with sample annotations attached to the samples and gene annotations attached to the genes]
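The analyst's view above can be sketched as a small data structure: a matrix of expression levels plus two annotation tables. This is only an illustration, assuming genes as rows and samples as columns; the class name, method names, identifiers and numbers are all hypothetical and are not taken from ArrayExpress.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionDataset:
    genes: list                      # row labels
    samples: list                    # column labels
    levels: list                     # levels[i][j]: genes[i] measured in samples[j]
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def level(self, gene, sample):
        """Return the expression level of one gene in one sample."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key, value):
        """Select samples whose annotation `key` equals `value`."""
        return [
            s for s in self.samples
            if self.sample_annotations.get(s, {}).get(key) == value
        ]


# Made-up example data, for illustration only.
dataset = ExpressionDataset(
    genes=["g1", "g2"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.0, 3.0], [0.5, 0.4, 0.6]],
    gene_annotations={"g1": {"process": "hypothetical term"}},
    sample_annotations={
        "s1": {"organism part": "thymus"},
        "s2": {"organism part": "liver"},
        "s3": {"organism part": "thymus"},
    },
)

thymus_samples = dataset.samples_where("organism part", "thymus")
print(thymus_samples, dataset.level("g1", "s3"))
```

Selecting samples by annotation only works if the annotation values are spelled consistently, which is the motivation for controlled vocabularies.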
6
Three parts of a gene expression database
– Gene annotation
    – can be given by links to gene sequence databases and GO (function, process, cell compartment)
    – not perfect, but let's not worry about it
– Sample annotation
    – we do not have any external databases for sample description (except species taxonomy): problem 1
– Gene expression matrix
    – what are the measurement units for gene expression levels? Problem 2
7
Problem/consideration 1 – sample annotation
– Gene expression data only have meaning in the context of detailed sample descriptions
– If the data are going to be interpreted by independent parties, sample information has to be searchable and in the database
– Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description
8
Sample annotation - what can be done?
– Few CVs and ontologies for sample description are available (species taxonomy, model organisms)
– Some use of free-text descriptions is unavoidable (curation workload)
– Existing efforts to create such ontologies should be coordinated (MGED ontology working group)
– Use existing ontologies and CVs wherever possible
9
Problem 2 – the lack of gene expression measurement units
– What we would like to have
    – gene expression levels expressed in some standard units (e.g. molecules per cell)
    – a reliability measure associated with each value (e.g. standard deviation)
– What we have got
    – each experiment using different units
    – no reliability information
10
Comparing expression data
[Figure not reproduced; residual label text: "cminc"]
11
Comparing expression data
[Figure not reproduced; residual label text: "??"]
12
Comparing expression data
13
What to do in the absence of standard measurement units?
– Record raw, intermediate and final analysis data together with a detailed annotation of how the analysis was performed
– This effectively passes the responsibility for interpreting the final analysis data on to the user
14
Three levels of microarray data processing
– Raw data: array scans
– Spot quantitations: quantitation matrices (spots × quantitations)
– Gene expression data: gene expression levels (genes × samples)
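One way to picture keeping all three levels together, along with a note on how the final values were derived, is sketched below. It is an assumption-laden illustration rather than the ArrayExpress design: the record layout, the file name and the intensities are hypothetical, and averaging replicate spots is just one possible summarisation method.

```python
from statistics import mean, stdev

# Hypothetical spot-level quantitations: (gene, sample) -> replicate intensities.
spot_quantitations = {
    ("geneA", "sample1"): [100.0, 110.0, 90.0],
    ("geneB", "sample1"): [20.0, 22.0],
    ("geneC", "sample1"): [5.0],
}


def summarise(spots):
    """Collapse replicate spots into one level per (gene, sample), with a reliability measure."""
    summary = {}
    for key, values in spots.items():
        # A standard deviation needs at least two replicates.
        sd = stdev(values) if len(values) > 1 else None
        summary[key] = {"level": mean(values), "sd": sd, "n": len(values)}
    return summary


# Raw, intermediate and final data are stored side by side, together with a
# description of the analysis, so a user can judge the final values.
record = {
    "raw": {"array_scans": ["scan_sample1.tif"]},  # hypothetical file name
    "intermediate": spot_quantitations,
    "final": summarise(spot_quantitations),
    "analysis_annotation": "mean of replicate spots; sample standard deviation",
}

print(record["final"][("geneA", "sample1")])
```

A single-replicate measurement yields no standard deviation here, which is one reason the talk expects replicate measurements to become the norm.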
15
Measurement units
– In the longer term:
    – standard controls for experiments (on chips and in the samples) should be introduced
    – replicate measurements will become the norm
– Temporary solution:
    – storing intermediate analysis results (including the images) and annotations of how they were obtained
    – standards within experiments themselves (standard controls and protocols)
16
Standards for microarray data
– Standards are needed to build a well-organised microarray database
    – Standards for annotation
    – Standards for data exchange
    – Standards for controls in the experiment and data normalisation
www.dnachip.org/mged/normalization.html
17
How to create microarray data standards
1. Understand thoroughly what minimum information about a microarray experiment is needed to interpret it unambiguously, and what the structure of this information is (objects and relationships)
2. Create a technical data format able to capture this information
3. Find appropriate controlled vocabularies
18
Standardisation of microarray data and annotations - MGED group
– The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods
– Includes most of the world's largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.)
www.mged.org
19
MGED
– MGED 2 meeting in Heidelberg in 2000, MGED 3 in Stanford in 2001, both with ~300 participants
– Minimum Information About a Microarray Experiment – MIAME version 1.0 posted
– Collaboration with OMG on data formats: MAML + GEML = MAGE-ML and MAGE-OM
– MGED 4 meeting in February 2002, Boston
– MGED will become an ISCB Special Interest Group
20
MIAME – Minimum Information About a Microarray Experiment
6 parts of a microarray experiment:
– Experiment
– Array
– Sample
– Hybridisation
– Data
– Normalisation
External links: Publication; Gene (e.g., EMBL); Source (e.g., Taxonomy)
www.mged.org
21
MIAME Section on Sample Source and Treatment
sample source and treatment
– ID as used in section 1
– organism (NCBI taxonomy)
– additional "qualifier, value, source" list; the list includes:
    – cell source - provider type (if derived from primary source(s))
    – sex
    – age
    – growth conditions
    – development stage
    – organism part (tissue)
    – animal/plant strain or line
    – genetic variation (e.g., gene knockout, transgenic variation)
    – individual
    – individual genetic characteristics (e.g., disease alleles, polymorphisms)
    – disease state or normal
    – target cell type
    – cell line and source (if applicable)
    – in vivo treatments (organism or individual treatments)
    – in vitro treatments (cell culture conditions)
    – treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
    – compound
    – is additional clinical information available (link)
    – separation technique (e.g., none, trimming, microdissection, FACS)
    – laboratory protocol for sample treatment…
22
What is an ontology?
– An ontology is a specification of concepts that includes the relationships between those concepts
– Provides semantics and constraints
– Allows for computational inferences and reliable comparisons
23
MGED Biomaterial Ontology
– Under construction by Chris Stoeckert
    – Using OilEd (may use others)
– Motivated by MIAME and coordinated with the database model
– Extend classes, provide constraints, define terms, provide terms to use, develop CVs for submissions (EBI)
24
Use case scenario
25
Ontology Example
– Concept = Age
– def = in standard units referenced to an identifiable time point from (class) developmental stage
– Age = 6 {units = days}, {dev_stage} = dauer
– Hierarchy = Dev_stage -> larva -> dauer
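The example above can be read as a small hierarchy plus a constrained value. The sketch below is one possible encoding, not the MGED ontology itself: the `PARENT` map holds only the three terms shown on the slide, and `ALLOWED_UNITS`, `ancestors`, `is_a` and the `Age` class are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# child -> parent, taken from the slide: Dev_stage -> larva -> dauer
PARENT = {"larva": "Dev_stage", "dauer": "larva"}

# Assumed set of "standard units"; the slide only mentions days.
ALLOWED_UNITS = {"hours", "days", "weeks", "years"}


def ancestors(term):
    """Walk up the hierarchy and return every ancestor, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or descends from it."""
    return term == ancestor or ancestor in ancestors(term)


@dataclass(frozen=True)
class Age:
    value: float
    units: str
    dev_stage: str  # the time point the age is referenced to

    def __post_init__(self):
        # Constraints: known units, and a reference stage that is a developmental stage.
        if self.units not in ALLOWED_UNITS:
            raise ValueError(f"unknown units: {self.units}")
        if not is_a(self.dev_stage, "Dev_stage"):
            raise ValueError(f"not a developmental stage: {self.dev_stage}")


age = Age(6, "days", "dauer")
print(age, ancestors("dauer"), is_a("dauer", "larva"))
```

The `is_a` check is a simple instance of the computational inference an ontology allows: a query for larval samples can also return samples annotated as dauer.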
26
Excerpts from a Sample Description
courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [NCBI taxonomy browser]
Cell source: in-house bred mice (contact: person@somewhere.ac.uk)
Sex: female [MGED]
Age: 3-4 weeks after birth [MGED]
Growth conditions: normal; controlled environment; 20-22 °C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [GXD "Mouse Anatomical Dictionary"]
Organism part: thymus [GXD "Mouse Anatomical Dictionary"]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrains 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [International Committee on Standardized Genetic Nomenclature for Mice]
Treatment: in vivo [MGED]; intraperitoneal injection of Dexamethasone into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [MGED]; synthetic glucocorticoid Dexamethasone, dissolved in PBS
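The excerpt above can be read as a list of MIAME "qualifier, value, source" triples. The sketch below encodes a few of its entries that way; the `Annotation` type and the two helper functions are hypothetical names, and treating a missing source as free text in need of curation is an assumption made for illustration.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    qualifier: str
    value: str
    source: Optional[str]  # controlled vocabulary or ontology, if any


# A few entries taken from the sample description excerpt.
sample = [
    Annotation("organism", "Mus musculus", "NCBI taxonomy"),
    Annotation("cell source", "in-house bred mice", None),
    Annotation("sex", "female", "MGED"),
    Annotation("age", "3-4 weeks after birth", "MGED"),
    Annotation("organism part", "thymus", 'GXD "Mouse Anatomical Dictionary"'),
]


def lookup(annotations, qualifier):
    """Return the first annotation with the given qualifier, or None."""
    for annotation in annotations:
        if annotation.qualifier == qualifier:
            return annotation
    return None


def free_text_qualifiers(annotations):
    """List qualifiers whose value is not tied to any vocabulary source."""
    return [a.qualifier for a in annotations if a.source is None]


print(lookup(sample, "sex"))
print(free_text_qualifiers(sample))
```

Keeping the source next to each value makes it possible to tell which entries can be compared across experiments and which are free text.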
27
ArrayExpress conceptual model
[Figure: the boxes Experiment, Array, Sample, Hybridisation, Data and Normalisation, with external links to Publication, Gene (e.g., EMBL) and Source (e.g., Taxonomy)]
28
ArrayExpress object model
29
ArrayExpress – the state of the art
– ArrayExpress object model supporting MIAME requirements developed
– Data model implemented in Oracle
– Data loader from MAML file format
– Expression Profiler – data analysis tool already available
30
ArrayExpress – plans and schedule
– EU grant – new staff being recruited
– A web-based query interface - under development
– A web-based submission tool – under test
– Participation in OMG – MAGE-OM & MAGE-ML
– MAGE-ML will replace MAML in October
– Full-scale database operation expected to start at the beginning of 2002
– Expression Profiler to link to ArrayExpress
31
Microarray data analysis
Expression Profiler – a web-based gene expression data analysis tool: www.ebi.ac.uk/microarray/
32
Expression Profiler - web-based tool for microarray data analysis
http://www.ebi.ac.uk/microarray/
– EPCLUST (cluster expression profiles)
– GENOMES: sequence, function, annotation
– SPEXS (Sequence Pattern Exhaustive Search): novel patterns
– PATMATCH: known patterns
– URLMAP: provide links
– Inputs: expression data; external data and tools (pathways, function, etc.)
33
Conclusions
– Microarray standardisation is a challenge and an imperative
– Join MGED to contribute to this process: www.mged.org
– Participate in the development of ontologies and controlled vocabularies
– Send me your protocols
– Make your data available
– Give feedback on MIAME; it is up for discussion
34
Acknowledgments
– Microarray Informatics Team, EBI: Alvis Brazma, Katja Kivinen, Helen Parkinson, Olga Perez, Johan Rung, Ugis Sarkans, Thomas Schlitt, Mohammad Shojatalab, Lev Soinov, Koichi Tazaki, Jaak Vilo
– Industry Support team, EBI: Alan Robinson
– MGED steering committee
– MIAME working group
– Chris Stoeckert, U. Penn. and MGED
35
Useful URLs
– www.mged.org
– www.tigr.org
– www.ebi.ac.uk/array
– www.geneontology.org
– www.hgmp.mrc.ac.uk
– www.dnachip.org/mged/normalization.html
parkinson@ebi.ac.uk