The ArrayExpress Gene Expression Database: a Software Engineering and Implementation Perspective Ugis Sarkans European Bioinformatics Institute.

Outline
- Microarray data and standards overview
- ArrayExpress overall principles
- ArrayExpress architecture
- AE repository
- AE data warehouse
- Future plans and conclusions

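Gene expression data and annotation
[Figure: gene expression matrix of genes by samples, holding gene expression levels, with sample annotations on one axis and gene annotations on the other. Problem 1: sample annotations. Problem 2: gene expression levels.]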
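The matrix-plus-annotations picture on this slide can be sketched as a small data structure. This is only an illustration: the class name `ExpressionMatrix`, its fields, and the sample values are all hypothetical and are not taken from ArrayExpress.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix with annotations on both axes (illustrative)."""

    genes: list[str]                # row identifiers
    samples: list[str]              # column identifiers
    values: list[list[float]]       # values[i][j]: level of gene i in sample j
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The matrix must have one row per gene and one column per sample.
        if len(self.values) != len(self.genes):
            raise ValueError("expected one row per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("expected one column per sample")

    def level(self, gene: str, sample: str) -> float:
        """Look up a single expression level by gene and sample identifier."""
        return self.values[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> list[str]:
        """Select samples by annotation, e.g. all samples from one tissue."""
        return [
            s for s in self.samples
            if self.sample_annotations.get(s, {}).get(key) == value
        ]


# Made-up example data, for illustration only.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    values=[[1.0, 2.5, 0.3], [0.0, 4.1, 2.2]],
    gene_annotations={"geneA": {"description": "hypothetical kinase"}},
    sample_annotations={
        "s1": {"tissue": "liver"},
        "s2": {"tissue": "brain"},
        "s3": {"tissue": "liver"},
    },
)
```

Selecting columns through `samples_where` is only useful if the sample annotations are captured consistently, which is the "problem 1" the slide points at.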

Platform comparison (Tan et al., PNAS, 2003)
"Our conclusion was very straightforward: there was very little overlap in the types of data in terms of differential expression" (Margaret Cam, NIH)

[Figure: microarray experiment workflow. Each of several samples goes through the chain Sample, RNA extract, labelled nucleic acid, hybridisation to an array (described by an array design). The hybridisations and their protocols make up an experiment; normalization and integration produce the gene expression data matrix (genes).]

Different processing levels of microarray (MA) data
[Figure: panels A, B, C, D showing processing levels, with labels: array scans, spots, quantitations, genes, samples.]

MGED standards
- MIAME – minimum information about a microarray experiment
- MAGE-OM and MAGE-ML – microarray gene expression object model and markup language
- MO – microarray ontology
- Data normalisation and transformations (and quality control)

UML packages of MAGE
[Figure: UML package diagram. Packages: BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity, BQS. The diagram groups them under: what was used, what was done, results, miscellaneous.]

MAGE – an example diagram

ArrayExpress aims
- An archive for microarray data supporting scientific publications
- Providing easy access to public gene expression and other microarray data in a structured format
- Facilitating the sharing of microarray designs and protocols
- Facilitating the establishment of infrastructure for microarray data sharing

AE users
- Experimentalists
- Single-gene biologists
- Bioinformaticians – genome-wide studies
- Bioinformaticians – algorithm developers
- Software developers

ArrayExpress infrastructure
[Figure: ArrayExpress infrastructure. Components shown: ArrayExpress repository; warehouse (BioMart); MIAMExpress and external MIAMExpress installations (Camb. U., EMBL); submission tracking/curation tool; EBI Expression Profiler; other microarray databases (SMD, TIGR, Utrecht, RZPD); array manufacturers (Affymetrix, Agilent); data analysis software (R/Bioconductor, J-Express, Resolver); external databases (EMBL, UniProt, Ensembl); web (www) access. Flow labels: submissions, MAGE-ML, queries and analysis, data analysis.]

AE: overall principles
- Adherence to community standards
- Data captured in a granular, formalized manner
- Modern but proven software technologies
- Incremental development

AE design considerations
- Separate data archiving from the query-optimized data warehouse
- Generate default implementation, then refine
  – ~2 full-time developers
  – pressure to bring system online quickly
- Use object abstraction layer
  – deal with performance overhead on case-by-case basis

Repository architecture overview
[Figure: repository architecture. Components shown: MAGE-OM; MAGE-ML DTD; MAGE-ML document; MAGE validator; MAGE loader; MAGE unloader; error.log; Castor object/relational mapping; Oracle DB; curation environment; Java servlets; Tomcat; Velocity; web page templates.]

AE schema
- Why auto-generated?
  – AE must be able to import any valid MAGE-ML and not lose information
  – good for navigating through data in terms of object model
  – if some queries don't work well, add something to the schema: Experiment-Biomaterial, Experiment-Protocol links
  – so far works for 400 GB of data
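The idea of deriving a relational schema mechanically from an object model can be sketched as follows. This is a minimal illustration and not the Castor-based generator the slides refer to: the two classes are heavily simplified stand-ins for MAGE-OM classes, the `<class>_id` foreign-key naming convention is an assumption, and SQLite stands in for Oracle. The `experiment_id` column on `BioMaterial` plays the role of a shortcut link added to make a common query cheap.

```python
import sqlite3
from dataclasses import dataclass, fields

# Mapping from Python attribute types to SQL column types (assumed, minimal).
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


@dataclass
class Experiment:
    id: int
    name: str


@dataclass
class BioMaterial:
    id: int
    name: str
    experiment_id: int  # direct link to Experiment, stored as a foreign key


def create_table_sql(cls) -> str:
    """Derive a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for f in fields(cls):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":
            column += " PRIMARY KEY"
        elif f.name.endswith("_id"):
            # Assumed convention: <class>_id references table <class>.
            column += f" REFERENCES {f.name[:-3]}(id)"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"


# Build the whole schema from the model classes in an in-memory database.
conn = sqlite3.connect(":memory:")
for model_class in (Experiment, BioMaterial):
    conn.execute(create_table_sql(model_class))
```

Because the tables are derived from the classes, a change to the object model propagates to the schema without hand-written DDL, which is the property the slide relies on for importing any valid MAGE-ML without losing information.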

Auto-generated web pages

To ontologize or not to ontologize
[Figure: two panels, "At the beginning" and "At the end".]

Model vs. ontology
- Model – stable; ontologies – flexible
- Adding/modifying/deleting attributes – easy; adding/modifying/deleting associations – hard
- Therefore: attributes and their types in ontologies, domain structure (classes + associations) in the model
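The split described on this slide can be sketched in code: classes and associations are fixed in the model, while descriptive attributes are stored as category/value entries checked against a vocabulary that can grow independently. This is an illustrative sketch only. The class names loosely echo MAGE-OM, but the fields, the `VOCABULARY` dictionary, and the `invalid_entries` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyEntry:
    """A descriptive attribute stored as data rather than as a class field."""

    category: str  # e.g. "Organism"
    value: str     # e.g. "Homo sapiens"


@dataclass
class BioMaterial:
    name: str
    # Flexible part: any number of ontology-backed characteristics.
    characteristics: list[OntologyEntry] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    # Stable part: the Experiment-BioMaterial association lives in the model.
    biomaterials: list[BioMaterial] = field(default_factory=list)


# Hypothetical controlled vocabulary; it can be extended without touching
# the classes above or any schema derived from them.
VOCABULARY: dict[str, set[str]] = {
    "Organism": {"Homo sapiens", "Mus musculus"},
    "Sex": {"male", "female"},
}


def invalid_entries(material: BioMaterial) -> list[OntologyEntry]:
    """Return characteristics whose category or value is not in the vocabulary."""
    return [
        entry for entry in material.characteristics
        if entry.value not in VOCABULARY.get(entry.category, set())
    ]


sample = BioMaterial(
    "sample 1",
    [OntologyEntry("Organism", "Homo sapiens"),
     OntologyEntry("DevelopmentalStage", "adult")],
)
experiment = Experiment("demo experiment", [sample])
```

Adding a new kind of attribute then means adding a vocabulary entry, for example `VOCABULARY["DevelopmentalStage"] = {"adult"}`, whereas adding a new association would mean changing the classes.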

[Figure: data storage. Labels: Experiment1 (type, performer, ….); Hybridization data 1; Experimental factors; Quantitation type definitions; … > data points; NetCDF.]

Data warehouse schema

What BioMart gives to AEDW
- Query language abstraction
  – Joins automatically generated
- Schema optimized for performance
- Clear database integration roadmap
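"Query language abstraction" with automatically generated joins can be illustrated with a small query builder: the caller names attributes and filters, and the builder works out which tables to join. This sketch does not use or reproduce the BioMart API. The table layout, the `ATTRIBUTES` and `JOINS` maps, and the `build_query` function are all hypothetical, and SQLite is used only to make the example runnable.

```python
import sqlite3

# Hypothetical warehouse layout: attribute name -> (table, column).
ATTRIBUTES = {
    "experiment_name": ("experiment", "name"),
    "sample_tissue": ("sample", "tissue"),
}
MAIN_TABLE = "experiment"
# How each non-main table joins to the main table.
JOINS = {"sample": "sample.experiment_id = experiment.id"}


def build_query(select, filters):
    """Build SQL and parameters from attribute names; joins are derived."""
    # Collect every non-main table touched by a selected or filtered attribute.
    tables = []
    for attr in list(select) + list(filters):
        table = ATTRIBUTES[attr][0]
        if table != MAIN_TABLE and table not in tables:
            tables.append(table)

    columns = ", ".join("{}.{}".format(*ATTRIBUTES[a]) for a in select)
    sql = f"SELECT {columns} FROM {MAIN_TABLE}"
    for table in tables:
        sql += f" JOIN {table} ON {JOINS[table]}"

    params = []
    conditions = []
    for attr, value in filters.items():
        conditions.append("{}.{} = ?".format(*ATTRIBUTES[attr]))
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Tiny in-memory database with made-up rows to run the generated query on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, experiment_id INTEGER, tissue TEXT)")
conn.executemany("INSERT INTO experiment VALUES (?, ?)", [(1, "E1"), (2, "E2")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", [(1, 1, "liver"), (2, 2, "brain")])

sql, params = build_query(["experiment_name"], {"sample_tissue": "liver"})
rows = conn.execute(sql, params).fetchall()
```

The caller never writes a join: asking for `experiment_name` filtered by `sample_tissue` is enough for the builder to add the `sample` table.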

ArrayExpress environment

Future plans
- Data management environment automation
- Flexible data warehouse interface
- Programmatic interface (HTTP/XML based)
- Distributed infrastructure??

Distributed data infrastructure
[Figure: users send a query to a query broker, which finds the resource among ArrayExpress and several local databases; the data is then delivered to the users. Labels: query, find resource, deliver data.]
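The query, find resource, deliver data cycle in this diagram can be sketched with an in-process broker. The slides present distributed infrastructure only as a possible future direction, so everything here is hypothetical: the `LocalDatabase` and `QueryBroker` classes, their methods, and the accession strings are invented for illustration, and a real broker would talk to remote services rather than Python objects.

```python
class LocalDatabase:
    """Stand-in for one participating database (hypothetical interface)."""

    def __init__(self, name, experiments):
        self.name = name
        self._experiments = dict(experiments)  # accession -> data

    def has(self, accession):
        return accession in self._experiments

    def fetch(self, accession):
        return self._experiments[accession]


class QueryBroker:
    """Routes a user query to whichever registered database holds the data."""

    def __init__(self, databases):
        self._databases = list(databases)

    def find_resource(self, accession):
        # "find resource": ask each database in registration order.
        for database in self._databases:
            if database.has(accession):
                return database
        return None

    def query(self, accession):
        # "query" then "deliver data": return the source name and the data.
        database = self.find_resource(accession)
        if database is None:
            raise LookupError(f"no registered database holds {accession}")
        return database.name, database.fetch(accession)


# Made-up databases and accessions, for illustration only.
broker = QueryBroker([
    LocalDatabase("ArrayExpress", {"EXP-1": "expression data 1"}),
    LocalDatabase("local database A", {"EXP-2": "expression data 2"}),
])
```

Calling `broker.query("EXP-2")` returns the data together with the name of the database that supplied it, so the user does not need to know in advance where an experiment is stored.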

Conclusions
- Conceptual object modeling works well for complex life sciences domains
- Many software infrastructure components can be auto-generated from object models
- A range of approaches can be used for modeling, e.g., UML framework + ontologies
- Repository and data warehouse – different aims and different implementation principles

Acknowledgements
- Gonzalo Garcia Lara – web interface
- Ahmet Oezcimen – DBA
- Anjan Sharma – curation tool
- Sergio Contrino, Richard Coulson – data warehouse
- Niran Abeygunawardena – webmaster
- Mohammadreza Shojatalab – MIAMExpress
- Misha Kapushesky – Expression Profiler
- Curation team:
  – Helen Parkinson, Ele Holloway, Gaurab Mukherjee, Anna Farne, Tim Rayner
- Domain-specific projects:
  – Susanna Sansone, Philippe Rocca-Serra
- Alvis Brazma
- MGED collaborators
  – Stanford, TIGR, Affymetrix, EMBL, ….
- BioMart team