Presentation is loading. Please wait.

Presentation is loading. Please wait.

ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED.

Similar presentations


Presentation on theme: "ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED."— Presentation transcript:

1 ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED V, Tokyo 2002

2  Introducing ArrayExpress  Submission methods  Ontologies and Annotation  The Future Talk structure

3 Infrastructure at the EBI ArrayExpress (Oracle) Other public Microarray Databases (GEO, CIBEX) ww w EBI Expression Profiler External Bioinformatic databases Data analysis ww w Queries ww w MIAMExpress (MySQL) MAGE-ML Submissions MAGE-ML Array Manufacturers LIMS Microarray software Data Analysis software MAGE-ML Export Local MIAMExpress Installations MAGE-ML files Submissions MAGE-ML pipelines

4 ArrayExpress  Public database for gene expression data, one of three including CIBEX and GEO  Aims to store well annotated (MIAME compliant) and well structured data  MIAME  Recorded info should be sufficient to interpret and replicate the experiment  Information should be structured so that querying and automated data analysis and mining are feasible* *Brazma et al,.Nature Genetics, 2001

5 ArrayExpress conceptual model Publication External links HybridisationArraySample Source (e.g., Taxonomy ) Experiment Normalisation Gene (e.g., EMBL ) Data

6 ArrayExpress details  Database schema derived from MAGE-OM  Standard SQL, we use Oracle  Validating data loader for MAGE-ML  Web interface – Queries - experiment, array, sample – Browsing – views on expt  Object model-based query mechanism, automatic mapping to SQL

7 Data Submission to ArrayExpress  Via MAGE-ML pipeline  From LIMS/local database  Current direct MAGE-ML submitters: Sanger, TIGR, Paul Spellman, Affymetrix  Via external tools e.g BASE, J-express  Via MIAMExpress  Accepts Array, Experiment and Protocol submissions  Provides accession numbers

8 ArrayExpress curation effort  User support and help documentation  Curation at source (not destination)  Support on ontologies and CV’s  Minimize free text, removal of synonyms  MIAME encouragement  Help on MAGE-ML  Goal: to provide high-quality, well- annotated data to allow automated data analysis

9 Public Data Access  Data export as tab delimited file  Export to Expression profiler  As MAGE-ML, from query interface  Arrays exportable as tab delimited file

10

11 Submission Tool

12 MIAMExpress submission and annotation tool  Based on MIAME concepts and questionnaire  Perl-CGI, MySQL database  Experiment, Array, Protocol submissions  Generic annotation tool  Exports MAGE-ML

13 What makes up a submission ? Final data ArraySampleHybridisationArraySampleHybridisationArraySampleHybridisationArraySampleHybridisationExperiment Protocols Normalisation

14 Array Definition Format  Tab delimited file format that can be parsed to MAGE-ML  Defines relationships between features and sequences  Provides sequence annotation  Example in the Agenda, open letter to journals

15 Example BioSequence annotation

16 Sample annotation  Gene expression data only have meaning in the context of detailed sample descriptions  If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database  Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for unambiguous sample description  None of this is trivial

17 What Does an Ontology Do?  Captures knowledge  Creates a shared understanding – between humans and for computers  Makes knowledge machine processable  Makes meaning explicit – by definition and context  It is more than a controlled vocabulary, it has structure

18 MGED Biomaterial (sample) Ontology  Under construction – by MGED OWG – Using OILed  Motivated by MIAME and coordinated with the database model (mapping available)  We are extending classes, providing constraints, defining terms, and adding terms  Extension to include other MAGE-OM required ontology entries and cv’s

19 MGED BioMaterial Internal Terms

20

21 Internal and External Terms combined

22 Examples of external ontologies and cv’s  NCBI taxonomy database  Jackson Lab mouse strains and genes  Edinburgh mouse atlas anatomy  HUGO nomenclature for Human genes  Chemical and compound Ontologies, e.g. CAS  TAIR  Flybase anatomy  GO (www.geneontology.org)www.geneontology.org  GOBO ontologies

23 Example Annotation  Sample source and treatment description, and its correct annotation using the MGED BioMaterial Ontology classes and corresponding external references: “Seven week old C57BL/6N mice were treated with fenofibrate. Liver was dissected out, RNA prepared”

24 ©-BioMaterialDescription ©-Biosource Property ©-Organism ©-Age ©-DevelopmentStage ©-Sex ©-StrainOrLine ©-BiosourceProvider ©-OrganismPart ©-BioMaterialManipulation ©-EnvironmentalHistory ©-CultureCondition ©-Temperature ©-Humidity ©-Light ©-PathogenTests ©-Water ©-Nutrients ©-Treatment ©-CompoundBasedTreatment (Compound) (Treatment_application) (Measurement) MGED BioMaterial Ontology Instances 7 weeks after birth Female Charles River, Japan 22  2  C 55  5% 12 hours light/dark cycle Specified pathogen free conditions ad libitum MF, Oriental Yeast, Tokyo, Japan in vivo, oral gavage 100mg/kg body weight External References NCBI Taxonomy Mouse Anatomical Dictionary International Committee on Standardized Genetic Nomenclature for Mice International Committee on Standardized Genetic Nomenclature for Mice Mouse Anatomical Dictionary ChemIDplus Mus musculus musculus id: 39442 Stage 28 C57BL/6 Liver Fenofibrate, CAS 49562-28-9

25 Future  Data acquisition for ArrayExpress  Update ArrayExpress to newest MAGE-OM  V2.0 MIAMExpress, domain specific, portable  Further ontology development and integration into tools, use of OWL  Curation tools (other than grep, Perl scripts)  Improved query interface for AE  ArrayExpress update tool

26 Resources  Schemas for both ArrayExpress and MIAMExpress, access to code  MAGE-ML examples, Arrays, Expts, Protocols  MIAME glossary, MAGE-MIAME-ontology mappings  List of ontology resources from MGED pages  Help in establishing pipelines  MGED software MAGEstk  Curation, help and advice www.mged.orgwww.mged.org www.ebi.ac.uk/arrayexpress

27 Acknowledgments  Microarray Informatics Team, EBI, esp. Alvis Brazma, Ugis Sarkans, Mohammad Shojatalab, Ele Holloway, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone  Chris Stoeckert, U. Penn.  Members of MGED  Sanger Institute - Rob Andrews, Jurg Bahler, Adam Butler, Kate Rice,  EMBL Heidelberg - Wilhelm Ansorge, Martina Muckenthaler,Thomas Preiss

28 ‘’I think you should be more explicit here in step two’’ ‘The miracle of microarray data analysis’ Genome Biology 2001, 2 (9)


Download ppt "ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED."

Similar presentations


Ads by Google