
1 ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data
Susanna Assunta Sansone, sansone@ebi.ac.uk
Microarray Informatics Team, European Bioinformatics Institute (EBI)
Hoffmann-La Roche
The European Bioinformatics Institute

2 Acknowledgments
➢ Microarray Informatics Team, EBI, esp.: Alvis Brazma, Helen Parkinson, Mohammad Shojatalab, Ugis Sarkans
➢ Industry Support team, EBI
➢ MGED steering committee
➢ MIAME working group
➢ Chris Stoeckert, U. Penn., and members of MGED

3 Talk structure
➢ Part I = ArrayExpress at EBI: a public repository for gene expression data
➢ Demo = MIAMExpress: submission/annotation tool
➢ Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress

4 Part I - Talk structure
➢ Data standardization:
  MGED group
  MIAME concepts
  MGED Ontology
➢ Uses of MIAME concepts:
  ArrayExpress database
  MAGE-OM, the object model
➢ Data flow in and out of ArrayExpress

5 Part I - Talk structure
➢ Data standardization:
  MGED group
  MIAME concepts
  MGED Ontology

6 Data standardization - MGED
➢ MGED = Microarray Gene Expression Database group
  EBI + the world's largest labs (TIGR, Sanger, Stanford, Agilent, Affymetrix, etc.)
  www.mged.org
➢ Aims
  Facilitate adoption of standards:
    – Annotation
    – Data representation
  Introduce:
    – Experimental controls
    – Data normalization methods

7 Data standardization - Why?
➢ Size of datasets
➢ Different platforms - nylon, glass
➢ Different technologies - oligos, spotted
➢ References to external databases are not stable!
➢ Gene expression data only have a meaning in the context of a detailed experiment description

8 MIAME - Minimum Information About a Microarray Experiment
➢ MGED group has published: MIAME v1.0 document (Brazma et al., Nature Genetics, 2001)
➢ Minimum information that must be reported about a microarray experiment in order to ensure:
  its interpretability
  potential verification of the results

9 MIAME - Minimum Information About a Microarray Experiment
Describes the 6 parts of a microarray experiment
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation; external links: Publication, Gene (e.g. EMBL), Source (e.g. Taxonomy)]

10 MIAME - Experimental design
[Diagram: the 6 parts of a microarray experiment and their external links]
Experiment: the set of the hybridisation experiments as a whole

11 MIAME - Experimental design
➢ One or more hybridisation experiments that are in some way related and address related questions:
  Author, contact information, citations
  Type of experiment, e.g.:
    – time course
    – normal vs diseased comparison
  Experimental factors, i.e. the parameters tested in the experiment, e.g.:
    – time
    – dose
    – response to a compound
  List of organisms used in the experiment
  List of platforms used

12 MIAME - Experimental design
➢ List of samples, arrays and hybridisations and their relationships, e.g.:
  Samples: S1, S2, S3
  Arrays: A1, A2, A3
  Hybridisations:
    – H1 is S1 and S2 on A1
    – H2 is S2 and S3 on A2
    – H3 is S1 and S2 on A3
➢ Which hybridisations are replicates, e.g.:
  H1 and H3 are replicates
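The relationships on this slide can be written down as a small data structure. The sketch below is illustrative only: `Hybridisation` and `replicate_groups` are hypothetical names, and treating hybridisations that share the same set of samples as replicates is an assumption made for the example (MIAME asks the submitter to state the replicates explicitly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybridisation:
    """One hybridisation: which samples were put on which array."""
    name: str
    samples: frozenset
    array: str


# The example from the slide: three hybridisations on three arrays.
hybridisations = [
    Hybridisation("H1", frozenset({"S1", "S2"}), "A1"),
    Hybridisation("H2", frozenset({"S2", "S3"}), "A2"),
    Hybridisation("H3", frozenset({"S1", "S2"}), "A3"),
]


def replicate_groups(hybs):
    """Group hybridisations that use the same samples (assumed replicates)."""
    groups = {}
    for hyb in hybs:
        groups.setdefault(hyb.samples, []).append(hyb.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

With the slide's data, `replicate_groups(hybridisations)` groups H1 and H3 together, which matches the replicate statement on the slide.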

13 MIAME - Experimental design
➢ Quality related indicators, e.g.: type of replicates
➢ Free-text description of the experiment or link to an e-publication

14 MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment and their external links. Next part: Array]

15 MIAME - Array design
[Diagram: the 6 parts of a microarray experiment and their external links]
Array: each array used and each element (spot) on the array

16 MIAME - Array design
➢ For the database, the array description should normally be submitted only once
➢ For each physical array used in the experiment, a unique ID and the array type are given
➢ Array design related information, e.g.:
  platform type = in situ synthesized or spotted, array provider, etc.
  surface type = glass, membrane, etc.

17 MIAME - Array design
➢ Properties of each type of element on the array that is generated by similar protocols, e.g.: synthesized oligos, PCR products, plasmids, colonies, etc.
➢ Each element (spot) on the array:
  Elements may be simple or composite (Affymetrix)
  Each element must be identified by the sequence, clone ID, PCR primer pair, or in any other unambiguous way
  Composite elements may be identified by a reference sequence
  Elements may be linked to genes (preferably)
  This information is normally provided in a separate file, e.g.:
    – spreadsheet

18 MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment and their external links. Next part: Sample]

19 MIAME - Sample
[Diagram: the 6 parts of a microarray experiment and their external links]
Sample: samples used, the extract preparation and labelling

20 MIAME - Sample
➢ Sample source, e.g.:
  Organism
  Cell source and type
  Developmental stage
  Organism part (tissue)
  Animal/plant strain or line
  Genetic variation
  Disease state or normal
Typically only some of these qualifiers are relevant, and the annotation for sample source still needs to be implemented! (To be continued...)

21 MIAME - Sample
➢ Sample treatment, e.g.:
  in vivo / in vitro
  Compounds
  The annotation for sample treatment still needs to be implemented! (To be continued...)
➢ Hybridisation extract preparation
  Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, and amplification method
➢ Labelling
  Laboratory protocol, including amount of nucleic acids labelled and label used (e.g. Cy3, Cy5, 33P, etc.)

22 MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment and their external links. Next part: Hybridisation]

23 MIAME - Hybridisations
[Diagram: the 6 parts of a microarray experiment and their external links]
Hybridisation: procedures and parameters

24 MIAME - Hybridisations
➢ Laboratory protocol including:
  The solution, e.g.:
    – concentration of solutes
  Blocking agent
  Wash procedure
  Quantity of labelled target used
  Time, concentration, volume, temperature
  Description of the hybridisation instruments

25 MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment and their external links. Next part: Data]

26 MIAME - Data
[Diagram: the 6 parts of a microarray experiment and their external links]
Data: images, quantitation, specifications

27 MIAME - Data
➢ Three data processing levels:
  Raw data: array scans
  Intermediate data: spot quantitations (spots x quantitations)
  Final data: gene expression levels (genes x conditions)

28 MIAME - Data
➢ Why three data processing levels?
  Each experiment uses different units!
  Unreliable information
➢ Lack of gene expression measurement units!
➢ What do we do in the absence of standards?
  Record raw, intermediate and final analysis data
  Together with detailed annotation on the analysis
➢ This passes on the responsibility of interpreting the final data to the user

29 MIAME - Data
Raw data: array scans
➢ The scanner image file, e.g.: TIFF, DAT
➢ Scanning information:
  Scan parameters:
    – laser power
    – spatial resolution
    – pixel space
    – PMT voltage
  Laboratory protocol for scanning
  Scanning hardware and software
➢ No MGED consensus on raw data!!

30 MIAME - Data
Intermediate data: spot quantitations (spots x quantitations)
➢ Image analysis and quantitation:
  Complete image analysis output for each element, normally given as a separate file, e.g.:
    – spreadsheet
➢ Image analysis information:
  Image analysis software specifications
  All parameters

31 MIAME - Data
Final data: gene expression levels (genes x conditions)
➢ Summarised information from possible replicates:
  Derived measurement values summarising related elements, as used by the author
  Reliability information for these values, given as a separate file, e.g.:
    – spreadsheet
  Specifications of these two, e.g.:
    – median value of the replicates, standard deviation
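As a minimal sketch of the "derived value plus reliability information" pair named on this slide, the snippet below summarises replicate measurements of one element with the median and the sample standard deviation. The function name `summarise_replicates` and the example numbers are made up for illustration; they are not taken from the talk.

```python
from statistics import median, stdev


def summarise_replicates(values):
    """Return the derived value (median) and a reliability measure (sample SD)."""
    return {"median": median(values), "stdev": stdev(values)}


# Hypothetical intensities for one array element across three replicates.
summary = summarise_replicates([1.0, 2.0, 3.0])
```

Reporting which statistic was used matters as much as the number itself, which is why the slide asks for the specification of both values.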

32 MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment and their external links]

33 MIAME - Normalisation
[Diagram: the 6 parts of a microarray experiment and their external links]
A typical experiment involves a number of hybridisations in which the data from multiple samples are analysed and compared. For this comparison, the reported hybridisation intensities (from the image processing) must first be normalised.

34 MIAME - Normalisation
➢ Normalisation adjusts for a number of technical variations between and within hybridisations
➢ Normalisation strategy, e.g.:
  Spiking
  Housekeeping gene
  Total array
➢ Normalisation algorithm
➢ Control array elements
➢ Hybridisation extract preparation
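Two of the strategies listed above can be sketched in a few lines. This is one simple reading, stated as an assumption: "total array" is taken to mean dividing every intensity by the mean intensity of the whole array, and "housekeeping gene" to mean dividing by the mean intensity of a chosen set of reference elements. The function names and the example values are hypothetical.

```python
def normalise_total(intensities):
    """Scale so that the mean intensity over the whole array becomes 1.0."""
    mean = sum(intensities.values()) / len(intensities)
    return {element: value / mean for element, value in intensities.items()}


def normalise_housekeeping(intensities, housekeeping):
    """Scale by the mean intensity of the chosen housekeeping elements."""
    reference = sum(intensities[e] for e in housekeeping) / len(housekeeping)
    return {element: value / reference for element, value in intensities.items()}


# Hypothetical spot intensities from a single hybridisation.
array_h1 = {"gene_a": 100.0, "gene_b": 200.0, "gene_c": 300.0}
by_total = normalise_total(array_h1)
by_reference = normalise_housekeeping(array_h1, ["gene_a"])
```

The two results differ by a constant factor, which is why MIAME asks for the strategy and the control elements to be recorded alongside the normalised values.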

35 MIAME - Annotation
[Diagram: the 6 parts of a microarray experiment and their external links]
➢ Annotation implementations required
Gene expression data only have a meaning in the context of a detailed sample (source-treatment) and array (gene) description

36 MIAME - Gene annotation
[Diagram: the 6 parts of a microarray experiment and their external links]
➢ Unambiguous identification:
  Interpret data
➢ !!Synonyms!!
  Alternative to gene names
  Community approved names
➢ Usable external sources, e.g.:
  EMBL-GenBank (sequence acc#)
  Jackson Lab (approved mouse gene names)
  HUGO (approved human gene names)

37 MIAME - Sample annotation
[Diagram: the 6 parts of a microarray experiment and their external links]
➢ Unambiguous identification:
  Interpret data
➢ Usable external sources, e.g.:
  NCBI Taxonomy (organisms)
  Jackson Lab (mouse strains)
  Mouse Atlas (mouse anatomy)
  Merck Index, CAS # (compounds)
➢ CVs and ontologies are needed:
  Reduce free-text description
  Facilitate data queries and analysis

38 What are CV and Ontology?
➢ CV = Controlled Vocabulary:
  A restricted set of terms used to describe something; in the simplest case it could be a list
➢ Ontology:
  Describes the relationships between the terms in a structured way
  Provides semantics and constraints
  Allows for computational inferences and reliable comparisons
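The difference between the two can be made concrete with a small sketch. The surface types come from the array design slide; the `IS_A` relations, the term "nucleic acid element" and the function names are hypothetical and only illustrate the idea, they are not part of the MGED ontology.

```python
# A controlled vocabulary in its simplest form: a flat list of allowed terms.
SURFACE_TYPE_CV = frozenset({"glass", "membrane"})


def is_valid_term(term, vocabulary):
    """A CV can only answer: is this term allowed?"""
    return term in vocabulary


# An ontology also records how terms relate (here: child -> parent, "is a").
IS_A = {
    "synthesized oligo": "nucleic acid element",
    "PCR product": "nucleic acid element",
    "nucleic acid element": "array element",
}


def ancestors(term, is_a):
    """Walk the 'is a' links upwards; this is a simple computational inference."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found
```

A query for "array element" can then also return records annotated with "PCR product", which a flat list cannot support.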

39 Ontology example
➢ Build an ontology for e.g.: Affymetrix GeneChip Rat Toxicology U34 Array
  (Top Level Class) Array
  (Sub-Class) element type
  (slot constraint) oligos, manufactured by Affymetrix
  (instance) GeneChip Rat Toxicology U34 Array

40 MIAME - MGED Ontology
➢ MGED Sample (BioMaterial) ontology:
  Under construction by Chris Stoeckert
  www.cbil.upenn.edu/Ontology/MGED_ontology.html
  Motivated by MIAME
  Defines terms, provides constraints, develops CVs for microarray experiment submissions
  Links also to external CVs and ontologies

41 MIAME - Q,V,S triplets
➢ MIAME definitions include the Q,V,S triplets:
  User defined 'qualifier, value, source' triplet
  Used to describe a new term
    – qualifier = what the term describes (cell type)
    – value = its value (epithelial)
    – source = its source (Gray's anatomy, 38th ed.)
  User defined terms are added to the MGED ontology
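A Q,V,S triplet maps naturally onto a three-field record. The sketch below uses the example from the slide; the `QVS` type, the `add_term` helper and the nested-dictionary store are hypothetical stand-ins, not the way the MGED ontology is actually maintained.

```python
from typing import NamedTuple


class QVS(NamedTuple):
    """A user-defined 'qualifier, value, source' triplet."""
    qualifier: str  # what the term describes
    value: str      # the term itself
    source: str     # where the term is defined


def add_term(store, triplet):
    """Record a user-defined term under its qualifier, keeping its source."""
    store.setdefault(triplet.qualifier, {})[triplet.value] = triplet.source
    return store


term = QVS("cell type", "epithelial", "Gray's anatomy, 38th ed.")
user_terms = add_term({}, term)
```

Keeping the source with every value is what lets a curator later decide whether a user-defined term should be promoted into the shared ontology.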

42 Part I - Talk structure
➢ Data standardization:
  MGED group
  MIAME concepts
  MGED Ontology
➢ Uses of MIAME concepts:
  ArrayExpress database
  MAGE-OM, the object model

43 Uses of MIAME concepts
➢ Specifies the content of the information:
  Sufficient information must be recorded to:
    – Correctly interpret
    – Replicate the experiments
  Structured information must be recorded to:
    – Correctly retrieve
    – Analyse the data
➢ Uses:
  Creation of MIAME-compliant databases, e.g.:
    – ArrayExpress at EBI
  Development of submission/annotation tools for generating MIAME-compliant information, e.g.:
    – MIAMExpress

44 ArrayExpress
➢ A public repository for gene expression data
➢ MIAME-compliant
Top level structure (conceptual model):
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation; external links: Gene (e.g. EMBL), Source (e.g. Taxonomy)]

45 ArrayExpress - Object Model
➢ MAGE-OM, Microarray Gene Expression Object Model:
  MIAME compliant
  Standard
  Joint submission to OMG, 2001, by MGED and Rosetta
    – OMG (Object Management Group) is an international non-profit software consortium that is setting standards in the area of distributed object computing

46 ArrayExpress - Object Model
➢ MAGE-ML, Mark-up Language:
  Derived from MAGE-OM
  Describes and communicates MIAME information
  DTD = 'predominantly' computer readable...
➢ UML, Unified Modelling Language:
  UML specifications are used to develop and describe MAGE-OM
  UML = ...human readable

47 MAGE-OM - UML specifications
Related classes are grouped together in packages
MAGE-OM has 16 packages
Top level class = package
Packages are linked to each other by reference
A class describes objects
[Diagram labels: Class name, Attributes, Relationships]

48 MAGE-OM mapping to MIAME
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation, annotated with MAGE-OM packages]
Packages shown: ExperimentDesign; BioAssay; ArrayDesign, ArrayManufacture, BioSequence, DesignElement; BioMaterial; BioAssayData, QuantitationType
+ other 7 "auxiliary" packages: AuditandSecurity, Protocol, Measurements, BioEvent, BQS, Description, HighLevelAnalysis
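The mapping can be held in a lookup table. The pairing below is an assumption: the slide text is flattened and does not state which package belongs to which MIAME part, so the assignment follows the package names and the later data mapping slides (Experiment with ExperimentDesign, Array with ArrayManufacture and Biosequence). Normalisation is left out because the text does not show its package. The package names are spelled as on the slide.

```python
# Assumed pairing of MIAME parts with MAGE-OM packages (see caveats above).
MIAME_TO_MAGE = {
    "Experiment": ["ExperimentDesign"],
    "Hybridisation": ["BioAssay"],
    "Array": ["ArrayDesign", "ArrayManufacture", "BioSequence", "DesignElement"],
    "Sample": ["BioMaterial"],
    "Data": ["BioAssayData", "QuantitationType"],
}

# The seven "auxiliary" packages listed on the slide.
AUXILIARY = [
    "AuditandSecurity", "Protocol", "Measurements", "BioEvent",
    "BQS", "Description", "HighLevelAnalysis",
]


def all_packages():
    """Flatten the table and append the auxiliary packages."""
    core = [pkg for pkgs in MIAME_TO_MAGE.values() for pkg in pkgs]
    return core + AUXILIARY
```

The nine packages in the table plus the seven auxiliary ones give 16, which agrees with the package count stated on the previous slide.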

49 Part I - Talk structure
➢ Data standardization:
  MGED group
  MIAME concepts
  MGED Ontology
➢ Uses of MIAME concepts:
  ArrayExpress database
  MAGE-OM, the object model
➢ Data flow in and out of ArrayExpress

50 Data flow in-out ArrayExpress
[Diagram labels: Users; EBI Web server (Browse-Query); ArrayExpress (central database, data warehouse); curation tool; database; image server; Update; Output; Loader; MAGE-ML; MIAMExpress submission; LIMS submission]

51 Data flow in-out ArrayExpress
[Diagram labels: Users; EBI Web server (Browse-Query); ArrayExpress (central database, data warehouse); curation tool; database; image server; Update; Output; Loader; MAGE-ML; MIAMExpress submission; LIMS submission]
➢ MIAME compliant
➢ Data model implemented in ORACLE
➢ Deals with:
  Raw data
  Processed data
  Data transformation
➢ Independent of:
  Experimental platform
  Image analysis method
  Normalization method

52 Data flow in-out ArrayExpress
[Diagram labels: Users; EBI Web server (Browse-Query); ArrayExpress (central database, data warehouse); curation tool; database; image server; Update; Output; Loader; MAGE-ML; MIAMExpress submission; LIMS submission]
MIAMExpress
➢ Submission/annotation tool
➢ Generates MIAME-compliant information
➢ Beta-testers
➢ Demo version (general)
➢ Target specific interfaces, e.g.:
  Species specific
  Toxicology specific

53 Talk structure
➢ Part I = ArrayExpress at EBI: a public repository for gene expression data
➢ Demo = MIAMExpress: submission/annotation tool

54 Talk structure
➢ Part I = ArrayExpress at EBI: a public repository for gene expression data
➢ Demo = MIAMExpress: submission/annotation tool
➢ Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress

55 Part II - Talk structure
➢ Data transfer from IMD to ArrayExpress:
  Can data be parsed?
  MIAME-compliant?
➢ Toxicology specific MIAMExpress interface:
  ILSI toxicogenomics data submission
➢ Areas of collaboration - Summary

56 Part II - Talk structure
➢ Data transfer from IMD to ArrayExpress:
  Can data be parsed?
  MIAME-compliant?

57 Data parsing?
➢ From IMD to ArrayExpress:
  Lexical parsing
    – Mapping information to MAGE-OM
  !! Semantic parsing !!
    – Glossary issues

58 Data mapping - Semantics!
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation]
Experiment: ExperimentDesign
IMD = Experimental condition description ?? Experimental design (study) ??

59 Data mapping - Semantics!
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation]
Array
IMD = chip, microarray chip
!! Synonyms !!

60 Data mapping - Semantics!
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation]
Array: ArrayManufacture, Biosequence
IMD = chip description, microarray chip description
!! Synonyms !!

61 Data mapping - Semantics!
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation]
Array: Biosequence
IMD = chip design, microarray chip design
!! Synonyms !!

62 Data mapping - Semantics!
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation]
Array: PlatformType
IMD = platform, microarray platform, microarray platform type
!! Synonyms !!
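The synonym problem shown on the last four slides is, at its simplest, a lookup from every IMD spelling to one target. The sketch below is a hypothetical illustration of that idea: the `resolve` function and the table layout are invented for the example, and the pairing of IMD terms with targets is read off the flattened slide text, so it should be checked against the original diagrams.

```python
# IMD terms grouped by the MAGE-OM target they appear next to on the slides.
TARGETS = {
    ("Array",): ["chip", "microarray chip"],
    ("ArrayManufacture", "Biosequence"): [
        "chip description", "microarray chip description",
    ],
    ("Biosequence",): ["chip design", "microarray chip design"],
    ("PlatformType",): [
        "platform", "microarray platform", "microarray platform type",
    ],
}

# Invert the table once so that each synonym points at its target.
LOOKUP = {syn: target for target, syns in TARGETS.items() for syn in syns}


def resolve(imd_term):
    """Normalise case and whitespace, then look the IMD term up."""
    key = " ".join(imd_term.lower().split())
    return LOOKUP.get(key)  # None means the term needs manual curation
```

For example, `resolve("Microarray  Chip")` and `resolve("chip")` give the same target, while an unknown term returns `None` rather than a guess.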

63 MIAME - compliant?
➢ IMD MIAME-compliant?
  "Minimal system" for data exchange
  Comparisons
➢ Current status for toxicogenomic data:
  Non-MIAME compliant
➢ Additional information required:
  To be flagged as MIAME compliant
  To build queries to the database:
    – ArrayExpress has an object model query mechanism
➢ Why additional information?

64 ILSI-HESI Objective
➢ ILSI-HESI objective:
  To have publicly available information to assist in developing consensus on potential applications and interpretation of microarray data with respect to mechanism-based risk assessment
  To critically assess the potential utility of these new methods for the process of hazard identification
➢ Toxicologists (other than ILSI-HESI members)
  Can correctly interpret and replicate the toxicogenomics experiments
  Can correctly retrieve and analyse the toxicogenomics data
➢ Sufficient and structured information must be recorded in order to achieve the ILSI-HESI objective

65 IMD - Data
➢ Three types of data:
  Required:
    – fold_change of spot intensity
  Optional:
    – relative_intensity
    – coefficient_variation of relative_intensity
  Additional:
    – present/absent/marginal_call (for Affymetrix)
    – P_value (for replicates)
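The three field classes on this slide suggest a simple record check. This is a sketch under stated assumptions: the real IMD file format is not described in the talk, so the dictionary layout, the `check_record` function and the shortened keys `coefficient_variation` and `present_absent_marginal_call` are hypothetical.

```python
REQUIRED = {"fold_change"}
OPTIONAL = {"relative_intensity", "coefficient_variation"}
ADDITIONAL = {"present_absent_marginal_call", "P_value"}


def check_record(record):
    """Return (missing required fields, fields not in any of the three classes)."""
    fields = set(record)
    missing = sorted(REQUIRED - fields)
    unknown = sorted(fields - (REQUIRED | OPTIONAL | ADDITIONAL))
    return missing, unknown


# Hypothetical records for one spot.
complete = {"fold_change": 2.4, "P_value": 0.01}
incomplete = {"relative_intensity": 0.8, "raw_scan": "image.tif"}
```

Note that a record can pass this check and still fall short of MIAME, since only the final fold change is required here while MIAME also asks for raw and intermediate data.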

66 MIAME compliant - Data
➢ Requirements:
  Raw data: array scans
  Intermediate data: spot quantitations (spots x quantitations)
  Final data: gene expression levels (genes x conditions)

67 MIAME compliant - Data
➢ Why three data processing levels?
  Lack of gene expression measurement units!
➢ What do we do in the absence of standards?
  Record raw, intermediate and final analysis data
  Together with detailed annotation on the analysis
➢ This allows toxicologists (other than ILSI-HESI members) to interpret the final data
➢ Increase the value of toxicology data by achieving the ILSI-HESI objective
  To give a critical mass to the ILSI-HESI studies

68 IMD - Experiment description
➢ Hepatotoxicity, e.g.: Oral (gavage) Study in Male SD Rats on Methapyrilene

69 [Figure labels: AuditandSecurity; Array; Experiment; ExtractionProtocol; ImageAnalysisProtocol; LabellingProtocol; Normalization; Sample = TreatmentAppl.; Sample = Treatment; Sample = Org.; Sample = BioSource; Sample = BioSourceProvider; Sample = ?]

70 IMD - Experiment description
➢ Good level of information
➢ Still incomplete to be MIAME compliant, e.g.:
  Detailed protocols required, e.g.:
    – Hybridization chamber type, scanner type, label quantity, etc.
➢ Need for: CVs and ontologies

71 ChemID: 3 systematic names and 39 synonyms !!

72 Excerpt from Sample Description, courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact: person@somewhere.ac.uk)
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal controlled environment; 20 - 22 °C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J,N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [ MGED ] [ intraperitoneal ] injection of [ Dexamethasone ] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [ MGED ] synthetic [ glucocorticoid ] [ Dexamethasone ], dissolved in PBS
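The simpler entries in this excerpt follow a "qualifier: value [ source ]" pattern, which is the Q,V,S triplet in text form. As a rough sketch, the regular expression below pulls the three parts out of such an entry. The `parse_entry` helper is hypothetical, and it deliberately handles only entries that end in a single bracketed source; entries without a source, such as "Cell source", return `None`, and entries with several bracketed terms, such as "Treatment", are not split into their individual terms.

```python
import re

# qualifier, then ':', then the value, then one trailing '[ source ]'.
ENTRY = re.compile(
    r"^(?P<qualifier>[^:]+):\s*(?P<value>.*?)\s*\[\s*(?P<source>[^\]]+?)\s*\]\s*$"
)


def parse_entry(line):
    """Return (qualifier, value, source), or None if the pattern does not fit."""
    match = ENTRY.match(line)
    if match is None:
        return None
    return match.group("qualifier"), match.group("value"), match.group("source")


sex = parse_entry("Sex: female [ MGED ]")
part = parse_entry('Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]')
no_source = parse_entry("Cell source: in-house bred mice (contact: person@somewhere.ac.uk)")
```

Entries that fail to parse are exactly the ones where a controlled vocabulary or curator input would be needed.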

73 Part II - Talk structure
➢ Data transfer from IMD to ArrayExpress:
  Can data be parsed?
  MIAME-compliant?
➢ Toxicology specific MIAMExpress interface:
  ILSI toxicogenomics data submission
➢ Areas of collaboration - Summary

74 Toxicology specific MIAMExpress
➢ Toxicology specific interface options:
  in vivo or in vitro
  Study specific (Hepatotoxicity, Nephrotoxicity, Genotoxicity)
➢ CVs and ontologies to be developed:
  CVs in pull-down menus
  'Q,V,S' user-driven ontologies
  Extend MGED ontology to include toxicology specific terms
➢ Dynamic, fast and easy to use
➢ Browse:
  Protocols
  Arrays

75 Areas of collaboration
➢ Data transfer:
  Parser from IMD to ArrayExpress (MAGE-ML)
  Additional information required:
    – MIAME compliant flag (e.g. data, protocols, sample pooling, etc.)
    – Build complex queries
➢ Data submission:
  Submission via toxicology specific MIAMExpress
    – CVs and ontologies
    – Interface options
    – Protocols
➢ Other data:
  Volume (79 from Hepatotoxicity)
  Clinical chemistry, Histopathology
    – Format (images also?) and volume
➢ Mailing list

