
Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge,


Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo
European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Microarrays, one of the latest breakthroughs in experimental molecular biology, are producing considerable amounts of gene expression and other functional genomics data. The handling, storage, and analysis of these data are becoming the major bottlenecks in the use of microarray technology.

Storing and annotating these data is not a trivial problem, for many reasons. The raw microarray data are images, which have to be transformed into gene expression matrices: tables in which rows represent genes, columns represent samples such as different tissues, and the value at each position characterizes the expression level of the particular gene in the particular sample. This transformation is not trivial because of replicate measurements, replicate spots, different oligos reporting on the expression level of the same gene, problems with sequence homology and potential cross-hybridisation, cross-platform comparisons, and so forth. The high-level gene expression matrices, which represent genes and their respective expression levels, also have to be integrated with other genomic data and analysed further if any knowledge about the underlying biological processes is to be extracted (see [1]).

The European Bioinformatics Institute initiated an international effort to establish standards for microarray data representation, annotation and exchange [2]. The recommendations of MIAME (the Minimum Information About a Microarray Experiment) specify the minimum information that must be reported about a microarray (or any DNA array) based gene expression monitoring experiment in order to ensure that the results can be interpreted and potentially verified by third parties.
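The step from spot-level measurements to a gene expression matrix can be illustrated with a minimal sketch. It assumes that image analysis has already produced (gene, sample, value) triples and that replicate measurements are simply averaged; the poster does not say how replicates are combined, so the averaging rule and the function name `build_expression_matrix` are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def build_expression_matrix(measurements):
    """Collapse replicate measurements into a gene-by-sample matrix.

    measurements: iterable of (gene, sample, value) tuples. The same
    (gene, sample) pair may occur several times (replicate spots or
    replicate hybridisations). Averaging is an assumption made for
    this sketch, not a method stated in the source.
    """
    grouped = defaultdict(list)
    for gene, sample, value in measurements:
        grouped[(gene, sample)].append(value)

    genes = sorted({gene for gene, _ in grouped})
    samples = sorted({sample for _, sample in grouped})

    # Rows are genes, columns are samples; None marks a missing measurement.
    matrix = [
        [mean(grouped[(g, s)]) if (g, s) in grouped else None for s in samples]
        for g in genes
    ]
    return genes, samples, matrix
```

Missing gene and sample combinations are kept as `None` rather than filled in, so that later analysis steps can decide how to treat them.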
An XML-based data exchange format, Microarray Markup Language (MAML), is being developed in collaboration with the Microarray Gene Expression Database (MGED) Group (see www.mged.org). EBI is establishing ArrayExpress, a public repository for microarray data, which will accept data in MAML format.

Expression Profiler, a set of online tools for gene expression data analysis, has been developed at the EBI and is available for public use (www.ebi.ac.uk/microarray). The analysis software in Expression Profiler facilitates the clustering, exploration, and visualization of gene expression data, as well as linking the analysis results to tools and databases elsewhere. Expression Profiler also includes tools that assist with the analysis of expression data in connection with other data types. Currently, DNA sequence data can be analysed and visualized alongside expression data, permitting users to discover, study, and visualize putative transcription factor binding sites [3].

One of the prospects of analysing microarray data is the reverse engineering of gene regulatory networks from gene expression and other genomics data. We have been successfully using our tools for in silico prediction of transcription factor binding sites [3]. Furthermore, we are developing models for describing gene regulatory networks, and are using this modelling approach to gain insight into the regulation of gene expression in response to the activity of other molecules in the cell as well as extracellular signals.
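To make the clustering step concrete, here is a small sketch of one common approach: average-linkage agglomerative clustering of expression profiles with a correlation-based distance. The poster does not state which algorithms Expression Profiler implements, so this is a generic illustration and not a description of that software; the names `pearson` and `cluster_profiles` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering of gene expression profiles.

    profiles: dict mapping gene name -> list of expression values.
    n_clusters: number of clusters to stop at (must be at least 1).
    Distance between two genes is 1 - Pearson correlation.
    """
    clusters = [[gene] for gene in sorted(profiles)]

    def distance(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return [sorted(c) for c in clusters]
```

Correlation distance groups genes whose expression rises and falls together across samples, regardless of absolute level, which is why it is often preferred over Euclidean distance for expression profiles.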
ArrayExpress – a public repository for microarray data

Helen Parkinson, Mohammadreza Shojatalab, Ugis Sarkans and Alvis Brazma
European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

[Figure: MIAME based ArrayExpress Conceptual Model. MIAME, six parts of a microarray experiment: Experiment, Array, Sample, Hybridisation, Data, Normalisation. Linked elements: Publication, External Databases for Gene/sequence id's, Ontologies for sample Description.]

[Figure: MIAME based Submission Tool. Panel labels include Login/contact info, Experiment submission, Array Submission, Browse existing arrays from ArrayExpress, Experiment details, Sample details, Authors, Laboratory, Protocols (Extract, Hyb, Label, Scan, Other), Hybs, Data upload, Pending exp. submissions, Pending array submissions, Browse Public/User protocols, Add new protocol, and an Overview page with change links for Samples, Hybs, Protocols, Authors and Data files.]

Minimum Information About a Microarray Experiment (MIAME) suggests that the recorded information should be sufficient to interpret and replicate the experiment, and that the information should be structured so that querying and automated data analysis and mining are feasible.

The ArrayExpress model is designed around the six MIAME sections. The prototype submission tool is a GUI implementation of the MIAME questionnaire (www.mged.org/Annotations-wg/index.html). The submission tool writes to a MySQL database which retains the MIAME concept but does not support the complex query capability of the full ArrayExpress model, as this is not required for data submission.
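The idea of a submission organised around six MIAME sections can be sketched as a simple completeness check. The section names below are taken from the figure labels on this poster (Experiment, Array, Sample, Hybridisation, Data, Normalisation); the dictionary layout and the function `missing_miame_sections` are hypothetical and do not reflect the actual schema of the submission tool.

```python
# Section names follow the six parts named on the poster; the lower-case
# keys and the dict-based submission layout are assumptions for this sketch.
MIAME_SECTIONS = (
    "experiment",
    "array",
    "sample",
    "hybridisation",
    "data",
    "normalisation",
)


def missing_miame_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission.

    submission: dict mapping section name -> section content (any value;
    empty dicts, empty strings and None count as missing).
    """
    return [name for name in MIAME_SECTIONS if not submission.get(name)]
```

A submission front end could run such a check before accepting an experiment, listing the sections that still need to be filled in.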
The schema for the submission tool and screenshots are shown below left. The tool is currently set up as a generic submission tool for all species but has the potential for species-specific or experiment-specific implementations. Additionally, it will be freely available for use as a LIMS for users who have limited local bioinformatics support. It has been designed for small-scale users, has full contextual help, and will be supported by ArrayExpress database staff. The tool will be further developed according to user need. Large-scale users with local databases are expected to use the MAGE-ML (XML-based) data submission format; this process is analogous to the way that sequencing centres deposit data into sequence databases.

One of the most important requirements of MIAME is sample annotation. Complete and accurate sample description is complex and will require the construction of an ontology and the inclusion of controlled vocabularies which are referenced by the submission tool. The ontology will need to encompass tissues, cell lines, developmental stages, disease states, compounds, drugs, strains (and just about anything else that you can think of) related to a microarray experiment sample. Some of these terms have been defined and included in an ontology by Chris Stoeckert (U. Penn.) as part of the MGED (Microarray Gene Expression Database group) ontology working group. The submission tool will be used as a source of terms and controlled vocabulary for the MGED ontology.

Acronym Key

ArrayExpress: the public database based on MAGE-OM.

MAGE-OM: microarray gene expression object model, developed by MGED and Rosetta and submitted to OMG (Object Management Group) for adoption as a specification for expression data exchange.

MGED: The MGED group is an open discussion group established at the Microarray Gene Expression Database meeting MGED 1 (1999). The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. The underlying goal is to facilitate the establishment of gene expression data repositories, comparability of gene expression data from different sources, and interoperability of different gene expression databases and data analysis software (www.mged.org).

MAGE-ML: Microarray gene expression mark-up language, an XML data exchange format able to capture MIAME, based on MAGE-OM.

[Figure: ArrayExpress Prototype Query Interface]

A prototype query interface has been developed for the ArrayExpress database; it supports complex queries across biosource, experimental factors, etc.
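A query that spans biosource and experimental factors can be sketched with a toy relational schema. Everything below is assumed for illustration: the table and column names, the demo rows, and the use of SQLite from the Python standard library in place of the database behind ArrayExpress. It does not describe the real ArrayExpress schema or query interface.

```python
import sqlite3


def demo_database():
    """Build an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sample (
            id INTEGER PRIMARY KEY, experiment_id INTEGER, organism TEXT);
        CREATE TABLE experimental_factor (
            id INTEGER PRIMARY KEY, experiment_id INTEGER, name TEXT);
        -- Invented demo rows, not real ArrayExpress content.
        INSERT INTO experiment VALUES (1, 'exp-heat'), (2, 'exp-time');
        INSERT INTO sample VALUES (1, 1, 'yeast'), (2, 2, 'yeast'), (3, 2, 'mouse');
        INSERT INTO experimental_factor VALUES (1, 1, 'temperature'), (2, 2, 'time');
        """
    )
    return conn


def find_experiments(conn, organism, factor):
    """Return names of experiments with a sample from `organism`
    that also vary the experimental factor `factor`."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.name
        FROM experiment AS e
        JOIN sample AS s ON s.experiment_id = e.id
        JOIN experimental_factor AS f ON f.experiment_id = e.id
        WHERE s.organism = ? AND f.name = ?
        ORDER BY e.name
        """,
        (organism, factor),
    ).fetchall()
    return [row[0] for row in rows]
```

Joining sample annotation with experimental factors in one statement is the kind of cross-section query that benefits from structured, controlled-vocabulary annotation; free-text sample descriptions would not support the equality tests used here.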

