ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL


Talk structure
- Why do we need a database for functional genomics data?
- ArrayExpress database
  - Archive
  - Gene Expression Atlas
- ArrayExpress content
- How to query the database
- How to download data
- How to submit data

What is functional genomics (FG)?
- The aim of FG is to understand the function of genes and other parts of the genome.
- FG experiments typically use genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions.
- High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome.

What biological questions is FG addressing?
- When and where are genes expressed?
- How do gene expression levels differ in various cell types and states?
- What are the functional roles of different genes and in what cellular processes do they participate?
- How are genes regulated?
- How do genes and gene products interact?
- How is gene expression changed in various diseases or following a treatment?

Components of a FG experiment

ArrayExpress
- Is a public repository for FG data, which provides easy access to well-annotated data in a structured and standardized format
- Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ
- Facilitates the sharing of experimental information associated with the data, such as microarray designs, experimental protocols, etc.
- Is based on community standards: MIAME guidelines and MAGE-TAB format for microarray data, MINSEQE guidelines for HTS data

Reporting standards for sequencing: MINSEQE checklist
- MINSEQE: Minimal Information about a high-throughput Nucleotide SEQuencing Experiment
- The proposed guidelines for MINSEQE (still a work in progress) are:
  1. General information about the experiment
  2. Essential sample annotation, including experimental factors and their values (e.g. compound and dose)
  3. Experimental design, including sample data relationships (e.g. which raw data file relates to which sample)
  4. Essential experimental and data processing protocols
  5. Sequence read data with quality scores, raw intensities and processing parameters for the instrument
  6. Final processed data for the set of assays in the experiment

Standards for microarray & sequencing: MAGE-TAB format
MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray experiment. We adapted it to handle HTS data:
- IDF: Investigation Description Format file. Contains top-level information about the experiment, including title, description, submitter contact details and protocols.
- SDRF: Sample and Data Relationship Format file. Contains the relationships between samples and arrays, as well as sample properties and experimental factors, as provided by the data submitter.
- Data files: raw and processed data files.
  - The 'raw' data files are the trace data files (.srf or .sff). FASTQ format files are also accepted, but SRF format files are preferred. The trace data files that you submit to ArrayExpress will be stored in the European Nucleotide Archive (ENA).
  - The processed data file is a 'data matrix' file containing processed values, e.g. files in which the expression values are linked to genome coordinates.
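Because the SDRF is a tab-delimited spreadsheet, the sample-to-file relationships it records can be read with a few lines of Python. The sketch below is illustrative only: the sample names, file names and the exact column headings are assumptions made up for the example, not taken from a real submission.

```python
import csv
import io

# Hypothetical SDRF content: tab-delimited, one row per sample-to-file link.
# Column headings and values are illustrative assumptions.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[compound]\tArray Data File\n"
    "sample_1\tMus musculus\tcontrol\tsample_1.fastq\n"
    "sample_2\tMus musculus\tdrug_A\tsample_2.fastq\n"
)


def read_sdrf(text):
    """Return the SDRF rows as a list of dicts keyed by column heading."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_by_factor(rows, factor_column, file_column):
    """Group data file names by the value of one experimental factor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_column], []).append(row[file_column])
    return groups


rows = read_sdrf(SDRF_TEXT)
groups = files_by_factor(rows, "Factor Value[compound]", "Array Data File")
print(groups)
```

Grouping files by factor value is typically the first step when re-analyzing a downloaded experiment, since it tells you which raw files belong to which condition.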

ArrayExpress – two databases

What is the difference between them?
- ArrayExpress Archive
  - Central object: experiment
  - Query to retrieve experimental information and associated data
- Expression Atlas
  - Central object: gene/condition
  - Query for gene expression changes across experiments and across platforms

ArrayExpress – two databases

ArrayExpress Archive – when to use it?
- Find FG experiments that might be relevant to your research.
- Download data and re-analyze it. Data deposited in public repositories can often be used to answer biological questions different from the one asked in the original experiment.
- Submit microarray or HTS data that you want to publish. Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process.

How much data in AE Archive?
Chart label: GEO import

Browsing the AE Archive

- The direct link to raw and processed data. An icon indicates that this type of data is available.
- The total number of experiments and assays retrieved
- Species investigated
- Curated title of experiment
- The date when the data were loaded in the Archive
- AE unique experiment ID
- Number of assays
- The list of experiments retrieved can be printed, saved in tab-delimited format, or exported to Excel or as an RSS feed
- 'Loaded in Atlas' flag
- Raw sequencing data available in ENA

Browsing the AE Archive

Experimental factor ontology (EFO)
- Application-focused ontology modeling experimental factors (EFs) in AE; selected by default
- Developed to:
  - increase the richness of annotations that are currently made in AE Archive
  - promote consistency
  - facilitate automatic annotation and integrate external data
- EFs are transformed into an ontological representation, forming classes and relationships between those classes
- EFO terms map to multiple existing domain-specific ontologies, such as the Disease Ontology and Cell Type Ontology

Building EFO: an example
- Take all experimental factors: sarcoma, cancer, neoplasm, Kaposi's sarcoma, disease
- Find the logical connection between them:
  - disease is the parent term
  - neoplasm is a type of disease
  - cancer is a synonym of neoplasm
  - sarcoma is a type of cancer
  - Kaposi's sarcoma is a type of sarcoma
- Organize them in an ontology, from the most general term to the most specific: disease, neoplasm, cancer, sarcoma, Kaposi's sarcoma
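The relationships in this example can be captured with two small lookup tables: one for "is a type of" links and one for synonyms. The sketch below is a toy model, not how EFO itself is stored; it assumes that "cancer" is folded into "neoplasm" as a synonym, and the function names are hypothetical.

```python
# "is a synonym of" links: synonym -> preferred term (assumption: cancer is
# treated as another name for neoplasm, following the slide's example).
SYNONYM = {"cancer": "neoplasm"}

# "is a type of" links: child -> parent, using preferred terms only.
PARENT = {
    "neoplasm": "disease",
    "sarcoma": "neoplasm",
    "Kaposi's sarcoma": "sarcoma",
}


def canonical(term):
    """Replace a synonym with its preferred term."""
    return SYNONYM.get(term, term)


def ancestors(term):
    """Walk the child -> parent links up to the root term."""
    chain = []
    current = canonical(term)
    while current in PARENT:
        current = PARENT[current]
        chain.append(current)
    return chain


def descendants(term):
    """Return every term that sits below the given term, sorted by name."""
    target = canonical(term)
    return sorted(t for t in PARENT if target in ancestors(t))


print(ancestors("Kaposi's sarcoma"))  # ['sarcoma', 'neoplasm', 'disease']
print(descendants("cancer"))          # ["Kaposi's sarcoma", 'sarcoma']
```

Storing only child-to-parent links keeps the table small; the full chain of ancestors or descendants is derived on demand.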

Exploring EFO: an example

Searching AE Archive: simple query

Searching AE Archive: simple query
- Search across all fields:
  - AE accession number, e.g. E-MEXP-568
  - Secondary accession numbers, e.g. GEO series accession GSE5389
  - Experiment name
  - Submitter's experiment description
  - Sample attributes, experimental factors and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression)
  - Publication title, authors and journal name, PubMed ID
- Synonyms for terms are always included in searches, e.g. 'human' and 'Homo sapiens'

AE Archive query output
- Matches to exact terms are highlighted in yellow
- Matches to synonyms are highlighted in green
- Matches to child terms in the EFO are highlighted in pink
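The three highlight colours correspond to three ways a query term can match: exactly, through a synonym, or through a child term in the EFO. A minimal sketch of that query expansion follows. The synonym and child tables are tiny hand-made stand-ins for EFO, and the function names are hypothetical; the real search engine is not described in these slides.

```python
# Toy stand-ins for EFO lookups (assumptions for illustration only).
SYNONYMS = {"human": {"Homo sapiens"}, "Homo sapiens": {"human"}}
CHILDREN = {"cancer": {"sarcoma"}, "sarcoma": {"Kaposi's sarcoma"}}

# Colour scheme as described on the slide.
HIGHLIGHT = {"exact": "yellow", "synonym": "green", "child": "pink"}


def expand(term):
    """Map every term that should match the query to its match type."""
    matches = {term: "exact"}
    for synonym in SYNONYMS.get(term, ()):
        matches.setdefault(synonym, "synonym")
    # Follow child links transitively so grandchildren are found too.
    stack = list(CHILDREN.get(term, ()))
    while stack:
        child = stack.pop()
        if child not in matches:
            matches[child] = "child"
            stack.extend(CHILDREN.get(child, ()))
    return matches


def highlight_colours(query, text):
    """Return the colour for each expanded term found in the text."""
    lowered = text.lower()
    return {
        term: HIGHLIGHT[kind]
        for term, kind in expand(query).items()
        if term.lower() in lowered
    }


print(highlight_colours("human", "Homo sapiens liver"))
print(highlight_colours("cancer", "Kaposi's sarcoma biopsy samples"))
```

A query for "human" therefore finds an experiment annotated only with "Homo sapiens", and a query for "cancer" finds experiments annotated with more specific terms below it.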

RNA-seq data in AE Archive

AE Archive – experiment view

Link to raw data in ENA

AE Archive – experiment view

SDRF file – sample & data relationship

ArrayExpress – two databases

Expression Atlas – when to use it?
- Find out whether the expression of a gene (or of a group of genes with a common gene attribute, e.g. a GO term) changes across all the experiments available in the Expression Atlas.
- Discover which genes are differentially expressed in a particular biological condition that you are interested in.

Expression Atlas construction: experiment selection criteria
The criteria we use for selecting experiments for inclusion in the Atlas are as follows:
- Array designs relating to the experiment must be provided to enable re-annotation using Ensembl or UniProt (or have the potential for this to be done)
- High MIAME scores
- Experiment must have 6 or more hybridizations
- Sufficient replication and large sample size
- Experimental factors (EFs) and experimental factor values (EFVs) must be well annotated
- Adequate sample annotation must be provided
- Processed data must be provided, or raw data which can be renormalized must be available
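Several of these criteria are simple enough to express as an automated pre-check. The sketch below is one hypothetical way to do that; the field names, the accessions and the MIAME score threshold are assumptions (the slide only says "high"), and only the minimum of 6 hybridizations comes from the criteria above.

```python
from dataclasses import dataclass

MIN_HYBRIDIZATIONS = 6  # stated in the selection criteria
MIN_MIAME_SCORE = 4     # placeholder: the slide only says "high"


@dataclass
class Experiment:
    accession: str
    has_array_design: bool
    miame_score: int
    hybridizations: int
    factors_annotated: bool
    has_processed_data: bool
    raw_data_renormalizable: bool


def failed_criteria(exp):
    """Return the names of the selection criteria this experiment fails."""
    checks = {
        "array design": exp.has_array_design,
        "MIAME score": exp.miame_score >= MIN_MIAME_SCORE,
        "hybridizations": exp.hybridizations >= MIN_HYBRIDIZATIONS,
        "factor annotation": exp.factors_annotated,
        "usable data": exp.has_processed_data or exp.raw_data_renormalizable,
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up experiments for illustration.
good = Experiment("E-TEST-1", True, 5, 12, True, False, True)
small = Experiment("E-TEST-2", True, 5, 4, True, True, False)
print(failed_criteria(good))   # []
print(failed_criteria(small))  # ['hybridizations']
```

Criteria such as "sufficient replication" and "adequate sample annotation" need a curator's judgement and are left out of this sketch.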

Expression Atlas construction: analysis pipeline
- New meta-analytical tool for searching gene expression profiles across experiments in AE
- Data is taken as normalized by the submitter
- Gene-wise linear models (limma) and t-statistics are applied to calculate the strength of genes' differential expression across conditions and across experiments
- The result is a two-dimensional matrix in which rows correspond to genes and columns correspond to conditions, rather than samples
- The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression
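To make the shape of the result concrete, the sketch below builds a small gene-by-condition matrix from sample-level values. It is a deliberately simplified stand-in and not the Atlas pipeline: it uses a plain Welch t-statistic (each condition against all other samples) instead of limma's linear models, it stores the signed statistic rather than a p-value, and all gene names, sample names and values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic; each group needs two or more values and the
    pooled standard error must be non-zero."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def gene_condition_matrix(expression, sample_condition):
    """Collapse sample-level values into one signed statistic per
    (gene, condition) pair.

    expression: gene -> {sample: value}
    sample_condition: sample -> condition
    """
    conditions = sorted(set(sample_condition.values()))
    matrix = {}
    for gene, values in expression.items():
        row = {}
        for condition in conditions:
            inside = [v for s, v in values.items() if sample_condition[s] == condition]
            outside = [v for s, v in values.items() if sample_condition[s] != condition]
            # Positive: higher in this condition; negative: lower.
            row[condition] = welch_t(inside, outside)
        matrix[gene] = row
    return matrix


# Made-up data: four samples, two conditions, two genes.
sample_condition = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
expression = {
    "geneA": {"s1": 1.0, "s2": 1.2, "s3": 3.0, "s4": 3.4},
    "geneB": {"s1": 5.0, "s2": 5.2, "s3": 5.1, "s4": 4.9},
}
matrix = gene_condition_matrix(expression, sample_condition)
for gene, row in matrix.items():
    print(gene, {condition: round(t, 2) for condition, t in row.items()})
```

The key point is the change of shape: the input has one column per sample, while the output has one column per condition, which is what allows results from different experiments to be placed side by side.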

Expression Atlas construction

Expression Atlas construction

Expression Atlas

Atlas home page
- Query for genes
- Query for conditions
- Restrict query by direction of differential expression
- The 'advanced query' option allows building more complex queries

Atlas home page: the 'Genes' and 'Conditions' search boxes

Atlas home page: a single gene query

Atlas gene summary page

Atlas experiment page

Atlas experiment page – HTS data

Atlas home page: a 'Conditions' only query

Atlas heatmap view

Atlas gene-condition query

Atlas advanced search

Atlas advanced search

Atlas advanced search

Data submission to AE

Data submission to AE

Submission of HTS gene expression data
Submit via the MAGE-TAB submission route. Submit:
- A MAGE-TAB spreadsheet containing details of the samples and protocols used
- Trace data files for each sample (in SRF, FASTQ or SFF format)
- Processed data files
For non-human species we will supply your SRF or FASTQ files to the European Nucleotide Archive (ENA).
If you have human identifiable sequencing data, you need to submit to the European Genome-phenome Archive (EGA) and not ArrayExpress. They will supply you with a suitable template for submission and store human identifiable data securely.
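The submission rules above amount to two checks a submitter can run before starting: which archive the data should go to, and whether every trace file is in an accepted format. The sketch below illustrates both; the function names and file names are hypothetical, and matching on the file extension alone is an assumption (a compressed file such as a .gz archive would be flagged by this check).

```python
from pathlib import PurePath

# Accepted trace data formats, as listed on the slide.
ACCEPTED_TRACE_SUFFIXES = {".srf", ".fastq", ".sff"}


def submission_route(human_identifiable):
    """Human identifiable sequencing data goes to EGA, everything else to ArrayExpress."""
    return "EGA" if human_identifiable else "ArrayExpress"


def unaccepted_trace_files(filenames):
    """Return the files whose extension is not an accepted trace format."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in ACCEPTED_TRACE_SUFFIXES
    ]


print(submission_route(human_identifiable=False))  # ArrayExpress
print(unaccepted_trace_files(["run1.srf", "run2.FASTQ", "run3.bam"]))  # ['run3.bam']
```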

Types of data that can be submitted

What happens after submission?
- Email confirmation
- Curation: the curation team will review your submission and will email you with any questions.
- Possible reopening for editing
- We will send you an accession number when all the required information has been provided.
- We will load your experiment into ArrayExpress and provide you with a reviewer login for viewing the data before it is made public.

Find out more
- Visit our eLearning portal, Train online, for courses on ArrayExpress and Atlas
- Email us at:
- Atlas mailing list: