ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL


1 ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL gabry@ebi.ac.uk

2 What is functional genomics (FG)?
• The aim of FG is to understand the function of genes and other parts of the genome
• FG experiments typically utilize genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions
• High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome

3 What biological questions is FG addressing?
• When and where are genes expressed?
• How do gene expression levels differ in various cell types and states?
• What are the functional roles of different genes and in what cellular processes do they participate?
• How are genes regulated?
• How do genes and gene products interact?
• How is gene expression changed in various diseases or following a treatment?

4 Components of an FG experiment

5 FG public repositories: ArrayExpress
• Is a public repository for FG data, which provides easy access to well annotated data in a structured and standardized format
• Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ
• Facilitates the sharing of experimental information associated with the data such as microarray designs, experimental protocols, …
• Based on community standards: MIAME guidelines & MAGE-TAB format for microarray, MINSEQE guidelines for HTS data (http://www.mged.org/minseqe/)

6 Community standards for data requirement
• MIAME = Minimal Information About a Microarray Experiment
• MINSEQE = Minimal Information about a high-throughput Nucleotide SEQuencing Experiment
• The checklist (table columns: Requirements, MIAME, MINSEQE):
  1. Experiment design / background description
  2. Sample annotation and experimental factor
  3. Array design annotation (e.g. probe sequence)
  4. All protocols (wet-lab bench and data processing)
  5. Raw data files (from scanner or sequencing machine)
  6. Processed data files (normalised and/or transformed)

7 Standards for microarray & sequencing: MAGE-TAB format
MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray or HTS experiment:
• IDF: Investigation Description Format file. Contains top-level information about the experiment including title, description, submitter contact details and protocols.
• SDRF: Sample and Data Relationship Format file. Contains the relationships between samples and arrays, as well as sample properties and experimental factors, as provided by the data submitter.
• ADF: Array Design Format file. Describes probes on an array, e.g. sequence, genomic mapping location (for array data only).
• Data files: Raw and processed data files. The ‘raw’ data files are the trace data files (.srf or .sff). Fastq format files are also accepted, but SRF format files are preferred. The trace data files that you submit to ArrayExpress will be stored in the European Nucleotide Archive (ENA). The processed data file is a ‘data matrix’ file containing processed values, e.g. files in which the expression values are linked to genome coordinates.
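Because an SDRF is a tab-delimited spreadsheet, the sample-to-factor relationships it records can be read with standard tools. The following is a minimal sketch, not an official parser: the tiny SDRF text, its sample names and its file names are invented for illustration, and the column headers are assumed to follow the usual "Source Name" / "Factor Value[...]" pattern.

```python
import csv
import io

# Invented four-sample SDRF fragment (illustration only, not a real submission).
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[disease]\tArray Data File\n"
    "sample 1\tHomo sapiens\tnormal\tsample1.CEL\n"
    "sample 2\tHomo sapiens\tnormal\tsample2.CEL\n"
    "sample 3\tHomo sapiens\tsarcoma\tsample3.CEL\n"
    "sample 4\tHomo sapiens\tsarcoma\tsample4.CEL\n"
)


def read_sdrf(handle):
    """Read a tab-delimited SDRF into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def samples_by_factor_value(rows):
    """Group sample names by (factor column, factor value)."""
    groups = {}
    for row in rows:
        for column, value in row.items():
            if column.startswith("Factor Value"):
                groups.setdefault((column, value), []).append(row["Source Name"])
    return groups


rows = read_sdrf(io.StringIO(SDRF_TEXT))
groups = samples_by_factor_value(rows)
for (column, value), samples in sorted(groups.items()):
    print(column, value, samples)
```

Real SDRF files often repeat column headers (for example several protocol or comment columns); `csv.DictReader` keeps only the last of any repeated header, so a fuller reader would work on column positions instead.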

8 ArrayExpress – two databases

9 What is the difference between them?
• ArrayExpress Archive. Central object: experiment. Query to retrieve experimental information and associated data.
• Expression Atlas. Central object: gene/condition. Query for gene expression changes across experiments and across platforms.

10 ArrayExpress – two databases

11 ArrayExpress Archive – when to use it?
• Find FG experiments that might be relevant to your research
• Download data and re-analyze it. Often data deposited in public repositories can be used to answer different biological questions from the one asked in the original experiments.
• Submit microarray or HTS data that you want to publish. Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process.

12 How much data in AE Archive? (as of mid-September 2012)

13 HTS data in AE Archive (as of mid-September 2012)
[Charts: microarray vs HTS; RNA-, DNA-, ChIP-seq breakdown]

14 ArrayExpress: www.ebi.ac.uk/arrayexpress/

15 Browsing the AE Archive
• The direct link to raw and processed data. An icon indicates that this type of data is available.
• The total number of experiments and assays retrieved
• Species investigated
• Curated title of experiment
• The date when the data were loaded in the Archive
• AE unique experiment ID
• Number of assays
• The list of experiments retrieved can be printed, saved in tab-delimited format, exported to Excel, or exported as an RSS feed
• ‘Loaded in Atlas’ flag
• Raw sequencing data available in ENA

16 Browsing the AE Archive

17 Searching AE with the Experimental Factor Ontology (EFO)
• Application-focused ontology modeling the relationship between experimental factors (EFs) in AE
• Developed to:
  - increase the richness of annotations that are currently made in AE Archive
  - promote consistency
  - facilitate automatic annotation and integrate external data
• EFs are transformed into an ontological representation, forming classes and relationships between those classes
• Combines terms from a subset of well-maintained and compatible ontologies, e.g. Gene Ontology, NCBI Taxonomy

18 Building EFO: an example
1. Take all experimental factors: sarcoma, cancer, neoplasm, disease, Kaposi’s sarcoma
2. Find the logical connection between them:
   • disease is the parent term
   • neoplasm is a type of disease
   • cancer is synonym of neoplasm
   • sarcoma is a type of cancer
   • Kaposi’s sarcoma is a type of sarcoma
3. Organize them in an ontology:
   disease
     neoplasm (synonym: cancer)
       sarcoma
         Kaposi’s sarcoma
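The five terms in this example can be held in a very small data structure to make the parent/child and synonym relationships concrete. This is only a sketch of the idea using plain dictionaries; it is not how EFO itself is stored or queried, and the function names are invented for illustration.

```python
# Child -> parent links taken from the slide's example.
# "sarcoma is a type of cancer" is stored against "neoplasm",
# because "cancer" is treated as a synonym of "neoplasm".
PARENT = {
    "neoplasm": "disease",
    "sarcoma": "neoplasm",
    "Kaposi's sarcoma": "sarcoma",
}
SYNONYMS = {"cancer": "neoplasm"}


def canonical(term):
    """Map a synonym to its preferred term; other terms are returned unchanged."""
    return SYNONYMS.get(term, term)


def ancestors(term):
    """Walk upwards from a term to the root, nearest parent first."""
    chain = []
    current = canonical(term)
    while current in PARENT:
        current = PARENT[current]
        chain.append(current)
    return chain


def descendants(term):
    """Collect every term below the given term in the hierarchy."""
    found = []
    frontier = [canonical(term)]
    while frontier:
        node = frontier.pop()
        for child, parent in PARENT.items():
            if parent == node:
                found.append(child)
                frontier.append(child)
    return found


print(ancestors("Kaposi's sarcoma"))
print(descendants("cancer"))
```

Storing only child-to-parent links keeps the example short; a search for a broad term can then be widened to everything returned by `descendants`.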

19 Exploring EFO: an example
More information at: http://www.ebi.ac.uk/efo

20 Searching AE Archive: simple query

21 AE Archive query output
• Matches to exact terms are highlighted in yellow
• Matches to synonyms are highlighted in green
• Matches to child terms in the EFO are highlighted in pink
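The three kinds of match can be told apart with a simple lookup against the ontology. The sketch below is a toy classifier under stated assumptions: the synonym and child tables are hand-written from the sarcoma example, the function names are hypothetical, and the real search engine behind the Archive is not shown in the slides.

```python
# Hand-written toy ontology (assumption: taken from the sarcoma example only).
SYNONYMS = {"neoplasm": {"cancer"}}
CHILDREN = {
    "disease": {"neoplasm"},
    "neoplasm": {"sarcoma"},
    "sarcoma": {"Kaposi's sarcoma"},
}
# Highlight colours as described on the slide.
HIGHLIGHT = {"exact": "yellow", "synonym": "green", "child": "pink"}


def child_terms(term):
    """Return all terms below `term`, at any depth."""
    result = set()
    for child in CHILDREN.get(term, set()):
        result.add(child)
        result |= child_terms(child)
    return result


def match_type(query, annotation):
    """Classify how an annotation matches a query term, or return None."""
    if annotation == query:
        return "exact"
    if annotation in SYNONYMS.get(query, set()):
        return "synonym"
    if annotation in child_terms(query):
        return "child"
    return None


for annotation in ["neoplasm", "cancer", "Kaposi's sarcoma", "liver"]:
    kind = match_type("neoplasm", annotation)
    print(annotation, kind, HIGHLIGHT.get(kind))
```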

22 AE Archive – experiment view

23 SDRF file – sample & data relationship

24 ArrayExpress – two databases

25 Expression Atlas – when to use it?
• Find out if the expression of a gene (or a group of genes with a common gene attribute, e.g. GO term) change(s) across all the experiments available in the Expression Atlas
• Discover which genes are differentially expressed in a particular biological condition that you are interested in

26 Expression Atlas construction: experiment selection criteria during curation
• Array (platform) designs relating to the experiment must be provided. Probe annotation must be adequate to enable re-annotation of external references (e.g. Ensembl gene ID, Uniprot ID)
• At least 3 replicates for each value of the experimental factor
• Maximum 4 experimental factors
• Adequate sample annotation using EFO terms
• Presence of data files: CEL raw data files for Affymetrix assays, processed data files for non-Affymetrix ones
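Two of these criteria, the replicate count and the limit on experimental factors, are simple enough to check mechanically. The sketch below assumes each assay is described by a dictionary of factor name to factor value; that representation, the function name and the example experiment are invented for illustration and say nothing about how the curators actually apply the rules.

```python
from collections import Counter

MIN_REPLICATES = 3  # "At least 3 replicates for each value of the experimental factor"
MAX_FACTORS = 4     # "Maximum 4 experimental factors"


def check_experiment(assays):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    factors = sorted({name for assay in assays for name in assay})
    if len(factors) > MAX_FACTORS:
        problems.append(f"{len(factors)} experimental factors (maximum {MAX_FACTORS})")
    for factor in factors:
        counts = Counter(assay[factor] for assay in assays if factor in assay)
        for value, n in sorted(counts.items()):
            if n < MIN_REPLICATES:
                problems.append(f"{factor}={value}: only {n} replicate(s)")
    return problems


# Invented experiment: three untreated assays but only two treated ones.
assays = [
    {"compound": "none"},
    {"compound": "none"},
    {"compound": "none"},
    {"compound": "drug"},
    {"compound": "drug"},
]
problems = check_experiment(assays)
print(problems)
```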

27 Expression Atlas construction: analysis pipeline
• Input data: Affy CEL files or non-Affy processed data (genes × Cond. 1, Cond. 2, Cond. 3)
• Linear model* (Bioconductor limma)
• Output: 2-D matrix (genes × conditions), where 1 = differentially expressed and 0 = not differentially expressed
* More information about the statistical methodology: http://nar.oxfordjournals.org/content/38/suppl_1/D690.full

28 Expression Atlas construction: analysis pipeline
“Is gene X differentially expressed in condition 1 in this experiment?”
• Per-condition means for gene X: Cond. 1 mean, Cond. 2 mean, Cond. 3 mean
• Mean of all samples = a single expression value for gene X
• Compare the condition mean with the mean of all samples and calculate a statistic
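The comparison of a condition mean against the mean of all samples can be illustrated with an ordinary pooled-variance t-like statistic. This is a deliberately simplified stand-in and not the limma model used in the Atlas pipeline: the expression values are invented, and the cutoff used to turn the statistic into a 1/0 call is an arbitrary assumption (a real analysis would use p-values adjusted for multiple testing).

```python
from math import sqrt
from statistics import mean

CUTOFF = 3.0  # arbitrary illustrative threshold on |t|, not taken from the slides


def condition_statistics(samples_by_condition):
    """For one gene, compare each condition mean with the mean of all samples."""
    all_values = [v for values in samples_by_condition.values() for v in values]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    # Pooled within-condition variance.
    ss_within = 0.0
    for values in samples_by_condition.values():
        m = mean(values)
        ss_within += sum((v - m) ** 2 for v in values)
    pooled_var = ss_within / (n_total - len(samples_by_condition))
    stats = {}
    for condition, values in samples_by_condition.items():
        n = len(values)
        # Variance of (condition mean - grand mean), given that the grand
        # mean includes the condition's own samples.
        standard_error = sqrt(pooled_var * (1 / n - 1 / n_total))
        stats[condition] = (mean(values) - grand_mean) / standard_error
    return stats


# Invented log-scale expression values for gene X, three replicates per condition.
gene_x = {
    "cond1": [7.9, 8.0, 8.1],
    "cond2": [5.9, 6.0, 6.1],
    "cond3": [3.9, 4.0, 4.1],
}
stats = condition_statistics(gene_x)
calls = {condition: int(abs(t) > CUTOFF) for condition, t in stats.items()}
print(calls)
```

Here cond2 sits at the overall mean, so its call is 0, while cond1 and cond3 are called 1 (above and below the overall mean respectively). Repeating this for every gene gives one row per gene of the 2-D matrix.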

29 Expression Atlas construction
[Diagram: Exp. 1 (genes × Cond. 1, Cond. 2, Cond. 3), Exp. 2 (genes × Cond. 4, Cond. 5, Cond. 6), …, Exp. n (genes × Cond. X, Cond. Y, Cond. Z); each experiment goes through its own statistical test]
Each experiment has its own “verdict” or “vote” on whether a gene is differentially expressed or not under a certain condition

30 Summary of the “verdicts” from different experiments
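One way to picture the summary is as a tally of votes per gene and condition. The sketch below assumes each verdict is recorded as "up", "down" or "none"; the experiment labels, gene name, condition names and the tuple layout are all invented for illustration and are not the Atlas's internal representation.

```python
from collections import Counter, defaultdict

# Invented verdicts: (experiment, gene, condition, direction).
verdicts = [
    ("exp1", "geneX", "condA", "up"),
    ("exp2", "geneX", "condA", "up"),
    ("exp3", "geneX", "condA", "down"),
    ("exp4", "geneX", "condA", "none"),
    ("exp1", "geneX", "condB", "none"),
]


def summarise(verdicts):
    """Count up/down votes per (gene, condition) across experiments."""
    summary = defaultdict(Counter)
    for _experiment, gene, condition, direction in verdicts:
        if direction in ("up", "down"):
            summary[(gene, condition)][direction] += 1
    return summary


summary = summarise(verdicts)
for (gene, condition), votes in sorted(summary.items()):
    print(gene, condition, "up:", votes["up"], "down:", votes["down"])
```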

31 Expression Atlas

32 Atlas home page: http://www.ebi.ac.uk/gxa/
• Query for genes
• Query for conditions
• Restrict query by direction of differential expression
• The ‘advanced query’ option allows building more complex queries

33 Atlas home page: the ‘Genes’ and ‘Conditions’ search boxes

34 Atlas home page: a single gene query

35 Atlas gene summary page

36 Atlas experiment page

37 Atlas experiment page – HTS data

38 Atlas home page: a ‘Conditions’ only query

39 Atlas heatmap view

40 Atlas gene-condition query

41 Atlas advanced search

42 Atlas advanced search

43 Atlas advanced search

44 A glimpse of what’s coming… “Differential atlas”
“Is gene X differentially expressed in condition 1 in this experiment?”
• Per-condition means for gene X: Cond. 1 mean, Cond. 2 mean, Cond. 3 mean
• Mean of all samples = a single expression value for gene X
• Compare and calculate statistic

45 A glimpse of what’s coming… “Differential atlas” mock-up (1)

46 A glimpse of what’s coming… “Differential atlas” mock-up (2)

47 A glimpse of what’s coming… “Baseline atlas”
• Gene expression in normal tissues, not looking for differentially expressed genes based on different conditions
• E.g. “Give me all the genes expressed in normal human kidney”
• Can also filter genes by expression level (e.g. FPKM values)
• Start with Illumina Body Map 2.0 RNA-seq data
• 16 tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells
• We are working on something similar for mouse
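A baseline query of the kind "all the genes expressed in normal human kidney" amounts to filtering a gene-by-tissue table on an expression cutoff. The sketch below is illustrative only: the gene names are placeholders, the FPKM values are made up, and the default cutoff of 1.0 is an assumption rather than a threshold stated in the slides.

```python
# Made-up FPKM values for illustration; gene names are placeholders.
FPKM = {
    "geneA": {"kidney": 25.4, "liver": 0.2},
    "geneB": {"kidney": 0.4, "liver": 80.1},
    "geneC": {"kidney": 3.1, "liver": 2.9},
}


def genes_expressed_in(fpkm, tissue, min_fpkm=1.0):
    """Return the genes whose FPKM in `tissue` is at least `min_fpkm`, sorted by name."""
    return sorted(
        gene
        for gene, by_tissue in fpkm.items()
        if by_tissue.get(tissue, 0.0) >= min_fpkm
    )


kidney_genes = genes_expressed_in(FPKM, "kidney")
highly_expressed = genes_expressed_in(FPKM, "kidney", min_fpkm=10.0)
print(kidney_genes)
print(highly_expressed)
```

Raising `min_fpkm` corresponds to the "filter genes by expression level" option mentioned on the slide.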

48 A glimpse of what’s coming… “Baseline atlas” mock-up display

49 Data submission to AE

50 Data submission to AE: www.ebi.ac.uk/microarray/submissions.html

51 Submission of HTS data
• ArrayExpress acts as a “broker” for the submitter.
• Meta-data and processed data: ArrayExpress
• Raw sequence reads* (e.g. fastq, bam): ENA
* See http://www.ebi.ac.uk/ena/about/sra_data_format for accepted read file formats

52 Find out more
• Visit our eLearning portal, Train online, at http://www.ebi.ac.uk/training/online/ for courses on ArrayExpress and Atlas
• Watch this short YouTube video on how to navigate the MAGE-TAB submission tool: http://youtu.be/KVpCVGpjw2Y
• Email us at: miamexpress@ebi.ac.uk
• Atlas mailing list: arrayexpress-atlas@ebi.ac.uk

