Tutorial 8: Gene expression analysis. How to interpret an expression matrix; expression data DBs (GEO); clustering (hierarchical clustering, K-means clustering).

1 Tutorial 8: Gene expression analysis

2 How to interpret an expression matrix
Expression data DBs - GEO
Clustering
–Hierarchical clustering
–K-means clustering
–Tools for clustering - EPCLUST
Functional analysis
–GO annotation
–DAVID

3 Gene expression data sources: microarrays and RNA-seq experiments

4 How to interpret an expression data matrix
Each column represents the expression levels of all genes in a single sample. Each row represents the expression of one gene across all experiments.

          Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
Gene 1      -1.2      -2.1      -3        -1.5       1.8       2.9
Gene 2       2.7       0.2      -1.1       1.6      -2.2      -1.7
Gene 3      -2.5       1.5      -0.1      -1.1       0.1
Gene 4       2.9       2.6       2.5      -2.3      -0.1      -2.3
Gene 5       0.1       1.9       2.6       2.2       2.7      -2.1
Gene 6      -2.9      -1.9      -2.4      -0.1      -1.9       2.9

(Only five values survive for Gene 3 in the transcript; which sample is missing is unknown.)
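As a minimal sketch of the row/column reading described on this slide, the snippet below stores three of the complete rows from the example table (as read from the flattened transcript) and looks up one gene profile (a row) and one sample profile (a column). The helper names `gene_profile` and `sample_profile` are made up for illustration.

```python
# Toy expression matrix: rows are genes, columns are samples.
# Values are three complete rows read from the slide's example table.
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4", "Sample 5", "Sample 6"]
expression = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
}


def gene_profile(gene):
    """Return one row: the expression of a single gene across all samples."""
    return dict(zip(samples, expression[gene]))


def sample_profile(sample):
    """Return one column: the expression of every gene in a single sample."""
    column = samples.index(sample)
    return {gene: values[column] for gene, values in expression.items()}


print(gene_profile("Gene 1"))
print(sample_profile("Sample 3"))
```

Reading along a row shows how one gene behaves across experiments; reading down a column shows the state of all genes in one sample.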

5 Raw data pre-processing
Raw data are the values we get directly from the microarray or sequencer; "raw values" is a general term for the measurements made by an instrument. In microarrays the raw data are probe intensities. In sequencing the raw data are the sequenced reads, which are later summarized as counts per gene. Raw data almost always need some processing before they are of adequate quality and carry biological meaning.
–For example, high-throughput sequencing reads need to be mapped to the genome and possibly filtered before downstream analysis such as variant calling.

6 Expression profile DBs
–GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/
–Human genome browser http://genome.ucsc.edu/
–ArrayExpress http://www.ebi.ac.uk/arrayexpress/

7 The current rate of submission and processing is over 10,000 samples per month. In 2002 the Nature journals announced a requirement to deposit microarray data in public databases.

8 Searching for expression profiles in GEO http://www.ncbi.nlm.nih.gov/geo/

9 GEO accession IDs
–GPL**** - platform ID
–GSM**** - sample ID
–GSE**** - series ID
–GDS**** - dataset ID
A Series record defines a set of related samples considered to be part of a group. A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
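The prefix convention above can be sketched as a small lookup. This is an illustration only: the function name `geo_record_type` is made up, the example accessions are placeholders rather than real records, and the pattern assumes an accession is one of the four prefixes followed by digits only.

```python
import re

# Prefix -> record type, as listed on the slide.
GEO_RECORD_TYPES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}

# Assumption: an accession is a three-letter prefix followed by digits only.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def geo_record_type(accession):
    """Return the record type implied by the prefix of a GEO accession ID."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return GEO_RECORD_TYPES[match.group(1)]


# Placeholder accessions, used only to exercise the function.
for accession in ["GPL0001", "GSM0002", "GSE0003", "GDS0004"]:
    print(accession, "->", geo_record_type(accession))
```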

10 [Screenshot of a GEO dataset page; highlighted options: Download dataset, Clustering, Statistical analysis]

11 Raw data (SOFT file)
[Screenshot of the file; labeled parts: probes, genes, expression values per sample (GSM), gene annotations]
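To make the layout of such a file concrete, here is a rough sketch of reading the data table from a GDS SOFT file. It assumes the table sits between `!dataset_table_begin` and `!dataset_table_end` lines, is tab-separated, starts with a probe column and a gene column followed by one column per GSM sample, and marks missing values as `null`; check these assumptions against the actual file you download. The dataset ID, probe names, gene names and values below are invented.

```python
def parse_soft_table(text):
    """Parse the data table of a SOFT-formatted dataset (layout assumed, see above).

    Returns (samples, rows), where rows maps probe ID -> (gene, list of values).
    """
    rows = {}
    header = None
    in_table = False
    for line in text.splitlines():
        if line.startswith("!dataset_table_begin"):
            in_table = True
            continue
        if line.startswith("!dataset_table_end"):
            break
        if not in_table:
            continue  # skip metadata lines before the table
        fields = line.split("\t")
        if header is None:
            header = fields  # first table line holds the column names
            continue
        probe, gene = fields[0], fields[1]
        values = [None if v in ("", "null") else float(v) for v in fields[2:]]
        rows[probe] = (gene, values)
    samples = header[2:] if header else []
    return samples, rows


# Invented toy file content, for illustration only.
toy_soft = "\n".join([
    "^DATASET = GDS0000",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM0001\tGSM0002",
    "probe_a\tGENE_A\t1.5\t-0.5",
    "probe_b\tGENE_B\tnull\t2.0",
    "!dataset_table_end",
])

samples, rows = parse_soft_table(toy_soft)
print(samples)
print(rows)
```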

12 Clustering analysis: zoom in

13 Clustering analysis – zoom in

14 Clustering analysis – zoom in

16 Viewing the expression levels

17 Viewing the expression levels

19 Clustering: grouping together genes with a similar signature

20 Hierarchical Clustering
This clustering method is based on distances between the expression profiles of different genes. Genes with similar expression patterns are grouped together.

21 In both phylogenetic trees and clustering we build a tree from a distance matrix. When computing phylogenetic trees, we compute distances between sequences (e.g. ATCTGTCCGCTCG vs. ATGTGTGCGCTTG). When computing clustering dendrograms, we compute distances between expression values (e.g. Gene 1 vs. Gene 2 across Expr. 1 to Expr. 6). Rings a bell?
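A distance between two expression profiles can be sketched as follows. The slides do not say which metric is used, so two common choices are shown as assumptions: Euclidean distance and a correlation-based distance (1 minus the Pearson correlation). The profiles are invented toy values.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite trends."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def distance_matrix(profiles, distance):
    """Pairwise distances between all named profiles, as a nested dict."""
    return {
        g1: {g2: distance(p1, p2) for g2, p2 in profiles.items()}
        for g1, p1 in profiles.items()
    }


# Invented toy profiles: gene_2 follows the same trend as gene_1 at twice
# the amplitude, gene_3 follows the opposite trend.
profiles = {
    "gene_1": [1.0, 2.0, 3.0],
    "gene_2": [2.0, 4.0, 6.0],
    "gene_3": [3.0, 2.0, 1.0],
}

matrix = distance_matrix(profiles, pearson_distance)
print(matrix["gene_1"])
```

The choice matters: under the correlation-based distance gene_1 and gene_2 are as close as possible because they share a trend, whereas the Euclidean distance also penalizes the difference in amplitude.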

22 Hierarchical clustering methods produce a tree, or dendrogram. They avoid specifying in advance how many clusters are appropriate. Partitions are obtained by cutting the tree at different levels (e.g. 2 clusters, 4 clusters, 6 clusters).
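The idea of building the tree bottom-up and then choosing a partition can be sketched in plain Python. This is a toy agglomerative procedure under stated assumptions: Euclidean distance and average linkage (the slides do not name either), and instead of storing the full tree it simply stops merging once `k` clusters remain, which plays the role of cutting the tree at the level with `k` branches. Gene names and values are invented.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clusters(profiles, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of genes")
    clusters = [[name] for name in sorted(profiles)]  # start: one gene per cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Invented toy profiles: two tight pairs and one outlier.
toy_profiles = {
    "gene_a": [1.0, 1.1, 0.9],
    "gene_b": [1.2, 1.0, 1.0],
    "gene_c": [-1.0, -1.1, -0.9],
    "gene_d": [-1.2, -0.9, -1.0],
    "gene_e": [5.0, 5.0, 5.0],
}

print(hierarchical_clusters(toy_profiles, 3))
print(hierarchical_clusters(toy_profiles, 2))
```

Asking for fewer clusters merges groups that were separate at a finer cut; the members of a cluster never split apart again as `k` decreases.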

23 The more clusters you want, the higher the similarity within each cluster.
http://discoveryexhibition.org/pmwiki.php/Entries/Seo2009

24 Hierarchical clustering results
You can cluster both samples and genes (separately).
http://www.spandidos-publications.com/10.3892/ijo.2012.1644

25 Unsupervised Clustering – K-means clustering
An algorithm to classify the data into K groups (K=4 in the figure).

26 How does it work?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a partition into K clusters in which each gene is closest to the center of its own cluster; it depends on the initial means and is not guaranteed to be the best possible partition.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
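The four steps can be sketched in plain Python as below. This is a minimal illustration, not a tuned implementation: it assumes Euclidean distance, treats "convergence" as "the assignments stopped changing", and takes an optional `initial_centers` argument (an addition for reproducibility; when it is omitted the means are drawn at random from the data, as in step 1). The data points are invented.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))


def k_means(points, k, initial_centers=None, seed=None, max_iter=100):
    # Step 1: pick k initial means, at random unless they are given explicitly.
    if initial_centers is None:
        initial_centers = random.Random(seed).sample(points, k)
    centers = [list(center) for center in initial_centers]
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [nearest(point, centers) for point in points]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Invented toy profiles (two samples per gene): two well-separated groups.
toy_points = [
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],
]

labels, centers = k_means(toy_points, k=2,
                          initial_centers=[toy_points[0], toy_points[3]])
print(labels)
print(centers)
```

Because the outcome depends on the starting means, it is common to run the algorithm from several random starts and compare the resulting partitions.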

27 How should we determine K?
–Trial and error
–Take K as the square root of the number of genes

28 Tool for clustering - EPclust
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

30 [EPclust screenshot; highlighted options: choose distance metric, choose algorithm]

31 Hierarchical clustering

32 Zoom in by clicking on the nodes

34 K-means clustering

35 [Screenshot labels: graphical representation of the cluster; samples found in cluster]

36 10 clusters, as requested

37 Now what?
Now that we have clusters, we want to know the function of each group. This requires some kind of generalization of gene functions.

38 Gene Ontology (GO) http://www.geneontology.org/
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
–Biological process
–Cellular component
–Molecular function

39 Gene Ontology (GO)
–Cellular Component (CC) - the parts of a cell or its extracellular environment.
–Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
–Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

40 The GO tree – a partial example

42 DAVID Functional Annotation Bioinformatics Microarray Analysis http://david.abcc.ncifcrf.gov/
–Identify enriched biological themes, particularly GO terms
–Discover enriched functional-related gene/protein groups

43 ID conversion annotation

44 Functional annotation - upload
–Gene list you want to explore (for example, all the genes in a certain cluster)
–What is the identifier? (probes / gene names / gene IDs)
–You can supply a background list as well

45 Functional annotation - results
Different kinds of enrichments are calculated.

46 Functional annotation - results
[Screenshot labels: charts for each category; genes from your list involved in this category]

47 [Screenshot labels]
–Minimum number of genes for the corresponding term
–Maximum EASE score / E-value
–Source of term
–Enriched terms associated with your genes
–Genes from your list involved in this category
–P-Value
–Adjusted P-Value
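To give a feel for what an enrichment P-value measures, the sketch below computes a plain one-sided hypergeometric test: the probability of seeing at least the observed number of annotated genes in the list if the list were drawn at random from the background. This is an illustration under that assumption only; it is not DAVID's own calculation (its EASE score is a modified, more conservative variant), and the function name and numbers are invented.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """P(at least list_hits annotated genes in a random list of list_size genes).

    background_hits   genes in the background annotated with the term
    background_size   total genes in the background
    """
    upper = min(list_size, background_hits)
    total = comb(background_size, list_size)  # all possible gene lists
    tail = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return tail / total


# Invented toy numbers: 10 background genes, 4 carry the term,
# and 2 of the 3 genes in our list carry it.
p = enrichment_p_value(list_hits=2, list_size=3,
                       background_hits=4, background_size=10)
print(p)
```

A small P-value means the term appears in the list more often than chance alone would explain; when many terms are tested at once, an adjusted P-value corrects for the multiple comparisons.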

48 Gene expression analysis
How to interpret an expression matrix
Expression data DBs - GEO
Clustering
–Hierarchical clustering
–K-means clustering
–Tools for clustering - EPCLUST
Functional analysis
–GO annotation
–DAVID
