Gene expression analysis

Presentation on theme: "Gene expression analysis"— Presentation transcript:

1 Gene expression analysis
Tutorial 7 Gene expression analysis

2 Gene expression analysis
How to interpret an expression matrix
Expression data DBs - GEO
General clustering methods
Unsupervised Clustering
Hierarchical clustering
K-means clustering
Tools for clustering - EPCLUST
Functional analysis - GO annotation

3 Gene expression data sources
Microarrays
RNA-seq experiments

4 How to interpret an expression data matrix
Columns: Exp1/Sample 1, Exp2/Sample 2, Exp3/Sample 3, Exp4/Sample 4, Exp5/Sample 5, Exp6/Sample 6
Gene 1: -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2: 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3: -2.5 1.5 -0.1 -1 0.1
Gene 4: 2.6 2.5 -2.3
Gene 5: 2.2
Gene 6: -2.9 -1.9 -2.4
Each column represents all the gene expression levels from:
In two-color array: a single experiment.
In one-color array: a single sample.
Each row represents the expression of a gene across all experiments.
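The row/column reading of the matrix can be sketched in code. This is only an illustration: it uses the two complete rows from the slide (Gene 1 and Gene 2), and the helper names `gene_profile` and `experiment_column` are assumptions made up for this example.

```python
# Rows are genes, columns are experiments/samples.
# Values are the two complete rows shown on the slide.
COLUMNS = ["Exp1", "Exp2", "Exp3", "Exp4", "Exp5", "Exp6"]
MATRIX = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
}


def gene_profile(matrix, gene):
    """One row: the expression of a single gene across all experiments."""
    return dict(zip(COLUMNS, matrix[gene]))


def experiment_column(matrix, experiment):
    """One column: the level of every gene in a single experiment/sample."""
    j = COLUMNS.index(experiment)
    return {gene: values[j] for gene, values in matrix.items()}


if __name__ == "__main__":
    print(gene_profile(MATRIX, "Gene 2"))
    print(experiment_column(MATRIX, "Exp5"))
```

Reading a row answers "how does this gene behave across conditions?", while reading a column answers "what does this sample look like across genes?".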

5 How to interpret an expression data matrix
(Expression matrix as on slide 4.)
Each element is a log-transformed value:
In two-color array: the log ratio log2(T/R).
T - the gene expression level in the testing sample
R - the gene expression level in the reference sample
In one-color array: log2(X).
X - the gene expression level in the current sample
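A minimal sketch of the two transformations, assuming raw intensities are positive numbers. The function names are made up for illustration.

```python
import math


def log_ratio(test, reference):
    """Two-color array: log2(T/R) for test intensity T and reference intensity R."""
    if test <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test / reference)


def log_intensity(x):
    """One-color array: log2(X) for the intensity X in the current sample."""
    if x <= 0:
        raise ValueError("intensity must be positive")
    return math.log2(x)


if __name__ == "__main__":
    print(log_ratio(400, 100))   # T is 4 times R
    print(log_ratio(100, 400))   # T is a quarter of R
    print(log_ratio(50, 50))     # T equals R
```

The log scale makes up- and down-regulation symmetric: a fourfold increase and a fourfold decrease have the same magnitude with opposite signs, and T = R gives zero.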

6 How to interpret an expression data matrix
In two-color array (scale):
Red indicates a positive log ratio: T>R
Black indicates a log ratio of zero: T=~R
Green indicates a negative log ratio: T<R
In one-color array (scale):
Bright green indicates a high expression value
Black indicates no expression
[Heat maps: Gene 1 to Gene 6 (rows) against Expr.1 to Expr.6 for the two-color array and Samp 1 to Samp 6 for the one-color array (columns)]

7 Different representations
Microarray data: different representations
[Figure: log ratio plotted against experiment (Exp), with the T>R and T<R regions marked]

8 How to analyze gene expression data

9 Expression profiles DBs
GEO (Gene Expression Omnibus)
Human genome browser
ArrayExpress

10 The current rate of submission and processing is over 10,000 Samples per month.
In 2002, Nature journals announced a requirement that microarray data be deposited in public databases.

11 Searching for expression profiles in the GEO
*Further curated = statistically comparable datasets

12 GEO accession IDs
GPL**** - platform ID
GSM**** - sample ID
GSE**** - series ID
GDS**** - dataset ID
A Series record defines a set of related Samples considered to be part of a group. A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
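As a small illustration, the record type can be read off the accession prefix. This sketch assumes an accession is one of the four prefixes followed by digits only; the function name and the example IDs are hypothetical.

```python
import re

# Prefix-to-record-type mapping, as listed on the slide.
ACCESSION_TYPES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}

# Assumption: prefix followed by one or more digits.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def accession_type(accession):
    """Return the GEO record type for an accession, or None if unrecognized."""
    match = _ACCESSION.match(accession.strip().upper())
    return ACCESSION_TYPES[match.group(1)] if match else None


if __name__ == "__main__":
    for acc in ["GSE1234", "gds507", "XYZ1"]:  # made-up example IDs
        print(acc, "->", accession_type(acc))
```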

13 Clustering, statistical analysis, download dataset

14 Clustering analysis

15 Clustering analysis – zoom in

16 Clustering analysis – zoom in

17

18 Viewing the expression levels

19 Viewing the expression levels

20

21 Clustering
Grouping together “similar” genes

22 Clustering
Unsupervised learning: The classes are unknown a priori and need to be “discovered” from the data.
Supervised learning: The classes are predefined and the task is to understand the basis for the classification from a set of labeled objects. This information is then used to classify future observations.

23 Unsupervised Clustering
Hierarchical methods - These methods provide a hierarchy of clusters, from the smallest set, where all objects are in one cluster, through to the largest set, where each observation is in its own cluster.
Partitioning methods - These usually require the number of clusters to be specified in advance. A mechanism for apportioning objects to clusters must then be determined.

24 Hierarchical Clustering
This clustering method is based on distances between expression profiles of different genes. Genes with similar expression patterns are grouped together.

25 Rings a bell?
In both phylogenetic trees and clustering we create a tree based on a distance matrix. When computing phylogenetic trees, we compute distances between sequences. When computing clustering dendrograms, we compute distances between expression profiles.
[Figure: a score computed between the expression profiles of Gene 1 and Gene 2 (Expr.1 to Expr.6), alongside a score computed between the sequences ATCTGTCCGCTCG and ATGTGTGCGCTTG]

26 How to determine the similarity between two genes?
Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23 (2005)
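Two commonly used measures are sketched below: Euclidean distance and a Pearson-correlation-based distance. The slide does not say which measure the tutorial uses, so this choice is an assumption made for illustration; the profiles are Gene 1 and Gene 2 from the matrix on slide 4.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation between two profiles of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """1 - r: 0 for identical shapes, 2 for perfectly opposite shapes."""
    return 1.0 - pearson_correlation(a, b)


if __name__ == "__main__":
    gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
    gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]
    print(euclidean_distance(gene1, gene2))
    print(correlation_distance(gene1, gene2))
```

Euclidean distance is sensitive to the magnitude of expression, whereas the correlation-based distance compares only the shape of the profile, so the two can group genes differently.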

27 Hierarchical clustering methods produce a tree or a dendrogram.
They avoid specifying how many clusters are appropriate by providing a partition for each K. The partitions are obtained by cutting the tree at different levels.
[Figure: the same tree cut into 2 clusters, 4 clusters, and 6 clusters]
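The idea of "one partition for each K" can be sketched with a small bottom-up (agglomerative) procedure. This is a minimal illustration, not the method of any particular tool: it assumes Euclidean distance and average linkage, and the six two-dimensional profiles are made up. Recording the clusters after every merge gives the same partitions as cutting the finished tree at successive levels.

```python
import math
from itertools import combinations


def average_linkage(c1, c2, profiles):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(math.dist(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles):
    """Merge the two closest clusters repeatedly.

    Returns a dict mapping K to the partition with K clusters,
    for every K from len(profiles) down to 1.
    """
    clusters = [[i] for i in range(len(profiles))]
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        partitions[len(clusters)] = [sorted(c) for c in clusters]
    return partitions


if __name__ == "__main__":
    # Hypothetical profiles forming three tight pairs.
    profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1],
                [5.1, 5.0], [20.0, 20.1], [20.1, 20.0]]
    partitions = agglomerate(profiles)
    for k in (2, 3, 6):
        print(k, "clusters:", partitions[k])
```

This brute-force version recomputes all linkages at every step, which is fine for a handful of genes but far too slow for a real expression matrix.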

28 The more clusters you ask for, the higher the similarity within each cluster.

29 Hierarchical clustering results

30 Unsupervised Clustering – K-means clustering
An algorithm that partitions the data into K groups. In the example, K=4.

31 How does it work?
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a partition into K clusters in which each gene lies closest to the center of its own cluster; the outcome can depend on the initial means.
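The four steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance, a fixed random seed for reproducibility, and a cap on the number of iterations. The function name, parameters, and example points are all made up.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (assignment, means) for a basic K-means run."""
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means


if __name__ == "__main__":
    # Hypothetical two-dimensional profiles forming two groups.
    points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    labels, centers = kmeans(points, k=2)
    print(labels)
    print(centers)
```

Because the starting means are random, it is common to run the algorithm several times with different seeds and compare the resulting partitions.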

32 How should we determine K?
Trial and error
Take K as the square root of the number of genes
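The square-root rule of thumb is a one-liner. Rounding to the nearest integer and enforcing a minimum of one cluster are assumptions added here, since the slide does not say how to handle non-square gene counts.

```python
import math


def rule_of_thumb_k(n_genes):
    """Starting guess for K: the square root of the number of genes."""
    return max(1, round(math.sqrt(n_genes)))


if __name__ == "__main__":
    for n in (100, 2500, 20000):
        print(n, "genes ->", rule_of_thumb_k(n), "clusters")
```

The value is only a starting point for the trial-and-error approach, not a substitute for it.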

33 Tool for clustering - EPclust

34

35 Choose distance metric
Choose algorithm

36 Hierarchical clustering

37 Zoom in by clicking on the nodes

38

39 K-means clustering

40 Samples found in cluster
Graphical representation of the cluster

41 10 clusters, as requested

42 Now what?
Now that we have clusters, we want to know the function of each group. This requires a generalized, shared way of describing gene functions.

43 Gene Ontology (GO)
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

44 Gene Ontology (GO)
Cellular Component (CC) - the parts of a cell or its extracellular environment.
Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

45 The GO tree

46 GO sources
ISS - Inferred from Sequence/Structural Similarity
IDA - Inferred from Direct Assay
IPI - Inferred from Physical Interaction
TAS - Traceable Author Statement
NAS - Non-traceable Author Statement
IMP - Inferred from Mutant Phenotype
IGI - Inferred from Genetic Interaction
IEP - Inferred from Expression Pattern
IC - Inferred by Curator
ND - No Data available
IEA - Inferred from Electronic Annotation
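Evidence codes are often used to filter annotations, for example to set aside electronically inferred ones (IEA), which are not manually reviewed. The sketch below is an illustration only: the annotation tuples, gene names, and term labels are made-up placeholders, and excluding IEA and ND by default is an assumption, not a recommendation from the slides.

```python
# Evidence codes and their meanings, as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}


def drop_evidence(annotations, excluded=("IEA", "ND")):
    """Keep (gene, term, code) annotations whose code is not excluded."""
    for _, _, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
    return [a for a in annotations if a[2] not in excluded]


if __name__ == "__main__":
    # Hypothetical annotations; term labels are placeholders, not real GO IDs.
    annotations = [
        ("Gene 1", "term A", "IDA"),
        ("Gene 1", "term B", "IEA"),
        ("Gene 2", "term A", "TAS"),
        ("Gene 3", "term C", "ND"),
    ]
    for gene, term, code in drop_evidence(annotations):
        print(gene, term, code, "-", EVIDENCE_CODES[code])
```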

47

48 DAVID http://david.abcc.ncifcrf.gov/
DAVID: Functional Annotation Bioinformatics Microarray Analysis
Identify enriched biological themes, particularly GO terms
Discover enriched functional-related gene/protein groups
Cluster redundant annotation terms
Explore gene names in batch

49 annotation classification ID conversion

50 Functional annotation
Upload
Genes from your list involved in this category
Charts for each category

51 Minimum number of genes for corresponding term
Maximum EASE score/ E-value
Genes from your list involved in this category
Enriched terms associated with your genes
Source of term
E-Value
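The enrichment of a term in a gene list is typically scored with a one-sided Fisher exact (hypergeometric) test. The EASE score is a more conservative variant that removes one gene from the list hits before computing the p-value. The sketch below illustrates that idea and is not DAVID's actual implementation; the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_tail(hits, list_size, term_size, population):
    """P(X >= hits) when drawing list_size genes from a population
    in which term_size genes carry the term (one-sided Fisher exact test)."""
    upper = min(list_size, term_size)
    total = comb(population, list_size)
    favorable = sum(
        comb(term_size, i) * comb(population - term_size, list_size - i)
        for i in range(max(hits, 0), upper + 1)
    )
    return favorable / total


def ease_score(hits, list_size, term_size, population):
    """Conservative variant: same test with one list hit removed."""
    return hypergeom_tail(hits - 1, list_size, term_size, population)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 300 list genes carry a term that
    # annotates 40 of 30000 background genes.
    print(hypergeom_tail(3, 300, 40, 30000))
    print(ease_score(3, 300, 40, 30000))
```

Removing one hit penalizes terms supported by very few genes, so a term backed by a single gene from the list can never look significant.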

52 A group of terms having similar biological meaning due to sharing similar gene members

53 Gene expression analysis
How to interpret an expression matrix
Expression data DBs - GEO
General clustering methods
Unsupervised Clustering
Hierarchical clustering
K-means clustering
Tools for clustering - EPCLUST
Functional analysis - GO annotation

