Tutorial 8: Clustering

1 Tutorial 8: Clustering

2 General Methods
– Unsupervised Clustering
  Hierarchical clustering
  K-means clustering
Expression data
– GEO
– UCSC
– ArrayExpress
Tools
– EPCLUST
– MeV

3 Microarray - Reminder

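4 Expression Data Matrix
Each column represents all the gene expression levels from a single experiment.
Each row represents the expression of a gene across all experiments.
Columns: Exp1, Exp2, Exp3, Exp4, Exp5, Exp6
Gene 1: -1.2, -2.1, -3, -1.5, 1.8, 2.9
Gene 2: 2.7, 0.2, -1.1, 1.6, -2.2, -1.7
Gene 3: -2.5, 1.5, -0.1, -1.1, 0.1 (only five values survive in the transcript; the position of the missing one is unknown)
Gene 4: 2.9, 2.6, 2.5, -2.3, -0.1, -2.3
Gene 5: 0.1, 2.6, 2.2, 2.7, -2.1 (only five values survive in the transcript; the position of the missing one is unknown)
Gene 6: -2.9, -1.9, -2.4, -0.1, -1.9, 2.9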

5 Expression Data Matrix
Each element is a log ratio: log2(T/R).
T - the gene expression level in the testing sample
R - the gene expression level in the reference sample
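A minimal sketch of the log-ratio calculation on this slide. The function name and the intensity values are assumptions made for illustration; they do not come from the tutorial.

```python
import math


def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)


# Hypothetical intensities, chosen only to show the sign of the result.
print(log_ratio(400.0, 100.0))  # T is four times R -> 2.0
print(log_ratio(100.0, 100.0))  # T equals R        -> 0.0
print(log_ratio(50.0, 200.0))   # T is a quarter of R -> -2.0
```

Using a base-2 logarithm makes up- and down-regulation symmetric: a fourfold increase gives +2 and a fourfold decrease gives -2.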

6 Microarray Data Matrix
Black indicates a log ratio of zero, i.e. T ≈ R
Green indicates a negative log ratio, i.e. T < R
Red indicates a positive log ratio, i.e. T > R
Grey indicates missing data
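The color convention above can be sketched as a small lookup. This is only an illustration: the function name, the `tolerance` parameter, and the use of `None` for missing data are assumptions, and real heat maps usually shade the color by the magnitude of the ratio rather than using four flat colors.

```python
def log_ratio_color(value, tolerance=0.0):
    """Map one log ratio to the color used in the data matrix display."""
    if value is None:
        return "grey"   # missing data
    if abs(value) <= tolerance:
        return "black"  # log ratio of zero, T is about equal to R
    return "red" if value > 0 else "green"


# Hypothetical row with one missing measurement.
row = [-1.2, 0.0, None, 2.9]
print([log_ratio_color(v) for v in row])
```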

7 Microarray Data: Different representations
[Figure: two plots with axes Exp and Log ratio; regions labeled T<R and T>R]

8 A real example: ~500 genes, 3 knockdown conditions. Too complicated to analyze without “help”.

9 Microarray Data: Clusters

10 How to determine the similarity between two genes? (for clustering)
Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
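The slide asks the question without naming a measure. As a sketch, two commonly used choices are Euclidean distance (small when the values are close) and Pearson correlation (close to 1 when the profiles have the same shape). Picking these two is an assumption, not something the slide states. The profiles are the complete rows for Gene 1, Gene 2 and Gene 4 from the expression data matrix.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Correlation of two profiles: +1 same shape, -1 opposite shape."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(spread_a * spread_b)


gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]
gene4 = [2.9, 2.6, 2.5, -2.3, -0.1, -2.3]

print(euclidean_distance(gene2, gene4), pearson_correlation(gene2, gene4))
print(euclidean_distance(gene1, gene4), pearson_correlation(gene1, gene4))
```

The two measures can disagree: genes whose profiles rise and fall together but at different magnitudes are far apart by Euclidean distance and close by correlation.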

11 Unsupervised Clustering: Hierarchical Clustering

12 Hierarchical Clustering
Genes with similar expression patterns are grouped together and are connected by a series of branches (dendrogram).
Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.
[Figure: dendrogram with leaves ordered 1, 6, 3, 5, 2, 4]

13 Hierarchical clustering finds an entire hierarchy of clusters. If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
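A minimal sketch of bottom-up (agglomerative) hierarchical clustering. It starts with every gene in its own cluster and repeatedly merges the two closest clusters; stopping when a chosen number of clusters remains gives the same groups as cutting the finished tree at that level. Euclidean distance and average linkage are assumptions here, since the slides do not say which settings were used, and the profiles are hypothetical.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean distance over all pairs with one member in each cluster."""
    total = sum(
        euclidean_distance(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    # Start with one cluster per profile (the leaves of the tree).
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Joining two clusters corresponds to one branch point in the dendrogram.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical two-experiment profiles for five genes (indices 0 to 4).
profiles = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
print(hierarchical_clusters(profiles, 3))  # [[0, 1], [2, 3], [4]]
print(hierarchical_clusters(profiles, 2))  # [[0, 1, 2, 3], [4]]
```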

14 Hierarchical clustering result: five clusters

15 K-means Clustering
An algorithm to classify the data into K groups (in this example, K=4).

16 How does it work?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping into K clusters in which each gene is closer to its own cluster center than to any other center.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
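The four steps can be sketched in plain Python as follows. The function names, the `seed` and `max_iterations` parameters, and the example points are assumptions for illustration. The sketch treats "convergence" as the point where the assignments stop changing. Because the initial means are random, different seeds can give different groupings on less clearly separated data.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1: pick k observations at random as the initial means.
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Step 2: associate every observation with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = centroid(members)
    return labels, means


# Hypothetical two-experiment profiles forming two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, means = k_means(points, k=2)
print(labels)
print(means)
```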

17 Different types of clustering – different results

18 How to search for expression profiles
GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/
Human genome browser http://genome.ucsc.edu/
ArrayExpress http://www.ebi.ac.uk/arrayexpress/

20 Searching for expression profiles in the GEO
Datasets - suitable for analysis with GEO tools
Expression profiles by gene
Microarray experiments
Probe sets
Groups of related microarray experiments

21 Download dataset, Clustering, Statistical analysis

22 Clustering analysis

23 Download dataset, Clustering, Statistical analysis

24 The expression distribution for different lines in the cluster

26 Searching for expression profiles in the Human Genome browser.

27 Keratin 10 is highly expressed in skin

28 ArrayExpress http://www.ebi.ac.uk/arrayexpress/

30 What can we do with all the expression profiles? Clusters!
How? EPCLUST http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

37 Edit the input matrix: Transpose, Normalize, Randomize
Hierarchical clustering, K-means clustering
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
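A rough sketch of two of the matrix edits named on this slide. Transposing swaps rows and columns, so a genes-by-experiments matrix becomes experiments-by-genes. The slides do not say what the tool's Normalize option computes, so the z-score per row shown here (subtract the row mean, divide by the row standard deviation) is an assumption. The data are the complete rows for Gene 1 and Gene 2 from the expression data matrix.

```python
import math


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def normalize_rows(matrix):
    """Z-score each row (assumed meaning of 'Normalize')."""
    normalized = []
    for row in matrix:
        mean = sum(row) / len(row)
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        if std == 0:
            normalized.append([0.0 for _ in row])  # constant row
        else:
            normalized.append([(v - mean) / std for v in row])
    return normalized


# Rows are genes, columns are experiments.
genes_by_experiments = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],  # Gene 1
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],   # Gene 2
]

# After transposing, each row is an experiment and each column is a gene.
experiments_by_genes = transpose(genes_by_experiments)
print(experiments_by_genes[0])  # [-1.2, 2.7]
print(normalize_rows(genes_by_experiments)[0])
```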

38 Clusters, Data

39 Edit the input matrix: Transpose, Normalize, Randomize
Hierarchical clustering, K-means clustering
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).

40 Graphical representation of the cluster; samples found in cluster

41 10 clusters, as requested

42 MultiExperiment Viewer (MeV) http://www.tm4.org/mev/

