Tutorial 7 Gene expression analysis 1. Expression data –GEO –UCSC –ArrayExpress General clustering methods –Unsupervised Clustering Hierarchical clustering.


1 Tutorial 7 Gene expression analysis 1

2 Expression data
  - GEO
  - UCSC
  - ArrayExpress
General clustering methods
  - Unsupervised clustering
      - Hierarchical clustering
      - K-means clustering
Tools for clustering
  - EPCLUST
  - MeV
Functional analysis
  - GO annotation

3 Gene expression data sources: microarrays and RNA-seq experiments

4 Expression Data Matrix
Each column represents all the gene expression levels from a single experiment. Each row represents the expression of a gene across all experiments.

Columns: Exp1, Exp2, Exp3, Exp4, Exp5, Exp6
Gene 1: -1.2, -2.1, -3, -1.5, 1.8, 2.9
Gene 2: 2.7, 0.2, -1.1, 1.6, -2.2, -1.7
Gene 3: -2.5, 1.5, -0.1, -1.1, 0.1 (one value missing)
Gene 4: 2.9, 2.6, 2.5, -2.3, -0.1, -2.3
Gene 5: 0.1, 2.6, 2.2, 2.7, -2.1 (one value missing)
Gene 6: -2.9, -1.9, -2.4, -0.1, -1.9, 2.9

5 Expression Data Matrix
Each element is a log ratio: log2(T/R).
  T - the gene expression level in the testing sample
  R - the gene expression level in the reference sample
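
The log ratio defined on this slide can be sketched in a few lines of Python. The function name and the intensity values are hypothetical and chosen only for illustration; real pipelines usually apply background correction and normalization first, which this sketch skips.

```python
import math

def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)

# Hypothetical intensities, not taken from the slides.
print(log_ratio(800.0, 100.0))  # T is 8 times R: positive log ratio
print(log_ratio(100.0, 100.0))  # T equals R: log ratio of zero
print(log_ratio(25.0, 100.0))   # T is a quarter of R: negative log ratio
```

A useful property of the log scale is symmetry: a two-fold increase and a two-fold decrease give +1 and -1.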

6 Expression Data Matrix
  - Black indicates a log ratio of zero, i.e. T ≈ R
  - Green indicates a negative log ratio, i.e. T<R
  - Red indicates a positive log ratio, i.e. T>R
  - Grey indicates missing data

7 Microarray Data: Different representations
[Figure: two plots of log ratio against experiment (Exp), with the regions T<R and T>R marked]

8 How to search for expression profiles
  - GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
  - Human genome browser: http://genome.ucsc.edu/
  - ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
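
The colour convention on this slide can be written as a small lookup. This is a simplified sketch: the function name and the `tolerance` parameter are assumptions, and real heat maps normally shade the colour by the size of the log ratio rather than using four flat colours.

```python
def heatmap_colour(log_ratio, tolerance=0.0):
    """Map one matrix cell to the colour named on the slide.

    None stands for a missing measurement (an assumption of this sketch).
    """
    if log_ratio is None:
        return "grey"
    if abs(log_ratio) <= tolerance:
        return "black"
    return "red" if log_ratio > 0 else "green"

# One row of cells, with a missing value in the third position.
row = [-1.2, 2.7, None, 0.0]
print([heatmap_colour(value) for value in row])
```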

10 Searching for expression profiles in GEO
  - Datasets: suitable for analysis with GEO tools
  - Expression profiles by gene
  - Microarray experiments
  - Probe sets
  - Groups of related microarray experiments

11 Download dataset; Clustering; Statistical analysis

12 Clustering analysis

13 Download dataset; Clustering; Statistical analysis

14 The expression distribution for different lines in the cluster

15 Searching for expression profiles in the Human Genome browser

16 Keratin 10 is highly expressed in skin

17 ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

22 How to analyze gene expression data

23 Unsupervised Clustering - Hierarchical Clustering

24 Hierarchical Clustering
Genes with similar expression patterns are grouped together and are connected by a series of branches (a dendrogram). Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between the genes.
[Figure: dendrogram with leaves in the order 1, 6, 3, 5, 2, 4]

25 How to determine the similarity between two genes (for clustering)?
Reference: Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
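
The slide does not name a particular similarity measure. Two widely used choices are Euclidean distance and Pearson correlation; the sketch below assumes those two and applies them to the Gene 1 and Gene 2 rows of the expression matrix on slide 4. The function names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Correlation of two profiles: +1 same shape, -1 opposite shape."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)

# Gene 1 and Gene 2 from the expression matrix on slide 4.
gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]

print(euclidean_distance(gene1, gene2))
print(pearson_correlation(gene1, gene2))
```

Euclidean distance is sensitive to the magnitude of the log ratios, while correlation compares only the shape of the profiles, so the two measures can group the same genes differently.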

26 Hierarchical clustering finds an entire hierarchy of clusters. If we want a certain number of clusters, we need to cut the tree at the level that gives that number (in this case, four).
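
Cutting the tree at the level that gives a chosen number of clusters is equivalent to stopping a bottom-up merge once that many clusters remain. The sketch below assumes Euclidean distance and average linkage, neither of which is specified in the slides, and uses hypothetical two-experiment profiles. It is a plain illustration, not the method used by any of the tools mentioned here.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean distance over all pairs with one gene from each cluster."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical profiles: two tight pairs of genes.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(agglomerate(profiles, 2))
```

Recording the order and distance of each merge, instead of stopping early, would give the full dendrogram.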

27 Hierarchical clustering result: five clusters

28 Unsupervised Clustering - K-means clustering
An algorithm to classify the data into K groups (K=4 in the example shown).

29 How does it work?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is the grouping the iteration converges to for K clusters, judged by the distances of the genes to their cluster centers; it is not guaranteed to be the global optimum.
  1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
  2. k clusters are created by associating every observation with the nearest mean.
  3. The centroid of each of the k clusters becomes the new mean.
  4. Steps 2 and 3 are repeated until convergence has been reached.
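
The four steps on this slide can be sketched as follows. The function names, the `seed` and `max_iterations` parameters, and the toy profiles are assumptions made for the illustration; squared Euclidean distance is assumed as the measure of "nearest".

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Column-wise mean of a list of profiles."""
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(profiles, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = [list(p) for p in rng.sample(profiles, k)]
    assignment = None
    for _ in range(max_iterations):
        # Step 2: associate every profile with its nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in profiles
        ]
        # Step 4: stop when the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = centroid(members)
    return assignment, means

# Hypothetical profiles: two tight pairs of genes.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignment, means = k_means(profiles, k=2)
print(assignment)
print(means)
```

Because the initial means are random, different seeds can give different final clusters on real data, which is one reason to run the algorithm several times.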

30 How should we determine K?
  - Trial and error
  - Take K as the square root of the number of genes
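
The square-root rule of thumb is a one-line calculation. The helper name is hypothetical, and rounding to the nearest integer is an assumption, since the slide does not say how to round.

```python
import math

def suggest_k(n_genes):
    """Rule-of-thumb starting value for K: the square root of the gene count."""
    return max(1, round(math.sqrt(n_genes)))

print(suggest_k(100))  # a data set of 100 genes suggests K = 10
```

The value is only a starting point for the trial-and-error approach listed on the same slide.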

31 Tools for clustering - EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

38 Edit the input matrix: Transpose, Normalize, Randomize
Hierarchical clustering; K-means clustering
In the input matrix each column should represent a gene and each row should represent an experiment (or individual).
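
Two of the matrix edits named on this slide, transpose and normalize, can be sketched in plain Python. The matrix on slide 4 has genes in rows, so transposing it gives the orientation described here. Normalizing each row to mean 0 and standard deviation 1 is an assumption of this sketch; the tool's own normalization options may differ.

```python
import statistics

def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]

def normalize_rows(matrix):
    """Scale every row to mean 0 and (population) standard deviation 1."""
    normalized = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.pstdev(row)
        # A constant row has sd == 0; map it to zeros to avoid dividing by 0.
        normalized.append([(v - mean) / sd if sd else 0.0 for v in row])
    return normalized

# Gene 1 and Gene 2 from slide 4: rows are genes, columns are experiments.
genes_by_experiments = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
]
experiments_by_genes = transpose(genes_by_experiments)
print(len(experiments_by_genes), "rows x", len(experiments_by_genes[0]), "columns")
print(normalize_rows(genes_by_experiments)[0])
```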

39 Hierarchical clustering
In the input matrix each column should represent a gene and each row should represent an experiment (or individual).

40 Clusters; Data

41 K-means clustering
In the input matrix each column should represent a gene and each row should represent an experiment (or individual).

42 Graphical representation of the cluster; Samples found in cluster

43 10 clusters, as requested

44 Tools for clustering - MeV
http://www.tm4.org/mev/

45 What can we learn from clusters? Gene expression function analysis
Probe set IDs shown: 1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at

46 Gene Ontology (GO) http://www.geneontology.org/ The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

47 Gene Ontology (GO)
  - Cellular Component (CC): the parts of a cell or its extracellular environment.
  - Molecular Function (MF): the elemental activities of a gene product at the molecular level, such as binding or catalysis.
  - Biological Process (BP): operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

48 The GO tree

49 GO sources
  ISS - Inferred from Sequence/Structural Similarity
  IDA - Inferred from Direct Assay
  IPI - Inferred from Physical Interaction
  TAS - Traceable Author Statement
  NAS - Non-traceable Author Statement
  IMP - Inferred from Mutant Phenotype
  IGI - Inferred from Genetic Interaction
  IEP - Inferred from Expression Pattern
  IC - Inferred by Curator
  ND - No Data available
  IEA - Inferred from Electronic Annotation

50 Search by AmiGO

51 Results for alpha-synuclein

52 DAVID Functional Annotation Bioinformatics Microarray Analysis
http://david.abcc.ncifcrf.gov/
  - Identify enriched biological themes, particularly GO terms
  - Discover enriched functional-related gene/protein groups
  - Cluster redundant annotation terms
  - Explore gene names in batch

53 ID conversion, annotation, classification
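
To show what "enriched" means for a GO term, here is a minimal sketch of a one-sided hypergeometric test. It is a generic textbook calculation, not DAVID's own implementation, and the function name and all gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population_size, population_hits,
                       sample_size, sample_hits):
    """Probability of seeing at least sample_hits annotated genes
    in a random sample of sample_size genes (hypergeometric tail)."""
    upper = min(population_hits, sample_size)
    total = comb(population_size, sample_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20000 genes on the array, 200 carry the GO term,
# and a cluster of 50 genes contains 5 of them.
p = enrichment_p_value(20000, 200, 50, 5)
print(p)
```

A small p-value suggests the term appears in the cluster more often than chance would explain; when many terms are tested, the p-values should also be corrected for multiple testing.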

54 Functional annotation Upload Annotation options

57 Gene expression analysis
Expression data
  - GEO
  - UCSC
  - ArrayExpress
General clustering methods
  - Unsupervised clustering
      - Hierarchical clustering
      - K-means clustering
Tools for clustering
  - EPCLUST
  - MeV
Functional analysis
  - GO annotation

