Tutorial 8: Gene expression analysis

How to interpret an expression matrix
Expression data DBs - GEO
Clustering
– Hierarchical clustering
– K-means clustering
– Tools for clustering - EPCLUST
Functional analysis
– GO annotation
– DAVID

Gene expression data sources
Microarrays
RNA-seq experiments

How to interpret an expression data matrix
Each column represents the expression levels of all genes in a single sample.
Each row represents the expression of one gene across all experiments.

         Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
Gene 1       -1.2      -2.1      -3        -1.5       1.8       2.9
Gene 2        2.7       0.2      -1.1       1.6      -2.2      -1.7
Gene 3       -2.5       1.5      -0.1      -1.1       0.1
Gene 4        2.9       2.6       2.5      -2.3      -0.1      -2.3
Gene 5        0.1       1.9       2.6       2.2       2.7      -2.1
Gene 6       -2.9      -1.9      -2.4      -0.1      -1.9       2.9
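As a minimal sketch of the row/column reading, the matrix can be held as a mapping from gene to a list of values, one per sample. The helper names `gene_profile` and `sample_profile` are illustrative only, and Gene 3 is omitted because only five of its six values are available.

```python
# Rows are genes, columns are samples (values copied from the matrix above).
# Gene 3 is omitted: only five of its six values are available.
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4", "Sample 5", "Sample 6"]
matrix = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 5": [0.1, 1.9, 2.6, 2.2, 2.7, -2.1],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}


def gene_profile(gene):
    """Row view: the expression of one gene across all samples."""
    return dict(zip(samples, matrix[gene]))


def sample_profile(sample):
    """Column view: the expression of every gene in one sample."""
    col = samples.index(sample)
    return {gene: values[col] for gene, values in matrix.items()}


print(gene_profile("Gene 1"))
print(sample_profile("Sample 6"))
```

The row view is what clustering of genes compares; the column view is what clustering of samples compares.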

Raw data pre-processing
Raw data are the values we get from the microarray or sequencer. "Raw values" is a general term for the measurements made by an instrument.
In microarrays the raw data are probe intensities.
In sequencing the raw data are counts per gene.
Raw data almost always need some kind of processing before they are of adequate quality and carry biological meaning.
– For example, in high-throughput sequencing the raw data are the sequenced reads. They need to be mapped to the genome, possibly filtered, and then variant calling is done.

Expression profiles DBs
GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/
Human genome browser http://genome.ucsc.edu/
ArrayExpress http://www.ebi.ac.uk/arrayexpress/

The current rate of submission and processing is over 10,000 samples per month.
In 2002, Nature journals announced a requirement that microarray data be deposited in public databases.

Searching for expression profiles in GEO
http://www.ncbi.nlm.nih.gov/geo/

GEO accession IDs
GPL**** - platform ID
GSM**** - sample ID
GSE**** - series ID
GDS**** - dataset ID
A Series (GSE) record defines a set of related samples considered to be part of a group.
A GDS record represents a collection of biologically and statistically comparable GEO samples.
Not every experiment has a GDS.
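The prefix scheme above can be checked mechanically. The sketch below maps an accession string to its record type; the function name `geo_record_type` and the accession numbers in the example are made up for illustration, and the pattern assumes an accession is simply one of the four prefixes followed by digits.

```python
import re

# Prefix-to-record-type mapping, as listed above.
RECORD_TYPES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}


def geo_record_type(accession):
    """Return the GEO record type for an accession such as 'GSE1234'."""
    match = re.fullmatch(r"(GPL|GSM|GSE|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


# Hypothetical accession numbers, used only to exercise the function.
for acc in ["GPL1", "GSM22", "GSE333", "GDS4444"]:
    print(acc, "->", geo_record_type(acc))
```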

Download dataset
Clustering
Statistical analysis

Raw data (SOFT file)
Probes
Genes
Expression values per sample (GSM)
Gene annotations
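A rough sketch of reading the expression table out of a GDS SOFT file follows. It assumes the table sits between `!dataset_table_begin` and `!dataset_table_end` lines, is tab-separated, and starts with a probe column and a gene column followed by one column per GSM sample. The toy text, probe names, and accession numbers are invented for illustration; a real file should be inspected before relying on this layout.

```python
def to_number(field):
    """Convert a table cell to float; anything non-numeric becomes None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_soft_table(text):
    """Return {probe: {"gene": ..., "values": {GSM id: value}}}."""
    in_table, header, rows = False, None, []
    for line in text.splitlines():
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            break
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields          # first table line holds column names
            else:
                rows.append(fields)
    if header is None:
        raise ValueError("no data table found")
    # Sample columns are the ones whose header is a GSM accession.
    sample_cols = [i for i, name in enumerate(header) if name.startswith("GSM")]
    table = {}
    for fields in rows:
        values = {header[i]: to_number(fields[i]) for i in sample_cols}
        table[fields[0]] = {"gene": fields[1], "values": values}
    return table


# Invented toy file content (not real GEO data).
toy_soft = "\n".join([
    "^DATASET = GDS0000",
    "!dataset_title = toy example",
    "!dataset_table_begin",
    "\t".join(["ID_REF", "IDENTIFIER", "GSM0001", "GSM0002"]),
    "\t".join(["probe_a", "GENE_A", "7.1", "8.3"]),
    "\t".join(["probe_b", "GENE_B", "5.0", "null"]),
    "!dataset_table_end",
])
print(parse_soft_table(toy_soft))
```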

Clustering analysis

Clustering analysis – zoom in

Viewing the expression levels

Clustering
Grouping together genes with a similar signature

Hierarchical clustering
This clustering method is based on the distances between the expression profiles of different genes. Genes with similar expression patterns are grouped together.
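The distance between two expression profiles can be defined in more than one way. The sketch below shows two common choices, Euclidean distance and a correlation-based distance (1 minus the Pearson correlation); the slides do not say which one is used, so both are offered as assumptions. The two example profiles are Gene 1 and Gene 6 from the expression matrix shown earlier.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1 - cov / (sd_a * sd_b)


gene_1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene_6 = [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9]
print(euclidean(gene_1, gene_6))
print(pearson_distance(gene_1, gene_6))
```

Euclidean distance is sensitive to the magnitude of the values, while the correlation-based distance compares only the shape of the profile across samples.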

Rings a bell?
In both phylogenetic trees and clustering we build a tree from a distance matrix.
When computing phylogenetic trees, we compute distances between sequences (e.g. ATCTGTCCGCTCG vs. ATGTGTGCGCTTG).
When computing clustering dendrograms, we compute distances between expression values (e.g. Gene 1 vs. Gene 2 across Expr. 1 to Expr. 6).

Hierarchical clustering methods produce a tree, or dendrogram.
They avoid specifying in advance how many clusters are appropriate.
The partitions are obtained by cutting the tree at different levels (shown: 2 clusters, 4 clusters, 6 clusters).
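A minimal sketch of bottom-up (agglomerative) hierarchical clustering: start with every gene in its own cluster and repeatedly merge the two closest clusters. Stopping when k clusters remain gives the same partition as cutting the finished tree at the level with k branches. Average linkage and Euclidean distance are assumptions made here; the slides do not fix either choice, and the toy profiles are invented.

```python
import math
from itertools import combinations


def average_linkage(c1, c2, profiles):
    """Mean Euclidean distance over all pairs drawn from two clusters."""
    total = sum(math.dist(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, k):
    """Merge the closest clusters until only k remain.

    profiles: dict mapping gene name -> list of expression values.
    Returns a list of clusters, each a list of gene names.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return clusters


# Invented toy profiles: two obvious groups.
toy = {"a": [0, 0], "b": [0, 1], "c": [10, 10], "d": [10, 11]}
print(hierarchical_clusters(toy, 2))
print(hierarchical_clusters(toy, 4))
```

This brute-force version recomputes every linkage at each step, which is fine for a handful of genes but too slow for a full expression matrix.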

The more clusters you ask for, the higher the similarity within each cluster.
http://discoveryexhibition.org/pmwiki.php/Entries/Seo2009

Hierarchical clustering results
You can cluster both samples and genes (separately).
http://www.spandidos-publications.com/10.3892/ijo.2012.1644

Unsupervised clustering – K-means clustering
An algorithm that partitions the data into K groups (shown: K=4).

How does it work?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping into K clusters that minimizes the distances to the cluster centers; it is not guaranteed to be the global optimum and can depend on the random starting points.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
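The four steps can be sketched in a few lines of plain Python. Euclidean distance, the fixed random seed, the iteration cap, and the rule of keeping an empty cluster's old mean are assumptions added to make the example deterministic and safe; the toy points are invented.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, means) for points given as equal-length tuples."""
    rng = random.Random(seed)
    means = rng.sample(points, k)            # step 1: k random initial means
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        if new_labels == labels:             # step 4: stop once nothing changes
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # an empty cluster keeps its old mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


# Invented toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means = kmeans(toy_points, k=2)
print(labels)
print(means)
```

Because the starting means are random, running with different seeds on real data can give different clusterings; a common practice is to run several times and keep the tightest result.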

How should we determine K?
– Trial and error
– Take K as the square root of the number of genes

Tool for clustering - EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

Choose distance metric
Choose algorithm

Hierarchical clustering

Zoom in by clicking on the nodes

K-means clustering

Graphical representation of the cluster
Samples found in cluster

10 clusters, as requested

Now what?
Now that we have clusters, we want to know the function of each group.
This requires some kind of generalization of gene functions.

Gene Ontology (GO)
http://www.geneontology.org/
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
– Biological process
– Cellular component
– Molecular function

Gene Ontology (GO)
Cellular Component (CC) - the parts of a cell or its extracellular environment.
Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

The GO tree – a partial example

DAVID Functional Annotation Bioinformatics Microarray Analysis
http://david.abcc.ncifcrf.gov/
– Identify enriched biological themes, particularly GO terms
– Discover enriched functional-related gene/protein groups

ID conversion annotation

Functional annotation - upload
– The gene list you want to explore (for example, all the genes in a certain cluster)
– What is the identifier? (probes / gene names / gene IDs)
– You can supply a background list as well

Functional annotation - results
Different kinds of enrichments are calculated.

Functional annotation - results
Genes from your list involved in this category
Charts for each category

Minimum number of genes for the corresponding term
Maximum EASE score / E-value
Genes from your list involved in this category
P-value
Enriched terms associated with your genes
Source of term
Adjusted P-value
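To give a feel for what an enrichment P-value measures, the sketch below computes a one-sided hypergeometric (Fisher's exact) P-value: the chance of seeing at least this many genes from a category in a list of a given size, if the list were drawn at random from the background. This is the textbook test, not DAVID's own calculation; DAVID's EASE score is a more conservative modification of it and is not reproduced here. The function name and the counts in the example are invented.

```python
from math import comb


def enrichment_p_value(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) under the hypergeometric distribution.

    list_hits:  genes in your list that belong to the category
    list_total: genes in your list
    pop_hits:   genes in the background that belong to the category
    pop_total:  genes in the background
    """
    upper = min(list_total, pop_hits)
    denominator = comb(pop_total, list_total)
    numerator = sum(
        comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
        for x in range(list_hits, upper + 1)
    )
    return numerator / denominator


# Invented counts: 3 of 300 list genes fall in a category that holds
# 40 of the 30000 background genes.
print(enrichment_p_value(3, 300, 40, 30000))
```

A small P-value says the category is over-represented in the list relative to the background; when many categories are tested at once, the adjusted P-value is the one to read.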

Gene expression analysis
How to interpret an expression matrix
Expression data DBs - GEO
Clustering
– Hierarchical clustering
– K-means clustering
– Tools for clustering - EPCLUST
Functional analysis
– GO annotation
– DAVID
