Cluster Analysis: Hierarchical and k-means


Expression data: Expression data are typically analyzed in matrix form, with each row representing a gene and each column representing a chip or sample.

Expression data: We represent the data matrix by the symbol X and denote the data as follows:

Clustering on transposition of X
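
A minimal Python sketch of this layout, assuming a tiny made-up matrix (the gene names, sample names, and values are hypothetical and do not come from the slides). Rows of X are genes and columns are samples; transposing X turns each row into a sample, so the same clustering routine groups samples instead of genes.

```python
# Hypothetical 3-gene x 4-sample expression matrix (illustrative values only).
genes = ["gene1", "gene2", "gene3"]
samples = ["s1", "s2", "s3", "s4"]
X = [
    [2.0, 2.5, 8.0, 7.5],  # gene1 across the four samples
    [1.0, 1.2, 0.9, 1.1],  # gene2
    [6.0, 5.5, 1.0, 1.5],  # gene3
]

def transpose(matrix):
    """Return the transpose of a list-of-rows matrix (rows become columns)."""
    return [list(column) for column in zip(*matrix)]

Xt = transpose(X)  # each row of Xt is now one sample's profile over all genes

print(len(X), "genes x", len(X[0]), "samples")
print(len(Xt), "samples x", len(Xt[0]), "genes")
```

Clustering the rows of X groups genes with similar profiles; clustering the rows of Xt groups samples.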

Filtering: The first step in analyzing microarray data is to filter out genes that are not expressed or do not show variation across sample types.
– Always remove from the analysis the rows corresponding to genes that were not expressed on any of the chips.
– For example, if gene chips are used to analyze tumor and normal tissues, the two groups can be compared using t-statistics calculated for each gene.
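
The two filtering steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the data, the expression threshold, and the t-statistic cutoff are hypothetical, and Welch's unequal-variance t-statistic is used because the slide does not say which t-statistic is meant.

```python
import math
from statistics import mean, variance

# Hypothetical data: columns 0-2 are tumor chips, columns 3-5 are normal chips.
TUMOR, NORMAL = slice(0, 3), slice(3, 6)
X = {
    "geneA": [9.0, 10.0, 11.0, 4.0, 5.0, 6.0],  # differs between groups
    "geneB": [5.0, 6.0, 7.0, 5.5, 6.5, 6.0],    # expressed, but no group difference
    "geneC": [0.1, 0.2, 0.1, 0.2, 0.1, 0.1],    # not expressed on any chip
}

def is_expressed(row, threshold):
    """True if the gene exceeds the threshold on at least one chip."""
    return any(value > threshold for value in row)

def welch_t(group_a, group_b):
    """Welch's two-sample t-statistic; assumes the groups are not both constant."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

def filter_genes(matrix, threshold=1.0, t_cutoff=2.0):
    """Keep genes that are expressed somewhere and differ between the groups."""
    kept = []
    for gene, row in matrix.items():
        if not is_expressed(row, threshold):
            continue  # drop genes not expressed on any chip
        if abs(welch_t(row[TUMOR], row[NORMAL])) >= t_cutoff:
            kept.append(gene)
    return kept

kept = filter_genes(X)
print(kept)
```

Both cutoffs are tuning choices; in practice they depend on the platform and on how many genes the downstream clustering can handle.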

Normalization for Clustering: A gene is normalized across samples by subtracting the gene's mean expression level from each of its expression levels and then dividing by the gene's standard deviation. Calculate the mean and standard deviation of the gene of interest:

Normalized expression values
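
A short Python sketch of this per-gene normalization. The slide's formulas are not preserved in the transcript, so the sample standard deviation (n - 1 denominator) is an assumption, and the example row is hypothetical.

```python
from statistics import mean, stdev

def normalize_gene(row):
    """Center a gene's expression levels on its mean and scale by its standard deviation."""
    gene_mean = mean(row)
    gene_sd = stdev(row)  # sample standard deviation (assumed; n - 1 denominator)
    if gene_sd == 0:
        raise ValueError("gene has no variation across samples; filter it out first")
    return [(value - gene_mean) / gene_sd for value in row]

# Hypothetical gene measured on three samples: mean 4, standard deviation 2.
normalized = normalize_gene([2.0, 4.0, 6.0])
print(normalized)
```

After normalization every gene has mean 0 and standard deviation 1 across samples, so distances compare the shapes of expression profiles rather than their absolute levels.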

Distance Measures

Distance Matrix
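
The slide's own distance formulas are not preserved in the transcript. As a sketch, assuming two measures commonly used for expression profiles (Euclidean distance and correlation distance, defined here as 1 minus the Pearson correlation), a distance matrix over hypothetical gene profiles can be built like this:

```python
import math

def euclidean(u, v):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_distance(u, v):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1.0 - covariance / (spread_u * spread_v)

def distance_matrix(rows, dist):
    """Symmetric matrix of pairwise distances between the rows."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical profiles: the second has the same shape as the first at twice
# the level, and the third runs in the opposite direction.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
D_euclid = distance_matrix(profiles, euclidean)
D_corr = distance_matrix(profiles, correlation_distance)
```

The two measures can disagree: the first two profiles are far apart in Euclidean terms but have a correlation distance of essentially zero because their shapes match.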

Hierarchical Clustering Average Linkage Algorithm (unweighted centroid clustering)

Example: A distance matrix of 4 genes. The first step merges genes A and B, the closest pair (their distance is not preserved in this transcript). The distances are then updated as follows:
– Replace the two genes A and B by the midpoint (AB) between them, and recalculate the distance of gene C to this midpoint (d(AB, C) = 2.85) and of gene D to this midpoint (d(AB, D) = 4.81). Note that d(C, D) = 2.7 is unchanged.
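
The merge-and-update step can be sketched in Python. The coordinates below are hypothetical (the slide's distance matrix is not in the transcript), so the numbers do not reproduce the example above. The sketch replaces each merged pair by the centroid of all its members, which for two single genes is the midpoint. Note that this update is usually called centroid linkage, while "average linkage" more often refers to averaging all pairwise distances between two clusters.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid_clustering(points):
    """Agglomerate points (name -> coordinates) by repeatedly merging the two
    clusters whose centroids are closest. Returns the merges in order as
    (name_a, name_b, distance) tuples."""
    # Each cluster is stored as (centroid, number of members).
    clusters = {name: (tuple(coords), 1) for name, coords in points.items()}
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: euclidean(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (centroid_a, size_a), (centroid_b, size_b) = clusters.pop(a), clusters.pop(b)
        distance = euclidean(centroid_a, centroid_b)
        # The new centroid is the mean of all members of both clusters.
        merged = tuple(
            (size_a * x + size_b * y) / (size_a + size_b)
            for x, y in zip(centroid_a, centroid_b)
        )
        clusters[a + b] = (merged, size_a + size_b)
        merges.append((a, b, distance))
    return merges

# Hypothetical 2-D coordinates for four genes.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (5.0, 2.0)}
merges = centroid_clustering(points)
for name_a, name_b, distance in merges:
    print(f"merge {name_a} + {name_b} at distance {distance:.2f}")
```

With these coordinates A and B merge first, and the distance between C and D is unaffected by that merge, mirroring the note above that d(C, D) is unchanged.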

Differences between clustering methods: For example, in Figure 3A the first merge clustered genes A and B, and the distance from this new cluster to gene D was d(AB, D) = 4.81. For single linkage the distance would be d(AB, D) = 4.74, and for complete linkage it would be d(AB, D) = 5.
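
The linkage rules differ only in how they summarize the pairwise distances between the members of two clusters. A small Python sketch, assuming that 4.74 and 5 are the distances from gene D to genes A and B in some order (this is inferred from the single- and complete-linkage values quoted above, since the original distance matrix is not in the transcript):

```python
def single_linkage(pairwise):
    """Distance between clusters = smallest member-to-member distance."""
    return min(pairwise)

def complete_linkage(pairwise):
    """Distance between clusters = largest member-to-member distance."""
    return max(pairwise)

def pairwise_average(pairwise):
    """Distance between clusters = mean of all member-to-member distances."""
    return sum(pairwise) / len(pairwise)

# Assumed distances from gene D to the two members of cluster (AB).
distances_to_D = [4.74, 5.0]

print("single:  ", single_linkage(distances_to_D))
print("complete:", complete_linkage(distances_to_D))
print("average: ", round(pairwise_average(distances_to_D), 2))
```

The midpoint-based value of 4.81 quoted above cannot be recomputed from these two numbers alone, because it also depends on the distance between A and B.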

Heat Maps: The heat map presents a grid of colored points, where each color represents a gene expression value in the sample.

Heat Map Example: The grid coordinates correspond to sample-by-gene combinations. In this case, the columns (samples) are tumors, some from patients who relapsed and some from patients who did not. The rows represent 348 genes found to distinguish the patients according to their relapse status. The ordering is determined by hierarchical clustering.

Software for Clustering and Heat Maps: Eisen developed a powerful clustering and visualization tool for microarray data. You can download it from the following website:

Cluster: Clusters filtered microarray datasets using different methods. You need to upload data (rows: genes; columns: conditions; entries: gene expression values).

Cluster

Adjust Data

Cluster Data

TreeView: Visualizes the clustering result as a heat map. Load the .cdt file created by the Cluster package and visualize coexpressed genes (red: upregulated and green: downregulated in the condition of interest; median-centered dataset).
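
The median centering and red/green coloring can be illustrated with a short Python sketch. It is not TreeView's actual implementation: the color scale, the `scale` parameter, and the example values are hypothetical, chosen only to show how a centered value maps to a color.

```python
from statistics import median

def median_center(row):
    """Subtract the gene's median so that 0 means 'typical level for this gene'."""
    gene_median = median(row)
    return [value - gene_median for value in row]

def to_rgb(value, scale):
    """Map a centered value to an (R, G, B) color: positive values shade toward
    red, negative values toward green, and zero is black. Values beyond
    +/- scale are drawn at full intensity."""
    level = min(abs(value) / scale, 1.0)
    intensity = round(255 * level)
    return (intensity, 0, 0) if value > 0 else (0, intensity, 0)

# Hypothetical gene measured in three conditions.
centered = median_center([1.0, 2.0, 5.0])
colors = [to_rgb(value, scale=3.0) for value in centered]
print(centered)
print(colors)
```

Centering each gene on its median makes the colors comparable across rows, so a red cell always means "above this gene's typical level" regardless of how strongly the gene is expressed overall.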