Gene Expression Clustering. The Main Goal: Gain insight into the gene’s function. Using: Sequence. Transcription levels.

Microarray Technology

Microarray - standard laboratory technique. Information about gene expression. Tens of thousands of data points. Analyze by computational methods.

Gene Clustering To cluster genes means to group together genes with similarity in their expression patterns.

Why do we need to cluster genes? Unknown gene function. Common regulatory elements. Pathways and biological processes. Defining new disease subclasses. Predict categorization of new samples. Data reduction and visualization.

Gene Clustering Clustering methods can be divided into two major groups: Supervised clustering – classify according to previous knowledge (group prediction). Unsupervised clustering – no previous knowledge is used (pattern discovery).

Unsupervised clustering In many cases we have little a priori knowledge about genes. There are many different methods of unsupervised clustering. We will present hierarchical clustering.

The Method

Hierarchical clustering All data instances start in their own clusters. The two most closely related clusters are merged. This is repeated until a single cluster remains. Arranges the data into a tree structure. The tree can be broken into the desired number of clusters.

Hierarchical clustering The raw data
Gene      Chip1        Chip2        …   Chip20
1         x 1,1        x 1,2        …   x 1,20
2         x 2,1        x 2,2        …   x 2,20
3         x 3,1        x 3,2        …   x 3,20
…
12,000    x 12000,1    x 12000,2    …   x 12000,20

Hierarchical clustering Normalized data

Hierarchical clustering Calculate the Distance Matrix Distance measures: Euclidean distance. Correlation coefficient. (Figure: example points A, B, C.)
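The two measures named on this slide can be sketched as follows. This is a minimal illustration, assuming each gene is a list of expression values across chips; the function names and the three example profiles are hypothetical, and turning the correlation into a distance as 1 - r is one common convention rather than something the slides state.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Assumed convention: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson_correlation(x, y)


# Hypothetical profiles: gene_b has the same shape as gene_a at twice the
# amplitude, gene_c has the opposite shape.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
gene_c = [3.0, 2.0, 1.0]

print(euclidean_distance(gene_a, gene_b), correlation_distance(gene_a, gene_b))
print(euclidean_distance(gene_a, gene_c), correlation_distance(gene_a, gene_c))
```

The example shows why the choice matters: gene_a and gene_b are far apart in Euclidean terms but have zero correlation distance, because correlation compares the shape of the profile and ignores its amplitude.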

Hierarchical clustering Calculate the Distance Matrix Average linkage - midpoint. Single linkage – smallest distance. Complete linkage - largest distance.
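The three linkage rules can be compared on a toy example. This is a sketch under assumptions: the points are made up, Euclidean distance is used between members, and "average linkage" is read here as the mean of all pairwise distances between the two clusters, which is one common interpretation of the "midpoint" wording.

```python
import math
from itertools import product


def euclidean(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(cluster_1, cluster_2):
    """All distances between a member of cluster_1 and a member of cluster_2."""
    return [euclidean(p, q) for p, q in product(cluster_1, cluster_2)]


def single_linkage(cluster_1, cluster_2):
    """Smallest distance between any two members."""
    return min(pairwise_distances(cluster_1, cluster_2))


def complete_linkage(cluster_1, cluster_2):
    """Largest distance between any two members."""
    return max(pairwise_distances(cluster_1, cluster_2))


def average_linkage(cluster_1, cluster_2):
    """Mean of all member-to-member distances (assumed reading of 'midpoint')."""
    distances = pairwise_distances(cluster_1, cluster_2)
    return sum(distances) / len(distances)


# Hypothetical clusters of two points each.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster_1, cluster_2))
print(average_linkage(cluster_1, cluster_2))
print(complete_linkage(cluster_1, cluster_2))
```

For any pair of clusters the average-linkage value lies between the single-linkage and complete-linkage values.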

Hierarchical clustering Calculate the Distance Matrix (Table: expression of genes A, B, C on Chip1 and Chip2, and the resulting distance matrix between A, B and C.)

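Hierarchical clustering Average Linkage Algorithm (Figure: distance matrix over A, B, C, D.)

Hierarchical clustering Average Linkage Algorithm (Figure: A and B merged into AB; distance matrix over AB, C, D.)

Hierarchical clustering Average Linkage Algorithm (Figure: C and D merged into CD; distance matrix over AB, CD.)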
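The merge sequence on these slides (A with B, then C with D, then AB with CD) can be reproduced with a small agglomerative loop. This is a sketch, not the presenters' implementation: the coordinates for A to D are invented so that the same merge order results, and average linkage is taken as the mean pairwise Euclidean distance.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage_clustering(profiles):
    """Merge the two closest clusters until one remains.

    profiles maps a gene name to its expression values.
    Returns the merges in order as (cluster_1, cluster_2, distance).
    """
    clusters = [(name,) for name in profiles]  # every gene starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distances = [
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                d = sum(distances) / len(distances)  # average linkage
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical two-chip expression values for genes A to D.
profiles = {
    "A": (1.0, 1.0),
    "B": (1.5, 1.0),
    "C": (5.0, 5.0),
    "D": (5.0, 6.0),
}

merges = average_linkage_clustering(profiles)
for left, right, distance in merges:
    print("".join(left), "+", "".join(right), round(distance, 3))
```

The list of merges, read from first to last, is the information a dendrogram draws: early merges join at low height, and the final merge is the root of the tree.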

Hierarchical clustering dendrogram (Figure: dendrogram over A, B, C, D.)

Hierarchical clustering heat maps Red corresponds to high expression levels. Green corresponds to low expression levels. Black corresponds to intermediate expression levels.

Hierarchical clustering Experiment Control Random 1 – randomized by rows. Random 2 – randomized by columns. Random 3 – randomized by both rows and columns.
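The three randomized controls might be generated along these lines. The slide does not say exactly how the shuffling was done, so this sketch assumes one plausible reading: "by rows" permutes the values inside each row, "by columns" permutes the values inside each column, and "both" applies the two in sequence. The function names and the small matrix are hypothetical.

```python
import random


def shuffle_within_rows(matrix, rng):
    """Random 1 (assumed): permute the values inside each gene's row."""
    return [rng.sample(row, len(row)) for row in matrix]


def shuffle_within_columns(matrix, rng):
    """Random 2 (assumed): permute the values inside each chip's column."""
    columns = [list(column) for column in zip(*matrix)]
    shuffled = [rng.sample(column, len(column)) for column in columns]
    return [list(row) for row in zip(*shuffled)]


def shuffle_rows_and_columns(matrix, rng):
    """Random 3 (assumed): shuffle within rows, then within columns."""
    return shuffle_within_columns(shuffle_within_rows(matrix, rng), rng)


rng = random.Random(0)  # fixed seed so the control is reproducible
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

random_1 = shuffle_within_rows(data, rng)
random_2 = shuffle_within_columns(data, rng)
random_3 = shuffle_rows_and_columns(data, rng)
```

Clustering each randomized matrix with the same settings as the real data gives a baseline: structure that also appears in the shuffled versions is not evidence of co-regulation.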

Examples

Example I We present here an experiment by Spellman et al., published in Mol. Biol. Cell 9 (1998). Goals of the experiment: Identify all cell cycle regulated genes in yeast. Show clustering at work.

Example I Cell Cycle

Example I Methods DNA microarrays covering the whole yeast genome. Measure levels of mRNA as a function of time.

Example I Methods Synchronization: α factor. Elutriation – size based. cdc15 – temperature-sensitive mutation. Factors: Cln3p, Clb2p deletion. Induced with these factors. Data from a previously published study (Cho et al. 1998). Control sample: asynchronous cultures.

Example I Methods Measurements analyzed based on: Fourier algorithm - assesses periodicity. Correlation measurement - compared with previously identified cell cycle regulated genes.
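A periodicity measure of the Fourier kind can be sketched as follows. It is only an illustration of the idea, not the exact score used in the study: it projects a time course onto a sine and a cosine at one assumed cell-cycle period and reports the combined magnitude. The 60-minute period, the sampling times and both example profiles are hypothetical.

```python
import math


def fourier_magnitude(values, times, period):
    """Strength of the component of `values` that oscillates with `period`."""
    omega = 2.0 * math.pi / period
    sine_part = sum(v * math.sin(omega * t) for v, t in zip(values, times))
    cosine_part = sum(v * math.cos(omega * t) for v, t in zip(values, times))
    return math.sqrt(sine_part ** 2 + cosine_part ** 2)


# Hypothetical sampling: every 10 minutes for 120 minutes (two full cycles).
times = list(range(0, 120, 10))
period = 60.0

periodic_gene = [math.sin(2.0 * math.pi * t / period) for t in times]
flat_gene = [0.5 for _ in times]

periodic_score = fourier_magnitude(periodic_gene, times, period)
flat_score = fourier_magnitude(flat_gene, times, period)
print(periodic_score, flat_score)
```

A gene that oscillates with the cell cycle gets a large magnitude regardless of when it peaks, while a constantly expressed gene scores close to zero. A score like this could then be combined with the correlation measurement before applying a threshold.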

Example I Methods Calculate a score for each gene - "CDC score". Threshold CDC value. 91% of the genes previously shown to be cell cycle regulated are included. About 800 genes were identified as cell cycle regulated.

Example I Phasing By time of peak expression:
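Phasing by time of peak expression amounts to sorting genes by the time point at which each reaches its maximum. A minimal sketch, with hypothetical gene names, values and sampling times:

```python
def peak_time(values, times):
    """Time point at which the profile reaches its maximum."""
    peak_index = max(range(len(values)), key=lambda i: values[i])
    return times[peak_index]


def order_by_peak(profiles, times):
    """Gene names sorted from earliest to latest peak."""
    return sorted(profiles, key=lambda name: peak_time(profiles[name], times))


times = [0, 10, 20, 30]  # hypothetical minutes after release
profiles = {
    "gene_late": [0.0, 1.0, 3.0, 9.0],
    "gene_early": [8.0, 2.0, 1.0, 0.0],
    "gene_mid": [1.0, 7.0, 2.0, 1.0],
}

ordered = order_by_peak(profiles, times)
print(ordered)
```

Unlike clustering, this ordering looks only at when a gene peaks and ignores the rest of the profile.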

Example I Clustering By similarity of expression across the measurements:

Hierarchical clustering. Identified 9 clusters. Genes in each cluster share: Common upstream elements. Regulation by similar transcription factors. Common function (only for known genes). Cln3p and Clb2p have the same effect on the genes in a cluster.

Example I Clustering Histone cluster: A very tight cluster. Repeated SCB motif in promoter. Induced by Cln3. Unaffected by Clb2. Peak during S phase.

Example I Results Genes with known functionality: Cell cycle regulated functions The MET cluster. Genes involved in secretion and lipid synthesis. Known genes discovered as cell cycle regulated.

Example I Results New binding sites for regulators. The CLB cluster is highly regulated. Aligning the genes in the cluster. New consensus for MCM1+SFF binding site.

Example I Results MCM1: T-T-A-C-C-N-A-A-T-T-N-G-G-T-A-A; SFF: G-T-M-A-A-C-A-A; New motif: T-T-W-C-C-Y-A-A-W-N-N-G-G-W-A-A-W-W-N-R-T-A-A-A-Y-A-A
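The consensus strings use IUPAC nucleotide codes (N for any base, M for A or C, W for A or T, Y for C or T, R for A or G). One way to scan a promoter for such a consensus is to translate it into a regular expression. The helper names and the short test sequence below are hypothetical, and the scan covers the given strand only.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a dash-separated IUPAC consensus into a regular expression."""
    symbols = consensus.replace("-", "").upper()
    return re.compile("".join(IUPAC[symbol] for symbol in symbols))


def find_sites(consensus, sequence):
    """Start positions of non-overlapping matches on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [match.start() for match in pattern.finditer(sequence.upper())]


SFF = "G-T-M-A-A-C-A-A"
MCM1 = "T-T-A-C-C-N-A-A-T-T-N-G-G-T-A-A"

# Hypothetical promoter fragment containing one SFF-like site.
promoter = "CCGTAAACAATT"
sff_sites = find_sites(SFF, promoter)
print(consensus_to_regex(SFF).pattern, sff_sites)
```

An exact-match scan like this is stricter than the alignment-based consensus building described on the slides; real binding sites often tolerate mismatches, which a position weight matrix would capture better.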

Example II Gasch AP. et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000; 11(12): Main Goal: Characterize the yeast response to environmental changes, and particularly to stress conditions.

Example II Methods Yeast cells responding to diverse environmental stresses. Microarray contained all yeast genes. Results were organized by hierarchical clustering.

Example II General features of the stress response Massive and rapid changes. Transient changes. Correlated with the magnitude of the shift: Duration Amplitude Steady-state difference.

Example II General features of the stress response Some genes responded in a stereotypical manner. Some genes had unique response. No two expression programs were identical.

Example II The Environmental Stress Response (ESR) About 900 genes responded in a stereotypical manner. ESR – Environmental Stress Response. Two large clusters of genes: repressed genes (~ 600) induced genes (~ 300) Showed reciprocal response.

Example II The Environmental Stress Response (ESR) Response to different shift in: Temperature Osmolarity.

Example II The Environmental Stress Response (ESR) The ESR is not a response to all environmental changes. (Figure panels: osmolarity, heat shock.)

Example II The Environmental Stress Response (ESR) Shift between two equally stressful environments: 29 °C and hyper-osmotic medium. 33 °C with normal osmolarity. The result is the sum of the responses: an independent response to each of the changes.

Example II The Environmental Stress Response (ESR) Previously known: STRE promoter. Recognized by Msn2p and Msn4p. One all-purpose regulatory system ?

Example II The Environmental Stress Response (ESR) TRX2 cluster genes: Dependent on Msn2p/Msn4p in response to heat shock. Unaffected by Msn2p/Msn4p in response to H2O2. Contained binding site for Yap1p. Yap1p deletion strain.

Example II The Environmental Stress Response (ESR) Revealed that TRX2 cluster genes: Induced by Yap1p in response to H2O2 treatment. Unaffected by the deletion in response to heat shock. ESR regulated by different transcription factors. Regulation is condition-specific and gene-specific.

Example II Specific Response Response to stress: Stereotypic response (ESR). Specific response. Characterizes the cell’s response to a specific stress. Example: Heat-shock response. ESR initiated fast (minutes). Induction of chaperones. Alternative carbon source utilization.

Conclusions

Hierarchical clustering Conclusion Difficulty: Post-transcriptional regulation. Solution: Use the method in cases where the main regulation is at the transcription level (example: yeast cell cycle).

Hierarchical clustering Conclusion Difficulty: No statistical foundation for the decision of where to cut the dendrogram. Solution: Split the tree in a way that produces homogeneous clusters of genes. Such a split is considered to be evidence that the grouping was correct.

Hierarchical clustering Conclusion Difficulty: The algorithm will produce clusters in any case. Solution: Introduce a small amount of random noise to the data, re-cluster the data and compare the results to the original clustering. If the results change substantially, the clustering is unlikely to represent true biological meaning.

Hierarchical clustering Conclusion Discover gene’s function. Status of cellular processes. Information on regulatory mechanisms. General cell behaviors. Assign genes to pathways. Unknown biological pathways.

References
Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95, 1998.
Spellman P. T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9 (1998).
Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000; 11(12).
Shannon William, Culverhouse Robert, Duncan Jill. Analyzing microarray data using cluster analysis. Pharmacogenomics, 2003, 4(1). Review.
Kaminski Naftali, Friedman Nir. Practical Approaches to Analyzing Results of Microarray Experiments. American Journal of Respiratory Cell and Molecular Biology, 2002, 27. Review.