Generating Robust and Consensus Clusters from Gene Expression Data
Allan Tucker (a), Stephen Swift (a), Xiaohui Liu (a), Nigel Martin (b), Christine Orengo (c), Paul Kellam (c)

Introduction
- Many different clustering algorithms are used for gene expression analysis
- Little work has been done on inter-method consistency or cross-comparison
- This matters because methods give differing results (each algorithm implicitly imposes a structure on the data)
- Obtaining a consensus across methods should improve confidence in the resulting clusters

The Talk
- Compare a number of existing methods for clustering gene expression data
- Present algorithms for generating robust clusters and consensus clusters
- Test them on Amersham Scorecard data with known structure and on experimentally obtained virus B-cell data
- The approach provides specific advantages in the analysis of array-based gene expression data

Clustering Methods
- Hierarchical clustering (R)
- PAM (R)
- CAST (C++)
- Simulated annealing (C++)

Datasets
- Amersham Scorecard
  - 597 genes, 24 blocks with 32 columns and 12 rows, under 30 experimental conditions
  - Repeated experiments, which we assume should cluster together
- B-cell data
  - 1987 genes

Comparison of Methods

The Agreement Matrix
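The slide's figure is not recoverable here. As a minimal sketch, assuming the agreement matrix records, for each pair of genes, how many of the input clustering results place the two genes in the same cluster, it could be built as follows. The function name `agreement_matrix` and the toy labels are hypothetical, not the authors' code or data.

```python
from itertools import combinations


def agreement_matrix(partitions):
    """Build a symmetric agreement matrix from several clustering results.

    Assumption: partitions[m][g] is the cluster label that method m gave
    gene g, and every method labels the same genes in the same order.
    Entry [i][j] counts how many methods put genes i and j together.
    """
    n_genes = len(partitions[0])
    if any(len(p) != n_genes for p in partitions):
        raise ValueError("every partition must label the same genes")
    agree = [[0] * n_genes for _ in range(n_genes)]
    for labels in partitions:
        for i, j in combinations(range(n_genes), 2):
            if labels[i] == labels[j]:
                agree[i][j] += 1
                agree[j][i] += 1
    # Diagonal: every method trivially places a gene with itself.
    for i in range(n_genes):
        agree[i][i] = len(partitions)
    return agree


# Hypothetical labels for 5 genes from 4 methods
# (e.g. hierarchical, PAM, CAST, simulated annealing).
results = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [2, 2, 2, 0, 1],
]
A = agreement_matrix(results)
for row in A:
    print(row)
```

Cluster labels are compared only within one method, so arbitrary label numbering across methods does not matter.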

Robust Clustering
- Takes the agreement matrix as input
- Places genes into robust clusters in which every pair of genes has full agreement
- Deterministic algorithm
- Should give a higher degree of confidence in the clusters
- Not all genes will be assigned
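A minimal deterministic sketch, assuming "full agreement" means the agreement count equals the number of input methods and that genes with no fully agreeing partner stay unassigned. The name `robust_clusters` and the toy matrix are hypothetical; this is one way to realise the idea, not necessarily the authors' algorithm.

```python
def robust_clusters(agree, n_methods):
    """Group genes whose every pair was co-clustered by all n_methods.

    Full agreement is transitive when the matrix comes from partitions
    (if every method puts i with j and j with k, every method also puts
    i with k), so growing each group from an unassigned seed yields the
    equivalence classes deterministically. Genes with no fully agreeing
    partner are returned as unassigned.
    """
    n = len(agree)
    assigned = [False] * n
    clusters = []
    for seed in range(n):
        if assigned[seed]:
            continue
        members = [g for g in range(n)
                   if g != seed and not assigned[g]
                   and agree[seed][g] == n_methods]
        if members:
            group = [seed] + members
            for g in group:
                assigned[g] = True
            clusters.append(group)
    unassigned = [g for g in range(n) if not assigned[g]]
    return clusters, unassigned


# Hypothetical agreement matrix for 5 genes and 4 methods.
A = [
    [4, 4, 1, 0, 0],
    [4, 4, 1, 0, 0],
    [1, 1, 4, 3, 2],
    [0, 0, 3, 4, 2],
    [0, 0, 2, 2, 4],
]
clusters, unassigned = robust_clusters(A, n_methods=4)
print(clusters)    # [[0, 1]]
print(unassigned)  # [2, 3, 4]
```

Only genes 0 and 1 are placed together by all four methods, so genes 2 to 4 are left out, which illustrates why not every gene receives a robust cluster.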

Robust Clustering
Dataset: ASC | B-cell
No. of robust clusters:
% of variables assigned: 79% | 25%
Max. robust cluster size: 4414
Min. robust cluster size: 2 | 2
Mean robust cluster size:
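The rows of this table can be computed directly from a list of robust clusters. The sketch below assumes "% of variables assigned" means the share of all genes that end up in some robust cluster; the helper `robust_summary` and the toy clusters are hypothetical and do not reproduce the ASC or B-cell figures.

```python
def robust_summary(clusters, n_genes):
    """Summary statistics of the kind reported for robust clustering.

    Assumption: pct_assigned is the percentage of all n_genes that
    belong to some robust cluster.
    """
    sizes = [len(c) for c in clusters]
    if not sizes:
        return {"n_clusters": 0, "pct_assigned": 0.0,
                "max_size": 0, "min_size": 0, "mean_size": 0.0}
    return {
        "n_clusters": len(sizes),
        "pct_assigned": 100.0 * sum(sizes) / n_genes,
        "max_size": max(sizes),
        "min_size": min(sizes),
        "mean_size": sum(sizes) / len(sizes),
    }


# Hypothetical robust clusters over 10 genes: 7 of 10 genes assigned.
stats = robust_summary([[0, 1, 2], [4, 5], [7, 8]], n_genes=10)
print(stats["n_clusters"], stats["pct_assigned"])  # 3 70.0
```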

Consensus Clustering
- The "full agreement" requirement for robust clusters can be too restrictive
- An algorithm generates consensus clusters given a minimum agreement parameter
- It is an approximate, stochastic algorithm
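The slide does not give the search procedure, so the following is only an illustrative stand-in: a randomised greedy search with restarts, assuming a gene may join a cluster only if its agreement count with every current member is at least the minimum agreement parameter, and preferring runs that place more genes into clusters of size two or more. The function `consensus_clusters`, its parameters, and the scoring rule are hypothetical and are not the authors' stochastic algorithm.

```python
import random


def consensus_clusters(agree, min_agree, n_restarts=50, seed=0):
    """Randomised greedy search for consensus clusters (illustrative).

    A gene joins the first cluster in which its agreement with every
    member is >= min_agree, otherwise it starts a new cluster. Each
    restart visits genes in a random order; the run placing the most
    genes into clusters of size >= 2 is kept (ties: fewer clusters).
    """
    rng = random.Random(seed)
    n = len(agree)
    best, best_key = [], None
    for _ in range(n_restarts):
        order = list(range(n))
        rng.shuffle(order)
        clusters = []
        for g in order:
            for c in clusters:
                if all(agree[g][m] >= min_agree for m in c):
                    c.append(g)
                    break
            else:  # no compatible cluster found
                clusters.append([g])
        kept = [sorted(c) for c in clusters if len(c) >= 2]
        key = (sum(len(c) for c in kept), -len(kept))
        if best_key is None or key > best_key:
            best, best_key = sorted(kept), key
    return best


# Hypothetical agreement matrix for 5 genes and 4 methods.
A = [
    [4, 4, 1, 0, 0],
    [4, 4, 1, 0, 0],
    [1, 1, 4, 3, 2],
    [0, 0, 3, 4, 2],
    [0, 0, 2, 2, 4],
]
for k in (2, 3, 4):
    print(k, consensus_clusters(A, min_agree=k))
# 2 [[0, 1], [2, 3, 4]]
# 3 [[0, 1], [2, 3]]
# 4 [[0, 1]]
```

Lowering the minimum agreement admits more genes into clusters; on this toy matrix, setting it to the number of methods gives the same result as full-agreement robust clustering.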

Consensus Clustering
(Diagram: input cluster results → agreement matrix → consensus clusters)

Consensus Clustering
(Figure panels: ASC dataset; B-cell dataset)

Consensus Clustering

Summary
- Clustering biological data is very useful
- Biases in clustering algorithms mean that success in identifying patterns varies between methods
- Consensus algorithms have been used in protein secondary structure prediction
- We apply a similar strategy with robust and consensus clustering

Conclusions
- Robust clusters are good for identifying common transcriptional modules
- They are also good for identifying genes in a common functional pathway
- Useful for creating clusters of genes with high confidence
- Can be restrictive, discarding genes that do not have full agreement

Conclusions
- Consensus clustering relaxes the full-agreement requirement
- It closely reproduces the defined clusters in the synthetic data
- It reliably picks out features in the virus gene expression data
- It fulfils the aim of not relying on a single clustering algorithm during gene expression analysis

Acknowledgements
- The Biotechnology and Biological Sciences Research Council (BBSRC), UK
- The Engineering and Physical Sciences Research Council (EPSRC), UK