Generating Robust and Consensus Clusters from Gene Expression Data

1 Generating Robust and Consensus Clusters from Gene Expression Data
Allan Tucker a, Stephen Swift a, Xiaohui Liu a, Nigel Martin b, Christine Orengo c, Paul Kellam c

2 Introduction
– Many different clustering algorithms are used for gene expression analysis
– Little work on inter-method consistency or cross-comparison
– Important because results differ (each algorithm implicitly forces a structure on the data)
– Obtaining a consensus across methods should improve confidence

3 The Talk
– Compare a number of existing methods for clustering gene expression data
– Algorithms for generating robust clusters and consensus clusters
– Tested on a set of Amersham Scorecard data with known structure and on experimentally obtained virus B-cell data
– Provides specific advantages in the analysis of array-based gene expression data

4 Clustering Methods
– Hierarchical Clustering (R)
– PAM (R)
– CAST (C++)
– Simulated Annealing (C++)

5 Datasets
– Amersham Scorecard (ASC)
  – 597 genes, 24 blocks with 32 columns and 12 rows, under 30 experimental conditions
  – Repeated experiments, which we assume should cluster together
– B-cell data
  – 1987 genes

6 Comparison of Methods

7 The Agreement Matrix

8 Robust Clustering
– Takes the agreement matrix as input
– Places all genes that have full agreement into robust clusters
– Deterministic algorithm
– Should give a higher degree of confidence in clusters
– Not all genes will be assigned
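The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that entry (i, j) of the agreement matrix counts how many clustering methods placed genes i and j in the same cluster, and that "full agreement" means this count equals the number of methods. The function names and the toy labels are hypothetical.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """Count, for every pair of genes, how many methods put them together.

    clusterings: list of label lists, one per method (assumed input format);
    labels only need to be comparable within a single method.
    """
    n = len(clusterings[0])
    agree = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                agree[i][j] += 1
                agree[j][i] += 1
    return agree


def robust_clusters(agree, n_methods):
    """Group genes whose pairwise agreement equals n_methods (full agreement).

    Deterministic: genes are scanned in index order. Genes with no
    full-agreement partner stay unassigned.
    """
    n = len(agree)
    seen = set()
    clusters = []
    for i in range(n):
        if i in seen:
            continue
        members = [i] + [j for j in range(i + 1, n) if agree[i][j] == n_methods]
        if len(members) > 1:  # singletons are left unassigned
            clusters.append(members)
            seen.update(members)
    return clusters


# Toy example: 6 genes clustered by 3 hypothetical methods.
methods = [
    [0, 0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1, 2],
    [1, 1, 1, 0, 0, 0],
]
agree = agreement_matrix(methods)
print(robust_clusters(agree, len(methods)))  # [[0, 1], [3, 4]]
```

In this toy run genes 2 and 5 are not assigned to any robust cluster, which mirrors the point that not all genes will be assigned.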

9 Robust Clustering
Dataset                      ASC     B-cell
No. of robust clusters       24      154
% of variables assigned      79%     25%
Max. robust cluster size     44      14
Min. robust cluster size     2       2
Mean robust cluster size     10.2    3.2

10 Consensus Clustering
– The “full agreement” requirement for robust clusters can be too restrictive
– Algorithm for generating consensus clusters given a minimum agreement parameter
– Approximate stochastic algorithm
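The slide does not spell out the algorithm, so the following is only a simplified stand-in: a stochastic hill climb over partitions, not the authors' method. It assumes an objective that rewards each within-cluster pair by its agreement minus a minimum agreement threshold, here called `beta` (a hypothetical name), so pairs above the threshold pull genes together and pairs below it push them apart. The toy agreement matrix is made up for illustration.

```python
import random


def score(labels, agree, beta):
    """Sum of (agreement - beta) over all pairs that share a cluster."""
    n = len(labels)
    return sum(
        agree[i][j] - beta
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


def consensus_clusters(agree, beta, iterations=2000, seed=0):
    """Stochastic hill climb: move one random gene at a time, keep improvements.

    Returns (labels, score). Label ids 0..n-1 are reused as cluster ids;
    an unused id acts as a new, empty cluster.
    """
    rng = random.Random(seed)
    n = len(agree)
    labels = list(range(n))  # start with every gene in its own cluster
    best = score(labels, agree, beta)
    for _ in range(iterations):
        gene = rng.randrange(n)
        target = rng.randrange(n)
        if target == labels[gene]:
            continue
        old = labels[gene]
        labels[gene] = target
        new = score(labels, agree, beta)
        if new > best:
            best = new  # keep the move
        else:
            labels[gene] = old  # undo the move
    return labels, best


# Toy agreement matrix for 6 genes and 3 methods (illustrative values only).
toy_agree = [
    [0, 3, 2, 0, 0, 0],
    [3, 0, 2, 0, 0, 0],
    [2, 2, 0, 1, 1, 0],
    [0, 0, 1, 0, 3, 1],
    [0, 0, 1, 3, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels, best = consensus_clusters(toy_agree, beta=1.5)
print(labels, best)
```

Lowering `beta` below the number of methods is what relaxes the full agreement requirement: a gene that agrees with a cluster in most, but not all, methods can still join it. Being a local search, this sketch is approximate and its result can depend on the random seed.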

11 Consensus Clustering
[Diagram labels: Input Cluster Results, Agreement Matrix, Consensus Clusters]

12 Consensus Clustering
[Figure panels: ASC Dataset, B-Cell Dataset]

13 Consensus Clustering

14

15 Summary
– Clustering biological data is very useful
– Biases in clustering algorithms can mean that success in identifying patterns varies
– Consensus algorithms are used in protein secondary structure prediction
– We apply a similar strategy with robust and consensus clustering

16 Conclusions
– Robust clusters are good for identifying common transcriptional modules
– Also for identifying genes with a common functional pathway
– Useful for creating clusters of genes with high confidence
– Can be restrictive in discarding genes that do not have full agreement

17 Conclusions
– Consensus clustering relaxes the full agreement requirement
– Resembles the defined clusters in synthetic data very well
– Reliably picks out features in the virus gene expression data
– Fulfils the desire not to rely on one clustering algorithm during gene expression analysis

18 Acknowledgements
– The Biotechnology and Biological Sciences Research Council (BBSRC), UK
– The Engineering and Physical Sciences Research Council (EPSRC), UK

