Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data Parisa Shooshtari School of Computing Science, Simon Fraser University,


1 Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data. Parisa Shooshtari. School of Computing Science, Simon Fraser University, Burnaby; Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver

2 Outline: Flow Cytometry (FCM) Data; Clustering of FCM Data; Spectral Clustering; Faithful Sampling for Spectral Clustering; Results; Summary

3 Basics of Flow Cytometry Technique [Figure. Labels: Sample; Wavelength; Intensity; MHC-II; CD-11c; Int-1; Int-2]

4 Cell Population Identification in Flow Cytometry (FCM) [Figure adapted from the Science Creative Quarterly (2). Labels: Parameter 1; Parameter 2; Parameter 3; Parameter 4; X%]

5 Importance of FCM Data Clustering Manual gating is subjective, error-prone and time-consuming, and it ignores the multivariate nature of the data. Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques.

6 Which Clustering Algorithm Is Suitable? Model-based algorithms like flowClust, flowMerge and FLAME are not suitable for clusters with non-elliptical shapes. [Figure. Panels: flowMerge; A Good Clustering. Label: GFP]

7 Our Motivation for Using Spectral Clustering Spectral clustering does not require any a priori assumption on cluster size, shape or distribution. It is not sensitive to outliers, noise or the shape of clusters.

8 Spectral Clustering in One Slide Represent the data set by a similarity graph. Construct the graph: vertices are the data points p_1, p_2, …, p_n; edge weights are the similarity values S_{i,j} [formula not preserved in the transcript]. Clustering: find a cut through the graph by defining a cut objective function and solving it.
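The three steps on this slide (similarity graph, cut objective, solve) can be sketched for the two-cluster case as follows. This is a minimal illustration and not the SamSPECTRAL implementation: the Gaussian similarity, the parameter name `sigma`, the normalized-cut relaxation and the function name `spectral_bipartition` are all assumptions, since the slide's own similarity formula is not preserved.

```python
import numpy as np

def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters via the normalized graph Laplacian."""
    # Pairwise squared Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # ASSUMPTION: Gaussian similarity; the slide's formula is not available.
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops in the graph
    degree = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}.
    laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # The second-smallest eigenvector relaxes the normalized-cut objective;
    # its sign pattern gives the two sides of the cut.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy data: two compact groups of 20 points each, 4 units apart.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(20, 2))
blob_b = rng.normal(0.0, 0.3, size=(20, 2)) + np.array([4.0, 0.0])
labels = spectral_bipartition(np.vstack([blob_a, blob_b]), sigma=1.0)
```

The dense `S` matrix built here has one entry per pair of points, which is why the cost grows so quickly with the number of cells.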

9 The Bottleneck of Spectral Clustering Serious empirical barriers arise when applying this algorithm to large data sets. Time complexity: O(n^3) → 2 years for 300,000 data points (cells). Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells).
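To see why reducing the number of points matters so much under these complexities, the short calculation below compares the quadratic and cubic counts for the 300,000 cells quoted above with those for a smaller sample. The sample size of 3,000 is a hypothetical value chosen only for illustration, and the counts are raw n^2 and n^3 figures, not measured run times or memory footprints.

```python
def dense_cost(n):
    """Return (similarity-matrix entries, cubic operation count) for n points."""
    return n * n, n ** 3

n_full = 300_000    # number of cells quoted on the slide
m_sample = 3_000    # HYPOTHETICAL sample size, for illustration only

entries_full, ops_full = dense_cost(n_full)
entries_sample, ops_sample = dense_cost(m_sample)

# Reduction factors obtained by clustering the sample instead of all cells.
memory_ratio = entries_full // entries_sample
time_ratio = ops_full // ops_sample

print(f"matrix entries: {entries_full:,} vs {entries_sample:,}")
print(f"memory reduced by {memory_ratio:,}x, time by {time_ratio:,}x")
```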

10 Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data Uniform Sampling: low-density populations close to dense ones may not remain distinguishable. Faithful Sampling: tends to choose more samples from non-dense parts of the data.

11 How Does Our Faithful Sampling Preserve Information? 1. Space Uniform Sampling: It preserves low-density parts of the data by selecting more samples from them compared to uniform sampling. 2. Keeping the list of points in the neighbourhood of samples: This will be used to define similarities between communities.
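One way to read the two points above is sketched below: repeatedly pick a not-yet-covered point as a sample, and record every uncovered point within a fixed radius as belonging to that sample's community. The radius `h`, the function name `faithful_sample` and the toy data are assumptions made for illustration; the slides do not spell out the exact procedure.

```python
import numpy as np

def faithful_sample(points, h, seed=0):
    """Pick samples spread uniformly in space and keep each sample's neighbours."""
    rng = np.random.default_rng(seed)
    owner = np.full(len(points), -1)  # sample index each point is registered to
    representatives = []
    unregistered = np.flatnonzero(owner == -1)
    while unregistered.size > 0:
        # Choose a random point that no earlier sample has covered.
        p = int(rng.choice(unregistered))
        dist = np.linalg.norm(points[unregistered] - points[p], axis=1)
        # Register every uncovered point within radius h (including p itself).
        owner[unregistered[dist <= h]] = p
        representatives.append(p)
        unregistered = np.flatnonzero(owner == -1)
    # The neighbourhood lists that later define community similarities.
    communities = {rep: np.flatnonzero(owner == rep) for rep in representatives}
    return representatives, communities

# Toy data: a dense population of 500 points next to a sparse one of 20 points.
rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.5, size=(500, 2))
sparse = rng.normal(0.0, 0.5, size=(20, 2)) + np.array([3.0, 0.0])
data = np.vstack([dense, sparse])
reps, comms = faithful_sample(data, h=0.5)
```

Because each sample covers a region of fixed radius rather than a fixed number of points, the number of samples depends on the space a population occupies and not on how many cells it contains, so a sparse population is not swamped by a dense neighbour as it would be under uniform sampling.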

12 Clustering Result Low density populations surrounded by dense ones

13 Clustering Result Populations with non-elliptical shapes; subpopulations of a major population. [Figure. Panels: SamSPECTRAL; flowMerge; FLAME]

14 Dependency of SamSPECTRAL Results on the Scaling Factor (σ) [Figure. Panels: σ = 100; σ = 200; σ = 300; σ = 400. Labelled populations: Monocytes; Dendritic Cells; B Cells]

15 Block Diagram of Clustering Ensemble Method [Block diagram: SamSPECTRAL run with scaling factors σ_1, σ_2, …, σ_r → Build New Feature Vectors → Compute Similarities Between Categorical Feature Vectors → SamSPECTRAL for Categorical Data → Final Results]
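The middle two boxes of the diagram can be illustrated with a small sketch. Each point's new feature vector is taken to be its list of cluster labels across the r runs, and the similarity between two categorical vectors is taken to be the fraction of runs in which the two points share a cluster. That matching-fraction similarity and the function name `ensemble_similarity` are assumptions; the slides do not state which similarity is used.

```python
import numpy as np

def ensemble_similarity(label_runs):
    """Similarity between points from cluster labels of several runs.

    label_runs has shape (r, n): one row of cluster labels per run.
    """
    # Row i of `features` is the categorical feature vector of point i.
    features = np.asarray(label_runs).T
    # matches[i, j, k] is True when points i and j share a cluster in run k.
    # Labels are only compared within the same run, so label names may differ
    # freely from run to run.
    matches = features[:, None, :] == features[None, :, :]
    # ASSUMPTION: similarity = fraction of runs in which the labels agree.
    return matches.mean(axis=2)

# Three hypothetical runs (e.g. three values of sigma) over four points.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
sim = ensemble_similarity(runs)
```

The resulting matrix can be fed to any graph-based clustering step in place of a distance-based similarity matrix.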

16 Results After Applying Clustering Ensemble Method [Figure: two CD14 vs. MHC-II plots, "Manual Gating" and "Final Result after Applying Clustering Ensemble Method", each labelled with Monocytes, Dendritic Cells and B Cells]

17 Advantages of Using Clustering Ensemble Method No need for manual setting of initial parameters. Higher quality and stability of clustering results: – F-measure between manual gating and original SamSPECTRAL is on average 0.77 (sd = 0.07) – F-measure between manual gating and our clustering ensemble method is
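For readers unfamiliar with the F-measure as a way of comparing a clustering against manual gating, the sketch below shows one common definition: each manually gated population is matched to the cluster that gives the best harmonic mean of precision and recall, and the scores are averaged with weights proportional to population size. The slides do not say which variant was used, so this definition and the function name `f_measure` are assumptions.

```python
import numpy as np

def f_measure(reference, predicted):
    """Size-weighted best-match F-measure of `predicted` against `reference`."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    total = 0.0
    for ref_label in np.unique(reference):
        in_ref = reference == ref_label
        best = 0.0
        for pred_label in np.unique(predicted):
            in_pred = predicted == pred_label
            overlap = np.sum(in_ref & in_pred)
            if overlap == 0:
                continue
            precision = overlap / in_pred.sum()
            recall = overlap / in_ref.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each reference population by its number of points.
        total += in_ref.sum() * best
    return total / len(reference)

manual = [0, 0, 0, 1, 1, 1]             # hypothetical manual gates
perfect = f_measure(manual, [5, 5, 5, 9, 9, 9])   # same partition, other names
merged = f_measure(manual, [0, 0, 0, 0, 0, 0])    # both gates merged into one
```

A score of 1 means every gated population is reproduced exactly; merging populations or splitting them lowers the score.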

18 Summary Spectral clustering can now be applied to large data sets through our proposed faithful (information-preserving) sampling. This sampling method can be combined with other graph-based clustering algorithms that use different objective functions in order to reduce the size of the data. We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying – Cell populations with non-elliptical shapes – Low-density populations surrounded by dense ones – Sub-populations of a major population

19 Acknowledgement Committee: – Dr. Arvind Gupta – Dr. Ryan Brinkman – Dr. Tobias Kollman Co-authors on SamSPECTRAL – Habil Zare Data Providers – Connie Eaves – Peter Landsdrop – Keith Humphries

20 Thanks for Your Attention!

21 Cell Population Identification in Flow Cytometry (FCM) [Figure adapted from the Science Creative Quarterly (2). Labels: Parameter 1; Parameter 2; Parameter 3; Parameter 4; X%]

22 SamSPECTRAL Algorithm

23 SamSPECTRAL Algorithm
