Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data Parisa Shooshtari School of Computing Science, Simon Fraser University,


1 Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data
Parisa Shooshtari
School of Computing Science, Simon Fraser University, Burnaby
Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver

2 Outline
Flow Cytometry (FCM) Data
Clustering of FCM Data
Spectral Clustering
Faithful Sampling for Spectral Clustering
Results
Summary

3 Basics of Flow Cytometry Technique
[Figure: a sample's fluorescence intensity plotted against wavelength; labels: markers MHC-II and CD-11c, intensities Int-1 and Int-2.]

4 Cell Population Identification in Flow Cytometry (FCM)
[Figure: scatter plots with axes labeled Parameter 1 through Parameter 4 and a gate selecting X% of the cells. Adapted from the Science Creative Quarterly (2).]
Now think of this cell as just one of thousands of cells flowing through a tube, one cell at a time. These cells can be differentiated by their fluorescence intensity, which indicates, for example, the presence or absence of a particular cell surface protein. Here each dot represents an individual cell, and the axes indicate intensity at different wavelengths. A gate can then be drawn to select a particular subset of the cell population with common intensities. Further subsetting can be done based on 1-D and 2-D projections of the data.
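The gating step described above can be illustrated with a minimal sketch. It assumes a simple rectangular gate on two parameters; the function name `rectangular_gate`, the intensity values, and the gate bounds are all hypothetical and are not taken from the slides.

```python
from typing import List, Sequence, Tuple

Cell = Sequence[float]  # one fluorescence intensity per measured parameter


def rectangular_gate(cells: List[Cell], x: int, y: int,
                     x_range: Tuple[float, float],
                     y_range: Tuple[float, float]) -> List[int]:
    """Return indices of cells whose (x, y) intensities fall inside the gate."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [i for i, c in enumerate(cells)
            if x_lo <= c[x] <= x_hi and y_lo <= c[y] <= y_hi]


# Hypothetical intensities for five cells measured on two parameters.
cells = [(120.0, 30.0), (450.0, 500.0), (480.0, 520.0),
         (90.0, 610.0), (500.0, 470.0)]

# Gate on parameter 0 versus parameter 1; bounds are made up for illustration.
gated = rectangular_gate(cells, x=0, y=1,
                         x_range=(400.0, 600.0), y_range=(400.0, 600.0))

# The "X%" shown next to a gate is the share of cells it selects.
percent = 100.0 * len(gated) / len(cells)
```

Further subsetting would apply another gate to the selected cells only, on a different pair of parameters.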

5 Importance of FCM Data Clustering
Manual gating is subjective, error-prone, and time-consuming, and it ignores the multivariate nature of the data.
Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques.

6 Which Clustering Algorithm Is Suitable?
Model-based algorithms such as flowClust, flowMerge, and FLAME are not suitable for clusters with non-elliptical shapes.
[Figure: a good clustering compared with the flowMerge result; axis label: GFP.]

7 Our Motivation for Using Spectral Clustering
Spectral clustering does not require any a priori assumption about cluster size, shape, or distribution.
It is not sensitive to outliers, noise, or the shape of clusters.

8 Spectral Clustering in One Slide
Represent the data set by a similarity graph.
Construct the graph: the vertices are the data points p1, p2, …, pn; the edge weights are the similarity values s(i, j) [formula not captured in the transcript].
Clustering: find a cut through the graph by defining a cut objective function and solving it.
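The three steps on this slide can be sketched in a few lines. This is a sketch under stated assumptions, not the SamSPECTRAL implementation: the Gaussian similarity is assumed because the slide's formula is not preserved in the transcript, the cut objective is assumed to be the relaxed normalized cut, and a plain power iteration stands in for a proper eigensolver. The names `gaussian_similarity` and `spectral_bipartition` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gaussian_similarity(p: Sequence[float], q: Sequence[float], sigma: float) -> float:
    """Assumed similarity: exp(-||p - q||^2 / (2 * sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def spectral_bipartition(points: List[Sequence[float]], sigma: float,
                         iters: int = 500, seed: int = 0) -> List[int]:
    """Split points into two groups (labels 0/1) with a relaxed normalized cut."""
    n = len(points)
    # Similarity graph: vertices are points, edge weights are similarities.
    w = [[gaussian_similarity(points[i], points[j], sigma) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetrically normalized similarity matrix M = D^(-1/2) W D^(-1/2).
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M (eigenvalue 1) is proportional to sqrt(deg).
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Power iteration on (M + I) / 2, whose eigenvalues lie in [0, 1].
        v = [0.5 * (sum(m[i][j] * v[j] for j in range(n)) + v[i])
             for i in range(n)]
        # Remove the component along v1 so we converge to the second eigenvector.
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # The sign of D^(-1/2) v decides the side of the cut for each vertex.
    return [1 if v[i] / math.sqrt(deg[i]) > 0 else 0 for i in range(n)]


# Two well-separated hypothetical groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = spectral_bipartition(points, sigma=1.0)
```

The dense n-by-n matrices built here are exactly what makes the method expensive for large n.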

9 The Bottleneck of Spectral Clustering
There are serious practical barriers to applying this algorithm to large data sets.
Time complexity: O(n^3) → 2 years for 300,000 data points (cells).
Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells).

10 Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data
Uniform sampling: low-density populations close to dense ones may not remain distinguishable.
Faithful sampling: tends to choose more samples from the non-dense parts of the data.

11 How Does Our Faithful Sampling Preserve Information?
Space-uniform sampling: it preserves the low-density parts of the data by selecting more samples from them than uniform sampling does.
Keeping the list of points in the neighbourhood of each sample: these lists are used to define similarities between communities.
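One way a space-uniform sampling of this kind could work is sketched below. The procedure is an assumption made for illustration (pick an uncovered point as a sample, assign every uncovered point within radius `h` to it, repeat), and the names `faithful_sample` and `h` are hypothetical; the slides do not spell out the exact steps.

```python
import math
import random
from typing import Dict, List, Sequence


def faithful_sample(points: List[Sequence[float]], h: float,
                    seed: int = 0) -> Dict[int, List[int]]:
    """Map each chosen sample index to the indices of points in its neighbourhood."""
    rng = random.Random(seed)
    unregistered = set(range(len(points)))
    communities: Dict[int, List[int]] = {}
    while unregistered:
        # Pick a point that no earlier sample has covered yet.
        s = rng.choice(sorted(unregistered))
        # Its community is every uncovered point within distance h (including s).
        members = [i for i in unregistered
                   if math.dist(points[i], points[s]) <= h]
        communities[s] = sorted(members)
        unregistered.difference_update(members)
    return communities


# Hypothetical data: 50 tightly packed points plus 3 isolated points.
dense = [(0.01 * i, 0.0) for i in range(50)]
sparse = [(10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
communities = faithful_sample(dense + sparse, h=1.0)
```

In this toy data the dense group collapses to a single sample, while each isolated point becomes its own sample, so the sparse region is over-represented relative to uniform sampling. The stored member lists are what a later step would use to compute similarities between communities.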

12 Clustering Result
Low-density populations surrounded by dense ones

13 Clustering Result
Populations with non-elliptical shapes; subpopulations of a major population
[Figure: results compared across SamSPECTRAL, flowMerge, and FLAME.]

14 Dependence of SamSPECTRAL Results on the Scaling Factor (σ)
[Figure: SamSPECTRAL results for σ = 100, σ = 200, σ = 300, and σ = 400, with the Monocytes, Dendritic Cells, and B Cells populations labeled.]

15 Block Diagram of Clustering Ensemble Method
Run SamSPECTRAL once for each scaling factor σ1, σ2, …, σr → Build New Feature Vectors → Compute Similarities Between Categorical Feature Vectors → SamSPECTRAL for Categorical Data → Final Results
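The middle two boxes of the diagram can be sketched as follows. The sketch assumes that each point's new feature vector is the tuple of cluster labels it received across the r runs, and that the similarity between two such categorical vectors is the fraction of runs that put both points in the same cluster. Both choices, and the names `categorical_features` and `agreement_similarity`, are assumptions; the slides do not state the measure used.

```python
from typing import List, Sequence, Tuple


def categorical_features(runs: List[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Entry i holds the cluster label that point i received in each run."""
    return list(zip(*runs))


def agreement_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of runs in which two points were placed in the same cluster."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical labels for 4 points from three runs with different sigma values.
runs = [
    [0, 0, 1, 1],  # run with sigma_1
    [0, 0, 0, 1],  # run with sigma_2
    [1, 1, 0, 0],  # run with sigma_3 (label names need not match across runs)
]
features = categorical_features(runs)
similarity = [[agreement_similarity(p, q) for q in features] for p in features]
```

Because only within-run agreement is counted, the measure does not depend on how each run happens to number its clusters. The resulting similarity matrix is the kind of input a final graph-based clustering step could take.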

16 Results After Applying Clustering Ensemble Method
[Figure: CD14 versus MHC-II plots comparing manual gating with the final result after applying the clustering ensemble method; Monocytes, B Cells, and Dendritic Cells are labeled in both.]

17 Advantages of Using Clustering Ensemble Method
No need to set initial parameters manually.
Higher quality and stability of clustering results.
The F-measure between manual gating and the original SamSPECTRAL is 0.77 on average (sd = 0.07).
The F-measure between manual gating and our clustering ensemble method is 0.91.
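For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute an F-measure between a reference labeling (such as manual gating) and a clustering. It assumes the size-weighted, best-match definition: each reference population is matched with the cluster that maximizes F = 2PR / (P + R), and the scores are averaged with weights proportional to population size. The slides do not say which variant was used, and the function name `f_measure` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def f_measure(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Size-weighted best-match F-measure of predicted clusters against a reference."""
    n = len(reference)
    ref_sizes = Counter(reference)
    pred_sizes = Counter(predicted)
    overlap = Counter(zip(reference, predicted))
    total = 0.0
    for r, r_size in ref_sizes.items():
        best = 0.0
        for p, p_size in pred_sizes.items():
            tp = overlap[(r, p)]  # points shared by population r and cluster p
            if tp:
                precision = tp / p_size
                recall = tp / r_size
                best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each reference population by its share of all points.
        total += r_size / n * best
    return total


# Hypothetical labelings of four cells.
perfect = f_measure([0, 0, 1, 1], [5, 5, 7, 7])
partial = f_measure([0, 0, 0, 1], [0, 0, 1, 1])
```

A score of 1 means every reference population is reproduced exactly by some cluster, regardless of how the clusters are numbered.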

18 Summary
Spectral clustering can now be applied to large data sets through our proposed faithful (information-preserving) sampling.
This sampling method can be combined with other graph-based clustering algorithms that use different objective functions, in order to reduce the size of the data.
We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying:
cell populations with non-elliptical shapes;
low-density populations surrounded by dense ones;
sub-populations of a major population.

19 Acknowledgement
Committee: Dr. Arvind Gupta, Dr. Ryan Brinkman, Dr. Tobias Kollman
Co-authors on SamSPECTRAL: Habil Zare
Data Providers: Connie Eaves, Peter Landsdrop, Keith Humphries

20 Thanks for Your Attention!

22 SamSPECTRAL Algorithm

23 SamSPECTRAL Algorithm


