Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1.

Similar presentations


Presentation on theme: "Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1."— Presentation transcript:

1 Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1

2 Contents 1.Introduction to Flow Cytometry 2.The Problem 3.Current Approaches & Results 4.Future Work 2

3 Flow Cytometry MEDICAL TECHNIQUE USED FOR CELL COUNTING AND CELL SORTING 3

4 How it Works Picture from: Abcam http://www.abcam.com/index.html?pageconfig=resource&rid=11446 4

5 Flow Cytometry Application  Determine whether a person has b-cell lymphoma  Based on the number of clusters that result from flow cytometry Two clusters : cancer patient Three clusters : healthy individual 5

6 Example: Flow Cytometry Results 6 Cancer PatientHealthy Patient

7 Problems with Current Methods  The process for determining if there are two or three clusters is manual  Doctors’ time could be better spent on other tasks 7

8 The Problem CREATING AN AUTOMATED METHOD TO DETERMINING THE NUMBER OF CLUSTERS 8

9 Past Approaches  Many ways to determine number of clusters Most need to know the number of clusters ahead of time  Most popular is k-means, but there are some problems Need to give the algorithm the number of clusters beforehand Has difficulty when clusters are close, different sizes, etc. 9

10 Further Defining the Problem We want to be able to determine the number of clusters when:  The distance between clusters is very small  The ratio of cluster sizes is large (100:1 to 1000:1) We decided to further constrain the problem such that we could determine:  1 cluster vs 2 clusters when the size ratio was up to 1000:1 10

11 Current Approaches & Results 11

12 Two Approaches Approach #1: Transformation  Find the center of the data  Take each point and find its angle from the horizontal line located at the center (new x-value) and distance from the center (new y-value)  Use transformed data to determine number of clusters Approach #2: Testing Normal Fit  Project 2D data onto line to create 1D data  Apply normal distribution fit  Compare the Bayesian Information Criterion (BIC) of the fit to a cut-off limit  If the BIC is above the limit, there are two clusters; otherwise, there is one 12

13 Approach #1: Transformation 13

14 Approach #1: Transformation 14

15 Approach #1: Transformation Process 15

16 Approach #1: Transformation 16

17 Approach #2: Testing Normal Fit 17

18 Approach #2: Testing Normal Fit 3 standard deviations apart, ratio 1:99 ONE CLUSTER BEST FITSTWO CLUSTER BEST FITS 18

19 Approach #2: Testing Normal Fit  Comparing BIC of the one cluster versus two clusters  All data was generated using 100000 points and the same standard deviations  The ratios between clusters and distance between two clusters (if applicable) was varied Ratios: 199:1 to 63:1 Distance: 1.5 to 5 Standard Deviations apart 19

20 Approach #2: Testing Normal Fit 20  Comparing BIC of the one cluster versus two clusters  All data was generated using 100000 points and the same standard deviations  The ratios between clusters and distance between two clusters (if applicable) was varied Ratios: 199:1 to 63:1 Distance: 1.5 to 5 Standard Deviations apart

21 Future Work 21

22 Future Work  Approach #1: Determine if there is a way to detect the second cluster in the transformation  Approach #2: Use real data to see if a cut-off can be determined  Overall: After figuring out how to distinguish one and two clusters, extend the method to two versus three clusters 22

23 Limitations  Assume the data will have Gaussian distribution  Number of clusters limited to two or three 23

24 Acknowledgements I would like to thank my research advisor, Dr. Stephen Huang, and Mitch Shih for their guidance on this project. I would also like to thank the University of Houston Computer Science Department and the National Science Foundation for providing me with the opportunity to participate in the REU. 24


Download ppt "Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1."

Similar presentations


Ads by Google