
1 Applying Statistical Machine Learning to Retinal Electrophysiology
Matt Boardman, Faculty of Computer Science
matt.boardman@dal.ca
January 2006

2 Discussions
- Axotomy ERG Data Sets
- Classification using Support Vector Machines (SVM)
- Assessing Waveform Significance
- Probability Density Estimation
- Confidence Measures

3 Axotomy ERG Data Sets (from F. Tremblay, Retinal Electrophysiology)
- Data set A:
  - 19 axotomy subjects, 19 control subjects (38 total)
  - time between control and axotomy measurements unknown
  - multifocal ERG: 145 data points (mean of all locations)
  - 1000 Hz sample rate (unconfirmed)
- Data set B:
  - 6 axotomy subjects, 8 control subjects (14 total)
  - measurements approximately six weeks after axotomy
  - multifocal ERG: 14,935 data points (103 locations x 145 ms)
  - corneal and optic nerve readings (control subjects only)

4 Classification using Support Vector Machines
- SVMs are a statistical machine learning method
- Training is a constrained optimization problem
  - objective: find the hyperplane that maximizes the margin between classes
- Higher-dimensional mappings provide flexibility
- Non-separable data: a "cost" parameter controls the tradeoff between outlier detection and generalization performance
- Non-linear SVMs use kernels (polynomial, sigmoid, Gaussian), as formalized below
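For reference, a standard statement of the soft-margin optimization these bullets describe (the textbook form, not copied from the slides; C is the cost parameter and γ the Gaussian kernel width tuned on slide 6):

```latex
% Soft-margin SVM primal: maximize the margin while penalizing outliers
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_{i}
\quad\text{subject to}\quad
y_{i}\bigl(w^{\top}\phi(x_{i}) + b\bigr) \,\ge\, 1 - \xi_{i},
\qquad \xi_{i} \ge 0 .

% Gaussian (RBF) kernel, which supplies the mapping \phi implicitly
K(x, x') \;=\; \exp\bigl(-\gamma\,\lVert x - x' \rVert^{2}\bigr)
```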

5 Data Normalization
- Balanced training data
  - number of positive samples = number of negative samples
  - data set A is already balanced (19 vs. 19)
  - keep data set B balanced through combination: choosing 6 of the 8 controls gives C(8,6) = 28 balanced subsets
- Independently and identically distributed (iid) data
  - independence does not hold: e.g. the value of point x17 most likely depends on x16
  - not identically distributed: e.g. x26 is always positive (P1 wave), but x40 is always negative (N2 wave)
- Approximate iid data by subtracting the mean from each dimension, then dividing each dimension by its maximum magnitude (sketched below)
  - results in zero mean for all dimensions, with all values between -1 and +1
- No zero-setting necessary: e.g. subtracting the mean tail value does not affect classification accuracy
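A minimal NumPy sketch of this normalization step (the slides give no code; the array layout is an assumption):

```python
import numpy as np

def normalize(X):
    """Approximate iid data: zero mean per dimension, values in [-1, +1].

    X: (n_subjects, n_points) array of ERG waveforms, one row per subject.
    """
    X = X - X.mean(axis=0)           # zero mean in every dimension
    X = X / np.abs(X).max(axis=0)    # scale each dimension by its max magnitude
    return X
```

Dividing after centering preserves the zero mean, so both properties claimed on the slide hold at once.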

6 Parameter Selection for Classification
- Selection of the best gamma (γ) and cost (C) values by exhaustive search of log_e-space (see the sketch below)
  - try all parameter values on the grid, keep the best points (red circles in the original figure)
  - the accuracy-weighted centre of mass gives the optimal point (green circle)
- Training / testing:
  - 75% / 25% split
  - "leave one out"
- Better searches:
  - "3 strikes"
  - simulated annealing (?)
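A hedged sketch of the grid search using scikit-learn's SVC (which wraps LIBSVM, the library cited in the references); the grid ranges are illustrative guesses, not the values used in the talk:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, LeaveOneOut

# log_e-spaced grids for cost C and kernel width gamma (ranges are assumptions)
param_grid = {
    "C":     np.exp(np.arange(-2.0, 12.0, 1.0)),
    "gamma": np.exp(np.arange(-12.0, 2.0, 1.0)),
}

# exhaustive search with leave-one-out cross-validation, as on the slide
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
# search.fit(X, y)            # X: normalized waveforms, y: axotomy/control labels
# print(search.best_params_)  # best (C, gamma) grid point found
```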

7 Classification Results
- Data set A (38 samples x 145 data points): 94.7%
- Data set B (14 samples x 145 data points): 99.4%
- Data set B (14 samples x 14,935 data points): 90.8%

8 Classification Benchmarks
- How does this method perform on industry-standard classification benchmark data sets?
- Wisconsin Breast Cancer Database
  - O.L. Mangasarian and W.H. Wolberg, "Cancer diagnosis via linear programming," SIAM News, 23(5):1-18, 1990.
- Iris Plants Database
  - R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2):179-188, 1936.

9 Classification Benchmarks
- Wisconsin: 96.9%, σ = 0.18
- Iris (class 1 or not): 100.0%
- Iris (class 2 or not): 96.9%, σ = 0.55
- Iris (class 3 or not): 97.1%, σ = 0.77

10 Assessing Waveform Significance
- Which are the most important parts of the waveform, with respect to classification accuracy? Six measures were compared (the first is sketched below):
- Fisher ratio: distance between means over sum of variances (linear)
- Pearson correlation coefficients: strength of association between variables (linear)
- Kolmogorov-Smirnov: distance between cumulative distributions (non-linear)
- Linear SVM: classification on one dimension only (linear)
- Cross-entropy: mutual information measure (non-linear)
- SVM sensitivity: Monte Carlo simulation using the SVM (non-linear)
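A minimal sketch of the Fisher ratio as described on the slide, computed per time point (one common form; the exact variant used in the talk is not specified):

```python
import numpy as np

def fisher_ratio(axotomy, control):
    """Per-dimension Fisher ratio: squared distance between class means
    over the sum of class variances. Larger values mark time points that
    separate the two groups better.

    axotomy, control: (n_subjects, n_points) arrays of waveforms.
    """
    num = (axotomy.mean(axis=0) - control.mean(axis=0)) ** 2
    den = axotomy.var(axis=0) + control.var(axis=0)
    return num / den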

11 Comparison of All Measures (Dataset B)

12 Probability Density Estimation
- Goal: define a measure of how "sure" the classifier is about a result
- Density estimation is known to be a "hard" problem
  - generally needs a large number of samples for accuracy
  - small deviations in sample points have a magnified effect
- How do we estimate a probability distribution?
- Best-fit Gaussian
  - assume a Gaussian distribution, find the sigmoid that fits best
- Kernel smoothing (sketched below)
  - part of MATLAB's Statistics Toolbox
- SVM density estimation (RSDE method)
  - a special case of SVM regression
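A minimal NumPy sketch of Gaussian kernel smoothing, the same idea as the MATLAB Statistics Toolbox routine the slide refers to (bandwidth choice is left to the caller; nothing here comes from the slides):

```python
import numpy as np

def kernel_density(samples, grid, bandwidth):
    """Gaussian kernel smoothing: average one Gaussian bump centred on
    each sample; the result integrates to 1 over the real line."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)   # density evaluated at each grid point
```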

13 Comparison of Estimation Techniques

14 Confidence Measures
- "Support" is the overall distribution of the sample
  - denote p(x)
  - density: ∫ p(x) dx = 1
- "Confidence" is defined as the posterior probability
  - the probability that sample x is of class C
  - denote p(C|x)
- Can we combine these measures somehow? (one possibility below)
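One natural combination, noted here as an editorial sketch (the deck's own answer appears on the figure slide that follows), is the product rule: weight the posterior by the support, so a prediction counts as trustworthy only where both are high:

```latex
p(C, x) \;=\; p(C \mid x)\, p(x)
```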

15 Confidence Measures


18 References
- SVM tutorial (mathematical but practical):
  - C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
- SVM density estimation (RSDE algorithm):
  - M. Girolami and C. He, "Probability Density Estimation from Optimally Condensed Data Samples," IEEE Trans. Pattern Analysis and Machine Intelligence, 25(10):1253-1264, 2003.
- MATLAB versions:
  - LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
  - SVMlight: http://svmlight.joachims.org/
- An excellent online SVM demo (Java applet):
  - http://www.csie.ntu.edu.tw/~cjlin/libsvm/#GUI

19 Data Representation
- We can represent the input data in many ways (see the sketch below):
  - unprocessed vector (the 145 dimensions as is)
  - second-order information (first time derivative)
  - third-order information (second time derivative)
  - frequency information (power spectral density)
  - wavelet transforms (Daubechies, Symlet)
- Result: only small differences in accuracy!
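A hedged sketch of these representations, assuming SciPy and PyWavelets and the 1000 Hz sample rate queried on slide 3:

```python
import numpy as np
import pywt                     # PyWavelets
from scipy.signal import welch

def representations(x, fs=1000):
    """Alternative encodings of one 145-point ERG waveform x."""
    d1 = np.diff(x, n=1)                           # first time derivative
    d2 = np.diff(x, n=2)                           # second time derivative
    _, psd = welch(x, fs=fs, nperseg=len(x))       # power spectral density
    db = np.concatenate(pywt.wavedec(x, "db4"))    # Daubechies wavelet coeffs
    sym = np.concatenate(pywt.wavedec(x, "sym4"))  # Symlet wavelet coeffs
    return d1, d2, psd, db, sym
```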

20 Data Representation
- Example: wavelet representations
- i.e. some indications, but nothing statistically significant (±5%)

21 Cross Entropy

22 SVM Sensitivity Analysis

23 SVM Sensitivity Analysis (Windowed)

24 Comparison of Estimation Techniques


