
1 A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns
M.A.Sc. Candidate: Qianren (Tim) Xu
Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama
This is the title of my thesis.

2 Highlight
Proteomic Pattern Analysis for Prostate Cancer Detection
Significance Test-Based Feature Selection (STFS): STFS can be used generally for any supervised pattern recognition problem. Very good performance has been obtained on several benchmark datasets, especially those with a large number of features.
System: STFS + Neural Networks + ROC analysis
Sensitivity 97.1%, Specificity 96.8%
Suggestion of mistaken labels by prostatic biopsy
The work in the thesis consists of two parts. One part develops a novel feature selection method based on statistical significance tests, namely STFS. The other part uses the STFS proposed in the first part to conduct proteomic pattern analysis for prostate cancer detection. The system consists of three parts: STFS for feature selection, a neural network for classification, and ROC analysis for optimization of the system. The system obtains high sensitivity and specificity, and the result of the pattern analysis suggests possible mistaken diagnoses by biopsy.

3 Outline of Part I
Significance Test-Based Feature Selection (STFS) for Supervised Pattern Recognition
Introduction
Methodology
Experimental Results on Benchmark Datasets
Comparison with MIFS
The first part is STFS for supervised pattern recognition; in this part, I will talk about the ...

4 Introduction
Problems with features:
Large number
Irrelevance
Noise
Correlation
These increase computational complexity and reduce the recognition rate.
Features, as the input data of classifiers, are important in a pattern recognition system. In real-world problems there are usually a very large number of features, and many features in the initial feature set are irrelevant to the classification task or correlated with each other, which increases the computational complexity and reduces the recognition rate as well.

5 Mutual Information Feature Selection
One of the most important heuristic feature selection methods; it can be very useful in any classification system. But estimation of the mutual information is difficult:
A large number of features and a large number of classes
Continuous data

6 Problems with Feature Selection Methods
Two key issues: computational complexity and optimality deficiency.
There are two key problems: one is the computational complexity, and the other is that the selected features cannot achieve the desired recognition rate.

7 Proposed Method
Criterion of feature selection: significance of feature
Significance of feature = Significant difference × Independence
Significant difference: pattern separability on individual candidate features
Independence: non-correlation between a candidate feature and the already-selected features
The proposed feature selection method is based on this equation. The feature significance is used as the criterion of feature selection, and it is estimated by the product of the significant difference between classes and the independence. The significant difference estimates the pattern separability of each individual candidate feature; the independence quantifies the (lack of) redundancy within the selected feature subset.
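A minimal sketch of how this criterion could be scored for a single candidate feature, assuming two classes and continuous data, with the absolute t statistic standing in for the significant difference and 1 - max |Pearson r| for the independence (these particular measures are stand-ins; the thesis picks the statistic according to the data type, as the next slides show):

```python
from scipy import stats

def feature_significance(x_candidate, labels, selected):
    """Score one candidate feature: significant difference x independence.

    Illustrative stand-ins: |t statistic| between the two classes for the
    significant difference, and 1 - max |Pearson r| with the already
    selected features for the independence.
    """
    sd = abs(stats.ttest_ind(x_candidate[labels == 0],
                             x_candidate[labels == 1]).statistic)
    if not selected:                 # first feature: no redundancy to penalize
        return sd
    ind = 1.0 - max(abs(stats.pearsonr(x_candidate, s)[0]) for s in selected)
    return sd * ind
```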

8 Measurement of Pattern Separability of Individual Features
Statistical significant difference, chosen by data type:
Continuous data with normal distribution: t-test (two classes), ANOVA (more than two classes)
Continuous data with non-normal distribution or rank data: Mann-Whitney test (two classes), Kruskal-Wallis test (more than two classes)
Categorical data: Chi-square test
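A rough dispatch of these tests with scipy.stats might look as follows; scoring the separability by the p-value (smaller p means a larger significant difference) is an assumption here, and the thesis may instead use the test statistic itself:

```python
import numpy as np
from scipy import stats

def significant_difference(groups, data_type="normal"):
    """p-value of the test matched to the data type and number of classes.

    groups: one 1-D array per class. Using the p-value as the separability
    score is an assumption of this sketch.
    """
    if data_type == "normal":        # continuous, normally distributed
        res = stats.ttest_ind(*groups) if len(groups) == 2 else stats.f_oneway(*groups)
    elif data_type == "nonnormal":   # continuous non-normal, or rank data
        res = stats.mannwhitneyu(*groups) if len(groups) == 2 else stats.kruskal(*groups)
    else:                            # categorical: chi-square on the class-by-value table
        values = np.unique(np.concatenate(groups))
        table = [[int(np.sum(g == v)) for v in values] for g in groups]
        return stats.chi2_contingency(table)[1]
    return res.pvalue
```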

9 Independence
Measure of independence, chosen by data type:
Continuous data with normal distribution: Pearson correlation
Continuous data with non-normal distribution or rank data: Spearman rank correlation
Categorical data: Pearson contingency coefficient
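Correspondingly, a hedged sketch of the independence term, assuming the association strength is converted to an independence score as 1 - |association| (the exact conversion used in the thesis may differ):

```python
import numpy as np
from scipy import stats

def independence(x, y, data_type="normal"):
    """1 - (strength of association) between a candidate and a selected feature."""
    if data_type == "normal":        # continuous, normally distributed
        assoc = abs(stats.pearsonr(x, y)[0])
    elif data_type == "nonnormal":   # continuous non-normal, or rank data
        assoc = abs(stats.spearmanr(x, y)[0])
    else:                            # categorical: Pearson contingency coefficient
        table = np.array([[np.sum((x == a) & (y == b))
                           for b in np.unique(y)] for a in np.unique(x)])
        chi2 = stats.chi2_contingency(table)[0]
        assoc = np.sqrt(chi2 / (chi2 + table.sum()))
    return 1.0 - assoc
```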

10 Selection Procedure
MSDI: Maximum Significant Difference and Independence algorithm
MIC: Monotonically Increasing Curve strategy

11 Maximum Significant Difference and Independence (MSDI) Algorithm
1. Compute the significant difference (sd) of every initial feature.
2. Select the feature with maximum sd as the first feature.
3. Compute the independence level (ind) between every candidate feature and the already-selected feature(s).
4. Select the feature with maximum feature significance (sf = sd × ind) as the new feature; repeat from step 3.
In MSDI, the feature with the maximum significant difference is taken as the first feature, and every new feature is added to the feature subset by maximizing the product of significant difference and independence.
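Putting the pieces together, a self-contained sketch of the MSDI greedy loop for continuous two-class data might look like this (the |t| statistic and Pearson correlation are stand-ins for the data-type-dependent measures above):

```python
import numpy as np
from scipy import stats

def msdi_select(X, y, n_features):
    """Greedy MSDI-style selection sketch (continuous two-class data assumed).

    sd:  |t statistic| between the two classes on each feature.
    ind: 1 - max |Pearson r| with the features already selected.
    A new feature is added by maximising sf = sd * ind.
    """
    c0, c1 = np.unique(y)
    sd = np.array([abs(stats.ttest_ind(X[y == c0, j], X[y == c1, j]).statistic)
                   for j in range(X.shape[1])])
    selected = [int(np.argmax(sd))]            # first feature: maximum sd
    while len(selected) < n_features:
        best, best_sf = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            ind = 1.0 - max(abs(stats.pearsonr(X[:, j], X[:, s])[0])
                            for s in selected)
            sf = sd[j] * ind
            if sf > best_sf:
                best, best_sf = j, sf
        selected.append(best)
    return selected
```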

12 Monotonically Increasing Curve (MIC) Strategy
Plot the performance curve of the feature subset selected by MSDI (rate of recognition vs. number of features).
Delete the features that make no good contribution to the increase of the recognition rate.
Repeat until the curve is monotonically increasing.
The feature subset selected by MSDI may still contain some "not good" features. On the performance curve of the MSDI-selected subset (y-axis: rate of recognition, x-axis: number of features used), the recognition rate decreases when these bad features are added.
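A minimal sketch of the MIC pruning step; the evaluate callback, which returns the recognition rate for a given subset (for example the cross-validated accuracy of the downstream classifier), is a hypothetical helper not specified in the slides:

```python
def mic_prune(selected, evaluate):
    """MIC strategy sketch: keep only features whose addition improves the curve.

    selected: feature indices in MSDI order.
    evaluate(subset): returns the recognition rate for that subset (assumed).
    The kept subset grows only when the recognition rate strictly increases,
    so its performance curve is monotonically increasing by construction.
    """
    kept, best_rate = [], 0.0
    for f in selected:
        rate = evaluate(kept + [f])
        if rate > best_rate:        # a "good" contribution: the curve goes up
            kept.append(f)
            best_rate = rate
        # otherwise the feature is deleted from the subset
    return kept
```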

13 Example I: Handwritten Digit Recognition
Every image is a 32-by-32 bitmap.
The bitmap is divided into 8×8 = 64 blocks, and the set pixels in each block are counted.
Thus an 8×8 matrix is generated, giving 64 features.
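For concreteness, this block-count feature extraction can be sketched as follows (assuming the bitmap is a binary NumPy array):

```python
import numpy as np

def bitmap_to_features(bitmap):
    """Turn a 32x32 binary bitmap into 64 block-count features.

    The image is divided into 8x8 = 64 blocks of 4x4 pixels, and the set
    pixels in each block are counted, giving an 8x8 matrix (64 features).
    """
    blocks = bitmap.reshape(8, 4, 8, 4)       # 8x8 grid of 4x4 blocks
    return blocks.sum(axis=(1, 3)).ravel()    # 64 counts
```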

14 Performance Curve
MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector
The performance curves (rate of recognition vs. number of features, up to 64) compare MSDI against Battiti's MIFS with β = 0.2, 0.4, 0.6, 0.8, 1.0 and against random ranking.
Battiti's MIFS needs β to be determined; if 5 values of β are searched, we can compare the computational time between MSDI and Battiti's MIFS.

15 Computational Complexity
Selecting 15 features from the 64-feature original set:
MSDI: … seconds
Battiti's MIFS: … seconds (5 values of β searched in the range 0-1)
The times spent show that MSDI is much more computationally efficient than MIFS.

16 Example II: Handwritten Digit Recognition
649 features distributed over the following six feature sets: 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen-Loève coefficients, 240 pixel averages in 2 x 3 windows, 47 Zernike moments, 6 morphological features.
The previous example had 64 features; this example has more than six hundred.

17 MSDI: Maximum Significant Difference and Independence
MIC: Monotonically Increasing Curve
The performance curves (rate of recognition vs. number of features, up to 50) compare MSDI + MIC, MSDI alone, and random ranking.
The MSDI curve is not strictly monotonically increasing, so the MIC strategy can be applied; the performance is improved after MIC.

18 Comparison with MIFS
MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector
The performance curves (rate of recognition vs. number of features, up to 50) compare MSDI with MIFS (β=0.2) and MIFS (β=0.5).
MSDI is much better with a large number of features; MIFS is better with a small number of features.

19 Summary on Comparing MSDI with MIFS
MSDI is much more computationally efficient: MIFS needs to calculate probability density functions (pdfs), and even the computationally efficient criterion (Battiti's MIFS) still needs β to be determined, whereas MSDI involves only simple statistical calculations.
MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on the relevant statistical models.
MIFS is more suitable for a small volume of data and a small feature subset.

20 Outline of Part II
Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer
Problem Statement
Methods: feature selection, classification, optimization
Results and Discussion

21 Problem Statement
Very large number of features: 15154 points (features)
Electronic and chemical noise
Biological variability of human disease
Little knowledge of the proteomic mass spectrum

22 The System of Proteomic Pattern Analysis
STFS: Significance Test-Based Feature Selection
PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network
Training dataset (initial features > 10^4) → most significant features selected by STFS → optimization of the size of the feature subset and the parameters of the classifier by minimizing the ROC distance → RBFNN / PNN learning → trained neural classifier → mature classifier

23 Feature Selection: STFS
Significance of feature = Significant difference × Independence
Significant difference: Student's t-test; Independence: Pearson correlation
Selection by the MSDI algorithm, followed by the MIC strategy
STFS: Significance Test-Based Feature Selection
MSDI: Maximum Significant Difference and Independence algorithm
MIC: Monotonically Increasing Curve strategy

24 Classification: PNN / RBFNN
PNN is a standard structure with four layers; RBFNN is a modified four-layer structure whose fourth layer makes the pattern decision.
[Network diagram: inputs x1…xn, pattern layer, summation pools S1 and S2, decision output]
PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network
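A minimal PNN-style decision sketch, assuming Gaussian kernels in the pattern layer and per-class average pooling in the summation layer; the thesis's modified RBFNN with a decision weight λ is not reproduced here:

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=1.0):
    """Minimal probabilistic neural network (PNN) decision for one sample x.

    Pattern layer: one Gaussian kernel per training sample; summation layer:
    per-class mean of the kernel activations; output layer: pick the class
    with the largest summed activation. sigma is the Gaussian spread that
    is later tuned by the ROC-distance criterion.
    """
    d2 = np.sum((X_train - x) ** 2, axis=1)    # squared distances to x
    act = np.exp(-d2 / (2.0 * sigma ** 2))     # pattern-layer outputs
    classes = np.unique(y_train)
    scores = [act[y_train == c].mean() for c in classes]
    return classes[int(np.argmax(scores))]
```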

25 Optimization: ROC Distance
The ROC distance d_ROC is the distance from the classifier's operating point to the ideal corner of the ROC plane (true positive rate / sensitivity vs. false positive rate / 1 - specificity).
Minimizing the ROC distance optimizes: the feature subset size m, the Gaussian spread σ, and the RBFNN pattern decision weight λ.
Sensitivity or specificity alone reflects only one aspect of a classifier's performance; the ROC distance is a reasonable model of the overall performance of pattern recognition, so we minimize it.
ROC: Receiver Operating Characteristic
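A common definition of the ROC distance is the Euclidean distance from the classifier's operating point to the ideal corner (0, 1) of the ROC plane; the sketch below assumes that definition, which may differ in detail from the thesis formula. The feature subset size m, the spread σ, and the decision weight λ would then be chosen by searching for the values that minimize this distance on a validation set.

```python
import numpy as np

def roc_distance(y_true, scores, threshold=0.5):
    """Distance from the classifier's ROC point to the ideal corner (0, 1).

    d_ROC = sqrt((1 - sensitivity)^2 + (1 - specificity)^2); this common
    definition is an assumption and may differ from the thesis formula.
    """
    pred = scores >= threshold
    pos, neg = (y_true == 1), (y_true == 0)
    sensitivity = np.sum(pred & pos) / np.sum(pos)    # true positive rate
    specificity = np.sum(~pred & neg) / np.sum(neg)   # true negative rate
    return float(np.hypot(1.0 - sensitivity, 1.0 - specificity))
```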

26 Results: Sensitivity and Specificity
Method            Sensitivity   Specificity
Our results       97.1%         96.8%
Petricoin (2002)  94.7%         75.9%
DRE               55-68%        6-33%
PSA               29-80%        --
Our results are clearly better than the NCI result of Petricoin, obtained on the same dataset. DRE and PSA are the standard tests currently used in clinics for early detection of prostate cancer. Such high sensitivity and specificity would be greatly helpful for early and accurate detection of prostate cancer.

27 Pattern Distribution
Histograms of the classifier output, split by the pattern recognized by the RBFNN (non-cancer vs. cancer) and by the biopsy label, with the cut-point marked:
True negative 96.8%, True positive 97.1%, False negative 2.9%, False positive 3.2%
This is the histogram at the third layer of the RBFNN. The left part is non-cancer as recognized by the RBFNN and the right part is cancer as recognized by the RBFNN; the upper part is non-cancer as labelled by biopsy and the lower part is cancer as labelled by biopsy. More than 90% of the samples are recognized by the classifier; only about 3% of the cancer and non-cancer samples cannot be recognized.

28 Possible Causes of the Unrecognizable Samples
The algorithm of the classifier is not able to recognize all the samples
The proteomics is not able to provide enough information
The prostatic biopsies mistakenly labelled the samples

29 Possibility of Mistaken Diagnosis by Prostatic Biopsy
Biopsy has limited sensitivity and specificity
The proteomic classifier has very high sensitivity and specificity correlated with biopsy
The results of the proteomic classifier are not exactly the same as biopsy
All unrecognizable samples are outliers
In the histograms (true non-cancer, false non-cancer, false cancer, true cancer, with the cut-point marked), all the samples that cannot be recognized lie apart from the main part of their histograms. In particular, one non-cancer sample is very far from the centre value of non-cancer and even crosses over to the other side of the cancer centre value.

30 Why is the Accuracy of Biopsy Limited?
Limited sensitivity (to detect cancer): biopsy cannot reach all areas of the prostate, and the small sample volume will never exactly represent the entire organ; 83.3% for sextant biopsies.
Limited specificity (to detect non-cancer): biopsy may detect low-volume tumours (clinically insignificant prostate cancer) that may not threaten a man's future health; 97.3% if only tumours with volume > 2 cc are considered cancer.

31 Summary (1) Significance Test-Based Feature Selection (STFS)
STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the minimum possible feature subset that achieves the maximum recognition rate.
Feature significance (the selection criterion) is estimated with the optimal statistical model chosen according to the properties of the data.
Advantages: computational efficiency and optimality.

32 Summary (2) Proteomic Pattern Analysis for Detection of Prostate Cancer
The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance.
Sensitivity 97.1% and specificity 96.8%; this would be an asset for the early and accurate detection of prostate cancer, and could prevent a large number of aging men from undergoing unnecessary prostatic biopsies.
The suggestion of mistaken labels by prostatic biopsy through pattern analysis may lead to a novel direction in the diagnostic research of prostate cancer.

33 Thanks for your time Questions?

