
1 GMDH-based feature ranking and selection for improved classification of medical data
Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Author: R.E. Abdel-Aal, 2005. BI.456-468

2 Outline
• Motivation
• Objective
• Method
• Material
• Results
• Conclusions

3 Motivation
• Accuracy is very important in classifiers used for medical applications.

4 Objective
• Improve classification performance on medical data.

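5 Method
• First stage: feature ranking
─ GMDH algorithm
  1. Representation: the inputs x1, x2, x3, x4 are combined in pairs into m(m-1)/2 candidate models z1, …, z_{m(m-1)/2}, each estimating the output y.
  2. Selection and stopping: each candidate is scored by its squared error r_1^2, r_2^2, …, r_{m(m-1)/2}^2, and r_min is the smallest of these at each iteration. An increasing r_min means the model is becoming too complex:
     1. overfitting the estimation data;
     2. performing poorly on the new selection data.
[Figure: GMDH network diagram, and a plot of squared error against iteration showing r_min]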
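The representation and selection steps of one GMDH layer can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the paper: the quadratic two-input polynomial, the small ridge term, the function names, and the synthetic data are all assumptions made for the example.

```python
import random
from itertools import combinations

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def terms(a, b):
    # Assumed candidate form: a full quadratic polynomial in two inputs.
    return [1.0, a, b, a * b, a * a, b * b]

def fit_pair(X, y, i, j, ridge=1e-8):
    """Least-squares fit of the pair model on the estimation data."""
    rows = [terms(x[i], x[j]) for x in X]
    k = len(rows[0])
    AtA = [[sum(r[p] * r[q] for r in rows) + (ridge if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    Aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve_linear(AtA, Aty)

def squared_error(coef, X, y, i, j):
    """Mean squared error of one candidate on the given data."""
    preds = [sum(c * t for c, t in zip(coef, terms(x[i], x[j]))) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def gmdh_layer(X_est, y_est, X_sel, y_sel):
    """Fit all m(m-1)/2 pair models; rank them by error on the selection data."""
    m = len(X_est[0])
    results = []
    for i, j in combinations(range(m), 2):
        coef = fit_pair(X_est, y_est, i, j)
        results.append((squared_error(coef, X_sel, y_sel, i, j), (i, j)))
    results.sort()
    return results

# Synthetic data (an assumption): y depends only on inputs 0 and 1.
rng = random.Random(0)

def make_data(n):
    X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    y = [2.0 * x[0] * x[1] + 0.5 * x[0] for x in X]
    return X, y

X_est, y_est = make_data(40)   # estimation data: used to fit coefficients
X_sel, y_sel = make_data(20)   # selection data: used to score candidates
ranked = gmdh_layer(X_est, y_est, X_sel, y_sel)
r_min, best_pair = ranked[0]
```

In a full GMDH run, the best candidates of one layer become the inputs of the next, and training stops once r_min starts to rise from one layer to the next.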

6 Method
• First stage: feature ranking
─ AIM abductive network
  1. Representation
  2. Selection and stopping: overfitting is avoided by controlling the CPM (complexity penalty multiplier).
     1. CPM > 1: simpler models that are less accurate but generalize better.
     2. CPM < 1: more complex models that may overfit the training data and degrade actual prediction performance.
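The slide does not state the criterion that CPM enters. A form often quoted for abductive network tools is the predicted squared error, PSE = FSE + CPM · (2K/N) · σp², where FSE is the fitting squared error, K the number of model coefficients, N the number of training samples, and σp² a prior estimate of the error variance. The sketch below assumes that form; the candidate models and all numbers are hypothetical and only show how CPM shifts the choice between a simple and a complex model.

```python
def predicted_squared_error(fse, k, n, prior_variance, cpm=1.0):
    """Assumed criterion: fitting error plus a CPM-scaled complexity penalty."""
    return fse + cpm * (2.0 * k / n) * prior_variance

def pick_model(candidates, n, prior_variance, cpm):
    """Return the name of the candidate with the lowest predicted squared error."""
    best = min(
        candidates,
        key=lambda c: predicted_squared_error(c["fse"], c["k"], n, prior_variance, cpm),
    )
    return best["name"]

# Hypothetical candidates: the complex model fits better but has more coefficients.
candidates = [
    {"name": "simple", "fse": 0.30, "k": 3},
    {"name": "complex", "fse": 0.20, "k": 20},
]

low_cpm_choice = pick_model(candidates, n=100, prior_variance=1.0, cpm=0.1)
high_cpm_choice = pick_model(candidates, n=100, prior_variance=1.0, cpm=2.0)
```

With these made-up numbers a small CPM lets the better-fitting complex model win, while a large CPM makes the penalty dominate and the simple model is chosen.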

7 Method
• Second stage: feature selection
─ As the number of selected features k grows, performance on an evaluation dataset first improves and then starts to deteriorate because the model overfits the training data.
─ A compact m-feature subset can be obtained by taking the first m features from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the selected 6-feature subset is {2,6,7,8,1,5}.
─ The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the best subset is chosen from among {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
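The second stage can be sketched as below. The sketch considers only nested subsets taken from the top of the ranking list (k = 1, 2, …), which is one reading of the procedure; the windowed candidates in the slide's second example are not covered. The function names and the `toy_error` function are hypothetical; in practice the error would come from training a classifier on each subset and scoring it on the evaluation dataset.

```python
def top_features(ranking, m):
    """Take the first m features from the top of the ranking list."""
    return list(ranking[:m])

def best_prefix_subset(ranking, evaluation_error):
    """Try the top-k subset for every k and keep the one with the lowest error."""
    best_subset, best_error = None, float("inf")
    for k in range(1, len(ranking) + 1):
        subset = top_features(ranking, k)
        err = evaluation_error(subset)
        if err < best_error:
            best_subset, best_error = subset, err
    return best_subset, best_error

# Ranking list for the breast cancer data, as given in the slides.
ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]

def toy_error(subset):
    # Hypothetical stand-in for a real evaluation: the error falls until
    # six features are used, then rises again (mimicking overfitting).
    return 0.03 + 0.01 * abs(len(subset) - 6)

subset, err = best_prefix_subset(ranking, toy_error)
```

Because the subsets are nested, only as many models as there are features need to be trained, rather than one per possible feature combination.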

8 Material
• Two standard medical diagnosis datasets from the UCI Machine Learning Repository were used for this study.
─ Wisconsin breast cancer dataset
─ Cleveland heart disease dataset
[Figure annotations: 70%, 30%]

9 Results
• The breast cancer data
─ Ranking for the feature set: {2,6,7,8,1,5,3,4,9}
[Figure panels: Feature ranked, Feature selected; annotations: 7, 5, 9]

10 Results
[Figure: rough set data analysis of the dataset; annotations: overfitting, 3%]

11 Results
• Standard error ↓
• AUC ↑
[Figure annotation: 3%]
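AUC here is the area under the ROC curve. As a reminder of what the metric measures, the following minimal sketch computes it as the fraction of (positive, negative) case pairs that the classifier scores in the right order, counting ties as half. The function name and the example labels and scores are made up for illustration and are not results from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = disease present, 0 = absent.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 1.0 means every positive case is scored above every negative case, and 0.5 corresponds to random ordering.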

12 Results
• The heart disease data
─ Ranking for the feature set: {13,12,9,3,2,10,8,4,5,11,1,7,6}
[Figure panels: Feature ranked, Feature selected]

13 Results
[Figure annotations: 3%, 6%, overfitting]

14 Results
• AUC ↑
• Requires less than half the number of input features.
• Models using the reduced feature set will be more efficient.

15 Conclusions
• Improved implementation and performance of classifiers for medical screening and diagnosis.
• Feature reduction is particularly useful with high-dimensional data characterized by a large number of features and relatively few training examples.

16 My opinion
• Advantage: preprocessing
• Disadvantage:
• Application: clustering, association rules, ……

