
1 VARIANCE REDUCTION FOR STABLE FEATURE SELECTION. Presenter: Yue Han. Advisor: Lei Yu. Department of Computer Science. 10/27/10

2 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

3 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

4 INTRODUCTION AND MOTIVATION: FEATURE SELECTION APPLICATIONS. Example applications: text categorization (documents D1..DM as instances, terms T1..TN as features, class labels such as Sports, Travel, Jobs); bioinformatics (samples as instances, genes or proteins as features); image analysis (pixels as features).

5 INTRODUCTION AND MOTIVATION: FEATURE SELECTION FROM HIGH-DIMENSIONAL DATA. Let p be the number of features and n the number of samples; data are high-dimensional when p >> n. Feature selection alleviates the effect of the curse of dimensionality, enhances generalization capability, speeds up the learning process, and improves model interpretability. The curse of dimensionality affects distance functions, optimization and learning, and Bayesian statistics. Typical pipeline for knowledge discovery on high-dimensional data: high-dimensional data -> feature selection algorithm (MRMR, SVM-RFE, Relief-F, F-statistics, etc.) -> low-dimensional data -> learning models (classification, clustering, etc.).
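To make this pipeline concrete, here is a minimal sketch using a simple filter selector (F-statistic ranking) as a stand-in for MRMR / SVM-RFE / Relief-F; the data shape and the choice of k = 50 are illustrative assumptions, not values from the slides.

```python
# High-dimensional data -> feature selection -> low-dimensional data -> learner.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))              # p >> n: 100 samples, 2000 features
y = (X[:, :10].sum(axis=1) > 0).astype(int)   # only the first 10 features matter

pipe = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC())
print("cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```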

6 INTRODUCTION AND MOTIVATION: STABILITY OF FEATURE SELECTION. Stability of feature selection: the insensitivity of a feature selection algorithm's result to variations in the training set. Given several training sets drawn for the same problem, does the feature selection method return consistent feature subsets? The analogous stability of learning algorithms was first examined by Turney in 1995; the stability of feature selection was relatively neglected and has only recently attracted interest from data mining researchers.
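A minimal sketch of how this question is usually checked empirically: run one selector on several perturbed versions of a training set and measure how much the selected subsets agree. The selector, the subsampling scheme, and the Jaccard measure here are illustrative choices, not the thesis's protocol.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
y = rng.integers(0, 2, size=100)

subsets = []
for _ in range(10):                                    # 10 perturbed training sets
    idx = rng.choice(len(X), size=80, replace=False)   # drop 20% of the instances
    sel = SelectKBest(f_classif, k=20).fit(X[idx], y[idx])
    subsets.append(np.flatnonzero(sel.get_support()))

pairwise = [jaccard(subsets[i], subsets[j])
            for i in range(len(subsets)) for j in range(i + 1, len(subsets))]
print("average pairwise Jaccard similarity:", np.mean(pairwise))
```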

7 INTRODUCTION AND MOTIVATION: MOTIVATION FOR STABLE FEATURE SELECTION. Consider two samples D1 and D2 drawn from a data set D. Given an unlimited sample size for D, the feature selection results from D1 and D2 are the same; when the size of D is limited (n << p for high-dimensional data), the results from D1 and D2 differ. Challenge: increasing the number of samples can be very costly or impractical. Experts in biology and biomedicine are interested not only in prediction accuracy but also in the consistency of feature subsets: they want to validate stable genes or proteins that are less sensitive to variations in the training data, and biomarkers that explain the observed phenomena.

8 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

9 BACKGROUND AND RELATED WORK: FEATURE SELECTION METHODS. Generic process: subset generation from the original feature set, subset evaluation (goodness of the subset under an evaluation criterion), a stopping criterion, and result validation. Evaluation models: filter, wrapper, and embedded. Search strategies: complete search, sequential search, and random search. Representative algorithms for each: Relief, SFS, MDLM, etc.; FSBC, ELSA, LVW, etc.; BBHFS, Dash-Liu's, etc.
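As a concrete instance of the generation/evaluation/stopping loop above, the sketch below implements a wrapper-style sequential forward search with cross-validated accuracy as the goodness measure; the classifier and the stopping rule are assumptions for illustration, not any of the named algorithms.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sequential_forward_selection(X, y, max_features=10):
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_features:    # stopping criterion
        scores = []
        for f in remaining:                              # subset generation
            cand = selected + [f]                        # subset evaluation
            s = cross_val_score(KNeighborsClassifier(), X[:, cand], y, cv=3).mean()
            scores.append((s, f))
        s, f = max(scores)
        if s <= best_score:                              # no improvement -> stop
            break
        best_score, selected = s, selected + [f]
        remaining.remove(f)
    return selected
```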

10 BACKGROUND AND RELATED WORK: STABLE FEATURE SELECTION. Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007): quantify stability in terms of consistency of subsets or weights; algorithms vary in stability while performing equally well for classification; choose the one that is best in both stability and accuracy. Bagging-based ensemble feature selection (Saeys et al., ECML 2007): draw different bootstrapped samples from the same training set, apply a conventional feature selection algorithm to each, and aggregate the feature selection results. Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009): explore intrinsic feature correlations, identify groups of correlated features, and select relevant feature groups.
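A sketch of the bagging-based ensemble idea described above: run a conventional selector on bootstrapped samples and aggregate the resulting rankings. The F-statistic base selector and rank-sum aggregation are stand-ins; the original work also considers other selectors and aggregators.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def ensemble_rank(X, y, n_bootstraps=20, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_bootstraps):
        idx = rng.choice(n, size=n, replace=True)       # bootstrapped training set
        scores, _ = f_classif(X[idx], y[idx])           # conventional selector on it
        rank_sum += np.argsort(np.argsort(-scores))     # rank 0 = best feature
    return np.argsort(rank_sum)                          # aggregated ranking (best first)
```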

11 BACKGROUND AND RELATED WORK: MARGIN BASED FEATURE SELECTION. Sample margin: how far an instance can travel before it hits the decision boundary. Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance). Representative algorithms: Relief, Relief-F, G-flip, Simba, etc. In these algorithms the margin is used for feature weighting or feature selection; our study uses it in a totally different way, namely to weight instances.
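For reference, the two margin notions used by the Relief family can be written as follows; H(x) and M(x) denote the nearest hit and nearest miss of instance x, and the notation is mine rather than the slide's.

```latex
% Sample margin: distance from an instance to the decision boundary B.
% Hypothesis margin: half the gap between the nearest-miss and nearest-hit
% distances of the instance (the form used by Relief/Simba).
\[
\rho_{\mathrm{sample}}(x) = \min_{x' \in B} \lVert x - x' \rVert, \qquad
\rho_{\mathrm{hyp}}(x) = \tfrac{1}{2}\big( \lVert x - M(x) \rVert - \lVert x - H(x) \rVert \big).
\]
```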

12 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

13 PUBLICATIONS. Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report, Data Mining Research Laboratory, Binghamton University, 2009. Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM 2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010. Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, December 14-17, 2010, to appear. Lei Yu, Yue Han and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010, major revision under review.

14 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

15 THEORETICAL FRAMEWORK: BIAS-VARIANCE DECOMPOSITION OF FEATURE SELECTION ERROR. Notation: training data D drawn from the data space; feature selection result r(D); true feature selection result r*. The expected loss (error) of feature selection decomposes into a bias term and a variance term. The decomposition reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance), and suggests seeking a better trade-off between the bias and the variance of feature selection.
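The decomposition formulas on this slide did not survive the transcript; the following is a standard squared-loss reconstruction consistent with the terms above (D, r(D), r*), not necessarily the exact notation used in the talk.

```latex
% D: training set drawn from the data space; r(D): feature selection result;
% r*: true feature selection result; \bar{r}: expected result over D.
% Squared loss is an assumption made for this reconstruction.
\[
\mathrm{Err} = \mathbb{E}_{D}\big[\, \lVert r(D) - r^{*} \rVert^{2} \,\big], \qquad
\bar{r} = \mathbb{E}_{D}\big[\, r(D) \,\big],
\]
\[
\mathrm{Err} \;=\; \underbrace{\lVert \bar{r} - r^{*} \rVert^{2}}_{\text{Bias}^{2}}
\;+\; \underbrace{\mathbb{E}_{D}\big[\, \lVert r(D) - \bar{r} \rVert^{2} \,\big]}_{\text{Variance}}.
\]
```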

16 THEORETICAL FRAMEWORK: VARIANCE REDUCTION VIA IMPORTANCE SAMPLING. Feature selection (weighting) can be viewed as a Monte Carlo estimator of a relevance score, so the variance of the estimator depends on the feature selection algorithm and on the sample size. Increasing the sample size is impractical and costly, which motivates importance sampling: a good importance sampling function h(x) translates into an instance weighting scheme. Intuition behind h(x): draw more instances from important regions and fewer from other regions. Intuition behind the instance weights: increase the weights of instances from important regions and decrease the weights of instances from other regions.
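The estimator formulas were likewise lost in the transcript; a standard Monte Carlo / importance sampling pair matching the slide's terms (relevance score, estimator, variance, sampling function h) is written below. The notation is assumed rather than copied from the talk.

```latex
% mu: relevance score written as an expectation over the data distribution p;
% \hat{mu}: plain Monte Carlo estimate from n training instances;
% \hat{mu}_h: importance-sampling estimate with sampling function h, whose
% instance weights p(x_i)/h(x_i) motivate the instance weighting scheme.
\[
\mu = \mathbb{E}_{p}\big[f(X)\big], \qquad
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} f(x_i),\; x_i \sim p, \qquad
\operatorname{Var}(\hat{\mu}) = \frac{\operatorname{Var}_{p}\big[f(X)\big]}{n},
\]
\[
\hat{\mu}_{h} = \frac{1}{n}\sum_{i=1}^{n} \frac{p(x_i)}{h(x_i)}\, f(x_i),\; x_i \sim h, \qquad
\operatorname{Var}(\hat{\mu}_{h}) = \frac{\operatorname{Var}_{h}\!\big[\tfrac{p(X)}{h(X)}\, f(X)\big]}{n}.
\]
```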

17 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

18 EMPIRICAL FRAMEWORK: OVERALL FRAMEWORK. Margin based instance weighting for stable feature selection. Challenges: how to produce instance weights from the point of view of feature selection stability, and how to present weighted instances to conventional feature selection algorithms.

19 EMPIRICAL FRAMEWORK: MARGIN VECTOR FEATURE SPACE. Each instance in the original space is mapped into a margin vector feature space via its hypothesis margin, computed from its nearest hit (nearest neighbor of the same class) and nearest miss (nearest neighbor of the opposite class). The margin vector captures the local profile of relevance for all features at that instance: instances exhibit different profiles of feature relevance and therefore influence feature selection results differently.
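A plausible per-feature form of this mapping is sketched below; the exact definition is given in the WAIM 2010 paper, so treat this as a hedged reconstruction rather than the thesis's formula.

```latex
% Margin vector feature space: instance x in R^d is mapped to the vector of
% per-feature hypothesis-margin contributions, with H(x) its nearest hit and
% M(x) its nearest miss.
\[
x \;\longmapsto\; \big(m_1(x), \ldots, m_d(x)\big), \qquad
m_i(x) = \lvert x_i - M(x)_i \rvert - \lvert x_i - H(x)_i \rvert .
\]
```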

20 EMPIRICAL FRAMEWORK: AN ILLUSTRATIVE EXAMPLE. Hypothesis-margin based feature space transformation: (a) original feature space, (b) margin vector feature space.

21 EMPIRICAL FRAMEWORK: MARGIN BASED INSTANCE WEIGHTING ALGORITHM. Because instances exhibit different profiles of feature relevance, they influence feature selection results differently. Recall the importance sampling view of variance reduction: draw more instances from important regions and fewer from other regions. Instance weighting therefore assigns a lower weight to an instance with a higher outlying degree and a higher weight to an instance with a lower outlying degree.
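A minimal sketch of the weighting idea, assuming the margin-vector form sketched earlier: compute each instance's margin vector, score its outlying degree as the distance from the average margin vector, and map higher outlyingness to lower weight with an exponential. The distance measure and the exponential mapping are assumptions for illustration, not necessarily the exact scheme in the thesis.

```python
import numpy as np

def margin_vectors(X, y):
    n = len(X)
    M = np.zeros_like(X, dtype=float)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)               # distances to every instance
        d[i] = np.inf                                       # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))     # nearest same-class neighbor
        miss = np.argmin(np.where(y != y[i], d, np.inf))    # nearest other-class neighbor
        M[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return M

def instance_weights(X, y, sigma=1.0):
    M = margin_vectors(X, y)
    outlying = np.linalg.norm(M - M.mean(axis=0), axis=1)   # outlying degree per instance
    w = np.exp(-outlying / sigma)                            # higher degree -> lower weight
    return w * len(w) / w.sum()                              # normalize to mean weight 1
```

The cost is dominated by the nearest-hit/nearest-miss search over all instance pairs, which stays modest when the sample size n is small relative to the dimensionality d, the n << d regime discussed on the next slide.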

22 EMPIRICAL FRAMEWORK: ALGORITHM ILLUSTRATION. Time complexity analysis: the overall cost is dominated by the instance weighting step, so the approach is efficient for high-dimensional data with small sample size (n << d).

23 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

24 EMPIRICAL STUDY: SUBSET STABILITY MEASURES. Stability of feature selection asks whether the feature subsets selected from different training sets by the same method are consistent. Subset stability is measured as the average pairwise similarity over the selected subsets, for example with the Kuncheva index; other subset measures include the Jaccard index, nPOGR, and SIMv. For feature rankings: the Spearman rank correlation coefficient. For feature weightings: the Pearson correlation coefficient.
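For concreteness, here is the Kuncheva index as it is usually defined: the overlap between two size-k subsets corrected for the overlap expected by chance, averaged over all pairs of selected subsets, with d the total number of features. This follows the standard formula rather than anything specific to the slides.

```python
from itertools import combinations

def kuncheva_index(a, b, d):
    k = len(a)                        # both subsets assumed to have size k < d
    r = len(set(a) & set(b))          # observed overlap
    expected = k * k / d              # overlap expected by chance
    return (r - expected) / (k - expected)

def average_kuncheva(subsets, d):
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, d) for a, b in pairs) / len(pairs)

# e.g. average_kuncheva([[0, 1, 2], [0, 1, 5], [0, 2, 7]], d=1000)
```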

25 EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA. Synthetic data generation: feature values are drawn from two multivariate normal distributions; the covariance matrix is a 10x10 square matrix with 1 along the diagonal and 0.8 off the diagonal; there are 100 groups with 10 features each; the class label is a weighted sum of all feature values with an optimal feature weight vector. Training data: 500 training sets of 100 instances each, with 50 instances from each of the two distributions; leave-one-out evaluation; test data: 5000 instances. Method in comparison: SVM-RFE, recursively eliminating 10% of the features remaining from the previous iteration until 10 features remain. Measures: variance, bias, and error; subset stability (Kuncheva index); accuracy (SVM).
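A sketch of a generator matching this description, under assumptions where the transcript is ambiguous: the two normals are taken as a +/-0.5 mean shift, the covariance is block-diagonal built from the 10x10 block above, and the "optimal" weight vector and the label threshold are placeholders rather than the slide's values.

```python
import numpy as np

def make_synthetic(n_groups=100, group_size=10, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    p = n_groups * group_size
    block = np.full((group_size, group_size), 0.8)        # 0.8 off the diagonal
    np.fill_diagonal(block, 1.0)                           # 1.0 on the diagonal
    cov = np.kron(np.eye(n_groups), block)                 # block-diagonal covariance
    comp = rng.integers(0, 2, size=n_samples)              # which of the two normals
    means = np.where(comp[:, None] == 1, 0.5, -0.5)        # assumed +/-0.5 mean shift
    X = means + rng.multivariate_normal(np.zeros(p), cov, size=n_samples)
    w = rng.normal(size=p)                                  # placeholder "optimal" weights
    score = X @ w
    y = (score > np.median(score)).astype(int)              # label = thresholded weighted sum
    return X, y, w
```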

26 EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA. Observations: the error equals the sum of bias and variance for both versions of SVM-RFE; the error is dominated by bias during early iterations and by variance during later iterations; IW SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE as the number of remaining features approaches 50.

27 EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA. Conclusion: variance reduction via margin based instance weighting yields a better bias-variance tradeoff, increased subset stability, and improved classification accuracy.

28 EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA. Data: microarray data sets. Experiment setup: 10-fold cross-validation; within each fold, feature selection is applied to the training data and evaluated on the held-out test data; the ensemble variant builds 20 bootstrapped training sets, selects a feature subset from each, and aggregates them into a single subset (20-ensemble SVM-RFE). Methods in comparison: SVM-RFE, ensemble SVM-RFE, and instance weighting SVM-RFE. Measures: variance, subset stability, and accuracy (KNN, SVM).
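A sketch of the instance weighting SVM-RFE variant compared here: the standard RFE loop on a linear SVM, except that per-instance weights (for example, the margin-based weights sketched earlier) are passed to the SVM at every iteration. The 10%-per-iteration elimination rate follows the synthetic-data slide; the rest (LinearSVC and its parameters) is an illustrative assumption.

```python
import numpy as np
from sklearn.svm import LinearSVC

def weighted_svm_rfe(X, y, sample_weight=None, n_final=10):
    active = np.arange(X.shape[1])
    while len(active) > n_final:
        svm = LinearSVC(C=1.0, max_iter=5000)
        svm.fit(X[:, active], y, sample_weight=sample_weight)  # weighted instances
        scores = np.abs(svm.coef_).ravel()                     # feature importance
        n_drop = max(1, int(0.10 * len(active)))               # drop worst 10%
        keep = np.argsort(scores)[n_drop:]                     # keep the rest
        active = active[np.sort(keep)]
    return active
```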

29 EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA. Note: 40 iterations, starting from about 1000 features until 10 features remain. Observations: the methods are non-discriminative during early iterations; SVM-RFE increases sharply as the number of features approaches 10; IW SVM-RFE shows a significantly slower rate of increase.

30 EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA. Observations: both the ensemble and the instance weighting approaches improve stability consistently; the improvement from the ensemble approach is not as significant as that from instance weighting; as the number of selected features increases, the stability score decreases because of the larger correction factor.

31 EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA. Conclusions: instance weighting improves the stability of feature selection without sacrificing prediction accuracy; it performs much better than the ensemble approach and is more efficient; overall it leads to significantly increased stability at a slight extra cost in time.

32 OUTLINE: Introduction and Motivation; Background and Related Work; Preliminaries; Publications; Theoretical Framework; Empirical Framework: Margin Based Instance Weighting; Empirical Study; Planned Tasks

33 PLANNED TASKS: OVERALL FRAMEWORK. Theoretical framework of feature selection stability; empirical instance weighting framework built on margin-based instance weighting, with an iterative approach and state-of-the-art weighting schemes; representative feature selection algorithms (SVM-RFE, Relief-F, F-statistics, HHSVM); various real-world data sets (gene data, text data); the relationship between feature selection stability and classification accuracy.

34 PLANNED TASKS: LISTED TASKS. A: Extensive study of the instance weighting framework. A1: extension to various feature selection algorithms. A2: study on data sets from different domains. B: Development of algorithms under the instance weighting framework. B1: development of instance weighting schemes. B2: iterative approach for margin based instance weighting. C: Investigation of the relationship between stable feature selection and classification accuracy. C1: how bias-variance properties of feature selection affect classification accuracy. C2: study of various factors affecting the stability of feature selection. Timeline: Oct-Dec 2010, Jan-Mar 2011, Apr-Jun 2011, Jul-Aug 2011, with tasks A1, A2, B1, B2, C1, and C2 scheduled across these periods.

35 Thank you. Questions?

