
NIPS 2001 Workshop on Feature/Variable Selection Isabelle Guyon BIOwulf Technologies.


1 NIPS 2001 Workshop on Feature/Variable Selection Isabelle Guyon BIOwulf Technologies

2 Schedule
7:30-8:00 a.m. Welcome and introduction to the problem of feature/variable selection - Isabelle Guyon
8:00-8:20 a.m. Dimensionality Reduction via Sparse Support Vector Machines - Jinbo Bi, Kristin P. Bennett, Mark Embrechts and Curt Breneman
8:20-8:40 a.m. Feature selection for non-linear SVMs using a gradient descent algorithm - Olivier Chapelle and Jason Weston
8:40-9:00 a.m. When Rather Than Whether: Developmental Variable Selection - Melissa Dominguez
9:00-9:20 a.m. Pause, free discussions
9:20-9:40 a.m. How to recycle your SVM code to do feature selection - Andre Elisseeff and Jason Weston
9:40-10:00 a.m. Lasso-type estimators for variable selection - Yves Grandvalet and Stéphane Canu
10:00-10:30 a.m. Discussion: What are the various statements of the variable selection problem?
4:00-4:20 p.m. Using DRCA to see the effects of variable combinations on classifiers - Ofer Melnik
4:20-4:40 p.m. Feature selection in the setting of many irrelevant features - Andrew Y. Ng and Michael I. Jordan
4:40-5:00 p.m. Relevant coding and information bottlenecks: A principled approach to multivariate feature selection - Naftali Tishby
5:00-5:20 p.m. Learning discriminative feature transforms may be an easier problem than feature selection - Kari Torkkola
5:20-5:30 p.m. Pause
5:30-6:30 p.m. Discussion: Organization of a future workshop with benchmark
6:30-7:00 p.m. Impromptu talks

3 Outline
- Vocabulary: variable vs. feature
- Relevance to the "concept"
- Usefulness to the predictor

4 Relevance to the concept
[Diagram: input variables feeding a system or "concept" that produces an output]
Objectives:
1 - Eliminate distracters
2 - Rank (combinations of) relevant variables

5 A big search problem
- Definition of a distracter: a knob that, when tweaked, causes no change in the input/output relationship for any position of all the other knobs.
- "Exhaustive search": check all knob positions.
- One knob at a time does not work if no single variable alone controls the output.
- For continuous variables: experimental design is needed; greedy "query" strategies.
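The exhaustive check described above can be sketched as follows. The knob system `f`, its discrete `levels`, and the helper `is_distracter` are illustrative assumptions, not from the slides: a knob is declared a distracter only if moving it never changes the output, for every setting of all the other knobs.

```python
from itertools import product

def is_distracter(f, i, levels, n_vars):
    """Exhaustively check whether knob i never affects f's output.

    f: callable taking a tuple of knob settings (the 'system').
    i: index of the knob under test.
    levels: the positions each knob can take, e.g. range(3).
    """
    for setting in product(levels, repeat=n_vars):
        # Try every position of knob i with the other knobs held fixed.
        outputs = {f(setting[:i] + (v,) + setting[i + 1:]) for v in levels}
        if len(outputs) > 1:   # some position of knob i changes the output
            return False
    return True

# Toy system: the output depends on knobs 0 and 1 only; knob 2 is a distracter.
f = lambda x: x[0] + 2 * x[1]
print(is_distracter(f, 2, range(3), 3))  # True
print(is_distracter(f, 0, range(3), 3))  # False
```

The cost is exponential in the number of knobs, which is exactly why the slide calls this "a big search problem".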

6 More difficulties
- Noisy/bad data (imprecise knobs, reading errors, systematic errors).
- Lack of data: cannot perform optimum experimental design.
- Probabilistic definition of a distracter: P(distracter) = the fraction of times that, everything else being equal, a change in the position of the knob does not result in a change in output.
- Continuous case: need to measure the regions of state space in which a knob has little or no effect.
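The probabilistic definition above lends itself to a Monte Carlo estimate: sample random settings, move the knob under test while holding everything else equal, and count how often the output is unchanged. The system `g` and the sampling scheme are illustrative assumptions.

```python
import random

def p_distracter(f, i, levels, n_vars, n_samples=10_000, rng=random.Random(0)):
    """Monte Carlo estimate of P(distracter): the fraction of random settings
    for which moving knob i (everything else equal) leaves the output unchanged."""
    unchanged = 0
    for _ in range(n_samples):
        setting = [rng.choice(levels) for _ in range(n_vars)]
        v = rng.choice([l for l in levels if l != setting[i]])
        moved = setting[:i] + [v] + setting[i + 1:]
        if f(tuple(setting)) == f(tuple(moved)):
            unchanged += 1
    return unchanged / n_samples

# Toy system: knob 2 only matters when knob 0 is at position 0.
g = lambda x: x[1] + (x[2] if x[0] == 0 else 0)
print(p_distracter(g, 2, [0, 1, 2], 3))  # close to 2/3: knob 2 is inert whenever x0 != 0
```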

7 Yet harder
[Diagram: a system whose output is driven by controllable, uncontrollable, and unobservable variables]

8 Tiny example
[Figure: plots of the toy function y = [x1 + 2(x2 - 1)] θ(x1) θ(x2) over the three variables x1, x2, x3 ∈ {0, 1, 2}]

9 Theory and practice
[Figure: y plotted against x1, x2, and x3 for each setting of the pair (x1, x2) ∈ {0, 1, 2}²; the output is the same for any x3]

10 Use of a predictor
If the system is observed only through given examples of inputs/outputs, or if it is expensive to get a lot of data points: build a predictor.
Define a criterion of relevance, e.g.
∫ (f(x1, x2, x3) - f(x1, x2))² dP(x3 | x1, x2)
and approximate it using empirical data.
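The empirical approximation of this criterion is just an average of squared prediction differences over the data. A minimal sketch, where the "trained" predictors `f_full` and `f_sub` and the uniform sampling of the data are assumptions made for illustration:

```python
import random

rng = random.Random(0)

# Hypothetical trained predictors: f_full uses all three inputs,
# f_sub is the best predictor that ignores x3.
f_full = lambda x1, x2, x3: x1 + 2 * x2 + 0.0 * x3
f_sub  = lambda x1, x2:     x1 + 2 * x2

# Empirical relevance of x3: average squared change in prediction.
data = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
crit = sum((f_full(x1, x2, x3) - f_sub(x1, x2)) ** 2
           for x1, x2, x3 in data) / len(data)
print(crit)  # 0.0: x3 is irrelevant under this criterion

# If instead f_full genuinely uses x3, the criterion becomes positive.
f_full2 = lambda x1, x2, x3: x1 + 2 * x2 + x3
f_sub2  = lambda x1, x2:     x1 + 2 * x2 + 0.5  # best guess for uniform x3
crit2 = sum((f_full2(*d) - f_sub2(d[0], d[1])) ** 2 for d in data) / len(data)
print(crit2)  # close to Var(x3) = 1/12
```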

11 Relevance to the concept: weak and strong relevance
Kohavi et al. (classification problem):
- xi is strongly relevant if its removal yields a deterioration of the performance of the Bayes optimal classifier.
- xi is weakly relevant if it is not strongly relevant and there exists a subset of variables S such that the performance on S ∪ {xi} is better than the performance on S.
- Features that are neither strongly nor weakly relevant are irrelevant.
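The classic illustration of weak relevance is a duplicated variable. In the sketch below (the joint distribution and the `bayes_accuracy` helper are constructed for illustration), x2 is an exact copy of x1 and y = x1, so removing x1 costs the Bayes classifier nothing (x1 is not strongly relevant), yet adding x1 to the useless subset {x3} helps (x1 is weakly relevant):

```python
from collections import defaultdict

# Toy joint distribution over (x1, x2, x3) and label y:
# x2 duplicates x1, y = x1, x3 is an independent coin flip; points equally likely.
points = [((a, a, c), a) for a in (0, 1) for c in (0, 1)]

def bayes_accuracy(subset):
    """Accuracy of the Bayes-optimal classifier that sees only `subset` indices."""
    groups = defaultdict(list)
    for x, y in points:
        groups[tuple(x[i] for i in subset)].append(y)
    # For each observed pattern, predict the majority label.
    correct = sum(max(ys.count(0), ys.count(1)) for ys in groups.values())
    return correct / len(points)

print(bayes_accuracy((0, 1, 2)))  # 1.0  full variable set
print(bayes_accuracy((1, 2)))     # 1.0  dropping x1 costs nothing: not strongly relevant
print(bayes_accuracy((2,)))       # 0.5  x3 alone is useless
print(bayes_accuracy((0, 2)))     # 1.0  adding x1 to {x3} helps: weakly relevant
```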

12 Usefulness to the predictor
New objective: make good predictions.
- Find a subset of variables that minimizes an estimate of the generalization error E.
- Find a subset of size n or less that minimizes E.
- Find a subset of minimum size for which E ≤ E_all_var + ε.
Model selection problem: cross-validation, performance bounds, etc.
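The second statement (best subset of size at most n) can be searched exhaustively when the variable count is tiny. A sketch under stated assumptions: synthetic data where only features 0 and 1 matter, a nearest-class-mean classifier as the predictor, and a held-out set as the estimate of E; none of these choices come from the slides.

```python
import random
from itertools import combinations

rng = random.Random(0)

# Synthetic data: the label depends on features 0 and 1; features 2-4 are noise.
def sample(n):
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [1 if x[0] + x[1] > 0 else 0 for x in X]
    return X, y

def holdout_error(subset, Xtr, ytr, Xte, yte):
    """Held-out error of a nearest-class-mean classifier restricted to `subset`."""
    means = {}
    for c in (0, 1):
        rows = [x for x, yy in zip(Xtr, ytr) if yy == c]
        means[c] = [sum(r[i] for r in rows) / len(rows) for i in subset]
    errs = 0
    for x, yy in zip(Xte, yte):
        proj = [x[i] for i in subset]
        pred = min((sum((a - b) ** 2 for a, b in zip(proj, means[c])), c)
                   for c in (0, 1))[1]
        errs += pred != yy
    return errs / len(yte)

Xtr, ytr = sample(200)
Xte, yte = sample(200)
# Exhaustive search over all subsets of size <= 2.
best = min((holdout_error(s, Xtr, ytr, Xte, yte), s)
           for k in (1, 2) for s in combinations(range(5), k))
print(best)  # the informative pair (0, 1) should win
```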

13 Relevance to the concept vs. usefulness to the predictor
A relevant variable may not contribute to getting a better predictor (e.g. in the case of redundant variables). Conversely, a variable that helps improve the performance of a predictor may be irrelevant (e.g. a bias value).

14 Algorithms
- Filters vs. wrappers.
- Exhaustive search.
- Backward elimination vs. forward selection.
- Other greedy search methods (e.g. best first, beam search, compound operators).
- Organization of results.
- Overfitting problems.
[Figure: heat map of expression levels for a ranked list of genes (H64807, R55310, T62947, H08393, ...)]
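The filter family from the first bullet can be sketched in a few lines: score each variable independently of any predictor and rank. The data-generating process and the choice of absolute Pearson correlation as the score are assumptions for illustration.

```python
import random
import math

rng = random.Random(1)

# Synthetic regression data: y = x0 + 2*x1 + noise; features 2-4 are irrelevant.
n = 500
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
y = [x[0] + 2 * x[1] + rng.gauss(0, 0.5) for x in X]

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)

# Filter: rank each feature by |correlation| with the output,
# independently of any downstream predictor.
scores = [abs(pearson([x[i] for x in X], y)) for i in range(5)]
ranking = sorted(range(5), key=lambda i: -scores[i])
print(ranking)  # features 1 and 0 should come first
```

A wrapper would instead re-train the predictor on each candidate subset, which is more expensive but accounts for interactions between variables, exactly the trade-off behind "filters vs. wrappers".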

15 Validation
- Classical statistics (compare with random data).
- Machine learning (predict accuracy with test data).
- Validation with other data (e.g. the medical literature).
[Figure: estimated number of falsely significant genes versus the number of genes called significant]
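The "compare with random data" idea can be sketched with a label permutation: score every feature, count how many clear a significance threshold, then repeat with shuffled labels to estimate how many of those calls would arise by chance alone. The synthetic expression matrix, the mean-difference score, and the threshold are all illustrative assumptions.

```python
import random

rng = random.Random(0)

# 100 features, 40 samples; only the first 5 features differ between the classes.
n, n_informative = 100, 5
labels = [0] * 20 + [1] * 20
data = [[rng.gauss(1.5 if (j < n_informative and lbl == 1) else 0, 1)
         for j in range(n)] for lbl in labels]

def score(col, labels):
    """Absolute difference of class means: a crude significance score."""
    a = [v for v, l in zip(col, labels) if l == 0]
    b = [v for v, l in zip(col, labels) if l == 1]
    return abs(sum(b) / len(b) - sum(a) / len(a))

cols = [[row[j] for row in data] for j in range(n)]
threshold = 0.8
called = sum(score(c, labels) > threshold for c in cols)

# Compare with random data: permute the labels to destroy any real association,
# then count how many features still clear the threshold (false positives).
perm = labels[:]
rng.shuffle(perm)
false_calls = sum(score(c, perm) > threshold for c in cols)
print(called, false_calls)
```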

16 Epilogue
- Do not confuse relevance to the concept and usefulness to the predictor.
- Do not confuse correlation and causality.
- Q1: What are good statements of the variable/feature selection problem?
- Q2: What are good benchmarks?

