Feature Selection in Classification and R Packages
Houtao Deng
Data Mining with R, 12/13/2011

Agenda
- Concept of feature selection
- Feature selection methods
- The R packages for feature selection

The need for feature selection
An illustrative example: online shopping prediction.

Customer | Page 1 | Page 2 | Page 3 | … | Page 10,000 | Buy a Book
1        | 1      | 3      | 1      | … | 1           | Yes
2        | 2      | 1      | 0      | … | 2           | Yes
3        | 2      | 0      | 0      | … | 0           | No

(Page 1 through Page 10,000 are the features, i.e., predictive variables or attributes; "Buy a Book" is the class.)

- With 10,000 features, the model is difficult to understand.
- Maybe only a small number of pages are needed, e.g., pages related to books and placing orders.

Feature selection
Feature selection reduces all features to a feature subset, on which a classifier is then built.

Benefits
- Easier to understand
- Less overfitting
- Saves time and space

Applications
- Genomic analysis
- Text classification
- Marketing analysis
- Image classification
- …

Classification accuracy is often used to evaluate the feature selection method used.

Feature selection methods
Univariate filter methods
- Consider one feature's contribution to the class at a time, e.g., information gain, chi-square.
- Advantages: computationally efficient and parallelizable.
- Disadvantages: may select low-quality feature subsets, since interactions among features are ignored.
A minimal filtering sketch with the FSelector package follows.
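The sketch below is illustrative only; the data set (iris) and the choice of keeping the top 2 features are our assumptions, not the slides'. It uses FSelector's information.gain, chi.squared, and cutoff.k.

```r
# Univariate filtering with FSelector on the built-in iris data.
library(FSelector)

# Score each feature individually against the class.
ig  <- information.gain(Species ~ ., iris)
chi <- chi.squared(Species ~ ., iris)

# Keep the 2 highest-scoring features by information gain.
top2 <- cutoff.k(ig, 2)
print(ig)
print(top2)
```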

Feature selection methods
Multivariate filter methods
- Consider the contribution of a set of features to the class variable, e.g.:
  - CFS (correlation-based feature selection) [M. Hall, 2000]
  - FCBF (fast correlation-based filter) [Lei Yu et al., 2003]
- Advantages: computationally efficient; select higher-quality feature subsets than univariate filters.
- Disadvantages: not optimized for a given classifier.
CFS is available in the FSelector package; see the sketch below.
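A minimal CFS sketch (the data set is our choice, not the slides'):

```r
# CFS with FSelector: returns the names of a correlation-selected subset.
library(FSelector)
subset <- cfs(Species ~ ., iris)
print(subset)
```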

Feature selection methods
Wrapper methods
- Select a feature subset by building classifiers, e.g.:
  - LASSO (least absolute shrinkage and selection operator) [R. Tibshirani, 1996]
  - SVM-RFE (SVM with recursive feature elimination) [I. Guyon et al., 2002]
  - RF-RFE (random forest with recursive feature elimination) [R. Uriarte et al.]
  - RRF (regularized random forest) [H. Deng et al., 2011]
- Advantages: select high-quality feature subsets for a particular classifier.
- Disadvantages: RFE methods are relatively computationally expensive.
A minimal SVM-RFE sketch follows this list.
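The sketch below illustrates the RFE idea (rank features by the squared weights of a linear SVM, drop the weakest, and retrain); the helper name svm_rfe, the data, and the stopping rule are our assumptions, not Guyon et al.'s code.

```r
# Minimal SVM-RFE sketch with e1071 (binary classification, linear kernel).
library(e1071)

svm_rfe <- function(x, y, n_keep = 2) {
  features <- colnames(x)
  while (length(features) > n_keep) {
    fit <- svm(x[, features, drop = FALSE], y, kernel = "linear")
    w <- t(fit$coefs) %*% fit$SV            # weight vector of the linear SVM
    drop_idx <- which.min(as.numeric(w)^2)  # least important remaining feature
    features <- features[-drop_idx]         # eliminate it and retrain
  }
  features
}

# Usage on a two-class subset of iris:
iris2 <- droplevels(subset(iris, Species != "setosa"))
svm_rfe(as.matrix(iris2[, 1:4]), iris2$Species, n_keep = 2)
```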

Feature selection methods
Select an appropriate wrapper method for a given classifier:

Feature selection method | Classifier
LASSO                    | Logistic regression
RRF, RF-RFE              | Tree models such as random forest, boosted trees, C4.5
SVM-RFE                  | SVM

R packages
- RWeka package
  - An R interface to Weka
  - A large number of feature selection algorithms:
    - Univariate filters: information gain, chi-square, etc.
    - Multivariate filters: CFS, etc.
    - Wrappers: SVM-RFE
- FSelector package
  - Inherits a few feature selection methods from RWeka.
A small RWeka usage sketch follows.
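A minimal RWeka sketch (assumes a working Java/Weka installation; the data set is our choice):

```r
# Weka's information-gain attribute evaluator, called through RWeka.
library(RWeka)
InfoGainAttributeEval(Species ~ ., data = iris)
```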

R packages
- glmnet package
  - LASSO (least absolute shrinkage and selection operator)
  - Main parameter: penalty parameter 'lambda'
- RRF package
  - RRF (regularized random forest)
  - Main parameter: coefficient of regularization 'coefReg'
- varSelRF package
  - RF-RFE (random forest with recursive feature elimination)
  - Main parameter: number of iterations 'ntreeIterat'
A combined usage sketch of the three packages follows.
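The sketch below exercises each package's main parameter on simulated data; the data, seed, and parameter values are our assumptions, not the slides'.

```r
# LASSO (glmnet), RRF, and RF-RFE (varSelRF) on simulated two-class data.
library(glmnet)
library(RRF)
library(varSelRF)

set.seed(42)
x <- matrix(rnorm(200 * 20), 200, 20)
colnames(x) <- paste0("V", 1:20)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "yes", "no"))  # only V1, V2 matter

# glmnet: alpha = 1 gives the LASSO; cv.glmnet tunes 'lambda'.
cv <- cv.glmnet(x, y, family = "binomial", alpha = 1)
beta <- as.matrix(coef(cv, s = "lambda.min"))
setdiff(rownames(beta)[beta != 0], "(Intercept)")  # selected features

# RRF: 'coefReg' is the regularization coefficient; feaSet holds the subset.
rrf <- RRF(x, y, flagReg = 1, coefReg = 0.8)
colnames(x)[rrf$feaSet]

# varSelRF: 'ntreeIterat' trees are grown in each elimination iteration.
vs <- varSelRF(x, y, ntree = 500, ntreeIterat = 300, vars.drop.frac = 0.2)
vs$selected.vars
```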

Examples
- Consider LASSO, CFS (correlation-based feature selection), RRF (regularized random forest), and RF-RFE (random forest with RFE).
- In all data sets, only 2 of the 100 features are needed for classification.

Data set           | Methods that select the correct features
Linearly separable | LASSO, CFS, RF-RFE, RRF
Nonlinear          | CFS, RF-RFE, RRF
XOR data           | RRF, RF-RFE

A simulation sketch of the XOR case follows.
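The sketch below simulates the XOR case (2 relevant features among 100) and runs RRF; the sample size, seed, and parameter values are our assumptions, not the slides' experimental setup.

```r
# XOR example: the class depends only on V1 and V2 out of 100 features.
library(RRF)

set.seed(1)
n <- 400; p <- 100
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("V", 1:p)
y <- factor(ifelse(xor(x[, 1] > 0, x[, 2] > 0), "A", "B"))

rrf <- RRF(x, y, flagReg = 1, coefReg = 0.8)
colnames(x)[rrf$feaSet]   # ideally recovers V1 and V2, with little else
```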