Feature Selection: Algorithms and Challenges


1 Feature Selection: Algorithms and Challenges
Joint work with Yanglan Gang, Hao Wang & Xuegang Hu
Xindong Wu, University of Vermont, USA; Hefei University of Technology, China (Changjiang Scholar Chair Professor in Computer Applications, Hefei University of Technology)

2 My Research Background: From Deduction to Induction
[Timeline figure: expert systems work, 1988–2004]

3 Outline
Why feature selection
What is feature selection
Components of feature selection
Some research efforts by myself
Challenges in feature selection

4 1. Why Feature Selection?
High-dimensional data often contain irrelevant or redundant features, which:
reduce the accuracy of data mining algorithms
slow down the mining process
pose problems for storage and retrieval
make the results hard to interpret

5 2. What Is Feature Selection?
Select the most “relevant” subset of attributes according to some selection criteria.

6 Outline
Why feature selection
What is feature selection
Components of feature selection
Some research efforts by myself
Challenges in feature selection

7 Traditional Taxonomy
Wrapper approach: features are selected as part of the mining algorithm
Filter approach: features are selected before the mining algorithm runs, using heuristics based on general characteristics of the data, rather than a learning algorithm, to evaluate the merit of feature subsets
The wrapper approach is generally more accurate but also more computationally expensive (both are sketched below).
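As a rough illustration of the taxonomy, here is a minimal Python sketch; scikit-learn, the synthetic dataset, the mutual-information filter, and the k-NN wrapper criterion are all my own choices, not from the slides:

```python
# Filter vs. wrapper: a minimal contrast on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Filter: rank features by a general data characteristic (here,
# mutual information with the class), independent of any learner.
scores = mutual_info_classif(X, y, random_state=0)
filter_top5 = np.argsort(scores)[-5:]

# Wrapper: evaluate a candidate subset by the cross-validated
# accuracy of the target learner itself (more faithful, costlier).
def wrapper_merit(subset):
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()

print("filter picks:", sorted(filter_top5))
print("wrapper merit of that subset:", wrapper_merit(filter_top5))
```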

8 Components of Feature Selection
Feature selection is actually a search problem with four basic components (a generic skeleton is sketched below):
an initial subset
one or more selection criteria (*)
a search strategy (*)
some given stopping conditions
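A minimal sketch of how the four components fit together as one search loop; the function names and the greedy stopping rule are placeholders of mine, to be supplied by a concrete algorithm:

```python
# Generic feature-subset search: initial subset, criterion,
# strategy (successor generation), and stopping conditions.
def feature_search(initial, criterion, successors, should_stop):
    """initial: starting subset; criterion: subset -> merit;
    successors: subset -> candidate subsets; should_stop: state test."""
    best = initial
    best_merit = criterion(best)
    evals = 1
    while not should_stop(best, best_merit, evals):
        candidates = successors(best)
        if not candidates:
            break                         # search space exhausted
        scored = [(criterion(s), s) for s in candidates]
        evals += len(scored)
        merit, subset = max(scored, key=lambda t: t[0])
        if merit <= best_merit:
            break                         # no improvement: stop
        best, best_merit = subset, merit
    return best, best_merit
```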

9 Feature Selection Criteria
Selection criteria generally use "relevance" to estimate the goodness of a selected feature subset in one way or another:
distance measures
information measures
inconsistency measures (sketched below)
relevance estimation
selection criteria tied to a learning algorithm (the wrapper approach)
Some unified frameworks for relevance have been proposed recently.
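Of these, the inconsistency measure is easy to make concrete. A minimal sketch, assuming the usual definition: instances that agree on the selected features but disagree on the class are inconsistent.

```python
# Inconsistency rate of a feature subset over a small dataset.
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """rows: list of feature tuples; subset: indices of kept features.
    Returns the fraction of instances not covered by the majority
    class of their feature-value pattern."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / len(rows)

rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
labels = ["a", "b", "a", "a"]
print(inconsistency_rate(rows, labels, subset=(0,)))  # 0.25
```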

10 Search Strategy
Exhaustive search (sketched below):
every possible subset is evaluated and the best one is chosen
guarantees the optimal solution
low efficiency
A modified approach: branch and bound (B&B)
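A brute-force sketch of exhaustive search; all 2^N - 1 non-empty subsets are scored, which is only feasible for small N:

```python
# Exhaustive search over every non-empty feature subset.
from itertools import chain, combinations

def exhaustive_search(n_features, criterion):
    feats = range(n_features)
    all_subsets = chain.from_iterable(
        combinations(feats, k) for k in range(1, n_features + 1))
    return max(all_subsets, key=criterion)

# e.g., with the inconsistency measure above (lower is better):
# best = exhaustive_search(3, lambda s: -inconsistency_rate(rows, labels, s))
```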

11 Search Strategy (2)
Heuristic search: sequential search, including SFS, SFFS, SBS and SBFS
SFS (sequential forward selection, sketched below):
start with the empty attribute set
add the "best" attribute
add the "best" of the remaining attributes
repeat until the maximum performance is reached
SBS (sequential backward selection):
start with the entire attribute set
remove the "worst" attribute
repeat until the maximum performance is reached
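A minimal sketch of SFS with an abstract criterion; the stop-on-no-improvement rule is one common reading of "until the maximum performance is reached":

```python
# Sequential forward selection: greedily grow the subset while
# each single addition still improves the criterion.
def sfs(n_features, criterion):
    selected, best_merit = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        merit, feat = max((criterion(selected + [f]), f)
                          for f in remaining)
        if merit <= best_merit:
            break                 # no single addition helps any more
        selected.append(feat)
        remaining.remove(feat)
        best_merit = merit
    return selected, best_merit
```

Plugging in the wrapper criterion from the earlier filter-vs-wrapper sketch, sfs(20, wrapper_merit) would give a basic wrapper search; SBS is the mirror image, starting from the full set and removing features.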

12 Search Strategy (3)
Random search proceeds in two different ways:
inject randomness into classical sequential approaches (simulated annealing, beam search, genetic algorithms, and random-start hill-climbing)
generate the next subset randomly (sketched below)
The use of randomness can help to escape local optima in the search space; the optimality of the selected subset then depends on the available resources.
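A minimal sketch of the second flavor, generating each candidate subset at random under a fixed evaluation budget; the budget and the 0.5 inclusion probability are arbitrary choices of mine:

```python
# Pure random subset generation: keep the best subset found
# within a fixed number of criterion evaluations.
import random

def random_search(n_features, criterion, budget=200, seed=0):
    rng = random.Random(seed)
    best, best_merit = None, float("-inf")
    for _ in range(budget):
        subset = tuple(f for f in range(n_features)
                       if rng.random() < 0.5)
        if not subset:
            continue
        merit = criterion(subset)
        if merit > best_merit:
            best, best_merit = subset, merit
    return best, best_merit
```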

13 Outline
Why feature selection
What is feature selection
Components of feature selection
Some research efforts by myself
Challenges in feature selection

14 RITIO: Rule Induction Two In One
Feature selection using information gain in reverse order (the idea is sketched below)
Deletes the features that are least informative
Results are significant compared to forward selection [Wu et al. 1999, TKDE]
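A sketch of the reverse-order idea only, not the published RITIO algorithm; scikit-learn's mutual-information estimate stands in for information gain here:

```python
# Reverse-order elimination: repeatedly drop the feature with the
# lowest estimated information gain until n_keep features remain.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def reverse_elimination(X, y, n_keep):
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        gains = mutual_info_classif(X[:, keep], y, random_state=0)
        keep.pop(int(np.argmin(gains)))   # delete least informative
    return keep
```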

15 Induction as Pre-processing
Use one induction algorithm to select attributes for another induction algorithm (sketched below)
Can be a decision-tree method for rule induction, or vice versa
Accuracy results are not as good as expected
Reason: feature selection normally causes information loss
Details: [Wu 1999, PAKDD]
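A hedged sketch of the idea, with scikit-learn learners standing in for the decision-tree and rule-induction methods of the paper:

```python
# Induction as pre-processing: a decision tree picks the attributes,
# then a second, different learner is trained on just those.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
used = [i for i, imp in enumerate(tree.feature_importances_) if imp > 0]
second = GaussianNB().fit(X[:, used], y)  # second learner, fewer features
```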

16 Subspacing with Asymmetric Bagging
When the number of examples is less than the number of attributes
When the number of positive examples is smaller than the number of negative examples
An example: content-based image retrieval
Details: [Tao et al., 2006, TPAMI]
(A sketch of the idea follows.)
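A hedged sketch of asymmetric bagging combined with random subspaces; this is my simplification of the general idea, not the exact method of Tao et al.: each base learner sees all positives, a same-sized bootstrap of negatives, and a random half of the features.

```python
# Asymmetric bagging + random subspaces for imbalanced data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def asymmetric_bag(X, y, n_models=11, seed=0):
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        # bootstrap the majority (negative) class down to |pos|
        boot_neg = rng.choice(neg, size=len(pos), replace=True)
        rows = np.concatenate([pos, boot_neg])
        # random subspace: a random half of the attributes
        feats = rng.choice(X.shape[1], size=max(1, X.shape[1] // 2),
                           replace=False)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[np.ix_(rows, feats)], y[rows])
        models.append((feats, clf))
    return models  # predict by majority vote over the ensemble
```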

17 Outline
Why feature selection
What is feature selection
Components of feature selection
Some research efforts by myself
Challenges in feature selection

18 Challenges in Feature Selection (1)
Dealing with ultra-high dimensional data and feature interactions
Traditional feature selection encounters two major problems when the dimensionality runs into tens or hundreds of thousands:
1. The curse of dimensionality: as most existing feature selection algorithms have quadratic or higher time complexity in the dimensionality N, it is difficult to scale up.
2. The relative shortage of instances: the dimensionality N can sometimes greatly exceed the number of instances I.

19 Challenges in Feature Selection (2)
Dealing with active instances (Liu et al., 2005)
When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary:
Random sampling: pure random sampling, without exploiting any data characteristics
Active feature selection: selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances (a toy contrast is sketched below)
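A hypothetical illustration of the contrast, not Liu et al.'s actual method; the "selective" sampler below simply over-weights instances far from the data mean, a stand-in for exploiting data characteristics:

```python
# Random vs. (toy) selective instance sampling before feature selection.
import numpy as np

def random_sample(X, n, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice(len(X), size=n, replace=False)

def selective_sample(X, n, seed=0):
    rng = np.random.default_rng(seed)
    # weight instances by distance to the mean so that outlying
    # regions keep representation in the small sample
    w = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return rng.choice(len(X), size=n, replace=False, p=w / w.sum())
```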

20 Challenges in Feature Selection (3)
Dealing with new data types (Liu et al., 2005)
Traditional data type: an N*M data matrix
Due to the growth of computer and Internet/Web technologies, new data types are emerging:
text-based data (e.g., e-mails, online news, newsgroups)
semistructured data (e.g., HTML, XML)
data streams

21 Challenges in Feature Selection (4)
Unsupervised feature selection
Feature selection vs. classification: almost every classification algorithm
Subspace methods and the curse of dimensionality in classification
Subspace clustering

22 Challenges in Feature Selection (5)
Dealing with predictive-but-unpredictable attributes in noisy data
Attribute noise is difficult to process, and removing noisy instances is dangerous
Predictive attributes: essential to classification
Unpredictable attributes: cannot be predicted by the class and the other attributes
Noise identification, cleansing, and measurement need special attention [Yang et al., 2004]

23 Challenges in Feature Selection (6)
Dealing with inconsistent and redundant features
Redundancy can indicate reliability
Inconsistency can also indicate a problem worth handling
For researchers in rough set theory: what is the purpose of feature selection? Can you really demonstrate the usefulness of reduction, in data mining accuracy or otherwise?
Removing attributes can well result in information loss
When the data is very noisy, removals can cause a very different data distribution
Discretization can possibly bring new issues

24 Concluding Remarks
Feature selection is and will remain an important issue in data mining, machine learning, and related disciplines
Feature selection trades accuracy for efficiency
Researchers need to keep the bigger picture in mind, not just do selection for the sake of feature selection

