Data Mining 2, 2006 1. 2 Filter methods T statistic Information Distance Correlation Separability …

Data Mining 2, 2006 1

2 Filter methods T statistic Information Distance Correlation Separability …

Data Mining 2, 2006 3 FSS Algorithms Exponential Exhaustiva search Branch & Bound Beam search Sequential SFS and SBS Plus-l / Minus-r Bidirectional Floating (exponential in worst case) Randomized Sequential + randomness Genetic

Data Mining 2, 2006 4 SFS performs best when the optimal subset has a small number of features When the search is near the empty set, a large number of states can be potentially evaluated Towards the full set, the region examined by SFS is narrower since most of the features have already been selected

Data Mining 2, 2006 5 Example The optimal feature subset turns out to be {x1, x4}, because x4 provides the only information that x1 needs: discrimination between classes ω4 and ω5

Data Mining 2, 2006 6 SBS works best when the optimal feature subset has a large number of features, since it spends most of its time visiting large subsets The main limitation of SBS is its inability to reevaluate the usefulness of a feature after it has been discarded

Data Mining 2, 2006 7 SFS is performed from the empty set SBS is performed from the full set Features selected by SFS are not removed by SBS Features removed by SBS are not selected by SFS Guarantee convergence

9 Some backtracking ability Main limitation is that there is no theoretical way of predicting the optimal values of l and r

Data Mining 2, 2006 12 Monotonicity in J:

Data Mining 2, 2006 13 The value of A is updated when a greater one is found in a leaf Stop whenever every leaf has been purged or evaluated A=J({1,2}) B A >= B  purge

Data Mining 2, 2006 16 With a proper queue size, Beam Search can avoid getting trapped in local minimal by preserving solutions from varying regions in the search space The optimal is 2-3-4 (J=9), which is never explored

Data Mining 2, 2006 21 The RELIEF algorithm (1)

Data Mining 2, 2006 22 The RELIEF algorithm (2)

Data Mining 2, 2006 23 Conclusions Dificult and pervasive problem! Lack of accepted and useful definitions for relevance, redundancy and irrelevance Nesting problem Abundance of algorithms and filters Lack of proper comparative benchmarks Obligatory step, usually well worth the pain

Data Mining 2, 2006 1. 2 Filter methods T statistic Information Distance Correlation Separability …

Similar presentations

Presentation on theme: "Data Mining 2, 2006 1. 2 Filter methods T statistic Information Distance Correlation Separability …"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Data Mining 2, 2006 1. 2 Filter methods T statistic Information Distance Correlation Separability …

Similar presentations

Presentation on theme: "Data Mining 2, 2006 1. 2 Filter methods T statistic Information Distance Correlation Separability …"— Presentation transcript:

Similar presentations

About project

Feedback