Feature Grouping-Based Fuzzy-Rough Feature Selection Richard Jensen Neil Mac Parthaláin Chris Cornelis.


1 Feature Grouping-Based Fuzzy-Rough Feature Selection Richard Jensen Neil Mac Parthaláin Chris Cornelis

2 Outline Motivation/Feature Selection (FS) Rough set theory Fuzzy-rough feature selection Feature grouping Experimentation

3 The problem: too much data The amount of data is growing exponentially – Staggering 4300% annual growth in global data Therefore, there is a need for FS and other data reduction methods – Curse of dimensionality: a problem for machine learning techniques The complexity of the problem is vast – (e.g. the powerset of features for FS)

4 Feature selection Remove features that are: – Noisy – Irrelevant – Misleading Task: find a subset that – Optimises a measure of subset goodness – Has small/minimal cardinality In rough set theory, this is a search for reducts – Much research in this area

5 Rough set theory (RST) For a subset of features P [Figure: a set X shown with its lower approximation and upper approximation, built from equivalence classes [x]_P]

6 Rough set feature selection By considering more features, concepts become easier to define…
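The approximations named on slides 5 and 6 can be illustrated with a small Python sketch. The toy table, the feature names `a` and `b`, and the helper names are assumptions made for illustration and do not come from the slides. The lower approximation collects the equivalence classes that lie wholly inside X, and the upper approximation collects those that overlap X.

```python
from collections import defaultdict

def equivalence_classes(data, features):
    """Partition object indices by their values on the given features."""
    classes = defaultdict(set)
    for idx, row in enumerate(data):
        key = tuple(row[f] for f in features)
        classes[key].add(idx)
    return list(classes.values())

def approximations(data, features, concept):
    """Return the (lower, upper) approximations of the index set `concept`."""
    lower, upper = set(), set()
    for eq in equivalence_classes(data, features):
        if eq <= concept:      # class lies entirely inside the concept
            lower |= eq
        if eq & concept:       # class overlaps the concept
            upper |= eq
    return lower, upper

# Hypothetical toy table: four objects, two crisp features.
data = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
]
X = {0, 2}  # the concept to approximate

lower_a, upper_a = approximations(data, ["a"], X)
lower_ab, upper_ab = approximations(data, ["a", "b"], X)
print(lower_a, upper_a)    # with feature a only
print(lower_ab, upper_ab)  # with features a and b
```

With feature `a` alone the lower approximation is empty; adding `b` splits the classes further, so the lower approximation grows and the upper approximation shrinks, which is the effect slide 6 describes.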

7 Rough set theory Problems: – Rough set methods (usually) require data discretization beforehand – Extensions require thresholds, e.g. tolerance rough sets – Also no flexibility in approximations E.g. objects either belong fully to the lower (or upper) approximation, or not at all

8 Fuzzy-rough sets Extends rough set theory – Use of fuzzy tolerance instead of crisp equivalence – Approximations are fuzzified – Collapses to traditional RST when data is crisp New definitions: Fuzzy upper approximation: Fuzzy lower approximation:
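The two formulas named on slide 8 are not preserved in this transcript. For orientation, the definitions commonly used in the fuzzy-rough literature are given here; the notation is an assumption and may differ from the slide's. R_P is the fuzzy tolerance relation induced by the feature subset P, T is a t-norm, I is a fuzzy implicator, and U is the set of objects.

```latex
\[
\mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\bigl(\mu_{R_P}(x,y),\ \mu_X(y)\bigr)
\]
\[
\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\bigl(\mu_{R_P}(x,y),\ \mu_X(y)\bigr)
\]
```

When R_P is a crisp equivalence relation and X is a crisp set, both expressions reduce to the classical upper and lower approximations, in line with the slide's remark that the model collapses to traditional RST for crisp data.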

9 Fuzzy-rough feature selection Search for reducts – Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts Traditional approach – Greedy hill-climbing algorithm used – Other search techniques have been applied (e.g. PSO) Problems – Complexity is problematic for large data (e.g. over several thousand features) – No explicit handling of redundancy
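The traditional greedy hill-climbing search on slide 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity relation (one minus the range-normalised distance), the use of min to combine features, an implicator satisfying I(a, 0) = 1 - a, crisp decision classes, and all function names are choices made here for the example.

```python
def similarity(data, f, x, y, ranges):
    """Fuzzy tolerance of objects x and y on feature f (1.0 = identical)."""
    return max(0.0, 1.0 - abs(data[x][f] - data[y][f]) / ranges[f])

def dependency(data, labels, subset):
    """Fuzzy-rough dependency of the decision on `subset`, in [0, 1]."""
    subset = list(subset)
    if not subset:
        return 0.0
    n_feats = len(data[0])
    # Value range per feature; `or 1.0` guards against constant features.
    ranges = [(max(r[f] for r in data) - min(r[f] for r in data)) or 1.0
              for f in range(n_feats)]
    total = 0.0
    for x in range(len(data)):
        # With crisp classes, membership of x in the fuzzy positive region
        # is the smallest dissimilarity to any object of another class.
        pos = 1.0
        for y in range(len(data)):
            if labels[y] != labels[x]:
                rel = min(similarity(data, f, x, y, ranges) for f in subset)
                pos = min(pos, 1.0 - rel)
        total += pos
    return total / len(data)

def greedy_hill_climb(data, labels):
    """Add the single best feature each round until no improvement."""
    n_feats = len(data[0])
    full = dependency(data, labels, range(n_feats))
    subset, best = [], 0.0
    while best < full:
        # Every remaining feature is evaluated in every round.
        candidates = [f for f in range(n_feats) if f not in subset]
        scores = {f: dependency(data, labels, subset + [f]) for f in candidates}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break
        subset.append(f_best)
        best = scores[f_best]
    return subset, best

# Hypothetical toy data: feature 2 is an exact copy of feature 0.
data = [[0.0, 0.3, 0.0], [0.1, 0.9, 0.1], [0.9, 0.2, 0.9], [1.0, 0.8, 1.0]]
labels = [0, 0, 1, 1]
subset, best = greedy_hill_climb(data, labels)
print(subset, best)
```

The redundant copy (feature 2) is never selected here, yet it is still re-evaluated in each round. That repeated cost over all remaining features is what becomes problematic when there are thousands of them.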

10 Feature grouping Idea: don’t need to consider all features – Those that are highly correlated with each other carry the same or similar information – Therefore, we can group these, and work on a group by group basis This paper: based on greedy hill-climbing – Group-then-rank approach Relevancy and redundancy handled by – Correlation: similar features grouped together – Internal ranking (correlation with decision feature)

11 Forming groups of features [Figure: from the data, correlations are calculated using a correlation measure and a threshold τ, giving feature groups F1, F2, F3, ..., Fn (redundancy); each group is then internally ranked, e.g. #1 f3, #2 f12, #3 f1, …, #m fn (relevancy)]
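The grouping step on slides 10 and 11 can be sketched in Python. The slides do not name the correlation measure, so Pearson correlation is an assumption here, as are the function names, the toy columns, and the value τ = 0.9. One group is formed per feature from all features whose absolute correlation with it reaches τ, and each group is then ranked by correlation with the decision feature.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def form_groups(columns, decision, tau):
    """Return one internally ranked group per feature, plus relevances."""
    # Relevancy: absolute correlation of each feature with the decision.
    relevance = [abs(pearson(col, decision)) for col in columns]
    groups = []
    for i, ci in enumerate(columns):
        # Redundancy: features correlated with feature i at or above tau.
        members = [j for j, cj in enumerate(columns)
                   if j == i or abs(pearson(ci, cj)) >= tau]
        # Internal ranking: most relevant member first.
        members.sort(key=lambda j: relevance[j], reverse=True)
        groups.append(members)
    return groups, relevance

# Hypothetical toy data: feature 1 is nearly a multiple of feature 0.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],
    [5, 1, 4, 2, 3],
]
decision = [0, 0, 1, 1, 1]
groups, relevance = form_groups(columns, decision, tau=0.9)
print(groups)
```

A lower τ produces larger, coarser groups and a higher τ produces smaller ones, which is the granularity effect attributed to the parameter in the conclusion slide.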

12 Selecting features [Figure, continued from the previous slide: feature subset search and selection, consisting of a search mechanism and subset evaluation, outputs the selected subset(s)]

13 Fuzzy-rough feature grouping

14 Initial experimentation Setup: – 10 datasets (9-2557 features) – 3 classifiers – Stratified 5 x 10-fold cross-validation Performance evaluation in terms of – Subset size – Classification accuracy – Execution time FRFG compared with – Traditional greedy hill-climber (GHC) – GA & PSO (200 generations, population size: 40)

15 Results: average subset size

16 Results: classification accuracy JRip IBk (k=3)

17 Results: execution times (s)

18 Conclusion FRFG – Motivation: reduce computational overhead; improve consideration of redundancy – Group-then-rank approach – Parameter determines granularity of grouping – Weka implementation available: http://bit.ly/1oic2xM Future work – Automatic determination of parameter τ – Experimentation using much larger data, other FS methods, etc – Clustering of features – Unsupervised selection?

19 Thank you!

20 Simple example Dataset of six features After initialisation, the following groups are formed: F1, F2, F3, F4, etc… Within each group, rank determines relevance: e.g. f4 more relevant than f3 Ordering of groups (greedy hill-climber): F = {F4, F1, F3, F5, F2, F6}

21 Simple example... First group to be considered: F4 – Feature f4 is preferable over others – So, add this to current (initially empty) subset R – Evaluate M(R + {f4}): If better score than the current best evaluation, store f4 Current best evaluation = M(R + {f4}) – Set of features which appear in F4: ({f1, f4, f5}) Add to the set Avoids Next feature group with elements that do not appear in Avoids: F1 And so on…
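The walk-through on slides 20 and 21 can be sketched in Python as follows. This is one reading of the slides rather than the published algorithm: each group contributes its highest-ranked feature that is not yet in Avoids, the feature is kept only if it improves the evaluation M, and the group's members are then added to Avoids. Only the group order and the membership of F4 come from the slides; the other group contents and the toy measure are hypothetical.

```python
def select_by_groups(ordered_groups, evaluate):
    """Visit groups in order, testing one representative feature per group."""
    selected, avoids = [], set()
    best = evaluate(selected)
    for group in ordered_groups:
        # Group members are assumed to be listed most relevant first.
        candidates = [f for f in group if f not in avoids]
        if not candidates:
            continue  # every member is already covered by an earlier group
        top = candidates[0]
        score = evaluate(selected + [top])
        if score > best:
            selected.append(top)
            best = score
        # Features similar to the one just tested are not tested again.
        avoids.update(group)
    return selected, best

# Hypothetical toy measure: each feature carries one unit of information,
# and a subset scores the number of distinct units it covers.
INFO = {"f1": "a", "f4": "a", "f5": "a", "f2": "b", "f6": "b", "f3": "c"}
calls = []

def toy_measure(subset):
    calls.append(tuple(subset))
    return len({INFO[f] for f in subset})

# Group order F4, F1, F3, F5, F2, F6 as on slide 20; F4 = {f1, f4, f5}.
ordered_groups = [
    ["f4", "f1", "f5"],  # F4
    ["f4", "f1", "f5"],  # F1 (hypothetical contents)
    ["f3"],              # F3 (hypothetical contents)
    ["f4", "f1", "f5"],  # F5 (hypothetical contents)
    ["f2", "f6"],        # F2 (hypothetical contents)
    ["f2", "f6"],        # F6 (hypothetical contents)
]
selected, best = select_by_groups(ordered_groups, toy_measure)
print(selected, best, len(calls))
```

In this toy run the measure is called four times, once for the empty subset and once per distinct group, whereas a plain greedy hill-climber would evaluate every remaining feature in every round.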

