

1 Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts. Richard Jensen (Aberystwyth University, UK), Andrew Tuson (City University, UK) and Qiang Shen (Aberystwyth University, UK)

2 Outline: the importance of feature selection; rough set theory; fuzzy-rough feature selection (FRFS); FRFS-SAT; experimentation; conclusion.

3 Why dimensionality reduction/feature selection? Growth of information: need to manage this effectively. Curse of dimensionality: a problem for machine learning. Diagram labels: High dimensional data; Dimensionality Reduction; Low dimensional data; Processing System; Intractable; Feature selection.

4 Rough set theory. Rx is the set of all points that are indiscernible from point x in terms of feature subset B. Diagram labels: Set A; Upper Approximation; Lower Approximation; Equivalence class Rx.
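The approximations named on this slide can be illustrated with a minimal sketch. It assumes the standard crisp definitions (the lower approximation collects equivalence classes wholly inside set A, the upper approximation collects classes that overlap A); the function names and the toy data are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

def equivalence_classes(objects, features):
    """Group object ids whose values agree on every feature in `features`."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[f] for f in features)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(objects, features, target):
    """Return (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for eq_class in equivalence_classes(objects, features):
        if eq_class <= target:   # class lies entirely inside the target set
            lower |= eq_class
        if eq_class & target:    # class overlaps the target set
            upper |= eq_class
    return lower, upper

# Hypothetical toy data: object id -> feature values.
objects = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 0, "b": 0},
}
A = {0, 2}
lower, upper = approximations(objects, ["a", "b"], A)
print(sorted(lower), sorted(upper))
```

Objects 0 and 1 are indiscernible here, so object 0 cannot be placed in the lower approximation of A even though it belongs to A.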

5 Discernibility approach. Decision-relative discernibility matrix: compare objects and examine their attribute values. For the attributes that differ: if the decision values differ, include those attributes in the matrix; else leave the slot blank. Construct discernibility function:
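The matrix construction described on this slide can be sketched as follows. This is an illustrative reading of the bullet points, not the authors' code: each non-blank matrix entry is returned as a clause (a set of attribute names), and the row format, the `decision` key and the toy table are assumptions.

```python
from itertools import combinations

def discernibility_clauses(rows, attributes, decision):
    """Return one clause (frozenset of attributes) per non-blank matrix entry."""
    clauses = []
    for x, y in combinations(rows, 2):      # compare every pair of objects
        if x[decision] == y[decision]:
            continue                        # same decision: slot stays blank
        differing = frozenset(a for a in attributes if x[a] != y[a])
        if differing:
            clauses.append(differing)
    return clauses

# Hypothetical decision table with conditional attributes a, b, c.
rows = [
    {"a": 0, "b": 1, "c": 0, "class": "yes"},
    {"a": 1, "b": 1, "c": 0, "class": "no"},
    {"a": 0, "b": 0, "c": 1, "class": "no"},
    {"a": 0, "b": 1, "c": 1, "class": "yes"},
]
clauses = discernibility_clauses(rows, ["a", "b", "c"], "class")
for clause in clauses:
    print(sorted(clause))
```

Reading each clause as a disjunction and the list as a conjunction gives a discernibility function in CNF, which is the form simplified on the next slide.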

6 Example.
Remove duplicates: f_C(a,b,c,d) = {a ∨ b ∨ c ∨ d} ∧ {a ∨ c ∨ d} ∧ {b ∨ c} ∧ {d} ∧ {a ∨ b ∨ c} ∧ {a ∨ b ∨ d} ∧ {b ∨ c ∨ d} ∧ {a ∨ d}
Remove supersets: f_C(a,b,c,d) = {b ∨ c} ∧ {d}
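The two simplification steps in this example can be reproduced with a short sketch. It assumes clauses are represented as sets of attribute names; the function name `simplify` is hypothetical.

```python
def simplify(clauses):
    """Drop duplicate clauses, then drop any clause that strictly contains another."""
    unique = {frozenset(c) for c in clauses}                      # remove duplicates
    return {c for c in unique
            if not any(other < c for other in unique)}            # remove supersets

# The eight clauses from the slide, after duplicate removal.
slide_clauses = ["abcd", "acd", "bc", "d", "abc", "abd", "bcd", "ad"]
simplified = simplify(slide_clauses)
print(sorted(sorted(c) for c in simplified))
```

Removing supersets is safe because any attribute subset that satisfies {d} also satisfies every clause containing d, so those larger clauses add no constraint.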

7 Finding reducts. It is usually too expensive to search exhaustively for reducts with minimal cardinality. Reducts are found through: converting from CNF to DNF (expensive); hill-climbing search using clauses (non-optimal); other search methods such as GAs (non-optimal). RSAR-SAT: solve directly in the SAT formulation. The DPLL approach is both fast and ensures optimal reducts.
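To make the idea of an exact search over the clause set concrete, here is a small branch-and-bound sketch in the spirit of DPLL. It is not the RSAR-SAT implementation: it simply branches on the attributes of the shortest unsatisfied clause and prunes any branch that cannot beat the best reduct found so far. The function name and the pruning rule are assumptions made for illustration.

```python
def minimal_reduct(clauses):
    """Smallest attribute set that intersects every clause (exact search)."""
    clauses = [frozenset(c) for c in clauses]
    # Start from the trivially valid solution: every attribute that occurs.
    best = set().union(*clauses)

    def search(remaining, chosen):
        nonlocal best
        if not remaining:                    # every clause is satisfied
            if len(chosen) < len(best):
                best = set(chosen)
            return
        if len(chosen) + 1 >= len(best):
            return                           # at least one more attribute is needed: cannot improve
        clause = min(remaining, key=len)     # branch on the shortest clause
        for attr in sorted(clause):
            unsatisfied = [c for c in remaining if attr not in c]
            search(unsatisfied, chosen | {attr})

    search(clauses, frozenset())
    return best

# Clauses from the earlier example slide.
reduct = minimal_reduct(["abcd", "acd", "bc", "d", "abc", "abd", "bcd", "ad"])
print(sorted(reduct))
```

Because some attribute of every clause must be selected, branching only on the attributes of one unsatisfied clause still explores every candidate minimal solution.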

8 Fuzzy discernibility matrices. Extension of the crisp approach: previously, attributes had {0,1} membership to clauses; now they have membership in [0,1]. This allows real-valued data as well as nominal. Fuzzy DMs can be used to find fuzzy-rough reducts.
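A fuzzy clause can be pictured as a mapping from each attribute to a membership degree in [0,1]. The sketch below assumes one possible choice that the slide does not specify: the degree is one minus a range-normalised similarity between the two objects' values. The similarity measure, the `ranges` argument and the function name are all assumptions, and the decision attribute is ignored for brevity.

```python
def fuzzy_clause(x, y, ranges):
    """Map each attribute to the degree in [0, 1] to which it discerns x from y."""
    clause = {}
    for attr, width in ranges.items():
        # Assumed similarity: 1 for identical values, falling linearly to 0.
        similarity = max(0.0, 1.0 - abs(x[attr] - y[attr]) / width)
        clause[attr] = 1.0 - similarity
    return clause

# Hypothetical real-valued objects; `ranges` holds each attribute's value range.
x = {"a": 0.25, "b": 1.0}
y = {"a": 0.75, "b": 1.0}
clause = fuzzy_clause(x, y, {"a": 1.0, "b": 2.0})
print(clause)
```

With nominal data, where values either match or differ, the degrees collapse to 0 or 1 and the clause reduces to the crisp case.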

9 Formulation: fuzzy satisfiability. In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true. For the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true.
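The degree of satisfaction can be sketched under an explicit assumption: the disjunction inside a clause is modelled with the maximum (one possible t-conorm) and the conjunction over clauses with the minimum. The presentation does not state which operators are used, so both choices and the function names are illustrative only.

```python
def clause_satisfaction(clause, selected):
    """Degree to which the attributes set to true satisfy one fuzzy clause."""
    # Assumption: max plays the role of the fuzzy disjunction.
    return max((clause.get(a, 0.0) for a in selected), default=0.0)

def formula_satisfaction(clauses, selected):
    """Degree to which every clause is satisfied (min as fuzzy conjunction)."""
    return min((clause_satisfaction(c, selected) for c in clauses), default=1.0)

# Hypothetical fuzzy clauses: attribute -> membership degree.
clauses = [
    {"a": 0.5, "b": 0.0, "c": 1.0},
    {"a": 0.25, "b": 0.75},
]
print(formula_satisfaction(clauses, {"a"}))
print(formula_satisfaction(clauses, {"b", "c"}))
```

When every membership is 0 or 1, `clause_satisfaction` returns 1 exactly when at least one selected attribute appears in the clause, which matches the crisp SAT rule stated on the slide.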

10 Experimentation: setup. 9 benchmark datasets; features: 10 to 39; objects: 120 to 690. Methods used: FRFS-SAT; greedy hill-climbing (fuzzy dependency, fuzzy boundary region and fuzzy discernibility); evolutionary algorithms (genetic algorithms (GA) and particle swarm optimization (PSO), using fuzzy dependency). 10x10-fold cross-validation: FS performed on the training folds, test folds reduced using the discovered reducts.
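The evaluation protocol on this slide (feature selection on the training folds only, with the test fold reduced afterwards) can be sketched in plain Python. The callables `select_features` and `score` are hypothetical placeholders for a feature selector and a classifier evaluation; the fold-splitting scheme and seed handling are likewise assumptions, not the authors' experimental code.

```python
import random

def repeated_kfold(n_objects, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_objects))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

def run_experiment(data, select_features, score, k=10, repeats=10, seed=0):
    """Average score over all folds, selecting features on training data only."""
    results = []
    for train, test in repeated_kfold(len(data), k, repeats, seed):
        train_rows = [data[i] for i in train]
        test_rows = [data[i] for i in test]
        reduct = select_features(train_rows)   # the test fold is never seen here
        results.append(score(train_rows, test_rows, reduct))
    return sum(results) / len(results)

splits = list(repeated_kfold(120))
print(len(splits))
```

Running the selector inside the loop, rather than once on the full dataset, keeps information from the test fold out of the reduct.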

11 Experimentation: results

12 Conclusion. Extended propositional satisfiability to enable search for fuzzy-rough reducts: new framework for fuzzy satisfiability; new DPLL algorithm; fuzzy clause simplification. Future work: non-chronological backtracking; better heuristics; unsupervised FS; other extensions in propositional satisfiability.

13 WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:

14 Feature selection. Feature selection (FS) is a dimensionality reduction (DR) technique that preserves data semantics (the meaning of the data). Subset generation: forwards, backwards, random… Evaluation function: determines the 'goodness' of subsets. Stopping criterion: decides when to stop the subset search. Diagram labels: Generation; Evaluation; Stopping Criterion; Validation.
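The generation, evaluation and stopping steps listed here fit a simple loop. The sketch below uses forward generation with a greedy choice, which is only one of the options the slide mentions; the `evaluate` callable, the `target` threshold and the toy evaluation function (fraction of discernibility clauses covered) are assumptions for illustration.

```python
def forward_selection(features, evaluate, target=1.0):
    """Greedy forward search: add the best feature until no improvement is possible."""
    selected = set()
    best_score = evaluate(selected)
    while best_score < target:                      # stopping criterion 1: target reached
        # Subset generation: every one-feature extension of the current subset.
        candidates = [(evaluate(selected | {f}), f)
                      for f in sorted(features - selected)]
        if not candidates:
            break
        score, feature = max(candidates)            # evaluation picks the best extension
        if score <= best_score:
            break                                   # stopping criterion 2: no improvement
        selected.add(feature)
        best_score = score
    return selected, best_score

# Toy evaluation: fraction of the clauses {b or c} and {d} that a subset covers.
toy_clauses = [set("bc"), set("d")]

def coverage(subset):
    return sum(1 for c in toy_clauses if c & subset) / len(toy_clauses)

subset, quality = forward_selection(set("abcd"), coverage)
print(sorted(subset), quality)
```

A greedy loop like this is fast but does not guarantee a minimal subset in general.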

15 Algorithm

16 Example

