
1 Fuzzy-rough data mining
Richard Jensen
Advanced Reasoning Group, University of Aberystwyth
rkj@aber.ac.uk
http://users.aber.ac.uk/rkj

2 Outline
Knowledge discovery process
Fuzzy-rough methods:
– Feature selection and extensions
– Instance selection
– Classification/prediction
– Semi-supervised learning

3 Knowledge discovery
The process
The problem of too much data:
– Requires storage
– Intractable for data mining algorithms
– Noisy or irrelevant data is misleading/confounding

4 Feature Selection

5 Feature selection
Why dimensionality reduction/feature selection?
– Growth of information: need to manage this effectively
– Curse of dimensionality: a problem for machine learning and data mining
– Data visualisation: graphing data
(Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction yields low-dimensional data that can be processed.)

6 Why do it?
Case 1: We’re interested in the features
– We want to know which are relevant
– If we fit a model, it should be interpretable
Case 2: We’re interested in prediction
– Features are not interesting in themselves
– We just want to build a good classifier (or other kind of predictor)

7 Feature selection process
Feature selection (FS) preserves data semantics by selecting rather than transforming
Subset generation: forwards, backwards, random…
Evaluation function: determines ‘goodness’ of subsets
Stopping criterion: decide when to stop subset search
(Diagram: generation → evaluation → stopping criterion (continue/stop) → validation, with the feature set and subset suitability passed between stages.)

8 Fuzzy-rough feature selection

9 Fuzzy-rough set theory
Problems with classical rough sets:
– Rough set methods (usually) require data discretization beforehand
– Extensions, e.g. tolerance rough sets, require thresholds
– Also no flexibility in approximations, e.g. objects either belong fully to the lower (or upper) approximation, or not at all

10 Fuzzy-rough sets
(Slide shows the lower and upper approximation definitions: a fuzzy-rough set is defined via an implicator and a t-norm, generalising the crisp rough set.)
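The definitions themselves are in the slide images; for reference, the implicator/t-norm formulation commonly used in the fuzzy-rough literature (and so presumably the one intended here) is:

$$\mu_{\underline{R}X}(x) = \inf_{y \in U} I\big(\mu_R(x,y),\, \mu_X(y)\big), \qquad \mu_{\overline{R}X}(x) = \sup_{y \in U} T\big(\mu_R(x,y),\, \mu_X(y)\big)$$

where R is a fuzzy similarity relation, I an implicator and T a t-norm; with a crisp equivalence relation and a crisp set X these reduce to the classical Pawlak approximations.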

11 Fuzzy-rough feature selection
Based on fuzzy similarity
Lower/upper approximations defined from the similarity relation (examples shown on the slide)
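The similarity relations are again shown as images; one measure commonly used with FRFS (an example, not necessarily the one on the slide) is, per attribute a and combined over a subset B with a t-norm:

$$\mu_{R_a}(x,y) = \max\!\left(0,\ 1 - \frac{|a(x) - a(y)|}{a_{\max} - a_{\min}}\right), \qquad \mu_{R_B}(x,y) = T_{a \in B}\, \mu_{R_a}(x,y)$$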

12 FRFS: evaluation function
Fuzzy positive region #1
Fuzzy positive region #2 (weak)
Dependency function
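Written out, the fuzzy positive region and dependency function used in FRFS take the following standard form (the "weak" variant on the slide is an alternative formulation of the positive region):

$$\mu_{POS_{R_B}(D)}(x) = \sup_{X \in U/D} \mu_{\underline{R_B}X}(x), \qquad \gamma'_B = \frac{\sum_{x \in U} \mu_{POS_{R_B}(D)}(x)}{|U|}$$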

13 FRFS: finding reducts
Fuzzy-rough QuickReduct:
– Evaluation: use the dependency function (or other fuzzy-rough measure)
– Generation: greedy hill-climbing
– Stopping criterion: when the maximal evaluation function value is reached (or to degree α)
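A minimal sketch of this greedy loop, assuming the caller supplies the fuzzy-rough evaluation measure as a callable; the names below are illustrative and not taken from the original implementation:

```python
def quickreduct(features, dependency, alpha=1.0):
    """Greedy hill-climbing search for a (super)reduct.

    `dependency` maps a candidate feature subset to its fuzzy-rough
    dependency degree; the search stops once the best attainable value
    (that of the full feature set, scaled by alpha) is reached or no
    addition improves the measure.
    """
    target = dependency(set(features)) * alpha   # maximal achievable value
    reduct, best = set(), 0.0
    while best < target:
        gains = {f: dependency(reduct | {f}) for f in features if f not in reduct}
        if not gains:
            break                                 # no features left to add
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best:
            break                                 # no further improvement possible
        reduct.add(f_best)
        best = gains[f_best]
    return reduct
```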

14 FRFS
Other search methods:
– GAs, PSO, EDAs, harmony search, etc.
– Backward elimination, plus-L minus-R, floating search, SAT, etc.
Other subset evaluations:
– Fuzzy boundary region
– Fuzzy entropy
– Fuzzy discernibility function

15 Ant-based FS

16 Boundary region
(Diagram: a set X with its lower approximation, upper approximation and boundary region, built from equivalence classes [x]_B.)

17 FRFS: boundary region
The fuzzy lower and upper approximations define a fuzzy boundary region
For each concept, minimise the boundary region (also applicable to crisp RSFS)
Results seem to show this is a more informed heuristic (but more computationally complex)
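One way the fuzzy boundary region is usually written (an assumption about the exact measure used here); the heuristic then minimises the total boundary membership, summed over all decision concepts:

$$\mu_{BND_{R_B}(X)}(x) = \mu_{\overline{R_B}X}(x) - \mu_{\underline{R_B}X}(x)$$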

18 Finding smallest reducts
Usually too expensive to search exhaustively for reducts with minimal cardinality
Reducts found via discernibility matrices through, e.g.:
– Converting from CNF to DNF (expensive)
– Hill-climbing search using clauses (non-optimal)
– Other search methods, GAs etc. (non-optimal)
SAT approach:
– Solve directly in a SAT formulation
– A DPLL approach ensures optimal reducts

19 Fuzzy discernibility matrices
Extension of the crisp approach:
– Previously, attributes had {0,1} membership to clauses
– Now they have membership in [0,1]
Fuzzy DMs can be used to find fuzzy-rough reducts

20 Formulation: fuzzy satisfiability
In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
In the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true
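One plausible way to write this, offered as an assumption about the exact formulation on the slides: an attribute a belongs to the clause C_{x,y} for objects x and y to the degree that a discerns them, and a clause is satisfied by a chosen subset B to the degree of its best discerning member:

$$\mu_{C_{x,y}}(a) = N\big(\mu_{R_a}(x,y)\big), \qquad SAT_B(C_{x,y}) = \sup_{a \in B} \mu_{C_{x,y}}(a)$$

where N is a fuzzy negation and R_a the fuzzy similarity for attribute a.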

21 Example

22 DPLL algorithm

23 Experimentation: results

24 FRFS: issues Problem – noise tolerance!

25 Vaguely quantified rough sets
Pawlak rough set: y belongs to the lower approximation of A iff all elements of Ry belong to A; y belongs to the upper approximation of A iff at least one element of Ry belongs to A
VQRS: y belongs to the lower approximation of A iff most elements of Ry belong to A; y belongs to the upper approximation of A iff at least some elements of Ry belong to A
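Concretely, in the VQRS model the quantifiers "most" and "some" are fuzzy quantifiers applied to the proportion of the neighbourhood Ry that falls in A; a sketch of the usual formulation, with the particular quantifier choices given only as examples:

$$\mu_{\underline{R}^{Q_u}A}(y) = Q_u\!\left(\frac{|Ry \cap A|}{|Ry|}\right), \qquad \mu_{\overline{R}^{Q_l}A}(y) = Q_l\!\left(\frac{|Ry \cap A|}{|Ry|}\right)$$

where, e.g., $Q_u = Q_{(0.2,1)}$ models "most" and $Q_l = Q_{(0.1,0.6)}$ models "some", $Q_{(\alpha,\beta)}$ being a smooth S-shaped quantifier.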

26 VQRS-based feature selection
Use the quantified lower approximation, positive region and dependency degree:
– Evaluation: the quantified dependency (can be crisp or fuzzy)
– Generation: greedy hill-climbing
– Stopping criterion: when the quantified positive region is maximal (or to degree α)
Should be more noise-tolerant, but is non-monotonic

27 Progress
(Diagram: progression from rough set theory (qualitative data) to fuzzy-rough set theory (quantitative data) and on to noise-tolerant extensions such as VQRS, OWA-FRFS and fuzzy VPRS, with monotonicity indicated.)

28 More issues... Problem #1: how to choose fuzzy similarity? Problem #2: how to handle missing values?

29 Interval-valued FRFS
Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity
(Slide shows the interval-valued fuzzy similarity and interval-valued fuzzy-rough set definitions.)

30 Interval-valued FRFS When comparing two object values for a given attribute – what to do if at least one is missing? Answer #2: Model missing values via the unit interval
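A sketch of how this is typically handled, offered as an assumption about the precise scheme on the slides: the per-attribute similarity between two objects becomes an interval, and total ignorance about a missing value maps to the whole unit interval:

$$\mu_{R_a}(x,y) = [\underline{\mu},\ \overline{\mu}] \subseteq [0,1], \qquad \mu_{R_a}(x,y) = [0,1] \ \text{if } a(x) \text{ or } a(y) \text{ is missing}$$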

31 Other measures Boundary region Discernibility function

32 Initial experimentation
(Flowchart: the original dataset is split into cross-validation folds and subjected to data corruption; the folds are reduced by type-1 FRFS and by the IV-FRFS methods, and the reduced folds are classified with JRip.)

33 Initial experimentation

34 Initial results: lower approx

35 Instance Selection

36 Instance selection: basic ideas
Remove objects that are not needed, i.e. those whose removal keeps the underlying approximations unchanged

37 Instance selection: basic ideas
Remove objects whose positive region membership is < 1; these are treated as noisy objects
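A minimal sketch of the idea on this slide, removing every object whose fuzzy positive-region membership falls below a threshold; `pos_membership` is a placeholder for whichever fuzzy-rough positive-region computation is used, not a function from the original implementation:

```python
def select_instances(instances, pos_membership, threshold=1.0):
    """Keep only objects whose positive-region membership reaches the threshold.

    `pos_membership(x, instances)` is assumed to return the degree to which
    object x belongs to the fuzzy positive region computed over the given
    instances; objects below the threshold are treated as noisy and discarded.
    """
    return [x for x in instances
            if pos_membership(x, instances) >= threshold]
```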

38 FRIS-I

39 FRIS-II

40 FRIS-III

41 Fuzzy rough instance selection Time complexity is a problem for FRIS-II and FRIS-III Less complex: Fuzzy rough prototype selection – More on this later...

42 Fuzzy-rough classification and prediction

43 FRNN/VQNN

44

45 Further developments
FRNN and VQNN have limitations (for classification problems):
– FRNN only uses one neighbour
– VQNN is equivalent to FNN if the same similarity relation is used
POSNN uses the positive region to also consider the quality of neighbours:
– E.g. instances in overlapping class regions are less interesting
– More on this later...
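For orientation, a rough sketch of the FRNN decision rule as usually described: for each class, compute the test instance's membership to the fuzzy lower and upper approximations over its nearest neighbours and predict the class with the highest average. The operator choices below (Kleene-Dienes implicator, min t-norm) are example assumptions, not necessarily those used in the original work:

```python
def frnn_classify(x, neighbours, similarity, classes):
    """Fuzzy-rough nearest-neighbour decision rule (sketch).

    neighbours : list of (object, class_label) pairs (e.g. the k nearest)
    similarity : fuzzy similarity function R(x, y) in [0, 1]
    classes    : iterable of possible class labels

    Lower approximation uses an implicator (Kleene-Dienes: max(1 - a, b));
    upper approximation uses a t-norm (min); predict the class that
    maximises their average.
    """
    best_class, best_score = None, -1.0
    for c in classes:
        lower = min(max(1 - similarity(x, y), 1.0 if label == c else 0.0)
                    for y, label in neighbours)
        upper = max(min(similarity(x, y), 1.0 if label == c else 0.0)
                    for y, label in neighbours)
        score = (lower + upper) / 2
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```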

46 Discovering rules via RST
Equivalence classes:
– Form the antecedent part of a rule
– The lower approximation tells us if this is predictive of a given concept (certain rules)
Typically done in one of two ways:
– Overlaying reducts
– Building rules by considering individual equivalence classes (e.g. LEM2)
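A small illustrative example (hypothetical data, not from the slides): if every object in the equivalence class defined by a = 1 and b = 2 lies in the lower approximation of the concept "Yes", that class yields the certain rule "IF a = 1 AND b = 2 THEN decision = Yes".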

47 QuickRules framework
The fuzzy tolerance classes used during this process can be used to create fuzzy rules
When a reduct is found, the resulting rules cover all instances
(Diagram: the FS loop of generation, evaluation and stopping criterion (continue/stop), with rule induction running alongside and validation at the end.)

48 Harmony search approach R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction, Proceedings of the 21st International Conference on Fuzzy Systems, 2012.

49 Harmony search approach
Example: musicians a, b and c each play a note; a harmony is one joint assignment, scored by the fitness to be minimised: (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3
Harmony memory:
a  b  c  score
2  3  1    3
2  3  2    4
4  4  2    9
3  4  5   21

50 Key notion mapping
Harmony Search | Hybrid Rule Induction | Numerical Optimisation
Harmony        | Rule set              | Solution
Fitness        | Combined evaluation   | Evaluation
Musician       | Fuzzy rule r_x        | Variable
Note           | Feature subset        | Value

51 Comparison vs QuickRules
HarmonyRules: 56.33 ± 10.00; QuickRules: 63.1 ± 11.89
(Figure: rule cardinality distribution for the 'web' dataset of 2556 features.)

52 Fuzzy-rough semi-supervised learning

53 Semi-supervised learning (SSL)
Lies somewhere between supervised and unsupervised learning
Why use it?
– Data is expensive to label/classify
– Labels can also be difficult to obtain
– Large amounts of unlabelled data are available
When is SSL useful?
– When there is a small number of labelled objects but a large number of unlabelled objects

54 Semi-supervised learning
A number of methods exist for SSL: self-learning, generative models, etc.
The setting:
– Labelled data objects, usually small in number
– Unlabelled data objects, usually large in number
– A set of features describes the objects
– The class label tells us only which class each labelled object belongs to
SSL therefore attempts to learn labels (or structure) for data which has no labels; labelled data provides ‘clues’ for the unlabelled data

55 Co-training
(Diagram: the labelled dataset is split into two feature subsets; Learner 1 and Learner 2 are trained on subset 1 and subset 2 respectively, and their predictions are used to label the unlabelled data.)

56 Self-learning
(Diagram: a learner trained on the labelled dataset makes predictions on the unlabelled data, and the newly labelled objects are added back to the labelled dataset.)

57 Fuzzy-rough self-learning (FRSL)
Basic idea is to propagate labels using the upper and lower approximations:
– Label only those objects which belong to the lower approximation of a class to a high degree
– Can use the upper approximation to decide on ties
Attempts to minimise mis-labelling and subsequent reinforcement
Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.
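A minimal sketch of the label-propagation loop described on this slide; `lower_membership` stands in for the fuzzy-rough lower approximation computation and is assumed to be supplied by the caller, and tie-breaking via the upper approximation is omitted for brevity:

```python
def frsl(labelled, unlabelled, lower_membership, classes, threshold=1.0):
    """Fuzzy-rough self-learning (sketch of the loop described above).

    Repeatedly label those unlabelled objects whose membership to the
    lower approximation of some class reaches the threshold, add them to
    the labelled set, and stop when no more objects qualify.
    `lower_membership(x, c, labelled)` is assumed to give the degree to
    which x belongs to the fuzzy lower approximation of class c.
    """
    labelled = list(labelled)        # list of (object, label) pairs
    remaining = list(unlabelled)
    changed = True
    while changed and remaining:
        changed = False
        still_unlabelled = []
        for x in remaining:
            scores = {c: lower_membership(x, c, labelled) for c in classes}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                labelled.append((x, best))   # confidently labelled object
                changed = True
            else:
                still_unlabelled.append(x)
        remaining = still_unlabelled
    return labelled, remaining
```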

58 FRSL
(Diagram: the fuzzy-rough learner is trained on the labelled dataset; unlabelled objects whose lower approximation membership equals 1 are added to the labelled data objects, the rest remain unlabelled, and the process repeats.)

59 Experimentation (Problem 1)

60 SS-FCM

61 FNN

62 FRSL

63 Experimentation (Problem 2)

64 SS-FCM

65 FNN

66 FRSL

67 Conclusion
Looked at fuzzy-rough methods for data mining:
– Feature selection, finding optimal reducts
– Handling missing values and other problems
– Classification/prediction
– Instance selection
– Semi-supervised learning
Future work:
– Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting

68 FR methods in Weka
Weka implementations of all fuzzy-rough methods can be downloaded from: http://users.aber.ac.uk/rkj/book/wekafull.jar
KEEL version available soon (hopefully!)

