
1 Fuzzy-rough data mining
Richard Jensen
Advanced Reasoning Group, University of Aberystwyth
rkj@aber.ac.uk
http://users.aber.ac.uk/rkj

2 Outline
Knowledge discovery process
Fuzzy-rough methods:
– Feature selection and extensions
– Instance selection
– Classification/prediction
– Semi-supervised learning

3 Knowledge discovery
The process
The problem of too much data:
– Requires storage
– Intractable for data mining algorithms
– Noisy or irrelevant data is misleading/confounding

4 Feature Selection

5 Feature selection
Why dimensionality reduction/feature selection?
– Growth of information: need to manage this effectively
– Curse of dimensionality: a problem for machine learning and data mining
– Data visualisation: graphing data
(Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction yields low-dimensional data that can be processed.)

6 Why do it?
Case 1: We’re interested in the features
– We want to know which are relevant
– If we fit a model, it should be interpretable
Case 2: We’re interested in prediction
– Features are not interesting in themselves
– We just want to build a good classifier (or other kind of predictor)

7 Feature selection process
Feature selection (FS) preserves data semantics by selecting rather than transforming
Subset generation: forwards, backwards, random…
Evaluation function: determines ‘goodness’ of subsets
Stopping criterion: decide when to stop subset search
(Diagram: generation → evaluation → stopping criterion (continue/stop) → validation, with the feature set and subset suitability passed between stages.)

8 Fuzzy-rough feature selection

9 Fuzzy-rough set theory
Problems with classical rough sets:
– Rough set methods (usually) require data discretization beforehand
– Extensions, e.g. tolerance rough sets, require thresholds
– Also no flexibility in approximations, e.g. objects either belong fully to the lower (or upper) approximation, or not at all

10 Fuzzy-rough sets
(Slide shows the lower and upper approximation definitions: a fuzzy-rough set is defined via an implicator and a t-norm, generalising the crisp rough set.)
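The definitions themselves are in the slide images; for reference, the implicator/t-norm formulation commonly used in the fuzzy-rough literature (and so presumably the one intended here) is:

$$\mu_{\underline{R}X}(x) = \inf_{y \in U} I\big(\mu_R(x,y),\, \mu_X(y)\big), \qquad \mu_{\overline{R}X}(x) = \sup_{y \in U} T\big(\mu_R(x,y),\, \mu_X(y)\big)$$

where R is a fuzzy similarity relation, I an implicator and T a t-norm; with a crisp equivalence relation and a crisp set X these reduce to the classical Pawlak approximations.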

11 Fuzzy-rough feature selection
Based on fuzzy similarity
Lower/upper approximations defined from the similarity relation (examples shown on the slide)
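The similarity relations are again shown as images; one measure commonly used with FRFS (an example, not necessarily the one on the slide) is, per attribute a and combined over a subset B with a t-norm:

$$\mu_{R_a}(x,y) = \max\!\left(0,\ 1 - \frac{|a(x) - a(y)|}{a_{\max} - a_{\min}}\right), \qquad \mu_{R_B}(x,y) = T_{a \in B}\, \mu_{R_a}(x,y)$$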

12 FRFS: evaluation function
Fuzzy positive region #1
Fuzzy positive region #2 (weak)
Dependency function
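Written out, the fuzzy positive region and dependency function used in FRFS take the following standard form (the "weak" variant on the slide is an alternative formulation of the positive region):

$$\mu_{POS_{R_B}(D)}(x) = \sup_{X \in U/D} \mu_{\underline{R_B}X}(x), \qquad \gamma'_B = \frac{\sum_{x \in U} \mu_{POS_{R_B}(D)}(x)}{|U|}$$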

13 FRFS: finding reducts
Fuzzy-rough QuickReduct:
– Evaluation: use the dependency function (or other fuzzy-rough measure)
– Generation: greedy hill-climbing
– Stopping criterion: when the maximal evaluation function value is reached (or to degree α)
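A minimal sketch of this greedy loop, assuming the caller supplies the fuzzy-rough evaluation measure as a callable; the names below are illustrative and not taken from the original implementation:

```python
def quickreduct(features, dependency, alpha=1.0):
    """Greedy hill-climbing search for a (super)reduct.

    `dependency` maps a candidate feature subset to its fuzzy-rough
    dependency degree; the search stops once the best attainable value
    (that of the full feature set, scaled by alpha) is reached or no
    addition improves the measure.
    """
    target = dependency(set(features)) * alpha   # maximal achievable value
    reduct, best = set(), 0.0
    while best < target:
        gains = {f: dependency(reduct | {f}) for f in features if f not in reduct}
        if not gains:
            break                                 # no features left to add
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best:
            break                                 # no further improvement possible
        reduct.add(f_best)
        best = gains[f_best]
    return reduct
```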

14 FRFS
Other search methods:
– GAs, PSO, EDAs, harmony search, etc.
– Backward elimination, plus-L minus-R, floating search, SAT, etc.
Other subset evaluations:
– Fuzzy boundary region
– Fuzzy entropy
– Fuzzy discernibility function

15 Ant-based FS

16 Boundary region
(Diagram: a set X with its lower approximation, upper approximation and boundary region, built from equivalence classes [x]_B.)

17 FRFS: boundary region
The fuzzy lower and upper approximations define a fuzzy boundary region
For each concept, minimise the boundary region (also applicable to crisp RSFS)
Results seem to show this is a more informed heuristic (but more computationally complex)
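One way the fuzzy boundary region is usually written (an assumption about the exact measure used here); the heuristic then minimises the total boundary membership, summed over all decision concepts:

$$\mu_{BND_{R_B}(X)}(x) = \mu_{\overline{R_B}X}(x) - \mu_{\underline{R_B}X}(x)$$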

18 Finding smallest reducts
Usually too expensive to search exhaustively for reducts with minimal cardinality
Reducts found via discernibility matrices through, e.g.:
– Converting from CNF to DNF (expensive)
– Hill-climbing search using clauses (non-optimal)
– Other search methods, GAs etc. (non-optimal)
SAT approach:
– Solve directly in a SAT formulation
– A DPLL approach ensures optimal reducts

19 Fuzzy discernibility matrices
Extension of the crisp approach:
– Previously, attributes had {0,1} membership to clauses
– Now they have membership in [0,1]
Fuzzy DMs can be used to find fuzzy-rough reducts

20 Formulation: fuzzy satisfiability
In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
In the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true
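One plausible way to write this, offered as an assumption about the exact formulation on the slides: an attribute a belongs to the clause C_{x,y} for objects x and y to the degree that a discerns them, and a clause is satisfied by a chosen subset B to the degree of its best discerning member:

$$\mu_{C_{x,y}}(a) = N\big(\mu_{R_a}(x,y)\big), \qquad SAT_B(C_{x,y}) = \sup_{a \in B} \mu_{C_{x,y}}(a)$$

where N is a fuzzy negation and R_a the fuzzy similarity for attribute a.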

21 Example

22 DPLL algorithm

23 Experimentation: results

24 FRFS: issues Problem – noise tolerance!

25 Vaguely quantified rough sets
Pawlak rough set: y belongs to the lower approximation of A iff all elements of Ry belong to A; y belongs to the upper approximation of A iff at least one element of Ry belongs to A
VQRS: y belongs to the lower approximation of A iff most elements of Ry belong to A; y belongs to the upper approximation of A iff at least some elements of Ry belong to A
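Concretely, in the VQRS model the quantifiers "most" and "some" are fuzzy quantifiers applied to the proportion of the neighbourhood Ry that falls in A; a sketch of the usual formulation, with the particular quantifier choices given only as examples:

$$\mu_{\underline{R}^{Q_u}A}(y) = Q_u\!\left(\frac{|Ry \cap A|}{|Ry|}\right), \qquad \mu_{\overline{R}^{Q_l}A}(y) = Q_l\!\left(\frac{|Ry \cap A|}{|Ry|}\right)$$

where, e.g., $Q_u = Q_{(0.2,1)}$ models "most" and $Q_l = Q_{(0.1,0.6)}$ models "some", $Q_{(\alpha,\beta)}$ being a smooth S-shaped quantifier.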

26 VQRS-based feature selection
Use the quantified lower approximation, positive region and dependency degree:
– Evaluation: the quantified dependency (can be crisp or fuzzy)
– Generation: greedy hill-climbing
– Stopping criterion: when the quantified positive region is maximal (or to degree α)
Should be more noise-tolerant, but is non-monotonic

27 Progress
(Diagram: progression from rough set theory (qualitative data) to fuzzy-rough set theory (quantitative data) and on to noise-tolerant extensions such as VQRS, OWA-FRFS and fuzzy VPRS, with monotonicity indicated.)

28 More issues... Problem #1: how to choose fuzzy similarity? Problem #2: how to handle missing values?

29 Interval-valued FRFS
Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity
(Slide shows the interval-valued fuzzy similarity and interval-valued fuzzy-rough set definitions.)

30 Interval-valued FRFS When comparing two object values for a given attribute – what to do if at least one is missing? Answer #2: Model missing values via the unit interval
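A sketch of how this is typically handled, offered as an assumption about the precise scheme on the slides: the per-attribute similarity between two objects becomes an interval, and total ignorance about a missing value maps to the whole unit interval:

$$\mu_{R_a}(x,y) = [\underline{\mu},\ \overline{\mu}] \subseteq [0,1], \qquad \mu_{R_a}(x,y) = [0,1] \ \text{if } a(x) \text{ or } a(y) \text{ is missing}$$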

31 Other measures Boundary region Discernibility function

32 Initial experimentation
(Flowchart: the original dataset is split into cross-validation folds and subjected to data corruption; the folds are reduced by type-1 FRFS and by the IV-FRFS methods, and the reduced folds are classified with JRip.)

33 Initial experimentation

34 Initial results: lower approx

35 Instance Selection

36 Instance selection: basic ideas
Remove objects that are not needed, i.e. those whose removal keeps the underlying approximations unchanged

37 Instance selection: basic ideas
Remove objects whose positive region membership is < 1; these are treated as noisy objects
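A minimal sketch of the idea on this slide, removing every object whose fuzzy positive-region membership falls below a threshold; `pos_membership` is a placeholder for whichever fuzzy-rough positive-region computation is used, not a function from the original implementation:

```python
def select_instances(instances, pos_membership, threshold=1.0):
    """Keep only objects whose positive-region membership reaches the threshold.

    `pos_membership(x, instances)` is assumed to return the degree to which
    object x belongs to the fuzzy positive region computed over the given
    instances; objects below the threshold are treated as noisy and discarded.
    """
    return [x for x in instances
            if pos_membership(x, instances) >= threshold]
```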

38 FRIS-I

39 FRIS-II

40 FRIS-III

41 Fuzzy rough instance selection Time complexity is a problem for FRIS-II and FRIS-III Less complex: Fuzzy rough prototype selection – More on this later...

42 Fuzzy-rough classification and prediction

43 FRNN/VQNN

44

45 Further developments
FRNN and VQNN have limitations (for classification problems):
– FRNN only uses one neighbour
– VQNN is equivalent to FNN if the same similarity relation is used
POSNN uses the positive region to also consider the quality of neighbours:
– E.g. instances in overlapping class regions are less interesting
– More on this later...
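For orientation, a rough sketch of the FRNN decision rule as usually described: for each class, compute the test instance's membership to the fuzzy lower and upper approximations over its nearest neighbours and predict the class with the highest average. The operator choices below (Kleene-Dienes implicator, min t-norm) are example assumptions, not necessarily those used in the original work:

```python
def frnn_classify(x, neighbours, similarity, classes):
    """Fuzzy-rough nearest-neighbour decision rule (sketch).

    neighbours : list of (object, class_label) pairs (e.g. the k nearest)
    similarity : fuzzy similarity function R(x, y) in [0, 1]
    classes    : iterable of possible class labels

    Lower approximation uses an implicator (Kleene-Dienes: max(1 - a, b));
    upper approximation uses a t-norm (min); predict the class that
    maximises their average.
    """
    best_class, best_score = None, -1.0
    for c in classes:
        lower = min(max(1 - similarity(x, y), 1.0 if label == c else 0.0)
                    for y, label in neighbours)
        upper = max(min(similarity(x, y), 1.0 if label == c else 0.0)
                    for y, label in neighbours)
        score = (lower + upper) / 2
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```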

46 Discovering rules via RST
Equivalence classes:
– Form the antecedent part of a rule
– The lower approximation tells us if this is predictive of a given concept (certain rules)
Typically done in one of two ways:
– Overlaying reducts
– Building rules by considering individual equivalence classes (e.g. LEM2)
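A small illustrative example (hypothetical data, not from the slides): if every object in the equivalence class defined by a = 1 and b = 2 lies in the lower approximation of the concept "Yes", that class yields the certain rule "IF a = 1 AND b = 2 THEN decision = Yes".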

47 QuickRules framework
The fuzzy tolerance classes used during this process can be used to create fuzzy rules
When a reduct is found, the resulting rules cover all instances
(Diagram: the FS loop of generation, evaluation and stopping criterion (continue/stop), with rule induction running alongside and validation at the end.)

48 Harmony search approach R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction, Proceedings of the 21st International Conference on Fuzzy Systems, 2012.

49 Harmony search approach
Example: musicians a, b and c each play a note; a harmony is one joint assignment, scored by the fitness to be minimised: (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3
Harmony memory:
a  b  c  score
2  3  1    3
2  3  2    4
4  4  2    9
3  4  5   21

50 Key notion mapping
Harmony Search | Hybrid Rule Induction | Numerical Optimisation
Harmony        | Rule set              | Solution
Fitness        | Combined evaluation   | Evaluation
Musician       | Fuzzy rule r_x        | Variable
Note           | Feature subset        | Value

51 Comparison vs QuickRules
HarmonyRules: 56.33 ± 10.00; QuickRules: 63.1 ± 11.89
(Figure: rule cardinality distribution for the 'web' dataset of 2556 features.)

52 Fuzzy-rough semi-supervised learning

53 Semi-supervised learning (SSL)
Lies somewhere between supervised and unsupervised learning
Why use it?
– Data is expensive to label/classify
– Labels can also be difficult to obtain
– Large amounts of unlabelled data are available
When is SSL useful?
– When there is a small number of labelled objects but a large number of unlabelled objects

54 Semi-supervised learning
A number of methods exist for SSL: self-learning, generative models, etc.
The setting:
– Labelled data objects, usually small in number
– Unlabelled data objects, usually large in number
– A set of features describes the objects
– The class label tells us only which class each labelled object belongs to
SSL therefore attempts to learn labels (or structure) for data which has no labels; labelled data provides ‘clues’ for the unlabelled data

55 Co-training
(Diagram: the labelled dataset is split into two feature subsets; Learner 1 and Learner 2 are trained on subset 1 and subset 2 respectively, and their predictions are used to label the unlabelled data.)

56 Self-learning
(Diagram: a learner trained on the labelled dataset makes predictions on the unlabelled data, and the newly labelled objects are added back to the labelled dataset.)

57 Fuzzy-rough self-learning (FRSL)
Basic idea is to propagate labels using the upper and lower approximations:
– Label only those objects which belong to the lower approximation of a class to a high degree
– Can use the upper approximation to decide on ties
Attempts to minimise mis-labelling and subsequent reinforcement
Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.
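A minimal sketch of the label-propagation loop described on this slide; `lower_membership` stands in for the fuzzy-rough lower approximation computation and is assumed to be supplied by the caller, and tie-breaking via the upper approximation is omitted for brevity:

```python
def frsl(labelled, unlabelled, lower_membership, classes, threshold=1.0):
    """Fuzzy-rough self-learning (sketch of the loop described above).

    Repeatedly label those unlabelled objects whose membership to the
    lower approximation of some class reaches the threshold, add them to
    the labelled set, and stop when no more objects qualify.
    `lower_membership(x, c, labelled)` is assumed to give the degree to
    which x belongs to the fuzzy lower approximation of class c.
    """
    labelled = list(labelled)        # list of (object, label) pairs
    remaining = list(unlabelled)
    changed = True
    while changed and remaining:
        changed = False
        still_unlabelled = []
        for x in remaining:
            scores = {c: lower_membership(x, c, labelled) for c in classes}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                labelled.append((x, best))   # confidently labelled object
                changed = True
            else:
                still_unlabelled.append(x)
        remaining = still_unlabelled
    return labelled, remaining
```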

58 FRSL
(Diagram: the fuzzy-rough learner is trained on the labelled dataset; unlabelled objects whose lower approximation membership equals 1 are added to the labelled data objects, the rest remain unlabelled, and the process repeats.)

59 Experimentation (Problem 1)

60 SS-FCM

61 FNN

62 FRSL

63 Experimentation (Problem 2)

64 SS-FCM

65 FNN

66 FRSL

67 Conclusion
Looked at fuzzy-rough methods for data mining:
– Feature selection, finding optimal reducts
– Handling missing values and other problems
– Classification/prediction
– Instance selection
– Semi-supervised learning
Future work:
– Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting

68 FR methods in Weka
Weka implementations of all fuzzy-rough methods can be downloaded from: http://users.aber.ac.uk/rkj/book/wekafull.jar
KEEL version available soon (hopefully!)

