Rough Sets. Basic Concepts of Rough Sets _ Information/Decision Systems (Tables) _ Indiscernibility _ Set Approximation _ Reducts and Core _ Rough Membership.

_ Rough Sets

Basic Concepts of Rough Sets _ Information/Decision Systems (Tables) _ Indiscernibility _ Set Approximation _ Reducts and Core _ Rough Membership _ Dependency of Attributes

Information Systems/Tables _ An information system IS is a pair (U, A). _ U is a non-empty finite set of objects. _ A is a non-empty finite set of attributes such that $a: U \rightarrow V_a$ for every $a \in A$; $V_a$ is called the value set of a. _ Example: a table with objects x1, …, x7 and condition attributes Age and LEMS (shown together with the decision attribute Walk on the next slide).

Decision Systems/Tables _ DS: $T = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute (instead of one we can consider more decision attributes). _ The elements of A are called the condition attributes. _ Example:

        Age    LEMS   Walk
x1      16-30  50     yes
x2      16-30  0      no
x3      31-45  1-25   no
x4      31-45  1-25   yes
x5      46-60  26-49  no
x6      16-30  26-49  yes
x7      46-60  26-49  no

Indiscernibility _ The equivalence relation: a binary relation $R \subseteq X \times X$ which is reflexive (xRx for any object x), symmetric (if xRy then yRx), and transitive (if xRy and yRz then xRz). _ The equivalence class $[x]_R$ of an element $x \in X$ consists of all objects $y \in X$ such that xRy.

Indiscernibility (2) _ Let IS = (U, A) be an information system; then with any $B \subseteq A$ there is an associated equivalence relation $IND_{IS}(B) = \{(x, x') \in U^2 \mid \forall a \in B,\ a(x) = a(x')\}$, where $IND_{IS}(B)$ is called the B-indiscernibility relation. _ If $(x, x') \in IND_{IS}(B)$, then objects x and x’ are indiscernible from each other by attributes from B. _ The equivalence classes of the B-indiscernibility relation are denoted $[x]_B$.

An Example of Indiscernibility _ The non-empty subsets of the condition attributes are {Age}, {LEMS}, and {Age, LEMS}. _ IND({Age}) = {{x1,x2,x6}, {x3,x4}, {x5,x7}} _ IND({LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x6,x7}} _ IND({Age,LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x7}, {x6}} (computed from the decision table above).
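
As a concrete illustration (added here, not part of the original slides), a minimal Python sketch that computes the B-indiscernibility classes for the Walk table above; the dictionary encoding of the table is my own.

```python
from collections import defaultdict

# The Walk decision table from the slides: condition attributes Age, LEMS.
table = {
    "x1": {"Age": "16-30", "LEMS": "50",    "Walk": "yes"},
    "x2": {"Age": "16-30", "LEMS": "0",     "Walk": "no"},
    "x3": {"Age": "31-45", "LEMS": "1-25",  "Walk": "no"},
    "x4": {"Age": "31-45", "LEMS": "1-25",  "Walk": "yes"},
    "x5": {"Age": "46-60", "LEMS": "26-49", "Walk": "no"},
    "x6": {"Age": "16-30", "LEMS": "26-49", "Walk": "yes"},
    "x7": {"Age": "46-60", "LEMS": "26-49", "Walk": "no"},
}

def ind_classes(table, B):
    """Partition the universe into B-indiscernibility classes:
    objects with identical values on every attribute in B."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in B)
        classes[key].add(obj)
    return list(classes.values())

print(ind_classes(table, ["Age"]))          # {x1,x2,x6}, {x3,x4}, {x5,x7}
print(ind_classes(table, ["LEMS"]))         # {x1}, {x2}, {x3,x4}, {x5,x6,x7}
print(ind_classes(table, ["Age", "LEMS"]))  # {x1}, {x2}, {x3,x4}, {x5,x7}, {x6}
```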

Observations _ An equivalence relation induces a partitioning of the universe. _ The partitions can be used to build new subsets of the universe. _ The subsets most often of interest are those whose elements share the same value of the decision attribute. It may happen, however, that a concept such as “Walk” cannot be defined in a crisp manner.

Set Approximation _ Let T = (U, A), $B \subseteq A$ and $X \subseteq U$. We can approximate X using only the information contained in B by constructing the B-lower and B-upper approximations of X, denoted $\underline{B}X$ and $\overline{B}X$ respectively, where $\underline{B}X = \{x \mid [x]_B \subseteq X\}$ and $\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\}$.

Set Approximation (2) _ The B-boundary region of X, $BN_B(X) = \overline{B}X - \underline{B}X$, consists of those objects that we cannot decisively classify into X on the basis of B. _ The B-outside region of X, $U - \overline{B}X$, consists of those objects that can be classified with certainty as not belonging to X. _ A set is said to be rough if its boundary region is non-empty; otherwise the set is crisp.

An Example of Set Approximation _ Let W = {x | Walk(x) = yes}. For the decision table above, $\underline{A}W = \{x1, x6\}$, $\overline{A}W = \{x1, x3, x4, x6\}$ and $BN_A(W) = \{x3, x4\}$. _ The decision class Walk is rough since the boundary region is not empty.

An Example of Set Approximation (2) _ yes (lower approximation $\underline{A}W$): {{x1}, {x6}} _ yes/no (boundary region): {{x3, x4}} _ no (outside region): {{x2}, {x5, x7}}
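
A small Python sketch (my addition, reusing `ind_classes` and `table` from the earlier sketch) that computes the lower and upper approximations, the boundary and the outside region for the Walk example:

```python
def lower_approx(classes, X):
    """B-lower approximation: union of the classes fully contained in X."""
    return set().union(*(c for c in classes if c <= X))

def upper_approx(classes, X):
    """B-upper approximation: union of the classes that intersect X."""
    return set().union(*(c for c in classes if c & X))

A_attrs = ["Age", "LEMS"]
classes = ind_classes(table, A_attrs)
W = {x for x, row in table.items() if row["Walk"] == "yes"}   # {x1, x4, x6}

low = lower_approx(classes, W)    # {x1, x6}
up = upper_approx(classes, W)     # {x1, x3, x4, x6}
print(low, up, up - low, set(table) - up)   # boundary {x3, x4}, outside {x2, x5, x7}
```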

Lower & Upper Approximations (diagram: the universe U partitioned into the classes of U/R, where R is a subset of attributes, with a target set X drawn over the equivalence classes)

Lower & Upper Approximations (2) _ Lower approximation: $\underline{R}X = \bigcup \{Y \in U/R \mid Y \subseteq X\}$ _ Upper approximation: $\overline{R}X = \bigcup \{Y \in U/R \mid Y \cap X \neq \emptyset\}$

Lower & Upper Approximations (3) _ The indiscernibility classes defined by R = {Headache, Temp.} are {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}. _ X1 = {u | Flu(u) = yes} = {u2, u3, u6, u7}: $\underline{R}X1$ = {u2, u3}, $\overline{R}X1$ = {u2, u3, u6, u7, u8, u5}. _ X2 = {u | Flu(u) = no} = {u1, u4, u5, u8}: $\underline{R}X2$ = {u1, u4}, $\overline{R}X2$ = {u1, u4, u5, u8, u7, u6}.

Lower & Upper Approximations (4) _ R = {Headache, Temp.}, U/R = {{u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8}} _ X1 = {u | Flu(u) = yes} = {u2, u3, u6, u7}: $\underline{R}X1$ = {u2, u3}, $\overline{R}X1$ = {u2, u3, u6, u7, u8, u5} _ X2 = {u | Flu(u) = no} = {u1, u4, u5, u8}: $\underline{R}X2$ = {u1, u4}, $\overline{R}X2$ = {u1, u4, u5, u8, u7, u6} (diagram of X1, X2 and the equivalence classes omitted)

Properties of Approximations _ $\underline{B}X \subseteq X \subseteq \overline{B}X$ _ $\underline{B}\emptyset = \overline{B}\emptyset = \emptyset$, $\underline{B}U = \overline{B}U = U$ _ $\overline{B}(X \cup Y) = \overline{B}X \cup \overline{B}Y$ _ $\underline{B}(X \cap Y) = \underline{B}X \cap \underline{B}Y$ _ $X \subseteq Y$ implies $\underline{B}X \subseteq \underline{B}Y$ and $\overline{B}X \subseteq \overline{B}Y$

Properties of Approximations (2) _ $\underline{B}(X \cup Y) \supseteq \underline{B}X \cup \underline{B}Y$ _ $\overline{B}(X \cap Y) \subseteq \overline{B}X \cap \overline{B}Y$ _ $\underline{B}(-X) = -\overline{B}X$ _ $\overline{B}(-X) = -\underline{B}X$ _ $\underline{B}\,\underline{B}X = \overline{B}\,\underline{B}X = \underline{B}X$ _ $\overline{B}\,\overline{B}X = \underline{B}\,\overline{B}X = \overline{B}X$, where -X denotes U - X.

Four Basic Classes of Rough Sets _ X is roughly B-definable, iff $\underline{B}X \neq \emptyset$ and $\overline{B}X \neq U$ _ X is internally B-undefinable, iff $\underline{B}X = \emptyset$ and $\overline{B}X \neq U$ _ X is externally B-undefinable, iff $\underline{B}X \neq \emptyset$ and $\overline{B}X = U$ _ X is totally B-undefinable, iff $\underline{B}X = \emptyset$ and $\overline{B}X = U$

Accuracy of Approximation _ $\alpha_B(X) = \frac{|\underline{B}X|}{|\overline{B}X|}$, where |X| denotes the cardinality of $X \neq \emptyset$. _ Obviously $0 \leq \alpha_B(X) \leq 1$. _ If $\alpha_B(X) = 1$, X is crisp with respect to B. _ If $\alpha_B(X) < 1$, X is rough with respect to B.
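
A worked instance (added here) for the Walk example with B = {Age, LEMS}, using the approximations computed above:

$$\alpha_B(W) = \frac{|\underline{B}W|}{|\overline{B}W|} = \frac{|\{x1, x6\}|}{|\{x1, x3, x4, x6\}|} = \frac{2}{4} = 0.5,$$

so W = {x | Walk(x) = yes} is rough with respect to B.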

Issues in the Decision Table _ The same or indiscernible objects may be represented several times. _ Some of the attributes may be superfluous (redundant). That is, their removal cannot worsen the classification.

Reducts _ Keep only those attributes that preserve the indiscernibility relation and, consequently, set approximation. _ There are usually several such subsets of attributes and those which are minimal are called reducts.

Dispensable & Indispensable Attributes _ Let $c \in C$. Attribute c is dispensable in T if $POS_C(D) = POS_{C - \{c\}}(D)$; otherwise attribute c is indispensable in T. _ The C-positive region of D: $POS_C(D) = \bigcup_{X \in U/D} \underline{C}X$.

Independent _ T = (U, C, D) is independent if all $c \in C$ are indispensable in T.

Reduct & Core _ The set of attributes $R \subseteq C$ is called a reduct of C, if T’ = (U, R, D) is independent and $POS_R(D) = POS_C(D)$. _ The set of all the condition attributes indispensable in T is denoted by CORE(C): $CORE(C) = \bigcap RED(C)$, where RED(C) is the set of all reducts of C.
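
A brute-force Python sketch (my addition, reusing `ind_classes` and `table` from the sketches above) of the positive region, the dispensability test and the reduct search just defined:

```python
from itertools import combinations

def positive_region(table, C, D):
    """POS_C(D): union of the C-lower approximations of all D-decision classes."""
    c_classes = ind_classes(table, C)
    pos = set()
    for X in ind_classes(table, D):
        for c in c_classes:
            if c <= X:
                pos |= c
    return pos

def is_dispensable(table, C, D, attr):
    """attr is dispensable in T if removing it leaves the positive region unchanged."""
    rest = [a for a in C if a != attr]
    return positive_region(table, rest, D) == positive_region(table, C, D)

def reducts(table, C, D):
    """All minimal subsets R of C with POS_R(D) = POS_C(D) (exhaustive search)."""
    full, found = positive_region(table, C, D), []
    for k in range(1, len(C) + 1):
        for R in combinations(C, k):
            if positive_region(table, list(R), D) == full and \
               not any(set(r) < set(R) for r in found):
                found.append(R)
    return found

print(reducts(table, ["Age", "LEMS"], ["Walk"]))   # [('Age', 'LEMS')] for the Walk table
```

On the flu table of the next slide the same search would return the two reducts {Muscle-pain, Temp.} and {Headache, Temp.}.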

An Example of Reducts & Core _ Reduct1 = {Muscle-pain, Temp.} _ Reduct2 = {Headache, Temp.} _ CORE = {Headache, Temp.} ∩ {Muscle-pain, Temp.} = {Temp.}

Discernibility Matrix (relative to positive region) _ Let T = (U, C, D) be a decision table, with $U = \{u_1, u_2, \ldots, u_n\}$. By a discernibility matrix of T, denoted M(T), we will mean the $n \times n$ matrix with entries $m_{ij} = \{c \in C \mid c(u_i) \neq c(u_j)\}$ for pairs of objects with different decision values, for i, j = 1, 2, …, n such that $u_i$ or $u_j$ belongs to the C-positive region of D. _ $m_{ij}$ is the set of all the condition attributes that classify objects $u_i$ and $u_j$ into different classes.

Discernibility Matrix (relative to positive region) (2) _ The discernibility function is defined similarly to the standard one, $f(T) = \bigwedge \{\bigvee m_{ij}\}$, but the conjunction is taken over all non-empty entries of M(T) corresponding to the indices i, j such that $u_i$ or $u_j$ belongs to the C-positive region of D. _ An empty entry $m_{ij}$ denotes that this case does not need to be considered; hence it is interpreted as logical truth. _ All disjuncts of the minimal disjunctive form of this function define the reducts of T (relative to the positive region).

Discernibility Function (relative to objects) _ For any $u_i \in U$, $f(u_i) = \bigwedge_{j} \bigvee m_{ij}$, where (1) $\bigvee m_{ij}$ is the disjunction of all variables a such that $a \in m_{ij}$, if $m_{ij} \neq \emptyset$; (2) $\bigvee m_{ij}$ = false, if $m_{ij} = \emptyset$ and $d(u_i) \neq d(u_j)$; (3) $\bigvee m_{ij}$ = true, if $d(u_i) = d(u_j)$. _ Each logical product in the minimal disjunctive normal form (DNF) defines a reduct of instance $u_i$.

Examples of Discernibility Matrix _ C = {a, b, c}, D = {d}.

No  a   b   c   d
u1  a0  b1  c1  y
u2  a1  b1  c0  n
u3  a0  b2  c1  n
u4  a1  b1  c1  y

In order to discern the equivalence classes of the decision attribute d, we have to preserve the conditions described by the discernibility matrix for this table (non-empty entries, for pairs with different decisions):

      u1    u2    u3
u2   a,c
u3    b
u4         c    a,b

The discernibility function is $f_T = (a \vee c) \wedge b \wedge c \wedge (a \vee b) = b \wedge c$, hence Reduct = {b, c}.
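
A small Python sketch (my own addition) that builds this discernibility matrix and extracts the reducts as minimal hitting sets of its non-empty entries, i.e. the prime implicants of the discernibility function:

```python
from itertools import combinations

# Decision table from the slide above: C = {a, b, c}, D = {d}.
rows = {
    "u1": ({"a": "a0", "b": "b1", "c": "c1"}, "y"),
    "u2": ({"a": "a1", "b": "b1", "c": "c0"}, "n"),
    "u3": ({"a": "a0", "b": "b2", "c": "c1"}, "n"),
    "u4": ({"a": "a1", "b": "b1", "c": "c1"}, "y"),
}

def discernibility_matrix(rows):
    """Non-empty entries m_ij: attributes that differ, for pairs with different decisions."""
    entries = []
    for (ui, (ci, di)), (uj, (cj, dj)) in combinations(rows.items(), 2):
        if di != dj:
            diff = frozenset(a for a in ci if ci[a] != cj[a])
            entries.append(((ui, uj), diff))
    return entries

def reducts_from_matrix(entries, attributes):
    """Minimal attribute sets hitting every entry = prime implicants of the
    discernibility function (simple minimal-hitting-set search)."""
    found = []
    for k in range(1, len(attributes) + 1):
        for R in combinations(attributes, k):
            if all(set(R) & e for _, e in entries) and \
               not any(set(r) < set(R) for r in found):
                found.append(R)
    return found

entries = discernibility_matrix(rows)
print(entries)                                         # {a,c}, {b}, {c}, {a,b}
print(reducts_from_matrix(entries, ["a", "b", "c"]))   # [('b', 'c')]
```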

Examples of Discernibility Matrix (2) _ For a decision table with objects u1, …, u7 and condition attributes a, b, c, d, the non-empty entries of the discernibility matrix are: {b,c,d}, {b,c}, {b}, {b,d}, {c,d}, {a,b,c,d}, {a,b,c}, {a,b,c,d}, {a,b,c,d}, {a,b}, {c,d}, {c,d}. _ The discernibility function simplifies to $b \wedge (c \vee d) = (b \wedge c) \vee (b \wedge d)$, hence Core = {b}, Reduct1 = {b,c}, Reduct2 = {b,d}.

Rough Membership _ The rough membership function $\mu_X^B: U \rightarrow [0, 1]$ quantifies the degree of relative overlap between the set X and the equivalence class $[x]_B$ to which x belongs: $\mu_X^B(x) = \frac{|X \cap [x]_B|}{|[x]_B|}$. _ The rough membership function can be interpreted as a frequency-based estimate of $\Pr(x \in X \mid u)$, where u is the equivalence class of IND(B) containing x.

Rough Membership (2) _ The formulae for the lower and upper approximations can be generalized to some arbitrary level of precision $\pi \in (0.5, 1]$ by means of the rough membership function: $\underline{B}_{\pi}X = \{x \mid \mu_X^B(x) \geq \pi\}$ and $\overline{B}_{\pi}X = \{x \mid \mu_X^B(x) > 1 - \pi\}$. _ Note: the lower and upper approximations as originally formulated are obtained as a special case with $\pi = 1$.
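
A small Python sketch (my addition, using `table` from the earlier sketches) of the rough membership function and its relation to the approximations:

```python
def rough_membership(table, B, X, x):
    """mu_X^B(x) = |X ∩ [x]_B| / |[x]_B|: the fraction of x's B-class lying in X."""
    eq_class = {y for y, row in table.items()
                if all(row[a] == table[x][a] for a in B)}
    return len(eq_class & X) / len(eq_class)

W = {x for x, row in table.items() if row["Walk"] == "yes"}
for x in sorted(table):
    print(x, rough_membership(table, ["Age", "LEMS"], W, x))
# mu = 1.0 for x1, x6 (lower approximation), 0.5 for x3, x4 (boundary region),
# 0.0 for x2, x5, x7 (outside region) -- the pi = 1 special case noted above.
```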

Dependency of Attributes _ Discovering dependencies between attributes is an important issue in KDD. _ A set of attributes D depends totally on a set of attributes C, denoted $C \Rightarrow D$, if all values of attributes from D are uniquely determined by values of attributes from C.

Dependency of Attributes (2) _ Let D and C be subsets of A. We will say that D depends on C in a degree k, denoted $C \Rightarrow_k D$, if $k = \gamma(C, D) = \frac{|POS_C(D)|}{|U|}$, where $POS_C(D) = \bigcup_{X \in U/D} \underline{C}X$ is called the C-positive region of D.

Dependency of Attributes (3) _ Obviously $0 \leq k \leq 1$. _ If k = 1 we say that D depends totally on C. _ If k < 1 we say that D depends partially (in a degree k) on C.
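
A worked instance (added here) for the Walk table with C = {Age, LEMS} and D = {Walk}, using the lower approximations computed earlier:

$$POS_C(D) = \underline{C}\{x \mid Walk(x) = yes\} \cup \underline{C}\{x \mid Walk(x) = no\} = \{x1, x6\} \cup \{x2, x5, x7\},$$
$$k = \gamma(C, D) = \frac{|POS_C(D)|}{|U|} = \frac{5}{7} \approx 0.71,$$

so Walk depends partially (in degree 5/7) on {Age, LEMS}.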

A Rough Set Based KDD Process _ Discretization based on RS and Boolean Reasoning (RSBR). _ Attribute selection based RS with Heuristics (RSH). _ Rule discovery by GDT-RS.

What Are the Issues of Real-World Data? _ Very large data sets _ Mixed types of data (continuous-valued, symbolic data) _ Uncertainty (noisy data) _ Incompleteness (missing, incomplete data) _ Data change

Soft Techniques for KDD (diagram: probability theory, logic and set theory as the underlying soft techniques)

A Hybrid Model (diagram: GDT, TM, RS, RS&ILP and GrC combined over deduction, induction and abduction)

GDT : Generalization Distribution Table RS : Rough Sets TM: Transition Matrix ILP : Inductive Logic Programming GrC : Granular Computing

A Rough Set Based KDD Process _ Discretization based on RS and Boolean Reasoning (RSBR). _ Attribute selection based RS with Heuristics (RSH). _ Rule discovery by GDT-RS.

Observations _ A real-world data set always contains mixed types of data, such as continuous-valued and symbolic data. _ When it comes to analyzing attributes with real values, they must undergo a process called discretization, which divides the attribute's value range into intervals. _ There is so far no unified approach to discretization problems, and the choice of method depends heavily on the data considered.

Discretization based on RSBR _ In the discretization of a decision table T = $(U, A \cup \{d\})$, where $V_a = [l_a, r_a)$ is an interval of real values taken by attribute a, we search for a partition $P_a$ of $V_a$ for any $a \in A$. _ Any partition of $V_a$ is defined by a sequence of the so-called cuts $l_a < c_1 < c_2 < \ldots < c_k < r_a$ from $V_a$. _ Any family of partitions $\{P_a\}_{a \in A}$ can be identified with a set of cuts.

Discretization Based on RSBR (2) _ In the discretization process, we search for a set of cuts satisfying some natural conditions. _ (Tables: the original decision table over U with attributes a, b and decision d, and the same table discretized by the cut set P.) _ P = {(a, 0.9), (a, 1.5), (b, 0.75), (b, 1.5)}

A Geometrical Representation of Data (plot of the objects x1–x7 in the plane spanned by attributes a and b)

A Geometrical Representation of Data and Cuts (the same plot of x1–x7 with the cuts drawn as vertical and horizontal lines)

Discretization Based on RSBR (3) _ The sets of possible values of a and b are defined by their value intervals $V_a$ and $V_b$. _ The sets of values of a and b on objects from U are given by a(U) = {0.8, 1, 1.3, 1.4, 1.6}; b(U) = {0.5, 1, 2, 3}.

Discretization Based on RSBR (4) _ The discretization process returns a partition of the value sets of condition attributes into intervals.

A Discretization Process _ Step 1: define a set of Boolean variables $BV(U) = \{p^a_1, p^a_2, p^a_3, p^a_4, p^b_1, p^b_2, p^b_3\}$, where _ $p^a_1$ corresponds to the interval [0.8, 1) of a _ $p^a_2$ corresponds to the interval [1, 1.3) of a _ $p^a_3$ corresponds to the interval [1.3, 1.4) of a _ $p^a_4$ corresponds to the interval [1.4, 1.6) of a _ $p^b_1$ corresponds to the interval [0.5, 1) of b _ $p^b_2$ corresponds to the interval [1, 2) of b _ $p^b_3$ corresponds to the interval [2, 3) of b

The Set of Cuts on Attribute a (number line for a showing one candidate cut inside each of the intervals [0.8, 1), [1, 1.3), [1.3, 1.4), [1.4, 1.6))

A Discretization Process (2) _ Step 2: create a new decision table by using the set of Boolean variables defined in Step 1. Let $T^* = (U^*, A^* \cup \{d\})$ be this new decision table, where $U^*$ consists of pairs of objects with different decisions and $p^a_k$ is the propositional variable corresponding to the k-th interval of attribute a, for any k and any condition attribute a.

A Sample Defined in Step 2 _ U* = {(x1,x2), (x1,x3), (x1,x5), (x4,x2), (x4,x3), (x4,x5), (x6,x2), (x6,x3), (x6,x5), (x7,x2), (x7,x3), (x7,x5)} (the columns over the Boolean variables indicate, for each pair, which cuts discern it).

The Discernibility Formula _ The discernibility formula for the pair (x1, x2), $\psi(x1, x2) = p^a_1 \vee p^b_1 \vee p^b_2$, means that in order to discern objects x1 and x2, at least one of the following cuts must be set: _ a cut between a(0.8) and a(1) _ a cut between b(0.5) and b(1) _ a cut between b(1) and b(2).

The Discernibility Formulae for All Different Pairs

The Discernibility Formulae for All Different Pairs (2)

A Discretization Process (3) _ Step 3: find the minimal subset of Boolean variables (cuts) that discerns all objects in different decision classes. The discernibility Boolean propositional formula is defined as $\Phi = \bigwedge \{\psi(x_i, x_j) \mid (x_i, x_j) \in U^*\}$.

The Discernibility Formula in CNF Form

The Discernibility Formula in DNF Form _ We obtain four prime implicants; $p^a_2 \wedge p^a_4 \wedge p^b_2$ is the optimal result, because it is the minimal subset of P.
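
A compact Python sketch (my addition) of the three steps: candidate cuts are placed between consecutive attribute values, every pair of objects with different decisions yields a clause listing the cuts that separate it, and a minimal set of cuts hitting all clauses is searched for. The four-object table is hypothetical, since the per-object values of the slides' example are not preserved in this transcript.

```python
from itertools import combinations

# Hypothetical table {object: ({attribute: value}, decision)} -- illustration only.
data = {
    "x1": ({"a": 0.8, "b": 2.0}, 1),
    "x2": ({"a": 1.0, "b": 0.5}, 0),
    "x3": ({"a": 1.3, "b": 3.0}, 0),
    "x4": ({"a": 1.4, "b": 1.0}, 1),
}

def candidate_cuts(data):
    """Step 1: one candidate cut between every two consecutive values of each attribute."""
    cuts = []
    for a in next(iter(data.values()))[0]:
        vals = sorted({row[a] for row, _ in data.values()})
        cuts += [(a, (v1 + v2) / 2) for v1, v2 in zip(vals, vals[1:])]
    return cuts

def clauses(data, cuts):
    """Step 2: for each pair with different decisions, the cuts that discern it."""
    def separates(cut, ri, rj):
        a, c = cut
        return (ri[a] - c) * (rj[a] - c) < 0      # cut lies strictly between the two values
    return [{c for c in cuts if separates(c, ri, rj)}
            for (ri, di), (rj, dj) in combinations(data.values(), 2) if di != dj]

def minimal_cut_set(data):
    """Step 3: smallest set of cuts satisfying every clause (exhaustive, small tables only)."""
    cuts = candidate_cuts(data)
    cls = clauses(data, cuts)
    for k in range(1, len(cuts) + 1):
        for P in combinations(cuts, k):
            if all(set(P) & cl for cl in cls):
                return set(P)

print(minimal_cut_set(data))    # {('a', 0.9), ('a', 1.35)} for this toy table
```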

The Minimal Set of Cuts for the Sample DB (plot of x1–x7 in the (a, b) plane with the three minimal cuts drawn)

A Result _ (Tables: the original decision table and the table discretized by the minimal set of cuts.) _ P = {(a, 1.2), (a, 1.5), (b, 1.5)}

A Rough Set Based KDD Process _ Discretization based on RS and Boolean Reasoning (RSBR). _ Attribute selection based RS with Heuristics (RSH). _ Rule discovery by GDT-RS.

Observations _ A database always contains many attributes that are redundant and not necessary for rule discovery. _ If these redundant attributes are not removed, not only does the time complexity of rule discovery increase, but the quality of the discovered rules may also be significantly degraded.

The Goal of Attribute Selection _ Finding an optimal subset of attributes in a database according to some criterion, so that a classifier with the highest possible accuracy can be induced by a learning algorithm using only the information available from that subset of attributes.

Attribute Selection

The Filter Approach _ Preprocessing _ The main strategies of attribute selection: –The minimal subset of attributes –Selection of the attributes with a higher rank _ Advantage –Fast _ Disadvantage –Ignoring the performance effects of the induction algorithm

The Wrapper Approach _ Using the induction algorithm as a part of the search evaluation function _ $2^N$ possible attribute subsets (N is the number of attributes) _ The main search methods: –Exhaustive/Complete search –Heuristic search –Non-deterministic search _ Advantage –Taking into account the performance of the induction algorithm _ Disadvantage –The time complexity is high

Basic Ideas: Attribute Selection using RSH _ Take the attributes in CORE as the initial subset. _ Select one attribute each time using the rule evaluation criterion in our rule discovery system, GDT-RS. _ Stop when the subset of selected attributes is a reduct.

Why Heuristics? _ The number of possible reducts can be exponential in N, where N is the number of attributes. Selecting the optimal reduct from all possible reducts is computationally expensive, so heuristics must be used.

The Rule Selection Criteria in GDT-RS _ Selecting the rules that cover as many instances as possible. _ Selecting the rules that contain as few attributes as possible, if they cover the same number of instances. _ Selecting the rules with larger strengths, if they have the same number of condition attributes and cover the same number of instances.

Attribute Evaluation Criteria _ Selecting the attributes that cause the number of consistent instances to increase faster –To obtain a subset of attributes that is as small as possible _ Selecting an attribute that has a smaller number of different values –To guarantee that the number of instances covered by rules is as large as possible.

Main Features of RSH _ It can select a better subset of attributes quickly and effectively from a large DB. _ The selected attributes do not significantly degrade the performance of induction.

An Example of Attribute Selection Condition Attributes: a: Va = {1, 2} b: Vb = {0, 1, 2} c: Vc = {0, 1, 2} d: Vd = {0, 1} Decision Attribute: e: Ve = {0, 1, 2}

Searching for CORE _ Removing attribute a does not cause inconsistency. Hence, a does not belong to the CORE.

Searching for CORE (2) _ Removing attribute b causes inconsistency. Hence, b belongs to the CORE.

Searching for CORE (3) _ Removing attribute c does not cause inconsistency. Hence, c does not belong to the CORE.

Searching for CORE (4) _ Removing attribute d does not cause inconsistency. Hence, d does not belong to the CORE.

Searching for CORE (5) CORE(C)={b} Initial subset R = {b} Attribute b is the unique indispensable attribute.

R = {b} _ The instances containing b0 will not be considered further (tables T and T' show the table before and after this reduction).

Attribute Evaluation Criteria _ Selecting the attributes that cause the number of consistent instances to increase faster –To obtain a subset of attributes that is as small as possible _ Selecting the attribute that has a smaller number of different values –To guarantee that the number of instances covered by a rule is as large as possible.

Selecting Attribute from {a,c,d} _ 1. Selecting {a}: R = {a, b}. (Diagram comparing the decision partition U/{e} = {{u3, u5, u6}, {u4}, {u7}} with the partition U/{a, b} of the remaining instances u3–u7.)

Selecting Attribute from {a,c,d} (2) _ 2. Selecting {c}: R = {b, c}. (Diagram comparing U/{e} = {{u3, u5, u6}, {u4}, {u7}} with the partition U/{b, c}.)

Selecting Attribute from {a,c,d} (3) _ 3. Selecting {d}: R = {b, d}. (Diagram comparing U/{e} = {{u3, u5, u6}, {u4}, {u7}} with the partition U/{b, d}.)

Selecting Attribute from {a,c,d} (4) _ Selecting {d}: R = {b, d}. (Diagram comparing U/{e} = {{u3, u5, u6}, {u4}, {u7}} with the partition U/{b, d} = {{u3}, {u4}, {u7}, {u5, u6}}.) _ Result: the selected subset of attributes is {b, d}.

A Heuristic Algorithm for Attribute Selection _ Let R be a set of the selected attributes, P be the set of unselected condition attributes, U be the set of all instances, X be the set of contradictory instances, and EXPECT be the threshold of accuracy. _ In the initial state, R = CORE(C), k = 0.

A Heuristic Algorithm for Attribute Selection (2) _ Step 1. If k >= EXPECT, finish; otherwise calculate the dependency degree $k = \gamma_R(D) = |POS_R(D)| / |U|$. _ Step 2. For each p in P, calculate the merit of adding p, based on how many instances in X become consistent under $R \cup \{p\}$ and on max_size, where max_size denotes the cardinality of the maximal subset (equivalence class) induced by $R \cup \{p\}$.

A Heuristic Algorithm for Attribute Selection (3) _ Step 3. Choose the attribute p with the largest merit value and let $R = R \cup \{p\}$, $P = P - \{p\}$. _ Step 4. Remove all consistent instances u in $POS_R(D)$ from X. _ Step 5. Go back to Step 1.
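
A greedy Python sketch of the loop above (my own simplification: the merit used here is simply the growth of the positive region, with a tie-break on the number of distinct values, not the exact GDT-RS criterion). It reuses `positive_region` and `is_dispensable` from the earlier sketches:

```python
def select_attributes(table, C, D, expect=1.0):
    """Greedy RSH-style selection: start from CORE(C), then repeatedly add the
    attribute that enlarges POS_R(D) the most, until the dependency degree k
    reaches the EXPECT threshold or no candidate attributes remain."""
    core = [c for c in C if not is_dispensable(table, C, D, c)]
    R = list(core)
    P = [c for c in C if c not in core]
    while P:
        k = len(positive_region(table, R, D)) / len(table)
        if k >= expect:
            break
        def merit(p):
            gain = len(positive_region(table, R + [p], D))
            n_values = len({row[p] for row in table.values()})
            return (gain, -n_values)      # larger positive region, then fewer values
        best = max(P, key=merit)
        R.append(best)
        P.remove(best)
    return R

print(select_attributes(table, ["Age", "LEMS"], ["Walk"], expect=5/7))
# ['Age', 'LEMS'] -- both attributes belong to the core of the Walk table.
```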

Experimental Results