Fuzzy-Rough Feature Significance for Fuzzy Decision Trees. Advanced Reasoning Group, Department of Computer Science, The University of Wales, Aberystwyth.


Fuzzy-Rough Feature Significance for Fuzzy Decision Trees
Richard Jensen and Qiang Shen
Advanced Reasoning Group, Department of Computer Science, The University of Wales, Aberystwyth

Outline
- Utility of decision tree induction
- Importance of attribute selection
- Introduction of fuzzy-rough concepts
- Evaluation of the fuzzy-rough metric
- Results of F-ID3 vs FR-ID3
- Conclusions

Decision Trees
- A popular classification algorithm in data mining and machine learning
- Fuzzy decision trees (FDTs) follow similar principles to crisp decision trees
- FDTs allow greater flexibility
- The instance space is partitioned; attributes are selected to derive the partitions
- Hence, attribute selection is an important factor in decision tree quality

Fuzzy Decision Trees
- Object membership
  - Traditionally, node membership is in {0,1}
  - Here, membership is any value in the range [0,1]
  - Calculated from the conjunction of membership degrees along the path to the node
- Fuzzy tests
  - Carried out within nodes to determine the membership of feature values to fuzzy sets
- Stopping criteria
- Measure of feature significance
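The object-membership idea above can be sketched in a few lines: an object's membership in a node is the conjunction of the fuzzy-test membership degrees along the path from the root. This is an illustrative sketch, assuming the min t-norm for conjunction (the slides do not name a specific t-norm).

```python
def node_membership(path_degrees):
    """Combine membership degrees along a path with the min t-norm.

    An empty path (the root node) gives full membership 1.0.
    """
    m = 1.0
    for d in path_degrees:
        m = min(m, d)
    return m

# An object matching the path's fuzzy sets to degrees 0.9, 0.6 and 0.8
# belongs to the node to degree min(0.9, 0.6, 0.8) = 0.6.
print(node_membership([0.9, 0.6, 0.8]))  # 0.6
```

Replacing min with another t-norm (e.g. the product) changes only the combination step, not the structure of the tree.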

Decision Tree Algorithm
Input: training set S and (optionally) depth of decision tree l
Start to form the decision tree from the top level.
Do loop until (1) the depth of the tree reaches l, or (2) there is no node to expand:
  a) Gauge the significance of each attribute of S not already expanded in this branch
  b) Expand the attribute with the most significance
  c) Stop expansion of the leaf node of an attribute if maximum significance is obtained
End do loop
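The loop above can be sketched as a recursive inducer with the significance measure passed in as a function: plugging in fuzzy entropy gives F-ID3, plugging in the fuzzy-rough metric gives FR-ID3. This is a minimal crisp-partition sketch, not the authors' implementation; all names are illustrative.

```python
from collections import Counter

def majority_class(instances):
    """Most common class among (features, class) pairs."""
    return Counter(cls for _, cls in instances).most_common(1)[0][0]

def partition(instances, attribute):
    """Split instances by their value on the chosen attribute."""
    groups = {}
    for features, cls in instances:
        groups.setdefault(features[attribute], []).append((features, cls))
    return groups.items()

def induce_tree(instances, attributes, significance, max_depth, depth=0):
    # Stop when the depth bound l is reached or no attribute remains.
    if depth == max_depth or not attributes:
        return {"leaf": majority_class(instances)}
    # a) Gauge the significance of each attribute not yet used on this branch.
    scores = {a: significance(a, instances) for a in attributes}
    # b) Expand the attribute with the most significance.
    best = max(scores, key=scores.get)
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "children": {}}
    for value, subset in partition(instances, best):
        node["children"][value] = induce_tree(
            subset, remaining, significance, max_depth, depth + 1)
    return node
```

A trivial significance function that always prefers attribute "a" is enough to exercise the loop; a real run would substitute fuzzy entropy or the fuzzy-rough γ metric.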

Feature Significance
- Previous FDT inducers use fuzzy entropy
- There has been little research into alternatives
- Fuzzy-rough feature significance has previously been used in feature selection with much success
- It can also be used to gauge feature importance within FDT construction
- The fuzzy-rough measure extends concepts from crisp rough set theory

Crisp Rough Sets
[x]_B is the set of all points which are indiscernible from point x in terms of feature subset B.
- Lower approximation: the union of equivalence classes wholly contained in X, { x | [x]_B ⊆ X }
- Upper approximation: the union of equivalence classes that overlap X, { x | [x]_B ∩ X ≠ ∅ }
[Figure: a set X approximated from below and above by unions of equivalence classes [x]_B]
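The crisp approximations above are straightforward to compute. A small sketch (not from the slides; the data layout is an assumption): objects are partitioned into equivalence classes by their values on B, then X is approximated from below and above.

```python
def equivalence_classes(objects, B):
    """Partition object names by their (indiscernible) values on features B.

    objects maps each object name to a dict of feature values.
    """
    classes = {}
    for name, features in objects.items():
        classes.setdefault(tuple(features[f] for f in B), set()).add(name)
    return list(classes.values())

def lower_approximation(objects, B, X):
    """Union of equivalence classes wholly contained in X."""
    return set().union(*(c for c in equivalence_classes(objects, B) if c <= X))

def upper_approximation(objects, B, X):
    """Union of equivalence classes that overlap X."""
    return set().union(*(c for c in equivalence_classes(objects, B) if c & X))
```

With objects o1 and o2 indiscernible on B and X = {o1, o3}, the class {o1, o2} straddles the boundary: it contributes to the upper but not the lower approximation.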

Fuzzy Equivalence Classes
- Incorporate vagueness
- Handle real-valued data
- Cope with noisy data
- At the centre of fuzzy-rough feature selection
[Figure: a crisp equivalence class vs. a fuzzy equivalence class]
Image: Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999

Fuzzy-Rough Significance
- Deals with real-valued features via fuzzy sets
- Fuzzy lower approximation: μ_{P̲X}(x) = sup_{F ∈ U/P} min( μ_F(x), inf_{y ∈ U} max{ 1 − μ_F(y), μ_X(y) } )
- Fuzzy positive region: μ_{POS_P(Q)}(x) = sup_{X ∈ U/Q} μ_{P̲X}(x)
- Evaluation function: γ′_P(Q) = ( Σ_{x ∈ U} μ_{POS_P(Q)}(x) ) / |U|
- Feature importance is estimated with this measure
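The γ′ measure can be computed directly from these definitions. This is a sketch assuming the standard fuzzy-rough feature selection formulation with the min t-norm and the max{1 − a, b} implicator; the data layout (dicts mapping each object to its membership degree) is illustrative.

```python
def lower_approx(F_membership, X_membership, universe):
    """inf over y of max(1 - mu_F(y), mu_X(y))."""
    return min(max(1 - F_membership[y], X_membership[y]) for y in universe)

def positive_region(x, classes, decisions, universe):
    """sup over decision classes X and fuzzy classes F of
    min(mu_F(x), lower approximation of X by F)."""
    return max(min(F[x], lower_approx(F, X, universe))
               for F in classes for X in decisions)

def dependency(classes, decisions, universe):
    """gamma': average positive-region membership over the universe."""
    return sum(positive_region(x, classes, decisions, universe)
               for x in universe) / len(universe)
```

In the fully crisp case (memberships all 0 or 1, and the partition induced by the features matching the decision classes) the dependency degree is 1, mirroring the crisp rough-set notion of full dependency.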

Evaluation
- Is the γ metric a useful gauger of feature significance?
- The γ metric was compared with leading feature rankers: Information Gain, Gain Ratio, Chi², Relief, OneR
- Applied to test data: 30 random feature values for 400 objects; 2 or 3 features used to determine the classification
- Task: locate the features that affect the decision

Evaluation…
- Results for x·y·z² > 0.125
- Results for (x + y)³ < 0.125
- FR, IG and GR perform best
- The FR metric locates the most important features

FDT Experiments
- Fuzzy ID3 (F-ID3) compared with Fuzzy-Rough ID3 (FR-ID3)
- The only difference between the methods is the choice of feature significance measure
- Datasets taken from the machine learning repository
- Data split into two equal halves: training and testing
- Resulting trees converted to equivalent rulesets

Results
- Real-valued data
- Average ruleset size: 56.7 for F-ID3, 88.6 for FR-ID3
- F-ID3 performs marginally better than FR-ID3

Results…
- Crisp data
- Average ruleset size: 30.2 for F-ID3, 28.8 for FR-ID3
- FR-ID3 performs marginally better than F-ID3

Conclusion
- Decision trees are a popular means of classification
- The selection of branching attributes is key to the quality of the resulting tree
- The use of a fuzzy-rough metric for this purpose looks promising
- Future work
  - Further experimental evaluation
  - A fuzzy-rough feature reduction pre-processor