Basic Data Mining Techniques Chapter 3-A. 3.1 Decision Trees.


Basic Data Mining Techniques Chapter 3-A

3.1 Decision Trees

An Algorithm for Building Decision Trees

1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node, where each link represents a unique value for the chosen attribute.
   - Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is null, specify the classification for new instances following this decision path.
   - If the subclass does not satisfy the criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
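The four steps above can be sketched in Python. This is a minimal illustration, not the algorithm as implemented in any particular tool. Two choices are assumptions that the slide leaves open: the "predefined criteria" are taken to mean that every instance in the subclass has the same class, and the attribute that "best differentiates" the instances is taken to be the one whose split classifies the most training instances correctly. The names `build_tree`, `split_accuracy`, and `majority_class`, and the toy rows, are hypothetical.

```python
from collections import Counter


def majority_class(instances, target):
    """Return the most frequent value of the target attribute."""
    return Counter(row[target] for row in instances).most_common(1)[0][0]


def split_accuracy(instances, attribute, target):
    """Fraction of instances classified correctly when each branch of
    `attribute` predicts the majority class of its own subclass."""
    groups = {}
    for row in instances:
        groups.setdefault(row[attribute], []).append(row[target])
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in groups.values())
    return correct / len(instances)


def build_tree(instances, attributes, target):
    """Step 1: `instances` plays the role of T."""
    labels = {row[target] for row in instances}
    # Step 4, first case: the subclass is pure (assumed criterion)
    # or no attributes remain, so return a classification (a leaf).
    if len(labels) == 1 or not attributes:
        return majority_class(instances, target)
    # Step 2: choose the attribute that best differentiates the instances.
    best = max(attributes, key=lambda a: split_accuracy(instances, a, target))
    # Step 3: create a node with one child link per value of the attribute.
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    subsets = {}
    for row in instances:
        subsets.setdefault(row[best], []).append(row)
    # Step 4, second case: recurse with T set to each subclass.
    for value, subset in subsets.items():
        node["children"][value] = build_tree(subset, remaining, target)
    return node


# Hypothetical toy rows, not taken from the credit card promotion database.
toy = [
    {"sex": "male", "insurance": "no", "promo": "no"},
    {"sex": "male", "insurance": "yes", "promo": "yes"},
    {"sex": "female", "insurance": "no", "promo": "yes"},
    {"sex": "female", "insurance": "yes", "promo": "yes"},
]
tree = build_tree(toy, ["sex", "insurance"], "promo")
print(tree)
```

Leaves are plain class labels and internal nodes are dictionaries, so the result can be read directly as a nested set of tests.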

Main Goal:
- Minimize the number of tree levels and tree nodes
- Maximize data generalization

C4.5 selects the attribute that splits the data so as to show the largest gain in information.
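To make "gain in information" concrete, here is a minimal sketch of entropy-based information gain for a categorical attribute. It is an illustration under stated assumptions, not C4.5 itself: C4.5 normalizes the gain by the split information (the gain ratio), and that step is omitted here. The function names and the toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(instances, attribute, target):
    """Entropy of the target before the split minus the weighted
    entropy of the subclasses produced by splitting on `attribute`."""
    before = entropy([row[target] for row in instances])
    groups = {}
    for row in instances:
        groups.setdefault(row[attribute], []).append(row[target])
    after = sum(len(g) / len(instances) * entropy(g) for g in groups.values())
    return before - after


# Hypothetical toy rows: x separates the classes perfectly, z not at all.
rows = [
    {"x": "a", "z": "p", "y": "yes"},
    {"x": "a", "z": "q", "y": "yes"},
    {"x": "b", "z": "p", "y": "no"},
    {"x": "b", "z": "q", "y": "no"},
]
best = max(["x", "z"], key=lambda a: information_gain(rows, a, "y"))
print(best)
```

An attribute that yields pure subclasses removes all uncertainty about the class, so it receives the largest gain and is chosen for the node.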

Figure 3.1 A partial decision tree with root node = income range
Candidate for top-level node
Set Accuracy: 11/15
Goodness Score: 11/15 ÷ 4
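The arithmetic behind the goodness score can be checked with a short sketch. It assumes that the divisor 4 is the number of branches leaving the income range node, so that the score is the training set accuracy divided by the branch count; the slide gives only the numbers. The function name `goodness_score` is hypothetical.

```python
from fractions import Fraction


def goodness_score(correct, total, branches):
    """Training set accuracy divided by the number of branches (assumed meaning)."""
    return Fraction(correct, total) / branches


# Numbers from Figure 3.1: accuracy 11/15, divided by 4.
score = goodness_score(11, 15, 4)
print(score)         # exact value as a fraction
print(float(score))  # decimal approximation
```

Dividing by the branch count penalizes attributes that reach a given accuracy only by splitting the data many ways, which fits the goal of keeping the tree small.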

Figure 3.2 A partial decision tree with root node = credit card insurance
Candidate for top-level node

Figure 3.3 A partial decision tree with root node = age
Candidate for top-level node

Decision Trees for the Credit Card Promotion Database

Figure 3.4 A three-node decision tree for the credit card database

Figure 3.5 A two-node decision tree for the credit card database

Decision Tree Rules

A Rule for the Tree in Figure 3.4
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
Rule accuracy = 3 / 4

Simplifying the Rule by Removing Attribute "Age"
IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
Rule accuracy = 5 / 6
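Rule accuracy here is the number of covered instances the rule classifies correctly, divided by the number of instances it covers. The sketch below shows one way to compute it and why dropping a condition can only widen a rule's coverage. The rows are hypothetical and are not the credit card promotion database, so the counts differ from those on the slide. The helper `rule_accuracy` and the short attribute names (`cci` for credit card insurance, `lip` for life insurance promotion) are also assumptions.

```python
def rule_accuracy(instances, conditions, conclusion):
    """Return (correct, covered) for a rule.

    conditions: list of predicates that must all hold for a row to be covered
    conclusion: (attribute, value) predicted for every covered row
    """
    covered = [row for row in instances if all(test(row) for test in conditions)]
    attribute, value = conclusion
    correct = sum(1 for row in covered if row[attribute] == value)
    return correct, len(covered)


# Hypothetical toy rows for illustration only.
rows = [
    {"age": 30, "sex": "male", "cci": "no", "lip": "no"},
    {"age": 40, "sex": "male", "cci": "no", "lip": "yes"},
    {"age": 50, "sex": "male", "cci": "no", "lip": "no"},
    {"age": 35, "sex": "female", "cci": "no", "lip": "yes"},
]

is_male = lambda r: r["sex"] == "male"
no_cci = lambda r: r["cci"] == "no"
age_le_43 = lambda r: r["age"] <= 43

full_rule = rule_accuracy(rows, [age_le_43, is_male, no_cci], ("lip", "no"))
simplified = rule_accuracy(rows, [is_male, no_cci], ("lip", "no"))
print(full_rule, simplified)
```

Comparing the two results before and after removing a condition is the same check the slide performs when it drops "Age" from the rule.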

Other Methods for Building Decision Trees
- CART
- CHAID

Advantages of Decision Trees
- Easy to understand.
- Map nicely to a set of production rules.
- Applied to real problems.
- Make no prior assumptions about the data.
- Able to process both numerical and categorical data.

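Disadvantages of Decision Trees
- Output attribute must be categorical.
- Limited to one output attribute.
- Decision tree algorithms are unstable: slight variations in the training data can lead to different attribute selections and a different tree.
- Trees created from numeric datasets can be complex.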