Basic Data Mining Techniques Chapter 3-A. 3.1 Decision Trees.


1 Basic Data Mining Techniques Chapter 3-A

2 3.1 Decision Trees

3 An Algorithm for Building Decision Trees
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node, where each link represents a unique value for the chosen attribute.
   - Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is null, specify the classification for new instances following this decision path.
   - If the subclass does not satisfy the criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
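A minimal sketch of this procedure in Python may make the recursion concrete. The dict-based instance format, the helper names, and the simple purity-based stopping test are assumptions for illustration only; the attribute-selection function is left as a parameter, since the "best differentiates" criterion is discussed on the next slide.

from collections import Counter

def majority_class(instances):
    # Classification assigned when a path stops: the most common class in the subset.
    return Counter(inst["class"] for inst in instances).most_common(1)[0][0]

def build_tree(instances, attributes, best_attribute):
    # Step 4 stopping tests: the subset satisfies the criteria (here: it is pure),
    # or no attributes remain along this path.
    classes = {inst["class"] for inst in instances}
    if len(classes) == 1 or not attributes:
        return {"leaf": majority_class(instances)}
    # Steps 2 and 3: choose the attribute that best differentiates the instances
    # and create one child link per unique value of that attribute.
    attr = best_attribute(instances, attributes)
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in {inst[attr] for inst in instances}:
        subset = [inst for inst in instances if inst[attr] == value]
        children[value] = build_tree(subset, remaining, best_attribute)
    return {"attribute": attr, "children": children}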

4 Main goals: minimize the number of tree levels and tree nodes, and maximize data generalization. C4.5 selects the attribute that splits the data so as to show the largest gain in information.
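As a rough illustration of "gain in information": a split is scored by how much it reduces the entropy of the class attribute. A hedged sketch of that calculation follows (the function names are mine, and C4.5 in fact ranks attributes by the gain ratio, which further normalizes this gain by the entropy of the split itself). The resulting best_attribute function could be plugged into the build_tree sketch above.

import math
from collections import Counter

def entropy(instances):
    # Entropy of the class label over a set of instances.
    counts = Counter(inst["class"] for inst in instances)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(instances, attr):
    # Entropy before the split minus the weighted entropy of each branch.
    total = len(instances)
    remainder = 0.0
    for value in {inst[attr] for inst in instances}:
        subset = [inst for inst in instances if inst[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(instances) - remainder

def best_attribute(instances, attributes):
    # The attribute showing the largest gain in information wins the split.
    return max(attributes, key=lambda a: information_gain(instances, a))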

5

6 Figure 3.1 A partial decision tree with root node = income range. Candidate for top-level node; set accuracy: 11/15; goodness score: (11/15) ÷ 4.
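Reading the goodness score: the set accuracy (11 of the 15 training instances are correctly classified by this one-level tree) is divided by 4, which appears to be the number of branches the income-range split creates, giving (11/15) ÷ 4 ≈ 0.18. Presumably the candidate attribute with the highest goodness score is the one chosen for the top-level node.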

7 Figure 3.2 A partial decision tree with root node = credit card insurance. Candidate for top-level node.

8 Figure 3.3 A partial decision tree with root node = age. Candidate for top-level node.

9 Decision Trees for the Credit Card Promotion Database

10 Figure 3.4 A three-node decision tree for the credit card database

11 Figure 3.5 A two-node decision tree for the credit card database

12

13 Decision Tree Rules

14 A Rule for the Tree in Figure 3.4
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
Rule accuracy = 3/4

Simplifying the Rule by Removing Attribute "Age"
IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
Rule accuracy = 5/6
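Rule accuracy as used on this slide is simply the fraction of training instances covered by the rule's preconditions that also have the predicted value. A small sketch, where the helper name and the dict-based instance format are assumptions of mine:

def rule_accuracy(instances, conditions, conclusion):
    # conditions and conclusion are (attribute, value) pairs; instances are dicts.
    covered = [inst for inst in instances
               if all(inst[attr] == val for attr, val in conditions)]
    if not covered:
        return 0.0
    correct = [inst for inst in covered if inst[conclusion[0]] == conclusion[1]]
    return len(correct) / len(covered)

# For the simplified rule above, on the credit card promotion data this would be:
# rule_accuracy(training_data,
#               [("Sex", "Male"), ("Credit Card Insurance", "No")],
#               ("Life Insurance Promotion", "No"))   # 5/6 per the slide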

15 Other Methods for Building Decision Trees: CART (Classification and Regression Trees) and CHAID (Chi-Squared Automatic Interaction Detection)

16 Advantages of Decision Trees: Easy to understand. Map nicely to a set of production rules. Have been successfully applied to real-world problems. Make no prior assumptions about the nature of the data. Able to process both numerical and categorical data.

17 Disadvantages of Decision Trees: The output attribute must be categorical. Limited to one output attribute. Decision tree algorithms are unstable: small variations in the training data can lead to different attribute selections and a different tree structure. Trees created from numeric datasets can be complex.

