
1 Decision Trees in the Big Picture
Classification (vs. Rule Pattern Discovery)
Supervised Learning (vs. Unsupervised)
Inductive Generation (vs. Discrimination)

2 Example

age          income  veteran  college_educated  support_hillary
youth        low     no       no                no
youth        low     yes      no                no
middle_aged  low     no       no                yes
senior       low     no       no                yes
senior       medium  no       yes               no
senior       medium  yes      no                yes
middle_aged  medium  no       yes               no
youth        low     no       yes               no
youth        low     no       yes               no
senior       high    no       yes               yes
youth        low     no       no                no
middle_aged  high    no       yes               no
middle_aged  medium  yes      yes               yes
senior       high    no       yes               no

3 Example: the same training table as slide 2; the rightmost column, support_hillary, holds the class labels.

4 Example
Classify this unseen tuple:
age          income  veteran  college_educated  support_hillary
middle_aged  medium  no       yes               ?????
[Tree diagram: root node is the attribute age with branches youth, middle_aged, senior; below it, an income node (branches low, medium, high) and a college_educated node (branches yes, no); leaves are yes/no labels.]
Inner nodes are ATTRIBUTES. Branches are attribute VALUES. Leaves are class-label VALUES.

5 Example
ANSWER:
age          income  veteran  college_educated  support_hillary
middle_aged  medium  no       yes               yes (predicted)
[Same tree diagram as slide 4.]
Inner nodes are ATTRIBUTES. Branches are attribute VALUES. Leaves are class-label VALUES.

6 Example
[Same tree diagram as slide 4.]
Induced Rules:
The youth do not support Hillary.
All who are middle-aged and low-income support Hillary.
Seniors support Hillary.
Etc. A rule is generated for each leaf.

7 Example
Induced Rules:
The youth do not support Hillary.
All who are middle-aged and low-income support Hillary.
Seniors support Hillary.
Nested IF-THEN:
IF age == youth THEN support_hillary = no
ELSE IF age == middle_aged & income == low THEN support_hillary = yes
ELSE IF age == senior THEN support_hillary = yes
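As a minimal sketch, the nested IF-THEN above can be written as an ordinary function (Python here; the function name and the None fall-through for branches the slide does not cover are illustrative assumptions, not part of the deck):

    def support_hillary(age, income, veteran, college_educated):
        # Apply the induced rules top to bottom, like the nested IF-THEN.
        if age == "youth":
            return "no"
        if age == "middle_aged" and income == "low":
            return "yes"
        if age == "senior":
            return "yes"
        return None  # remaining branches are not covered by the three rules shown

    # e.g. support_hillary("middle_aged", "low", "no", "no") returns "yes"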

8 How do you construct one?
1. Select an attribute to place at the root node and make one branch for each possible value.
[Diagram: splitting the entire training set of 14 tuples on age yields three branches: youth (5 tuples), middle_aged (4 tuples), senior (5 tuples).]

9 How do you construct one?
2. For each branch, recursively process the remaining training examples by choosing an attribute to split them on. The chosen attribute cannot be one already used in an ancestor node. If at any time all the training examples have the same class, stop processing that part of the tree.
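A minimal sketch of this recursive procedure (Python; representing tuples as dicts and passing the attribute-selection heuristic in as a function are assumptions of the sketch, since the deck introduces the heuristics later):

    from collections import Counter

    def build_tree(tuples, attributes, choose_attribute, target="support_hillary"):
        # Leaf: every remaining training example has the same class.
        classes = [t[target] for t in tuples]
        if len(set(classes)) == 1:
            return classes[0]
        # No attributes left to split on: fall back to a majority vote.
        if not attributes:
            return Counter(classes).most_common(1)[0][0]
        best = choose_attribute(tuples, attributes)  # heuristic, e.g. info gain
        branches = {}
        for value in sorted({t[best] for t in tuples}):
            subset = [t for t in tuples if t[best] == value]
            rest = [a for a in attributes if a != best]  # never reuse an ancestor's attribute
            branches[value] = build_tree(subset, rest, choose_attribute, target)
        return {best: branches}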

10 How do you construct one?
Branch age = youth:
age    income  veteran  college_educated  support_hillary
youth  low     no       no                no
youth  low     yes      no                no
youth  low     no       yes               no
youth  low     no       yes               no
youth  low     no       no                no
[Tree so far: age with branches youth, middle_aged, senior.]

11 How do you construct one?
Branch age = middle_aged:
age          income  veteran  college_educated  supports_hillary
middle_aged  low     no       no                yes
middle_aged  medium  no       yes               no
middle_aged  high    no       yes               no
middle_aged  medium  yes      yes               yes
[Tree: age → youth: no; middle_aged: split on veteran (branches yes, no); senior: not yet expanded.]

12 [Tree: age → youth: no; middle_aged → veteran (branches yes, no); senior: not yet expanded.]
Same age = middle_aged subset table as slide 11.

13 [Tree: age → youth: no; middle_aged → veteran: yes → yes (its single tuple has class yes), no → college_educated (branches yes, no); senior: not yet expanded.]
Same age = middle_aged subset table as slide 11.

14 Branch age = middle_aged, veteran = no:
age          income  veteran  college_educated  supports_hillary
middle_aged  low     no       no                yes
middle_aged  medium  no       yes               no
middle_aged  high    no       yes               no
[Tree: age → youth: no; middle_aged → veteran: yes → yes, no → college_educated (branches yes, no); senior: not yet expanded.]

15 Same subset as slide 14.
[Tree: age → youth: no; middle_aged → veteran: yes → yes, no → college_educated: yes → no, no → yes; senior: not yet expanded.]

16 Branch age = senior:
age     income  veteran  college_educated  supports_hillary
senior  low     no       no                yes
senior  medium  no       yes               no
senior  medium  yes      no                yes
senior  high    no       yes               yes
senior  high    no       yes               no
[Tree so far as in slide 15.]

17 Split age = senior on college_educated (branches yes, no).
Same age = senior subset table as slide 16.
[Tree: senior → college_educated (yes, no); rest as in slide 15.]

18 Branch age = senior, college_educated = yes:
age     income  veteran  college_educated  supports_hillary
senior  medium  no       yes               no
senior  high    no       yes               yes
senior  high    no       yes               no
Split this branch on income (branches low, medium, high).

19 Same subset as slide 18, split on income (low, medium, high).
The income = low branch is empty: no low-income college-educated seniors appear in the training set…

20 With no training tuples on the income = low branch, the leaf is labeled by "Majority Vote" over the parent subset (1 yes, 2 no): no.
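A small sketch of that fallback (Python; the helper name is illustrative, and Counter comes from the standard library):

    from collections import Counter

    def leaf_for_branch(subset, parent_tuples, target="support_hillary"):
        # An empty branch inherits a majority vote over the parent's tuples;
        # most_common(1) breaks exact ties arbitrarily, which is why the
        # slides leave a tied branch marked ??? instead.
        pool = subset if subset else parent_tuples
        return Counter(t[target] for t in pool).most_common(1)[0][0]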

21 Branch age = senior, college_educated = yes, income = medium:
age     income  veteran  college_educated  supports_hillary
senior  medium  no       yes               no
Its single tuple has class no, so the leaf is no.

22 [Tree updated: the income = medium branch gets the leaf no.]
Same single-tuple subset as slide 21.

23 Branch age = senior, college_educated = yes, income = high:
age     income  veteran  college_educated  supports_hillary
senior  high    no       yes               yes
senior  high    no       yes               no

24 Split the income = high branch on veteran (branches yes, no).
Same two-tuple subset as slide 23.

25 "Majority Vote" split… there are no veterans in this subset, so the veteran = yes branch is empty, and the two remaining tuples tie (one yes, one no): the leaf is left as ???.

26 Branch age = senior, college_educated = no:
age     income  veteran  college_educated  supports_hillary
senior  low     no       no                yes
senior  medium  yes      no                yes
[Tree so far, with the ??? leaf from slide 25.]

27 Same subset as slide 26; both tuples have class yes.
[Tree so far, with the ??? leaf from slide 25.]

28 [Final tree: age → youth: no; middle_aged → veteran: yes → yes, no → college_educated: yes → no, no → yes; senior → college_educated: no → yes, yes → income: low → no, medium → no, high → veteran split left unresolved (???).]
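A sketch of the finished tree as a nested dict, with a classifier that walks it (Python; keeping the unresolved ??? branch as None is an assumption, since the slides leave that branch open):

    tree = {"age": {
        "youth": "no",
        "middle_aged": {"veteran": {
            "yes": "yes",
            "no": {"college_educated": {"yes": "no", "no": "yes"}},
        }},
        "senior": {"college_educated": {
            "no": "yes",
            "yes": {"income": {
                "low": "no",      # majority-vote leaf (no training tuples)
                "medium": "no",
                "high": {"veteran": {"yes": None, "no": None}},  # the ??? leaves
            }},
        }},
    }}

    def classify(node, example):
        # Descend attribute nodes until a leaf (a label or None) is reached.
        while isinstance(node, dict):
            attribute = next(iter(node))
            node = node[attribute][example[attribute]]
        return node

    # classify(tree, {"age": "senior", "income": "low", "veteran": "no",
    #                 "college_educated": "no"}) returns "yes"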

29 Cost to grow?
n = number of attributes
D = training set of tuples
O(n * |D| * log|D|)

30 Cost to grow?
n = number of attributes
D = training set of tuples
O(n * |D| * log|D|)
n * |D| is the amount of work at each tree level; log|D| is the max height of the tree.

31 How do we minimize the cost? Constructing optimal decision trees is NP-complete (shown by Hyafil and Rivest).

32 How do we minimize the cost? Constructing optimal decision trees is NP-complete (shown by Hyafil and Rivest). We need a heuristic to pick the "best" attribute to split on.

33 [The completed tree from slide 28 shown again, ??? leaf included.]

34 How do we minimize the cost? Constructing optimal decision trees is NP-complete (shown by Hyafil and Rivest). The most common approach is "greedy": we need a heuristic to pick the "best" attribute to split on. The "best" attribute results in the "purest" split. Pure = all tuples belong to the same class.

35 …A good split increases the purity of all child nodes.

36 Three Heuristics
1. Information Gain
2. Gain Ratio
3. Gini Index

37 Information Gain
Ross Quinlan's ID3 (Iterative Dichotomiser 3) uses info gain as its heuristic.
The heuristic is based on Claude Shannon's information theory.

38 [Figure: two illustrations contrasting HIGH ENTROPY and LOW ENTROPY class distributions.]

39 Calculate Entropy for D
D = training set; |D| = 14
m = number of classes; m = 2
i = 1, …, m
C_i = distinct class; C_1 = yes, C_2 = no
C_i,D = set of tuples in D of class C_i
p_i = probability that a random tuple in D belongs to class C_i, = |C_i,D| / |D|; p_1 = 5/14, p_2 = 9/14
Entropy(D) = -Σ_{i=1}^{m} p_i log2(p_i)

40 Entropy(D) = -[5/14 * log2(5/14) + 9/14 * log2(9/14)]
= -[.3571 * -1.4854 + .6428 * -.6374]
= -[-.5304 + -.4097]
= .9400 bits
Extremes:
-[7/14 * log2(7/14) + 7/14 * log2(7/14)] = 1 bit
-[1/14 * log2(1/14) + 13/14 * log2(13/14)] = .3712 bits
-[0/14 * log2(0/14) + 14/14 * log2(14/14)] = 0 bits (with the convention 0 * log2(0) = 0)
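A small sketch of this computation (Python; assumes log base 2 and the 0 * log 0 = 0 convention used above):

    import math

    def entropy(counts):
        # Shannon entropy in bits of a class-count distribution, e.g. [5, 9].
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    print(entropy([5, 9]))   # ~.9400 bits
    print(entropy([7, 7]))   # 1 bit
    print(entropy([1, 13]))  # ~.3712 bits
    print(entropy([0, 14]))  # 0 bits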

41 Entropy for D split by A
A = attribute to split D on; e.g. age
v = number of distinct values of A; e.g. youth, middle_aged, senior
j = 1, …, v
D_j = subset of D where A = j; e.g. all tuples where age = youth
Entropy_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) * Entropy(D_j)

42 Entropy_age(D) = 5/14 * -[0/5*log2(0/5) + 5/5*log2(5/5)]
                 + 4/14 * -[2/4*log2(2/4) + 2/4*log2(2/4)]
                 + 5/14 * -[3/5*log2(3/5) + 2/5*log2(2/5)]
                 = .6324 bits
Entropy_income(D) = 7/14 * -[2/7*log2(2/7) + 5/7*log2(5/7)]
                  + 4/14 * -[2/4*log2(2/4) + 2/4*log2(2/4)]
                  + 3/14 * -[1/3*log2(1/3) + 2/3*log2(2/3)]
                  = .9140 bits
Entropy_veteran(D) = 3/14 * -[2/3*log2(2/3) + 1/3*log2(1/3)]
                   + 11/14 * -[3/11*log2(3/11) + 8/11*log2(8/11)]
                   = .8609 bits
Entropy_college_educated(D) = 8/14 * -[6/8*log2(6/8) + 2/8*log2(2/8)]
                            + 6/14 * -[3/6*log2(3/6) + 3/6*log2(3/6)]
                            = .8921 bits
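The same computation as a sketch, reusing the entropy function above (Python; the list-of-dicts representation of the training table is an assumption of these sketches):

    from collections import Counter, defaultdict

    def split_entropy(tuples, attribute, target="support_hillary"):
        # Entropy_A(D): weighted entropy of the subsets D_j induced by `attribute`.
        groups = defaultdict(Counter)            # attribute value -> class counts
        for t in tuples:
            groups[t[attribute]][t[target]] += 1
        n = len(tuples)
        return sum((sum(c.values()) / n) * entropy(list(c.values()))
                   for c in groups.values())

    # With the 14-tuple table loaded as a list of dicts D,
    # split_entropy(D, "age") comes out near .6324 bits, as on the slide.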

43 Information Gain
Gain(A) = Entropy(D) - Entropy_A(D)
Entropy(D) is measured over the whole set of tuples D; Entropy_A(D) over the subsets of D split on attribute A.
Choose the A with the highest Gain, i.e. the split that most decreases Entropy.

44 Gain(A) = Entropy(D) - Entropy_A(D)
Gain(age) = Entropy(D) - Entropy_age(D) = .9400 - .6324 = .3076 bits
Gain(income) = .0259 bits
Gain(veteran) = .0790 bits
Gain(college_educated) = .0479 bits
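Putting the two helpers together (a sketch under the same assumptions as above):

    def information_gain(tuples, attribute, target="support_hillary"):
        # Gain(A) = Entropy(D) - Entropy_A(D).
        class_counts = Counter(t[target] for t in tuples)
        return entropy(list(class_counts.values())) - split_entropy(tuples, attribute, target)

    # max(["age", "income", "veteran", "college_educated"],
    #     key=lambda a: information_gain(D, a)) picks "age" on this data.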

45 Entropy with more than 2 class values:
Entropy = -[7/13*log2(7/13) + 2/13*log2(2/13) + 2/13*log2(2/13) + 2/13*log2(2/13)] = 1.7272 bits
Entropy = -[5/13*log2(5/13) + 1/13*log2(1/13) + 6/13*log2(6/13) + 1/13*log2(1/13)] = 1.6143 bits

46 Added social security number attribute:
ss           age          income  veteran  college_educated  support_hillary
215-98-9343  youth        low     no       no                no
238-34-3493  youth        low     yes      no                no
234-28-2434  middle_aged  low     no       no                yes
243-24-2343  senior       low     no       no                yes
634-35-2345  senior       medium  no       yes               no
553-32-2323  senior       medium  yes      no                yes
554-23-4324  middle_aged  medium  no       yes               no
523-43-2343  youth        low     no       yes               no
553-23-1223  youth        low     no       yes               no
344-23-2321  senior       high    no       yes               yes
212-23-1232  youth        low     no       no                no
112-12-4521  middle_aged  high    no       yes               no
423-13-3425  middle_aged  medium  yes      yes               yes
423-53-4817  senior       high    no       yes               no

47 [Tree: splitting on ss gives one branch per value, 215-98-9343 …….. 423-53-4817, each leading to a single-tuple leaf (no or yes).]
Will Information Gain split on ss?

48 [Same one-branch-per-ss-value tree as slide 47.]
Will Information Gain split on ss?
Yes, because Entropy_ss(D) = 0: each of the 14 branches holds exactly one tuple, so
Entropy_ss(D) = 14 * (1/14) * -[1/1*log2(1/1) + 0/1*log2(0/1)] = 0.
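As a two-line check with the earlier sketches (assuming D is the table from slide 46 loaded as a list of dicts):

    print(split_entropy(D, "ss"))      # 0.0: every one-tuple branch is pure
    print(information_gain(D, "ss"))   # ~.9400 bits, the whole Entropy(D)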

49 Gain ratio
C4.5, a successor of ID3, uses this heuristic.
It attempts to overcome Information Gain's bias in favor of attributes with a large number of values.

50 Gain ratio
SplitInfo_A(D) = -Σ_{j=1}^{v} (|D_j| / |D|) * log2(|D_j| / |D|)
GainRatio(A) = Gain(A) / SplitInfo_A(D)

51 Gain(ss) = .9400; SplitInfo_ss(D) = 3.9068; GainRatio(ss) = .2406
Gain(age) = .3076; SplitInfo_age(D) = 1.5849; GainRatio(age) = .1940
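A sketch of the ratio, reusing the earlier helpers (exact decimals can differ slightly from the slide's rounding):

    import math
    from collections import Counter

    def gain_ratio(tuples, attribute, target="support_hillary"):
        # GainRatio(A) = Gain(A) / SplitInfo_A(D), as in C4.5.
        n = len(tuples)
        sizes = Counter(t[attribute] for t in tuples)   # |D_j| for each value of A
        split_info = -sum((s / n) * math.log2(s / n) for s in sizes.values())
        return information_gain(tuples, attribute, target) / split_info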

52 Gini Index
CART uses this heuristic.
Binary splits only.
Not biased toward multi-value attributes the way Info Gain is.
[Diagram: a three-way split age → youth / middle_aged / senior vs. a binary split age → senior / {youth, middle_aged}.]

53 Gini Index
For the attribute age the possible subsets are: {youth, middle_aged, senior}, {youth, middle_aged}, {youth, senior}, {middle_aged, senior}, {youth}, {middle_aged}, {senior} and {}.
We exclude the full set and the empty set, since neither yields a real binary split. So we have to examine 2^v - 2 subsets.

54 Gini Index
(Same subsets as slide 53; the full set and empty set are excluded, leaving 2^v - 2 candidates.)
CALCULATE GINI INDEX ON EACH SUBSET: each subset S defines the binary split A ∈ S vs. A ∉ S; a sketch follows.
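A sketch of that enumeration (Python, same list-of-dicts assumption; note each split is visited twice, once per side, which is wasteful but harmless):

    from collections import Counter
    from itertools import combinations

    def gini(counts):
        # Gini impurity 1 - sum(p_i^2) of a class-count distribution.
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def best_binary_split(tuples, attribute, target="support_hillary"):
        # Try every proper, non-empty subset of the attribute's values as the
        # left branch; keep the split with the lowest weighted Gini index.
        values = sorted({t[attribute] for t in tuples})
        n = len(tuples)
        best = None
        for r in range(1, len(values)):
            for left in combinations(values, r):
                d1 = [t for t in tuples if t[attribute] in left]
                d2 = [t for t in tuples if t[attribute] not in left]
                score = (len(d1) / n * gini(list(Counter(t[target] for t in d1).values()))
                         + len(d2) / n * gini(list(Counter(t[target] for t in d2).values())))
                if best is None or score < best[0]:
                    best = (score, set(left))
        return best   # (weighted Gini index, value subset on the left branch)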

55 Gini Index
Gini(D) = 1 - Σ_{i=1}^{m} p_i^2
Gini_A(D) = (|D_1|/|D|) * Gini(D_1) + (|D_2|/|D|) * Gini(D_2) for the binary split of D into D_1 and D_2 on attribute A.

56 Miscellaneous thoughts
Widely applicable to data exploration, classification, and scoring tasks.
Generates understandable rules.
Better at predicting discrete outcomes than continuous ("lumpy") ones.
Error-prone when the number of training examples for a class is small.
Most business cases try to predict a few broad categories.


