Published by Alicia Craton, modified over 2 years ago

1
Decision Trees

References:
- "Artificial Intelligence: A Modern Approach, 3rd ed." (Pearson), sections 18.3-18.4
- http://onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html
- http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree_overfitting.htm

2
What are they? A "flowchart" of logic. Example:

If my health is low:
    run to cover
Else if an enemy is nearby:
    shoot it
Else:
    scavenge for treasure
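The flowchart above maps directly onto nested if/else statements. A minimal sketch in Python (the function name, argument types, and health threshold are illustrative assumptions, not from the slides):

```python
def choose_action(health, enemy_nearby):
    """Traverse the flowchart top to bottom (names/threshold are illustrative)."""
    if health < 25:              # "my health is low" -- the threshold is an assumption
        return "run to cover"
    elif enemy_nearby:
        return "shoot it"
    else:
        return "scavenge for treasure"
```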

3
Another Example

Goal: decide if we'll wait for a table at a restaurant.

Factors:
- Alternate: is there another restaurant nearby?
- Bar: does the restaurant have a bar?
- Fri/Sat: is it a Friday or Saturday?
- Hungry: are we hungry?
- Patrons: how many people? {None, Some, Full}
- Price: price range {$, $$, $$$}
- Raining: is it raining?
- Reservation: do we have a reservation?
- Type: {French, Italian, Thai, Burger}
- Wait: {0-10, 10-30, 30-60, >60} minutes

4
Possible decision tree

[Figure: a possible decision tree. The root tests Patrons (None -> N, Some -> Y, Full -> test the wait estimate); the wait-estimate branches (0-10, 10-30, 30-60, >60) lead to further tests on Alternate, Hungry, Reservation, Fri/Sat, Bar, and Raining, ending in Y/N leaves.]

5
Analysis

Pluses:
- Easy to traverse
- Naturally expressed as if/else's

Negatives:
- How do we build an optimal tree?

6
Sample Input

#   Alt  Bar  Fri  Hun  Pat  Pr   Ran  Res  Type  Est    Wait?
1   Y    N    N    Y    S    $$$  N    Y    Fr    0-10   Y
2   Y    N    N    Y    F    $    N    N    Th    30-60  N
3   N    Y    N    N    S    $    N    N    Bu    0-10   Y
4   Y    N    Y    Y    F    $    Y    N    Th    10-30  Y
5   Y    N    Y    N    F    $$$  N    Y    Fr    >60    N
6   N    Y    N    Y    S    $$   Y    Y    It    0-10   Y
7   N    Y    N    N    N    $    Y    N    Bu    0-10   N
8   N    N    N    Y    S    $$   Y    Y    Th    0-10   Y
9   N    Y    Y    N    F    $    Y    N    Bu    >60    N
10  Y    Y    Y    Y    F    $$$  N    Y    It    10-30  N
11  N    N    N    N    N    $    N    N    Th    0-10   N
12  Y    Y    Y    Y    F    $    N    N    Bu    30-60  Y

7
Sample Input, cont.

We can also think of these as "training data" for a decision tree we want to build. Depending on the context, the input:
- comes from "experts"
- exemplifies the thinking you want to encode
- is raw data we want to mine
- …

Note:
- It doesn't contain all possibilities
- There might be noise

8
Building a tree

So how do we build a decision tree from input? There are a lot of possible trees:
- O(2^n) of them
- Some are good, some are bad: good == shallowest, bad == deepest
- It's intractable to find the best one

Using a greedy algorithm, we can find a pretty good one…

9
ID3 algorithm

By Ross Quinlan (RuleQuest Research). Basic idea:
- Choose the best attribute, i
- Create a tree node with n children, where n is the number of values attribute i can take
- Divide the training set into n subsets, where all items in a subset have the same value for attribute i
  - If all items in a subset have the same output value, make that child a leaf node
  - If not, recursively create a new sub-tree, using only the training examples in this subset and not considering attribute i any more
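The recursion above can be sketched in Python. This is a minimal illustration only (tie-breaking, unseen values, and the pruning discussed later are omitted); examples are (attribute-dict, label) pairs, an encoding chosen here for convenience:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of output labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, attributes):
    """examples: list of (features_dict, label) pairs.
    Returns a leaf label, or a node {'attr': ..., 'branches': {value: subtree}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # all outputs agree: leaf
        return labels[0]
    if not attributes:                        # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):                      # weighted entropy after splitting on attr
        groups = defaultdict(list)
        for feats, label in examples:
            groups[feats[attr]].append(label)
        n = len(examples)
        return sum(len(g) / n * entropy(g) for g in groups.values())

    best = min(attributes, key=remainder)     # lowest remainder == highest info gain
    subsets = defaultdict(list)
    for feats, label in examples:
        subsets[feats[best]].append((feats, label))
    rest = [a for a in attributes if a != best]   # don't consider attribute i any more
    return {"attr": best,
            "branches": {v: id3(sub, rest) for v, sub in subsets.items()}}
```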

10
"Best" attribute Entropy (in information theory) – A measure of uncertainty. – Gaining info == lowering entropy A fair coin = 1 bit of entropy A loaded coin (always heads) = 0 bits of entropy – No uncertainty A fair roll of a d4 = 2 bits of entropy A fair roll of a d8 = 3 bits of entropy

11
Entropy, cont. For a discrete random variable with outcome probabilities p_1, …, p_n, the entropy is H = -Σ_i p_i log2(p_i) bits.

12
Example: we have a loaded 4-sided die with outcome probabilities {1: 10%, 2: 5%, 3: 25%, 4: 60%}. Its entropy is -(0.1 log2 0.1 + 0.05 log2 0.05 + 0.25 log2 0.25 + 0.6 log2 0.6) ≈ 1.49 bits. Recall that the entropy of a fair d4 is 2.0 bits, so this die is slightly more predictable.
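The numbers above are easy to check in plain Python (nothing assumed beyond the slide's probabilities):

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_d4 = entropy([0.25] * 4)                  # 2.0 bits, as stated earlier
loaded_d4 = entropy([0.10, 0.05, 0.25, 0.60])  # about 1.49 bits
```

The loaded die's entropy falls below the fair die's 2.0 bits, matching the claim that it is more predictable.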

13
Information Gain

The reduction in entropy. In the ID3 algorithm:
- We want to split the training cases based on attribute i
- Where attribute i gives us the most information, i.e. lowers entropy the most

14
Information Gain, cont. Splitting example set E on attribute i partitions it into subsets E_v, one per value v of i. The gain is Gain(i) = H(E) - Σ_v (|E_v| / |E|) · H(E_v).

15
Original Example

StepA1: Calculate H(E) for the full training set: 6 Yes and 6 No, so H(E) = 1.0 bit.

16
Original Example, cont.

StepA2: Calculate H(E_wait)
- 4 possible values, so we'd end up with 4 branches:
  - "0-10": {1, 3, 6, 7, 8, 11}; 4 Yes, 2 No
  - "10-30": {4, 10}; 1 Yes, 1 No
  - "30-60": {2, 12}; 1 Yes, 1 No
  - ">60": {5, 9}; 2 No
- Calculate the entropy of this split group

17
Original Example, cont.

StepA3: Calculate H(E_pat)
- 3 possible values, so we'd end up with 3 branches:
  - "Some": {1, 3, 6, 8}; 4 Yes
  - "Full": {2, 4, 5, 9, 10, 12}; 2 Yes, 4 No
  - "None": {7, 11}; 2 No
- Calculate the entropy of this split group

So… which is better: splitting on Wait, or Pat?
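The two candidate splits can be checked numerically. A sketch using just the Patrons and WaitEstimate columns of the 12 training cases (transcribed from the sample-input slide):

```python
import math
from collections import Counter, defaultdict

def H(labels):
    """Entropy (bits) of a list of Y/N labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# (Patrons, WaitEstimate, WillWait) for training cases 1-12
cases = [
    ("Some", "0-10",  "Y"), ("Full", "30-60", "N"), ("Some", "0-10",  "Y"),
    ("Full", "10-30", "Y"), ("Full", ">60",   "N"), ("Some", "0-10",  "Y"),
    ("None", "0-10",  "N"), ("Some", "0-10",  "Y"), ("Full", ">60",   "N"),
    ("Full", "10-30", "N"), ("None", "0-10",  "N"), ("Full", "30-60", "Y"),
]

def gain(col):
    """Information gain of splitting on column `col` (0=Patrons, 1=WaitEstimate)."""
    outcomes = [c[2] for c in cases]
    groups = defaultdict(list)
    for c in cases:
        groups[c[col]].append(c[2])
    after = sum(len(g) / len(cases) * H(g) for g in groups.values())
    return H(outcomes) - after
```

Running `gain(0)` and `gain(1)` reproduces the ~0.541 vs. ~0.21 comparison quoted on the next slide.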

18
Original Example, cont.

Pat is much better (0.541 gain vs. 0.21 gain). Here is the tree so far:

[Figure: root node Patrons; "Some" -> leaf Y {1,3,6,8}; "None" -> leaf N {7,11}; "Full" -> unresolved subtree {2,4,5,9,10,12}.]

Now we need a subtree to handle the case where Patrons == Full. Note: the training set is smaller now (6 cases vs. 12).

19
Original Example, cont.

Look at two alternatives: Alt and Type. Calculate the entropy of the remaining group:
- We actually already calculated it (H("Full") in StepA3)
- That value becomes H(E) for this recursive application of ID3
- H(E) ≈ 0.918

20
Original Example, cont.

Calculate the entropy if we split on Alt. Two possible values, "Yes" and "No":
- "Yes" (Alt): {2, 4, 5, 10, 12}; 2 Yes, 3 No
- "No" (Alt): {9}; 1 No

21
Original Example, cont.

Calculate the entropy if we split on Type. 4 possible values: "French", "Thai", "Burger", and "Italian":
- "French": {5}; 1 No
- "Thai": {2, 4}; 1 Yes, 1 No
- "Burger": {9, 12}; 1 Yes, 1 No
- "Italian": {10}; 1 No

Which is better: Alt or Type?

22
Original Example, cont.

Type is better (0.251 gain vs. 0.109 gain). (Hungry, Price, Reservation, and Est would give you the same gain.) Here is the tree so far:

[Figure: Patrons at the root ("Some" -> Y {1,3,6,8}, "None" -> N {7,11}); the "Full" branch {2,4,5,9,10,12} tests Type, with "French" -> N {5}, "Italian" -> N {10}, and unresolved subtrees for "Thai" {2,4} and "Burger" {9,12}.]

Recursively make two more sub-trees…

23
Original Example, cont.

Here's one possibility (skipping the details):

[Figure: the completed tree. Patrons: "Some" -> Y, "None" -> N, "Full" -> Type. Type: "French" -> N {5}, "Italian" -> N {10}, "Thai" -> Fri, "Burger" -> Alt. Fri: "Yes" -> Y {4}, "No" -> N {2}. Alt: "Yes" -> Y {12}, "No" -> N {9}.]

24
Using a decision tree

This algorithm will perfectly match all training cases; the hope is that this will generalize to novel cases. Let's take a new case (not found in the training set):
- Alt="No", Bar="Yes", Fri="No", Pat="Full"
- Hungry="Yes", Price="$$", Rain="Yes"
- Reservation="Yes", Type="Italian", Est="30-60"

Will we wait?

25
Original Example, cont.

Here's the decision process for Alt="No", Bar="Yes", Fri="No", Pat="Full", Hungry="Yes", Price="$$", Rain="Yes", Reservation="Yes", Type="Italian", Est="30-60":

[Figure: the tree from the previous slide with the path highlighted: Patrons == "Full" -> Type == "Italian" -> leaf N.]

So… no, we won't wait.
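The finished tree and the lookup can be sketched as nested dicts; classification is just a walk from the root (the dict encoding is an illustrative choice, not from the slides):

```python
# Leaves are "Y"/"N"; internal nodes name the attribute to test
tree = {"attr": "Patrons", "branches": {
    "Some": "Y",
    "None": "N",
    "Full": {"attr": "Type", "branches": {
        "French": "N",
        "Italian": "N",
        "Thai":   {"attr": "Fri", "branches": {"Yes": "Y", "No": "N"}},
        "Burger": {"attr": "Alt", "branches": {"Yes": "Y", "No": "N"}},
    }},
}}

def classify(node, case):
    """Follow the branch matching the case's value for each tested attribute."""
    while isinstance(node, dict):
        node = node["branches"][case[node["attr"]]]
    return node

new_case = {"Alt": "No", "Bar": "Yes", "Fri": "No", "Patrons": "Full",
            "Hungry": "Yes", "Price": "$$", "Rain": "Yes",
            "Reservation": "Yes", "Type": "Italian", "Est": "30-60"}
# Patrons == "Full" -> Type == "Italian" -> leaf "N": we won't wait
```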

26
Pruning

Sometimes an exact fit is not necessary:
- The tree is too big (deep)
- The tree isn't generalizing well to new cases (overfitting)
- We don't have a lot of training cases: we would get close to the same results by removing the Attr node and labeling it as a leaf (r1)

[Figure: a node testing Attr with branches v1 -> r1 {47}, v2 -> r2 {98}, v3 -> r1 {11, 41}.]

27
Chi-Squared Test

The chi-squared test can be used to determine if a decision node is statistically significant.

Example 1: Is there a strong association between hair color and eye color?

RAW DATA (rows: eye color; columns: hair color)

            Light  Dark
Brown        32     12
Green/Blue   14     22
Other         6      9

28
Chi-Squared Test

Example 2: Is there a strong association between console preference and passing ETGG1803?

RAW DATA (rows: pass/fail ETGG1803; columns: console preference)

       PS3  PC  XBox360  Wii  None
Pass     5  12        6    4    15
Fail     4   2        5    4     2

29
Chi-Squared Test

Steps:

1) Calculate row, column, and overall totals:

            Light  Dark  | Total
Brown        32     12   |   44
Green/Blue   14     22   |   36
Other         6      9   |   15
Total        52     43   |   95

30
Chi-Squared Test

2) Calculate the expected value of each cell: RowTotal * ColTotal / OverallTotal

EXPECTED

            Light  Dark
Brown       24.08  19.92    (e.g. 52*44/95 = 24.08)
Green/Blue  19.7   16.3     (e.g. 36*43/95 = 16.3)
Other        8.21   6.8

31
Chi-Squared Test

3) For each cell, compute (observed - expected)² / expected:

CHI-SQUARED

            Light  Dark
Brown        2.6   3.15     (e.g. (32-24.08)²/24.08 = 2.6)
Green/Blue   1.65  2.0      (e.g. (22-16.3)²/16.3 = 2.0)
Other        0.6   0.71

χ² = 2.6 + 3.15 + 1.65 + 2.0 + 0.6 + 0.71 = 10.71
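The totals, expected values, and cell contributions can be bundled into one helper. Running it on the hair/eye table reproduces the 10.71 above (and about 8.16 for the console table from Example 2):

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total   # step 2
            chi2 += (obs - expected) ** 2 / expected           # step 3
    return chi2

hair_eye = [[32, 12], [14, 22], [6, 9]]         # rows: Brown, Green/Blue, Other
console = [[5, 12, 6, 4, 15], [4, 2, 5, 4, 2]]  # rows: Pass, Fail
```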

32
Chi-Squared Test

4) Look up your chi-squared value in a table:
- The degrees of freedom (dof) is (numRows - 1) * (numCols - 1)
- http://home.comcast.net/~sharov/PopEcol/tables/chisq.html
- If the table entry (usually for 0.05) is less than your chi-squared value, the association is statistically significant

Alternatively, use scipy (www.scipy.org):

import scipy.stats
if 1.0 - scipy.stats.chi2.cdf(chiSquared, dof) > 0.05:
    # Statistically insignificant

33
Chi-Squared Test

We have a χ² value of 10.71 (dof = 2). The table entry for 5% probability (0.05) is 5.99. 10.71 is bigger than 5.99, so this is statistically significant.

For the console example:
- χ² = 8.16
- dof = 4
- The table entry for 5% probability is 9.49
- So… this isn't a statistically significant association.

34
Chi-Squared Pruning

Prune bottom-up:
- Do a depth-first traversal
- Do your test after calling the function recursively on your children

35
Original Example, cont.

Look at "Burger" first:

[Figure: the full tree with Yes/No counts at each node. Root [6Y,6N]; Patrons: "Some" -> Y [4Y,0N], "None" -> N [0Y,2N], "Full" -> Type [2Y,4N]; Type: "French" -> N [0Y,1N], "Italian" -> N [0Y,1N], "Thai" -> Fri [1Y,1N], "Burger" -> Alt [1Y,1N]; Fri: "Yes" -> Y [1Y,0N], "No" -> N [0Y,1N]; Alt: "Yes" -> Y [1Y,0N], "No" -> N [0Y,1N].]

36
Original Example, cont.

Do a chi-squared test on the Alt node under Burger ("Yes" -> Y [1Y,0N], "No" -> N [0Y,1N]):

Original (observed):
            Yes  No  | Total
Yes: Wait    1    0  |   1
No: Don't    0    1  |   1
Total        1    1  |   2

Expected: every cell is 0.5
Chi contribution of every cell: 0.5

χ² = 0.5 + 0.5 + 0.5 + 0.5 = 2.0
dof = (2-1) * (2-1) = 1
Table(0.05, 1) = 3.84

2.0 < 3.84, so… prune it! Note: we'll have a similar case with Thai, so prune it too!
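The same statistic applies to the pruning candidates. A sketch (the helper is repeated so the snippet stands alone) reproducing the 2.0 above, and the 6.667 the later Patrons-node slide reports:

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

alt_node = [[1, 0], [0, 1]]            # rows: Wait / Don't wait; cols: Alt = Yes / No
patrons_node = [[4, 2, 0], [0, 4, 2]]  # rows: Y / N; cols: Some, Full, None
```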

37
Original Example, cont.

Here's one possibility:

[Figure: the tree before and after this pruning step. The Fri and Alt tests under Thai and Burger are removed, and Thai and Burger become leaves with counts [1Y,1N] (shown as Y), leaving Patrons -> {"Some" Y [4Y,0N], "None" N [0Y,2N], "Full" Type [2Y,4N]} and Type -> {"French" N [0Y,1N], "Italian" N [0Y,1N], "Thai" Y [1Y,1N], "Burger" Y [1Y,1N]}.]

38
Original Example, cont.

Now test the Type node itself ("French" -> N [0Y,1N], "Thai" -> Y [1Y,1N], "Italian" -> N [0Y,1N], "Burger" -> Y [1Y,1N]; [2Y,4N] overall): I got a chi-squared value of 1.52, dof = 3… prune it!

39
Original Example, cont.

Here's one possibility:

[Figure: the tree before and after pruning the Type node. Type is replaced by a leaf, leaving just Patrons: "Some" -> Y [4Y,0N], "None" -> N [0Y,2N], "Full" -> N [2Y,4N]; root [6Y,6N].]

40
Pruning Example, cont.

Finally, test the Patrons node itself ("Some" -> Y [4Y,0N], "Full" -> N [2Y,4N], "None" -> N [0Y,2N]; root [6Y,6N]): I got a chi-squared value of 6.667 with dof = 2. So… keep it!

Note: if the evidence were stronger (more training cases) in the Burger and Thai branches, we wouldn't have pruned them.

41
Questions?
