1 Decision Tree Learning CMPT 463

2 Reminders: Homework 7 is due on Tuesday, May 10. Projects are due on Tuesday, May 10 (Moodle submission: readme.doc and project.zip). Final Exam Review: Monday, May 9.

3 Learning from Examples An agent is learning if it improves its performance on future tasks after making observations about the world. One class of learning problem: o from a collection of input-output pairs, learn a function that predicts the output for new inputs. o e.g., weather forecast, Google image

4 Why learning? The designer cannot anticipate all changes o A program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change. Programmers sometimes have no idea how to program a solution o recognizing faces

5 Types of Learning: Supervised learning: the agent observes example input-output pairs and learns a function that maps inputs to outputs (e.g., a spam detector). Unsupervised learning: correct answers are not given (e.g., clustering). Reinforcement learning: the agent learns from rewards or punishments (e.g., a taxi agent learning from the lack of a tip).

6

7 Supervised Learning: Learning a function/rule from specific input-output pairs is also called inductive learning. Given a training set of N example pairs (x1,y1), (x2,y2), ..., (xN,yN), where each yi was generated by an unknown target function y = f(x), the problem is to find a hypothesis h such that h ≈ f. h generalizes well if it correctly predicts the value of y for novel examples (the test set).
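A minimal sketch of this setup in Python (the linear hypothesis class and the toy data are illustrative assumptions, not part of the lecture): fit h on the training pairs, then judge generalization by its error on held-out examples.

```python
import numpy as np

# Toy input-output pairs (x_i, y_i) drawn from some unknown f (here roughly y = 2x).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])

x_train, y_train = x[:4], y[:4]   # training set
x_test,  y_test  = x[4:], y[4:]   # novel examples used to measure generalization

# Choose h from the hypothesis space of degree-1 polynomials.
h = np.poly1d(np.polyfit(x_train, y_train, deg=1))

train_mse = np.mean((h(x_train) - y_train) ** 2)
test_mse  = np.mean((h(x_test)  - y_test)  ** 2)
print(f"training error {train_mse:.3f}, test error {test_mse:.3f}")
```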

8 Supervised Learning: When the output y is one of a finite set of values (e.g., sunny, cloudy, rainy), the learning problem is called classification; with only two values it is Boolean or binary classification (e.g., spam detection, male/female face). When y is a number (e.g., tomorrow's temperature), the problem is called regression.

9 Inductive learning method The points are in the (x,y) plane, where y = f(x). We approximate f with h selected from a hypothesis space H. Construct/adjust h to agree with f on training set

10 Inductive learning method Construct/adjust h to agree with f on training set E.g., linear fitting:

11 Inductive learning method Construct/adjust h to agree with f on training set E.g., curve fitting:

12 Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting:

13 Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting: How to choose from among multiple consistent hypotheses?

14 Inductive learning method Ockham's razor: prefer the simplest hypothesis consistent with the data (after the 14th-century English philosopher William of Ockham).
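A small sketch of the curve-fitting picture from the preceding slides (the data points below are made up for illustration): with four training points, a cubic is exactly consistent with the data, but a nearly-as-accurate line is the simpler hypothesis that Ockham's razor prefers.

```python
import numpy as np

# Four training points that are nearly, but not exactly, on a line.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 3.9])

# A degree-3 polynomial through 4 points agrees with every training example...
h_cubic  = np.poly1d(np.polyfit(x, y, deg=3))
# ...while a degree-1 hypothesis fits almost as well and is far simpler.
h_linear = np.poly1d(np.polyfit(x, y, deg=1))

for name, h in [("linear", h_linear), ("cubic", h_cubic)]:
    print(name, "prediction at x = 5:", round(float(h(5.0)), 2))
```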

15 Learning decision trees: One of the simplest and yet most successful forms of machine learning. A decision tree represents a function that takes as input a vector of attribute values and returns a "decision", a single output. Here we assume discrete inputs and Boolean classification.

16 Learning decision trees. Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

17 Attribute-based representations: Examples are described by attribute values. The training set contains 12 examples, e.g., situations where I will or won't wait for a table. The classification of each example is positive (T) or negative (F).
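One way to represent a single training example from this set in Python is as an attribute-value mapping plus its classification. The specific values below are hypothetical, since the actual 12-example table appears only as a figure.

```python
# A hypothetical training example for the restaurant problem: attribute values
# plus a positive/negative classification (will we wait for a table?).
example = {
    "Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
    "Patrons": "Some", "Price": "$$$", "Raining": False,
    "Reservation": True, "Type": "French", "WaitEstimate": "0-10",
}
will_wait = True   # classification: positive (T)
```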

18 Decision trees: One possible representation for hypotheses is the "true" tree for deciding whether to wait (it does not use Price or Type):

19 Expressiveness Decision trees can express any function of the input attributes. E.g., for Boolean functions, truth table row → path to leaf: Trivially, there is a consistent decision tree for any training set with one path to leaf for each example.
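As a concrete illustration (a sketch using nested dictionaries, not notation from the slides), here is a decision tree for the Boolean function XOR(A, B); each truth-table row corresponds to one root-to-leaf path.

```python
# Decision tree for XOR(A, B) as nested dictionaries: test A, then B, then
# return the Boolean value at the leaf.
xor_tree = {
    "A": {
        True:  {"B": {True: False, False: True}},
        False: {"B": {True: True,  False: False}},
    }
}

def classify(tree, example):
    """Follow the branch matching the example's value of each tested attribute."""
    if not isinstance(tree, dict):
        return tree                      # reached a leaf: the decision
    attribute = next(iter(tree))
    return classify(tree[attribute][example[attribute]], example)

assert classify(xor_tree, {"A": True, "B": False}) is True
```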

20 Goal: to find the most compact decision tree

21 Constructing the Decision Tree. Goal: find the smallest decision tree consistent with the examples. Divide-and-conquer: test the most important attribute first; this divides the problem into smaller subproblems that can be solved recursively. "Most important" means the attribute that best splits the examples.

22 Attribute-based representations

23

24

25 Constructing the Decision Tree (see the sketch below):
Form a tree with root = best attribute.
For each value vi (or range) of the best attribute:
- Select those examples with best = vi.
- Construct subtree_i by recursively calling the decision-tree learner with that subset of examples and all attributes except best.
- Add a branch to the tree with label = vi and subtree = subtree_i.
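A minimal Python sketch of this procedure, assuming examples are (attribute-dict, label) pairs and a choose_best_attribute helper (e.g., one based on the information gain defined on later slides); the base cases are filled in from the standard decision-tree-learning algorithm and are assumptions here.

```python
from collections import Counter

def plurality_value(examples):
    """Most common classification among the given examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, choose_best_attribute, parent_examples=()):
    # Base cases (assumed from the standard algorithm): no examples left,
    # all examples share one classification, or no attributes left to test.
    if not examples:
        return plurality_value(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()
    if not attributes:
        return plurality_value(examples)

    best = choose_best_attribute(examples, attributes)   # "most important" attribute
    tree = {best: {}}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(attrs, label) for attrs, label in examples if attrs[best] == value]
        remaining = [a for a in attributes if a != best]
        # Branch labeled `value`, whose subtree is learned recursively.
        tree[best][value] = learn_tree(subset, remaining, choose_best_attribute, examples)
    return tree
```

The returned nested-dictionary tree can be evaluated with the same kind of classify routine sketched earlier for XOR.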

26 Decision tree learning Aim: find a small tree consistent with the training examples Idea: (recursively) choose "most significant" attribute as root of (sub)tree

27 Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" Which is a better choice?

28 Choosing the Best Attribute: Binary Classification. We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction. Information theory (Shannon and Weaver, 1949) provides two such notions. Entropy: a measure of the uncertainty of a random variable. A coin that always comes up heads --> 0 bits; a flip of a fair coin (heads or tails) --> 1 bit; the roll of a fair four-sided die --> 2 bits. Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute.

29 Formula for Entropy: for a random variable with possible values v1, ..., vn occurring with probabilities P(v1), ..., P(vn), H = -Σi P(vi) log2 P(vi). Examples: Suppose we have a collection of 10 examples, 5 positive and 5 negative: H(1/2, 1/2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit. Suppose we have a collection of 100 examples, 1 positive and 99 negative: H(1/100, 99/100) = -0.01 log2(0.01) - 0.99 log2(0.99) ≈ 0.08 bits.
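A small Python helper (a sketch, not course code) that reproduces the worked examples above:

```python
import math

def entropy(probabilities):
    """H(p1, ..., pn) = -sum_i p_i * log2(p_i), treating 0 * log2(0) as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))      # 1.0 bit (fair coin / 5 positive, 5 negative)
print(entropy([0.01, 0.99]))    # ~0.08 bits (1 positive, 99 negative)
print(entropy([0.25] * 4))      # 2.0 bits (fair four-sided die)
```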

30 Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" Which is a better choice?

31 Information gain: Information gain (from an attribute test) = the difference between the original information requirement and the new requirement.

32 Information gain: Information gain (from an attribute test) = the difference between the original information requirement and the new requirement. Information Gain (IG) is the reduction in entropy from the attribute test (its standard form is given below). Choose the attribute with the largest IG.
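The slide's own formula appears only as an image. The standard form of the quantity it describes, for a Boolean classification with p positive and n negative examples and an attribute A that splits them into d subsets containing p_k positive and n_k negative examples each, is:

```latex
% B(q): entropy of a Boolean variable that is true with probability q.
\[
  B(q) = -\bigl(q \log_2 q + (1-q)\log_2 (1-q)\bigr)
\]
% Expected entropy remaining after testing attribute A.
\[
  \mathrm{Remainder}(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p + n}\,
                          B\!\left(\frac{p_k}{p_k + n_k}\right)
\]
% Information gain of attribute A.
\[
  \mathrm{Gain}(A) = B\!\left(\frac{p}{p + n}\right) - \mathrm{Remainder}(A)
\]
```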

33 Information gain: For the training set, p = n = 6, so H(6/12, 6/12) = 1 bit. Consider the attributes Patrons and Type (and the others too): Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.
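A rough check of this comparison in Python. The per-value (positive, negative) counts are taken from the standard 12-example restaurant table, which appears here only as a figure, so treat them as assumptions.

```python
import math

def B(q):
    """Entropy of a Boolean variable that is true with probability q."""
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def gain(splits, p, n):
    """Information gain of an attribute that splits (p, n) into the given subsets."""
    remainder = sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)
    return B(p / (p + n)) - remainder

patrons = [(0, 2), (4, 0), (2, 4)]              # None, Some, Full
rtype   = [(1, 1), (1, 1), (2, 2), (2, 2)]      # French, Italian, Thai, Burger
print(f"Gain(Patrons) = {gain(patrons, 6, 6):.3f} bits")   # about 0.541
print(f"Gain(Type)    = {gain(rtype, 6, 6):.3f} bits")     # 0: Type tells us nothing
```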

34 Example contd. Decision tree learned from the 12 examples: it is substantially simpler than the "true" tree.

35 The PlayTennis training examples:

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

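To connect this table back to the attribute-selection criterion, here is a short Python sketch (not course code) that computes the information gain of each attribute directly from the rows above; Outlook should come out highest and would therefore be chosen as the root.

```python
import math
from collections import Counter, defaultdict

ROWS = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(column):
    """Information gain of the attribute stored in the given column index."""
    labels = [row[-1] for row in ROWS]
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[column]].append(row[-1])
    remainder = sum(len(g) / len(ROWS) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

for i, name in enumerate(ATTRIBUTES):
    print(f"Gain({name}) = {gain(i):.3f} bits")   # Outlook should come out highest
```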
